Re: partition tree inspection functions
On Fri, Oct 19, 2018 at 01:05:52PM +0900, Amit Langote wrote:
> As I said above, the price of removing relhassubclass might be a bit
> steep. So, the other alternative I mentioned before is to set
> relhassubclass correctly even for indexes, if only for pg_partition_tree
> to be able to use find_inheritance_children unchanged.

Playing devil's advocate a bit more... Another alternative here would be to
remove the fast path using relhassubclass from find_inheritance_children and
instead have its callers check for it :)

Anyway, it seems that you are right here. Just setting relhassubclass for
partitioned indexes feels more natural with what's on HEAD now. Even if I'd
like to see all those hypothetical columns in pg_class go away, that cannot
happen without a close look at the performance impact.
--
Michael
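
For concreteness, the caller-side arrangement being floated here might look
like this (my sketch, not code from any patch; has_subclass() is the
existing helper in catalog/pg_inherits.c):

    /*
     * Sketch: the relhassubclass fast path moved out of
     * find_inheritance_children() into its callers, each consulting
     * relhassubclass first before walking pg_inherits.
     */
    List       *children = NIL;

    if (has_subclass(RelationGetRelid(rel)))
        children = find_inheritance_children(RelationGetRelid(rel),
                                             AccessShareLock);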
Re: Protect syscache from bloating with negative cache entries
Hello. Thank you for the comment.

At Thu, 4 Oct 2018 04:27:04 +0000, "Ideriha, Takeshi" wrote in
<4E72940DA2BF16479384A86D54D0988A6F1BCB6F@G01JPEXMBKW04>
>> As a *PoC*, in the attached patch (which applies to current master),
>> sizes of CTups are counted as the catcache size.
>>
>> It also provides a pg_catcache_size system view just to give a rough idea
>> of how such a view looks. I'll consider more on that, but do you have any
>> opinion on this?
> ...
> Great! I like this view.
> One of the extreme ideas would be adding all the members printed by
> CatCachePrintStats(), which is only enabled with -DCATCACHE_STATS at this
> moment.
> All of the members seem too much for customers who try to change the cache
> limit size, but some of the members may be useful, because for example
> cc_hits would indicate that the current cache limit size is too small.

The attached introduces the four features below. (The features on relcache
and plancache are omitted.)

1. Syscache stats collector (in 0002)

Records syscache status, consisting of the same columns as above plus
"ageclass" information. We could somehow trigger a stats report with a
signal, but we don't want to take/send/write the statistics in a signal
handler. Instead, it is turned on by setting track_syscache_usage_interval
to a positive number of milliseconds.

2. pg_stat_syscache view (in 0002)

This view shows catcache statistics. Statistics are taken only on the
backends where syscache tracking is active.

>  pid  | application_name | relname      | cache_name                       | size     | ageclass                | nentries
> ------+------------------+--------------+----------------------------------+----------+-------------------------+---------------------------
>  9984 | psql             | pg_statistic | pg_statistic_relid_att_inh_index | 12676096 | {30,60,600,1200,1800,0} | {17660,17310,55870,0,0,0}

The age class is the basis of the catcache truncation mechanism and shows
the distribution based on elapsed time since last access. As I didn't come
up with a more appropriate representation, it is given as two arrays:
ageclass stores the maximum age for each class in seconds, and nentries
holds the number of entries corresponding to the same element in ageclass.
In the above example:

  age class   : # of entries in the cache
  up to 30s   : 17660
  up to 60s   : 17310
  up to 600s  : 55870
  up to 1200s : 0
  up to 1800s : 0
  longer      : 0

The ageclass values are the {0, 0.05, 0.1, 1, 2, 3} multiples of
cache_prune_min_age on the backend.

3. Non-transactional GUC setting (in 0003)

This allows a GUC variable set with the action GUC_ACTION_NONXACT (the name
requires consideration) to survive beyond rollback. It is required for
remote GUC setting to work sanely; without this feature, a remotely-set
value within a transaction would disappear on rollback. The only local
interface for the NONXACT action is set_config(name, value, is_local=false,
is_nonxact=true). pg_set_backend_guc() below works on this feature.

4. pg_set_backend_guc() function

Of course, recording syscache statistics consumes a significant amount of
time, so it cannot usually be turned on. On the other hand, since this
feature is controlled by a GUC, we would need to grab the active client
connection to turn the feature on and off (which we cannot do). Instead, I
provide a means to change GUC variables in another backend:
pg_set_backend_guc(pid, name, value) sets the GUC variable "name" on the
backend "pid" to "value".

With the above tools, we can inspect the catcache statistics of a seemingly
bloated process:

A. Find the bloated process's pid using ps or something.

B. Turn on syscache stats on the process.
   =# select pg_set_backend_guc(9984, 'track_syscache_usage_interval', '1');

C. Examine the statistics.

=# select pid, relname, cache_name, size from pg_stat_syscache order by size desc limit 3;
 pid  | relname      | cache_name                       | size
------+--------------+----------------------------------+----------
 9984 | pg_statistic | pg_statistic_relid_att_inh_index | 32154112
 9984 | pg_cast      | pg_cast_source_target_index      |     4096
 9984 | pg_operator  | pg_operator_oprname_l_r_n_index  |     4096

=# select * from pg_stat_syscache where cache_name = 'pg_statistic_relid_att_inh_index'::regclass;
-[ RECORD 1 ]-----------------------------------
pid         | 9984
relname     | pg_statistic
cache_name  | pg_statistic_relid_att_inh_index
size        | 11026176
ntuples     | 77950
searches    | 77950
hits        | 0
neg_hits    | 0
ageclass    | {30,60,600,1200,1800,0}
nentries    | {17630,16950,43370,0,0,0}
last_update | 2018-10-17 15:58:19.738164+09

>>> Another option is that users only specify the total memory target size
>>> and postgres dynamically changes each CatCache memory target size
>>> according to a certain metric.
>>> (, which still seems difficult and expensive to develop per benefit)
>>> What do you think about this?
>>
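
Presumably the inverse step would then be (a sketch, using the same
proposed function; the "0" disables tracking per the description above):

D. Turn the tracking back off once the inspection is done.
   =# select pg_set_backend_guc(9984, 'track_syscache_usage_interval', '0');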
Problem about partitioned table
Hi everyone,

I have a problem with a partitioned table in PostgreSQL. I currently use
version 10. I created the partitioned table in a test environment but face
some problems with constraints on partitioned tables. I googled all about
it last week, and from the official site I learned that version 11 was
about to be released and that this feature would be supported, as I
understood it. From the version 11 documentation: "Add support for PRIMARY
KEY, FOREIGN KEY, indexes, and triggers on partitioned tables".

I installed and configured version 11 yesterday, as soon as it was
released, and tested it. Unfortunately I didn't succeed again. Either I
don't understand the new feature, or this case is actually not supported.
Please help me with the problem.

In my test environment the CASE is like this (I used declarative
partitioning). I had an er_doc_to_user_relation table before, and I
partitioned this table by list on the column state. I created two
partitions as follows:

CREATE TABLE xx.er_doc_to_user_state_1_3
  PARTITION OF xx.er_doc_to_user_relation (oid,created_date,state,status,updated_date,branch_oid,state_update_date,user_position,
    fk_action_owner,fk_action_owner_org,fk_document,fk_flow,fk_org,fk_prew_doc_user_rel,fk_user,fk_workflow,fk_action_login_type)
  FOR VALUES IN (1,3);

CREATE TABLE xx.er_doc_to_user_state_2_4_9
  PARTITION OF xx.er_doc_to_user_relation (oid,created_date,state,status,updated_date,branch_oid,state_update_date,user_position,
    fk_action_owner,fk_action_owner_org,fk_document,fk_flow,fk_org,fk_prew_doc_user_rel,fk_user,fk_workflow,fk_action_login_type)
  FOR VALUES IN (2,4,9);

After that I created constraints and indexes for each partition manually.
Everything is OK until here. The problem starts when I try to create a
constraint on another table which references the er_doc_to_user_relation
table.

Case 1: Try to create a foreign key constraint referencing the parent table
er_doc_to_user_relation:

ALTER TABLE xx.er_doc_workflow_action
  ADD CONSTRAINT fk_doc_work_act FOREIGN KEY (fk_to_user_doc_rel)
  REFERENCES xx.er_doc_to_user_relation(oid) MATCH SIMPLE
  ON UPDATE NO ACTION
  ON DELETE NO ACTION;

The following error occurred:

ERROR: cannot reference partitioned table "er_doc_to_user_relation"
SQL state: 42809

Because that is not supported, I tried the second case.

Case 2: Try to create a foreign key constraint referencing each partition
separately (er_doc_to_user_state_1_3, er_doc_to_user_state_2_4_9):

ALTER TABLE xx.er_doc_workflow_action
  ADD CONSTRAINT fk_doc_work_act_1_3 FOREIGN KEY (fk_to_user_doc_rel)
  REFERENCES xx.er_doc_to_user_state_1_3(oid) MATCH SIMPLE
  ON UPDATE NO ACTION
  ON DELETE NO ACTION;

The following error occurred:

ERROR: insert or update on table "er_doc_workflow_action" violates foreign key constraint "fk_doc_work_act_1_3"
DETAIL: Key (fk_to_user_doc_rel)=(3hjbzok1mn100g) is not present in table "er_doc_to_user_state_1_3".
SQL state: 23503

I think this error is normal, because the referenced oid is in the other
partition, so the constraint cannot validate all the data. If I try to
create the foreign key constraint on the second partition, the same error
occurs for the same reason.

Note: I want to create the constraint on only a single pair of columns
(fk_to_user_doc_rel - oid).

THE BIG QUESTION IS: How can I solve this problem? What are your
recommendations?

PLEASE HELP ME !!!

--
Best Regards,
Mehman Jafarov
DBA Administrator at CyberNet LLC
Re: Optimze usage of immutable functions as relation
The following review has been posted through the commitfest application:

make installcheck-world: tested, failed
Implements feature:      not tested
Spec compliant:          not tested
Documentation:           not tested

Hello, I was trying to review your patch, but I couldn't build it:

prepjointree.c: In function ‘pull_up_simple_function’:
prepjointree.c:1793:41: error: ‘functions’ undeclared (first use in this function); did you mean ‘PGFunction’?
  Assert(!contain_vars_of_level((Node *) functions, 0));

Was it a typo?

The new status of this patch is: Waiting on Author
Re: relhassubclass and partitioned indexes
Thanks for commenting.

On 2018/10/19 15:17, Michael Paquier wrote:
> On Fri, Oct 19, 2018 at 01:45:03AM -0400, Tom Lane wrote:
>> Amit Langote writes:
>>> Should relhassubclass be set/reset for partitioned indexes?
>>
>> Seems like a reasonable idea to me, at least the "set" end of it.
>> We don't ever clear relhassubclass for tables, so maybe that's
>> not necessary for indexes either.

We *do* reset it opportunistically during an analyze of an inheritance
parent; the following code in acquire_inherited_sample_rows() can reset it:

    /*
     * Check that there's at least one descendant, else fail.  This could
     * happen despite analyze_rel's relhassubclass check, if table once had a
     * child but no longer does.  In that case, we can clear the
     * relhassubclass field so as not to make the same mistake again later.
     * (This is safe because we hold ShareUpdateExclusiveLock.)
     */
    if (list_length(tableOIDs) < 2)
    {
        /* CCI because we already updated the pg_class row in this command */
        CommandCounterIncrement();
        SetRelationHasSubclass(RelationGetRelid(onerel), false);
        ereport(elevel,
                (errmsg("skipping analyze of \"%s.%s\" inheritance tree --- this inheritance tree contains no child tables",
                        get_namespace_name(RelationGetNamespace(onerel)),
                        RelationGetRelationName(onerel))));
        return 0;
    }

Similarly, perhaps we can make REINDEX on a partitioned index reset the flag
if it doesn't find any children.  We won't be able to do that today though,
because:

    static void
    ReindexPartitionedIndex(Relation parentIdx)
    {
        ereport(ERROR,
                (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
                 errmsg("REINDEX is not yet implemented for partitioned indexes")));
    }

> No objections to the proposal.  Allowing find_inheritance_children to
> find index trees for partitioned indexes could be actually useful for
> extensions like pg_partman.

Thanks.  Attached is a patch to set relhassubclass when an index partition
is added to a partitioned index.

Regards,
Amit

>From f0c01ab41b35a5f21a90b0294d8216da78eb8882 Mon Sep 17 00:00:00 2001
From: amit
Date: Fri, 19 Oct 2018 17:05:00 +0900
Subject: [PATCH 1/2] Set relhassubclass on index parents

---
 src/backend/catalog/index.c            |  5 ++++
 src/backend/commands/indexcmds.c       |  4 +++
 src/test/regress/expected/indexing.out | 46 ++++++++++++++++-----------
 src/test/regress/sql/indexing.sql      | 11 ++++---
 4 files changed, 46 insertions(+), 20 deletions(-)

diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index f1ef4c319a..62cc6a5bb9 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -996,8 +996,13 @@ index_create(Relation heapRelation,
 	/* update pg_inherits, if needed */
 	if (OidIsValid(parentIndexRelid))
+	{
 		StoreSingleInheritance(indexRelationId, parentIndexRelid, 1);
 
+		/* Also, set the parent's relhassubclass. */
+		SetRelationHasSubclass(parentIndexRelid, true);
+	}
+
 	/*
 	 * Register constraint and dependencies for the index.
 	 *
diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index 3975f62c00..c392352871 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -2608,6 +2608,10 @@ IndexSetParentIndex(Relation partitionIdx, Oid parentOid)
 	systable_endscan(scan);
 	relation_close(pg_inherits, RowExclusiveLock);
 
+	/* If we added an index partition to parent, set its relhassubclass. */
+	if (OidIsValid(parentOid))
+		SetRelationHasSubclass(parentOid, true);
+
 	if (fix_dependencies)
 	{
 		ObjectAddress partIdx;
diff --git a/src/test/regress/expected/indexing.out b/src/test/regress/expected/indexing.out
index 225f4e9527..710b32192f 100644
--- a/src/test/regress/expected/indexing.out
+++ b/src/test/regress/expected/indexing.out
@@ -1,25 +1,35 @@
 -- Creating an index on a partitioned table makes the partitions
 -- automatically get the index
 create table idxpart (a int, b int, c text) partition by range (a);
+-- relhassubclass of a partitioned index is false before creating its partition
+-- it will be set below once partitions get created
+create index check_relhassubclass_of_this on idxpart (a);
+select relhassubclass from pg_class where relname = 'check_relhassubclass_of_this';
+ relhassubclass
+----------------
+ f
+(1 row)
+
+drop index check_relhassubclass_of_this;
 create table idxpart1 partition of idxpart for values from (0) to (10);
 create table idxpart2 partition of idxpart for values from (10) to (100)
 	partition by range (b);
 create table idxpart21 partition of idxpart2 for values from (0) to (100);
 create index on idxpart (a);
-select relname, relkind, inhparent::regclass
+select relname, relkind, relhassubclass, inhparent::regclass
     from pg_class left join pg_index ix on (indexrelid =
Re: partition tree inspection functions
On 2018/10/19 16:47, Michael Paquier wrote:
> On Fri, Oct 19, 2018 at 01:05:52PM +0900, Amit Langote wrote:
>> As I said above, the price of removing relhassubclass might be a bit
>> steep. So, the other alternative I mentioned before is to set
>> relhassubclass correctly even for indexes, if only for pg_partition_tree
>> to be able to use find_inheritance_children unchanged.
>
> Playing devil's advocate a bit more... Another alternative here would
> be to remove the fast path using relhassubclass from
> find_inheritance_children and instead have its callers check for it :)

Yeah, we could make it the responsibility of the callers of
find_all_inheritors and find_inheritance_children to check relhassubclass
as an optimization, and remove any reference to relhassubclass from
pg_inherits.c.  Although we could write such a patch, it seems like it'd be
bigger than the patch to ensure the correct value of relhassubclass for
indexes, which I just posted on the other thread [1].

> Anyway, it seems that you are right here.  Just setting relhassubclass
> for partitioned indexes feels more natural with what's on HEAD now.
> Even if I'd like to see all those hypothetical columns in pg_class go
> away, that cannot happen without a close look at the performance
> impact.

Okay, I updated the patch on this thread.  Since the updated patch depends
on the correct value of relhassubclass being set for indexes, this patch
should be applied on top of the other patch.  I've attached both here.

Another change I made is something Robert and Alvaro seem to agree about --
to use regclass instead of the oid type for input/output columns.

Thanks,
Amit

[1] https://www.postgresql.org/message-id/85d50b48-1b59-ae6c-e984-dd0b2926be3c%40lab.ntt.co.jp

>From f0c01ab41b35a5f21a90b0294d8216da78eb8882 Mon Sep 17 00:00:00 2001
From: amit
Date: Fri, 19 Oct 2018 17:05:00 +0900
Subject: [PATCH 1/2] Set relhassubclass on index parents

---
 src/backend/catalog/index.c            |  5 ++++
 src/backend/commands/indexcmds.c       |  4 +++
 src/test/regress/expected/indexing.out | 46 ++++++++++++++++-----------
 src/test/regress/sql/indexing.sql      | 11 ++++---
 4 files changed, 46 insertions(+), 20 deletions(-)

diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index f1ef4c319a..62cc6a5bb9 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -996,8 +996,13 @@ index_create(Relation heapRelation,
 	/* update pg_inherits, if needed */
 	if (OidIsValid(parentIndexRelid))
+	{
 		StoreSingleInheritance(indexRelationId, parentIndexRelid, 1);
 
+		/* Also, set the parent's relhassubclass. */
+		SetRelationHasSubclass(parentIndexRelid, true);
+	}
+
 	/*
 	 * Register constraint and dependencies for the index.
 	 *
diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index 3975f62c00..c392352871 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -2608,6 +2608,10 @@ IndexSetParentIndex(Relation partitionIdx, Oid parentOid)
 	systable_endscan(scan);
 	relation_close(pg_inherits, RowExclusiveLock);
 
+	/* If we added an index partition to parent, set its relhassubclass. */
+	if (OidIsValid(parentOid))
+		SetRelationHasSubclass(parentOid, true);
+
 	if (fix_dependencies)
 	{
 		ObjectAddress partIdx;
diff --git a/src/test/regress/expected/indexing.out b/src/test/regress/expected/indexing.out
index 225f4e9527..710b32192f 100644
--- a/src/test/regress/expected/indexing.out
+++ b/src/test/regress/expected/indexing.out
@@ -1,25 +1,35 @@
 -- Creating an index on a partitioned table makes the partitions
 -- automatically get the index
 create table idxpart (a int, b int, c text) partition by range (a);
+-- relhassubclass of a partitioned index is false before creating its partition
+-- it will be set below once partitions get created
+create index check_relhassubclass_of_this on idxpart (a);
+select relhassubclass from pg_class where relname = 'check_relhassubclass_of_this';
+ relhassubclass
+----------------
+ f
+(1 row)
+
+drop index check_relhassubclass_of_this;
 create table idxpart1 partition of idxpart for values from (0) to (10);
 create table idxpart2 partition of idxpart for values from (10) to (100)
 	partition by range (b);
 create table idxpart21 partition of idxpart2 for values from (0) to (100);
 create index on idxpart (a);
-select relname, relkind, inhparent::regclass
+select relname, relkind, relhassubclass, inhparent::regclass
     from pg_class left join pg_index ix on (indexrelid = oid)
 	left join pg_inherits on (ix.indexrelid = inhrelid)
 	where relname like 'idxpart%' order by relname;
-    relname     | relkind | inhparent
-----------------+---------+-----------
- idxpart        | p       |
- idxpart1       | r       |
- idxpart1_a_idx | i
Re: Function to promote standby servers
Michael Paquier wrote:
> + /* wait for up to a minute for promotion */
> + for (i = 0; i < WAITS_PER_SECOND * WAIT_SECONDS; ++i)
> + {
> +     if (!RecoveryInProgress())
> +         PG_RETURN_BOOL(true);
> +
> +     pg_usleep(1000000L / WAITS_PER_SECOND);
> + }
>
> I would recommend to avoid pg_usleep and instead use a WaitLatch() or
> similar to generate a wait event.  The wait can then also be seen in
> pg_stat_activity, which is useful for monitoring purposes.  Using
> RecoveryInProgress is indeed doable, and that's more simple than what I
> thought first.

Agreed, done.  I have introduced a new wait event, because I couldn't find
one that fit.

> Something I missed to mention in the previous review: the timeout should
> be manually enforceable, with a default at 60s.

Ok, added as a new parameter "wait_seconds".

> Is the function marked as strict?  Per the code it should be, I am not
> able to test now though.

Yes, it is.

> You are missing REVOKE EXECUTE ON FUNCTION pg_promote() in
> system_views.sql, or any users could trigger a promotion, no?

You are right *blush*.  Fixed.

Yours,
Laurenz Albe

>From 08951fea7c526450d9a632ef0e6e246dd9dba307 Mon Sep 17 00:00:00 2001
From: Laurenz Albe
Date: Fri, 19 Oct 2018 13:24:29 +0200
Subject: [PATCH] Add pg_promote() to promote standby servers

---
 doc/src/sgml/func.sgml                     | 21 ++++++++
 doc/src/sgml/high-availability.sgml        |  2 +-
 doc/src/sgml/recovery-config.sgml          |  3 +-
 src/backend/access/transam/xlog.c          |  6 --
 src/backend/access/transam/xlogfuncs.c     | 82 ++++++++++++++++++++++
 src/backend/catalog/system_views.sql       |  8 +++
 src/backend/postmaster/pgstat.c            |  3 +
 src/include/access/xlog.h                  |  6 ++
 src/include/catalog/pg_proc.dat            |  4 ++
 src/include/pgstat.h                       |  3 +-
 src/test/recovery/t/004_timeline_switch.pl |  6 +-
 src/test/recovery/t/009_twophase.pl        |  6 +-
 12 files changed, 138 insertions(+), 12 deletions(-)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 5193df3366..88121cdc66 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -18731,6 +18731,9 @@ SELECT set_config('log_statement_stats', 'off', false);
 pg_terminate_backend
+
+pg_promote
+
 signal
@@ -18790,6 +18793,16 @@ SELECT set_config('log_statement_stats', 'off', false);
 however only superusers can terminate superuser backends.
+
+
+pg_promote(wait boolean DEFAULT true, wait_seconds integer DEFAULT 60)
+
+ boolean
+ Promote a physical standby server.  This function is restricted to
+superusers by default, but other users can be granted EXECUTE to run
+the function.
+
+
@@ -18827,6 +18840,14 @@ SELECT set_config('log_statement_stats', 'off', false);
 subprocess.
+
+pg_promote can only be called on standby servers.
+If the argument wait is true,
+the function waits until promotion is complete or wait_seconds
+seconds have passed, otherwise the function returns immediately after sending
+the promotion signal to the postmaster.
+
+
diff --git a/doc/src/sgml/high-availability.sgml b/doc/src/sgml/high-availability.sgml
index ebcb3daaed..f8e036965c 100644
--- a/doc/src/sgml/high-availability.sgml
+++ b/doc/src/sgml/high-availability.sgml
@@ -1472,7 +1472,7 @@ synchronous_standby_names = 'ANY 2 (s1, s2, s3)'
    To trigger failover of a log-shipping standby server,
-   run pg_ctl promote or create a trigger
+   run pg_ctl promote, call pg_promote(), or create a trigger
    file with the file name and path specified by the trigger_file
    setting in recovery.conf.
    If you're planning to use pg_ctl promote to fail over, trigger_file is
diff --git a/doc/src/sgml/recovery-config.sgml b/doc/src/sgml/recovery-config.sgml
index 92825fdf19..d06cd0b08e 100644
--- a/doc/src/sgml/recovery-config.sgml
+++ b/doc/src/sgml/recovery-config.sgml
@@ -439,7 +439,8 @@ restore_command = 'copy "C:\\server\\archivedir\\%f" "%p"'  # Windows
         Specifies a trigger file whose presence ends recovery in the
         standby.  Even if this value is not set, you can still promote
-        the standby using pg_ctl promote.
+        the standby using pg_ctl promote or calling
+        pg_promote().
         This setting has no effect if standby_mode is off.
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 7375a78ffc..62fc418893 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -78,12 +78,6 @@
 extern uint32 bootstrap_data_checksum_version;
 
-/* File path names (all relative to $PGDATA) */
-#define RECOVERY_COMMAND_FILE "recovery.conf"
-#define RECOVERY_COMMAND_DONE "recovery.done"
-#d
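
As an illustration of the resulting interface (a sketch based on the
signature above, with its defaults wait => true and wait_seconds => 60):

    -- Promote this standby and wait up to 30 seconds for promotion to
    -- complete; returns true once the server has left recovery.
    SELECT pg_promote(wait => true, wait_seconds => 30);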
Speeding up text_position_next with multibyte encodings
Attached is a patch to speed up text_position_setup/next() in some common
cases with multibyte encodings.

text_position_next() uses the Boyer-Moore-Horspool search algorithm with a
skip table. Currently, with a multi-byte encoding, we first convert both
input strings to arrays of wchars, and run the algorithm on those arrays.

This patch removes the mb->wchar conversion, and instead runs the
single-byte version of the algorithm directly on the inputs, even with
multi-byte encodings. That avoids the mb->wchar conversion altogether when
there is no match. Even when there is a match, we don't need to convert the
whole input string to wchars: it's enough to count the characters up to the
match, using pg_mblen(). And when text_position_setup/next() are used as
part of the split_part() or replace() functions, we're not actually even
interested in the character position, so we can skip that too.

Avoiding the large allocation is nice, too. That was actually why I started
to look into this: a customer complained that text_position fails with
strings larger than 256 MB.

Using the byte-oriented search on multibyte strings might return false
positives, though. To make that work, after finding a match, we verify that
the match doesn't fall in the middle of a multi-byte character. However, as
an important special case, that cannot happen with UTF-8, because it has
the property that the byte sequence of one character never appears as part
of another character. I think some other encodings might have the same
property, but I wasn't sure.

This is a win in most cases. One case is slower: calling position() with a
large haystack input, where the match is near the end of the string.
Counting the characters up to that point is slower than the mb->wchar
conversion. I think we could avoid that regression too, if we had a more
optimized function to count characters. Currently, the code calls
pg_mblen() in a loop, whereas in pg_mb2wchar_with_len() the similar loop
through all the characters is a tight loop within the function.

Thoughts?

- Heikki

>From 1efb5ace7cf9da63f300942f15f9da2fddfb4de5 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas
Date: Fri, 19 Oct 2018 15:12:39 +0300
Subject: [PATCH 1/1] Use single-byte Boyer-Moore-Horspool search even with
 multibyte encodings.

The old implementation first converted the input strings to arrays of
wchars, and performed the search on those. However, the conversion is
expensive, and for a large input string, consumes a lot of memory.

Avoid the conversion, and instead use the single-byte algorithm even with
multibyte encodings. That has a couple of problems, though. Firstly, we
might get fooled if the needle string's byte sequence appears embedded
inside a different string. We might incorrectly return a match in the
middle of a multi-byte character. The second problem is that the
text_position function needs to return the position of the match, counted
in characters, rather than bytes.

We can work around both of those problems by an extra step after finding a
match. Walk the string one character at a time, starting from the
beginning, until we reach the position of the match. We can then check if
the match was found at a valid character boundary, which solves the first
problem, and we can count the characters, so that we can return the
character offset. Walking the characters is faster than the wchar
conversion, especially in the case that there are no matches and we can
avoid it altogether.

Also, avoiding the large allocation allows these functions to work with
inputs larger than 256 MB, even with multibyte encodings.

For UTF-8, we can even skip the step to verify that the match lands on a
character boundary, because in UTF-8, the byte sequence of one character
cannot appear in the middle of a different character. I think many other
encodings have the same property, but I wasn't sure, so I only enabled
that optimization for UTF-8.

Most of the callers of the text_position_setup/next functions were
actually not interested in the position of the match, counted in
characters. To cater for them, refactor the text_position_next() interface
into two parts: searching for the next match (text_position_next()), and
returning the current match's position as a pointer
(text_position_get_match_ptr()) or as a character offset
(text_position_get_match_pos()). Most callers of text_position_setup/next
are not interested in the character offsets, so getting the pointer to the
match within the original string is a more convenient API for them. It
also allows skipping the character counting with UTF-8 encoding
altogether.
---
 src/backend/utils/adt/varlena.c | 475 +++++++++++++++++----------------
 1 file changed, 253 insertions(+), 222 deletions(-)

diff --git a/src/backend/utils/adt/varlena.c b/src/backend/utils/adt/varlena.c
index a5e812d026..0c8c8b9a4f 100644
--- a/src/backend/utils/adt/varlena.c
+++ b/src/backend/utils/adt/varlena.c
@@ -43,18 +43,33 @@ int byt
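
To make the verification step concrete, here is a simplified sketch of the
idea (my own illustration, not code from the patch): after the
byte-oriented search reports a candidate offset, walk the haystack with
pg_mblen() to both check the character boundary and count the character
position.

    #include "postgres.h"
    #include "mb/pg_wchar.h"        /* for pg_mblen() */

    /*
     * Simplified illustration (not the patch's actual code): given a
     * byte-level match starting at 'match_ptr' within 'str', verify that
     * the match begins on a character boundary and return its 1-based
     * character position, or 0 if the match splits a multi-byte character
     * (a false positive of the byte-oriented search).
     */
    static int
    verify_and_count_match(const char *str, const char *match_ptr)
    {
        const char *p = str;
        int         charpos = 1;

        while (p < match_ptr)
        {
            p += pg_mblen(p);       /* advance one whole character */
            charpos++;
        }

        /* If we stepped past match_ptr, the match began mid-character. */
        return (p == match_ptr) ? charpos : 0;
    }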
ERROR's turning FATAL in BRIN regression tests
Hi all,

I ran into a surprising behavior while hacking on the FSM delay patch. I
changed the signature of a freespace.c function that the BRIN code calls,
and this change by itself doesn't cause a crash. With the full FSM patch,
causing BRIN errors in manual queries in psql doesn't cause a crash.
However, during the BRIN regression tests, the queries that purposely cause
errors result in FATAL instead, causing a crash. For example,
brin_summarize_new_values() eventually leads to calling index_open():

    ereport(ERROR,
            (errcode(ERRCODE_WRONG_OBJECT_TYPE),
             errmsg("\"%s\" is not an index",
                    RelationGetRelationName(r))));

But the elevel is 21 (FATAL) in this partial stack trace:

#0  0x5623aff8aed6 in proc_exit_prepare (code=1) at /home/john/pgdev/postgresql/src/backend/storage/ipc/ipc.c:209
        __func__ = "proc_exit_prepare"
#1  0x5623aff8adbb in proc_exit (code=1) at /home/john/pgdev/postgresql/src/backend/storage/ipc/ipc.c:107
        __func__ = "proc_exit"
#2  0x5623b01435c3 in errfinish (dummy=0) at /home/john/pgdev/postgresql/src/backend/utils/error/elog.c:540
        edata = 0x5623b06f63c0
        elevel = 21
        oldcontext = 0x5623b2121b10
        econtext = 0x0
        __func__ = "errfinish"
#3  0x5623afbb9ab1 in index_open (relationId=52389, lockmode=4) at /home/john/pgdev/postgresql/src/backend/access/index/indexam.c:159
        __errno_location =
        r = 0x7f03773e1750
        __func__ = "index_open"

If I comment out the tests that purposely cause errors, the BRIN tests fail
normally, but then another test in the same parallel group crashes, causing
all later tests to fail:

--- 1 ----
! psql: FATAL:  the database system is in recovery mode

and

*** 65,67 ****
--- 65,74 ----
  -- Modify fillfactor in existing index
  alter index spgist_point_idx set (fillfactor = 90);
  reindex index spgist_point_idx;
+ WARNING:  terminating connection because of crash of another server process
+ DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
+ HINT:  In a moment you should be able to reconnect to the database and repeat your command.
+ server closed the connection unexpectedly
+ 	This probably means the server terminated abnormally
+ 	before or while processing the request.
+ connection to server was lost

I'm thoroughly stumped -- anyone have an idea where to look next?

Thanks,
-John Naylor
Re: pgsql: Add TAP tests for pg_verify_checksums
On Wed, Oct 17, 2018 at 05:30:05PM -0400, Andrew Dunstan wrote:
> Fine by me.

Thanks.  This is now committed after some tweaks to the comments, a bit
earlier than I thought first.
--
Michael
Re: lowering pg_regress privileges on Windows
On 10/18/2018 08:25 PM, Thomas Munro wrote:
> On Fri, Oct 19, 2018 at 1:13 PM Michael Paquier wrote:
>> On Thu, Oct 18, 2018 at 08:31:11AM -0400, Andrew Dunstan wrote:
>>> The attached ridiculously tiny patch solves the problem whereby while
>>> we can run Postgres on Windows safely from an Administrator account, we
>>> can't run the regression tests from the same account, since it fails on
>>> the tablespace test, the tablespace directory having been set up without
>>> first having lowered privileges. The solution is to lower pg_regress'
>>> privileges in the same way that we do with other binaries. This is
>>> useful in setups like Appveyor where running under any other account is
>>> ... difficult. For the cfbot Thomas has had to make the script hack the
>>> schedule file to omit the tablespace test. This would make that
>>> redundant.
>>>
>>> I propose to backpatch this. It's close enough to a bug and the risk is
>>> almost infinitely small.
>>
>> +1. get_restricted_token() refactoring has been done down to
>> REL9_5_STABLE. With 9.4 and older you would need to copy again this
>> full routine into pg_regress.c, which is in my opinion not worth
>> worrying about.
>
> FWIW here is a successful Appveyor build including the full test schedule
> (CI patch attached in case anyone is interested). Woohoo! Thanks for
> figuring that out Andrew. I will be very happy to remove that wart from
> my workflows.
>
> https://ci.appveyor.com/project/macdice/postgres/builds/19626669

Excellent. I'll apply back to 9.5 as Michael suggests.

Having got past that hurdle, I encountered another one in the same area.
pg_upgrade gives up its privileges and is then unable to write things like
log files and analyze scripts. The attached patch cures the problem, but it
doesn't seem like the best cure. Maybe there is a more secure way to do it.
Essentially it saves out the ACLs for the current directory and its
subdirectories and then allows everyone to write to them, right before
running pg_upgrade. When pg_upgrade is done it restores the saved ACLs.

Maybe someone who understands more about how this all works can suggest a
less blunt-force approach.

cheers

andrew

--
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

diff --git a/src/tools/msvc/vcregress.pl b/src/tools/msvc/vcregress.pl
index ce5c976c16..0f25b44d0a 100644
--- a/src/tools/msvc/vcregress.pl
+++ b/src/tools/msvc/vcregress.pl
@@ -569,11 +569,14 @@ sub upgradecheck
 	$ENV{PGDATA} = "$data";
 	print "\nSetting up new cluster\n\n";
 	standard_initdb() or exit 1;
+	system('icacls . /save savedacls /t');
+	system('icacls . /grant "*S-1-1-0":(OI)(CI)M');
 	print "\nRunning pg_upgrade\n\n";
 	@args = (
 		'pg_upgrade', '-d', "$data.old", '-D', $data, '-b',
 		$bindir, '-B', $bindir);
 	system(@args) == 0 or exit 1;
+	system('icacls . /restore savedacls');
 	print "\nStarting new cluster\n\n";
 	@args = ('pg_ctl', '-l', "$logdir/postmaster2.log", 'start');
 	system(@args) == 0 or exit 1;
WAL archive (archive_mode = always) ?
Hi,

What is the advantage of using archive_mode = always on a slave server,
compared to archive_mode = on (with a shared WAL archive)? I only see
duplication of WAL files. What is the purpose of this feature?

Many thanks in advance,
Adelino.
Re: removing unnecessary get_att*() lsyscache functions
On Thu, Oct 18, 2018 at 09:57:00PM +0200, Peter Eisentraut wrote:
> I noticed that get_attidentity() isn't really necessary because the
> information can be obtained from an existing tuple descriptor in each
> case.

This one is also recent, so it looks fine to remove it.

> Also, get_atttypmod() hasn't been used since 2004.

A GitHub code search is not actually reporting any places where this is
used.

> I propose the attached patches to remove these two functions.

> -     if (get_attidentity(RelationGetRelid(rel), attnum))
> +     if (TupleDescAttr(RelationGetDescr(rel), attnum - 1)->attidentity)

I find this style heavy; saving the Form_pg_attribute into a separate
variable would be more readable in my opinion.
--
Michael
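
Something like the following is presumably what is meant by the more
readable spelling (a sketch only; the error message is illustrative, not
the one in the patch):

    /* Fetch the attribute once instead of nesting the calls. */
    Form_pg_attribute attTup = TupleDescAttr(RelationGetDescr(rel),
                                             attnum - 1);

    if (attTup->attidentity)
        ereport(ERROR,
                (errcode(ERRCODE_SYNTAX_ERROR),
                 errmsg("column of relation \"%s\" is an identity column",
                        RelationGetRelationName(rel))));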
Re: WAL archive (archive_mode = always) ?
On Fri, Oct 19, 2018 at 03:00:15PM +0100, Adelino Silva wrote:
> What is the advantage of using archive_mode = always on a slave server,
> compared to archive_mode = on (with a shared WAL archive)?
> I only see duplication of WAL files. What is the purpose of this
> feature?

Some users like having redundancy in their backups and archives, so that
everything exists in multiple locations. archive_mode = always helps meet
these needs.
--
Michael
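
For context, the standby-side configuration being compared would look
something like this (the archive path is a placeholder; the test-and-copy
pattern follows the documentation's example archive_command):

    # postgresql.conf on the standby: with "always", the standby archives
    # every WAL segment it receives, giving a second, independent archive.
    archive_mode = always
    archive_command = 'test ! -f /standby_archive/%f && cp %p /standby_archive/%f'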
Re: ERROR's turning FATAL in BRIN regression tests
John Naylor writes:
> I changed the signature of a freespace.c function that the BRIN code
> calls, and this change by itself doesn't cause a crash. With the full
> FSM patch, causing BRIN errors in manual queries in psql doesn't cause
> a crash. However, during the BRIN regression tests, the queries that
> purposely cause errors result in FATAL instead, causing a crash.

Sounds like something's messing up the backend's exception stack, so that
elog.c has noplace to throw the error to.  See the promote-ERROR-to-FATAL
logic therein.

			regards, tom lane
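
For readers following along, the logic Tom points to lives in elog.c; in
rough outline it behaves like this (a paraphrase of the idea, not the
literal source):

    /*
     * Paraphrase of elog.c's promote-ERROR-to-FATAL behavior: an ERROR can
     * only be thrown if somebody has established a sigsetjmp() cleanup
     * point (PG_TRY, or the top-level loop in PostgresMain).  If that
     * exception stack is gone or corrupted, the ERROR is promoted to FATAL
     * and the backend exits, which matches the elevel = 21 seen in the
     * stack trace above on the way into proc_exit().
     */
    if (PG_exception_stack != NULL)
        siglongjmp(*PG_exception_stack, 1); /* unwind to innermost handler */
    else
        elevel = FATAL;                     /* nowhere to throw: promote */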
Re: Problem about partitioned table
On 10/19/18 2:03 AM, Mehman Jafarov wrote:
> Hi everyone,
>
> I have a problem with a partitioned table in PostgreSQL. Actually I use
> version 10. I created the partitioned table in a test environment but
> face some problems with partitioned table constraints. I googled all
> about it last week, and from the official site I get that version 11
> will be released and that feature will be supported as I understand it.
> From the version 11 documentation: "Add support for PRIMARY KEY, FOREIGN
> KEY, indexes, and triggers on partitioned tables"
>
> I installed and configured yesterday as the new version 11 was released,
> and tested it. Unfortunately I didn't succeed again. Either I don't
> understand the new feature, or this case is actually not supported.
> Please help me about the problem.

As you found out:

https://www.postgresql.org/docs/11/static/ddl-partitioning.html
5.10.2.3. Limitations

"While primary keys are supported on partitioned tables, foreign keys
referencing partitioned tables are not supported. (Foreign key references
from a partitioned table to some other table are supported.)"

> Note: I want to create the constraint on only a single pair of columns
> (fk_to_user_doc_rel - oid)
>
> BIG QUESTION IS THAT How can I solve this problem? What is your
> recommendations?

Well, a FK is a form of a trigger, so maybe create your own trigger on the
child table(s).

> PLEASE HELP ME !!!
>
> --
> Best Regards,
> Mehman Jafarov
> DBA Administrator at CyberNet LLC

--
Adrian Klaver
adrian.kla...@aklaver.com
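
A minimal sketch of that trigger idea, using the table and column names
from the thread (untested, for illustration only): validate the referencing
column against the whole partitioned parent, which routes the lookup to
every partition.

    -- Sketch of a trigger-based substitute for the unsupported FK.
    CREATE OR REPLACE FUNCTION xx.check_doc_user_rel() RETURNS trigger AS $$
    BEGIN
        IF NOT EXISTS (SELECT 1 FROM xx.er_doc_to_user_relation
                       WHERE oid = NEW.fk_to_user_doc_rel) THEN
            RAISE EXCEPTION 'fk_to_user_doc_rel % not present in er_doc_to_user_relation',
                NEW.fk_to_user_doc_rel;
        END IF;
        RETURN NEW;
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER check_doc_user_rel
        BEFORE INSERT OR UPDATE ON xx.er_doc_workflow_action
        FOR EACH ROW EXECUTE PROCEDURE xx.check_doc_user_rel();

Note that unlike a real foreign key this only checks at insert/update time
on the referencing side; it does nothing about rows later deleted from the
partitions, and it is not safe against concurrent deletes.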
Re: pgsql: Add TAP tests for pg_verify_checksums
Greetings,

* Michael Paquier (mich...@paquier.xyz) wrote:
> On Wed, Oct 17, 2018 at 05:30:05PM -0400, Andrew Dunstan wrote:
>> Fine by me.
>
> Thanks.  This is now committed after some tweaks to the comments, a bit
> earlier than I thought first.

I just saw this committed and I'm trying to figure out why we are creating
yet another list when it comes to deciding what should be checksum'd and
what shouldn't be.

Specifically, pg_basebackup (or, really,
src/backend/replication/basebackup.c) has 'is_checksummed_file', which
operates differently from pg_verify_checksums with this change, and that
seems rather wrong.

Maybe we need to fix both places, but I *really* don't like this approach
of "well, we'll just guess if the file should have a checksum or not", and
it certainly seems likely that we'll end up forgetting to update this when
we introduce things in the future which have checksums (which seems pretty
darn likely to happen).

I also categorically disagree with the notion that it's ok for extensions
to dump files into our tablespace directories, or that we should try to
work around random code dumping extra files in the directories which we
maintain; it's not like this additional code will actually protect us from
that, after all, and it's foolish to think we really could protect against
that.

Basically, I think this commit should be reverted, we should go back to
having a blacklist, update it in both places (and add comments to both
places to make it clear that this list exists in two different places),
and add code to handle the temp tablespace case explicitly. Even better
would be to find a way to expose the list and the code for walking the
directories and identifying the files which contain checksums, so that
we're only doing that in one place instead of multiple different places.

Thanks!

Stephen
Re: pgsql: Add TAP tests for pg_verify_checksums
Hi,

Am Freitag, den 19.10.2018, 10:36 -0400 schrieb Stephen Frost:
> Greetings,
>
> * Michael Paquier (mich...@paquier.xyz) wrote:
>> On Wed, Oct 17, 2018 at 05:30:05PM -0400, Andrew Dunstan wrote:
>>> Fine by me.
>>
>> Thanks.  This is now committed after some tweaks to the comments, a bit
>> earlier than I thought first.
>
> I just saw this committed and I'm trying to figure out why we are
> creating yet another list when it comes to deciding what should be
> checksum'd and what shouldn't be.
>
> Specifically, pg_basebackup (or, really,
> src/backend/replication/basebackup.c) has 'is_checksummed_file' which
> operates differently from pg_verify_checksums with this change, and that
> seems rather wrong.

To be fair, the list in src/backend/replication/basebackup.c was a
copy-paste from the one in pg_verify_checksums (or from other parts of the
online activation patch).

I agree it makes sense to have both in sync or, better yet, factored out in
a central place, but I don't currently have further opinions on whether it
should be a black- or whitelist.

Michael

--
Michael Banck
Projektleiter / Senior Berater
Tel.: +49 2166 9901-171
Fax:  +49 2166 9901-100
Email: michael.ba...@credativ.de

credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach

Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer

Unser Umgang mit personenbezogenen Daten unterliegt folgenden
Bestimmungen: https://www.credativ.de/datenschutz
Re: pgsql: Add TAP tests for pg_verify_checksums
Michael Banck writes:
> Am Freitag, den 19.10.2018, 10:36 -0400 schrieb Stephen Frost:
>> I just saw this committed and I'm trying to figure out why we are
>> creating yet another list when it comes to deciding what should be
>> checksum'd and what shouldn't be.

> To be fair, the list in src/backend/replication/basebackup.c was a
> copy-paste from the one in pg_verify_checksums (or from other parts of
> the online activation patch).

Seems like a job for a new(?) module in src/common/.

			regards, tom lane
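
Such a src/common module might need to expose little more than a single
predicate; a hypothetical sketch (the header path and the function name are
invented here, not an existing API):

    /* hypothetical src/include/common/checksum_files.h */
    #ifndef CHECKSUM_FILES_H
    #define CHECKSUM_FILES_H

    #include <stdbool.h>

    /*
     * Return true if the given file, identified by its name within a
     * database or tablespace directory, is expected to carry data-page
     * checksums and should therefore be verified.
     */
    extern bool is_checksummed_file(const char *path, const char *filename);

    #endif                          /* CHECKSUM_FILES_H */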
Re: pgsql: Add TAP tests for pg_verify_checksums
Greetings,

* Tom Lane (t...@sss.pgh.pa.us) wrote:
> Michael Banck writes:
>> Am Freitag, den 19.10.2018, 10:36 -0400 schrieb Stephen Frost:
>>> I just saw this committed and I'm trying to figure out why we are
>>> creating yet another list when it comes to deciding what should be
>>> checksum'd and what shouldn't be.
>
>> To be fair, the list in src/backend/replication/basebackup.c was a
>> copy-paste from the one in pg_verify_checksums (or from other parts of
>> the online activation patch).
>
> Seems like a job for a new(?) module in src/common/.

Yeah, that would be nice... Even nicer would be a way for non-core PG
tools to be able to use it (we have basically the same thing in pgbackrest,
unsurprisingly), though I realize that's a much larger step.

Thanks!

Stephen
Re: pgsql: Add TAP tests for pg_verify_checksums
Stephen Frost writes:
> * Tom Lane (t...@sss.pgh.pa.us) wrote:
>> Seems like a job for a new(?) module in src/common/.

> Yeah, that would be nice... Even nicer would be a way for non-core PG
> tools to be able to use it (we have basically the same thing in
> pgbackrest, unsurprisingly), though I realize that's a much larger step.

We install libpgcommon.a (and, as of HEAD, libpgcommon_shlib.a) for
precisely that sort of reason.

			regards, tom lane
Re: pgsql: Add TAP tests for pg_verify_checksums
Greetings,

* Tom Lane (t...@sss.pgh.pa.us) wrote:
> Stephen Frost writes:
>> * Tom Lane (t...@sss.pgh.pa.us) wrote:
>>> Seems like a job for a new(?) module in src/common/.
>
>> Yeah, that would be nice... Even nicer would be a way for non-core PG
>> tools to be able to use it (we have basically the same thing in
>> pgbackrest, unsurprisingly), though I realize that's a much larger step.
>
> We install libpgcommon.a (and, as of HEAD, libpgcommon_shlib.a) for
> precisely that sort of reason.

Hmm, yeah, we might be able to use that for this. One challenge we've been
thinking about has been dealing with multiple major versions of PG with one
build of pgbackrest, since we'd really like to do things like inspect the
control file, but that changes between versions. We do have multi-version
code in some places (quite a bit in pg_dump...), so perhaps we could have
libpgcommon support also. Probably a topic for another thread.

Thanks!

Stephen
Re: pgsql: Add TAP tests for pg_verify_checksums
Hi,

On 2018-10-19 10:36:59 -0400, Stephen Frost wrote:
> * Michael Paquier (mich...@paquier.xyz) wrote:
>> On Wed, Oct 17, 2018 at 05:30:05PM -0400, Andrew Dunstan wrote:
>>> Fine by me.
>>
>> Thanks.  This is now committed after some tweaks to the comments, a bit
>> earlier than I thought first.
>
> I just saw this committed and I'm trying to figure out why we are
> creating yet another list when it comes to deciding what should be
> checksum'd and what shouldn't be.

I'm not sure it's fair to blame Michael here. He's picking up slack,
because the original authors of the tool didn't even bother to reply to
the issue for quite a while. pg_verify_checksums was obviously never
tested on Windows; it just didn't have tests showing that.

> I also categorically disagree with the notion that it's ok for
> extensions to dump files into our tablespace directories or that we
> should try to work around random code dumping extra files in the
> directories which we maintain; it's not like this additional code will
> actually protect us from that, after all, and it's foolish to think we
> really could protect against that.

I'm unconvinced. There already are extensions doing so, and it's not like
we've given them any sort of reasonable alternative. You can't just create
relations in a "registered" / "supported" kind of way right now, so, uh,
yeah, extensions gotta make do. And that has worked for many years.

Also, as made obvious here, there clearly are platform differences about
which files exist and which don't, so it's not as if a blacklist approach
is automatically more maintainable.

And I fail to see why a blacklist is architecturally better. There are
plenty of cases where we might want to create temporary files,
non-checksummed data, or the like, which we'd need to teach a blacklist
about, but there are far fewer cases where we add new checksummed files;
basically never since checksums were introduced. And if checksums were
introduced for something new, say SLRUs, we'd certainly use
pg_verify_checksums during development.

Greetings,

Andres Freund
[Patch] pg_rewind: options to use restore_command from recovery.conf or command line
Hi hackers,

Currently Postgres has options for continuous WAL file archiving, which is
quite often used along with a master-replica setup. OK, then the worst has
happened and it's time to get your old master back and synchronize it with
the new master (ex-replica) using pg_rewind. However, the required WAL
files may already have been archived, and pg_rewind will then fail. You can
copy these files manually, but it is difficult to calculate which ones you
need. Either way, it complicates building a failover system with automatic
failure recovery.

I expect that it would be a good idea to allow pg_rewind to look for a
restore_command in the target data directory's recovery.conf, or to pass
one as a command line argument. pg_rewind can then use it to get missing
WAL files from the archive. I have had a few talks with DBAs and came to
the conclusion that this is a highly requested feature.

I prepared a proof-of-concept patch (please find it attached) which does
exactly what I described above. I played with it a little and it seems to
be working; the tests were updated accordingly to verify this archive
retrieval functionality too.

The patch is relatively simple except for one part: if we want to parse
recovery.conf (with all possible includes, etc.) and get restore_command,
then we should use the guc-file.l parser, which is heavily linked to the
backend, e.g. in the error reporting part. So I copied it and made a
frontend-safe version, guc-file-fe.l. Personally, I don't think it's a
good idea, but nothing else came to mind. It is also possible to keep only
one option: passing restore_command as a command line argument.

What do you think?

--
Alexey Kondratov
Postgres Professional: https://www.postgrespro.com
Russian Postgres Company

diff --combined src/bin/pg_rewind/Makefile
index a22fef1352,2bcfcc61af..00
--- a/src/bin/pg_rewind/Makefile
+++ b/src/bin/pg_rewind/Makefile
@@ -20,7 -20,6 +20,7 @@ LDFLAGS_INTERNAL += $(libpq_pgport
  OBJS	= pg_rewind.o parsexlog.o xlogreader.o datapagemap.o timeline.o \
  	fetch.o file_ops.o copy_fetch.o libpq_fetch.o filemap.o logging.o \
 +	guc-file-fe.o \
  	$(WIN32RES)
 
  EXTRA_CLEAN = xlogreader.c
diff --combined src/bin/pg_rewind/RewindTest.pm
index 8dc39dbc05,1dce56d035..00
--- a/src/bin/pg_rewind/RewindTest.pm
+++ b/src/bin/pg_rewind/RewindTest.pm
@@ -40,7 -40,6 +40,7 @@ use Config
  use Exporter 'import';
  use File::Copy;
  use File::Path qw(rmtree);
 +use File::Glob;
  use IPC::Run qw(run);
  use PostgresNode;
  use TestLib;
@@ -249,41 -248,6 +249,41 @@ sub run_pg_rewin
  			"--no-sync"
  		],
  		'pg_rewind remote');
 +	}
 +	elsif ($test_mode eq "archive")
 +	{
 +
 +		# Do rewind using a local pgdata as source and
 +		# specified directory with target WALs archive.
 +		my $wals_archive_dir = "${TestLib::tmp_check}/master_wals_archive";
 +		my $test_master_datadir = $node_master->data_dir;
 +		my @wal_files = glob "$test_master_datadir/pg_wal/000*";
 +		my $restore_command;
 +
 +		rmtree($wals_archive_dir);
 +		mkdir($wals_archive_dir) or die;
 +
 +		# Move all old master WAL files to the archive.
 +		# Old master should be stopped at this point.
 +		foreach my $wal_file (@wal_files)
 +		{
 +			move($wal_file, "$wals_archive_dir/") or die;
 +		}
 +
 +		$restore_command = "cp $wals_archive_dir/\%f \%p";
 +
 +		# Stop the new master and be ready to perform the rewind.
 +		$node_standby->stop;
 +		command_ok(
 +			[
 +				'pg_rewind',
 +				"--debug",
 +				"--source-pgdata=$standby_pgdata",
 +				"--target-pgdata=$master_pgdata",
 +				"--no-sync",
 +				"-R", $restore_command
 +			],
 +			'pg_rewind archive');
  	}
  	else
  	{
diff --combined src/bin/pg_rewind/parsexlog.c
index 11a9c26cd2,40028471bf..00
--- a/src/bin/pg_rewind/parsexlog.c
+++ b/src/bin/pg_rewind/parsexlog.c
@@ -12,7 -12,6 +12,7 @@
  #include "postgres_fe.h"
 
  #include
 +#include
 
  #include "pg_rewind.h"
  #include "filemap.h"
@@ -46,10 -45,7 +46,10 @@ static char xlogfpath[MAXPGPATH]
  typedef struct XLogPageReadPrivate
  {
  	const char *datadir;
 +	const char *restoreCommand;
  	int			tliIndex;
 +	XLogRecPtr	oldrecptr;
 +	TimeLineID	oldtli;
  } XLogPageReadPrivate;
 
  static int	SimpleXLogPageRead(XLogReaderState *xlogreader,
@@ -57,10 -53,6 +57,10 @@
  					int reqLen, XLogRecPtr targetRecPtr, char *readBuf,
  					TimeLineID *pageTLI);
 +static bool RestoreArchivedWAL(const char *path, const char *xlogfname,
 +					off_t expectedSize, const char *restoreCommand,
 +					const char *lastRestartPointFname);
 +
  /*
   * Read WAL from the datadir/pg_wal, starting from 'startpoint' on timeline
   * index 'tliIndex' in target timeline history, until 'endpoint'. Make note of
   */
  void
  extractPageMap(const char *datadir, XLogRecPtr startpoint, int tliIndex,
 -			   XLogRecPtr endpoint)
 +			   ControlFileData *targetCF, const char *restore_command)
  {
  	XLogRecord *record;
 +	XLogRecPtr	endpoint = targetCF->checkPoint;
  	XLog
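
If I read the test changes right, command-line usage with the patch would
then look something like this (paths and the connection string below are
placeholders):

    # Resynchronize the old master, fetching any missing WAL segments from
    # the archive via the -R restore_command option added by this patch.
    pg_rewind --target-pgdata=/var/lib/postgresql/old_master \
              --source-server='host=new-master port=5432 dbname=postgres' \
              -R 'cp /mnt/wal_archive/%f "%p"'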
Re: pgsql: Add TAP tests for pg_verify_checksums
Greetings,

* Andres Freund (and...@anarazel.de) wrote:
> On 2018-10-19 10:36:59 -0400, Stephen Frost wrote:
>> * Michael Paquier (mich...@paquier.xyz) wrote:
>>> On Wed, Oct 17, 2018 at 05:30:05PM -0400, Andrew Dunstan wrote:
>>>> Fine by me.
>>>
>>> Thanks.  This is now committed after some tweaks to the comments, a bit
>>> earlier than I thought first.
>>
>> I just saw this committed and I'm trying to figure out why we are
>> creating yet another list when it comes to deciding what should be
>> checksum'd and what shouldn't be.
>
> I'm not sure it's fair to blame Michael here. He's picking up slack,
> because the original authors of the tool didn't even bother to reply to
> the issue for quite a while. pg_verify_checksums was obviously never
> tested on Windows; it just didn't have tests showing that.

I wasn't really going for 'blame' here, just that we've now got two
different methods for doing the same thing, and those will give different
answers; presumably this issue impacts pg_basebackup too, since the logic
there wasn't changed.

>> I also categorically disagree with the notion that it's ok for
>> extensions to dump files into our tablespace directories or that we
>> should try to work around random code dumping extra files in the
>> directories which we maintain; it's not like this additional code will
>> actually protect us from that, after all, and it's foolish to think we
>> really could protect against that.
>
> I'm unconvinced. There already are extensions doing so, and it's not
> like we've given them any sort of reasonable alternative. You can't just
> create relations in a "registered" / "supported" kind of way right now,
> so, uh, yeah, extensions gotta make do. And that has worked for many
> years.

I don't agree that any of this is an argument for accepting random code
writing into arbitrary places in the data directory or tablespace
directories as being something which we support or which we write code to
work around. If this is a capability that extension authors need, then I'm
all for having a way for them to have that capability in a clearly defined
and documented way, and one which we could actually write code to handle
and work with.

> Also, as made obvious here, there clearly are platform differences about
> which files exist and which don't, so it's not as if a blacklist
> approach is automatically more maintainable.

Platform differences are certainly something we can manage, and we have
code in a few different places for doing exactly that. A platform
difference is a heck of a lot more reasonable than trying to guess at what
random code that we've never seen before has done and trying to work
around that.

> And I fail to see why a blacklist is architecturally better. There are
> plenty of cases where we might want to create temporary files,
> non-checksummed data, or the like, which we'd need to teach a blacklist
> about, but there are far fewer cases where we add new checksummed files;
> basically never since checksums were introduced. And if checksums were
> introduced for something new, say SLRUs, we'd certainly use
> pg_verify_checksums during development.

In cases where we need something temporary, we're going to need to be
prepared to clean those up, and we had better make it very plain that
they're temporary and easy to write code for. Anything we aren't prepared
to blow away on a crash-restart should be checksum'd, and in an ideal
world we'd be looking to reduce the blacklist to only those things which
are temporary.

Of course, we may need different code for calculating the checksum of
different types of files, which moves it from really being a 'blacklist'
to a list of file types and their associated checksumming mechanism.

Thanks!

Stephen
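
To make that idea concrete, the "list of file types and their associated
checksumming mechanism" could be expressed as a small table; a purely
illustrative sketch (all type names and entries are invented here):

    /* Illustrative only: mapping file classes to verification methods */
    typedef enum ChecksumMethod
    {
        CHECKSUM_METHOD_NONE,       /* temporary/transient: account for, skip */
        CHECKSUM_METHOD_PAGES,      /* relation forks: per-page checksums */
        CHECKSUM_METHOD_CRC32C      /* whole-file CRC, pg_control style */
    } ChecksumMethod;

    typedef struct ChecksummedFileKind
    {
        const char     *description;
        ChecksumMethod  method;
    } ChecksummedFileKind;

    static const ChecksummedFileKind file_kinds[] = {
        {"relation main/fsm/vm forks", CHECKSUM_METHOD_PAGES},
        {"pg_control", CHECKSUM_METHOD_CRC32C},
        {"temporary relation files", CHECKSUM_METHOD_NONE}
    };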
Re: pgsql: Add TAP tests for pg_verify_checksums
Hi,

On 2018-10-19 13:52:28 -0400, Stephen Frost wrote:
>>> I also categorically disagree with the notion that it's ok for
>>> extensions to dump files into our tablespace directories or that we
>>> should try to work around random code dumping extra files in the
>>> directories which we maintain; it's not like this additional code will
>>> actually protect us from that, after all, and it's foolish to think we
>>> really could protect against that.
>>
>> I'm unconvinced. There already are extensions doing so, and it's not
>> like we've given them any sort of reasonable alternative. You can't
>> just create relations in a "registered" / "supported" kind of way right
>> now, so, uh, yeah, extensions gotta make do. And that has worked for
>> many years.
>
> I don't agree that any of this is an argument for accepting random code
> writing into arbitrary places in the data directory or tablespace
> directories as being something which we support or which we write code
> to work around.
>
> If this is a capability that extension authors need, then I'm all for
> having a way for them to have that capability in a clearly defined and
> documented way, and one which we could actually write code to handle and
> work with.

cstore e.g. does this, and it's already out there. So yes, we should
provide better infrastructure. But for now, we gotta deal with reality,
unless we just want to gratuitously break things.

>> And I fail to see why a blacklist is architecturally better. There are
>> plenty of cases where we might want to create temporary files,
>> non-checksummed data, or the like, which we'd need to teach a blacklist
>> about, but there are far fewer cases where we add new checksummed
>> files; basically never since checksums were introduced. And if
>> checksums were introduced for something new, say SLRUs, we'd certainly
>> use pg_verify_checksums during development.
>
> In cases where we need something temporary, we're going to need to be
> prepared to clean those up, and we had better make it very plain that
> they're temporary and easy to write code for. Anything we aren't
> prepared to blow away on a crash-restart should be checksum'd, and in an
> ideal world we'd be looking to reduce the blacklist to only those things
> which are temporary.

There are pending patches that add support for pg_verify_checksums running
against a running cluster. We'll not just need to recognize files that are
there after a graceful shutdown, but anything that a cluster can have
there while running.

> Of course, we may need different code for calculating the checksum of
> different types of files, which moves it from really being a 'blacklist'
> to a list of file types and their associated checksumming mechanism.

Which pretty much is a whitelist. Given that we need a
filetype->checksumtype mapping or something, how is a blacklist a
reasonable approach for this?

Greetings,

Andres Freund
Re: pgsql: Add TAP tests for pg_verify_checksums
Greetings, * Andres Freund (and...@anarazel.de) wrote: > On 2018-10-19 13:52:28 -0400, Stephen Frost wrote: > > > > I also categorically disagree with the notion that it's ok for > > > > extensions to dump files into our tablespace diretories or that we > > > > should try to work around random code dumping extra files in the > > > > directories which we maintain- it's not like this additional code will > > > > actually protect us from that, after all, and it's foolish to think we > > > > really could protect against that. > > > > > > I'm unconvinced. There already are extensions doing so, and it's not > > > like we've given them any sort of reasonable alternative. You can't just > > > create relations in a "registered" / "supported" kinda way right now, so > > > uh, yea, extension gotta make do. And that has worked for many years. > > > > I don't agree that any of this is an argument for accepting random code > > writing into arbitrary places in the data directory or tablespace > > directories as being something which we support or which we write code > > to work-around. > > > > If this is a capability that extension authors need then I'm all for > > having a way for them to have that capability in a clearly defined and > > documented way- and one which we could actually write code to handle and > > work with. > > cstore e.g. does this, and it's already out there. So yes, we should > provide better infrastructure. But for now, we gotta deal with reality, > unless we just want to gratuituously break things. No, I don't agree that we are beholden to an external extension that decided to start writing into directories that clearly belong to PG. How do we even know that what cstore does in the tablespace directory wouldn't get caught up in the checksum file pattern-matching that this commit put in? What if there was an extension which did create files that would match, what would we do then? I'm happy to go create one, if that'd help make the point that this "pattern whitelist" isn't actually a solution but is really rather just a hack that'll break in the future. > > > And I fail to see why a blacklist is architecturally better. There's > > > plenty cases where we might want to create temporary files, > > > non-checksummed data or such that we'd need a teach a blacklist about, > > > but there's far fewer cases where add new checksummed files. Basically > > > never since checksums have been introduced. And if checksums were > > > introduced for something new, say slrus, we'd ceertainly use > > > pg_verify_checksum during development. > > > > In cases where we need something temporary, we're going to need to be > > prepared to clean those up and we had better make it very plain that > > they're temporary and easy to write code for. Anything we aren't > > prepared to blow away on a crash-restart should be checksum'd and in an > > ideal world, we'd be looking to reduce the blacklist to only those > > things which are temporary. > > There's pending patches that add support for pg_verify_checksums running > against a running cluster. We'll not just need to recognize files that > are there after a graceful shutdown, but with anything that a cluster > can have there while running. Of course- the same is true with a crash/restart case, so I'm not sure what you're getting at here. 
I wasn't ever thinking of only the graceful shutdown case- certainly pg_basebackup operates when the cluster is running also and that's more what I was thinking about from the start and which is part of what started the discussion about this commit as that was entirely ignored in this change. > > Of course, we may need different code for > > calculating the checksum of different types of files, which moves it > > from really being a 'blacklist' to a list of file-types and their > > associated checksum'ing mechansim. > > Which pretty much is a whitelist. Given that we need a > filetype->checksumtype mapping or something, how is a blacklist a > reasonable approach for this? We only need the mapping for special files- call that list of special files a whitelist or a blacklist or a bluelist, it doesn't matter, what's relevant here is that we don't skip over anything- we account for everything and make sure that everything that would survive a crash/restart is checked or at least accounted for if we can't, yet, check it. Thanks! Stephen signature.asc Description: PGP signature
Re: Re: pgsql: Add TAP tests for pg_verify_checksums
On 10/19/2018 01:17 PM, Andres Freund wrote: > On 2018-10-19 10:36:59 -0400, Stephen Frost wrote: > >> I also categorically disagree with the notion that it's ok for >> extensions to dump files into our tablespace diretories > > I'm unconvinced. There already are extensions doing so, and it's not > like we've given them any sort of reasonable alternative. You can't just > create relations in a "registered" / "supported" kinda way right now, so > uh, yea, extension gotta make do. And that has worked for many years. For some of us following along at home, this got interesting right here. Could someone elaborate on what extensions do this, for what purpose? And what would it mean to create relations in a "registered" / "supported" kinda way? Has that been the topic of a past discussion I could catch up with? -Chap
Re: pgsql: Add TAP tests for pg_verify_checksums
On 2018-10-19 14:08:19 -0400, J Chapman Flack wrote: > On 10/19/2018 01:17 PM, Andres Freund wrote: > > On 2018-10-19 10:36:59 -0400, Stephen Frost wrote: > > > > >> I also categorically disagree with the notion that it's ok for > >> extensions to dump files into our tablespace diretories > > > > I'm unconvinced. There already are extensions doing so, and it's not > > like we've given them any sort of reasonable alternative. You can't just > > create relations in a "registered" / "supported" kinda way right now, so > > uh, yea, extension gotta make do. And that has worked for many years. > > For some of us following along at home, this got interesting right here. > Could someone elaborate on what extensions do this, for what purpose? > And what would it mean to create relations in a "registered" / > "supported" kinda way? Has that been the topic of a past discussion > I could catch up with? An fdw, like cstore, might be doing it, to store data, for example. The registered / supported thing would be to have on-disk files somehow represented in the catalogs. Right now you can't really do that because you need a proper relation (table, index, ...) to get a relfilenode. Greetings, Andres Freund
Re: Optimze usage of immutable functions as relation
Hi, Thank you for the review. I fixed a typo and some comments. Please find new version attached. --- Best regards, Parfenov Aleksandr On Fri, 19 Oct 2018 at 16:40, Anthony Bykov wrote: > The following review has been posted through the commitfest application: > make installcheck-world: tested, failed > Implements feature: not tested > Spec compliant: not tested > Documentation: not tested > > Hello, > I was trying to review your patch, but I couldn't install it: > > prepjointree.c: In function ‘pull_up_simple_function’: > prepjointree.c:1793:41: error: ‘functions’ undeclared (first use in this > function); did you mean ‘PGFunction’? > Assert(!contain_vars_of_level((Node *) functions, 0)); > > Was it a typo? > > The new status of this patch is: Waiting on Author

diff --git a/src/backend/optimizer/prep/prepjointree.c b/src/backend/optimizer/prep/prepjointree.c
index cd6e119..4a9d74a 100644
--- a/src/backend/optimizer/prep/prepjointree.c
+++ b/src/backend/optimizer/prep/prepjointree.c
@@ -23,7 +23,9 @@
  */
 #include "postgres.h"
 
+#include "access/htup_details.h"
 #include "catalog/pg_type.h"
+#include "catalog/pg_proc.h"
 #include "nodes/makefuncs.h"
 #include "nodes/nodeFuncs.h"
 #include "optimizer/clauses.h"
@@ -35,6 +37,8 @@
 #include "parser/parse_relation.h"
 #include "parser/parsetree.h"
 #include "rewrite/rewriteManip.h"
+#include "utils/syscache.h"
+#include "utils/lsyscache.h"
 
 
 typedef struct pullup_replace_vars_context
@@ -86,6 +90,8 @@ static bool is_simple_subquery(Query *subquery, RangeTblEntry *rte,
 				   bool deletion_ok);
 static Node *pull_up_simple_values(PlannerInfo *root, Node *jtnode,
 				   RangeTblEntry *rte);
+static Node *pull_up_simple_function(PlannerInfo *root, Node *jtnode,
+				   RangeTblEntry *rte);
 static bool is_simple_values(PlannerInfo *root, RangeTblEntry *rte,
 				   bool deletion_ok);
 static bool is_simple_union_all(Query *subquery);
@@ -598,6 +604,54 @@ inline_set_returning_functions(PlannerInfo *root)
 	}
 }
 
+static bool
+is_simple_stable_function(RangeTblEntry *rte)
+{
+	Form_pg_type type_form;
+	RangeTblFunction *tblFunction = linitial_node(RangeTblFunction, rte->functions);
+	FuncExpr   *expr = (FuncExpr *) tblFunction->funcexpr;
+	HeapTuple	tuple = SearchSysCache1(TYPEOID, ObjectIdGetDatum(expr->funcresulttype));
+	bool		result = false;
+
+	if (!HeapTupleIsValid(tuple))
+		elog(ERROR, "cache lookup failed for type %u", expr->funcresulttype);
+
+	type_form = (Form_pg_type) GETSTRUCT(tuple);
+
+	if (type_form->typtype == TYPTYPE_BASE &&
+		!type_is_array(expr->funcresulttype))
+	{
+		Form_pg_proc func_form;
+		ListCell   *arg;
+		bool		has_nonconst_input = false;
+		bool		has_null_input = false;
+
+		ReleaseSysCache(tuple);
+		tuple = SearchSysCache1(PROCOID, ObjectIdGetDatum(expr->funcid));
+		if (!HeapTupleIsValid(tuple))
+			elog(ERROR, "cache lookup failed for function %u", expr->funcid);
+		func_form = (Form_pg_proc) GETSTRUCT(tuple);
+
+		foreach(arg, expr->args)
+		{
+			if (IsA(lfirst(arg), Const))
+				has_null_input |= ((Const *) lfirst(arg))->constisnull;
+			else
+				has_nonconst_input = true;
+		}
+
+		result = func_form->prorettype != RECORDOID &&
+			func_form->prokind == PROKIND_FUNCTION &&
+			!func_form->proretset &&
+			func_form->provolatile == PROVOLATILE_IMMUTABLE &&
+			!has_null_input &&
+			!has_nonconst_input;
+	}
+
+	ReleaseSysCache(tuple);
+	return result;
+}
+
 /*
  * pull_up_subqueries
  *		Look for subqueries in the rangetable that can be pulled up into
@@ -728,6 +782,11 @@ pull_up_subqueries_recurse(PlannerInfo *root, Node *jtnode,
 			is_simple_values(root, rte, deletion_ok))
 			return pull_up_simple_values(root, jtnode, rte);
 
+		if (rte->rtekind == RTE_FUNCTION &&
+			list_length(rte->functions) == 1 &&
+			is_simple_stable_function(rte))
+			return pull_up_simple_function(root, jtnode, rte);
+
 		/* Otherwise, do nothing at this node. */
 	}
 	else if (IsA(jtnode, FromExpr))
@@ -1710,6 +1769,98 @@ pull_up_simple_values(PlannerInfo *root, Node *jtnode, RangeTblEntry *rte)
 	return NULL;
 }
 
+static Node *
+pull_up_simple_function(PlannerInfo *root, Node *jtnode, RangeTblEntry *rte)
+{
+	Query	   *parse = root->parse;
+	int			varno = ((RangeTblRef *) jtnode)->rtindex;
+	List	   *functions_list;
+	List	   *tlist;
+	AttrNumber	attrno;
+	pullup_replace_vars_context rvcontext;
+	ListCell   *lc;
+
+	Assert(rte->rtekind == RTE_FUNCTION);
+	Assert(list_length(rte->functions) == 1);
+
+	/*
+	 * Need a modifiable copy of the functions list to hack on, just in case
+	 * it's multiply referenced.
+	 */
+	functions_list = copyObject(rte->functions);
+
+	/*
+	 * The FUNCTION RTE can't contain any Vars of level zero, let alone any
+	 * that are join aliases, so no need to flatten join alias Vars.
+	 */
+	Assert(!contain_vars_of_level((Node *) functions_list, 0));
+
+	/*
+	 * Set up required context data for pullup_replace_vars.  In particular,
+	 * we have to make the FUNCTION lis
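For context, the kind of query this pull-up targets looks roughly like the following (the table and data are made-up examples; whether a given call qualifies depends on the is_simple_stable_function() checks above):

-- lower(text) is IMMUTABLE and all of its arguments here are constants,
-- so instead of keeping a FunctionScan node in the plan the planner can
-- evaluate the call once and inline the result into the parent query.
SELECT t.id
FROM some_table t,
     lower('ABC') AS s
WHERE t.name = s;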
Re: pgsql: Add TAP tests for pg_verify_checksums
On 2018-10-19 14:11:03 -0400, Stephen Frost wrote: > Greetings, > > * Andres Freund (and...@anarazel.de) wrote: > > On 2018-10-19 13:52:28 -0400, Stephen Frost wrote: > > > > > I also categorically disagree with the notion that it's ok for > > > > > extensions to dump files into our tablespace diretories or that we > > > > > should try to work around random code dumping extra files in the > > > > > directories which we maintain- it's not like this additional code will > > > > > actually protect us from that, after all, and it's foolish to think we > > > > > really could protect against that. > > > > > > > > I'm unconvinced. There already are extensions doing so, and it's not > > > > like we've given them any sort of reasonable alternative. You can't just > > > > create relations in a "registered" / "supported" kinda way right now, so > > > > uh, yea, extension gotta make do. And that has worked for many years. > > > > > > I don't agree that any of this is an argument for accepting random code > > > writing into arbitrary places in the data directory or tablespace > > > directories as being something which we support or which we write code > > > to work-around. > > > > > > If this is a capability that extension authors need then I'm all for > > > having a way for them to have that capability in a clearly defined and > > > documented way- and one which we could actually write code to handle and > > > work with. > > > > cstore e.g. does this, and it's already out there. So yes, we should > > provide better infrastructure. But for now, we gotta deal with reality, > > unless we just want to gratuituously break things. > > No, I don't agree that we are beholden to an external extension that > decided to start writing into directories that clearly belong to PG. Did it have an alternative? > How do we even know that what cstore does in the tablespace directory > wouldn't get caught up in the checksum file pattern-matching that this > commit put in? You listen to people? > What if there was an extension which did create files that would > match, what would we do then? I'm happy to go create one, if that'd > help make the point that this "pattern whitelist" isn't actually a > solution but is really rather just a hack that'll break in the future. Oh, for crying out loud. Yes, obviously you can create conflicts, nobody ever doubted that. How on earth is that a useful point for anything? The point isn't that it's a good idea for extensions to do so, but that it has happened because we didn't provide better alternative. > > > > And I fail to see why a blacklist is architecturally better. There's > > > > plenty cases where we might want to create temporary files, > > > > non-checksummed data or such that we'd need a teach a blacklist about, > > > > but there's far fewer cases where add new checksummed files. Basically > > > > never since checksums have been introduced. And if checksums were > > > > introduced for something new, say slrus, we'd ceertainly use > > > > pg_verify_checksum during development. > > > > > > In cases where we need something temporary, we're going to need to be > > > prepared to clean those up and we had better make it very plain that > > > they're temporary and easy to write code for. Anything we aren't > > > prepared to blow away on a crash-restart should be checksum'd and in an > > > ideal world, we'd be looking to reduce the blacklist to only those > > > things which are temporary. 
> > > > There's pending patches that add support for pg_verify_checksums running > > against a running cluster. We'll not just need to recognize files that > > are there after a graceful shutdown, but with anything that a cluster > > can have there while running. > > Of course- the same is true with a crash/restart case, so I'm not sure > what you're getting at here. pg_verify_checksum doesn't support running on a crashed cluster, and I'm not sure it'd make sense to teach it to do so (there's not really much it could do to discern whether a block is a torn page that replay will fix, or corrupted). > I wasn't ever thinking of only the graceful shutdown case- certainly > pg_basebackup operates when the cluster is running also and that's > more what I was thinking about from the start and which is part of > what started the discussion about this commit as that was entirely > ignored in this change. It was mainly ignored in the original pg_verify_checksums change, not in the commit discussed here. That was broken. That built separate logic. Both basebackup and verify checksums already only scan certain directories. They *already* are encoding directory structure information.
Re: pgsql: Add TAP tests for pg_verify_checksums
Greetings, * Andres Freund (and...@anarazel.de) wrote: > On 2018-10-19 14:11:03 -0400, Stephen Frost wrote: > > * Andres Freund (and...@anarazel.de) wrote: > > > cstore e.g. does this, and it's already out there. So yes, we should > > > provide better infrastructure. But for now, we gotta deal with reality, > > > unless we just want to gratuituously break things. > > > > No, I don't agree that we are beholden to an external extension that > > decided to start writing into directories that clearly belong to PG. > > Did it have an alternative? Yes, work with us to come up with a way to accomplish what they wanted without creating unknown/untracked files in our tablespaces. Or, have their own mechanism for storing files on other filesystems. And likely other options. Saying that we should account for their putting arbitrary files into the PG tablespace directories because they "didn't have any other choice" seems pretty ridiculous, imv. > > How do we even know that what cstore does in the tablespace directory > > wouldn't get caught up in the checksum file pattern-matching that this > > commit put in? > > You listen to people? > > > What if there was an extension which did create files that would > > match, what would we do then? I'm happy to go create one, if that'd > > help make the point that this "pattern whitelist" isn't actually a > > solution but is really rather just a hack that'll break in the future. > > Oh, for crying out loud. Yes, obviously you can create conflicts, nobody > ever doubted that. How on earth is that a useful point for anything? The > point isn't that it's a good idea for extensions to do so, but that it > has happened because we didn't provide better alternative. The point is that you're making a case because you happen to know of an extension which does this, but neither of us know about every extension out there and I, at least, don't believe we should be coming up with hacks to try and avoid looking at files that some extension has dumped into the PG tablespace directories because that's never been a documented or supported thing to do. > > > > > And I fail to see why a blacklist is architecturally better. There's > > > > > plenty cases where we might want to create temporary files, > > > > > non-checksummed data or such that we'd need a teach a blacklist about, > > > > > but there's far fewer cases where add new checksummed files. Basically > > > > > never since checksums have been introduced. And if checksums were > > > > > introduced for something new, say slrus, we'd ceertainly use > > > > > pg_verify_checksum during development. > > > > > > > > In cases where we need something temporary, we're going to need to be > > > > prepared to clean those up and we had better make it very plain that > > > > they're temporary and easy to write code for. Anything we aren't > > > > prepared to blow away on a crash-restart should be checksum'd and in an > > > > ideal world, we'd be looking to reduce the blacklist to only those > > > > things which are temporary. > > > > > > There's pending patches that add support for pg_verify_checksums running > > > against a running cluster. We'll not just need to recognize files that > > > are there after a graceful shutdown, but with anything that a cluster > > > can have there while running. > > > > Of course- the same is true with a crash/restart case, so I'm not sure > > what you're getting at here. 
> > pg_verify_checksum doesn't support running on a crashed cluster, and I'm > not sure it'd make sense to teach it to so (there's not really much it > could do to discern whether a block is a torn page that replay will fix, > or corrupted). ... and this isn't at all relevant, because pg_basebackup does run on a running cluster. > > I wasn't ever thinking of only the graceful shutdown case- certainly > > pg_basebackup operates when the cluster is running also and that's > > more what I was thinking about from the start and which is part of > > what started the discussion about this commit as that was entirely > > ignored in this change. > > It was mainly ignored in the original pg_verify_checksums change, not in > the commit discussed here. That was broken. That built separate logic. As pointed out elsewhere on this thread, the logic was the same between the two before this commit... The code in pg_basebackup came from the prior pg_verify_checksums code. Certainly, some mention of the code existing in two places, at least, should have been in the comments. Also, just to be clear, as I tried to say up-thread, I'm not trying to lay blame here or suggest that Michael shouldn't have been trying to fix the issue that came up, but I don't agree that this is the way to fix it, and, in any case, we need to make sure that pg_basebackup behaves correctly as well. As mentioned elsewhere, that'd be best done by adding something to libpgcommon and then changing both pg_basebackup and pg_verify_checksums to use that. > Both basebackup an
Google Code In Mentorship
Hello Sir/Madam, I participated last year in the Google Code-in Competition and I kept track of the participating organizations. I feel that your organization fits my interests and skill set perfectly and would be an excellent addition to the participating organizations. I wish to mentor in the GCI this year. I am Apurv Shah, a freshman at UMass Amherst pursuing a Bachelor's Degree in Computer Science. I have experience with SQL Server and the C language, along with Python and Django. I hope I can mentor for your organization. I have attached my resume for more information. -- Best Regards, Apurv Shah. Resume.pdf Description: Adobe PDF document
Re: pgsql: Add TAP tests for pg_verify_checksums
Hi, On 2018-10-19 14:47:51 -0400, Stephen Frost wrote: > * Andres Freund (and...@anarazel.de) wrote: > > On 2018-10-19 14:11:03 -0400, Stephen Frost wrote: > > > * Andres Freund (and...@anarazel.de) wrote: > > > > cstore e.g. does this, and it's already out there. So yes, we should > > > > provide better infrastructure. But for now, we gotta deal with reality, > > > > unless we just want to gratuituously break things. > > > > > > No, I don't agree that we are beholden to an external extension that > > > decided to start writing into directories that clearly belong to PG. > > > > Did it have an alternative? > > Yes, work with us to come up with a way to accomplish what they wanted > without creating unknown/untracked files in our tablespaces. > > Or, have their own mechanism for storing files on other filesystems. > > And likely other options. Saying that we should account for their > putting arbitrary files into the PG tablespace directories because they > "didn't have any other choice" seems pretty ridiculous, imv. > > > > How do we even know that what cstore does in the tablespace directory > > > wouldn't get caught up in the checksum file pattern-matching that this > > > commit put in? > > > > You listen to people? > > > > > What if there was an extension which did create files that would > > > match, what would we do then? I'm happy to go create one, if that'd > > > help make the point that this "pattern whitelist" isn't actually a > > > solution but is really rather just a hack that'll break in the future. > > > > Oh, for crying out loud. Yes, obviously you can create conflicts, nobody > > ever doubted that. How on earth is that a useful point for anything? The > > point isn't that it's a good idea for extensions to do so, but that it > > has happened because we didn't provide better alternative. > > The point is that you're making a case because you happen to know of an > extension which does this, but neither of us know about every extension > out there and I, at least, don't believe we should be coming up with > hacks to try and avoid looking at files that some extension has dumped > into the PG tablespace directories because that's never been a > documented or supported thing to do. It's not a hack if it's a quite defendable choice on its own, and the presence of such extensions is just one further argument in one direction (for explicit whitelisting here). > > > > > > And I fail to see why a blacklist is architecturally better. There's > > > > > > plenty cases where we might want to create temporary files, > > > > > > non-checksummed data or such that we'd need a teach a blacklist > > > > > > about, > > > > > > but there's far fewer cases where add new checksummed files. > > > > > > Basically > > > > > > never since checksums have been introduced. And if checksums were > > > > > > introduced for something new, say slrus, we'd ceertainly use > > > > > > pg_verify_checksum during development. > > > > > > > > > > In cases where we need something temporary, we're going to need to be > > > > > prepared to clean those up and we had better make it very plain that > > > > > they're temporary and easy to write code for. Anything we aren't > > > > > prepared to blow away on a crash-restart should be checksum'd and in > > > > > an > > > > > ideal world, we'd be looking to reduce the blacklist to only those > > > > > things which are temporary. > > > > > > > > There's pending patches that add support for pg_verify_checksums running > > > > against a running cluster.
We'll not just need to recognize files that > > > > are there after a graceful shutdown, but with anything that a cluster > > > > can have there while running. > > > > > > Of course- the same is true with a crash/restart case, so I'm not sure > > > what you're getting at here. > > > > pg_verify_checksum doesn't support running on a crashed cluster, and I'm > > not sure it'd make sense to teach it to so (there's not really much it > > could do to discern whether a block is a torn page that replay will fix, > > or corrupted). > > ... and this isn't at all relevant, because pg_basebackup does run on a > running cluster. I wasn't ever denying that or anything close to it? My point is that pg_verify_checksum needs much more filtering than it has now to deal with that, because it needs to handle all files that could be present, not just files that could be present after a graceful shutdown. pg_basebackup's case is *not* really comparable because basebackup.c already performed a lot of filtering before noChecksumFiles is applied. > > > I wasn't ever thinking of only the graceful shutdown case- certainly > > > pg_basebackup operates when the cluster is running also and that's > > > more what I was thinking about from the start and which is part of > > > what started the discussion about this commit as that was entirely > > > ignored in this change. > > > > It was mainly ignored in the original pg_verify_checksums change, no
Re: pgsql: Add TAP tests for pg_verify_checksums
Greetings, * Andres Freund (and...@anarazel.de) wrote: > On 2018-10-19 14:47:51 -0400, Stephen Frost wrote: > > * Andres Freund (and...@anarazel.de) wrote: > > > Oh, for crying out loud. Yes, obviously you can create conflicts, nobody > > > ever doubted that. How on earth is that a useful point for anything? The > > > point isn't that it's a good idea for extensions to do so, but that it > > > has happened because we didn't provide better alternative. > > > > The point is that you're making a case because you happen to know of an > > extension which does this, but neither of us know about every extension > > out there and I, at least, don't believe we should be coming up with > > hacks to try and avoid looking at files that some extension has dumped > > into the PG tablespace directories because that's never been a > > documented or supported thing to do. > > It's not a hack if it's a quite defendable choice on its own, and the > presense of such extensions is just one further argument in one > direction (for explicit whitelisting here). You've not convinced me that an extension dropping files into our tablespace directories is either something we should accept or that we should code around such cases, nor that we are doing anything but putting a hack in place to deal with what is pretty clearly a hack in its own right, imv. > > > > Of course- the same is true with a crash/restart case, so I'm not sure > > > > what you're getting at here. > > > > > > pg_verify_checksum doesn't support running on a crashed cluster, and I'm > > > not sure it'd make sense to teach it to so (there's not really much it > > > could do to discern whether a block is a torn page that replay will fix, > > > or corrupted). > > > > ... and this isn't at all relevant, because pg_basebackup does run on a > > running cluster. > > I wasn't ever denying that or anything close to it? My point is that > pg_verify_checksum needs much more filtering than it has now to deal > with that, because it needs to handle all files that could be present, > not just files that could be present after a graceful shutdown. Perhaps it doesn't today but surely one goal of pg_verify_checksum is to be able to run it on a running cluster eventually. > pg_basebackup's case is *not* really comparable because basebackup.c > already performed a lot of filtering before noChecksumFiles is applied. This all really just points out that we should have the code for handling this somewhere common that both pg_verify_checksum and pg_basebackup can leverage without duplicating all of it. The specific case that started all of this certainly looks pretty clearly like a case that both need to deal with. > > As pointed out elsewhere on this thread, the logic was the same between > > the two before this commit... The code in pg_basebackup came from the > > prior pg_verify_checksums code. Certainly, some mention of the code > > existing in two places, at least, should have been in the comments. > > But the filter for basebackup only comes after the pre-existing > filtering that the basebackup.c code already does. All of: > > /* > * List of files excluded from backups. > */ > static const char *excludeFiles[] = > { > /* Skip auto conf temporary file. 
*/ > PG_AUTOCONF_FILENAME ".tmp", > > /* Skip current log file temporary file */ > LOG_METAINFO_DATAFILE_TMP, > > /* Skip relation cache because it is rebuilt on startup */ > RELCACHE_INIT_FILENAME, > > /* >* If there's a backup_label or tablespace_map file, it belongs to a >* backup started by the user with pg_start_backup(). It is *not* > correct >* for this backup. Our backup_label/tablespace_map is injected into > the >* tar separately. >*/ > BACKUP_LABEL_FILE, > TABLESPACE_MAP, > > "postmaster.pid", > "postmaster.opts", > > /* end of list */ > NULL > }; > > is not applied, for example. Nor is: > > /* Skip temporary files */ > if (strncmp(de->d_name, > PG_TEMP_FILE_PREFIX, > strlen(PG_TEMP_FILE_PREFIX)) == 0) > continue; > > Nor is > /* Exclude temporary relations */ > if (isDbDir && looks_like_temp_rel_name(de->d_name)) > { > elog(DEBUG2, >"temporary relation file \"%s\" excluded from > backup", >de->d_name); > > continue; > } > > So, yea, they had: > > static const char *const skip[] = { > "pg_control", > "pg_filenode.map", > "pg_internal.init", > "PG_VERSION", > NULL, > }; > > in common, but not more. All the above exclusions are strictly > necessary. ... but all of that code doesn't change that pg_basebackup also needs to ignore the EXEC_BACKEND created confi
Re: pgsql: Add TAP tests for pg_verify_checksums
On 2018-10-19 15:57:28 -0400, Stephen Frost wrote: > > > > > Of course- the same is true with a crash/restart case, so I'm not sure > > > > > what you're getting at here. > > > > > > > > pg_verify_checksum doesn't support running on a crashed cluster, and I'm > > > > not sure it'd make sense to teach it to so (there's not really much it > > > > could do to discern whether a block is a torn page that replay will fix, > > > > or corrupted). > > > > > > ... and this isn't at all relevant, because pg_basebackup does run on a > > > running cluster. > > > > I wasn't ever denying that or anything close to it? My point is that > > pg_verify_checksum needs much more filtering than it has now to deal > > with that, because it needs to handle all files that could be present, > > not just files that could be present after a graceful shutdown. > > Perhaps it doesn't today but surely one goal of pg_verify_checksum is to > be able to run it on a running cluster eventually. I was saying *precisely* that above. I give up. > > pg_basebackup's case is *not* really comparable because basebackup.c > > already performed a lot of filtering before noChecksumFiles is applied. > > This all really just points out that we should have the code for > handling this somewhere common that both pg_verify_checksum and > pg_basebackup can leverage without duplicating all of it. I never argued against that. My point is that your argument that they started out the same isn't true. > The specific case that started all of this certainly looks pretty > clearly like a case that both need to deal with. Yep. > > > As pointed out elsewhere on this thread, the logic was the same between > > > the two before this commit... The code in pg_basebackup came from the > > > prior pg_verify_checksums code. Certainly, some mention of the code > > > existing in two places, at least, should have been in the comments. > > > > But the filter for basebackup only comes after the pre-existing > > filtering that the basebackup.c code already does. All of: > > > > /* > > * List of files excluded from backups. > > */ > > static const char *excludeFiles[] = > > { > > /* Skip auto conf temporary file. */ > > PG_AUTOCONF_FILENAME ".tmp", > > > > /* Skip current log file temporary file */ > > LOG_METAINFO_DATAFILE_TMP, > > > > /* Skip relation cache because it is rebuilt on startup */ > > RELCACHE_INIT_FILENAME, > > > > /* > > * If there's a backup_label or tablespace_map file, it belongs to a > > * backup started by the user with pg_start_backup(). It is *not* > > correct > > * for this backup. Our backup_label/tablespace_map is injected into > > the > > * tar separately. > > */ > > BACKUP_LABEL_FILE, > > TABLESPACE_MAP, > > > > "postmaster.pid", > > "postmaster.opts", > > > > /* end of list */ > > NULL > > }; > > > > is not applied, for example. Nor is: > > > > /* Skip temporary files */ > > if (strncmp(de->d_name, > > PG_TEMP_FILE_PREFIX, > > strlen(PG_TEMP_FILE_PREFIX)) == 0) > > continue; > > > > Nor is > > /* Exclude temporary relations */ > > if (isDbDir && looks_like_temp_rel_name(de->d_name)) > > { > > elog(DEBUG2, > > "temporary relation file \"%s\" excluded from > > backup", > > de->d_name); > > > > continue; > > } > > > > So, yea, they had: > > > > static const char *const skip[] = { > > "pg_control", > > "pg_filenode.map", > > "pg_internal.init", > > "PG_VERSION", > > NULL, > > }; > > > > in common, but not more. All the above exclusions are strictly > > necessary. > > ... 
but all of that code doesn't change that pg_basebackup also needs to > ignore the EXEC_BACKEND created config_exec_params/.new files. Right. > You're right, pg_verify_checksums, with the assumption that it only runs > on a cleanly shut-down cluster, doesn't need the temp-file-skipping > logic, today, but it's going to need it tomorrow, isn't it? No, it needs it today, as explained below in the email you're replying to. > > FWIW, as far as I can tell, pg_verify_checksum appears to be broken in a > > lot of (unfortunately) pretty normal scenarios right now. Not just on > > windows. Besides the narrow window of crashing while a .tmp file is > > present (and then shutting down normally after a crash restart), it also > > has the much of wider window of crashing while *any* backend has > > temporary files/relations. As crash-restarts don't release temp files, > > they'll still be present after the crash, and a single graceful shutdown > > afterwards will leave them in place. No? > > We do go through and do some cleanup at crash/restart We don't clean up temp files tho. * NOTE: we could, but don't, call this during a post-backend-crash restart * cycle. The argument for not doing it is that someone might want to examine * the temp files for debugging purposes. This does however mean that * OpenTemporaryFile had better allow for collision with an existing temp * file name.
Re: pgsql: Add TAP tests for pg_verify_checksums
Greetings, * Andres Freund (and...@anarazel.de) wrote: > On 2018-10-19 15:57:28 -0400, Stephen Frost wrote: > > Perhaps it doesn't today but surely one goal of pg_verify_checksum is to > > be able to run it on a running cluster eventually. > > I was saying *precisely* that above. I give up. I'm glad we agree on that (I thought we did before, in fact, so it was odd to see it coming up again). > > > pg_basebackup's case is *not* really comparable because basebackup.c > > > already performed a lot of filtering before noChecksumFiles is applied. > > > > This all really just points out that we should have the code for > > handling this somewhere common that both pg_verify_checksum and > > pg_basebackup can leverage without duplicating all of it. > > I never argued against that. My point is that your argument that they > started out the same isn't true. I overstated the relationship, clearly, but it hardly matters, as you say. > > The specific case that started all of this certainly looks pretty > > clearly like a case that both need to deal with. > > Yep. ... and it's in the part of the code that was actually copied and was the same. > > > FWIW, as far as I can tell, pg_verify_checksum appears to be broken in a > > > lot of (unfortunately) pretty normal scenarios right now. Not just on > > > windows. Besides the narrow window of crashing while a .tmp file is > > > present (and then shutting down normally after a crash restart), it also > > > has the much of wider window of crashing while *any* backend has > > > temporary files/relations. As crash-restarts don't release temp files, > > > they'll still be present after the crash, and a single graceful shutdown > > > afterwards will leave them in place. No? > > > > We do go through and do some cleanup at crash/restart > > We don't clean up temp files tho. > > * NOTE: we could, but don't, call this during a post-backend-crash restart > * cycle. The argument for not doing it is that someone might want to examine > * the temp files for debugging purposes. This does however mean that > * OpenTemporaryFile had better allow for collision with an existing temp > * file name. Then, yes, we should go through and fix pg_verify_checksums to work correctly in this case, likely using more-or-less the same code that pg_basebackup has. After that, we can add the remaining code to check the last checkpoint and skip pages which have a more recent LSN... > > As you say, some of those are likely to have checksums, which should be > > handled by pg_basebackup and pg_verify_checksums, and that goes back to > > the point I was making up-thread that we want to make sure an account > > for everything. With this pattern-based approach, we could easily end > > up forgetting to add the correct new pattern into pg_verify_checksums. > > If you're adding checksums for something, you better test it. I don't buy > this. In contrast it's much more likely that there's a file that's > short-lived that you won't easily test against pg_verify_checksum > running in that moment. The only justification for *not* doing this is that some extension author might have dumped files into our tablespace directory, something we've never claimed to support nor generally worried about in all the time that I can recall before this.
As for saying that someone is obviously going to use pg_verify_checksums to check their checksum code- I simply don't agree that we should be happy to rely on that assumption when the only reason for any of this is that some extension decided to do something that wasn't supported and likely has issues in other parts of the code anyway (pg_basebackup would happily copy those files too, even though there's obviously no code for making sure it's a consistent copy or any such in pg_basebackup or core). > > Playing this further and assuming that extensions dropping files into > > tablespace directories is acceptable, what are we supposed to do when > > there's some direct conflict between a file that we want to create in a > > PG tablespace directory and a file that some extension dropped there? > > Create yet another subdirectory which we call > > "THIS_IS_REALLY_ONLY_FOR_PG"? > > Then it's a buggy extension. And we error out. Extensions writing into directories they shouldn't be makes them buggy even if the core code isn't likely to write to a particular filename, imv. > > What about two different extensions wanting to create files with the > > same names in the tablespace directories..? > > > > Experimentation is fine, of course, this is open source code and people > > should feel free to play with it, but we are not obligated to avoid > > breaking things when an extension author, through their experimentation, > > does something which is clearly not supported, like dropping files into > > PG's tablespace directories. Further, when it comes to our user's data, > > we should be taking a strict approach and accounting for everything, > > something that t
Re: pgsql: Add TAP tests for pg_verify_checksums
Hi, On 2018-10-19 16:35:46 -0400, Stephen Frost wrote: > The only justification for *not* doing this is that some extension > author might have dumped files into our tablespace directory, something > we've never claimed to support nor generally worried about in all the > time that I can recall before this. No, it's really not the only reason. As I said, testing is much less likely to find cases where we're checksumming a short-lived file even though we shouldn't, than making sure that checksummed files are actually checksummed. > > > Playing this further and assuming that extensions dropping files into > > > tablespace directories is acceptable, what are we supposed to do when > > > there's some direct conflict between a file that we want to create in a > > > PG tablespace directory and a file that some extension dropped there? > > > Create yet another subdirectory which we call > > > "THIS_IS_REALLY_ONLY_FOR_PG"? > > > > Then it's a buggy extension. And we error out. > > Extensions writing into directories they shouldn't be makes them buggy > even if the core code isn't likely to write to a particular filename, > imv. I'll just stop talking to you about this for now. > > > What about two different extensions wanting to create files with the > > > same names in the tablespace directories..? > > > > > > Experimentation is fine, of course, this is open source code and people > > > should feel free to play with it, but we are not obligated to avoid > > > breaking things when an extension author, through their experimentation, > > > does something which is clearly not supported, like dropping files into > > > PG's tablespace directories. Further, when it comes to our user's data, > > > we should be taking a strict approach and accounting for everything, > > > something that this whitelist-of-patterns-based approach to finding > > > files to verify the checksums on doesn't do. > > > > It's not economically feasible to work on extensions that will only be > > usable a year down the road. > > I listed out multiple other solutions to this problem which you > summarily ignored. Except that none of them really achieves the goals you can achieve by having the files in the database (like DROP DATABASE working, for starters). > It's unfortunate that those other solutions weren't used and that, > instead, this extension decided to drop files into PG's tablespace > directories, but that doesn't mean we should condone or implicitly > support that action. This just seems pointlessly rigid. Our extension related APIs aren't generally very good or well documented. Most non-trivial extensions that add interesting things to the postgres ecosystem are going to need a few not so pretty things to get going. > I stand by my position that this patch should be reverted (with no > offense or ill-will towards Michael, of course, I certainly appreciate > his efforts to address the issues with pg_verify_checksums) and that we > should move more of this code to identify files to verify the checksums > on into libpgcommon and then use it from both pg_basebackup and > pg_verify_checksums, to the extent possible, but that we make sure to > account for all of the files in our tablespace and database directories. What does that do, except break things that currently work? Sure, work on a patch that fixes the architectural concerns, but what's the point in reverting it until that's done? 
> To the extent that this is an issue for extension authors, perhaps it > would encourage them to work with us to have supported mechanisms > instead of using hacks like dropping files into our tablespace > directories and such instead of using another approach to manage files > across multiple filesystems. I'd be kind of surprised if they really > had an issue doing that and hopefully everyone would feel better about > what these extensions are doing once they start using a mechanism that > we actually support. Haribabu has a patch, but it's on top of pluggable storage, so not exactly a small prerequisite. Greetings, Andres Freund
Re: pgsql: Add TAP tests for pg_verify_checksums
Greetings, * Andres Freund (and...@anarazel.de) wrote: > On 2018-10-19 16:35:46 -0400, Stephen Frost wrote: > > The only justification for *not* doing this is that some extension > > author might have dumped files into our tablespace directory, something > > we've never claimed to support nor generally worried about in all the > > time that I can recall before this. > > No, it's really not the only reason. As I said, testing is much less > likely to find cases where we're checksumming a short-lived file even > though we shouldn't, than making sure that checksummed files are > actually checksummed. While I agree that we might possibly end up trying to checksum a short-lived and poorly-named file, I'd rather that than risk missing the checking of a file which does have checksums that should have been checked and claiming that we checked all of the checksums. > > > > What about two different extensions wanting to create files with the > > > > same names in the tablespace directories..? > > > > > > > > Experimentation is fine, of course, this is open source code and people > > > > should feel free to play with it, but we are not obligated to avoid > > > > breaking things when an extension author, through their experimentation, > > > > does something which is clearly not supported, like dropping files into > > > > PG's tablespace directories. Further, when it comes to our user's data, > > > > we should be taking a strict approach and accounting for everything, > > > > something that this whitelist-of-patterns-based approach to finding > > > > files to verify the checksums on doesn't do. > > > > > > It's not economically feasible to work on extensions that will only be > > > usable a year down the road. > > > > I listed out multiple other solutions to this problem which you > > summarily ignored. > > Except that none of them really achieves the goals you can achieve by > having the files in the database (like DROP DATABASE working, for > starters). The only reason things like DROP DATABASE "work" in this example case is exactly that it assumes that all of the files which exist are ones which we put there. Or, to put it another way, if we're only going to checksum files based on some whitelist of files we expect to be there, shouldn't we go back and make things like DROP DATABASE only remove those files that we expect to be there and not random other files that we have no knowledge of..? > > It's unfortunate that those other solutions weren't used and that, > > instead, this extension decided to drop files into PG's tablespace > > directories, but that doesn't mean we should condone or implicitly > > support that action. > > This just seems pointlessly rigid. Our extension related APIs aren't > generally very good or well documented. Most non-trivial extensions that > add interesting things to the postgres ecosystem are going to need a few > not so pretty things to get going. PostGIS is a fantastic example of an extension that is far from trivial, extends PG in ways which are clearly supported, and has worked with PG to improve things in core to be able to then use in the extension. 
> > I stand by my position that this patch should be reverted (with no > > offense or ill-will towards Michael, of course, I certainly appreciate > > his efforts to address the issues with pg_verify_checksums) and that we > > should move more of this code to identify files to verify the checksums > > on into libpgcommon and then use it from both pg_basebackup and > > pg_verify_checksums, to the extent possible, but that we make sure to > > account for all of the files in our tablespace and database directories. > > What does that do, except break things that currently work? Sure, work > on a patch that fixes the architectural concerns, but what's the point > in reverting it until that's done? As you pointed out previously, the current code *doesn't* work, before or after this change, and we clearly need to rework this to move things into libpgcommon and also fix pg_basebackup. Reverting this would at least get us back to having similar code between this and pg_basebackup, and then it'll be cleaner and clearer to have one patch which moves that similar logic into libpgcommon and fixes the missing exceptions for the EXEC_BACKEND case. Keeping the patch doesn't do anything for the pg_basebackup case, and confuses the issue by having these filename-pattern-whitelists which weren't there before and that should be removed, imv. Thanks! Stephen signature.asc Description: PGP signature
Re: pgsql: Add TAP tests for pg_verify_checksums
Hi, On 2018-10-19 17:32:58 -0400, Stephen Frost wrote: > As you pointed out previously, the current code *doesn't* work, before > or after this change, and we clearly need to rework this to move things > into libpgcommon and also fix pg_basebackup. Reverting this would at > least get us back to having similar code between this and pg_basebackup, > and then it'll be cleaner and clearer to have one patch which moves that > similar logic into libpgcommon and fixes the missing exceptions for the > EXEC_BACKEND case. > > Keeping the patch doesn't do anything for the pg_basebackup case, and > confuses the issue by having these filename-pattern-whitelists which > weren't there before and that should be removed, imv. The current state has pg_verify_checksum work on windows for casual usage, which includes the regression tests that previously were broken. Whereas reverting it has the advantage that a fixup diff would be a few lines smaller. If you want to push a revert together with a fix, be my guest, but until that time it seems unhelpful to revert. Greetings, Andres Freund
Re: pgsql: Add TAP tests for pg_verify_checksums
Greetings, * Andres Freund (and...@anarazel.de) wrote: > On 2018-10-19 17:32:58 -0400, Stephen Frost wrote: > > As you pointed out previously, the current code *doesn't* work, before > > or after this change, and we clearly need to rework this to move things > > into libpgcommon and also fix pg_basebackup. Reverting this would at > > least get us back to having similar code between this and pg_basebackup, > > and then it'll be cleaner and clearer to have one patch which moves that > > similar logic into libpgcommon and fixes the missing exceptions for the > > EXEC_BACKEND case. > > > > Keeping the patch doesn't do anything for the pg_basebackup case, and > > confuses the issue by having these filename-pattern-whitelists which > > weren't there before and that should be removed, imv. > > The current state has pg_verify_checksum work on windows for casual > usage, which includes the regression tests that previously were broken. > Whereas reverting it has the advantage that a fixup diff would be a few > lines smaller. If you want to push a revert together with a fix, be my > guest, but until that time it seems unhelpful to revert. Clearly, pg_basebackup *doesn't* work though, so it's at best only half a fix and only because our regression tests don't cover the pg_basebackup case (which is a sad state of affairs, to say the least, but an independent issue). I'm not in any specific rush on this and I hope to hear Michael's thoughts once he's had a chance to review our discussion; it's his commit, after all. Michael's initial patch is in the archives also, so my suggestion would be to revert this one and then take Michael's prior patch and add to it the fix of pg_basebackup, in the same manner, and commit those changes together with additional comments to note that the code exists in both places. We could then work on moving the logic into libpgcommon, in master. Thanks! Stephen signature.asc Description: PGP signature
Re: pgsql: Add TAP tests for pg_verify_checksums
On 10/19/2018 05:32 PM, Stephen Frost wrote: As you pointed out previously, the current code *doesn't* work, before or after this change, and we clearly need to rework this to move things into libpgcommon and also fix pg_basebackup. Reverting this would at least get us back to having similar code between this and pg_basebackup, and then it'll be cleaner and clearer to have one patch which moves that similar logic into libpgcommon and fixes the missing exceptions for the EXEC_BACKEND case. Keeping the patch doesn't do anything for the pg_basebackup case, and confuses the issue by having these filename-pattern-whitelists which weren't there before and that should be removed, imv. I don't think just reverting it is really acceptable. The patch was a response to buildfarm breakage, and moreover was published and discussed before it was applied. If you don't like it I think you need to publish a better solution that will not involve reintroducing the buildfarm error. I don't have a strong opinion about the mechanism. The current conversation does seem to me to be generating more heat than light, TBH. But I do have a strong opinion about not having to enable/disable the TAP test in question constantly. cheers andrew -- Andrew Dunstan  https://www.2ndQuadrant.com  PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: pgsql: Add TAP tests for pg_verify_checksums
Andrew Dunstan writes: > I don't think just reverting it is really acceptable. +several. I do not mind somebody writing and installing a better fix. I do object to turning the buildfarm red again. regards, tom lane
Contribution to postgresql
Hi, I am using PostgreSQL at work, and I would like to contribute. From the TODO list, I chose: Allow log_min_messages to be specified on a per-module basis Is this feature still wanted? That would be my first contribution to PostgreSQL; is it an easy one? Thanks, Saïd
Re: WAL archive (archive_mode = always) ?
On Fri, Oct 19, 2018 at 10:00 AM Adelino Silva <adelino.j.si...@googlemail.com> wrote: > Hi, > > What is the advantage to use archive_mode = always in a slave server > compared to archive_mode = on (shared WAL archive) ? > I only see duplication of Wal files, what is the purpose of this feature ? You might not have a shared WAL archive in the first place. For example, your only access to the master is through the streaming replication protocol, but you want to maintain a local WAL archive anyway so you can PITR locally for testing or debugging purposes. Cheers, Jeff
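A minimal sketch of that setup on the standby side (the archive directory and command here are illustrative assumptions, not a recommendation):

    # postgresql.conf on the standby
    archive_mode = always   # archive WAL received via streaming or restored from archive, too
    archive_command = 'test ! -f /mnt/standby_archive/%f && cp %p /mnt/standby_archive/%f'

With archive_mode = on, a server archives nothing while it is in recovery, so "always" is what lets a standby build its own local archive when the replication connection is its only link to the master.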
vacuum fails with "could not open statistics file" "Device or resource busy"
Hi, buildfarm member lorikeet had an interesting buildfarm failure in the logical decoding test. The failure looks unrelated to logical decoding, but might be more widely relevant: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=lorikeet&dt=2018-10-19%2011%3A22%3A34 VACUUM FULL pg_class; + WARNING: could not open statistics file "pg_stat_tmp/global.stat": Device or resource busy Now that animal is cygwin based, so maybe it's just some random weirdness. But ISTM it could also indicate a bug of some sort. Were it native windows, I'd assume we'd keep a file open with the wrong flags or such... Greetings, Andres Freund
Re: pgsql: Add TAP tests for pg_verify_checksums
On Fri, Oct 19, 2018 at 06:43:18PM -0400, Tom Lane wrote: > Andrew Dunstan writes: >> I don't think just reverting it is really acceptable. > > +several. I do not mind somebody writing and installing a better fix. > I do object to turning the buildfarm red again. I did not expect this thread to turn into a war zone. Anyway, there are a couple of things I agree with on this thread:
- I agree with Andres' point here: https://postgr.es/m/20181019171747.4uithw2sjkt6m...@alap3.anarazel.de A blacklist is fundamentally more difficult to maintain as there are way more things added in a data folder which do not have data checksums than things which have checksums. So using a blacklist approach looks unmaintainable in the long term. Future patches related to enabling online checksum verification make me worry if we keep the code like that. I can also easily imagine that anybody willing to use the pluggable storage API would like to put new files in tablespace-related data folders, relying on "base/" being the default system tablespace.
- I agree with Stephen's point that we should decide if a file has checksums or not in a single place, and that we should use the same logic for base backups and pg_verify_checksums.
- I agree with not doing a simple revert to not turn the buildfarm red again. This is annoying for animal maintainers. Andrew has done very nice work in disabling those tests manually and temporarily.
- The base backup logic deciding if a file has checksums looks broken to me: it misses files generated by EXEC_BACKEND, and any instance of Postgres using an extension with custom files and data checksums has its backups broken. cstore_fdw has been mentioned above, and I recall that Heroku for example enables data checksums. If you combine both, it basically means that such an instance cannot take base backups anymore, while it was able to do so pre-10 with default options. That's not cool.
So what I think we ought to do is the following:
- Start a new thread; this one about TAP tests is not the right place.
- Add in src/common/relpath.c the API from d55241af called isRelFileName(), make use of it in the base backup code, and basically remove is_checksummed_file() and the checksum skip list.
-- Michael signature.asc Description: PGP signature
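For reference, the whitelist function from d55241af has roughly the following shape (sketched here, so details may differ from the committed version; forkname_chars() already lives in src/common/relpath.c, which is what makes that file a natural new home, and string.h is needed for strspn()):

/*
 * Check whether a file name looks like a relation data file, i.e. one of
 * <digits>, <digits>_<forkname>, <digits>.<segment> or
 * <digits>_<forkname>.<segment>.  Only such files carry data checksums.
 */
static bool
isRelFileName(const char *fn)
{
	int			pos;

	/* a relation file name has to start with the relfilenode digits */
	pos = strspn(fn, "0123456789");
	if (pos == 0)
		return false;

	if (fn[pos] == '_')
	{
		int			forkchar;

		/* forkname_chars() returns 0 if this is no known fork name */
		forkchar = forkname_chars(&fn[pos + 1], NULL);
		if (forkchar <= 0)
			return false;
		pos += forkchar + 1;
	}

	if (fn[pos] == '.')
	{
		int			segchar;

		/* segment numbers are plain digits as well */
		segchar = strspn(&fn[pos + 1], "0123456789");
		if (segchar <= 0)
			return false;
		pos += segchar + 1;
	}

	/* anything trailing after that disqualifies the file */
	return fn[pos] == '\0';
}

Everything not matching these patterns would simply be skipped, so both pg_basebackup and pg_verify_checksums could share the one function instead of each maintaining its own exclusion list.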
Re: pgsql: Add TAP tests for pg_verify_checksums
Hi, On 2018-10-20 12:39:55 +0900, Michael Paquier wrote: > - I agree with Stephen's point that we should decide if a file has > checksums or not in a single place, and that we should use the same > logic for base backups and pg_verify_checksums. To be clear, I wholeheartedly agree on that. We probably only want to tackle that for "is checksummable file" in a backpatchable fashion, but we probably should move to having more common infrastructure like that. > So what I think we ought to do is the following: > - Start a new thread, this one about TAP tests is not adapted. > - Add in src/common/relpath.c the API from d55241af called > isRelFileName(), make use of it in the base backup code, and basically > remove is_checksummed_file() and the checksum skip list. I think it probably shouldn't quite be that as an API. The code should not just check whether the file matches a pattern, but also importantly needs to exclude files that are in a temp tablespace. isRelFileName() doesn't quite describe an API meaning that. I assume we should keep something like isRelFileName() but use an API on top of it that also excludes temp files / relations. Greetings, Andres Freund
Re: Function to promote standby servers
I wrote: > Fixed. Here is another version, with a fix in pg_proc.dat, an improved comment and "wait_seconds" exercised in the regression test. Yours, Laurenz Albe From a2a7f9fd1b23ad102d11992b22158dab8b5451d5 Mon Sep 17 00:00:00 2001 From: Laurenz Albe Date: Sat, 20 Oct 2018 06:21:00 +0200 Subject: [PATCH] Add pg_promote() to promote standby servers --- doc/src/sgml/func.sgml | 21 ++ doc/src/sgml/high-availability.sgml| 2 +- doc/src/sgml/recovery-config.sgml | 3 +- src/backend/access/transam/xlog.c | 6 -- src/backend/access/transam/xlogfuncs.c | 83 ++ src/backend/catalog/system_views.sql | 8 +++ src/backend/postmaster/pgstat.c| 3 + src/include/access/xlog.h | 6 ++ src/include/catalog/pg_proc.dat| 4 ++ src/include/pgstat.h | 3 +- src/test/recovery/t/004_timeline_switch.pl | 6 +- src/test/recovery/t/009_twophase.pl| 6 +- 12 files changed, 139 insertions(+), 12 deletions(-) diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index 5193df3366..88121cdc66 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -18731,6 +18731,9 @@ SELECT set_config('log_statement_stats', 'off', false); pg_terminate_backend + +pg_promote + signal @@ -18790,6 +18793,16 @@ SELECT set_config('log_statement_stats', 'off', false); however only superusers can terminate superuser backends. + + +pg_promote(wait boolean DEFAULT true, wait_seconds integer DEFAULT 60) + + boolean + Promote a physical standby server. This function is restricted to +superusers by default, but other users can be granted EXECUTE to run +the function. + + @@ -18827,6 +18840,14 @@ SELECT set_config('log_statement_stats', 'off', false); subprocess. + +pg_promote can only be called on standby servers. +If the argument wait is true, +the function waits until promotion is complete or wait_seconds +seconds have passed, otherwise the function returns immediately after sending +the promotion signal to the postmaster. + + diff --git a/doc/src/sgml/high-availability.sgml b/doc/src/sgml/high-availability.sgml index ebcb3daaed..f8e036965c 100644 --- a/doc/src/sgml/high-availability.sgml +++ b/doc/src/sgml/high-availability.sgml @@ -1472,7 +1472,7 @@ synchronous_standby_names = 'ANY 2 (s1, s2, s3)' To trigger failover of a log-shipping standby server, -run pg_ctl promote or create a trigger +run pg_ctl promote, call pg_promote(), or create a trigger file with the file name and path specified by the trigger_file setting in recovery.conf. If you're planning to use pg_ctl promote to fail over, trigger_file is diff --git a/doc/src/sgml/recovery-config.sgml b/doc/src/sgml/recovery-config.sgml index 92825fdf19..d06cd0b08e 100644 --- a/doc/src/sgml/recovery-config.sgml +++ b/doc/src/sgml/recovery-config.sgml @@ -439,7 +439,8 @@ restore_command = 'copy "C:\\server\\archivedir\\%f" "%p"' # Windows Specifies a trigger file whose presence ends recovery in the standby. Even if this value is not set, you can still promote - the standby using pg_ctl promote. + the standby using pg_ctl promote or calling + pg_promote(). This setting has no effect if standby_mode is off. 
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index 7375a78ffc..62fc418893 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -78,12 +78,6 @@ extern uint32 bootstrap_data_checksum_version; -/* File path names (all relative to $PGDATA) */ -#define RECOVERY_COMMAND_FILE "recovery.conf" -#define RECOVERY_COMMAND_DONE "recovery.done" -#define PROMOTE_SIGNAL_FILE "promote" -#define FALLBACK_PROMOTE_SIGNAL_FILE "fallback_promote" - /* User-settable parameters */ int max_wal_size_mb = 1024; /* 1 GB */ diff --git a/src/backend/access/transam/xlogfuncs.c b/src/backend/access/transam/xlogfuncs.c index 9731742978..e97d1b63bc 100644 --- a/src/backend/access/transam/xlogfuncs.c +++ b/src/backend/access/transam/xlogfuncs.c @@ -23,6 +23,7 @@ #include "catalog/pg_type.h" #include "funcapi.h" #include "miscadmin.h" +#include "pgstat.h" #include "replication/walreceiver.h" #include "storage/smgr.h" #include "utils/builtins.h" @@ -35,6 +36,7 @@ #include "storage/fd.h" #include "storage/ipc.h" +#include /* * Store label file and tablespace map during non-exclusive backups. @@ -697,3 +699,84 @@ pg_backup_start_time(PG_FUNCTION_ARGS) PG_RETURN_DATUM(xtime); } + +/* + * Promote a standby server. + * + * A result of "true" means that promotion has been completed + * (or initiated if "wait" is false). + */ +Datum +pg_prom
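For reference, given the signature documented in the patch (wait boolean DEFAULT true, wait_seconds integer DEFAULT 60), usage on a standby would look like this sketch:

    -- promote and wait up to the default 60 seconds
    SELECT pg_promote();

    -- send the promotion signal and return immediately
    SELECT pg_promote(wait => false);

    -- wait up to 120 seconds before reporting failure
    SELECT pg_promote(true, 120);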
Re: pgsql: Add TAP tests for pg_verify_checksums
Greetings, On Fri, Oct 19, 2018 at 23:40 Michael Paquier wrote: > On Fri, Oct 19, 2018 at 06:43:18PM -0400, Tom Lane wrote: > > Andrew Dunstan writes: > >> I don't think just reverting it is really acceptable. > > > > +several. I do not mind somebody writing and installing a better fix. > > I do object to turning the buildfarm red again. > > I did not expect this thread to turn into a war zone. Anyway, there are > a couple of things I agree with on this thread: > - I agree with Andres point here: > https://postgr.es/m/20181019171747.4uithw2sjkt6m...@alap3.anarazel.de > A blacklist is fundamentally more difficult to maintain as there are > way more things added in a data folder which do not have data checksums > than things which have checksums. So using a blacklist approach looks > unmaintainable in the long term. Future patches related to enabling > online checksum verification make me worry if we keep the code like > that. I can also easily imagine that anybody willing to use the > pluggable storage API would like to put new files in tablespace-related > data folders, relying on "base/" being the default system tablespace > If we are going to decide that we only deal with files in our directories matching these whitelisted patterns, then shouldn’t we apply similar logic in things like DROP DATABASE and any other cases where we perform actions in a recursive manner across our database and tablespace directories? Should we really be removing arbitrary files that we know nothing about, after all? What about pg_basebackup? Shall we update it to only stream through files matching these patterns, as those are the only files we consider ourselves to be aware of? I don’t buy off on any argument that presents pluggable storage as not including some way for us to track and be aware of what files are associated with that pluggable storage mechanism, which of those files have checksums, and how to verify them if they do. In other words, I sure hope we don’t accept an approach like cstore *fdw* uses for pluggable storage, where the core system has no idea whatsoever about what these random files dropped into our tablespace directories are. - I agree with Stephen's point that we should decide if a file has > checksums or not in a single place, and that we should use the same > logic for base backups and pg_verify_checksums. Despite it taking up a lot of the discussion, I don’t think there was ever disagreement on this point. - I agree with not doing a simple revert to not turn the buildfarm red > again. This is annoying for animal maintainers. Andrew has done a very > nice work in disabling manually those tests temporarily. This is a red herring, and always was, so I’m rather unimpressed at how it keeps coming up: no, I’m not advocating that we should just make the build farm red and just leave it that way. Yes, we should fix this case, and fix pg_basebackup, and maybe even try to add some regression tests which test this exact same case in pg_basebackup, but making the build farm green is *not* the only thing we should care about. - The base backup logic deciding if a file has checksums looks broken to > me: it misses files generated by EXEC_BACKEND, and any instance of > Postgres using an extension with custom files and data checksums has its > backups broken. cstore_fdw has been mentioned above, and I recall that > Heroku for example enables data checksums.
If you combine both, it > basically means that such an instance cannot take base backups anymore > while it was able to do so with pre-10 with default options. That's not > cool. This is incorrect, pg_basebackup will still back up and keep files which fail checksum checks- logic which David Steele and I pushed for when the checksumming logic was added to pg_basebackup, as I recall, but it’ll complain and warn about these files ... As it *should*, even if we weren’t doing checksum checks, because these are random files that have been dropped into a PG directory that PG doesn’t know anything about, which aren’t handled through WAL and therefore there’s no way to know if they’ll be at all valid when they’re copied, or that backing them up is at all useful. Backups are absolutely important and we wouldn’t want a backup to be aborted or failed due to checksum failures, in general, but the user should be made aware of these failures. This approach of only taking responsibility for the files we know of through these patterns could also imply that we shouldn’t back up in pg_basebackup files that don’t match- but that strikes me as another similarly risky approach as any mistake in this whitelist of known files could then result in an inconsistent backup that’s missing things that should have been included. As for which approach is easier to maintain, I don’t see one as being meaningfully much easier to maintain than the other, code wise, while having these patterns looks to me as having a lot more risk, with the only rather nebulous gain be
Re: pgsql: Add TAP tests for pg_verify_checksums
On Fri, Oct 19, 2018 at 08:50:04PM -0700, Andres Freund wrote: > On 2018-10-20 12:39:55 +0900, Michael Paquier wrote: >> So what I think we ought to do is the following: >> - Start a new thread, this one about TAP tests is not adapted. >> - Add in src/common/relpath.c the API from d55241af called >> isRelFileName(), make use of it in the base backup code, and basically >> remove is_checksummed_file() and the checksum skip list. > > I think it probably shouldn't quite be that as an API. The code should > not just check whether the file matches a pattern, but also importantly > needs to exclude files that are in a temp tablespace. isRelFileName() > doesn't quite describe an API meaning that. I assume we should keep > something like isRelFileName() but use an API ontop of that that also > exclude temp files / relations.

From what I can see, we would need to check for a couple of patterns if we go to this extent:
- Look for PG_TEMP_FILE_PREFIX and exclude those.
- looks_like_temp_rel_name(), which checks for temp file names. This is similar to isRelFileName except that it works on temporary files. Moving it to relpath.c and renaming it IsTempRelFileName is an idea. But this one would not be necessary, as isRelFileName discards temporary relations, no?
- parse_filename_for_nontemp_relation() is also too similar to isRelFileName().

In the end, do we really need to do anything more than adding some checks on PG_TEMP_FILE_PREFIX? I am not sure that there is much need for a global API like isChecksummedFile for only those two places. I already have a patch doing the work of moving isRelFileName() into src/common/. Adding one check on PG_TEMP_FILE_PREFIX is not much work on top of it. -- Michael
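To illustrate the layering being discussed, here is a sketch with invented names; PG_TEMP_FILE_PREFIX ("pgsql_tmp") comes from storage/fd.h, and temp relation files are of the form "t<backendID>_<filenode>":

    #include <ctype.h>
    #include <stdbool.h>
    #include <string.h>

    #define PG_TEMP_FILE_PREFIX "pgsql_tmp"     /* as in storage/fd.h */

    extern bool isRelFileName(const char *fn);  /* the pure pattern check */

    /* Crude stand-in for looks_like_temp_rel_name(): "t<digits>_..." */
    static bool
    is_temp_rel_name(const char *fn)
    {
        return fn[0] == 't' && isdigit((unsigned char) fn[1]);
    }

    /* Hypothetical wrapper: pattern check plus temp-file exclusions */
    static bool
    is_checksummed_file_name(const char *fn)
    {
        if (strncmp(fn, PG_TEMP_FILE_PREFIX,
                    strlen(PG_TEMP_FILE_PREFIX)) == 0)
            return false;       /* temporary file */
        if (is_temp_rel_name(fn))
            return false;       /* temporary relation */
        return isRelFileName(fn);
    }

The design question in the thread is exactly where this split should live: the pure pattern check in src/common/, and the temp exclusions either layered on top or left to the callers.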
Re: pgsql: Add TAP tests for pg_verify_checksums
Hi, On 2018-10-20 13:30:46 +0900, Michael Paquier wrote: > On Fri, Oct 19, 2018 at 08:50:04PM -0700, Andres Freund wrote: > > On 2018-10-20 12:39:55 +0900, Michael Paquier wrote: > >> So what I think we ought to do is the following: > >> - Start a new thread, this one about TAP tests is not adapted. > >> - Add in src/common/relpath.c the API from d55241af called > >> isRelFileName(), make use of it in the base backup code, and basically > >> remove is_checksummed_file() and the checksum skip list. > > > > I think it probably shouldn't quite be that as an API. The code should > > not just check whether the file matches a pattern, but also importantly > > needs to exclude files that are in a temp tablespace. isRelFileName() > > doesn't quite describe an API meaning that. I assume we should keep > > something like isRelFileName() but use an API ontop of that that also > > exclude temp files / relations. > > From what I can see we would need to check for a couple of patterns if > we go to this extent: > - Look for PG_TEMP_FILE_PREFIX and exclude those. > - looks_like_temp_rel_name() which checks for temp files names. This is > similar to isRelFileName except that it works on temporary files. > Moving it to relpath.c and renaming it IsTempRelFileName is an idea. > But this one would not be necessary as isRelFileName discards temporary > relations, no? I think it's not good to have those necessarily intermingled in an exposed function. > At the end, do we really need to do anything more than adding some > checks on PG_TEMP_FILE_PREFIX? I think we also need the exclusion list basebackup.c has that generally skips files. They might be excluded anyway, but I think it'd be safer to make sure. > I am not sure that there is much need for a global API like > isChecksummedFile for only those two places. Seems likely that other tools would want to have access too. Greetings, Andres Freund
Re: pgsql: Add TAP tests for pg_verify_checksums
Greetings, On Sat, Oct 20, 2018 at 00:31 Michael Paquier wrote: > On Fri, Oct 19, 2018 at 08:50:04PM -0700, Andres Freund wrote: > > On 2018-10-20 12:39:55 +0900, Michael Paquier wrote: > >> So what I think we ought to do is the following: > >> - Start a new thread, this one about TAP tests is not adapted. > >> - Add in src/common/relpath.c the API from d55241af called > >> isRelFileName(), make use of it in the base backup code, and basically > >> remove is_checksummed_file() and the checksum skip list. > > > > I think it probably shouldn't quite be that as an API. The code should > > not just check whether the file matches a pattern, but also importantly > > needs to exclude files that are in a temp tablespace. isRelFileName() > > doesn't quite describe an API meaning that. I assume we should keep > > something like isRelFileName() but use an API ontop of that that also > > exclude temp files / relations. > > From what I can see we would need to check for a couple of patterns if > we go to this extent: > - Look for PG_TEMP_FILE_PREFIX and exclude those. > - looks_like_temp_rel_name() which checks for temp files names. This is > similar to isRelFileName except that it works on temporary files. > Moving it to relpath.c and renaming it IsTempRelFileName is an idea. > But this one would not be necessary as isRelFileName discards temporary > relations, no? > - parse_filename_for_nontemp_relation() is also too similar to > isRelFileName(). > > At the end, do we really need to do anything more than adding some > checks on PG_TEMP_FILE_PREFIX? I am not sure that there is much need > for a global API like isChecksummedFile for only those two places. I > have already a patch doing the work of moving isRelFileName() into > src/common/. Adding one check on PG_TEMP_FILE_PREFIX is not much work > on top of it. I’m not at my computer at the moment, so I may not be entirely following the question here, but to be clear: whatever we do here will have a downstream impact on other tools, such as pgbackrest, and therefore we definitely want to have the code in libpgcommon so that these external tools can leverage it and know that they’re doing what PG does. I’d also like to give David Steele a chance to comment on the specific API, and any other backup tool authors, which I don’t think we should be rushing into anyway, and I would think we’d only put it into master... Thanks! Stephen
Re: pgsql: Add TAP tests for pg_verify_checksums
On Sat, Oct 20, 2018 at 12:25:19AM -0400, Stephen Frost wrote: > On Fri, Oct 19, 2018 at 23:40 Michael Paquier wrote: >> - I agree with not doing a simple revert to not turn the buildfarm red >> again. This is annoying for animal maintainers. Andrew has done a very >> nice work in disabling manually those tests temporarily. > > This is a red herring, and always was, so I’m rather unimpressed at how it > keeps coming up- no, I’m not advocating that we should just make the build > farm red and just leave it that way. Yes, we should fix this case, and fix > pg_basebackup, and maybe even try to add some regression tests which test > this exact same case in pg_basebackup, but making the build farm green is > *not* the only thing we should care about. Well, the root of the problem was that pg_verify_checksums was committed without any tests of its own. If we had those tests from the start, then we would not be having this discussion post-release, still trying to figure out whether whitelisting or blacklisting is appropriate. The validation of checksums in base backups was added in 4eb77d50, a couple of days before pg_verify_checksums was introduced. This added corruption-related tests in src/bin/pg_basebackup, which is a good thing. However, the feature was designed so that checksum mismatches are ignored after 5 failures, which actually *masked* the fact that EXEC_BACKEND files like CONFIG_EXEC_PARAMS should have been skipped instead of getting checksum failures. And that's a bad thing. So, in my opinion, this gives a good argument for using a whitelist. -- Michael
Re: Function to promote standby servers
On Sat, Oct 20, 2018 at 06:24:28AM +0200, Laurenz Albe wrote: > Here is another version, with a fix in pg_proc.dat, an improved comment > and "wait_seconds" exercised in the regression test. Thanks for the new version. This looks pretty good to me. I'll see if I can give it one more review and then commit it. > - WAIT_EVENT_SYNC_REP > + WAIT_EVENT_SYNC_REP, > + WAIT_EVENT_PROMOTE > } WaitEventIPC; Those are kept in alphabetical order. Individual wait events are also documented with a description. -- Michael
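Concretely, the review asks for a placement like the following at the tail of the enum in src/include/pgstat.h (neighboring entries reproduced from memory and possibly inexact), plus a matching row describing the wait event in the documentation's wait-event table:

    typedef enum WaitEventIPC
    {
        /* ... earlier entries elided ... */
        WAIT_EVENT_PROCARRAY_GROUP_UPDATE,
        WAIT_EVENT_PROMOTE,             /* new entry, alphabetical slot */
        WAIT_EVENT_REPLICATION_ORIGIN_DROP,
        WAIT_EVENT_REPLICATION_SLOT_DROP,
        WAIT_EVENT_SAFE_SNAPSHOT,
        WAIT_EVENT_SYNC_REP
    } WaitEventIPC;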
Re: Making all nbtree entries unique by having heap TIDs participate in comparisons
On Thu, Oct 18, 2018 at 1:44 PM Andres Freund wrote: > I wonder if it'd make sense to hack up a patch that logs when evicting a > buffer while already holding another lwlock. That shouldn't be too hard. I tried this. It looks like we're calling FlushBuffer() with more than a single LWLock held (not just the single buffer lock) somewhat *less* with the patch. This is a positive sign for the patch, but also means that I'm no closer to figuring out what's going on. I tested a case with a 1GB shared_buffers + a TPC-C database sized at about 10GB. I didn't want the extra LOG instrumentation to influence the outcome. -- Peter Geoghegan
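For anyone curious, the instrumentation can be as small as the following sketch. LWLockHeldCount() is a hypothetical accessor for lwlock.c's backend-local num_held_lwlocks counter, which core does not actually export; this is a one-off debugging hack, not the exact patch used here:

    extern int LWLockHeldCount(void);   /* hypothetical accessor */

    /* ... near the top of FlushBuffer() in bufmgr.c ... */
    if (LWLockHeldCount() > 1)          /* more than the buffer's own
                                         * content lock is held */
        elog(LOG, "flushing buffer %d with %d lwlocks held",
             buf->buf_id, LWLockHeldCount());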
Re: pgsql: Add TAP tests for pg_verify_checksums
On Sat, Oct 20, 2018 at 12:41:04AM -0400, Stephen Frost wrote: > I’d also like to give David Steele a chance to comment on the specific API, > and any other backup tools authors, which I don’t think we should be > rushing into anyway and I would think we’d only put into master.. Getting David's input would be nice! -- Michael
Re: pgsql: Add TAP tests for pg_verify_checksums
Greetings, On Sat, Oct 20, 2018 at 00:43 Michael Paquier wrote: > On Sat, Oct 20, 2018 at 12:25:19AM -0400, Stephen Frost wrote: > > On Fri, Oct 19, 2018 at 23:40 Michael Paquier > wrote: > >> - I agree with not doing a simple revert to not turn the buildfarm red > >> again. This is annoying for animal maintainers. Andrew has done a very > >> nice work in disabling manually those tests temporarily. > > > > This is a red herring, and always was, so I’m rather unimpressed at how > it > > keeps coming up- no, I’m not advocating that we should just make the > build > > farm red and just leave it that way. Yes, we should fix this case, and > fix > > pg_basebackup, and maybe even try to add some regression tests which test > > this exact same case in pg_basebackup, but making the build farm green is > > *not* the only thing we should care about. > > Well, the root of the problem was that pg_verify_checksums has been > committed without any tests on its own. If we had those tests from the > start, then we would not be having this discussion post-release time, > still trying to figure out if whitelisting or blacklisting is > appropriate. > > The validation checksums in base backups has been added in 4eb77d50, a > couple of days before pg_verify_checksums has been introduced. This has > added corruption-related tests in src/bin/pg_basebackup, which is a good > thing. However the feature has been designed so as checksum mismatches > are ignored after 5 failures, which actually *masked* the fact that > EXEC_BACKEND files like CONFIG_EXEC_PARAMS should have been skipped > instead of getting checksum failures. And that's a bad thing. So this > gives in my opinion a good point about using a whitelist. That counter of checksum failures should have been getting reset between files; that’s certainly what I had understood was intended. The idea of the counter is to not flood the log with errors when a file is discovered that’s full of checksum failures (as can happen if large chunks of the file got replaced with something else, for example), but it should be reporting on each file that does have failures in it. I don’t see how having a whitelist changes that, or would have affected whether that logic was correct initially. I’m also trying to understand how this checksum logging limit is getting hit in the tests while they still pass. If this was an intended failure check, then surely there’s a test that’s intended to be successful, and it should have complained about this file too; or perhaps that’s what’s missing. I’m happy to look into this later this weekend. It certainly seems like something here isn’t really making sense, though. Thanks! Stephen
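To spell out the behaviour being described, a minimal sketch; the names and the cap of 5 follow the discussion, not the shipped code:

    #include <stdio.h>

    #define CHECKSUM_REPORT_LIMIT 5     /* per-file cap from the thread */

    /* Caller zeroes *failures when it opens each file, so the cap
     * applies per file instead of silently swallowing failures in
     * every file after the first bad one. */
    static void
    report_checksum_failure(const char *path, int *failures)
    {
        ++*failures;
        if (*failures <= CHECKSUM_REPORT_LIMIT)
            fprintf(stderr, "checksum mismatch in file \"%s\"\n", path);
        else if (*failures == CHECKSUM_REPORT_LIMIT + 1)
            fprintf(stderr, "suppressing further reports for \"%s\"\n",
                    path);
    }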
Re: pgsql: Add TAP tests for pg_verify_checksums
On Sat, Oct 20, 2018 at 12:41:04AM -0400, Stephen Frost wrote: > I’d also like to give David Steele a chance to comment on the specific API, > and any other backup tools authors, which I don’t think we should be > rushing into anyway and I would think we’d only put into master.. By the way, we need to do something for the checksum verification code in base backups for v11 as well. If you enable checksums and take a base backup of a build with EXEC_BACKEND, then this creates spurious checksum failures. That's a bug. So while I agree that having a larger, robust API is fine for HEAD, I would most likely not back-patch it. This is why I would suggest, as a first step for HEAD and v11, to use a whitelist for base backups, to check for temporary tablespaces in pg_verify_checksums, to move isRelFileName into src/common/, and to keep the change minimal. -- Michael
Re: pgsql: Add TAP tests for pg_verify_checksums
Greetings, On Sat, Oct 20, 2018 at 00:58 Michael Paquier wrote: > On Sat, Oct 20, 2018 at 12:41:04AM -0400, Stephen Frost wrote: > > I’d also like to give David Steele a chance to comment on the specific > API, > > and any other backup tools authors, which I don’t think we should be > > rushing into anyway and I would think we’d only put into master.. > > By the way, we need to do something for the checksum verification code > in base backups for v11 as well. If you enable checksums and take a > base backup of a build with EXEC_BACKEND, then this creates spurious > checksums failures. That's a bug. So while I agree that having a > larger robust API is fine for HEAD, I would most likely not back-patch > it. This is why I would suggest as a first step for HEAD and v11 to use > a whitelist for base backups, to check for temporary tablespaces in > pg_verify_checksums, to move isRelFileName into src/common/ and to keep > the change minimalistic. I’m all for keeping back-patched changes minimal, and updating the blacklist, as your original patch on this thread did, would certainly be that. Even adding in the logic to skip temp files, as pg_basebackup has, would be simpler and based on existing, well-tested, extensively used code, unlike this new pattern-based whitelist-of-files approach. I have to say that I can’t recall hearing much in the way of complaints about pg_basebackup copying all the random cstore files, or the new checksum validation logic complaining about them, and such when doing backups, and I wonder if that is because people simply don’t use the two together much, making me wonder how much of an issue this really is or would be with the account-for-everything approach I’ve been advocating for. Thanks! Stephen
Re: pgsql: Add TAP tests for pg_verify_checksums
Hi, On 2018-10-20 01:07:43 -0400, Stephen Frost wrote: > I have to say that I can’t recall hearing much in the way of complaints > about pg_basebackup copying all the random cstore files Why would somebody complain about that? It's usually desirable. > or the new checksum validation logic complaining about them, and such > when doing backups and I wonder if that is because people simply don’t > use the two together much, making me wonder how much of an issue this > really is or would be with the account-for-everything approach I’ve > been advocating for. I mean, obviously pg_verify_checksum simply hasn't actually been tested much with plain postgres without extensions, given all the weaknesses identified in this thread. Greetings, Andres Freund
Re: pgsql: Add TAP tests for pg_verify_checksums
Hi, On 2018-10-20 00:25:19 -0400, Stephen Frost wrote: > If we are going to decide that we only deal with files in our directories > matching these whitelisted patterns, then shouldn’t we apply similar logic > in things like DROP DATABASE and any other cases where we perform actions > in a recursive manner across our database and table space directories? I'm honestly not sure if you're just trolling at this point. Why would any reasonable person be concerned about DROP DATABASE dropping the directory for the database? You're not honestly suggesting that anybody would write an extension or anything like that that stores data in the wrong database's directory, right? Other recursive operations like fsyncing files, copying the entire template database directory, etc. are similarly harmless or positive. Greetings, Andres Freund
Re: Multi-insert into a partitioned table with before insert row trigger causes server crash on latest HEAD
(Returns from leave and beyond the reaches of the internet) On 18 October 2018 at 07:45, Peter Eisentraut wrote: > On 16/10/2018 06:33, Ashutosh Sharma wrote: >> I think, the root cause of this problem is that CopyFrom() is using >> the stale value of *has_before_insert_row_trig* to determine if the >> current partition is okay for multi-insert or not i.e. >> has_before_insert_row_trig used to determine multi-insert condition >> for the current partition actually belongs to old partition. I think, >> *has_before_insert_row_trig* needs to updated before CopyFrom() >> evaluates if the current partition is good to go for multi insert or >> not. Attached is the patch based on this. I've also added the relevant >> test-case for it. Peter, David, Could you please have a look into the >> attached patch and share your thoughts. Thank you. > > I have committed your fix and test, moving some code around a bit. Thanks. Thanks for pushing that fix. Originally, my patch in [1] could only set leafpart_use_multi_insert to true within the `if (insertMethod == CIM_MULTI_CONDITIONAL)` test, so it wouldn't have suffered from this problem. I'm not sure that doubling up the `insertMethod == CIM_MULTI_CONDITIONAL` test is the cleanest fix. Personally, I liked the way it was in the v6 edition of the patch, but I'm used to getting outvoted. [1] https://www.postgresql.org/message-id/CAKJS1f9f8yuj04X_rffNu2JPbvhy+YP_aVH6iwCTJ1OL=yw...@mail.gmail.com -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
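For reference, the shape of the fix as described in this thread is roughly the following; this is a paraphrase of the relevant part of CopyFrom() in src/backend/commands/copy.c, with the surrounding partition-switching code elided:

    /* Recompute trigger info for the partition we just switched to ... */
    has_before_insert_row_trig = (resultRelInfo->ri_TrigDesc &&
                                  resultRelInfo->ri_TrigDesc->trig_insert_before_row);

    /* ... and only then decide whether this leaf allows multi-inserts;
     * using the previous partition's stale value here was the bug */
    if (insertMethod == CIM_MULTI_CONDITIONAL)
        leafpart_use_multi_insert = !has_before_insert_row_trig &&
            resultRelInfo->ri_FdwRoutine == NULL;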
Re: pgsql: Add TAP tests for pg_verify_checksums
Greetings, On Sat, Oct 20, 2018 at 01:16 Andres Freund wrote: > Hi, > > On 2018-10-20 00:25:19 -0400, Stephen Frost wrote: > > If we are going to decide that we only deal with files in our directories > > matching these whitelisted patterns, then shouldn’t we apply similar > logic > > in things like DROP DATABASE and any other cases where we perform actions > > in a recursive manner across our database and table space directories? > > I'm honestly not sure if you're just trolling at this point. Why would > anybody reasonable be concerned about DROP DATABASE dropping the > directory for the database? You're not honestly suggesting that anybody > would write an extension or anything like that that stores data in the > wrong database's directory, right? Other iterations like fsyncing > files, copying the entire template database directory, etc are similarly > harmless or positive. No, I’m not trolling. What I was trying to do is make the point that this is moving us away from having a very clear idea of what is PG’s responsibility and what we feel comfortable operating on, and toward this new half-and-half stance where we’ll happily nuke files in a directory we didn’t create, and back them up even though we have no idea whether they’ll be consistent when restored, but we won’t try to check their checksums or perform some other set of operations on them. I suspect it’s pretty clear already, but just to make it plain: I really don’t like the half-and-half approach, and it seems we’re being backed into this because it happens to fit some specific cases, not because there was any real thought or design put into supporting these use cases or being able to extend PG in this way. I do also see real risks with a whitelisting kind of approach, namely that we end up missing things, and I get that you don’t see that risk, but just stating that doesn’t change my opinion on it. Based on what you said upthread, this whole thing also seems like it may just be going away anyway, because there’s real design being done to do this properly and allow PG to be extended in a way where we will know what files are associated with what extensions or storage mechanisms, and that seems like just another reason why we shouldn’t be moving to explicitly support random files being dropped into PG directories. Thanks, Stephen
Re: pgsql: Add TAP tests for pg_verify_checksums
Greetings, On Sat, Oct 20, 2018 at 01:11 Andres Freund wrote: > Hi, > > On 2018-10-20 01:07:43 -0400, Stephen Frost wrote: > > I have to say that I can’t recall hearing much in the way of complaints > > about pg_basebackup copying all the random cstore files > > Why would somebody complain about that? It's usually desirable. Even though they’re as likely as not to be invalid or corrupted? Maybe things have moved forward here, I know there’s been discussion about it, but last I heard those files weren’t WAL’d and therefore the result of copying them from a running server was indeterminate. Yes, sometimes they’ll be fine, but you could say the same about regular PG relations too, and yet we certainly wouldn’t be accepting of that. It certainly seems reasonable that people would complain about pg_basebackup misperforming when a backup it took results in an invalid restore, though it tends to be a lot rarer to get complaints about partial failures like a corrupt or partial file being copied during a backup, but then that’s part of why we stress so much about trying to make sure we don’t do that, as it can be hard to detect. People certainly did complain about unlogged tables being backed up, and that was just because they took up space in the backup and time on the backup and restore, only to be nuked when the server is started. > or the new checksum validation logic complaining about them, and such > > when doing backups and I wonder if that is because people simply don’t > > use the two together much, making me wonder how much of an issue this > > really is or would be with the account-for-everything approach I’ve > > been advocating for. > > I mean obviously pg_verify_checksum simply hasn't been actually tested > much with plain postgres without extensions, given the all weaknesses > identified in this thread. No, it hasn’t, but pg_basebackup has been around quite a while and has always copied everything, as best as I can recall anyway. Thanks, Stephen
Re: pgsql: Add TAP tests for pg_verify_checksums
On Sat, Oct 20, 2018 at 02:03:32AM -0400, Stephen Frost wrote: > On Sat, Oct 20, 2018 at 01:11 Andres Freund wrote: >> or the new checksum validation logic complaining about them, and such >>> when doing backups and I wonder if that is because people simply don’t >>> use the two together much, making me wonder how much of an issue this >>> really is or would be with the account-for-everything approach I’ve >>> been advocating for. >> >> I mean obviously pg_verify_checksum simply hasn't been actually tested >> much with plain postgres without extensions, given the all weaknesses >> identified in this thread. > > No, it hasn’t, but pg_basebackup has been around quite a while and has > always copied everything, as best as I can recall anyway. At this point, let's create a new thread with a description of what has been discussed and what we'd like to do for HEAD and v11. I have something in mind which would result in a minimal patch. Let's start from that. -- Michael