Re: Remove Deprecated Exclusive Backup Mode
On 13/12/2018 20:17, David Steele wrote: > On 12/13/18 2:02 PM, Peter Eisentraut wrote: >> On 12/12/2018 05:09, David Steele wrote: >>> I think the patch stands on its own. Exclusive backup mode is broken >>> and is know to cause problems in the field. >> >> Is this breakage documented anywhere for users? > > Yes: > > https://www.postgresql.org/docs/11/continuous-archiving.html > > "Note that if the server crashes during the backup it may not be > possible to restart until the backup_label file has been manually > deleted from the PGDATA directory." > > Seems like the wording could be stronger. I think that is a pretty liberal interpretation of the word "broken". All this says is "if you do A, and B happens, then you need to do C". Clearly, not having to do that at all is better, but if this is all there is to it, then I'm confused by the characterizations of how awful and terrible this feature is and how we must rush to remove it. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: ALTER INDEX ... ALTER COLUMN not present in dump
The following review has been posted through the commitfest application:
make installcheck-world: tested, passed
Implements feature:      tested, passed
Spec compliant:          not tested
Documentation:           not tested

dump-alter-index-stats-v2.patch looks pretty much reasonable to me, passing it on to a committer.

The new status of this patch is: Ready for Committer
Re: Upgrading pg_statistic to handle collation honestly
On 12/12/2018 16:57, Tom Lane wrote: > diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c > index b8445dc..dcbd04c 100644 > *** a/src/backend/commands/analyze.c > --- b/src/backend/commands/analyze.c > *** examine_attribute(Relation onerel, int a > *** 904,914 > --- 904,917 > { > stats->attrtypid = exprType(index_expr); > stats->attrtypmod = exprTypmod(index_expr); > + stats->attrcollid = exprCollation(index_expr); > + /* XXX should we consult indcollation instead? */ After looking through this again, I think the answer here is "yes". If the index definition overrides the collation, then we clearly want to use that. If it's not overridden, then indcollation is still set, so it's just as easy to use it. > } > else > { > stats->attrtypid = attr->atttypid; > stats->attrtypmod = attr->atttypmod; > + stats->attrcollid = attr->attcollation; > } > > typtuple = SearchSysCacheCopy1(TYPEOID, -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Where to save data used by extension ?
Hao Wu wrote: > I have an extension for PostgreSQL. The extension works across some > databases, and needs to save some data. > The data might be modified dynamically by the extension, and all the changed > results must be saved. I have considered some methods. > 1. Use GUC > I find no easy way to write the GUC values back to postgresql.conf. And the > data might be a bit complex. > 2. Create a table under a special database, like postgres > The weakness is the strong assumption that the database postgres exists and > will not be dropped. > 3. Create a table under a database created by the extension > A little heavy, but without dependencies. > 4. Write to a native file. > The file cannot be synced to a standby. > > Currently, I use #2. Is there any better way to do this? Only 2 and 3 are feasible. Since extensions have a schema, and the database is the best place to persist data for a database extension, I'd use a table in the extension schema, so I'd go for 3. Why is that heavier than 2? Yours, Laurenz Albe
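To make the "table in the extension schema" option concrete, here is a minimal sketch of what the extension's install script could contain; the extension name "my_ext" and the table layout are made up for illustration. Tables created by the script end up in the extension's target schema (CREATE EXTENSION sets the search path to that schema while running the script), and pg_extension_config_dump() marks the contents as user data so pg_dump preserves the rows:

    -- my_ext--1.0.sql (hypothetical install script for an extension "my_ext")
    CREATE TABLE my_ext_state (
        key   text PRIMARY KEY,
        value jsonb NOT NULL
    );

    -- Without this, pg_dump would skip the rows because the table
    -- belongs to the extension.
    SELECT pg_catalog.pg_extension_config_dump('my_ext_state', '');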
Re: alternative to PG_CATCH
On 13/12/2018 13:26, Kyotaro HORIGUCHI wrote: > Though I didn't look into individual change, this kind of > refactoring looks good to me. But the syntax looks > somewhat.. uh.. > > I'm not sure it is actually workable, but I suspect that what we > need here is just a shortcut of 'PG_CATCH();{PG_RE_THROW();}'. > Something like this: > > #define PG_FINALLY()\ > } \ > else \ > { \ > PG_exception_stack = save_exception_stack; \ > error_context_stack = save_context_stack; \ > PG_RE_THROW();\ > } \ > PG_exception_stack = save_exception_stack;\ > error_context_stack = save_context_stack; \ > { I don't think this works, because the "finally" code needs to be run in the else branch before the rethrow. The fundamental problem, as I see it, is that the macro expansion needs to produce the "finally" code twice: Once in the else (error) branch of the setjmp, and once in the non-error code path, either after the if-setjmp, or in the try block. But to be able to do that, we need to capture the block as a macro argument. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: allow online change primary_conninfo
Hi > I would recommend waiting for the conclusion of the other thread before > moving on with this one. Agreed. I have marked this patch as Waiting on Author. Not quite the right status for waiting on another discussion, but the most suitable one. > We could also consider using > the show hook function of a parameter to print its value in such logs. But a show hook also affects the "show primary_conninfo;" command and pg_settings.value. I already marked primary_conninfo as GUC_SUPERUSER_ONLY; I doubt we need to hide connection info even from a superuser. Also, this hook would be a bit complicated; I found suitable code only in libpqrcv_get_conninfo, after the connection is established. regards, Sergei
Re: Making WAL receiver startup rely on GUC context for primary_conninfo and primary_slot_name
Hi Not sure I can be a reviewer since this was initially my proposal. I vote +1 for this patch. The code looks good, and I think we have no reason to leave RequestXLogStreaming as-is. regards, Sergei
Re: Hitting CheckRelationLockedByMe() ASSERT with force_generic_plan
On Fri, Dec 14, 2018 at 11:32 AM Rushabh Lathia wrote: > > While looking at the code further around this, I realized that we need > a similar kind of fix for bitmap scans as well as index-only scans. > I think if we want to move forward with this patch, we need to first investigate the deadlock hazard mentioned by Tom in the other related thread [1]. I and others have also responded on that thread; see if you can respond to it. [1] - https://www.postgresql.org/message-id/3046.1542990312%40sss.pgh.pa.us -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
explain plans with information about (modified) gucs
Hi, every now and then I have to investigate an execution plan that is strange in some way and I can't reproduce the same behavior. Usually it's simply due to data distribution changing since the problem was observed (say, after a nightly batch load/update). In many cases, however, it may be due to some local GUC tweaks, usually addressing some query-specific issues (say, disabling nested loops or lowering join_collapse_limit). I've repeatedly run into cases where the GUC was not properly reset to the "regular" value, and it's rather difficult to identify that this is what's happening. Or cases with different per-user settings and connection pooling (SET SESSION AUTHORIZATION / ROLE etc.).

So I propose to extend EXPLAIN output with an additional option, which would include information about modified GUCs in the execution plan (disabled by default, of course):

    test=# explain (gucs) select * from t;
                           QUERY PLAN
     Seq Scan on t  (cost=0.00..35.50 rows=2550 width=4)
     GUCs: application_name = 'x', client_encoding = 'UTF8', cpu_tuple_cost = '0.01'
    (2 rows)

Of course, this directly applies to auto_explain too, which gets a new option log_gucs.

The patch is quite trivial, but there are about three open questions:

1) names of the options

I'm not particularly happy with calling the option "gucs" - it's an acronym and many users have little idea what GUC stands for. So I think a better name would be desirable, but I'm not sure what that would be. Options? Parameters?

2) format of output

At this point the names/values are simply formatted into a one-line string. That's not particularly readable, and it's not very useful for the YAML/JSON formats I guess. So adding each modified GUC as an extra text property would be better.

3) identifying modified (and interesting) GUCs

We certainly don't want to include all GUCs, so the question is how to decide which GUCs are interesting. The simplest approach would be to look for GUCs that changed in the session (source == PGC_S_SESSION), but that does not help with SET SESSION AUTHORIZATION / ROLE cases. So we probably want (source > PGC_S_ARGV), but probably not PGC_S_OVERRIDE because that includes irrelevant options like wal_buffers etc. For now I've used

    /* return only options that were modified (not as in config file) */
    if ((conf->source <= PGC_S_ARGV) || (conf->source == PGC_S_OVERRIDE))
        continue;

which generally does the right thing, although it also includes stuff like application_name or client_encoding. But perhaps it'd be better to whitelist the GUCs in some way, because some of the user-defined GUCs may be sensitive and should not be included in plans.

Opinions?
regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services diff --git a/contrib/auto_explain/auto_explain.c b/contrib/auto_explain/auto_explain.c index 646cd0d42c..ec44c59b7f 100644 --- a/contrib/auto_explain/auto_explain.c +++ b/contrib/auto_explain/auto_explain.c @@ -28,6 +28,7 @@ static bool auto_explain_log_verbose = false; static bool auto_explain_log_buffers = false; static bool auto_explain_log_triggers = false; static bool auto_explain_log_timing = true; +static bool auto_explain_log_gucs = false; static int auto_explain_log_format = EXPLAIN_FORMAT_TEXT; static int auto_explain_log_level = LOG; static bool auto_explain_log_nested_statements = false; @@ -112,6 +113,17 @@ _PG_init(void) NULL, NULL); + DefineCustomBoolVariable("auto_explain.log_gucs", + "Print modified GUC values.", + NULL, + &auto_explain_log_gucs, + false, + PGC_SUSET, + 0, + NULL, + NULL, + NULL); + DefineCustomBoolVariable("auto_explain.log_verbose", "Use EXPLAIN VERBOSE for plan logging.", NULL, @@ -356,6 +368,7 @@ explain_ExecutorEnd(QueryDesc *queryDesc) es->timing = (es->analyze && auto_explain_log_timing); es->summary = es->analyze; es->format = auto_explain_log_format; + es->gucs = auto_explain_log_gucs; ExplainBeginOutput(es); ExplainQueryText(es, queryDesc); diff --git a/doc/src/sgml/auto-explain.sgml b/doc/src/sgml/auto-explain.sgml index 120b168d45..852c69b7bb 100644 --- a/doc/src/sgml/auto-explain.sgml +++ b/doc/src/sgml/auto-explain.sgml @@ -169,6 +169,23 @@ LOAD 'auto_explain'; + + + auto_explain.log_gucs (boolean) + + auto_explain.log_gucs configuration parameter + + + + + auto_explain.log_gucs controls whether information + about modified configuration options are logged with the execution + plan. Only options modified at the database, user, client or session + level are considered modified. This parameter is off by default. + + + + auto_explain.log_format (enum) diff --git
Catalog views failed to show partitioned table information.
Hi, There are some catalog views which do not show partitioned tables and their index entries. One of them is "pg_indexes", which fails to show partitioned indexes. Attached is a patch which fixes this. Other views such as pg_stat_* and pg_statio_* have the same problem for partitioned tables and indexes. Since partitioned tables and their indexes are dummies with no storage, they do not have any significance in the stats views; can we still consider adding relkind='p' to these pg_stat_* views? Thoughts? Regards, Suraj pg_indexes_fix_for_partition_index.patch Description: Binary data
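The attached patch is not reproduced inline, but the change presumably boils down to relaxing the relkind filter in the pg_indexes definition in src/backend/catalog/system_views.sql (the sketch below shows only the relevant WHERE clause, not the patch itself): partitioned tables have relkind 'p' and partitioned indexes relkind 'I', and the current filter only accepts 'r'/'m' and 'i'.

    -- current filter in the pg_indexes view (simplified):
    --   WHERE C.relkind IN ('r', 'm') AND I.relkind = 'i'
    -- filter that would also show partitioned tables and their indexes:
    WHERE C.relkind IN ('r', 'm', 'p') AND I.relkind IN ('i', 'I')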
Re: [HACKERS] REINDEX CONCURRENTLY 2.0
On 2018-Dec-14, Peter Eisentraut wrote: > On 14/12/2018 01:23, Michael Paquier wrote: > > On Thu, Dec 13, 2018 at 07:14:57PM -0500, Stephen Frost wrote: > >> Based on at least a quick looking around, the actual grammar rule seems > >> to match my recollection[1], adverbs should typically go AFTER the > >> verb + object, and the adverb shouldn't ever be placed between the verb > >> and the object. > > > > This part has been a long debate already in 2012-2013 when I sent the > > first iterations of the patch, and my memories on the matter are that > > the grammar you are showing here matches with the past agreement. > > Do you happen to have a link for that? I didn't find anything. I think putting the CONCURRENTLY in the parenthesized list of options is most sensible. CREATE INDEX didn't have such an option list when we added this feature there; see https://www.postgresql.org/message-id/flat/200608011143.k71Bh9c22067%40momjian.us#029e9a7ee8cb38beee494ef7891bec1d for some discussion about that grammar. Our options were not great ... -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
RE: [PROPOSAL]a new data type 'bytea' for ECPG
Meskes-san I think I now understand your comments about compatibility. I will try to implement it for sqlda. # I am not good at the library versioning mechanism. # I will learn about it. Regards Ryo Matsumura
Re: Making WAL receiver startup rely on GUC context for primary_conninfo and primary_slot_name
On Fri, Dec 14, 2018 at 12:23:09PM +0300, Sergei Kornilov wrote: > Not sure i can be reviewer since this was initially my proposal. No issues from me if you wish to review the code. In my opinion, it is not a problem if you are a reviewer as I wrote the code. > I vote +1 for this patch. Code looks good and i think we have no > reason to leave RequestXLogStreaming as-is. Thanks for the input. The take on my side is to be able to remove ready_to_display from WalRcv so let's see what others think. -- Michael signature.asc Description: PGP signature
Re: explain plans with information about (modified) gucs
pá 14. 12. 2018 v 12:41 odesílatel Tomas Vondra < tomas.von...@2ndquadrant.com> napsal: > Hi, > > every now and then I have to investigate an execution plan that is > strange in some way and I can't reproduce the same behavior. Usually > it's simply due to data distribution changing since the problem was > observed (say, after a nightly batch load/update). > > In many cases it however may be due to some local GUC tweaks, usually > addressing some query specific issues (say, disabling nested loops or > lowering join_collapse_limit). I've repeatedly ran into cases where the > GUC was not properly reset to the "regular" value, and it's rather > difficult to identify this is what's happening. Or cases with different > per-user settings and connection pooling (SET SESSION AUTHORIZATION / > ROLE etc.). > > So I propose to extend EXPLAIN output with an additional option, which > would include information about modified GUCs in the execution plan > (disabled by default, of course): > > test=# explain (gucs) select * from t; > > QUERY PLAN > >Seq Scan on t (cost=0.00..35.50 rows=2550 width=4) >GUCs: application_name = 'x', client_encoding = 'UTF8', > cpu_tuple_cost = '0.01' >(2 rows) > > Of course, this directly applies to auto_explain too, which gets a new > option log_gucs. > > The patch is quite trivial, but there are about three open questions: > > 1) names of the options > > I'm not particularly happy with calling the option "gucs" - it's an > acronym and many users have little idea what GUC stands for. So I think > a better name would be desirable, but I'm not sure what would that be. > Options? Parameters? > > 2) format of output > > At this point the names/values are simply formatted into a one-line > string. That's not particularly readable, and it's not very useful for > the YAML/JSON formats I guess. So adding each modified GUC as an extra > text property would be better. > > 3) identifying modified (and interesting) GUCs > > We certainly don't want to include all GUCs, so the question is how to > decide which GUCs are interesting. The simplest approach would be to > look for GUCs that changed in the session (source == PGC_S_SESSION), but > that does not help with SET SESSION AUTHORIZATION / ROLE cases. So we > probably want (source > PGC_S_ARGV), but probably not PGC_S_OVERRIDE > because that includes irrelevant options like wal_buffers etc. > > For now I've used > > /* return only options that were modified (not as in config file) */ > if ((conf->source <= PGC_S_ARGV) || (conf->source == PGC_S_OVERRIDE)) > continue; > > which generally does the right thing, although it also includes stuff > like application_name or client_encoding. But perhaps it'd be better to > whitelist the GUCs in some way, because some of the user-defined GUCs > may be sensitive and should not be included in plans. > > Opinions? > has sense Pavel > > regards > > -- > Tomas Vondra http://www.2ndQuadrant.com > PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services >
Re: valgrind issues on Fedora 28
On 12/14/18 5:10 AM, Tom Lane wrote: > Tomas Vondra writes: >> On 11/8/18 1:07 PM, Tomas Vondra wrote: >>> Not sure what to do about this - maybe we should wildcard this first >>> frame, to accept all wcsnlen variants. Opinions? > >> I've committed this with the first frame wirldcarded, and backpatched it >> all the way back to 9.4. > > So I just got around to trying to do some valgrind testing for the first > time in a couple of months, and what I find is that this patch completely > breaks valgrind for me: it exits immediately with complaints like > > ==00:00:00:00.222 14821== FATAL: in suppressions file > "/home/postgres/pgsql/src/tools/valgrind.supp" near line 227: > ==00:00:00:00.222 14821==unknown tool suppression type > ==00:00:00:00.222 14821== exiting now. > > After a bit of hair-pulling I found that the line number report is > a lie, and what it's really unhappy about is the "Memcheck:Addr32" > specification on line 236. This is not terribly surprising perhaps > considering that "Addr32" isn't even documented on what should be > the authoritative page: > http://valgrind.org/docs/manual/mc-manual.html#mc-manual.suppfiles > Hmm :-( FWIW the Addr32 comes directly from valgrind itself, which explicitly suggested to add suppression like this: { Memcheck:Addr32 fun:__wcsnlen_avx2 fun:wcsrtombs fun:wcstombs fun:wchar2char fun:str_tolower fun:lower fun:DirectFunctionCall1Coll fun:Generic_Text_IC_like ... } My guess is that documentation is either somewhat obsolete, or is meant more like a general description than an exhaustive and authoritative source of truth regarding suppressions. > This is valgrind 3.8.1 (RHEL6's version), so not bleeding edge exactly, > but we have very little evidence about how far back the committed patch > works. I'm inclined to think we can't use this solution. > Fedora 28 has 3.14.0, so this seems to be due to some improvements since 3.8.1. I'm not sure how to deal with this - surely we don't want to just revert the change because that would start generating noise again. For a moment I thought we might wildcard the error type somehow, so that it would accept Memcheck:* but AFAICs that is not supported. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Remove Deprecated Exclusive Backup Mode
On 12/14/18 3:01 AM, Peter Eisentraut wrote: > On 13/12/2018 20:17, David Steele wrote: >> On 12/13/18 2:02 PM, Peter Eisentraut wrote: >>> On 12/12/2018 05:09, David Steele wrote: I think the patch stands on its own. Exclusive backup mode is broken and is know to cause problems in the field. >>> >>> Is this breakage documented anywhere for users? >> >> Yes: >> >> https://www.postgresql.org/docs/11/continuous-archiving.html >> >> "Note that if the server crashes during the backup it may not be >> possible to restart until the backup_label file has been manually >> deleted from the PGDATA directory." >> >> Seems like the wording could be stronger. > > I think that is a pretty liberal interpretation of the word "broken". > All this says is "if you do A, and B happens, then you need to do C". > > Clearly, not having to do that at all is better, but if this is all > there is to it, then I'm confused by the characterizations of how awful > and terrible this feature is and how we must rush to remove it. If the server crashes during a backup, the server probably won't restart. We say "may" here but if your backup runs more than a few checkpoints it's more like won't. The remedy is to go manually delete a file the user may have never heard of. They are often hesitant to do so -- for good reason. The downtime stretches as they call their support company or consult Google to determine if they should really do this. The main thing is that *manual* intervention is required to get the cluster running again. Sure, you could automate it, but now we have users writing scripts to delete files out of PGDATA (may as well get tablespace_map as well) -- but hopefully not in the case of a restore where you actually need those files. And hopefully not anything else important. That seems pretty broken to me. -- -David da...@pgmasters.net
Re: automatically assigning catalog toast oids
> On 12/13/18, Tom Lane wrote: > Looking at the existing entries, it seems like we'd have to have > one special case: schema public has OID 2200 but is intentionally > not pinned. setup_depend() mentions other exceptions, but only this one currently has any effect as far as I can see: "pg_database: it's a feature, not a bug, that template1 is not pinned." -John Naylor
Re: Change pgarch_readyXlog() to return .history files first
On 12/13/18 7:15 PM, Michael Paquier wrote: > On Thu, Dec 13, 2018 at 11:53:53AM -0500, David Steele wrote: >> I also think we should consider back-patching this change. It's hard to >> imagine that archive commands would have trouble with this reordering >> and the current ordering causes real pain in HA clusters. > > I would like to hear opinion from other though if we should consider > that as an improvement or an actual bug fix. Changing the order of the > files to map with what the startup process does when promoting does not > sound like a bug fix to me, still this is not really invasive, so we > could really consider it worth back-patching to reduce common pain from > users when it comes to timeline handling. I think an argument can be made that it is a bug (ish). Postgres generates the files in one order, and they get archived in a different order. >> -if (!found) >> +/* Is this a history file? */ >> +bool history = basenamelen >= sizeof(".history") && >> +strcmp(rlde->d_name + (basenamelen - sizeof(".history") + 1), >> + ".history.ready") == 0; > > Or you could just use IsTLHistoryFileName here? We'd have to truncate .ready off the string to make that work, which seems easy enough. Is that what you were thinking? One thing to consider is the check above is more efficient than IsTLHistoryFileName() and it potentially gets run a lot. > If one wants to simply check this code, you can just create dummy orphan > files in archive_status and see in which order they get removed. Seems awfully racy. Are there currently any tests like this for the archiver that I can look at extending? Regards, -- -David da...@pgmasters.net signature.asc Description: OpenPGP digital signature
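For concreteness, the truncate-then-test variant mentioned above could look roughly like this (an untested sketch; it assumes a local scratch buffer and reuses IsTLHistoryFileName() from xlog_internal.h):

    char        basename[MAXPGPATH];

    /* Copy the file name and chop off the ".ready" suffix. */
    strlcpy(basename, rlde->d_name, sizeof(basename));
    basename[basenamelen] = '\0';

    /* Now the standard macro can be used directly. */
    history = IsTLHistoryFileName(basename);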
Re: explain plans with information about (modified) gucs
On 12/14/18 2:05 PM, Jim Finnerty wrote: > You might want to only include the GUCs that are query tuning parameters, > i.e., those returned by: > > SELECT name, setting, category > FROM pg_settings > WHERE category LIKE 'Query Tuning%' > ORDER BY category, name; > Good idea! Thanks. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] REINDEX CONCURRENTLY 2.0
Greetings, * Peter Eisentraut (peter.eisentr...@2ndquadrant.com) wrote: > On 14/12/2018 01:14, Stephen Frost wrote: > reindex table CONCURRENTLY test; > >> > >> By the way, does this syntax make sense? I haven't seen a discussion on > >> this anywhere in the various threads. I keep thinking that > >> > >> reindex concurrently table test; > >> > >> would make more sense. How about in combination with (verbose)? > > > > I don't think it's a mistake that we have 'create index concurrently' > > and it certainly would seem odd to me for 'create index' and 'reindex > > table' to be different. > > > > Certainly, from my recollection of english, you'd say "I am going to > > reindex the table concurrently", you wouldn't say "I am going to > > reindex concurrently the table." > > > > Based on at least a quick looking around, the actual grammar rule seems > > to match my recollection[1], adverbs should typically go AFTER the > > verb + object, and the adverb shouldn't ever be placed between the verb > > and the object. > > So it would be grammatical to say > > reindex table test concurrently Yes, though I'm not really a fan of it. > or in a pinch > > reindex concurrently table test No, you can't put concurrently between reindex and table. > but I don't see anything grammatical about > > reindex table concurrently test I disagree, this does look reasonable to me and it's certainly much better than 'reindex concurrently table' which looks clearly incorrect. > Where this gets really messy is stuff like this: > > reindex (verbose) database concurrently postgres > > Why would "concurrently" not be part of the options next to "verbose"? That wasn't what was asked and I don't think I see a problem with having concurrently be allowed in the parentheses. For comparison, it's not like "explain analyze select ..." or "explain buffers select" is terribly good grammatical form. If you wanted to try to get to a better form for the spelled out sentence, I would think: concurrently reindex table test would probably be the approach to use, though that's not what we use for 'create index' and it'd be rather out of character for us to start a command with an adverb, making it ultimately a poor choice overall. Going back to what we already have done and have in released versions, we have 'create unique index concurrently test ...' and that's at least reasonable (the adverb isn't showing up between the verb and the object, and the adjective is between the verb and the object) and is what I vote to go with, with the caveat that if we want to also allow it inside the parentheses, I'm fine with that. Thanks! Stephen signature.asc Description: PGP signature
Re: Row Visibility and Table Access Methods
On Fri, Dec 14, 2018 at 12:28 AM Pavel Stehule wrote: > > > čt 13. 12. 2018 v 10:23 odesílatel Simon Riggs > napsal: > >> Thoughts? >> > > looks great > Agreed. This sounds well-thought-out and rather simple to implement. -- Jonah H. Harris
Re: [HACKERS] REINDEX CONCURRENTLY 2.0
On 2018-Dec-14, Stephen Frost wrote: > That wasn't what was asked and I don't think I see a problem with having > concurrently be allowed in the parentheses. For comparison, it's not > like "explain analyze select ..." or "explain buffers select" is > terribly good grammatical form. ... and we don't allow EXPLAIN BUFFERS at all, and if we had had a parenthesized option list in EXPLAIN when we invented EXPLAIN ANALYZE, I bet we would have *not* made the ANALYZE keyword appear unadorned in that command. > If you wanted to try to get to a better form for the spelled out > sentence, I would think: > > concurrently reindex table test > > would probably be the approach to use, I think this is terrible from a command-completion perspective, and from a documentation perspective (Certainly we wouldn't have a manpage about the "concurrently" command, for starters). My vote goes to put the keyword inside of and exclusively in the parenthesized option list. -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] REINDEX CONCURRENTLY 2.0
Greetings, * Alvaro Herrera (alvhe...@2ndquadrant.com) wrote: > On 2018-Dec-14, Stephen Frost wrote: > > > That wasn't what was asked and I don't think I see a problem with having > > concurrently be allowed in the parentheses. For comparison, it's not > > like "explain analyze select ..." or "explain buffers select" is > > terribly good grammatical form. > > ... and we don't allow EXPLAIN BUFFERS at all, and if we had had a > parenthesized option list in EXPLAIN when we invented EXPLAIN ANALYZE, I > bet we would have *not* made the ANALYZE keyword appear unadorned in > that command. I'm not convinced of that- there is value in being able to write full and useful commands without having to always use parentheses. > > If you wanted to try to get to a better form for the spelled out > > sentence, I would think: > > > > concurrently reindex table test > > > > would probably be the approach to use, > > I think this is terrible from a command-completion perspective, and from > a documentation perspective (Certainly we wouldn't have a manpage about > the "concurrently" command, for starters). Right, I agreed that this had other downsides in the email you're replying to here. Glad we agree that it's not a good option. > My vote goes to put the keyword inside of and exclusively in the > parenthesized option list. I disagree with the idea of exclusively having concurrently be in the parentheses. 'explain buffers' is a much less frequently used option (though that might, in part, be because it's a bit annoying to write out explain (analyze, buffers) select...; I wonder if we could have a way to say "if I'm running analyze, I always want buffers"...), but concurrently reindexing a table (or index..) is going to almost certainly be extremely common, perhaps even more common than *not* reindexing concurrently. Thanks! Stephen signature.asc Description: PGP signature
Re: Upgrading pg_statistic to handle collation honestly
Peter Eisentraut writes: > On 12/12/2018 16:57, Tom Lane wrote: >> +stats->attrcollid = exprCollation(index_expr); >> +/* XXX should we consult indcollation instead? */ > After looking through this again, I think the answer here is "yes". Yeah, I was leaning towards that as well, but hadn't written the extra code needed to make it so. Will fix. regards, tom lane
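For reference, the resulting fix in examine_attribute() would presumably look something like this (a sketch only, not the committed change); rd_indcollation is available here because onerel is the index relation in this code path:

    stats->attrtypid = exprType(index_expr);
    stats->attrtypmod = exprTypmod(index_expr);

    /*
     * Prefer the collation of the index column, which reflects any COLLATE
     * clause in the index definition; fall back to the expression's
     * collation if the column has none.
     */
    if (OidIsValid(onerel->rd_indcollation[attnum - 1]))
        stats->attrcollid = onerel->rd_indcollation[attnum - 1];
    else
        stats->attrcollid = exprCollation(index_expr);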
Re: explain plans with information about (modified) gucs
On 12/14/18 3:01 PM, Tomas Vondra wrote: > On 12/14/18 2:05 PM, Jim Finnerty wrote: >> You might want to only include the GUCs that are query tuning parameters, >> i.e., those returned by: >> >> SELECT name, setting, category >> FROM pg_settings >> WHERE category LIKE 'Query Tuning%' >> ORDER BY category, name; >> > > Good idea! Thanks. V2 filtering the options to QUERY_TUNING group (and subgroups). regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services diff --git a/contrib/auto_explain/auto_explain.c b/contrib/auto_explain/auto_explain.c index 646cd0d42c..ec44c59b7f 100644 --- a/contrib/auto_explain/auto_explain.c +++ b/contrib/auto_explain/auto_explain.c @@ -28,6 +28,7 @@ static bool auto_explain_log_verbose = false; static bool auto_explain_log_buffers = false; static bool auto_explain_log_triggers = false; static bool auto_explain_log_timing = true; +static bool auto_explain_log_gucs = false; static int auto_explain_log_format = EXPLAIN_FORMAT_TEXT; static int auto_explain_log_level = LOG; static bool auto_explain_log_nested_statements = false; @@ -112,6 +113,17 @@ _PG_init(void) NULL, NULL); + DefineCustomBoolVariable("auto_explain.log_gucs", + "Print modified GUC values.", + NULL, + &auto_explain_log_gucs, + false, + PGC_SUSET, + 0, + NULL, + NULL, + NULL); + DefineCustomBoolVariable("auto_explain.log_verbose", "Use EXPLAIN VERBOSE for plan logging.", NULL, @@ -356,6 +368,7 @@ explain_ExecutorEnd(QueryDesc *queryDesc) es->timing = (es->analyze && auto_explain_log_timing); es->summary = es->analyze; es->format = auto_explain_log_format; + es->gucs = auto_explain_log_gucs; ExplainBeginOutput(es); ExplainQueryText(es, queryDesc); diff --git a/doc/src/sgml/auto-explain.sgml b/doc/src/sgml/auto-explain.sgml index 120b168d45..852c69b7bb 100644 --- a/doc/src/sgml/auto-explain.sgml +++ b/doc/src/sgml/auto-explain.sgml @@ -169,6 +169,23 @@ LOAD 'auto_explain'; + + + auto_explain.log_gucs (boolean) + + auto_explain.log_gucs configuration parameter + + + + + auto_explain.log_gucs controls whether information + about modified configuration options are logged with the execution + plan. Only options modified at the database, user, client or session + level are considered modified. This parameter is off by default. 
+ + + + auto_explain.log_format (enum) diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c index de09ded65b..b8cab69f71 100644 --- a/src/backend/commands/explain.c +++ b/src/backend/commands/explain.c @@ -31,6 +31,7 @@ #include "storage/bufmgr.h" #include "tcop/tcopprot.h" #include "utils/builtins.h" +#include "utils/guc_tables.h" #include "utils/json.h" #include "utils/lsyscache.h" #include "utils/rel.h" @@ -164,6 +165,8 @@ ExplainQuery(ParseState *pstate, ExplainStmt *stmt, const char *queryString, es->costs = defGetBoolean(opt); else if (strcmp(opt->defname, "buffers") == 0) es->buffers = defGetBoolean(opt); + else if (strcmp(opt->defname, "gucs") == 0) + es->gucs = defGetBoolean(opt); else if (strcmp(opt->defname, "timing") == 0) { timing_set = true; @@ -547,6 +550,37 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es, /* Create textual dump of plan tree */ ExplainPrintPlan(es, queryDesc); + if (es->gucs) + { + int i; + int num; + StringInfoData str; + struct config_generic **gucs; + + gucs = get_modified_guc_options(&num); + + for (i = 0; i < num; i++) + { + char *setting; + struct config_generic *conf = gucs[i]; + + if (i == 0) +initStringInfo(&str); + else +appendStringInfoString(&str, ", "); + + setting = GetConfigOptionByName(conf->name, NULL, true); + + if (setting) +appendStringInfo(&str, "%s = '%s'", conf->name, setting); + else +appendStringInfo(&str, "%s = NULL", conf->name); + } + + if (num > 0) + ExplainPropertyText("GUCs", str.data, es); + } + if (es->summary && planduration) { double plantime = INSTR_TIME_GET_DOUBLE(*planduration); diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c index 6fe1939881..2d37760c7a 100644 --- a/src/backend/utils/misc/guc.c +++ b/src/backend/utils/misc/guc.c @@ -8556,6 +8556,45 @@ ShowAllGUCConfig(DestReceiver *dest) end_tup_output(tstate); } +struct config_generic ** +get_modified_guc_options(int *num) +{ + int i; + struct config_generic **result; + + *num = 0; + result = palloc(sizeof(struct config_generic *) * num_guc_variables); + + for (i = 0; i < num_guc_variables; i++) + { + struct config_generic *conf = guc_variables[i]; + + /* return only options visible to the user */ + if ((conf->flags & GUC_NO_SHOW_ALL) || + ((conf->flags & GUC_SUPERUSER_ONLY) && + !is_me
Re: explain plans with information about (modified) gucs
Tomas Vondra writes: > ... I propose to extend EXPLAIN output with an additional option, which > would include information about modified GUCs in the execution plan > (disabled by default, of course): I'm a bit suspicious about whether this'll have any actual value, if it's disabled by default (which I agree it needs to be, if only for compatibility reasons). The problem you're trying to solve is basically "I forgot that this might have an effect", but stuff that isn't shown by default will not help you un-forget. It certainly won't fix the form of the problem that I run into, which is people sending in EXPLAIN plans and not mentioning their weird local settings. > We certainly don't want to include all GUCs, so the question is how to > decide which GUCs are interesting. The simplest approach would be to > look for GUCs that changed in the session (source == PGC_S_SESSION), but > that does not help with SET SESSION AUTHORIZATION / ROLE cases. So we > probably want (source > PGC_S_ARGV), but probably not PGC_S_OVERRIDE > because that includes irrelevant options like wal_buffers etc. Don't you want to show anything that's not the built-in default? (I agree OVERRIDE could be excluded, but that's irrelevant for query tuning parameters.) Just because somebody injected a damfool setting of, say, random_page_cost via the postmaster command line or environment settings doesn't make it not damfool :-( regards, tom lane
Re: explain plans with information about (modified) gucs
On 12/14/18 4:21 PM, Tom Lane wrote: > Tomas Vondra writes: >> ... I propose to extend EXPLAIN output with an additional option, which >> would include information about modified GUCs in the execution plan >> (disabled by default, of course): > > I'm a bit suspicious about whether this'll have any actual value, > if it's disabled by default (which I agree it needs to be, if only for > compatibility reasons). The problem you're trying to solve is basically > "I forgot that this might have an effect", but stuff that isn't shown > by default will not help you un-forget. It certainly won't fix the > form of the problem that I run into, which is people sending in EXPLAIN > plans and not mentioning their weird local settings. > Not quite. I agree we'll still have to deal with plans from users without this info, but it's easier to ask for explain with this extra option (just like we regularly ask for explain analyze instead of just plain explain). I'd expect the output to be more complete than trying to figure out which of the GUCs might have effect / been modified here. But more importantly - my personal primary use case here is explains from application connections generated using auto_explain, with some application-level GUC magic. And there I can easily tweak auto_explain config to do (auto_explain.log_gucs = true) of course. >> We certainly don't want to include all GUCs, so the question is how to >> decide which GUCs are interesting. The simplest approach would be to >> look for GUCs that changed in the session (source == PGC_S_SESSION), but >> that does not help with SET SESSION AUTHORIZATION / ROLE cases. So we >> probably want (source > PGC_S_ARGV), but probably not PGC_S_OVERRIDE >> because that includes irrelevant options like wal_buffers etc. > > Don't you want to show anything that's not the built-in default? > (I agree OVERRIDE could be excluded, but that's irrelevant for query > tuning parameters.) Just because somebody injected a damfool setting > of, say, random_page_cost via the postmaster command line or > environment settings doesn't make it not damfool :-( > Probably. My assumption here was that I can do select * from pg_settings and then combine it with whatever is included in the plan. But you're right comparing it with the built-in default may be a better option. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
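For what it's worth, pg_settings already makes the "differs from the built-in default" check easy to prototype, since boot_val holds the compiled-in default; combined with the query-tuning category filter suggested earlier in the thread:

    SELECT name, setting, boot_val, source
      FROM pg_settings
     WHERE category LIKE 'Query Tuning%'
       AND setting IS DISTINCT FROM boot_val
     ORDER BY name;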
Re: row filtering for logical replication
Greetings, * Petr Jelinek (petr.jeli...@2ndquadrant.com) wrote: > On 23/11/2018 03:02, Stephen Frost wrote: > > * Euler Taveira (eu...@timbira.com.br) wrote: > >> 2018-02-28 21:54 GMT-03:00 Craig Ringer : > >>> Good idea. I haven't read this yet, but one thing to make sure you've > >>> handled is limiting the clause to referencing only the current tuple and > >>> the > >>> catalogs. user-catalog tables are OK, too, anything that is > >>> RelationIsAccessibleInLogicalDecoding(). > >>> > >>> This means only immutable functions may be invoked, since a stable or > >>> volatile function might attempt to access a table. And views must be > >>> prohibited or recursively checked. (We have tree walkers that would help > >>> with this). > >>> > >>> It might be worth looking at the current logic for CHECK expressions, > >>> since > >>> the requirements are similar. In my opinion you could safely not bother > >>> with > >>> allowing access to user catalog tables in the filter expressions and limit > >>> them strictly to immutable functions and the tuple its self. > >> > >> IIRC implementation is similar to RLS expressions. I'll check all of > >> these rules. > > > > Given the similarity to RLS and the nearby discussion about allowing > > non-superusers to create subscriptions, and probably publications later, > > I wonder if we shouldn't be somehow associating this with RLS policies > > instead of having the publication filtering be entirely independent.. > > I do see the appeal here, if you consider logical replication to be a > streaming select it probably applies well. > > But given that this is happening inside output plugin which does not > have full executor setup and has catalog-only snapshot I am not sure how > feasible it is to try to merge these two things. As per my previous > email it's possible that we'll have to be stricter about what we allow > in expressions here. I can certainly understand the concern about trying to combine the implementation of this with that of RLS; perhaps that isn't a good fit due to the additional constraints put on logical decoding. That said, I still think it might make sense to consider these filters for logical decoding to be policies and, ideally, to allow users to use the same policy for both. In the end, the idea of having to build a single large and complex 'create publication' command which has a bunch of tables, each with their own filter clauses, just strikes me as pretty painful. > The other issue with merging this is that the use-case for filtering out > the data in logical replication is not necessarily about security, but > often about sending only relevant data. So it makes sense to have filter > on publication without RLS enabled on table and if we'd force that, we'd > limit usefulness of this feature. I definitely have a serious problem if we are going to say that you can't use this filtering for security-sensitive cases. > We definitely want to eventually create subscriptions as non-superuser > but that has zero effect on this as everything here is happening on > different server than where subscription lives (we already allow > creation of publications with just CREATE privilege on database and > ownership of the table). What I wasn't clear about above was the idea that we might allow a user other than the table owner to publish a given table, but that such a publication should certanily only be allowed to include the rows which that user has access to- as regulated by RLS. 
If the RLS policy is too complex to allow that then I would think we'd simply throw an error at the create publication time and the would-be publisher would need to figure that out with the table owner. I'll admit that this might seem like a stretch, but what happens today? Today, people write cronjobs to try to sync between tables with FDWs and you don't need to own a table to use it as the target of a foreign table. I do think that we'll need to have some additional privileges around who is allowed to create publications, I'm not entirely thrilled with that being combined with the ability to create schemas; the two seem quite different to me. * Euler Taveira (eu...@timbira.com.br) wrote: > Em sex, 23 de nov de 2018 às 11:40, Petr Jelinek > escreveu: > > But given that this is happening inside output plugin which does not > > have full executor setup and has catalog-only snapshot I am not sure how > > feasible it is to try to merge these two things. As per my previous > > email it's possible that we'll have to be stricter about what we allow > > in expressions here. > > This feature should be as simple as possible. I don't want to > introduce a huge overhead just for filtering some data. Data sharding > generally uses simple expressions. RLS often uses simple filters too. > > The other issue with merging this is that the use-case for filtering out > > the data in logical replication is not necessarily about security, but > > often about sending only relevant data. So i
Re: alternative to PG_CATCH
Peter Eisentraut writes: > The fundamental problem, as I see it, is that the macro expansion needs > to produce the "finally" code twice: Once in the else (error) branch of > the setjmp, and once in the non-error code path, either after the > if-setjmp, or in the try block. But to be able to do that, we need to > capture the block as a macro argument. I don't especially like the MACRO({...}) proposal, because it looks way too much like gcc's special syntax for "statement expressions". If we have to go this way, I'd rather see if MACRO((...)) can be made to work. But I question your assumption that we have to have two physical copies of the "finally" code. There's nothing wrong with implementing this sort of infrastructure with some goto's, or whatever else we have to have to make it work. Also, it'd look more natural as an extension to the existing PG_TRY infrastructure if the source code were like PG_TRY(); { ... } PG_FINALLY(); { ... } PG_END_TRY(); So even if Kyotaro-san's initial sketch isn't right in detail, I think he has the right idea about where we want to go. regards, tom lane
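One way to get that source-level shape without expanding the cleanup code twice is to have the error branch only restore state and set a flag, and let PG_END_TRY() re-throw after the single finally block has run. A rough illustration follows (not a worked-out patch); it reuses the locals that PG_TRY() already declares and assumes PG_TRY() additionally declares a do_rethrow flag initialized to false, so the plain PG_CATCH path keeps working unchanged:

    #define PG_FINALLY() \
        } \
        else \
        { \
            PG_exception_stack = save_exception_stack; \
            error_context_stack = save_context_stack; \
            do_rethrow = true; \
        } \
        {

    #define PG_END_TRY() \
        } \
        if (do_rethrow) \
            PG_RE_THROW(); \
        PG_exception_stack = save_exception_stack; \
        error_context_stack = save_context_stack; \
    } while (0)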
Re: row filtering for logical replication
Greetings, * Tomas Vondra (tomas.von...@2ndquadrant.com) wrote: > On 11/23/18 8:03 PM, Stephen Frost wrote: > > * Fabrízio de Royes Mello (fabriziome...@gmail.com) wrote: > >> On Fri, Nov 23, 2018 at 4:13 PM Petr Jelinek > >> wrote: > If carefully documented I see no problem with it... we already have an > analogous problem with functional indexes. > >>> > >>> The difference is that with functional indexes you can recreate the > >>> missing object and everything is okay again. With logical replication > >>> recreating the object will not help. > >>> > >> > >> In this case with logical replication you should rsync the object. That is > >> the price of misunderstanding / bad use of the new feature. > >> > >> As usual, there are no free beer ;-) > > > > There's also certainly no shortage of other ways to break logical > > replication, including ways that would also be hard to recover from > > today other than doing a full resync. > > Sure, but that seems more like an argument against creating additional > ones (and for preventing those that already exist). I'm not sure this > particular feature is where we should draw the line, though. I was actually going in the other direction- we should allow it because advanced users may know what they're doing better than we do and we shouldn't prevent things just because they might be misused or misunderstood by a user. > > What that seems to indicate, to me at least, is that it'd be awful > > nice to have a way to resync the data which doesn't necessairly > > involve transferring all of it over again. > > > > Of course, it'd be nice if we could track those dependencies too, > > but that's yet another thing. > > Yep, that seems like a good idea in general. Both here and for > functional indexes (although I suppose sure is a technical reason why it > wasn't implemented right away for them). We don't track function dependencies in general and I could certainly see cases where you really wouldn't want to do so, at least not in the same way that we track FKs or similar. I do wonder if maybe we didn't track function dependencies because we didn't (yet) have create or replace function and that now we should. We don't track dependencies inside a function either though. > > In short, I'm not sure that I agree with the idea that we shouldn't > > allow this and instead I'd rather we realize it and put the logical > > replication into some kind of an error state that requires a resync. > > That would still mean a need to resync the data to recover, so I'm not > sure it's really an improvement. And I suppose it'd require tracking the > dependencies, because how else would you mark the subscription as > requiring a resync? At which point we could decline the DROP without a > CASCADE, just like we do elsewhere, no? I was actually thinking more along the lines of just simply marking the publication/subscription as being in a 'failed' state when a failure actually happens, and maybe even at that point basically throwing away everything except the shell of the publication/subscription (so the user can see that it failed and come in and properly drop it); I'm thinking about this as perhaps similar to a transaction being aborted. Thanks! Stephen signature.asc Description: PGP signature
Re: valgrind issues on Fedora 28
Tomas Vondra writes: > On 12/14/18 5:10 AM, Tom Lane wrote: >> So I just got around to trying to do some valgrind testing for the first >> time in a couple of months, and what I find is that this patch completely >> breaks valgrind for me ... >> This is valgrind 3.8.1 (RHEL6's version), so not bleeding edge exactly, >> but we have very little evidence about how far back the committed patch >> works. I'm inclined to think we can't use this solution. > Fedora 28 has 3.14.0, so this seems to be due to some improvements since > 3.8.1. I'm not sure how to deal with this - surely we don't want to just > revert the change because that would start generating noise again. For a > moment I thought we might wildcard the error type somehow, so that it > would accept Memcheck:* but AFAICs that is not supported. Meh. There are lots of situations where it's necessary to carry some platform-specific valgrind suppressions; I have half a dozen on my RHEL6 box because of various weirdness in its old version of perl, for example. I think this is one where people running code new enough to have this problem need to carry private suppressions of their own. In general, I'm not particularly on board with our valgrind.supp carrying suppressions for code outside our own code base: I think that's assuming WAY too much about which version of what is installed on a particular box. Maybe we could do something to make it simpler to have custom suppressions? Not sure what, though. regards, tom lane
Re: Connections hang indefinitely while taking a gin index's LWLock buffer_content lock
On Thu, Dec 13, 2018 at 10:40:59PM +0300, Alexander Korotkov wrote: > On Thu, Dec 13, 2018 at 8:06 PM Andrey Borodin wrote: > > That's the same variable, one place is definition while other is potential > > misuse. > > Seems like these 2 lines [0] > > > > + if (BufferIsValid(lbuffer)) > > + UnlockReleaseBuffer(lbuffer); > > > > are superfluous: lbuffer is UnlockReleased earlier. > > Thanks to everybody for noticing! Speaking more generally backpatch > to 9.4 was completely wrong: there were multiple errors. It's my > oversight, I forget how much better our xlog system became since 9.4 > :) > > I've pushed bf0e5a73be fixing that. I can confirm the compiler warning is now gone. Thanks. -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Ancient Roman grave inscription +
removal of dangling temp tables
We recently ran into a funny situation, where autovacuum would not remove very old temp tables. The relfrozenxid of those tables was about to reach the max freeze age, so monitoring started to complain. It turned out that autovacuum saw that the backendId was used by a live backend ... but that session was in reality not using those temp tables, and had not used any temp table at all. They were left-overs from a crash months ago, and since the session was not using temp tables, they had not been removed. DISCARD ALL may have run, but had no effect. I think the best way to fix this is to call RemoveTempRelations() unconditionally at session start (without doing the rest of the temp table setup, just the removal.) In versions earlier than pg11, related issues occur if you have a crash with autovacuum off and/or workers disabled, and temp tables are leaked in backendID 1 or 2; then start with normal values. In that case, those backendIDs are used by the logical replication launcher and the autovacuum launcher, so autovacuum does not remove them either. This was fixed in PG11 inadvertently by this commit: https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=943576bddcb52971041d9f5f806789921fa107ee The reason the commit fixes it is that now the databaseID of the PGPROC entry is compared to the temp table's database; and for those worker processes, the DatabaseId is InvalidOid so it all works out. This isn't a terribly interesting bug, as restarting with changed worker/autovacuum options after a crash that happens to leak temp tables should be quite rare. But anyway we can fix this second issue in prior branches by adding a comparison to databaseId to the existing 'if' test in autovacuum; easy enough, no compatibility concerns. This should also cover the case that one session crashes leaking temp tables, then the same session connects to a different database for a long time. (I didn't verify this last point to be a real problem.) -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
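For anyone diagnosing the same symptom, the orphaned relations are easy to spot from SQL; an illustrative query (it also lists the current session's own live temp tables, of course):

    SELECT n.nspname, c.relname, age(c.relfrozenxid) AS xid_age
      FROM pg_class c
      JOIN pg_namespace n ON n.oid = c.relnamespace
     WHERE n.nspname LIKE 'pg_temp_%'
       AND c.relkind = 'r'
     ORDER BY age(c.relfrozenxid) DESC;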
A case for UPDATE DISTINCT attribute
I have observed that the following pattern keeps repeating in our data management programs:

    UPDATE event
    SET fuid = ${fuid},
        venue_id = ${venueId},
        url = ${url}
    WHERE id = ${id} AND
        fuid != ${fuid} AND
        venue_id != ${venueId} AND
        url IS DISTINCT FROM ${url};

Note: "url" can be null. Therefore, using IS DISTINCT FROM. The reasons we are using this pattern are multiple:

- an empty update will trigger matching triggers.
- an empty update will be WAL-logged.
- an empty update creates dead tuples that will need to be cleaned up by AUTOVACUUM.

In cases where the data does not change, all of these are undesirable side effects. Meanwhile, a WHERE condition that excludes rows with matching values makes this a no-op when the target column values already match.

It appears that this pattern should be encouraged, but the verbosity (and the accompanying risk of introducing a logical error, e.g. accidentally using an = comparison on a NULLable column) makes this a rarely used pattern. I suggest that introducing an attribute such as "UPDATE DISTINCT", e.g.

    UPDATE DISTINCT event
    SET fuid = ${fuid},
        venue_id = ${venueId},
        url = ${url}
    WHERE id = ${id}

would encourage greater adoption of such a pattern. Is there a technical reason this does not exist already?
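As an aside (not part of the proposal above), the guard can already be written with less room for the NULL-comparison mistake by using a row-wise IS DISTINCT FROM, which is true as soon as any column differs -- i.e. it skips the update only when every target column already matches:

    UPDATE event
    SET fuid = ${fuid},
        venue_id = ${venueId},
        url = ${url}
    WHERE id = ${id}
      AND (fuid, venue_id, url) IS DISTINCT FROM (${fuid}, ${venueId}, ${url});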
Re: removal of dangling temp tables
On Fri, Dec 14, 2018 at 11:29 AM Alvaro Herrera wrote: > I think the best way to fix this is to call RemoveTempRelations() > unconditionally at session start (without doing the rest of the temp > table setup, just the removal.) That would certainly simplify things. I think I thought about that as far back as a734fd5d1c309cc553b7c8c79fba96218af090f7 but it seemed like a significant behavior change and I wasn't sure that everyone would like it. In particular, it adds overhead to backend startup that, in the case of a large temp schema, could be fairly long. Nevertheless, I tentatively think that a change like this is a good idea. I wouldn't back-patch it, though. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: fix psql \conninfo & \connect when using hostaddr
On 2018-Nov-09, Michael Paquier wrote: > On Thu, Nov 08, 2018 at 12:13:31PM -0300, Alvaro Herrera wrote: > > Right, that too. Fortunately I think compilers warn about > > mismatching indentation nowadays, at least in some cases. > > I don't recall seeing a compiler warning about that, but I do recall > coverity complaining loudly about such things. :-) /pgsql/source/master/src/backend/catalog/namespace.c: In function 'InitRemoveTempRelations': /pgsql/source/master/src/backend/catalog/namespace.c:4132:2: warning: this 'if' clause does not guard... [-Wmisleading-indentation] if (OidIsValid(namespaceOid)) ^~ /pgsql/source/master/src/backend/catalog/namespace.c:4134:3: note: ...this statement, but the latter is misleadingly indented as if it is guarded by the 'if' get_namespace_oid(namespaceName, true); ^ $ gcc --version gcc (Debian 6.3.0-18+deb9u1) 6.3.0 20170516 -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Remove Deprecated Exclusive Backup Mode
On Fri, Dec 14, 2018 at 8:27 AM David Steele wrote: > If the server crashes during a backup, the server probably won't > restart. We say "may" here but if your backup runs more than a few > checkpoints it's more like won't. The remedy is to go manually delete a > file the user may have never heard of. They are often hesitant to do so > -- for good reason. Sure. So it's broken in the sense that if you don't read and carefully follow the directions, you will have problems. But that's also true of an automobile, a power saw, the DELETE command, antibiotics, and the Internal Revenue Code. Many things in life are complicated and require due caution, and yet people regularly navigate things that are VASTLY more complicated than "if the server crashes during a hot backup, remove this file." Especially because if we subsequently fail for this reason, we give you a very precise hint that tells you exactly what to do: HINT: If you are not restoring from a backup, try removing the file "/backup_label" Now, I grant that emitting a HINT telling you exactly what to do is not as good as not requiring manual user invention in the first place, but it's not like we're requiring you to carry out some byzantine process that nobody can possible understand. The "if" condition here is one that any DBA should be able to understand: am I, or am I not, in the process of restoring a backup? The system does not know, but the DBA should. And once you know that the answer is "no, I'm not restoring a backup", then you just remove the exact pathname indicated and try it again and everything works fine. Peter's complaint -- that this is a pretty liberal reading of the word "broken" -- is IMHO very well put. Saying that something is broken means it doesn't work. But the exclusive backup mode works fine if used according to the directions. The fact that people often get confused and don't follow the directions, and that the directions tend to end up requiring manual intervention after a system crash, is unfortunate, and it is why we have a new API. But "that thing isn't as well designed as it could've been" is not the same thing as "that thing does not work when used as designed," no matter how much you keep insisting the contrary. It does work. It has worked for a long time. Many people have used it successfully. It was the ONLY way of doing this for many years. There is no necessary reason why it has to be a huge problem, and I don't think it is. In my experience with EnterpriseDB customers, this IS an occasional source of confusion, but there are other things that are a problem a lot more frequently, like work_mem, which you often can't set high enough to get good query performance without causing OOM events when the system gets busy. I'd hesitate to call work_mem broken, but a GUC for which no single value both adequately protects against OOM and gives workable performance is closer to broken than an exclusive backup mode, which actually does work - albeit with occasional manual intervention - if you read and follow the directions. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: removal of dangling temp tables
On 2018-Dec-14, Robert Haas wrote: > On Fri, Dec 14, 2018 at 11:29 AM Alvaro Herrera > wrote: > > I think the best way to fix this is to call RemoveTempRelations() > > unconditionally at session start (without doing the rest of the temp > > table setup, just the removal.) > > That would certainly simplify things. I think I thought about that as > far back as a734fd5d1c309cc553b7c8c79fba96218af090f7 but it seemed > like a significant behavior change and I wasn't sure that everyone > would like it. In particular, it adds overhead to backend startup > that, in the case of a large temp schema, could be fairly long. Hmm, I think in the case covered by your commit, that is a session that crashes with a few thousands of temp tables, this new patch might cause a failure to open a new session altogether. Maybe it'd be better to change temp table removal to always drop max_locks_per_transaction objects at a time (ie. commit/start a new transaction every so many objects). -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services diff --git a/src/backend/catalog/namespace.c b/src/backend/catalog/namespace.c index 13a24631fc..003bbabf1f 100644 --- a/src/backend/catalog/namespace.c +++ b/src/backend/catalog/namespace.c @@ -4118,6 +4118,26 @@ RemoveTempRelations(Oid tempNamespaceId) } /* + * Remove temp tables, without relying on myTempNamespace being set, and + * without setting it. This is used at session start. + */ +void +InitRemoveTempRelations(void) +{ + Oid namespaceOid; + char namespaceName[NAMEDATALEN]; + + Assert(MyBackendId != InvalidBackendId); + + snprintf(namespaceName, sizeof(namespaceName), "pg_temp_%d", MyBackendId); + + namespaceOid = get_namespace_oid(namespaceName, true); + + if (OidIsValid(namespaceOid)) + RemoveTempRelations(namespaceOid); +} + +/* * Callback to remove temp relations at backend exit. */ static void diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c index b636b1e262..b41b47222c 100644 --- a/src/backend/utils/init/postinit.c +++ b/src/backend/utils/init/postinit.c @@ -1074,6 +1074,10 @@ InitPostgres(const char *in_dbname, Oid dboid, const char *username, if (!bootstrap) pgstat_bestart(); + /* Remove any temp tables that might have leaked previously */ + if (!bootstrap) + InitRemoveTempRelations(); + /* close the transaction we started above */ if (!bootstrap) CommitTransactionCommand(); diff --git a/src/include/catalog/namespace.h b/src/include/catalog/namespace.h index 0e202372d5..f90eeff1b7 100644 --- a/src/include/catalog/namespace.h +++ b/src/include/catalog/namespace.h @@ -144,6 +144,7 @@ extern void GetTempNamespaceState(Oid *tempNamespaceId, Oid *tempToastNamespaceId); extern void SetTempNamespaceState(Oid tempNamespaceId, Oid tempToastNamespaceId); +extern void InitRemoveTempRelations(void); extern void ResetTempTableNamespace(void); extern OverrideSearchPath *GetOverrideSearchPath(MemoryContext context);
Re: removal of dangling temp tables
On Fri, Dec 14, 2018 at 12:27 PM Alvaro Herrera wrote: > Hmm, I think in the case covered by your commit, that is a session that > crashes with a few thousands of temp tables, this new patch might cause > a failure to open a new session altogether. Oh, good point. Or if the catalog is corrupted. > Maybe it'd be better to change temp table removal to always drop > max_locks_per_transaction objects at a time (ie. commit/start a new > transaction every so many objects). We're basically just doing DROP SCHEMA ... CASCADE, so I'm not sure how we'd implement that, but I agree it would be significantly better. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: removal of dangling temp tables
Robert Haas writes: > On Fri, Dec 14, 2018 at 11:29 AM Alvaro Herrera > wrote: >> I think the best way to fix this is to call RemoveTempRelations() >> unconditionally at session start (without doing the rest of the temp >> table setup, just the removal.) > That would certainly simplify things. I think I thought about that as > far back as a734fd5d1c309cc553b7c8c79fba96218af090f7 but it seemed > like a significant behavior change and I wasn't sure that everyone > would like it. In particular, it adds overhead to backend startup > that, in the case of a large temp schema, could be fairly long. > Nevertheless, I tentatively think that a change like this is a good > idea. I wouldn't back-patch it, though. I seem to recall discussions about having crash recovery go around and clean out temp tables. That seems like a better plan than penalizing every session start with this. regards, tom lane
Re: Ryu floating point output patch
> "Thomas" == Thomas Munro writes: >>> Do we have any reason for the restriction beyond stylistic preference? >> No. Robert and Tom were against allowing mixing declarations and >> code, a few more against // comments. I don't quite agree with >> either, but getting to the rest of C99 seemed more important than >> fighting those out at that moment. Thomas> -1 for making superficial changes to code that we are Thomas> "vendoring", unless it is known to be dead/abandoned and we're Thomas> definitively forking it. It just makes it harder to track Thomas> upstream's bug fixes and improvements, surely? Well, the question there is how far to take that principle; do we avoid pgindent'ing the code, do we avoid changing uint64_t etc. to uint64 etc., and so on. (I notice that a stray uint8_t that I left behind broke the Windows build but not the linux/bsd one, so something would have to be fixed in the windows build in order to avoid having to change that.) The Ryu code is perhaps an extreme example because it follows almost the exact reverse of our coding standard. -- Andrew (irc:RhodiumToad)
Re: removal of dangling temp tables
On Fri, Dec 14, 2018 at 12:57 PM Tom Lane wrote: > I seem to recall discussions about having crash recovery go around > and clean out temp tables. That seems like a better plan than > penalizing every session start with this. Well, crash recovery already removes the files, but it can't really remove the catalog entries, can it? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
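To make the catalog side of this concrete: after a crash the underlying files are gone, but the pg_temp_N schema and its pg_class rows can linger until a session with the same backend ID reuses that schema. A hypothetical diagnostic query (my own sketch, not part of any patch in this thread) to spot such leftovers:

SELECT n.nspname AS temp_schema, count(c.oid) AS cataloged_relations
FROM pg_namespace n
LEFT JOIN pg_class c ON c.relnamespace = n.oid
WHERE n.nspname ~ '^pg_temp_[0-9]+$'
GROUP BY n.nspname
ORDER BY n.nspname;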
Re: Speeding up text_position_next with multibyte encodings
On 11/30/18, Dmitry Dolgov <9erthali...@gmail.com> wrote: > Unfortunately, patch doesn't compile anymore due: > > varlena.c: In function ‘text_position_next_internal’: > varlena.c:1337:13: error: ‘start_ptr’ undeclared (first use in this > function) > Assert(start_ptr >= haystack && start_ptr <= haystack_end); > > Could you please send an updated version? For now I'm moving it to the next > CF. I signed up to be a reviewer, and I will be busy next month, so I went ahead and fixed the typo in the patch that broke assert-enabled builds. While at it, I standardized on the spelling "start_ptr" in a few places to match the rest of the file. It's a bit concerning that it wouldn't compile with asserts, but the patch was written by a committer and seems to work. On 10/19/18, Heikki Linnakangas wrote: > This is a win in most cases. One case is slower: calling position() with > a large haystack input, where the match is near the end of the string. I wanted to test this case in detail, so I ran the attached script, which runs position() in three scenarios:
short: 1 character needle at the end of a ~1000 character haystack, repeated 1 million times
medium: 50 character needle at the end of a ~1 million character haystack, repeated 1 million times
long: 250 character needle at the end of an ~80 million character haystack (~230MB, comfortably below the 256MB limit Heikki reported), done just one time
I took the average of 5 runs using Korean characters in both UTF-8 (3-byte) and EUC-KR (2-byte) encodings.
UTF-8
length   master   patch
short    2.26s    2.51s
med      2.28s    2.54s
long     3.07s    3.11s
EUC-KR
length   master   patch
short    2.29s    2.37s
med      2.29s    2.36s
long     1.75s    1.71s
With UTF-8, the patch is 11-12% slower on short and medium strings, and about the same on long strings. With EUC-KR, the patch is about 3% slower on short and medium strings, and 2-3% faster on long strings. It seems the worst case is not that bad, and could be optimized, as Heikki said. -John Naylor From ca7a228174acb8b85b87642bb6df182139587c05 Mon Sep 17 00:00:00 2001 From: John Naylor Date: Sun, 18 Nov 2018 17:40:09 +0700 Subject: [PATCH v2] Use single-byte Boyer-Moore-Horspool search even with multibyte encodings. The old implementation first converted the input strings to arrays of wchars, and performed the search on those. However, the conversion is expensive, and for a large input string, consumes a lot of memory. Avoid the conversion, and instead use the single-byte algorithm even with multibyte encodings. That has a couple of problems, though. Firstly, we might get fooled if the needle string's byte sequence appears embedded inside a different string. We might incorrectly return a match in the middle of a multi-byte character. The second problem is that the text_position function needs to return the position of the match, counted in characters, rather than bytes. We can work around both of those problems by an extra step after finding a match. Walk the string one character at a time, starting from the beginning, until we reach the position of the match. We can then check if the match was found at a valid character boundary, which solves the first problem, and we can count the characters, so that we can return the character offset. Walking the characters is faster than the wchar conversion, especially in the case that there are no matches and we can avoid it altogether. Also, avoiding the large allocation allows these functions to work with inputs larger than 256 MB, even with multibyte encodings. 
For UTF-8, we can even skip the step to verify that the match lands on a character boundary, because in UTF-8, the byte sequence of one character cannot appear in the middle of a different character. I think many other encodings have the same property, but I wasn't sure, so I only enabled that optimization for UTF-8. Most of the callers of the text_position_setup/next functions were actually not interested in the position of the match, counted in characters. To cater for them, refactor the text_position_next() interface into two parts: searching for the next match (text_position_next()), and returning the current match's position as a pointer (text_position_get_match_ptr()) or as a character offset (text_position_get_match_pos()). Most callers of text_position_setup/next are not interested in the character offsets, so getting the pointer to the match within the original string is a more convenient API for them. It also allows skipping the character counting with UTF-8 encoding altogether. --- src/backend/utils/adt/varlena.c | 475 +--- 1 file changed, 253 insertions(+), 222 deletions(-) diff --git a/src/backend/utils/adt/varlena.c b/src/backend/utils/adt/varlena.c index 0fd3b15748..90f5a37f08 100644 --- a/src/backend/utils/adt/varlena.c +++ b/src/ba
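As a reminder of the user-visible semantics the patch has to preserve, position() reports offsets in characters, not bytes, which is what the character-walking step described in the commit message pays for (example assumes a UTF-8 database; the strings are arbitrary):

SELECT position('щи' in 'борщи');                    -- 4, counted in characters
SELECT octet_length('борщи'), char_length('борщи');  -- 10 bytes vs 5 characters in UTF-8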
Re: Ryu floating point output patch
Andrew Gierth writes: > "Thomas" == Thomas Munro writes: > Thomas> -1 for making superficial changes to code that we are > Thomas> "vendoring", unless it is known to be dead/abandoned and we're > Thomas> definitively forking it. It just makes it harder to track > Thomas> upstream's bug fixes and improvements, surely? > Well, the question there is how far to take that principle; do we avoid > pgindent'ing the code, do we avoid changing uint64_t etc. to uint64 > etc., and so on. > The Ryu code is perhaps an extreme example because it follows almost the > exact reverse of our coding standard. My heart kind of sinks when looking at this patch, because it's a large addition of not-terribly-well-documented code that nobody here really understands, never mind the problem that it's mostly not close to our own coding standards. Our track record in borrowing code from "upstream" projects is pretty miserable: almost without exception, we've found ourselves stuck with maintaining such code ourselves after a few years. I don't see any reason to think that wouldn't be true here; in fact there's much more reason to worry here than we had for, eg, borrowing the regex code. The maintenance track record of this github repo appears to span six months, and it's now been about four months since the last commit. It might be abandonware already. Is this a path we really want to go down? I'm not convinced the cost/benefit ratio is attractive. If we do go down this path, though, I'm not too worried about making it simple to absorb upstream fixes; I bet there will be few to none. (Still, you might want to try to automate and document the coding format conversion steps, along the line of what I've done recently for our copy of tzcode.) regards, tom lane
Re: Ryu floating point output patch
Hi, On 2018-12-14 13:25:29 -0500, Tom Lane wrote: > Our track record in borrowing code from "upstream" projects is pretty > miserable: almost without exception, we've found ourselves stuck with > maintaining such code ourselves after a few years. I don't see any > reason to think that wouldn't be true here; in fact there's much more > reason to worry here than we had for, eg, borrowing the regex code. > The maintenance track record of this github repo appears to span six > months, and it's now been about four months since the last commit. > It might be abandonware already. It's been absorbed into MSVC's standard library and a bunch of other projects, so there's certainly some other prominent adoption. The last commit was a month ago, no? November 6th afaict. > Is this a path we really want to go down? I'm not convinced the > cost/benefit ratio is attractive. float->text conversion is one of the major bottlenecks when backing up postgres, it's definitely a pain-point in practice. I've not really seen a nicer implementation anywhere, not even close. Greetings, Andres Freund
Re: removal of dangling temp tables
Robert Haas writes: > On Fri, Dec 14, 2018 at 12:57 PM Tom Lane wrote: >> I seem to recall discussions about having crash recovery go around >> and clean out temp tables. That seems like a better plan than >> penalizing every session start with this. > Well, crash recovery already removes the files, but it can't really > remove the catalog entries, can it? Hm. It *could*, if we wanted it to run some transactions after finishing recovery. But I guess I wonder why bother; if the disk space is gone then there's not that much reason to be in a hurry to get rid of the catalog entries. The particular problem Alvaro is complaining of might be better handled by ignoring temp tables while computing max freeze age etc. I have some recollection that we'd intentionally included them, but I wonder why really; it's not like autovacuum is going to be able to do anything about an ancient temp table. Alternatively, maybe we could have backends flag whether they've taken ownership of their temp schemas or not, and let autovacuum flush old temp tables if not? regards, tom lane
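A hedged sketch of the freeze-age observation (a plain catalog query, not from any patch here): this shows whether leftover temp relations are what is pinning a database's freeze horizon.

SELECT c.oid::regclass AS temp_rel, age(c.relfrozenxid) AS xid_age
FROM pg_class c
WHERE c.relpersistence = 't' AND c.relkind = 'r'
ORDER BY age(c.relfrozenxid) DESC;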
Re: Computing the conflict xid for index page-level-vacuum on primary
On Fri, Dec 14, 2018 at 4:43 AM Andres Freund wrote: > This leads me to think that we possibly should move computation of the > last removed xid from recovery to the primary, during the generation of > the xl_btree_delete WAL record. Do I understand correctly that we need this xid computation, because we may delete some index tuples using kill_prior_tuple before we prune corresponding heap tuples (which would be also replicated and cause required conflict)? If so, then can we just give up with that? That is before setting kill_prior_tuple = true, prune corresponding heap tuples. -- Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: removal of dangling temp tables
Hi, On 2018-12-14 13:35:50 -0500, Tom Lane wrote: > Hm. It *could*, if we wanted it to run some transactions after > finishing recovery. How, we don't have infrastructure for changing databases yet? > But I guess I wonder why bother; if the disk > space is gone then there's not that much reason to be in a hurry > to get rid of the catalog entries. The particular problem Alvaro > is complaining of might be better handled by ignoring temp tables > while computing max freeze age etc. I have some recollection that > we'd intentionally included them, but I wonder why really; it's > not like autovacuum is going to be able to do anything about an > ancient temp table. We can't truncate the clog, adapt the xid horizon, etc if there's any temp tables. Otherwise you'd get failures when reading from one, no? Greetings, Andres Freund
Re: New function pg_stat_statements_reset_query() to reset statistics of a specific query
On Wed, Nov 28, 2018 at 1:43 PM Alvaro Herrera wrote: > Right, I think option 4 is a clear improvement over option 1. I can get > behind that one. Since not many people care to vote, I think this tips > the scales enough to that side. I'm showing up very late to the party here, but I like option 1 best. I feel like the SQL standard has a pretty clear idea that NULL is how you represent a value is unknown, which shows up in a lot of places. Deciding that we're going to use a different sentinel value in this one case because NULL is a confusing concept in general seems pretty strange to me. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: removal of dangling temp tables
On Fri, Dec 14, 2018 at 6:35 PM Tom Lane wrote: > Hm. It *could*, if we wanted it to run some transactions after > finishing recovery. It'd have to launch a separate process per database. That would be useful infrastructure for other things, too, like automatic catalog upgrades in minor releases, but I'm not volunteering to write that infrastructure right now. > Alternatively, maybe we could have backends flag whether they've > taken ownership of their temp schemas or not, and let autovacuum > flush old temp tables if not? Yes, that seems like a possibly promising approach. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Computing the conflict xid for index page-level-vacuum on primary
Hi, On 2018-12-14 21:39:48 +0300, Alexander Korotkov wrote: > On Fri, Dec 14, 2018 at 4:43 AM Andres Freund wrote: > > This leads me to think that we possibly should move computation of the > > last removed xid from recovery to the primary, during the generation of > > the xl_btree_delete WAL record. > > Do I understand correctly that we need this xid computation, because > we may delete some index tuples using kill_prior_tuple before we prune > corresponding heap tuples (which would be also replicated and cause > required conflict)? Correct. > If so, then can we just give up with that? That is before setting > kill_prior_tuple = true, prune corresponding heap tuples. That'd require WAL logging such changes, which'd be pretty bad, because we'd need to do it one-by-one. What we could do, and what I suggested earlier, is that we do a bit more pruning when emitting such WAL records. Greetings, Andres Freund
Re: Ryu floating point output patch
Andres Freund writes: > On 2018-12-14 13:25:29 -0500, Tom Lane wrote: >> The maintenance track record of this github repo appears to span six >> months, and it's now been about four months since the last commit. >> It might be abandonware already. > The last commit was a month ago, no? November 6th afaict. Hmm, the commit history I was looking at ended on Aug 19, but I must've done something wrong, because going in from a different angle shows me commits up to November. > It's been absorbed into MSVC's standard library and a bunch of other > projects, so there's certainly some other prominent adoption. If we think that's going on, maybe we should just sit tight till glibc absorbs it. regards, tom lane
Re: Ryu floating point output patch
Hi, On 2018-12-14 13:47:53 -0500, Tom Lane wrote: > Andres Freund writes: > > It's been absorbed into MSVC's standard library and a bunch of other > > projects, so there's certainly some other prominent adoption. > > If we think that's going on, maybe we should just sit tight till glibc > absorbs it. All the stuff it'll have to put above it will make it slower anyway. It's not like this'll be a feature that's hard to rip out if all libc's have fast roundtrip safe conversion routines, or if something better comes along. Greetings, Andres Freund
Re: Ryu floating point output patch
> "Tom" == Tom Lane writes: Tom> The maintenance track record of this github repo appears to span Tom> six months, The algorithm was only published six months ago. -- Andrew (irc:RhodiumToad)
ExecBuildGroupingEqual versus collations
In pursuit of places that might not be on board with non-default collations, I tried sticking Assert(PG_GET_COLLATION() == C_COLLATION_OID); into nameeq() and the other name comparison functions, in my build with type name's typcollation changed to "C". I'm down to one place in the core regression tests that falls over because of that: ExecBuildGroupingEqual() figures it can get away with InitFunctionCallInfoData(*fcinfo, finfo, 2, InvalidOid, NULL, NULL); i.e. passing collation = InvalidOid to the type-specific equality functions. Now, it's certainly true that nameeq() doesn't need a collation spec today, any more than texteq() does, because both types legislate that equality is bitwise. But if we leave ExecBuildGroupingEqual like this, we're mandating that no type anywhere, ever, can have a collation-dependent notion of equality. Is that really a restriction we're comfortable with? citext is sort of the poster child here, because it already wishes it could have collation-dependent equality. We'd have to do some work to pass a reasonable collation choice through to this code from wherever we have the info available, but I don't think it'd be a huge amount of work. BTW, this case is perhaps a bit of a counterexample to Peter's idea about doing collation lookups earlier: if we change this, we'd presumably have to do collation lookup right here, even though we know very well that common types' equality functions won't use the information. In any case, I think we need a policy decision about whether it's our intent to allow collation-dependent equality or not. Comments? regards, tom lane
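A small illustration of why citext is the poster child: grouping already depends on a type-specific, non-bitwise notion of equality, it just cannot be collation-dependent as long as InvalidOid is passed here (sketch assumes the citext extension is installed):

CREATE EXTENSION IF NOT EXISTS citext;
SELECT v, count(*)
FROM (VALUES ('Foo'::citext), ('FOO'), ('foo')) AS t(v)
GROUP BY v;
-- one group of three rows, because citext compares case-insensitively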
Re: Ryu floating point output patch
Andrew Gierth writes: > "Tom" == Tom Lane writes: > Tom> The maintenance track record of this github repo appears to span > Tom> six months, > The algorithm was only published six months ago. Doesn't really affect my point: there's little reason to believe that there will be an active upstream several years down the pike. regards, tom lane
Re: Ryu floating point output patch
> "Tom" == Tom Lane writes: Tom> (Still, you might want to try to automate and document the coding Tom> format conversion steps, along the line of what I've done recently Tom> for our copy of tzcode.) Working around our declaration-after-statement prohibition required manually moving some lines of code, manually separating some declarations from their assignments (removing const qualifiers), and as a last resort introducing new code blocks. I doubt it could be automated in any sane way. (This might be an argument to relax that rule.) Mechanical search-and-replace accounted for the _t suffix on types, the addition of the UINT64CONSTs, and changing assert to Assert; that much could be automated. pgindent did a pretty horrible job on the comments, a script could probably do better. I guess removal of the unwanted #ifs could be automated without too much difficulty. (But I'm not likely going to do any of this in the forseeable future, because I don't expect it to be useful.) -- Andrew (irc:RhodiumToad)
Re: 'infinity'::Interval should be added
On Thu, Dec 13, 2018 at 9:42 AM Simon Riggs wrote: > At present we have a timestamp of 'infinity' and infinite ranges, but no > infinite interval > > SELECT 'infinity'::timestamp; > works > > SELECT 'infinity'::interval; > ERROR: invalid input syntax for type interval: "infinity" > > Seems a strange anomaly that we should fix. Why? I consider it somewhat of a wart that timestamps allow infinity - it adds special case coding all over the place. Allowing intervals to be infinite as well seems like it'll just create more special cases without really solving anything. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: ExecBuildGroupingEqual versus collations
On Fri, 14 Dec 2018 at 14:25, Tom Lane wrote: > Now, it's certainly true that nameeq() doesn't need a collation spec > today, any more than texteq() does, because both types legislate that > equality is bitwise. But if we leave ExecBuildGroupingEqual like this, > we're mandating that no type anywhere, ever, can have a > collation-dependent notion of equality. Is that really a restriction > we're comfortable with? citext is sort of the poster child here, > because it already wishes it could have collation-dependent equality. > There already seems to be a policy that individual types are not allowed to have their own concepts of equality: postgres=> select 'NaN'::float = 'NaN'::float; ?column? -- t (1 row) postgres=> According to the IEEE floating point specification, this should be f not t. To me, this suggests that we have already made this decision. Any type that needs an "almost but not quite equality" concept needs to define a custom operator or function. I think this is a reasonable approach to take for collation-related issues. Interesting relevant Python observation: >>> a = float ('NaN') >>> a is a True >>> a == a False >>> So Python works quite differently from Postgres in this respect (and many others).
Re: Add timeline to partial WAL segments
On Thu, Dec 13, 2018 at 12:17 AM Michael Paquier wrote: > On Wed, Dec 12, 2018 at 07:54:05AM -0500, David Steele wrote: > > The LSN switch point is often the same even when servers are going to > > different timelines. If the LSN is different enough then the problem > > solves itself since the .partial will be on an entirely different > > segment. > > That would mean that WAL forked exactly at the same record. You have > likely seen more cases where than can happen in real life than I do. Suppose that the original master fails during an idle period, and we promote a slave. But we accidentally promote a slave that can't serve as the new master, like because it's in a datacenter with an unreliable network connection or one which is about to be engulfed in lava. So, we go to promote a different slave, and because we never got around to reconfiguring the standbys to follow the previous promotion, kaboom. Or, suppose we do PITR to recover from some user error, but then somebody screws up the contents of the recovered cluster and we have to do it over again. Naturally we'll recover to the same point. The new TLI is the only thing that is guaranteed to be unique with each new promotion, and I would guess that it is therefore the right thing to use to disambiguate them. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: ExecBuildGroupingEqual versus collations
Isaac Morland writes: > On Fri, 14 Dec 2018 at 14:25, Tom Lane wrote: >> Now, it's certainly true that nameeq() doesn't need a collation spec >> today, any more than texteq() does, because both types legislate that >> equality is bitwise. But if we leave ExecBuildGroupingEqual like this, >> we're mandating that no type anywhere, ever, can have a >> collation-dependent notion of equality. Is that really a restriction >> we're comfortable with? citext is sort of the poster child here, >> because it already wishes it could have collation-dependent equality. > There already seems to be a policy that individual types are not allowed to > have their own concepts of equality: > postgres=> select 'NaN'::float = 'NaN'::float; > ?column? > -- > t > (1 row) > According to the IEEE floating point specification, this should be f not t. I don't think that's a particularly relevant counter-example. The problem with following the IEEE spec for that is that it breaks pretty much everybody's notion of equality, in particular the reflexive axiom (A=A for any A), which is one of the axioms needed for btrees to work. That does *not* preclude allowing non-bitwise-equal values to be considered equal, which is the case that citext would like --- and, indeed, a case that IEEE math also requires (0 = -0), and which we do support for floats. regards, tom lane
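Both points are easy to check against current behavior (nothing here depends on a patch):

SELECT 'NaN'::float8 = 'NaN'::float8 AS nan_eq_nan,         -- true: equality stays reflexive
       '0'::float8 = '-0'::float8    AS zero_eq_minus_zero; -- true: distinct bit patterns compare equal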
Re: Ryu floating point output patch
> "Andres" == Andres Freund writes: >> Is this a path we really want to go down? I'm not convinced the >> cost/benefit ratio is attractive. Andres> float->text conversion is one of the major bottlenecks when Andres> backing up postgres, it's definitely a pain-point in practice. Also an issue with queries. I got into this whole area after diagnosing a problem for a user on IRC who was seeing a 4x slowdown for PG as compared to MSSQL, from 30 seconds to 120 seconds. We determined that the _entire_ difference was accounted for by float conversion. Now that was an unusual case, downloading a large dataset of floats at once into a statistics program, but it's very easy to show proportionally large overheads from float output: (using this table: create table flttst4 as select i a, i b, i c, i d, i::float8 e, random() f from generate_series(1::bigint, 1000::bigint) i; ) postgres=# copy flttst4(d) to '/dev/null'; -- bigint column COPY 1000 Time: 2166.001 ms (00:02.166) postgres=# copy flttst4(e) to '/dev/null'; -- float8 col, integer values COPY 1000 Time: 4233.005 ms (00:04.233) postgres=# copy flttst4(f) to '/dev/null'; -- float8 col, random() COPY 1000 Time: 7261.421 ms (00:07.261) -- vs. timings with Ryu: postgres=# copy flttst4(e) to '/dev/null'; -- float8 col, integer values COPY 1000 Time: 2391.725 ms (00:02.392) postgres=# copy flttst4(f) to '/dev/null'; -- float8 col, random() COPY 1000 Time: 2245.699 ms (00:02.246) -- Andrew (irc:RhodiumToad)
Re: ExecBuildGroupingEqual versus collations
Hi, On 2018-12-14 14:25:30 -0500, Tom Lane wrote: > In pursuit of places that might not be on board with non-default > collations, I tried sticking > > Assert(PG_GET_COLLATION() == C_COLLATION_OID); > > into nameeq() and the other name comparison functions, in my build with > type name's typcollation changed to "C". I'm down to one place in the > core regression tests that falls over because of that: > ExecBuildGroupingEqual() figures it can get away with > > InitFunctionCallInfoData(*fcinfo, finfo, 2, > InvalidOid, NULL, NULL); > > i.e. passing collation = InvalidOid to the type-specific equality > functions. Yea, I just cargo-culted that behaviour forward, the code previously did: bool execTuplesMatch(TupleTableSlot *slot1, TupleTableSlot *slot2, int numCols, AttrNumber *matchColIdx, FmgrInfo *eqfunctions, MemoryContext evalContext) ... /* Apply the type-specific equality function */ if (!DatumGetBool(FunctionCall2(&eqfunctions[i], attr1, attr2))) { result = false; /* they aren't equal */ break; } > Now, it's certainly true that nameeq() doesn't need a collation spec > today, any more than texteq() does, because both types legislate that > equality is bitwise. But if we leave ExecBuildGroupingEqual like this, > we're mandating that no type anywhere, ever, can have a > collation-dependent notion of equality. Is that really a restriction > we're comfortable with? citext is sort of the poster child here, > because it already wishes it could have collation-dependent equality. Don't we rely on that in other places too? > We'd have to do some work to pass a reasonable collation choice through > to this code from wherever we have the info available, but I don't > think it'd be a huge amount of work. Yea, I think it shouldn't be too hard for most callers. I think one issue here is that it's not clear how it'd be sensible to have collation dependent equality for cases where we don't have a real way to override the collation for an operation. It's not too hard to imagine adding something to GROUP BY, but set operations seem harder. > BTW, this case is perhaps a bit of a counterexample to Peter's idea about > doing collation lookups earlier: if we change this, we'd presumably have > to do collation lookup right here, even though we know very well that > common types' equality functions won't use the information. Hm. Greetings, Andres Freund
Re: 'infinity'::Interval should be added
On Fri, 14 Dec 2018 at 15:16, Robert Haas wrote: > Why? I consider it somewhat of a wart that timestamps allow infinity > - it adds special case coding all over the place. Allowing intervals > to be infinite as well seems like it'll just create more special cases > without really solving anything. > Au contraire, it eliminates special case coding all over the place. For example, if I have authorizations that don't expire, I can just give them an expiration of 'infinity' and operations like "today's date less than expiration" just work; I don't have to say "expiration null or greater than today's date". I would be interested if you have an example where the ability of date/timestamp values to be infinite adds special case coding. If we allow intervals to be infinity, then we will eliminate more special cases. Right now, whenever timestamp values are subtracted, the programmer has to consider what happens if an operand is infinity and handle that case specially. With infinite intervals allowed, this problem is reduced. For example, how long until expiry? Right now: CASE WHEN expiry = 'infinity' THEN NULL WHEN expiry = '-infinity' THEN '0' ELSE expiry - current_timestamp END With the proposal: expiry - current_timestamp And another improvement: infinity is represented by 'infinity' rather than the somewhat ambiguous NULL. Just for definiteness, I believe the following are the correct semantics for subtraction involving infinity: ∞ - t = ∞ (t ≠ ∞) -∞ - t = -∞ (t ≠ -∞) ∞ - ∞ = err -∞ - -∞ = err I'm not sure whether "err" should be an error, as it is now ("cannot subtract infinite dates") and like division by 0, or NULL. The wart I'm worried about is subtraction of infinite dates. Right now dates subtract to give integers; and there are no infinite integers. All the clever solutions to this I have right now involve making highly backward-incompatible changes.
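For concreteness, the current behavior the CASE expression above works around:

SELECT 'infinity'::timestamptz - current_timestamp;
-- currently this raises the subtraction error discussed above; under the
-- proposal it would instead return an infinite interval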
Re: ExecBuildGroupingEqual versus collations
Andres Freund writes: > On 2018-12-14 14:25:30 -0500, Tom Lane wrote: >> Now, it's certainly true that nameeq() doesn't need a collation spec >> today, any more than texteq() does, because both types legislate that >> equality is bitwise. But if we leave ExecBuildGroupingEqual like this, >> we're mandating that no type anywhere, ever, can have a >> collation-dependent notion of equality. Is that really a restriction >> we're comfortable with? citext is sort of the poster child here, >> because it already wishes it could have collation-dependent equality. > Don't we rely on that in other places too? Possibly; the regression tests probably don't stress type "name" too hard, so my Assert might well not be finding some other code paths that do likewise. The question at hand is whether we're going to say those are bugs to be fixed, or that it's OK as-is. > I think one issue here is that it's not clear how it'd be sensible to > have collation dependent equality for cases where we don't have a real > way to override the collation for an operation. It's not too hard to > imagine adding something to GROUP BY, but set operations seem harder. Yeah, fair point. But we've got comparable issues with respect to which equality operator to use in the first place, for a lot of these constructs, cf https://www.postgresql.org/message-id/10492.1531515...@sss.pgh.pa.us (which I've not made any further progress on). Unless you want to throw up your hands and say that *all* equality is bitwise, we need to think about ways to deal with that. regards, tom lane
Re: 'infinity'::Interval should be added
Isaac Morland writes: > I would be interested if you have an example where the ability of > date/timestamp values to be infinite adds special case coding. I think Robert is talking about the implementation functions for timestamp-related operations, which typically do have to special-case infinite inputs. I take your point that there might be fewer special cases in the calling SQL code, though. > The wart I'm worried about is subtraction of infinite dates. Right now > dates subtract to give integers; and there are no infinite integers. All > the clever solutions to this I have right now involve making highly > backward-incompatible changes. As far as that goes, I'm content to say that infinity is outside the domain of type date. If you need infinities, use timestamp. regards, tom lane
Re: inconsistency and inefficiency in setup_conversion()
On 12/1/18, Dmitry Dolgov <9erthali...@gmail.com> wrote: > I see that the author keeps patch updated, but I'm a bit worried because of > lack of full review since probably May. I'm moving it to the next CF, let's > see > if there would be more feedback. > > P.S. adding Daniel, since he is assigned as a reviewer. Having heard nothing in a while, I've removed Daniel as a reviewer to make room for someone else. He is, of course free to re-add himself. v8 is attached. Since it's been a few months since last discussion, I'd like to summarize the purpose of this patch and advocate for its inclusion in v12: 1. Correctness In the intro thread [1], I showed that object comments on some conversions are wrong, and hard to fix given the current setup. This is a documentation bug of sorts. 2. Clean-up Currently, utils/mb/conversion_procs/Makefile has an ad-hoc script to generate the SQL file, which has to be duplicated in the MSVC tooling, and executed by initdb.c. Storing the conversions in .dat files removes the need for any of that. 3. Performance This patch shaves 5-6% off of initdb. Not as much as hoped, but still a nice bonus. [1] https://www.postgresql.org/message-id/CAJVSVGWtUqxpfAaxS88vEGvi%2BjKzWZb2EStu5io-UPc4p9rSJg%40mail.gmail.com --John Naylor From bdbdb36129708d1d417f8db9009b377c3d4e3e90 Mon Sep 17 00:00:00 2001 From: John Naylor Date: Fri, 14 Dec 2018 15:19:54 -0500 Subject: [PATCH v8 1/2] Add pg_language lookup. This didn't seem worth doing before, but an upcoming commit will add 88 entries to pg_proc whose languge is 'c'. In addition, we can now use 'internal' in Gen_fmgr.pl instead of looking up the OID. This simplifies that script and its Makefile a bit. --- doc/src/sgml/bki.sgml| 6 +-- src/backend/catalog/genbki.pl| 8 +++ src/backend/utils/Gen_fmgrtab.pl | 7 ++- src/backend/utils/Makefile | 8 +-- src/include/catalog/pg_proc.dat | 92 src/include/catalog/pg_proc.h| 2 +- src/tools/msvc/Solution.pm | 4 +- 7 files changed, 64 insertions(+), 63 deletions(-) diff --git a/doc/src/sgml/bki.sgml b/doc/src/sgml/bki.sgml index 786abb95d4..37bb7a2346 100644 --- a/doc/src/sgml/bki.sgml +++ b/doc/src/sgml/bki.sgml @@ -409,8 +409,8 @@ that's error-prone and hard to understand, so for frequently-referenced catalogs, genbki.pl provides mechanisms to write symbolic references instead. Currently this is possible for references -to access methods, functions, operators, opclasses, opfamilies, and -types. The rules are as follows: +to access methods, functions, languages, operators, opclasses, opfamilies, +and types. The rules are as follows: @@ -420,7 +420,7 @@ Use of symbolic references is enabled in a particular catalog column by attaching BKI_LOOKUP(lookuprule) to the column's definition, where lookuprule - is pg_am, pg_proc, + is pg_am, pg_proc, pg_language, pg_operator, pg_opclass, pg_opfamily, or pg_type. BKI_LOOKUP can be attached to columns of diff --git a/src/backend/catalog/genbki.pl b/src/backend/catalog/genbki.pl index 8e2a2480be..8ccfcb7724 100644 --- a/src/backend/catalog/genbki.pl +++ b/src/backend/catalog/genbki.pl @@ -182,6 +182,13 @@ foreach my $row (@{ $catalog_data{pg_am} }) $amoids{ $row->{amname} } = $row->{oid}; } +# language OID lookup +my %langoids; +foreach my $row (@{ $catalog_data{pg_language} }) +{ + $langoids{ $row->{lanname} } = $row->{oid}; +} + # opclass OID lookup my %opcoids; foreach my $row (@{ $catalog_data{pg_opclass} }) @@ -256,6 +263,7 @@ foreach my $row (@{ $catalog_data{pg_type} }) # Map catalog name to OID lookup. 
my %lookup_kind = ( pg_am => \%amoids, + pg_language => \%langoids, pg_opclass => \%opcoids, pg_operator => \%operoids, pg_opfamily => \%opfoids, diff --git a/src/backend/utils/Gen_fmgrtab.pl b/src/backend/utils/Gen_fmgrtab.pl index ed16737a6a..b6809c7277 100644 --- a/src/backend/utils/Gen_fmgrtab.pl +++ b/src/backend/utils/Gen_fmgrtab.pl @@ -59,6 +59,8 @@ die "No include path; you must specify -I.\n" if !$include_path; # Note: We pass data file names as arguments and then look for matching # headers to parse the schema from. This is backwards from genbki.pl, # but the Makefile dependencies look more sensible this way. +# We currently only need pg_proc, but retain the possibility of reading +# more than one data file. my %catalogs; my %catalog_data; foreach my $datfile (@input_files) @@ -82,9 +84,6 @@ foreach my $datfile (@input_files) my $FirstGenbkiObjectId = Catalog::FindDefinedSymbol('access/transam.h', $include_path, 'FirstGenbkiObjectId'); -my $INTERNALlanguageId = - Catalog::FindDefinedSymbolFromData($catalog_data{pg_language}, - 'INTERNALlanguageId'); # Collect certain fields from pg_proc.dat. my @fmgr = (); @@ -94,7 +93,7 @@ foreach my $row (@{ $catalog_data{pg_proc} }) my %bki_values = %$row; # Select out just the rows for i
Re: valgrind issues on Fedora 28
On 12/14/18 4:58 PM, Tom Lane wrote: > ... > > In general, I'm not particularly on board with our valgrind.supp > carrying suppressions for code outside our own code base: I think > that's assuming WAY too much about which version of what is installed > on a particular box. > Fair point. > > Maybe we could do something to make it simpler to have custom > suppressions? Not sure what, though. > I was thinking that perhaps we could allow specifying a path to extra suppressions and pass that to valgrind. But we don't actually invoke valgrind, that's something people do on their own anyway - so we don't have anywhere to pass the path to. And whoever invokes valgrind can simply stick it directly into the command they're using (as it allows specifying multiple --suppressions= options). Or perhaps just put it into ~/.valgrindrc. So perhaps we should simply revert that commit and be done with it. One place that will need to solve it is the buildfarm client, but it could pick either of the options I mentioned. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Speeding up text_position_next with multibyte encodings
On 12/14/18, John Naylor wrote: > I signed up to be a reviewer, and I will be busy next month, so I went > ahead and fixed the typo in the patch that broke assert-enabled > builds. While at it, I standardized on the spelling "start_ptr" in a > few places to match the rest of the file. It's a bit concerning that > it wouldn't compile with asserts, but the patch was written by a > committer and seems to work. I just noticed that the contrib/citext test fails. I've set the status to waiting on author. -John Naylor
Re: Bogus EPQ plan construction in postgres_fdw
On Wed, Dec 12, 2018 at 8:00 PM Tom Lane wrote: > By chance I noticed that postgres_fdw's postgresGetForeignPlan() assumes > --- without any checking --- that the outer_plan it's given for a join > relation must have a NestLoop, MergeJoin, or HashJoin node at the top. > That's been wrong at least since commit 4bbf6edfb (which could cause > insertion of a Sort node on top) and it seems like a pretty unsafe > thing to Just Assume even without that. Thanks for taking care of this. I've looked at this code any number of times and never quite noticed the assumption that outer_plan had to be a join. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: 'infinity'::Interval should be added
On Fri, Dec 14, 2018 at 9:11 PM Tom Lane wrote: > Isaac Morland writes: > > I would be interested if you have an example where the ability of > > date/timestamp values to be infinite adds special case coding. > > I think Robert is talking about the implementation functions for > timestamp-related operations, which typically do have to special-case > infinite inputs. I take your point that there might be fewer special > cases in the calling SQL code, though. Exactly. And now that we have JIT compilation, the overhead of such cases is more worth considering than previously. To take a related example, int4pl() can be reduced to a single instruction if you ignore overflow -- but you can't ignore overflow, which means you need some more instructions to handle the overflow case, which means that the number of instructions is actually increasing quite a bit. Without JIT, maybe that's not too noticeable because you've got to pay the cost of setting up a stack frame for the function and tearing it down, but JIT can optimize those costs away. So essentially I think supporting special values like infinity boils down to trading away some small amount of performance -- more likely to be noticeable with JIT -- for some amount of possible programmer convenience. Some people may think that's a good trade, and other people may not like it. It just depends on whether the 'infinity' value is useful to you. If it is, you'll probably like the trade, because aside from the notational cost which Isaac mentioned, it's probably vastly faster to handle the infinity values in C code than to stick CASE..WHEN into an SQL query. If it's not, you may dislike it. If your application code now has to know about the possible 'infinity' value that it otherwise wouldn't have to worry about, you may dislike it for that reason also. I'm not sure there's one right answer here - my personal feeling is that infinite values are a wart, but I grant Isaac's point that they can sometimes simplify SQL coding. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
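The int4pl() example, seen from the SQL level: the overflow branch is exactly what keeps it from collapsing to a single add instruction, because the out-of-range case has to raise an error.

SELECT 2147483647 + 1;         -- ERROR:  integer out of range
SELECT 2147483647::int8 + 1;   -- 2147483648, once a wider type takes over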
Re: Catalog views failed to show partitioned table information.
On Fri, Dec 14, 2018 at 05:21:49PM +0530, Suraj Kharage wrote: > There are some catalog views which do not show the partitioned table and > its index entry. > One of them is "pg_indexes" which failed to show the partitioned index. > Attached the patch which fixes the same. I tend to agree with your comment here. pg_tables lists partitioned tables, but pg_indexes is forgetting about partitioned indexes. So this is a good thing to add. > Other views such as pg_stat*,pg_statio_* has the same problem for > partitioned tables and indexes. > Since the partitioned tables and its indexes considered as a dummy, they do > not have any significance in stat tables, > can we still consider adding relkind=p in these pg_stat_* views? Thoughts? I am less sure about that as partitioned relations do not have a physical presence. -- Michael
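A quick way to see the gap on a PostgreSQL 11 instance (table and index names are made up; relkind 'p' is a partitioned table and 'I' a partitioned index):

CREATE TABLE measurement (logdate date NOT NULL) PARTITION BY RANGE (logdate);
CREATE INDEX measurement_logdate_idx ON measurement (logdate);
SELECT relname, relkind FROM pg_class
WHERE relname IN ('measurement', 'measurement_logdate_idx');
-- shows relkind 'p' and 'I', yet without the patch the index is missing from:
SELECT indexname FROM pg_indexes WHERE tablename = 'measurement';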
log bind parameter values on error
Hello, I'd like to propose a patch to log bind parameter values not only when logging duration, but also on error (timeout in particular) or in whatever situation the statement normally gets logged. This mostly could be useful when the request originator doesn't log them either, so it's hard to reproduce the problem. Unfortunately, when enabled, the feature comes with some memory and CPU overhead, as we cannot convert certain values to text when in aborted transaction. We potentially could do the trick with built-in types, but it would need cautious work with composite types, and also require more computation on the logging stage, which is a risk of cascading errors. Custom types still wouldn't be loggable, even as passed by client, which would be not great. So I decided to cache textual representations on bind stage, which is especially easy if the client uses text protocol. Best, Alexey diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml index 4a7121a..7b38aed 100644 --- a/doc/src/sgml/config.sgml +++ b/doc/src/sgml/config.sgml @@ -6265,6 +6265,23 @@ log_line_prefix = '%m [%p] %q%u@%d/%a ' + + log_parameters (boolean) + + log_parameters configuration parameter + + + + +Controls whether the statement is logged with bind parameter values. +It adds some overhead, as postgres will cache textual +representations of parameter values in memory for all statements, +even if they eventually do not get logged. The default is +off. Only superusers can change this setting. + + + + log_statement (enum) diff --git a/src/backend/commands/portalcmds.c b/src/backend/commands/portalcmds.c index 5684997..ba431ce 100644 --- a/src/backend/commands/portalcmds.c +++ b/src/backend/commands/portalcmds.c @@ -94,7 +94,7 @@ PerformCursorOpen(DeclareCursorStmt *cstmt, ParamListInfo params, /* * Create a portal and copy the plan and queryString into its memory. 
*/ - portal = CreatePortal(cstmt->portalname, false, false); + portal = CreatePortal(cstmt->portalname, false, false, false); oldContext = MemoryContextSwitchTo(portal->portalContext); diff --git a/src/backend/commands/prepare.c b/src/backend/commands/prepare.c index 6036b73..363c096 100644 --- a/src/backend/commands/prepare.c +++ b/src/backend/commands/prepare.c @@ -404,6 +404,7 @@ EvaluateParams(PreparedStatement *pstmt, List *params, paramLI->parserSetup = NULL; paramLI->parserSetupArg = NULL; paramLI->numParams = num_params; + paramLI->hasTextValues = false; i = 0; foreach(l, exprstates) diff --git a/src/backend/executor/functions.c b/src/backend/executor/functions.c index fc7c605..6d047cd 100644 --- a/src/backend/executor/functions.c +++ b/src/backend/executor/functions.c @@ -921,6 +921,7 @@ postquel_sub_params(SQLFunctionCachePtr fcache, paramLI->parserSetup = NULL; paramLI->parserSetupArg = NULL; paramLI->numParams = nargs; + paramLI->hasTextValues = false; fcache->paramLI = paramLI; } else diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c index ad72667..532a365 100644 --- a/src/backend/executor/spi.c +++ b/src/backend/executor/spi.c @@ -1302,7 +1302,7 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan, else { /* In this path, error if portal of same name already exists */ - portal = CreatePortal(name, false, false); + portal = CreatePortal(name, false, false, false); } /* Copy the plan's query string into the portal */ @@ -2399,6 +2399,7 @@ _SPI_convert_params(int nargs, Oid *argtypes, paramLI->parserSetup = NULL; paramLI->parserSetupArg = NULL; paramLI->numParams = nargs; + paramLI->hasTextValues = false; for (i = 0; i < nargs; i++) { diff --git a/src/backend/nodes/params.c b/src/backend/nodes/params.c index 79197b1..3c3b152 100644 --- a/src/backend/nodes/params.c +++ b/src/backend/nodes/params.c @@ -31,6 +31,8 @@ * set of parameter values. If dynamic parameter hooks are present, we * intentionally do not copy them into the result. Rather, we forcibly * instantiate all available parameter values and copy the datum values. + * + * We don't copy textual representations here. */ ParamListInfo copyParamList(ParamListInfo from) @@ -53,6 +55,7 @@ copyParamList(ParamListInfo from) retval->parserSetup = NULL; retval->parserSetupArg = NULL; retval->numParams = from->numParams; + retval->hasTextValues = false; for (i = 0; i < from->numParams; i++) { @@ -229,6 +232,7 @@ RestoreParamList(char **start_address) paramLI->parserSetup = NULL; paramLI->parserSetupArg = NULL; paramLI->numParams = nparams; + paramLI->hasTextValues = false; for (i = 0; i < nparams; i++) { diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c index 5ab7d3c..2ad56a2 100644 --- a/src/backend/tcop/postgres.c +++ b/src/backend/tcop/postgres.c @@ -183,7 +183,7 @@ static void f
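For context, a sketch of how the proposed knob would be used, with the GUC name and superuser-only behavior taken from the patch text above; today parameter values are only attached to duration lines, e.g. from log_min_duration_statement.

-- existing behavior: parameters appear only with duration logging
SET log_min_duration_statement = 0;
-- proposed: also attach bind parameter values when a statement is logged
-- because of an error, statement_timeout, or log_statement
SET log_parameters = on;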
Re: Variable-length FunctionCallInfoData
Hi, On 2018-10-09 12:18:02 -0700, Andres Freund wrote: > Here's an updated version of the patch. Besides a rebase the biggest > change is that I've wrapped: > > On 2018-06-05 10:29:52 -0700, Andres Freund wrote: > > There's some added uglyness, which I hope we can polish a bit > > further. Right now we allocate a good number of FunctionCallInfoData > > struct on the stack - which doesn't quite work afterwards anymore. So > > the stack allocations, for the majoroity cases where the argument number > > is known, currently looks like: > > > > union { > > FunctionCallInfoData fcinfo; > > char *fcinfo_data[SizeForFunctionCallInfoData(0)]; > > } fcinfodata; > > FunctionCallInfo fcinfo = &fcinfodata.fcinfo; > > > > that's not pretty, but also not that bad. > > into a macro STACK_FCINFO_FOR_ARGS(varname, numargs). That makes the > code look much nicer. > > I think we should go for this. If there's some agreement on that I'll > perform a bit more polishing. > > I think it'd probably good to add accessors for value/nullness in > arguments that hide the difference between extension authors. Would probably mostly make sense if we backpatched > those for compatibility. > > > Also attached is a second patch that avoids all the duplication in > fmgr.[ch]. While the savings are nice, I'm a bit doubtful that the > amount of magic here is reasonable. Any opinions? Any comments? Otherwise I plan to press forward with this soon-ish. Greetings, Andres Freund
Re: Add timeline to partial WAL segments
On 12/14/18 3:26 PM, Robert Haas wrote: > On Thu, Dec 13, 2018 at 12:17 AM Michael Paquier wrote: >> On Wed, Dec 12, 2018 at 07:54:05AM -0500, David Steele wrote: >>> The LSN switch point is often the same even when servers are going to >>> different timelines. If the LSN is different enough then the problem >>> solves itself since the .partial will be on an entirely different >>> segment. >> >> That would mean that WAL forked exactly at the same record. You have >> likely seen more cases where than can happen in real life than I do. > > Suppose that the original master fails during an idle period, and we > promote a slave. But we accidentally promote a slave that can't serve > as the new master, like because it's in a datacenter with an > unreliable network connection or one which is about to be engulfed in > lava. Much more common than people think. > So, we go to promote a different slave, and because we never > got around to reconfiguring the standbys to follow the previous > promotion, kaboom. Exactly. > Or, suppose we do PITR to recover from some user error, but then > somebody screws up the contents of the recovered cluster and we have > to do it over again. Naturally we'll recover to the same point. > > The new TLI is the only thing that is guaranteed to be unique with > each new promotion, and I would guess that it is therefore the right > thing to use to disambiguate them. This is the conclusion we came to after a few months of diagnosing and working on this problem. The question in my mind: is it safe to back-patch? -- -David da...@pgmasters.net
Re: valgrind issues on Fedora 28
On 12/14/18 4:35 PM, Tomas Vondra wrote: > On 12/14/18 4:58 PM, Tom Lane wrote: >> ... >> >> In general, I'm not particularly on board with our valgrind.supp >> carrying suppressions for code outside our own code base: I think >> that's assuming WAY too much about which version of what is installed >> on a particular box. >> > Fair point. > >> Maybe we could do something to make it simpler to have custom >> suppressions? Not sure what, though. >> > I was thinking that perhaps we could allows specifying path to extra > suppressions and pass that to valgrind. > > But we don't actually invoke valgrind, that's something people do on > their own anyway - so we don't have anywhere to pass the path to. And > whoever invokes valgrind can simply stick it directly into the command > they're using (as it allows specifying multiple --suppressions= > options). Or perhaps just put it into ~/.valgrindrc. > > So perhaps we should simply revert that commit and be done with it. > > One place that will need to solve it is buildfarm client, but it could > pick either of the options I mentioned. > > The buildfarm client has a parameter in the config file for valgrind options. All you would have to do is add an extra --suppressions setting in there. cheers andrew -- Andrew Dunstan https://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Variable-length FunctionCallInfoData
> "Andres" == Andres Freund writes: >> I think it'd probably good to add accessors for value/nullness in >> arguments that hide the difference between > sake of extension authors. Would probably mostly make sense if we >> backpatched those for compatibility. Speaking as an affected extension author: don't backpatch them. The extension code has to cope with being compiled against a minor release that doesn't have the backpatch, so the extension author has to do their own workaround anyway. Having additional conditions based on the minor release is just more pain. -- Andrew (irc:RhodiumToad)
Re: Add timeline to partial WAL segments
On Fri, Dec 14, 2018 at 06:05:18PM -0500, David Steele wrote: > On 12/14/18 3:26 PM, Robert Haas wrote: >> The new TLI is the only thing that is guaranteed to be unique with >> each new promotion, and I would guess that it is therefore the right >> thing to use to disambiguate them. > > This is the conclusion we came to after a few months of diagnosing and > working on this problem. Hm. Okay. From that point of view I think that we should also make pg_receivewal able to generate the same kind of segment format when it detects a timeline switch for the last partial segment of the previous timeline. Switching to a new timeline makes the streaming finish, so we could use that to properly close the WAL segment with the new name. At the same time it would be nice to have a macro able to generate a partial segment name, and also one to check for it. > The question in my mind: is it safe to back-patch? That's unfortunately a visibility change, meaning that any existing backup solution able to handle .partial segments would get broken in a minor release. From that point of view this should go only on HEAD. The other patch to reorder the priority of the .ready files could go down without any problem though. -- Michael signature.asc Description: PGP signature
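Purely as an illustration of the name-building helper Michael mentions, here is one possible shape; the "<segment>.<newTLI>.partial" format is an assumption about the naming being discussed, not something settled in this thread:

#include "postgres.h"
#include "access/xlog_internal.h"

/*
 * Build a .partial file name for the last segment of the old timeline,
 * disambiguated by the timeline being switched to.  Illustrative only.
 */
static void
BuildPartialSegmentName(char *fname, size_t fnamelen,
						TimeLineID oldTLI, XLogSegNo segno,
						int wal_segsz_bytes, TimeLineID newTLI)
{
	char		segname[MAXFNAMELEN];

	XLogFileName(segname, oldTLI, segno, wal_segsz_bytes);
	snprintf(fname, fnamelen, "%s.%08X.partial", segname, newTLI);
}

The matching "check" helper would simply validate the same pattern on an existing file name.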
Re: Change pgarch_readyXlog() to return .history files first
On Fri, Dec 14, 2018 at 08:43:20AM -0500, David Steele wrote: > On 12/13/18 7:15 PM, Michael Paquier wrote: >> I would like to hear opinions from others though if we should consider >> that as an improvement or an actual bug fix. Changing the order of the >> files to map with what the startup process does when promoting does not >> sound like a bug fix to me, still this is not really invasive, so we >> could really consider it worth back-patching to reduce common pain from >> users when it comes to timeline handling. > > I think an argument can be made that it is a bug (ish). Postgres > generates the files in one order, and they get archived in a different > order. I am not completely sure either. In my experience, if there is any doubt on such definitions the best answer is to not backpatch. >> Or you could just use IsTLHistoryFileName here? > > We'd have to truncate .ready off the string to make that work, which > seems easy enough. Is that what you were thinking? Yes, that's the idea. pgarch_readyXlog returns the segment name which should be archived, so you could just compute it after detecting a .ready file. > One thing to consider is the check above is more efficient than > IsTLHistoryFileName() and it potentially gets run a lot. This check misses strspn(), so any file finishing with .history would get eaten even if that's unlikely to happen. >> If one wants to simply check this code, you can just create dummy orphan >> files in archive_status and see in which order they get removed. > > Seems awfully racy. Are there currently any tests like this for the > archiver that I can look at extending? Sorry for the confusion, I was referring to manual testing here. Thinking about it, we could have an automatic test to check for the file order pattern by creating dummy files, starting the server with the archiver enabled, and then parsing the logs, as orphan .ready files would get removed in the order their archiving is attempted, with one WARNING entry generated for each. I am not sure if that is worth a test though. -- Michael signature.asc Description: PGP signature
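A minimal sketch of the truncate-and-test approach David describes, assuming the caller has an archive_status entry name in hand; IsTLHistoryFileName() is the existing macro from access/xlog_internal.h and already performs the strspn() test mentioned above:

#include "postgres.h"
#include "access/xlog_internal.h"

/*
 * Does this archive_status entry (e.g. "00000002.history.ready") refer
 * to a timeline history file?  Sketch only.
 */
static bool
ready_file_is_history(const char *readyname)
{
	char		xlogname[MAXPGPATH];
	size_t		len = strlen(readyname);
	size_t		suffixlen = strlen(".ready");

	if (len <= suffixlen || len - suffixlen >= sizeof(xlogname) ||
		strcmp(readyname + len - suffixlen, ".ready") != 0)
		return false;

	/* truncate ".ready" off the string, then test the base name */
	memcpy(xlogname, readyname, len - suffixlen);
	xlogname[len - suffixlen] = '\0';

	return IsTLHistoryFileName(xlogname);
}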
Re: automatically assigning catalog toast oids
Hi, On 2018-12-13 20:21:12 -0500, Tom Lane wrote: > Andres Freund writes: > > [ separation of FirstBootstrapObjectId and FirstGenbkiObjectId ] > > It seems like we should also add a check to genbki.pl that the ending > value of GenbkiNextOid is <= FirstBootstrapObjectId, so that we'll > certainly notice when it's necessary to adjust those boundaries. Hm, if we go there, it'd probably also be good to check for <= FirstGenbkiObjectId? > >> I did *not* change record_plan_function_dependency(), it seems correct > >> that it doesn't track genbki assigned oids, they certainly can't change > >> while a server is running. But I'm not entirely clear on why that's not > >> using FirstNormalObjectId as the cutoff, so perhaps I'm missing > >> something. Similar with logic in predicate.c. > > It's not about whether they can change while the server is running, > it's about whether they're pinned. OIDs above 10000 might or might > not be pinned; we shouldn't hard-wire assumptions about that. So > I suspect you made the wrong choice there. Hm, but aren't pins set up by initdb's setup_depend()? IOW, genbki assigned oids will be in there? Am I missing something? > BTW, this ties into something that was bugging me this afternoon while > looking at dependencies on the default collation: there's a bunch of > places that special-case DEFAULT_COLLATION_OID to skip adding dependencies > on it, for instance this in dependency.c: > > /* > * We must also depend on the constant's collation: it could be > * different from the datatype's, if a CollateExpr was const-folded to > * a simple constant. However we can save work in the most common > * case where the collation is "default", since we know that's pinned. > */ > if (OidIsValid(con->constcollid) && > con->constcollid != DEFAULT_COLLATION_OID) > add_object_address(OCLASS_COLLATION, con->constcollid, 0, > context->addrs); > > I'm pretty sure that that special case is my fault, added in perhaps > over-eagerness to minimize the cost added by the collations feature. > Looking at it now, it seems like mostly a wart. But perhaps there is > a more general principle here: why don't we just hard-wire an assumption > that all OIDs below FirstGenbkiObjectId are pinned? I'm thinking of > getting rid of the DEFAULT_COLLATION_OID special cases in dependency- > recording logic and instead having someplace central like isObjectPinned > just assume that such OIDs are pinned, so that we don't bother to > consult or update pg_depend for them. > We could take it a bit further and not make the 'p' entries for such > OIDs in the first place, greatly reducing the size of pg_depend in > typical installations. Perhaps that'd confuse clients that look at > that catalog, though. I think that'd be good, it's the second largest table in a new cluster, and it's not going to get smaller. 'p' already is somewhat of a special case, so I'm not particularly concerned with clients having to understand that. Greetings, Andres Freund
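A rough sketch, not a patch, of the shortcut Tom is floating; FirstGenbkiObjectId is the boundary from the patch under discussion, and the carve-out for schema public is the one special case he mentions (it disappears if that OID is moved out of the range, as suggested below):

#include "postgres.h"
#include "access/transam.h"
#include "catalog/objectaddress.h"
#include "catalog/pg_namespace.h"

/*
 * Sketch: treat every genbki-assigned OID as pinned without consulting
 * pg_depend, except the deliberately unpinned "public" schema.
 */
static bool
object_is_trivially_pinned(const ObjectAddress *object)
{
	if (object->objectId >= (Oid) FirstGenbkiObjectId)
		return false;		/* not in the genbki range, look at pg_depend */

	if (object->classId == NamespaceRelationId &&
		object->objectId == PG_PUBLIC_NAMESPACE)
		return false;		/* schema public is intentionally not pinned */

	return true;
}

With something like this in isObjectPinned(), the DEFAULT_COLLATION_OID special cases in the dependency-recording code become unnecessary.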
Re: Add pg_partition_root to get top-most parent of a partition tree
On Fri, Dec 14, 2018 at 02:20:27PM +0900, Amit Langote wrote: > Given that pg_partition_root will return a valid result for any relation > that can be part of a partition tree, it seems strange that the above > sentence says "for the given partitioned table or partitioned index". It > should perhaps say: > > Return the top-most parent of the partition tree to which the given > relation belongs Check. > +static bool > +check_rel_for_partition_info(Oid relid) > +{ > +charrelkind; > + > +/* Check if relation exists */ > +if (!SearchSysCacheExists1(RELOID, ObjectIdGetDatum(relid))) > +return false; > > This should be checked in the caller imho. On this one I disagree, both pg_partition_root and pg_partition_tree share the same semantics on the matter. If the set of functions gets expanded again later on, I got the feeling that we could forget about it again, and at least placing the check here has the merit to make out future selves not forget about that pattern.. > I can't imagine this function growing more code to perform additional > checks beside just checking the relkind, so the name of this function may > be a bit too ambitious. How about calling it > check_rel_can_be_partition()? The comment above the function could be a > much simpler sentence too. I know I may be just bikeshedding here > though. The review is also here for that. The routine name you are suggesting looks good to me. > +/* > + * If the relation is not a partition, return itself as a result. > + */ > +if (!get_rel_relispartition(relid)) > +PG_RETURN_OID(relid); > > Maybe the comment here could say "The relation itself may be the root > parent". Check. I tweaked the comment in this sense. > "valid" is duplicated in the last sentence in the comment. Anyway, what's > being Asserted can be described simply as: > > /* > * 'rootrelid' must contain a valid OID, given that the input relation is > * a valid partition tree member as checked above. > */ Changed in this sense. Please find attached an updated patch. -- Michael diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index b3336ea9be..b328c31637 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -20270,6 +20270,17 @@ postgres=# SELECT * FROM pg_walfile_name_offset(pg_stop_backup()); their partitions, and so on. + + +pg_partition_root +pg_partition_root(regclass) + + regclass + +Return the top-most parent of a partition tree to which the given +relation belongs. + + diff --git a/src/backend/utils/adt/partitionfuncs.c b/src/backend/utils/adt/partitionfuncs.c index 6fb4f6bc50..55588a8fd3 100644 --- a/src/backend/utils/adt/partitionfuncs.c +++ b/src/backend/utils/adt/partitionfuncs.c @@ -25,6 +25,34 @@ #include "utils/lsyscache.h" #include "utils/syscache.h" +/* + * Perform several checks on a relation on which is extracted some + * information related to its partition tree. Returns false if the + * relation cannot be processed, in which case it is up to the caller + * to decide what to do, by either raising an error or doing something + * else. + */ +static bool +check_rel_can_be_partition(Oid relid) +{ + char relkind; + + /* Check if relation exists */ + if (!SearchSysCacheExists1(RELOID, ObjectIdGetDatum(relid))) + return false; + + relkind = get_rel_relkind(relid); + + /* Only allow relation types that can appear in partition trees. 
*/ + if (relkind != RELKIND_RELATION && + relkind != RELKIND_FOREIGN_TABLE && + relkind != RELKIND_INDEX && + relkind != RELKIND_PARTITIONED_TABLE && + relkind != RELKIND_PARTITIONED_INDEX) + return false; + + return true; +} /* * pg_partition_tree @@ -39,19 +67,10 @@ pg_partition_tree(PG_FUNCTION_ARGS) { #define PG_PARTITION_TREE_COLS 4 Oid rootrelid = PG_GETARG_OID(0); - char relkind = get_rel_relkind(rootrelid); FuncCallContext *funcctx; ListCell **next; - if (!SearchSysCacheExists1(RELOID, ObjectIdGetDatum(rootrelid))) - PG_RETURN_NULL(); - - /* Return NULL for relation types that cannot appear in partition trees */ - if (relkind != RELKIND_RELATION && - relkind != RELKIND_FOREIGN_TABLE && - relkind != RELKIND_INDEX && - relkind != RELKIND_PARTITIONED_TABLE && - relkind != RELKIND_PARTITIONED_INDEX) + if (!check_rel_can_be_partition(rootrelid)) PG_RETURN_NULL(); /* stuff done only on the first call of the function */ @@ -153,3 +172,39 @@ pg_partition_tree(PG_FUNCTION_ARGS) /* done when there are no more elements left */ SRF_RETURN_DONE(funcctx); } + +/* + * pg_partition_root + * + * For the given relation part of a partition tree, return its top-most + * root parent. + */ +Datum +pg_partition_root(PG_FUNCTION_ARGS) +{ + Oid relid = PG_GETARG_OID(0); + Oid rootrelid; + List *ancestors; + + if (!check_rel_can_be_partition(relid)) + PG_RETURN_NULL(); + + /* + * If the relation is not a
Re: Computing the conflict xid for index page-level-vacuum on primary
On Thu, Dec 13, 2018 at 5:42 PM Andres Freund wrote: > This leads me to think that we possibly should move computation of the > last removed xid from recovery to the primary, during the generation of > the xl_btree_delete WAL record. For the record, I share your sense that this isn't a great design for microvacuuming. It would be better if it happened on the primary, where we generally have a lot more control/context. Of course, the devil is in the details. > Talking with Peter Geoghegan we were pondering a few ideas to address > that: > - we could stop doing page level deletion after the tuples on the first > heap page, if that frees sufficient space on the index page, to stave > off a page split for longer. That has the downside that it's possible > that we'd end up WAL logging more, because we'd repeatedly emit > xl_btree_delete records. > - We could use the visits to the heap pages to be *more* aggressive > about deleting the heap page. If there's some tuples on a heap page > that are deletable, and we know about that due to the kill_prior_tuple > logic, it's fairly likely that other pointers to the same page > actually are also dead: We could re-check that for all index entries > pointing to pages that we'd visit anyway. Right. As we discussed, B-Tree is concerned solely with delaying and probably even preventing a page split in its _bt_findsplitloc() path. It's far from obvious that an eventual page split is inevitable, since of course the range of values that belong on the leaf page are always constrained. We can probably afford to be lazy about the things we actually kill, since we're operating on the primary at a critical moment: the point that we need some exact amount of free space to delay a page split. We can free only as much space as needed, balancing the need to access heap pages against the immediate need for free space on the leaf page. Now, I admit that I haven't thought through the implications for WAL volume at all. However, it's possible that we might actually manage to *decrease* the WAL volume in many cases by never freeing space that isn't truly needed. It's not impossible that we'd regularly come out ahead there, since a split is not inevitable just because we're running low on free space this instant. And, I suspect that WAL volume is the only major additional overhead that comes from being more discerning about how many tuples are actually killed. Eagerly freeing space to prevent a page split is a high priority for many B-Tree implementations; the additional fragmentation of the leaf page caused by a more incremental approach to deletion isn't likely to be a problem. Once you establish the principle that you can be lazy here, you also establish the principle that you can be more eager, finding additional things to kill that didn't already have their LP_DEAD bits set based on heuristics (e.g. "I'm visiting the same heap page anyway, might as well look for extra stuff"). It isn't necessary to actually have any of that fancy stuff in your initial version, though. -- Peter Geoghegan
Re: automatically assigning catalog toast oids
On 2018-Dec-13, Tom Lane wrote: > We could take it a bit further and not make the 'p' entries for such > OIDs in the first place, greatly reducing the size of pg_depend in > typical installations. Perhaps that'd confuse clients that look at > that catalog, though. The number of 'p' entries in pg_depend is often annoying when manually querying it. I support the idea of special-casing such entries. > Looking at the existing entries, it seems like we'd have to have > one special case: schema public has OID 2200 but is intentionally > not pinned. I wouldn't have a hard time with teaching isObjectPinned > about that; though if it turns out that many places need to know > about it, maybe it's not workable. Why not just move that OID outside the Genbki special range? I have seen quite a few installs where schema public was removed and later re-added. I've never seen a query hardcode OID 2200, and I'd call one which does buggy. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: automatically assigning catalog toast oids
Hi, On 2018-12-14 22:54:14 -0300, Alvaro Herrera wrote: > On 2018-Dec-13, Tom Lane wrote: > > Looking at the existing entries, it seems like we'd have to have > > one special case: schema public has OID 2200 but is intentionally > > not pinned. I wouldn't have a hard time with teaching isObjectPinned > > about that; though if it turns out that many places need to know > > about it, maybe it's not workable. > > Why not just move that OID outside the Genbki special range? I have > seen quite a few installs where schema public was removed and later > re-added. I've never seen a query hardcode OID 2200, and I'd call one > which does buggy. +1 Greetings, Andres Freund
Re: removal of dangling temp tables
On 2018-Dec-14, Robert Haas wrote: > On Fri, Dec 14, 2018 at 12:27 PM Alvaro Herrera > wrote: > > Maybe it'd be better to change temp table removal to always drop > > max_locks_per_transaction objects at a time (ie. commit/start a new > > transaction every so many objects). > > We're basically just doing DROP SCHEMA ... CASCADE, so I'm not sure > how we'd implement that, but I agree it would be significantly better. (Minor nit: even currently, we don't drop the schema itself, only the objects it contains.) I was thinking we could scan pg_depend for objects depending on the schema, add them to an ObjectAddresses array, and do performMultipleDeletions once every max_locks_per_transaction objects. But in order for this to have any useful effect we'd have to commit the transaction and start another one; maybe that's too onerous. Maybe we could offer such a behavior as a special case to be used only in case the regular mechanism fails. So add a PG_TRY which, in case of failure, sends a hint to do the cleanup. Not sure this is worthwhile. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
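To make the batching idea concrete, a hedged sketch: the function name is invented, the pg_depend scan that collects the addresses is elided, and a CASCADE deletion can of course still take more locks than the batch size, so this only shows the shape of the idea.

#include "postgres.h"
#include "access/xact.h"
#include "catalog/dependency.h"
#include "catalog/objectaddress.h"
#include "nodes/parsenodes.h"
#include "storage/lock.h"

/*
 * Drop the collected objects in chunks of max_locks_per_transaction,
 * committing between chunks so the lock table is drained.  Assumes the
 * objs array lives in memory that survives the commits (e.g.
 * TopMemoryContext) and that the caller is at a point where committing
 * is acceptable.  Sketch only.
 */
static void
drop_objects_in_batches(ObjectAddress *objs, int nobjs)
{
	int			i = 0;

	while (i < nobjs)
	{
		ObjectAddresses *batch = new_object_addresses();
		int			limit = Min(nobjs, i + max_locks_per_transaction);

		for (; i < limit; i++)
			add_exact_object_address(&objs[i], batch);

		performMultipleDeletions(batch, DROP_CASCADE,
								 PERFORM_DELETION_INTERNAL);
		free_object_addresses(batch);

		CommitTransactionCommand();
		StartTransactionCommand();
	}
}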
Re: removal of dangling temp tables
On 2018-Dec-14, Robert Haas wrote: > On Fri, Dec 14, 2018 at 6:35 PM Tom Lane wrote: > > Hm. It *could*, if we wanted it to run some transactions after > > finishing recovery. > > It'd have to launch a separate process per database. That would be > useful infrastructure for other things, too, like automatic catalog > upgrades in minor releases, but I'm not volunteering to write that > infrastructure right now. This looks like a major project. > > Alternatively, maybe we could have backends flag whether they've > > taken ownership of their temp schemas or not, and let autovacuum > > flush old temp tables if not? > > Yes, that seems like a possibly promising approach. I did propose in my OP the idea of a PGPROC boolean flag that indicates whether the temp namespace has been set up. If not, have autovac remove those tables. I like this option better, but I fear it adds more ProcArrayLock contention. Maybe I can just use a new LWLock to coordinate that particular member of the ProcGlobal array ... (but then it can no longer be a boolean.) -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: New function pg_stat_statements_reset_query() to reset statistics of a specific query
On Sat, Dec 15, 2018 at 12:14 AM Robert Haas wrote: > > On Wed, Nov 28, 2018 at 1:43 PM Alvaro Herrera > wrote: > > Right, I think option 4 is a clear improvement over option 1. I can get > > behind that one. Since not many people care to vote, I think this tips > > the scales enough to that side. > > I'm showing up very late to the party here, > Not a problem, people like you are always welcome. > but I like option 1 best. > You are not alone in this camp, so, IIUC, here is how different people have voted:
Option-1: Vik, Sergei, Robert
Option-2: Alvaro, Magnus
Option-3: Michael, Hari
Option-4: Amit, Hari, Magnus, Alvaro, Michael, Peter
Some names appear under two options because those people are okay with either of them. I see that more people are now in favor of option-1, but the tilt is still towards option-4. > I feel like the SQL standard has a pretty clear idea that NULL is how > you represent a value is unknown, which shows up in a lot of places. > Deciding that we're going to use a different sentinel value in this > one case because NULL is a confusing concept in general seems pretty > strange to me. > I agree that NULL seems to be the attractive choice for a default value, but we also have to consider what it means. In this case, NULL will mean 'all', which doesn't fit with what NULL generally means (aka unknown; a NULL doesn't even compare equal to another NULL). There are a few people on this thread who feel that having NULL can lead to misuse of this API, as explained in [1], and that we would probably need some workaround for it to be used in production [2]. [1] - https://www.postgresql.org/message-id/CAA4eK1LicqWY55XxmahQXti4RjQ28iuASAk1X8%2ByKX0J051_VQ%40mail.gmail.com [2] - https://www.postgresql.org/message-id/20181117111653.cetidngkgol5e5xn%40alvherre.pgsql -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: removal of dangling temp tables
On Fri, Dec 14, 2018 at 11:06:32PM -0300, Alvaro Herrera wrote: > I did propose in my OP the idea of a PGPROC boolean flag that indicates > whether the temp namespace has been set up. If not, have autovac remove > those tables. I like this option better, but I fear it adds more > ProcArrayLock contention. Maybe I can just use a new LWLock to > coordinate that particular member of the ProcGlobal array ... (but then > it can no longer be a boolean.) Isn't that what tempNamespaceId can be used for in PGPROC now? The flag would be set only when a backend creates a new temporary schema, so that it can be tracked as the owner of the schema. -- Michael signature.asc Description: PGP signature
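A sketch of the check that would enable, with an invented function name; reading another backend's PGPROC field like this is racy, so a real patch would have to consider ordering and locking, but it shows the shape of the idea:

#include "postgres.h"
#include "catalog/namespace.h"
#include "storage/backendid.h"
#include "storage/proc.h"
#include "storage/sinvaladt.h"

/*
 * Is this pg_temp_N namespace still owned by a live backend?  Sketch:
 * autovacuum could skip (or drop) temp tables whose namespace fails
 * this test.
 */
static bool
temp_namespace_in_use(Oid namespaceId)
{
	int			backendId = GetTempNamespaceBackendId(namespaceId);
	PGPROC	   *proc;

	if (backendId == InvalidBackendId)
		return false;			/* not a temp namespace at all */

	proc = BackendIdGetProc(backendId);
	if (proc == NULL)
		return false;			/* backend slot empty: orphaned */

	/* owned only if that backend has actually set up this namespace */
	return proc->tempNamespaceId == namespaceId;
}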
unnecessary creation of FSM files during bootstrap mode
As pointed out by John Naylor [1], it seems that during bootstrap mode we always create FSM files which are not required. In commits b9d01fe288 and 3908473c80, we added some code that allows the creation of files during mdopen even if they didn't exist, but only in bootstrap mode. The comments in the code say: "During bootstrap, there are cases where a system relation will be accessed (by internal backend processes) before the bootstrap script nominally creates it." I am sure this was the case when that code was added, but is it required today? While going through commit 3908473c80, I came across the comment below: - * During bootstrap processing, we skip that check, because pg_time, - * pg_variable, and pg_log get created before their .bki file entries - * are processed. - */ + fd = FileNameOpenFile(path, O_RDWR | O_CREAT | O_EXCL | PG_BINARY, 0600); The system tables mentioned in the above comment are not present today, so do we really need that code, and even if it is required, should we do it only for the 'main' or 'init' forks? Tom, as the committer of b9d01fe288 and 3908473c80, do you remember anything in this regard? [1] - https://www.postgresql.org/message-id/CAJVSVGVtf%2B-2sQGVyEZJQh15dpVicpFA6BBiPLugaD4oBEaiHg%40mail.gmail.com -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
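For readers without the tree at hand, the behavior in question boils down to something like the following, a distilled sketch rather than the actual md.c code (PathNameOpenFile() lost its mode argument in v11, which is why the 0600 from the old diff no longer appears):

#include "postgres.h"

#include <fcntl.h>

#include "miscadmin.h"
#include "storage/fd.h"

/*
 * Distilled sketch of the mdopen() special case: if the file is missing
 * and we are in bootstrap mode, quietly create it instead of failing.
 * This is what ends up creating FSM forks that nothing asked for.
 */
static File
open_or_create_if_bootstrap(const char *path)
{
	File		fd;

	fd = PathNameOpenFile(path, O_RDWR | PG_BINARY);

	if (fd < 0 && IsBootstrapProcessingMode())
		fd = PathNameOpenFile(path, O_RDWR | O_CREAT | O_EXCL | PG_BINARY);

	return fd;
}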
Re: WIP: Avoid creation of the free space map for small tables
On Thu, Dec 13, 2018 at 3:18 AM John Naylor wrote: > > On 11/24/18, Amit Kapila wrote: > > 4. You have mentioned above that "system catalogs created during > > bootstrap still have a FSM if they have any data." and I can also see > > this behavior, have you investigated this point further? > > I found the cause of this. There is some special-case code in md.c to > create any file if it's opened in bootstrap mode. I removed this and a > similar special case (attached), and make check still passes. After > digging through the history, I'm guessing this has been useless code > since about 2001, when certain special catalogs were removed. > Good finding, but I think it is better to discuss this part separately. I have started a new thread for this issue [1]. [1] - https://www.postgresql.org/message-id/CAA4eK1KsET6sotf%2BrzOTQfb83pzVEzVhbQi1nxGFYVstVWXUGw%40mail.gmail.com -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com