Re: [HACKERS] PATCH: multivariate histograms and MCV lists
On Thu, 17 Jan 2019 at 03:42, David Rowley wrote: > 39. I don't see analyze_mcv_list() being used anywhere around this comment: > > * If we can fit all the items onto the MCV list, do that. Otherwise use > * analyze_mcv_list to decide how many items to keep in the MCV list, just > * like for the single-dimensional MCV list. > Right. Also, analyze_mcv_list() is no longer being used anywhere outside of analyze.c, so it can go back to being static. Regards, Dean
Re: "could not reattach to shared memory" on buildfarm member dory
On Sun, Dec 02, 2018 at 09:35:06PM -0800, Noah Misch wrote: > On Tue, Sep 25, 2018 at 08:05:12AM -0700, Noah Misch wrote: > > On Mon, Sep 24, 2018 at 01:53:05PM -0400, Tom Lane wrote: > > > Overall, I agree that neither of these approaches are exactly attractive. > > > We're paying a heck of a lot of performance or complexity to solve a > > > problem that shouldn't even be there, and that we don't understand well. > > > In particular, the theory that some privileged code is injecting a thread > > > into every new process doesn't square with my results at > > > https://www.postgresql.org/message-id/15345.1525145612%40sss.pgh.pa.us > > > > > > I think our best course of action at this point is to do nothing until > > > we have a clearer understanding of what's actually happening on dory. > > > Perhaps such understanding will yield an idea for a less painful fix. > > > > I see. > > Could one of you having a dory login use > https://live.sysinternals.com/Procmon.exe to capture process events during > backend startup? Ping. > The ideal would be one capture where startup failed reattach > and another where it succeeded, but having the successful run alone would be a > good start. The procedure is roughly this: > > - Install PostgreSQL w/ debug symbols. > - Start a postmaster. > - procmon /nomonitor > - procmon "Filter" menu -> Enable Advanced Output > - Ctrl-l, add filter for "Process Name" is "postgres.exe" > - Ctrl-e (starts collecting data) > - psql (leave it running) > - After ~60s, Ctrl-e again in procmon (stops collecting data) > - File -> Save -> PML > - File -> Save -> XML, include stack traces, resolve stack symbols > - Compress the PML and XML files, and mail them here
Re: log bind parameter values on error
There appears to be a problem with how this affects current logging behavior. I'm using pgbench -M extended -S -T 10 bench to test the extended protocol. Unpatched, with log_statement=all, you get something like LOG: execute : SELECT abalance FROM pgbench_accounts WHERE aid = $1; DETAIL: parameters: $1 = '30223' With your patch, with log_statement=all and log_parameters=on, you get the same, but with log_statement=all and log_parameters=off you get LOG: execute : SELECT abalance FROM pgbench_accounts WHERE aid = $1; DETAIL: parameters: $1 = UNKNOWN TYPE We should probably keep the existing parameter logging working as before. This also raises the question of the new parameter name. Parameters are already logged. So the name should perhaps be more like log_parameters_on_error. Some API notes on your patch: I think you can change get_portal_bind_parameters() to take a ParamListInfo, since you're not doing anything with the Portal other than grab the parameters. And that would allow you to keep the signature of errdetail_params() unchanged. I did some performance tests using the commands shown above and didn't find any problems. Obviously the default pgbench stuff isn't very parameter-intensive. Do you have tests with more and larger parameter values? -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Delay locking partitions during query execution
On Thu, 17 Jan 2019 at 17:18, Amit Langote wrote: > > On 2019/01/04 9:53, David Rowley wrote: > > Without PREPAREd statements, if the planner itself was unable to prune > > the partitions it would already have obtained the lock during > > planning, so AcquireExecutorLocks(), in this case, would bump into the > > local lock hash table entry and forego trying to obtain the lock > > itself. That's not free, but it's significantly faster than obtaining > > a lock. > > Hmm, AcquireExecutorLocks is only called if prepared statements are used > and that too if a generic cached plan is reused. > > GetCachedPlan -> CheckCachedPlan -> AcquireExecutorLocks ah okay. Thanks for the correction, but either way, it's really only useful when run-time pruning prunes partitions before initialising the Append/MergeAppend node. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
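For context, a minimal sketch of the case being discussed, i.e. a generic cached plan being reused, which is when AcquireExecutorLocks() runs and when executor-startup pruning makes the delayed locking pay off. The table and parameter names are only illustrative:

    -- "orders" stands for any table partitioned on part_key
    SET plan_cache_mode = 'force_generic_plan';
    PREPARE q(int) AS SELECT count(*) FROM orders WHERE part_key = $1;
    -- Reusing the generic plan goes through GetCachedPlan ->
    -- CheckCachedPlan -> AcquireExecutorLocks; run-time pruning then
    -- discards most partitions before the Append/MergeAppend node is
    -- initialised, which is where delaying the locks helps.
    EXECUTE q(42);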
Re: add_partial_path() may remove dominated path but still in use
Hello, sorry for the absence. At Fri, 11 Jan 2019 11:36:43 -0500, Robert Haas wrote in > On Thu, Jan 10, 2019 at 9:10 PM Kohei KaiGai wrote: > > On Fri, Jan 11, 2019 at 5:52, Robert Haas wrote: > > > On Wed, Jan 9, 2019 at 12:44 AM Kohei KaiGai wrote: > > > > So, is it sufficient if set_rel_pathlist_hook is just relocated in > > > > front of the generate_gather_paths? > > > > If we have no use case for the second hook, there is little necessity > > > > to have the post_rel_pathlist_hook() here. > > > > (At least, PG-Strom will use the first hook only.) > > > > > > +1. That seems like the best way to be consistent with the principle > > > that we need to have all the partial paths before generating any > > > Gather paths. > > > > > Patch was updated, just for relocation of the set_rel_pathlist_hook. > > Please check it. > > Seems reasonable to me. Also seems reasonable to me. The extension can still call generate_gather_paths redundantly as things stand, but that does almost no harm, so it is acceptable even in a minor release. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: PSA: we lack TAP test coverage on NetBSD and OpenBSD
Hello Tom, Although we've got a few NetBSD and OpenBSD buildfarm critters, none of them are doing --enable-tap-tests. If they were, we'd have noticed the pgbench regression tests falling over: [...] I am, TBH, inclined to fix this by removing that test case rather than teaching it another spelling to accept. I think it's very hard to make the case that tests like this one are anything but a waste of developer and buildfarm time. When they are also a portability hazard, it's time to cut our losses. (I also note that this test has caused us problems before, cf 869aa40a2 and 933851033.) I'd rather keep it by simply adding the "|unknown" alternative. 30 years of programming have taught me that testing limit & error cases is useful, although you never know when it will be proven so. Client application coverage is currently abysmal, especially "psql" despite the many scripts used for testing (39% of lines, 42% of functions!); pgbench is under 90%. Generally we really need more tests, not fewer. TAP tests are a good compromise because they are not always run, and ISTM that running them sometimes (i.e. when you ask for it) is enough. I agree that some tests can be useless, but I do not think that applies to this one. This test also checks that on a bad option pgbench stops with the appropriate exit status of 1. Recently a patch updated the exit status of pgbench in many cases to distinguish between different kinds of errors, so having non-regression tests in this area was shown to be a good idea. Moreover, knowing that the exit status on getopt errors is consistent across platforms has value, and knowing that there is some variability is not uninteresting. -- Fabien.
Re: Proposal for Signal Detection Refactoring
Latest patch: simpler, but it answers the key questions as code comments. On Mon, Dec 3, 2018 at 6:56 AM Chris Travers wrote: > On Thu, Nov 29, 2018 at 8:46 PM Dmitry Dolgov <9erthali...@gmail.com> > wrote: > >> > On Wed, Oct 10, 2018 at 7:10 PM Chris Travers >> wrote: >> > >> >> More generally, I'd like this material to be code comments. It's the >> >> kind of stuff that gets outdated before long if it's kept separate. >> > >> > The problem is that code comments are not going to be good places to >> document "how do I check for pending actions?" That could be moved to the >> main SGML I guess. >> >> I agree with Peter here; for me it also feels more natural to have this >> information as code comments - at least if I were searching for it, that >> would >> be my first thought. As for the "how do I..." part, I think there are already >> similar comments in the code, which makes sense - these kinds of >> questions >> usually appear when you're reading/writing some code. >> >> It doesn't look like there is much left to do in this discussion, but for >> now >> I'll move it to the next CF. >> > > OK, will move this into code comments if there are no objections. > > > -- > Best Regards, > Chris Travers > Head of Database > > Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com > Saarbrücker Straße 37a, 10405 Berlin > > -- Best Regards, Chris Travers Head of Database Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com Saarbrücker Straße 37a, 10405 Berlin signal-comments.patch Description: Binary data
Re: speeding up planning with partitions
Thank you Imai-san for testing. Sorry it totally slipped my mind to reply to this email. On 2019/01/09 11:08, Imai, Yoshikazu wrote: > I wonder why force_custom_plan is faster than auto after applying the patch. > > When we use PREPARE-EXECUTE, a generic plan is created and used if its cost is > cheaper than creating and using a custom plan with plan_cache_mode='auto', > while a custom plan is always created and used with > plan_cache_mode='force_custom_plan'. > So one could think the difference in the above results is because of creating or > using a generic plan. > > I checked how many times a generic plan is created while executing pgbench > and > found a generic plan is created only once and custom plans are created at > other > times with plan_cache_mode='auto'. I also checked the time of creating a > generic plan, but it didn't take that long (250ms or so with 4096 partitions). So > the time of creating a generic plan does not affect the performance. > > Currently, a generic plan is created on the sixth execution of the EXECUTE query. > I changed it to happen later (e.g. at the 400,000th execution of the EXECUTE query > on > master with 4096 partitions, because 7000 TPS x 60 sec = 420,000 transactions are > run while executing pgbench), and then there is almost no difference between > auto > and force_custom_plan. I think that creation of a generic plan affects the > time > of executing queries which are issued after creating the generic plan. > > If my assumption is right, we can expect some additional processing to occur > when > executing queries issued after creating a generic plan, which results in > auto being > slower than force_custom_plan because of that additional processing. But looking at > the above results, on master with 4096 partitions, auto is faster than > force_custom_plan. > So now I am confused. > > Do you have any ideas about what affects the performance? Are you saying that, when using auto mode, all executions of the query starting from the 7th are slower than the first 5 executions? That is, the latency of creating and using a custom plan increases *after* a generic plan is created and discarded on the 6th execution of the query? If so, that is inexplicable to me. Thanks, Amit
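For readers following along, here is a minimal sketch of the plancache behaviour being measured; the table, column and statement names are made up:

    SET plan_cache_mode = 'auto';   -- vs. 'force_custom_plan'
    PREPARE q(int) AS SELECT * FROM parted_table WHERE key = $1;
    EXECUTE q(1);   -- executions 1-5 build custom plans, with planner-side pruning
    EXECUTE q(2);
    EXECUTE q(3);
    EXECUTE q(4);
    EXECUTE q(5);
    EXECUTE q(6);   -- on the 6th execution a generic plan is also built and costed;
                    -- under 'auto' it is kept only if it looks no more expensive
                    -- than the average custom plan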
Re: Log a sample of transactions
On 1/16/19 10:09 AM, Masahiko Sawada wrote: Since we set xact_is_sampled only when transaction starts and see it during transaction we cannot disable logging during transaction and vice versa. I can imagine the use case where user wants to disable logging during transaction. So it might be better to also check if log_xact_sample_rate > 0 in check_log_duration(). I Agree, here is third patch. Thanks! diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml index b6f5822b84..3eb418cf99 100644 --- a/doc/src/sgml/config.sgml +++ b/doc/src/sgml/config.sgml @@ -5780,6 +5780,24 @@ local0.*/var/log/postgresql + + log_transaction_sample_rate (real) + + log_transaction_sample_rate configuration parameter + + + + + Set the fraction of transactions whose statements are logged. It applies + to each new transaction regardless of their duration. The default is + 0, meaning do not log statements from this transaction. + Setting this to 1 logs all statements for all transactions. + log_transaction_sample_rate is helpful to track a + sample of transaction. + + + + diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c index f665e38ecf..19306a9547 100644 --- a/src/backend/access/transam/xact.c +++ b/src/backend/access/transam/xact.c @@ -1821,6 +1821,9 @@ StartTransaction(void) s->state = TRANS_START; s->transactionId = InvalidTransactionId; /* until assigned */ + /* Determine if we log statements in this transaction */ + xact_is_sampled = random() <= log_xact_sample_rate * MAX_RANDOM_VALUE; + /* * initialize current transaction state fields * diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c index 0c0891b33e..ec477d69d0 100644 --- a/src/backend/tcop/postgres.c +++ b/src/backend/tcop/postgres.c @@ -100,7 +100,8 @@ int max_stack_depth = 100; /* wait N seconds to allow attach from a debugger */ int PostAuthDelay = 0; - +/* flag for logging statements in this transaction */ +bool xact_is_sampled = false; /* * private variables @@ -2203,6 +2204,8 @@ check_log_statement(List *stmt_list) * check_log_duration * Determine whether current command's duration should be logged. * If log_statement_sample_rate < 1.0, log only a sample. + * We also check if this statement in this transaction must be logged + * (regardless of its duration). 
* * Returns: * 0 if no logging is needed @@ -2218,7 +2221,8 @@ check_log_statement(List *stmt_list) int check_log_duration(char *msec_str, bool was_logged) { - if (log_duration || log_min_duration_statement >= 0) + if (log_duration || log_min_duration_statement >= 0 || + (xact_is_sampled && log_xact_sample_rate > 0)) { long secs; int usecs; @@ -2252,11 +2256,12 @@ check_log_duration(char *msec_str, bool was_logged) (log_statement_sample_rate == 1 || random() <= log_statement_sample_rate * MAX_RANDOM_VALUE); - if ((exceeded && in_sample) || log_duration) + if ((exceeded && in_sample) || log_duration || + (xact_is_sampled && log_xact_sample_rate > 0)) { snprintf(msec_str, 32, "%ld.%03d", secs * 1000 + msecs, usecs % 1000); - if (exceeded && !was_logged) + if ((exceeded || xact_is_sampled) && !was_logged) return 2; else return 1; diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c index c216ed0922..06a5e668aa 100644 --- a/src/backend/utils/misc/guc.c +++ b/src/backend/utils/misc/guc.c @@ -494,6 +494,7 @@ int client_min_messages = NOTICE; int log_min_duration_statement = -1; int log_temp_files = -1; double log_statement_sample_rate = 1.0; +double log_xact_sample_rate = 0; int trace_recovery_messages = LOG; int temp_file_limit = -1; @@ -3347,13 +3348,25 @@ static struct config_real ConfigureNamesReal[] = {"log_statement_sample_rate", PGC_SUSET, LOGGING_WHEN, gettext_noop("Fraction of statements over log_min_duration_statement to log."), gettext_noop("If you only want a sample, use a value between 0 (never " - "log) and 1.0 (always log).") + "log) and 1 (always log).") }, &log_statement_sample_rate, 1.0, 0.0, 1.0, NULL, NULL, NULL }, + { + {"log_transaction_sample_rate", PGC_SUSET, LOGGING_WHEN, + gettext_noop("Set the fraction of transactions to log for new transactions."), + gettext_noop("Logs all statements from a fraction of transactions. " + "Use a value between 0 (never log) and 1 (log all " + "statements for all transactions).") + }, + &log_xact_sample_rate, + 0.0, 0.0, 1.0, + NULL, NULL, NULL + }, + /* End-of-list marker */ { {NULL, 0, 0, NULL, NULL}, NULL, 0.0, 0.0, 0.0, NULL, NULL, NULL diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample index a21865a77f..07e590ea20 100644 --- a/src/backend/utils/misc/pos
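For reference, a usage sketch of the GUC added by the (truncated) patch above; the queried table is just an example:

    -- Log every statement of roughly 1% of transactions, regardless of duration.
    SET log_transaction_sample_rate = 0.01;   -- PGC_SUSET in the patch, so superuser-only
    BEGIN;                                    -- the sampling decision is taken at transaction start
    SELECT abalance FROM pgbench_accounts WHERE aid = 42;   -- logged only if this transaction was sampled
    COMMIT;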
Re: Libpq support to connect to standby server as priority
Tsunakawa, Takayuki wrote: > From: Tom Lane [mailto:t...@sss.pgh.pa.us] > > The problem here of course is that whoever invented target_session_attrs > > was unconcerned with following that precedent, so what we have is > > "target_session_attrs=(any | read-write)". > > Are we prepared to add some aliases in service of unifying these names? > > I think "yes". > > > 2. Whether or not you want to follow pgJDBC's naming, it seems like we > > ought to have both "require read only" and "prefer read only" behaviors > > in this patch, and maybe likewise "require read write" versus "prefer > > read write". > > Agreed, although I don't see a use case for "prefer read write". I don't > think > there's an app like "I want to write, but I'm OK if I cannot." I don't think so either, although of course I cannot prove it. My opinion is that we shouldn't add options like "prefer read write" just out of a fuzzy desire for symmetry. It would probably make the code even more complicated, and more choice means that it becomes harder for the user to pick the right one (the latter may be a weak argument). The motivation behind all this is to load balance reading and writing sessions among a group of replicating servers where you don't know for sure who is what at the moment, so "preferably read-only", "must be able to write" and "don't care" are choice enough. There is nothing that bars future patches from adding additional modes if the need really arises. > > 4. Given that other discussion, it's not quite clear what we should > > even be checking. The existing logic devolves to checking that > > transaction_read_only is true, but that's not really the same thing as > > "is a master server", eg you might have connected to a master server > > under a role that has SET ROLE default_transaction_read_only = false. > > PgJDBC uses transaction_read_only like this: [...] > > But as some people said, I don't think this is the right way. I suppose > what's leading > to the current somewhat complicated situation is that there was no easy way > for the > client to know whether the server is the master. That ended up in using > "SHOW transaction_read_only" instead, and people supported that compromise by > saying > "read only status is more useful than whether the server is standby or not," > I'm afraid. > > The original desire should have been the ability to connect to a primary or a > standby. > So, I think we should go back to the original thinking (and not complicate > the feature), > and create a read only GUC_REPORT variable, say, server_role, that identifies > whether > the server is a primary or a standby. I think that transaction_read_only is good. If it is set to false, we are sure to be on a replication primary or stand-alone server, which is enough to know for the load balancing use case. I deem it unlikely that someone will set default_transaction_read_only to FALSE and then complain that the feature is not working as expected, but again I cannot prove that claim. As Robert said, transaction_read_only might even give you the option to use the feature for more than just load balancing between replication master and standby. Yours, Laurenz Albe
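To make the discussion concrete, these are the two server-side checks the thread keeps coming back to (just a sketch; the proposed server_role GUC does not exist yet):

    SHOW transaction_read_only;    -- 'off' means the session can write, which is only
                                   -- possible on a primary or stand-alone server, but
                                   -- the value can be influenced by
                                   -- default_transaction_read_only
    SELECT pg_is_in_recovery();    -- true on a standby, including one that is still
                                   -- in the middle of being promoted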
Re: [HACKERS] PATCH: multivariate histograms and MCV lists
I've started looking over 0002. Here are a few things so far: 1. I think this should be pg_statistic_ext.stxhistogram? Values of the pg_histogram can be obtained only from the pg_statistic.stxhistogram column. 2. I don't think this bms_copy is needed anymore. I think it was previously since there were possibly multiple StatisticExtInfo objects per pg_statistic_ext row, but now it's 1 for 1. + info->keys = bms_copy(keys); naturally, the bms_free() will need to go too. 3. I've not really got into understanding how the new statistics types are applied yet, but I found this: * If asked to build both MCV and histogram, first build the MCV part * and then histogram on the remaining rows. I guess that means we'll get different estimates with: create statistic a_stats (mcv,histogram) on a,b from t; vs create statistic a_stats1 (mcv) on a,b from t; create statistic a_stats2 (histogram) on a,b from t; Is that going to be surprising to people? 4. I guess you can replace "(histogram == NULL);" with "false". The compiler would likely do it anyway, but... if (histogram != NULL) { /* histogram already is a bytea value, not need to serialize */ nulls[Anum_pg_statistic_ext_stxhistogram - 1] = (histogram == NULL); values[Anum_pg_statistic_ext_stxhistogram - 1] = PointerGetDatum(histogram); } but, hmm. Shouldn't you serialize this, like you are with the others? 5. serialize_histogram() and statext_histogram_deserialize(), should these follow the same function naming format? 6. IIRC some compilers may warn about this: if (stat->kinds & requiredkinds) making it: if ((stat->kinds & requiredkinds)) should fix that. UPDATE: Tried to make a few compilers warn about this and failed. Perhaps I've misremembered. 7. Comment claims function has a parameter named 'requiredkind', but it no longer does. The comment also needs updated to mention that it finds statistics with any of the required kinds. * choose_best_statistics * Look for and return statistics with the specified 'requiredkind' which * have keys that match at least two of the given attnums. Return NULL if * there's no match. * * The current selection criteria is very simple - we choose the statistics * object referencing the most of the requested attributes, breaking ties * in favor of objects with fewer keys overall. * * XXX If multiple statistics objects tie on both criteria, then which object * is chosen depends on the order that they appear in the stats list. Perhaps * further tiebreakers are needed. */ StatisticExtInfo * choose_best_statistics(List *stats, Bitmapset *attnums, int requiredkinds) 8. Looking at statext_clauselist_selectivity() I see it calls choose_best_statistics() passing requiredkinds as STATS_EXT_INFO_MCV | STATS_EXT_INFO_HISTOGRAM, do you think the function now needs to attempt to find the best match plus the one with the most statistics kinds? It might only matter if someone had: create statistic a_stats1 (mcv) on a,b from t; create statistic a_stats2 (histogram) on a,b from t; create statistic a_stats3 (mcv,histogram) on a,b from t; Is it fine to just return a_stats1 and ignore the fact that a_stats3 is probably better? Or too corner case to care? 9. examine_equality_clause() assumes it'll get a Var. I see we should only allow clauses that pass statext_is_compatible_clause_internal(), so maybe it's worth an Assert(IsA(var, Var)) along with a comment to mention anything else could not have been allowed. 10. Does examine_equality_clause need 'root' as an argument? 11. 
UINT16_MAX -> PG_UINT16_MAX /* make sure we fit into uint16 */ Assert(count <= UINT16_MAX); (Out of energy for today.) -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: pg_dump multi VALUES INSERT
On Fri, Jan 4, 2019 at 3:08 PM David Rowley wrote: > On Mon, 31 Dec 2018 at 18:58, Surafel Temesgen > wrote: > > On Fri, Dec 28, 2018 at 6:46 PM Fabien COELHO > wrote: > >> > At first i also try to do it like that but it seems the function will > >> > became long and more complex to me > >> > >> Probably. But calling it with size 100 should result in the same > behavior, > >> so it is really just an extension of the preceeding one? Or am I missing > >> something? > >> > > > > Specifying table data using single value insert statement and user > specified values insert statement > > have enough deference that demand to be separate function and they are > not the same thing that should implement > > with the same function. Regarding code duplication i think the solution > is making those code separate function > > and call at appropriate place. > > I don't really buy this. I've just hacked up a version of > dumpTableData_insert() which supports a variable number rows per > statement. It seems fairly clean and easy to me. Likely the fact that > this is very possible greatly increases the chances of this getting in > since it gets rid of the code duplication. I did also happen to move > the row building code out of the function into its own function, but > that's not really required. I just did that so I could see all the > code in charge of terminating each statement on my screen without > having to scroll. I've not touched any of the plumbing work to plug > the rows_per_statement variable into the command line argument. So > it'll need a bit of merge work with the existing patch. There will > need to be some code that ensures that the user does not attempt to > have 0 rows per statement. The code I wrote won't behave well if that > happens. > > The attached patch uses your method mostly ... Checks existing patch ... > > I see you added a test, but missed checking for 0. That needs to be fixed. > > + if (dopt.dump_inserts_multiple < 0) > + { > + write_msg(NULL, "argument of --insert-multi must be positive number\n"); > + exit_nicely(1); > + } > > fixed I also didn't adopt passing the rows-per-statement into the FETCH. I > think that's a very bad idea and we should keep that strictly at 100. > I don't see any reason to tie the two together. If a user wants 10 > rows per statement, do we really want to FETCH 10 times more often? > And what happens when they want 1 million rows per statement? We've no > reason to run out of memory from this since we're just dumping the > rows out to the archive on each row. > > okay > > +Specify the number of values per INSERT > command. > +This will make the dump file smaller than > --inserts > +and it is faster to reload but lack per row data lost on error > +instead entire affected insert statement data lost. > > Unsure what you mean about "data lost". It also controls the number > of "rows" per INSERT statement, not the number of > values. > > I think it would be fine just to say: > > +When using --inserts, this allows the maximum > number > +of rows per INSERT statement to be specified. > +This setting defaults to 1. > > > I changed it too, except "This setting defaults to 1", because it doesn't have a default value. 1 row per statement means the --inserts option. > > 2. Is --insert-multi a good name? What if they do --insert-multi=1? > That's not very "multi". Is --rows-per-insert better? > > --rows-per-insert is better for me.
regards Surafel diff --git a/doc/src/sgml/ref/pg_dump.sgml b/doc/src/sgml/ref/pg_dump.sgml index 9e0bb93f08..4195fb81a2 100644 --- a/doc/src/sgml/ref/pg_dump.sgml +++ b/doc/src/sgml/ref/pg_dump.sgml @@ -775,6 +775,16 @@ PostgreSQL documentation + + --rows-per-insert + + +When using --rows-per-insert, this allows the maximum number +of rows per INSERT statement to be specified. + + + + --load-via-partition-root diff --git a/src/bin/pg_dump/pg_backup.h b/src/bin/pg_dump/pg_backup.h index 4a2e122e2d..73a243ecb0 100644 --- a/src/bin/pg_dump/pg_backup.h +++ b/src/bin/pg_dump/pg_backup.h @@ -72,6 +72,7 @@ typedef struct _restoreOptions int dropSchema; int disable_dollar_quoting; int dump_inserts; + int dump_inserts_multiple; int column_inserts; int if_exists; int no_comments; /* Skip comments */ @@ -144,6 +145,7 @@ typedef struct _dumpOptions /* flags for various command-line long options */ int disable_dollar_quoting; int dump_inserts; + int dump_inserts_multiple; int column_inserts; int if_exists; int no_comments; diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c index 0e129f9654..e49e2206e7 100644 --- a/src/bin/pg_dump/pg_dump.c +++ b/src/bin/pg_dump/pg_dump.c @@ -313,6 +313,7 @@ main(int argc, char **argv) int plainText = 0; ArchiveFormat archiveFormat = archUnknown; ArchiveMode archiveMode; + char *p; static Du
Re: Libpq support to connect to standby server as priority
On Tue, 15 Jan 2019 at 23:21, Tsunakawa, Takayuki < tsunakawa.ta...@jp.fujitsu.com> wrote: > From: Dave Cramer [mailto:p...@fastcrypt.com] > > The original desire should have been the ability to connect to a > > primary or a standby. So, I think we should go back to the original > thinking > > (and not complicate the feature), and create a read only GUC_REPORT > variable, > > say, server_role, that identifies whether the server is a primary or a > > standby. > > > > > > > > I'm confused as to how this would work. Who or what determines if the > server > > is a primary or standby? > > Overall, the server determines the server role (primary or standby) using > the same mechanism as pg_is_in_recovery(), and set the server_role GUC > parameter. As the parameter is GUC_REPORT, the change is reported to the > clients using the ParameterStatus ('S') message. The clients also get the > value at connection. > Thanks, that clarifies it. Dave Cramer da...@postgresintl.com www.postgresintl.com > >
Re: Libpq support to connect to standby server as priority
On Wed, 16 Jan 2019 at 01:02, Tatsuo Ishii wrote: > > From: Tatsuo Ishii [mailto:is...@sraoss.co.jp] > >> But pg_is_in_recovery() returns true even for a promoting standby. So > >> you have to wait and retry to send pg_is_in_recovery() until it > >> finishes the promotion to find out it is now a primary. I am not sure > >> if backend out to be responsible for this process. If not, libpq would > >> need to handle it but I doubt it would be possible. > > > > Yes, the application needs to retry connection attempts until success. > That's not different from PgJDBC and other DBMSs. > > I don't know what PgJDBC is doing, however I think libpq needs to do > more than just retrying. > > 1) Try to find a node on which pg_is_in_recovery() returns false. If >found, then we assume that is the primary. We also assume that >other nodes are standbys. done. > > 2) If there's no node on which pg_is_in_recovery() returns false, then >we need to retry until we find it. To not retry forever, there >should be a timeout counter parameter. > > IIRC this is essentially what pgJDBC does. Dave Cramer da...@postgresintl.com www.postgresintl.com > >
Re: Libpq support to connect to standby server as priority
On Thu, 17 Jan 2019 at 05:59, Laurenz Albe wrote: > Tsunakawa, Takayuki wrote: > > From: Tom Lane [mailto:t...@sss.pgh.pa.us] > > > The problem here of course is that whoever invented > target_session_attrs > > > was unconcerned with following that precedent, so what we have is > > > "target_session_attrs=(any | read-write)". > > > Are we prepared to add some aliases in service of unifying these names? > > > > I think "yes". > > > > > 2. Whether or not you want to follow pgJDBC's naming, it seems like we > > > ought to have both "require read only" and "prefer read only" behaviors > > > in this patch, and maybe likewise "require read write" versus "prefer > > > read write". > I just had a look at the JDBC code; there is no "prefer read write". There is a "preferSecondary". The logic behind this is that the connection would presumably be only doing reads, so ideally it would like a secondary, but if it can't find one it will connect to a primary. To be clear, there are 4 target server types in pgJDBC: "any", "master", "secondary", and "preferSecondary" (looking at this, I need to alias master to primary, but that's another discussion). I have no idea where "I want to write but I'm OK if I cannot" came from. Dave
Re: [PROPOSAL] Shared Ispell dictionaries
I attached files of new version of the patch, I applied your tweaks. > XXX All dictionaries, but only when there's invalid dictionary? I've made a little optimization. I introduced hashvalue into TSDictionaryCacheEntry. Now released only DSM of altered or dropped dictionaries. > > /* XXX not really a pointer, so the name is misleading */ > > I think we don't need DictPointerData struct anymore, because only > ts_dict_shmem_release function needs it (see comments above) and we only > need it to hash search. I'll move all fields of DictPointerData to > TsearchDictKey struct. I was wrong, DictInitData also needs DictPointerData. I didn't remove DictPointerData, I renamed it to DictEntryData. Hope that it is a more appropriate name. -- Arthur Zakirov Postgres Professional: http://www.postgrespro.com Russian Postgres Company >From c20c171c2107efc6f87b688af0feecf2f98fcd69 Mon Sep 17 00:00:00 2001 From: Arthur Zakirov Date: Thu, 17 Jan 2019 14:27:32 +0300 Subject: [PATCH 1/4] Fix-ispell-memory-handling --- src/backend/tsearch/spell.c | 9 +++-- 1 file changed, 7 insertions(+), 2 deletions(-) diff --git a/src/backend/tsearch/spell.c b/src/backend/tsearch/spell.c index eb39466b22..eb8416ce7f 100644 --- a/src/backend/tsearch/spell.c +++ b/src/backend/tsearch/spell.c @@ -78,6 +78,8 @@ #define tmpalloc(sz) MemoryContextAlloc(Conf->buildCxt, (sz)) #define tmpalloc0(sz) MemoryContextAllocZero(Conf->buildCxt, (sz)) +#define tmpstrdup(str) MemoryContextStrdup(Conf->buildCxt, (str)) + /* * Prepare for constructing an ISpell dictionary. * @@ -498,7 +500,7 @@ NIAddSpell(IspellDict *Conf, const char *word, const char *flag) Conf->Spell[Conf->nspell] = (SPELL *) tmpalloc(SPELLHDRSZ + strlen(word) + 1); strcpy(Conf->Spell[Conf->nspell]->word, word); Conf->Spell[Conf->nspell]->p.flag = (*flag != '\0') - ? cpstrdup(Conf, flag) : VoidString; + ? 
tmpstrdup(flag) : VoidString; Conf->nspell++; } @@ -1040,7 +1042,7 @@ setCompoundAffixFlagValue(IspellDict *Conf, CompoundAffixFlag *entry, entry->flag.i = i; } else - entry->flag.s = cpstrdup(Conf, s); + entry->flag.s = tmpstrdup(s); entry->flagMode = Conf->flagMode; entry->value = val; @@ -1541,6 +1543,9 @@ nextline: return; isnewformat: + pfree(recoded); + pfree(pstr); + if (oldformat) ereport(ERROR, (errcode(ERRCODE_CONFIG_FILE_ERROR), -- 2.20.1 >From ca45e4ca314bdf8bed1a47796afb3e86c6fcd684 Mon Sep 17 00:00:00 2001 From: Arthur Zakirov Date: Thu, 17 Jan 2019 15:05:44 +0300 Subject: [PATCH 2/4] Change-tmplinit-argument --- contrib/dict_int/dict_int.c | 4 +- contrib/dict_xsyn/dict_xsyn.c| 4 +- contrib/unaccent/unaccent.c | 4 +- src/backend/commands/tsearchcmds.c | 10 - src/backend/snowball/dict_snowball.c | 4 +- src/backend/tsearch/dict_ispell.c| 4 +- src/backend/tsearch/dict_simple.c| 4 +- src/backend/tsearch/dict_synonym.c | 4 +- src/backend/tsearch/dict_thesaurus.c | 4 +- src/backend/utils/cache/ts_cache.c | 13 +- src/include/tsearch/ts_cache.h | 4 ++ src/include/tsearch/ts_public.h | 67 ++-- 12 files changed, 105 insertions(+), 21 deletions(-) diff --git a/contrib/dict_int/dict_int.c b/contrib/dict_int/dict_int.c index 628b9769c3..ddde55eee4 100644 --- a/contrib/dict_int/dict_int.c +++ b/contrib/dict_int/dict_int.c @@ -30,7 +30,7 @@ PG_FUNCTION_INFO_V1(dintdict_lexize); Datum dintdict_init(PG_FUNCTION_ARGS) { - List *dictoptions = (List *) PG_GETARG_POINTER(0); + DictInitData *init_data = (DictInitData *) PG_GETARG_POINTER(0); DictInt*d; ListCell *l; @@ -38,7 +38,7 @@ dintdict_init(PG_FUNCTION_ARGS) d->maxlen = 6; d->rejectlong = false; - foreach(l, dictoptions) + foreach(l, init_data->dict_options) { DefElem*defel = (DefElem *) lfirst(l); diff --git a/contrib/dict_xsyn/dict_xsyn.c b/contrib/dict_xsyn/dict_xsyn.c index 509e14aee0..15b1a0033a 100644 --- a/contrib/dict_xsyn/dict_xsyn.c +++ b/contrib/dict_xsyn/dict_xsyn.c @@ -140,7 +140,7 @@ read_dictionary(DictSyn *d, const char *filename) Datum dxsyn_init(PG_FUNCTION_ARGS) { - List *dictoptions = (List *) PG_GETARG_POINTER(0); + DictInitData *init_data = (DictInitData *) PG_GETARG_POINTER(0); DictSyn*d; ListCell *l; char *filename = NULL; @@ -153,7 +153,7 @@ dxsyn_init(PG_FUNCTION_ARGS) d->matchsynonyms = false; d->keepsynonyms = true; - foreach(l, dictoptions) + foreach(l, init_data->dict_options) { DefElem*defel = (DefElem *) lfirst(l); diff --git a/contrib/unaccent/unaccent.c b/contrib/unaccent/unaccent.c index fc5176e338..f3663cefd0 100644 --- a/contrib/unaccent/unaccent.c +++ b/contrib/unaccent/unaccent.c @@ -270,12 +270,12 @@ PG_FUNCTION_INFO_V1(unaccent_init); Datum unaccent_init(PG_FUNCTION_ARGS) { - List *dictoptions = (List *) PG_GETARG_POINTER(0); + DictInitData *init_data = (DictInitData *) PG_GETARG_POINTER(0); TrieChar *rootTrie = NULL; bool fileloaded =
Re: ArchiveEntry optional arguments refactoring
On 2019-Jan-16, Dmitry Dolgov wrote: > Proposed idea is to refactor out all/optional arguments into a separate data > structure, so that adding/removing a new argument wouldn't change that much of > unrelated code. Then for every invocation of ArchiveEntry this structure needs > to be prepared before the call, or as Andres suggested: > > ArchiveEntry((ArchiveArgs){.tablespace = 3, >.dumpFn = somefunc, >...}); Prepping the struct before the call would be our natural style, I think. This one where the struct is embedded in the function call does not look *too* horrible, but I'm curious as to what pgindent does with it. > Another suggestion from Amit is to have an ArchiveEntry() function with > limited > number of parameters, and an ArchiveEntryEx() with those extra parameters > which > are not needed in usual cases. Are there real savings to be had by doing this? What would be the arguments to each function? Off-hand, I'm not liking this idea too much. But maybe we can combine both ideas and have one "normal" function with only the most common args, and create ArchiveEntryExtended to use the struct as proposed by Andres. -- Álvaro Herrera    https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Acceptable/Best formatting of callbacks (for pluggable storage)
On 2019-Jan-11, Andres Freund wrote: > Just as an example of why I think this isn't great: Hmm, to me, the first example is much better because of *vertical* space -- I can have the whole API in one screenful. In the other example, the same number of functions take many more lines. The fact that the arguments are indented differently doesn't bother me. > typedef Size (*EstimateDSMForeignScan_function) (ForeignScanState *node, > ParallelContext *pcxt); > typedef void (*InitializeDSMForeignScan_function) (ForeignScanState *node, >ParallelContext *pcxt, >void *coordinate); > typedef void (*ReInitializeDSMForeignScan_function) (ForeignScanState *node, > ParallelContext *pcxt, > void *coordinate); > typedef void (*InitializeWorkerForeignScan_function) (ForeignScanState *node, > shm_toc *toc, > void *coordinate); > typedef void (*ShutdownForeignScan_function) (ForeignScanState *node); > typedef bool (*IsForeignScanParallelSafe_function) (PlannerInfo *root, > RelOptInfo *rel, > RangeTblEntry *rte); > typedef List *(*ReparameterizeForeignPathByChild_function) (PlannerInfo *root, > List *fdw_private, > RelOptInfo > *child_rel); > > that's a lot of indentation variability in a small amount of space - I > find it noticably slower to mentally parse due to that. Compare that > with: > > typedef Size (*EstimateDSMForeignScan_function) > (ForeignScanState *node, > ParallelContext *pcxt); > > typedef void (*InitializeDSMForeignScan_function) > (ParallelContext *pcxt, > void *coordinate); > > typedef void (*ReInitializeDSMForeignScan_function) > (ForeignScanState *node, > ParallelContext *pcxt, > void *coordinate); > > typedef void (*InitializeWorkerForeignScan_function) > (ForeignScanState *node, > shm_toc *toc, > void *coordinate); > > typedef void (*ShutdownForeignScan_function) > (ForeignScanState *node); > > typedef bool (*IsForeignScanParallelSafe_function) > (PlannerInfo *root, > RelOptInfo *rel, > RangeTblEntry *rte); > > typedef List *(*ReparameterizeForeignPathByChild_function) > (PlannerInfo *root, > List *fdw_private, > RelOptInfo *child_rel); > > I find the second formatting considerably easier to read, albeit not for > the first few seconds. -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: ArchiveEntry optional arguments refactoring
Alvaro Herrera writes: > On 2019-Jan-16, Dmitry Dolgov wrote: >> ArchiveEntry((ArchiveArgs){.tablespace = 3, >> .dumpFn = somefunc, >> ...}); > Is there real savings to be had by doing this? What would be the > arguments to each function? Off-hand, I'm not liking this idea too > much. I'm not either. What this looks like it will mainly do is create a back-patching barrier, with little if any readability improvement. I don't buy the argument that this would move the goalposts in terms of how much work it is to add a new argument. You'd still end up touching every call site. regards, tom lane
Re: Early WIP/PoC for inlining CTEs
On 1/11/19 8:10 PM, Robert Haas wrote: WITH cte_name [[NOT] MATERIALIZED] AS (query) main_query... Hm, when would one want "NOT MATERIALIZED"? I am not sure I see the usefulness of forcing inlining, other than if we by default do not inline when a CTE is referenced multiple times. Do you imagine it working something like the below?

1. Default
# Not inlined
- Referenced multiple times
- Includes FOR UPDATE|SHARE
- Includes volatile functions
- Recursive
- DML
# Inlined
- Simple case (no side effects, referenced once)

2. MATERIALIZED
# Not inlined
- Everything
# Inlined
- (none)

3. NOT MATERIALIZED
# Not inlined
- Recursive
- DML
# Inlined
- Everything else

Andreas
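For concreteness, a sketch of how the syntax under discussion would be used (proposed behaviour, not committed syntax; table and column names are made up):

    -- Force the old behaviour: evaluate the CTE once and materialize it.
    WITH w AS MATERIALIZED (SELECT * FROM big_table WHERE flag)
    SELECT * FROM w WHERE key = 42;

    -- Proposed default: a simple, side-effect-free CTE referenced once
    -- gets inlined into the outer query, so the WHERE clause can be
    -- pushed down into the scan of big_table.
    WITH w AS (SELECT * FROM big_table WHERE flag)
    SELECT * FROM w WHERE key = 42;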
Re: Feature: temporary materialized views
On 1/11/19 8:47 PM, Mitar wrote: In create_ctas_internal() why do you copy the relation even when you do not modify it? I was modelling this after code in view.c [1]. I can move copy into the "if". Makes sense. Is it really ok to just remove SECURITY_RESTRICTED_OPERATION from ExecCreateTableAs()? I feel it is there for a good reason and that we preferably want to reduce the duration of SECURITY_RESTRICTED_OPERATION to only include when we actually execute the query. The comment there said that this is not really necessary for security: "This is not necessary for security, but this keeps the behavior similar to REFRESH MATERIALIZED VIEW. Otherwise, one could create a materialized view not possible to refresh." Based on my experimentation, this is required to be able to use temporary materialized views, but it does mean one has to pay attention from where one can refresh. For example, you cannot refresh from outside of the current session, because temporary object is not available there. I have not seen any other example where refresh would not be possible. This is why I felt comfortable removing this. Also, no test failed after removing this. Hm, I am still not convinced just removing it is a good idea. Sure, it is not a security issue but usability is also important. The question is how much this worsens usability and how much extra work it would be to keep the restriction. Btw, if we are going to remove SECURITY_RESTRICTED_OPERATION we should remove more code. There is no reason to save and reset the bitmask if we do not alter it. Andreas
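A minimal illustration of the behaviour being debated (this assumes the proposed temporary materialized view support; the object name is made up):

    -- Proposed feature: a materialized view living in the temporary namespace.
    CREATE TEMPORARY MATERIALIZED VIEW tmv AS SELECT now() AS created;

    -- A refresh is only possible from the owning session, since other
    -- sessions cannot see temporary objects at all.
    REFRESH MATERIALIZED VIEW tmv;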
Re: Early WIP/PoC for inlining CTEs
Andreas Karlsson writes: > On 1/11/19 8:10 PM, Robert Haas wrote: >> WITH cte_name [[NOT] MATERIALIZED] AS (query) main_query... > Hm, when would one want "NOT MATERIALIZED"? I am not sure I see the > usefulness of forcing inlining other than if we by default do not inline > when a CTE is referenced multiple times. I'm also concerned about what we do if the user says NOT MATERIALIZED but there are semantic or implementation reasons not to inline. Either we throw an error or do something the user didn't expect, and neither is very nice. So I'm not in favor of having that option. regards, tom lane
Re: Proving IS NOT NULL inference for ScalarArrayOpExpr's
On Wed, Jan 16, 2019 at 8:49 AM James Coleman wrote: > > On Tue, Jan 15, 2019 at 11:37 PM Tom Lane wrote: > > Well, as I said upthread, it seems like we need to think a bit more > > carefully about what it is that clause_is_strict_for is testing --- > > and if that ends up finding that some other name is more apposite, > > I'd not have any hesitation about renaming it. But we're really > > missing a bet if the ScalarArrayOp-specific check isn't inside that > > recursive check of an "atomic" expression's properties. The > > routines above there only recurse through things that act like > > AND or OR, but clause_is_strict_for is capable of descending > > through other stuff as long as it's strict in some sense. What > > we need to be clear about here is exactly what that sense is. > > All right, I'll look over all of the other callers and see what makes > sense as far as naming (or perhaps consider a new parallel function; > unsure at this point.) I investigated all of the callers of clause_is_strict_for and confirmed they fall into two buckets: - Recursive case inside the function itself. - Callers proving a variant of IS [NOT] NULL. Given all cases appear to be safe to extend to scalar array ops, I've tentatively renamed the function clause_proved_for_null_test and moved the code back into that function rather than into the caller. This also guarantees that beyond proving implication of IS NOT NULL and refutation of IS NULL, we also get proof of weak refutation of strict predicates (since IS NOT NULL refutes them and we strongly imply IS NOT NULL); I've added tests for this case. I added many more comments explaining the proofs here, but it boils down to the fact that non-empty array ops are really just a strict clause, and the empty array case isn't strict, but when it doesn't return NULL it returns false, and therefore we can still make the same proofs. Since the empty array case is really the only interesting one, I brought back the empty array tests, but modified them to use calculated/non-constant arrays so that they actually hit the new code paths. I also verified that the proofs they make are a subset of the ones we get from existing code for constant empty arrays (it makes sense we'd be able to prove less about possibly empty arrays than known empty arrays.) > I might also try to see if we can edit the tests slightly to require > the recursion case to be exercised. I didn't tackle this, but if you think it would add real value, then I can look into it further. I also updated the commit message to better match the more extended implications of this change over just the IS NOT NULL case I was originally pursuing. One other thing: for known non-empty arrays we could further extend this to prove that an IS NULL clause weakly implies the ScalarArrayOp. I'm not sure this is worth it; if someone thinks otherwise let me know. James Coleman saop_is_not_null-v7.patch Description: Binary data
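To illustrate why the empty-array case is the interesting one, here is a quick sketch of the SQL-level semantics the proofs rely on (ScalarArrayOpExpr with a strict operator):

    -- Non-empty array, NULL input: the strict "=" makes the whole clause NULL.
    SELECT NULL::int = ANY (ARRAY[1, 2, 3]);   -- NULL

    -- Empty array: the clause returns false, never NULL, even for a NULL
    -- left-hand input -- so it is not strict...
    SELECT NULL::int = ANY ('{}'::int[]);      -- false

    -- ...but in both cases the clause can only evaluate to TRUE when the
    -- left-hand input is not NULL, which is what lets "x = ANY(arr)"
    -- imply "x IS NOT NULL" and refute "x IS NULL".
    SELECT 1 = ANY (ARRAY[1, 2, 3]);           -- true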
Re: Feature: temporary materialized views
Andreas Karlsson writes: > On 1/11/19 8:47 PM, Mitar wrote: >>> Is it really ok to just remove SECURITY_RESTRICTED_OPERATION from >>> ExecCreateTableAs()? >> The comment there said that this is not really necessary for security: >> "This is not necessary for security, but this keeps the behavior >> similar to REFRESH MATERIALIZED VIEW. Otherwise, one could create a >> materialized view not possible to refresh." > Hm, I am still not convinced just removing it is a good idea. Sure, it > is not a security issue but usability is also important. Indeed. I don't buy the argument that this should work differently for temp views. The fact that they're only accessible in the current session is no excuse for that: security considerations still matter, because you can have different privilege contexts within a single session (consider SECURITY DEFINER functions etc). What is the stumbling block to just leaving that alone? regards, tom lane
Re: Protect syscache from bloating with negative cache entries
On Sun, Jan 13, 2019 at 11:41 AM Tom Lane wrote: > Putting a limit on the size of the syscaches doesn't accomplish anything > except to add cycles if your cache working set is below the limit, or > make performance fall off a cliff if it's above the limit. If you're running on a Turing machine, sure. But real machines have finite memory, or at least all the ones I use do. Horiguchi-san is right that this is a real, not theoretical problem. It is one of the most frequent operational concerns that EnterpriseDB customers have. I'm not against solving specific cases with more targeted fixes, but I really believe we need something more. Andres mentioned one problem case: connection poolers that eventually end up with a cache entry for every object in the system. Another case is that of people who keep idle connections open for long periods of time; those connections can gobble up large amounts of memory even though they're not going to use any of their cache entries any time soon. The flaw in your thinking, as it seems to me, is that in your concern for "the likelihood that cache flushes will simply remove entries we'll soon have to rebuild," you're apparently unwilling to consider the possibility of workloads where cache flushes will remove entries we *won't* soon have to rebuild. Every time that issue gets raised, you seem to blow it off as if it were not a thing that really happens. I can't make sense of that position. Is it really so hard to imagine a connection pooler that switches the same connection back and forth between two applications with different working sets? Or a system that keeps persistent connections open even when they are idle? Do you really believe that a connection that has not accessed a cache entry in 10 minutes still derives more benefit from that cache entry than it would from freeing up some memory? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: ArchiveEntry optional arguments refactoring
On 2019-01-17 10:23:39 -0500, Tom Lane wrote: > Alvaro Herrera writes: > > On 2019-Jan-16, Dmitry Dolgov wrote: > >> ArchiveEntry((ArchiveArgs){.tablespace = 3, > >> .dumpFn = somefunc, > >> ...}); > > > Are there real savings to be had by doing this? What would be the > > arguments to each function? Off-hand, I'm not liking this idea too > > much. > > I'm not either. What this looks like it will mainly do is create > a back-patching barrier, with little if any readability improvement. I don't really buy this. We've whacked around the arguments to ArchiveEntry() repeatedly over the last few releases, so there's already a hindrance to backpatching. And given the current setup we have to whack around all 70+ callsites whenever a single argument is added. With the setup I'd suggested you don't, because the designated initializer syntax allows you to omit items that ought to be zero-initialized. And given the number of arguments to ArchiveEntry(), having a name for each argument would help readability too. It's currently not exactly obvious what is an argument for what: ArchiveEntry(AH, nilCatalogId, createDumpId(), "ENCODING", NULL, NULL, "", "ENCODING", SECTION_PRE_DATA, qry->data, "", NULL, NULL, 0, NULL, NULL); If you compare that with ArchiveEntry(AH, (ArchiveEntry){.catalogId = nilCatalogId, .dumpId = createDumpId(), .tag = "ENCODING", .desc = "ENCODING", .section = SECTION_PRE_DATA, .defn = qry->data}); it's definitely easier to see what argument is what. > I don't buy the argument that this would move the goalposts in terms > of how much work it is to add a new argument. You'd still end up > touching every call site. Why? A lot of arguments that'd be potentially added or removed would not be set by each callsite. If you e.g. look at a recent change of this kind, you can see that a lot of the changes were like ArchiveEntry(fout, nilCatalogId, createDumpId(), "pg_largeobject", NULL, NULL, "", -false, "pg_largeobject", SECTION_PRE_DATA, +"pg_largeobject", SECTION_PRE_DATA, loOutQry->data, "", NULL, NULL, 0, NULL, NULL); i.e. just removing a 'false' argument. In like 70+ callsites. With the above scheme, we'd instead just have removed a single .withoids = true, from dumpTableSchema()'s ArchiveEntry() call. Greetings, Andres Freund
Re: Proposal for Signal Detection Refactoring
Hi, On 2019-01-17 10:50:56 +0100, Chris Travers wrote: > diff --git a/src/backend/utils/init/globals.c > b/src/backend/utils/init/globals.c > index c6939779b9..5ed715589e 100644 > --- a/src/backend/utils/init/globals.c > +++ b/src/backend/utils/init/globals.c > @@ -27,12 +27,35 @@ > > ProtocolVersion FrontendProtocol; > > +/* > + * Signal Handling variables here are set in signal handlers and can be > + * checked by code to determine the implications of calling > + * CHECK_FOR_INTERRUPTS(). While all are supported, in practice > + * InterruptPending, QueryCancelPending, and ProcDiePending appear to be > + * sufficient for almost all use cases. > + */ Especially the latter part of comment seems like a bad idea. > +/* Whether CHECK_FOR_INTERRUPTS will do anything */ > volatile sig_atomic_t InterruptPending = false; That's not actually a correct description. CFI is dependent on other things than just InterruptPending, namely HOLD_INTERRUPTS() (even though it's inside ProcessInterrupts()). It also always does more on windows. I think the variable name is at least as good a description as the comment you added. > +/* Whether we are planning on cancelling the current query */ > volatile sig_atomic_t QueryCancelPending = false; > +/* Whether we are planning to exit the current process when safe to do so.*/ > volatile sig_atomic_t ProcDiePending = false; > + > +/* Whether we have detected a lost client connection */ > volatile sig_atomic_t ClientConnectionLost = false; > + > +/* > + * Whether the process is dying because idle transaction timeout has been > + * reached. > + */ > volatile sig_atomic_t IdleInTransactionSessionTimeoutPending = false; > + > +/* Whether we will reload the configuration at next opportunity */ > volatile sig_atomic_t ConfigReloadPending = false; > + These comments all seem to add no information above the variable name. I'm not quite understanding what this patch is intended to solve, sorry. Greetings, Andres Freund
Re: WIP: Avoid creation of the free space map for small tables
On Wed, Jan 16, 2019 at 10:35 PM Amit Kapila wrote: > Yes, I think it would be good if you can explain the concept of > local-map with the help of this example. > Then let's not add a reference to the version number in this case. I Okay, done in v14. I kept your spelling of the new macro. One minor detail added: use uint8 rather than char for the local map array. This seems to be preferred, especially in this file. -- John Naylorhttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services From 38bc2d9b4ae3f0d1ab6d21c54ed638c5aee6d637 Mon Sep 17 00:00:00 2001 From: John Naylor Date: Thu, 17 Jan 2019 12:25:30 -0500 Subject: [PATCH v14 1/2] Avoid creation of the free space map for small heap relations. Previously, all heaps had FSMs. For very small tables, this means that the FSM took up more space than the heap did. This is wasteful, so now we refrain from creating the FSM for heaps with 4 pages or fewer. If the last known target block has insufficient space, we still try to insert into some other page before before giving up and extending the relation, since doing otherwise leads to table bloat. Testing showed that trying every page penalized performance slightly, so we compromise and try every other page. This way, we visit at most two pages. Any pages with wasted free space become visible at next relation extension, so we still control table bloat. As a bonus, directly attempting one or two pages can even be faster than consulting the FSM would have been. --- contrib/pageinspect/expected/page.out | 77 +++--- contrib/pageinspect/sql/page.sql | 33 +-- doc/src/sgml/storage.sgml | 13 +- src/backend/access/brin/brin.c| 2 +- src/backend/access/brin/brin_pageops.c| 10 +- src/backend/access/heap/hio.c | 47 ++-- src/backend/access/heap/vacuumlazy.c | 17 +- src/backend/access/transam/xact.c | 14 ++ src/backend/storage/freespace/README | 32 ++- src/backend/storage/freespace/freespace.c | 272 +- src/backend/storage/freespace/indexfsm.c | 6 +- src/include/storage/freespace.h | 9 +- src/test/regress/expected/fsm.out | 62 + src/test/regress/parallel_schedule| 6 + src/test/regress/serial_schedule | 1 + src/test/regress/sql/fsm.sql | 46 16 files changed, 545 insertions(+), 102 deletions(-) create mode 100644 src/test/regress/expected/fsm.out create mode 100644 src/test/regress/sql/fsm.sql diff --git a/contrib/pageinspect/expected/page.out b/contrib/pageinspect/expected/page.out index 3fcd9fbe6d..83e5910453 100644 --- a/contrib/pageinspect/expected/page.out +++ b/contrib/pageinspect/expected/page.out @@ -1,48 +1,69 @@ CREATE EXTENSION pageinspect; -CREATE TABLE test1 (a int, b int); -INSERT INTO test1 VALUES (16777217, 131584); -VACUUM test1; -- set up FSM +CREATE TABLE test_rel_forks (a int); +-- Make sure there are enough blocks in the heap for the FSM to be created. +INSERT INTO test_rel_forks SELECT g from generate_series(1,1) g; +-- set up FSM and VM +VACUUM test_rel_forks; -- The page contents can vary, so just test that it can be read -- successfully, but don't keep the output. 
-SELECT octet_length(get_raw_page('test1', 'main', 0)) AS main_0; +SELECT octet_length(get_raw_page('test_rel_forks', 'main', 0)) AS main_0; main_0 8192 (1 row) -SELECT octet_length(get_raw_page('test1', 'main', 1)) AS main_1; -ERROR: block number 1 is out of range for relation "test1" -SELECT octet_length(get_raw_page('test1', 'fsm', 0)) AS fsm_0; +SELECT octet_length(get_raw_page('test_rel_forks', 'main', 100)) AS main_100; +ERROR: block number 100 is out of range for relation "test_rel_forks" +SELECT octet_length(get_raw_page('test_rel_forks', 'fsm', 0)) AS fsm_0; fsm_0 --- 8192 (1 row) -SELECT octet_length(get_raw_page('test1', 'fsm', 1)) AS fsm_1; - fsm_1 - 8192 -(1 row) - -SELECT octet_length(get_raw_page('test1', 'vm', 0)) AS vm_0; +SELECT octet_length(get_raw_page('test_rel_forks', 'fsm', 10)) AS fsm_10; +ERROR: block number 10 is out of range for relation "test_rel_forks" +SELECT octet_length(get_raw_page('test_rel_forks', 'vm', 0)) AS vm_0; vm_0 -- 8192 (1 row) -SELECT octet_length(get_raw_page('test1', 'vm', 1)) AS vm_1; -ERROR: block number 1 is out of range for relation "test1" +SELECT octet_length(get_raw_page('test_rel_forks', 'vm', 1)) AS vm_1; +ERROR: block number 1 is out of range for relation "test_rel_forks" SELECT octet_length(get_raw_page('xxx', 'main', 0)); ERROR: relation "xxx" does not exist -SELECT octet_length(get_raw_page('test1', 'xxx', 0)); +SELECT octet_length(get_raw_page('test_rel_forks', 'xxx', 0)); ERROR: invalid fork name HINT: Valid fork names are "main", "fsm", "vm", and "init". -SELECT get_raw_page('test1', 0) = get_raw_page('test1', 'main', 0); +SELECT * FROM fsm_page_conten
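As a quick interactive check of the behaviour the patch (and its regression test above) aims for -- the table name here is only illustrative:

    -- A heap of 4 pages or fewer should get no FSM fork with the patch applied.
    CREATE TABLE tiny (a int);
    INSERT INTO tiny SELECT g FROM generate_series(1, 100) g;
    VACUUM tiny;                              -- would previously have created the FSM
    SELECT pg_relation_size('tiny', 'fsm');   -- expected to be 0 under the patch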
Re: Feature: temporary materialized views
On 1/17/19 4:57 PM, Tom Lane wrote: Andreas Karlsson writes: On 1/11/19 8:47 PM, Mitar wrote: Is it really ok to just remove SECURITY_RESTRICTED_OPERATION from ExecCreateTableAs()? The comment there said that this is not really necessary for security: "This is not necessary for security, but this keeps the behavior similar to REFRESH MATERIALIZED VIEW. Otherwise, one could create a materialized view not possible to refresh." Hm, I am still not convinced just removing it is a good idea. Sure, it is not a security issue but usability is also important. Indeed. I don't buy the argument that this should work differently for temp views. The fact that they're only accessible in the current session is no excuse for that: security considerations still matter, because you can have different privilege contexts within a single session (consider SECURITY DEFINER functions etc). What is the stumbling block to just leaving that alone?

I think the issue Mitar ran into is that the temporary materialized view is created in the rStartup callback of the receiver, which happens after SECURITY_RESTRICTED_OPERATION is set in ExecCreateTableAs(), so the creation of the view itself is denied. From a cursory glance it looks like it would be possible to move the setting of SECURITY_RESTRICTED_OPERATION to inside the rStartup callback, though the code for resetting the security context might get a bit ugly. Do you see any flaws with that solution?

Andreas
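For readers skimming the thread, the mechanism being argued about is the usual save/switch/restore idiom from miscadmin.h; a rough paraphrase (not a literal excerpt from ExecCreateTableAs) is below, and the open question is only where these calls should sit relative to the receiver's rStartup callback:

Oid         save_userid;
int         save_sec_context;

GetUserIdAndSecContext(&save_userid, &save_sec_context);
SetUserIdAndSecContext(save_userid,
                       save_sec_context | SECURITY_RESTRICTED_OPERATION);

/* ... plan and run the view's query, create and populate the relation ... */

SetUserIdAndSecContext(save_userid, save_sec_context);

The disagreement that follows in the thread is about where in that sequence the switch can safely happen, not about the idiom itself.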
Re: Proposal for Signal Detection Refactoring
On Thu, Jan 17, 2019 at 6:38 PM Andres Freund wrote: > Hi, > > On 2019-01-17 10:50:56 +0100, Chris Travers wrote: > > diff --git a/src/backend/utils/init/globals.c > b/src/backend/utils/init/globals.c > > index c6939779b9..5ed715589e 100644 > > --- a/src/backend/utils/init/globals.c > > +++ b/src/backend/utils/init/globals.c > > @@ -27,12 +27,35 @@ > > > > ProtocolVersion FrontendProtocol; > > > > +/* > > + * Signal Handling variables here are set in signal handlers and can be > > + * checked by code to determine the implications of calling > > + * CHECK_FOR_INTERRUPTS(). While all are supported, in practice > > + * InterruptPending, QueryCancelPending, and ProcDiePending appear to be > > + * sufficient for almost all use cases. > > + */ > > Especially the latter part of comment seems like a bad idea. > > > > +/* Whether CHECK_FOR_INTERRUPTS will do anything */ > > volatile sig_atomic_t InterruptPending = false; > > That's not actually a correct description. CFI is dependent on other > things than just InterruptPending, namely HOLD_INTERRUPTS() (even though > it's inside ProcessInterrupts()). It also always does more on windows. > I think the variable name is at least as good a description as the > comment you added. > > > > +/* Whether we are planning on cancelling the current query */ > > volatile sig_atomic_t QueryCancelPending = false; > > > +/* Whether we are planning to exit the current process when safe to do > so.*/ > > volatile sig_atomic_t ProcDiePending = false; > > + > > +/* Whether we have detected a lost client connection */ > > volatile sig_atomic_t ClientConnectionLost = false; > > + > > +/* > > + * Whether the process is dying because idle transaction timeout has > been > > + * reached. > > + */ > > volatile sig_atomic_t IdleInTransactionSessionTimeoutPending = false; > > + > > +/* Whether we will reload the configuration at next opportunity */ > > volatile sig_atomic_t ConfigReloadPending = false; > > + > > These comments all seem to add no information above the variable name. > > > I'm not quite understanding what this patch is intended to solve, sorry. > I would like to see the fact that checking these global fields is the correct way of understanding what CHECK_FOR_INTERRUPTS will do before you do it as per the two or three places in the code where this is already done. I don't like reasoning about acceptable coding practices based on "someone has already committed something like this before." I found that difficult when I was trying to fix the race condition and thought this would help others in similar cases. Keep in mind that C extensions might use internal functions etc. perhaps in ways which are different from what the main source code does. Therefore what is safe to do, and what we are not willing to guarantee as safe should probably be documented. > > Greetings, > > Andres Freund > -- Best Regards, Chris Travers Head of Database Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com Saarbrücker Straße 37a, 10405 Berlin
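To make the usage Chris has in mind concrete, a minimal sketch of extension-style code that inspects the pending flags before calling CHECK_FOR_INTERRUPTS() might look like this; only the globals and the macro come from miscadmin.h, and everything else (the function, the loop, the cleanup) is invented for illustration:

#include "postgres.h"
#include "miscadmin.h"

/* Hypothetical extension routine processing nitems units of work. */
static void
process_items(int nitems)
{
    for (int i = 0; i < nitems; i++)
    {
        /*
         * The pending flags say what CHECK_FOR_INTERRUPTS() is likely to
         * do, so per-item cleanup can run before any error is thrown.
         */
        if (QueryCancelPending || ProcDiePending)
            elog(DEBUG1, "interrupt pending before item %d", i);

        CHECK_FOR_INTERRUPTS();

        /* ... do one unit of work ... */
    }
}

Whether that pattern should be documented as supported for extensions is exactly the question being debated here.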
Re: Proposal for Signal Detection Refactoring
Hi, On 2018-10-09 16:04:35 +0200, Peter Eisentraut wrote: > More generally, I'd like this material to be code comments. It's the > kind of stuff that gets outdated before long if it's kept separate. I'm not sure I buy this here - we don't have (but perhaps should?) a convenient location for an overview comment around this. There's no "signal_handling.c" where it'd clearly belong - given the lack of a clear point to look to, I don't think a README.SIGNAL_HANDLING would get out-of-date more quickly than code comments in mildly related place (say postgres.c or miscinit.c) would get out of date at a different pace. Greetings, Andres Freund
Re: ArchiveEntry optional arguments refactoring
Greetings, * Andres Freund (and...@anarazel.de) wrote: > On 2019-01-17 10:23:39 -0500, Tom Lane wrote: > > Alvaro Herrera writes: > > > On 2019-Jan-16, Dmitry Dolgov wrote: > > >> ArchiveEntry((ArchiveArgs){.tablespace = 3, > > >> .dumpFn = somefunc, > > >> ...}); > > > > > Is there real savings to be had by doing this? What would be the > > > arguments to each function? Off-hand, I'm not liking this idea too > > > much. > > > > I'm not either. What this looks like it will mainly do is create > > a back-patching barrier, with little if any readability improvement. > > I don't really buy this. We've whacked around the arguments to > ArchiveEntry() repeatedly over the last few releases, so there's already > a hindrance to backpatching. And given the current setup we've to whack > around all 70+ callsites whenever a single argument is added. With the > setup I'd suggested you don't, because the designated initializer syntax > allows you to omit items that ought to be zero-initialized. > > And given the number of arguments to ArchiveEntry() having a name for > each argument would help for readability too. It's currently not exactly > obvious what is an argument for what: > ArchiveEntry(AH, nilCatalogId, createDumpId(), >"ENCODING", NULL, NULL, "", >"ENCODING", SECTION_PRE_DATA, >qry->data, "", NULL, >NULL, 0, >NULL, NULL); > > If you compare that with > > ArchiveEntry(AH, > (ArchiveEntry){.catalogId = nilCatalogId, > .dumpId = createDumpId(), > .tag = "ENCODING", > .desc = "ENCODING", > .section = SECTION_PRE_DATA, > .defn = qry->data}); > > it's definitely easier to see what argument is what. +1. I was on the fence about this approach when David started using it in pgBackRest but I've come to find that it's actually pretty nice and being able to omit things which should be zero/default is very nice. I feel like it's quite similar to what we do in other places too- just look for things like: utils/adt/jsonfuncs.c:600 sem = palloc0(sizeof(JsonSemAction)); ... sem->semstate = (void *) state; sem->array_start = okeys_array_start; sem->scalar = okeys_scalar; sem->object_field_start = okeys_object_field_start; /* remainder are all NULL, courtesy of palloc0 above */ pg_parse_json(lex, sem); ... pfree(sem); > > I don't buy the argument that this would move the goalposts in terms > > of how much work it is to add a new argument. You'd still end up > > touching every call site. > > Why? A lot of arguments that'd be potentially added or removed would not > be set by each callsites. > > If you e.g. look at > > you can see that a lot of changes where like > ArchiveEntry(fout, nilCatalogId, createDumpId(), > "pg_largeobject", NULL, NULL, "", > -false, "pg_largeobject", SECTION_PRE_DATA, > +"pg_largeobject", SECTION_PRE_DATA, > loOutQry->data, "", NULL, > NULL, 0, > NULL, NULL); > > i.e. just removing a 'false' argument. In like 70+ callsites. With the > above scheme, we'd instead just have removed a single .withoids = true, > from dumpTableSchema()'s ArchiveEntry() call. Agreed. Using this approach in more places, when appropriate and sensible, seems like a good direction to go in. To be clear, I don't think we should go rewrite pieces of code just for the sake of it as that would make back-patching more difficult, but when we're making changes anyway, or where it wouldn't really change the landscape for back-patching, then it seems like a good change. Thanks! Stephen signature.asc Description: PGP signature
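For anyone who has not used the designated-initializer style being +1'd here, a tiny self-contained example of the pattern follows; the struct and function names are invented for illustration and are not the proposed ArchiveEntry() API:

#include <stdbool.h>
#include <stdio.h>

/* Invented stand-ins for the real pg_dump structures */
typedef struct ArchiveOpts
{
    const char *tag;
    const char *desc;
    const char *defn;
    int         section;
    bool        withOids;       /* fields left out of the braces are zeroed */
} ArchiveOpts;

static void
make_entry(const ArchiveOpts *opts)
{
    printf("%s (%s): %s\n", opts->tag, opts->desc,
           opts->defn ? opts->defn : "");
}

int
main(void)
{
    /* Only the interesting fields are spelled out at the call site. */
    make_entry(&(ArchiveOpts){.tag = "ENCODING",
                              .desc = "ENCODING",
                              .defn = "SET client_encoding = 'UTF8';"});
    return 0;
}

Because unmentioned members are zero-initialized, adding a new optional field to the struct later touches only the call sites that actually care about it, which is the back-patching point being made above.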
Re: ArchiveEntry optional arguments refactoring
Hi, On 2019-01-17 09:29:04 -0800, Andres Freund wrote: > On 2019-01-17 10:23:39 -0500, Tom Lane wrote: > > I don't buy the argument that this would move the goalposts in terms > > of how much work it is to add a new argument. You'd still end up > > touching every call site. > > Why? A lot of arguments that'd be potentially added or removed would not > be set by each callsites. > > If you e.g. look at > > you can see that a lot of changes where like > ArchiveEntry(fout, nilCatalogId, createDumpId(), > "pg_largeobject", NULL, NULL, "", > -false, "pg_largeobject", SECTION_PRE_DATA, > +"pg_largeobject", SECTION_PRE_DATA, > loOutQry->data, "", NULL, > NULL, 0, > NULL, NULL); > > i.e. just removing a 'false' argument. In like 70+ callsites. With the > above scheme, we'd instead just have removed a single .withoids = true, > from dumpTableSchema()'s ArchiveEntry() call. the "at" I was trying to reference above is 578b229718e8f15fa779e20f086c4b6bb3776106 / the WITH OID removal, and therein specifically the pg_dump changes. Greetings, Andres Freund
Re: Proposal for Signal Detection Refactoring
On Thu, Jan 17, 2019 at 7:08 PM Andres Freund wrote: > Hi, > > On 2018-10-09 16:04:35 +0200, Peter Eisentraut wrote: > > More generally, I'd like this material to be code comments. It's the > > kind of stuff that gets outdated before long if it's kept separate. > > I'm not sure I buy this here - we don't have (but perhaps should?) a > convenient location for an overview comment around this. There's no > "signal_handling.c" where it'd clearly belong - given the lack of a > clear point to look to, I don't think a README.SIGNAL_HANDLING would get > out-of-date more quickly than code comments in mildly related place (say > postgres.c or miscinit.c) would get out of date at a different pace. > The other point is that "This is the way to do it" is a little easier when in a README rather than in code comments since those might be scattered in a bunch of different places. > > Greetings, > > Andres Freund > -- Best Regards, Chris Travers Head of Database Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com Saarbrücker Straße 37a, 10405 Berlin
Re: Feature: temporary materialized views
Andreas Karlsson writes: > On 1/17/19 4:57 PM, Tom Lane wrote: >> What is the stumbling block to just leaving that alone? > I think the issue Mitar ran into is that the temporary materialized view > is created in the rStartup callback of the receiver which happens after > SECURITY_RESTRICTED_OPERATION is set in ExecCreateTableAs(), so the > creation of the view itself is denied. Hm. > From a cursory glance it looks like it would be possible to move the > setting of SECURITY_RESTRICTED_OPERATION to inside the rStartup > callabck, other than that the code for resetting the security context > might get a bit ugly. Do you see any flaws with that solution? Don't think that works: the point here is to restrict what can happen during planning/execution of the view query, so letting planning and query startup happen first is no good. Creating the view object inside the rStartup callback is itself pretty much of a kluge; you'd expect that to happen earlier. I think the reason it was done that way was it was easier to find out the view's column set there, but I'm sure we can find another way --- doing the object creation more like a regular view does it is the obvious approach. regards, tom lane
Re: Protect syscache from bloating with negative cache entries
On Thu, Jan 17, 2019 at 11:33:35AM -0500, Robert Haas wrote: > The flaw in your thinking, as it seems to me, is that in your concern > for "the likelihood that cache flushes will simply remove entries > we'll soon have to rebuild," you're apparently unwilling to consider > the possibility of workloads where cache flushes will remove entries > we *won't* soon have to rebuild. Every time that issue gets raised, > you seem to blow it off as if it were not a thing that really happens. > I can't make sense of that position. Is it really so hard to imagine > a connection pooler that switches the same connection back and forth > between two applications with different working sets? Or a system > that keeps persistent connections open even when they are idle? Do > you really believe that a connection that has not accessed a cache > entry in 10 minutes still derives more benefit from that cache entry > than it would from freeing up some memory?

Well, I think everyone agrees there are workloads that cause undesired cache bloat. What we have not found is a solution that doesn't cause code complexity or undesired overhead, or one that >1% of users will know how to use. Unfortunately, because we have not found something we are happy with, we have done nothing.

I agree LRU can be expensive. What if we do some kind of clock sweep and expiration like we do for shared buffers? I think the trick is figuring how frequently to do the sweep. What if we mark entries as unused every 10 queries, mark them as used on first use, and delete cache entries that have not been used in the past 10 queries.

-- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Ancient Roman grave inscription +
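A sketch of the policy in the last paragraph, written against an invented singly-linked cache rather than the real catcache structures: lookups set a "used" flag, and a periodic sweep (every N queries, N being the knob to tune) evicts entries whose flag is still clear and clears the rest, so an entry survives only if it was touched during the last interval. free_entry() is an invented helper.

#include <stdbool.h>
#include <stddef.h>

typedef struct SweepEntry
{
    struct SweepEntry *next;
    bool        used;           /* set to true on every cache hit */
    /* ... cached tuple payload would live here ... */
} SweepEntry;

extern void free_entry(SweepEntry *entry);  /* invented helper */

/* Run every N queries; evict anything not touched since the last run. */
static void
sweep_cache(SweepEntry **head)
{
    SweepEntry **prev = head;

    while (*prev != NULL)
    {
        SweepEntry *entry = *prev;

        if (!entry->used)
        {
            *prev = entry->next;    /* unlink the cold entry ... */
            free_entry(entry);      /* ... and release it */
        }
        else
        {
            entry->used = false;    /* must be hit again to survive */
            prev = &entry->next;
        }
    }
}

The cost is one flag write per hit plus a linear pass per sweep, which is the overhead-versus-complexity trade-off the thread keeps circling.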
Re: Fixing findDependentObjects()'s dependency on scan order (regressions in DROP diagnostic messages)
Peter Geoghegan writes: > On Tue, Dec 18, 2018 at 2:26 PM Tom Lane wrote: >> Hm, that definitely leads me to feel that we've got bug(s) in >> dependency.c. I'll take a look sometime soon. > Thanks! > I'm greatly relieved that I probably won't have to become an expert on > dependency.c after all. :-) So I poked around for awhile with running the regression tests under ignore_system_indexes. There seem to be a number of issues involved here. To a significant extent, they aren't bugs, at least not according to the original conception of the dependency code: it was not a design goal that different dependencies of the same object-to-be-deleted would be processed in a fixed order. That leads to the well-known behavior that cascade drops report the dropped objects in an unstable order. Now, perhaps we should make such stability a design goal, as it'd allow us to get rid of some "suppress the cascade outputs" hacks in the regression tests. But it's a bit of a new feature. If we wanted to do that, I'd be inclined to do it by absorbing all the pg_depend entries for a particular object into an ObjectAddress array and then sorting them before we process them. The main stumbling block here is "what would the sort order be?". The best idea I can come up with offhand is to sort by OID, which at least for regression test purposes would mean objects would be listed/processed more or less in creation order. However, there are a couple of places where the ignore_system_indexes output does something weird like DROP TABLE PKTABLE; ERROR: cannot drop table pktable because other objects depend on it -DETAIL: constraint constrname2 on table fktable depends on table pktable +DETAIL: constraint constrname2 on table fktable depends on index pktable_pkey HINT: Use DROP ... CASCADE to drop the dependent objects too. The reason for this is that the reported "nearest dependency" is the object that we first reached the named object by recursing from. If there are multiple dependency paths to the same object then it's order-traversal-dependent which path we take first. Sorting the pg_depend entries before processing, as I suggested above, would remove the instability, but it's not real clear whether we'd get a desirable choice of reported object that way. Perhaps we could improve matters by having the early exits from findDependentObjects (at stack_address_present_add_flags and object_address_present_add_flags) consider replacing the already-recorded dependee with the current source object, according to some set of rules. One rule that I think would be useful is to compare dependency types: I think a normal dependency is more interesting than an auto dependency which is more interesting than an internal one. Beyond that, though, I don't see anything we could do but compare OIDs. (If we do compare OIDs, then the result should be stable with or without pre-sorting of the pg_depend entries.) I also noticed some places where the output reports a different number of objects dropped by a cascade. This happens when a table column is reached by some dependency path, while the whole table is reached by another one. If we find the whole table first, then when we find the table column we just ignore it. But in the other order, they're reported as two separate droppable objects. The reasoning is explained in object_address_present_add_flags: * We get here if we find a need to delete a whole table after * having already decided to drop one of its columns. 
We * can't report that the whole object is in the array, but we * should mark the subobject with the whole object's flags. * * It might seem attractive to physically delete the column's * array entry, or at least mark it as no longer needing * separate deletion. But that could lead to, e.g., dropping * the column's datatype before we drop the table, which does * not seem like a good idea. This is a very rare situation * in practice, so we just take the hit of doing a separate * DROP COLUMN action even though we know we're gonna delete * the table later. I think this reasoning is sound, and we should continue to do the separate DROP COLUMN step, but it occurs to me that just because we drop the column separately doesn't mean we have to report it separately to the user. I propose that we handle this case by adding a new DEPFLAG_IS_SUBOBJECT flag to the column object's flags, denoting that we know the whole table will be dropped later. The only effect of this flag is to suppress reporting of the column object in reportDependentObjects. Another order-dependent effect that can be seen in the regression tests comes from the fact that the first loop in findDependentObjects (over the target's referenced objects) errors out immediately on the first DE
Re: [HACKERS] PATCH: multivariate histograms and MCV lists
se elog(). > 3. I've not really got into understanding how the new statistics types > are applied yet, but I found this: > > * If asked to build both MCV and histogram, first build the MCV part > * and then histogram on the remaining rows. > > I guess that means we'll get different estimates with: > > create statistic a_stats (mcv,histogram) on a,b from t; > > vs > > create statistic a_stats1 (mcv) on a,b from t; > create statistic a_stats2 (histogram) on a,b from t; > > Is that going to be surprising to people? Well, I don't have a good answer to this, except for mentioning this in the SGML docs. > 5. serialize_histogram() and statext_histogram_deserialize(), should > these follow the same function naming format? Perhaps, although serialize_histogram() is static and so it's kinda internal API. > 8. Looking at statext_clauselist_selectivity() I see it calls > choose_best_statistics() passing requiredkinds as STATS_EXT_INFO_MCV | > STATS_EXT_INFO_HISTOGRAM, do you think the function now needs to > attempt to find the best match plus the one with the most statistics > kinds? > > It might only matter if someone had: > > create statistic a_stats1 (mcv) on a,b from t; > create statistic a_stats2 (histogram) on a,b from t; > create statistic a_stats3 (mcv,histogram) on a,b from t; > > Is it fine to just return a_stats1 and ignore the fact that a_stats3 > is probably better? Or too corner case to care? I don't know. My assumption is people will not create such overlapping statics. > 9. examine_equality_clause() assumes it'll get a Var. I see we should > only allow clauses that pass statext_is_compatible_clause_internal(), > so maybe it's worth an Assert(IsA(var, Var)) along with a comment to > mention anything else could not have been allowed. Maybe. > 10. Does examine_equality_clause need 'root' as an argument? Probably not. I guess it's a residue some older version. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 0001-multivariate-MCV-lists-20190117.patch.gz Description: application/gzip 0002-multivariate-histograms-20190117.patch.gz Description: application/gzip
Re: PSA: we lack TAP test coverage on NetBSD and OpenBSD
On 2019-01-17 06:04, Tom Lane wrote: Although we've got a few NetBSD and OpenBSD buildfarm critters, none of them are doing --enable-tap-tests. If they were, we'd have noticed the pgbench regression tests falling over: For what it's worth I've enabled tap-tests for my OpenBSD 5.9 (curculio) and NetBSD 7 (sidewinder) animals now. /Mikael
Re: PSA: we lack TAP test coverage on NetBSD and OpenBSD
Mikael Kjellström writes: > On 2019-01-17 06:04, Tom Lane wrote: >> Although we've got a few NetBSD and OpenBSD buildfarm critters, >> none of them are doing --enable-tap-tests. If they were, we'd >> have noticed the pgbench regression tests falling over: > For what it's worth I've enabled tap-tests for my OpenBSD 5.9 (curculio) > and NetBSD 7 (sidewinder) animals now. Oh, thanks! I'm guessing they'll fail their next runs, but I'll wait to see confirmation of that before I do anything about the test bug. regards, tom lane
Re: PSA: we lack TAP test coverage on NetBSD and OpenBSD
On 2019-01-17 22:16, Tom Lane wrote: For what it's worth I've enabled tap-tests for my OpenBSD 5.9 (curculio) and NetBSD 7 (sidewinder) animals now. Oh, thanks! I'm guessing they'll fail their next runs, but I'll wait to see confirmation of that before I do anything about the test bug. They should run the next time within the hour or hour and a half so I guess we will find out soon enough. /Mikael
Re: PSA: we lack TAP test coverage on NetBSD and OpenBSD
On 2019-01-17 22:19, Mikael Kjellström wrote: On 2019-01-17 22:16, Tom Lane wrote: For what it's worth I've enabled tap-tests for my OpenBSD 5.9 (curculio) and NetBSD 7 (sidewinder) animals now. Oh, thanks! I'm guessing they'll fail their next runs, but I'll wait to see confirmation of that before I do anything about the test bug. They should run the next time within the hour or hour and a half so I guess we will find out soon enough. Hm, that didn't go so well. It says: configure: error: Additional Perl modules are required to run TAP tests so how do I find out which Perl modules are required? /Mikael
Re: PSA: we lack TAP test coverage on NetBSD and OpenBSD
Mikael Kjellström writes: > It says: > configure: error: Additional Perl modules are required to run TAP tests > so how do I find out which Perl modules are required? If you look into the configure log it should say just above that, but I'm betting you just need p5-IPC-Run. regards, tom lane
Re: PSA: we lack TAP test coverage on NetBSD and OpenBSD
On 2019-01-17 22:42, Tom Lane wrote: Mikael Kjellström writes: It says: configure: error: Additional Perl modules are required to run TAP tests so how do I find out which Perl modules are required? If you look into the configure log it should say just above that, but I'm betting you just need p5-IPC-Run. Yes it seems to be IPC::Run that is missing. I've installed it manually through CPAN. Let's see if it works better this time. /Mikael
Re: Fixing findDependentObjects()'s dependency on scan order (regressions in DROP diagnostic messages)
I wrote: > I propose that we handle this case by adding a new DEPFLAG_IS_SUBOBJECT > flag to the column object's flags, denoting that we know the whole table > will be dropped later. The only effect of this flag is to suppress > reporting of the column object in reportDependentObjects. Here's a proposed patch for that bit. As expected, it seems to eliminate the variation in number-of-cascaded-drops-reported under ignore_system_indexes. I do not see any regression outputs change otherwise. regards, tom lane diff --git a/src/backend/catalog/dependency.c b/src/backend/catalog/dependency.c index 3529084..6b8c735 100644 *** a/src/backend/catalog/dependency.c --- b/src/backend/catalog/dependency.c *** typedef struct *** 102,107 --- 102,108 #define DEPFLAG_INTERNAL 0x0008 /* reached via internal dependency */ #define DEPFLAG_EXTENSION 0x0010 /* reached via extension dependency */ #define DEPFLAG_REVERSE 0x0020 /* reverse internal/extension link */ + #define DEPFLAG_SUBOBJECT 0x0040 /* subobject of another deletable object */ /* expansible list of ObjectAddresses */ *** reportDependentObjects(const ObjectAddre *** 886,891 --- 887,896 if (extra->flags & DEPFLAG_ORIGINAL) continue; + /* Also ignore sub-objects; we'll report the whole object elsewhere */ + if (extra->flags & DEPFLAG_SUBOBJECT) + continue; + objDesc = getObjectDescription(obj); /* *** object_address_present_add_flags(const O *** 2320,2332 * DROP COLUMN action even though we know we're gonna delete * the table later. * * Because there could be other subobjects of this object in * the array, this case means we always have to loop through * the whole array; we cannot exit early on a match. */ ObjectAddressExtra *thisextra = addrs->extras + i; ! thisextra->flags |= flags; } } } --- 2325,2343 * DROP COLUMN action even though we know we're gonna delete * the table later. * + * What we can do, though, is mark this as a subobject so that + * we don't report it separately, which is confusing because + * it's unpredictable whether it happens or not. But do so + * only if flags != 0 (flags == 0 is a read-only probe). + * * Because there could be other subobjects of this object in * the array, this case means we always have to loop through * the whole array; we cannot exit early on a match. */ ObjectAddressExtra *thisextra = addrs->extras + i; ! if (flags) ! thisextra->flags |= (flags | DEPFLAG_SUBOBJECT); } } } *** stack_address_present_add_flags(const Ob *** 2374,2380 * object_address_present_add_flags(), we should propagate * flags for the whole object to each of its subobjects. */ ! stackptr->flags |= flags; } } } --- 2385,2392 * object_address_present_add_flags(), we should propagate * flags for the whole object to each of its subobjects. */ ! if (flags) ! stackptr->flags |= (flags | DEPFLAG_SUBOBJECT); } } }
Re: PSA: we lack TAP test coverage on NetBSD and OpenBSD
On 2019-01-17 22:47, Mikael Kjellström wrote: On 2019-01-17 22:42, Tom Lane wrote: Mikael Kjellström writes: It says: configure: error: Additional Perl modules are required to run TAP tests so how do I find out which Perl modules are required? If you look into the configure log it should say just above that, but I'm betting you just need p5-IPC-Run. Yes it seems to be IPC::Run that is missing. I've installed it manually through CPAN. Let's see if it works better this time. Hmmm, nope:

== pgsql.build/src/bin/pg_ctl/tmp_check/log/003_promote_standby.log ===
2019-01-17 23:09:20.343 CET [9129] LOG: listening on Unix socket "/tmp/g66P1fpMFK/.s.PGSQL.64980"
2019-01-17 23:09:20.343 CET [9129] FATAL: could not create semaphores: No space left on device
2019-01-17 23:09:20.343 CET [9129] DETAIL: Failed system call was semget(64980002, 17, 03600).
2019-01-17 23:09:20.343 CET [9129] HINT: This error does *not* mean that you have run out of disk space. It occurs when either the system limit for the maximum number of semaphore sets (SEMMNI), or the system wide maximum number of semaphores (SEMMNS), would be exceeded. You need to raise the respective kernel parameter. Alternatively, reduce PostgreSQL's consumption of semaphores by reducing its max_connections parameter. The PostgreSQL documentation contains more information about configuring your system for PostgreSQL.
2019-01-17 23:09:20.345 CET [9129] LOG: database system is shut down

will try and increase SEMMNI and see if that helps.

/Mikael
Re: Fixing findDependentObjects()'s dependency on scan order (regressions in DROP diagnostic messages)
On 2019-Jan-17, Tom Lane wrote: > DEPENDENCY_INTERNAL_AUTO, however, broke this completely, as the code > has no hesitation about making multiple entries of that kind. After > rather cursorily looking at that code, I'm leaning to the position > that DEPENDENCY_INTERNAL_AUTO is broken-by-design and needs to be > nuked from orbit. In the cases where it's being used, such as > partitioned indexes, I think that probably the right design is one > DEPENDENCY_INTERNAL dependency on the partition master index, and > then one DEPENDENCY_AUTO dependency on the matching partitioned table. As I recall, the problem with that approach is that you can't drop the partition when a partitioned index exists, because it follows the link to the parent index and tries to drop that. -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: PSA: we lack TAP test coverage on NetBSD and OpenBSD
Mikael Kjellström writes: >> Let's see if it works better this time. > Hmmm, nope: > 2019-01-17 23:09:20.343 CET [9129] FATAL: could not create semaphores: > No space left on device Yeah, you might've been able to get by with OpenBSD/NetBSD's default semaphore settings before, but they really only let one postmaster run at a time; and the TAP tests want to start more than one. For me it seems to work to append this to /etc/sysctl.conf:

kern.seminfo.semmni=100
kern.seminfo.semmns=2000

and either reboot, or install those settings manually with sysctl. regards, tom lane
Re: Fixing findDependentObjects()'s dependency on scan order (regressions in DROP diagnostic messages)
Alvaro Herrera writes: > On 2019-Jan-17, Tom Lane wrote: >> DEPENDENCY_INTERNAL_AUTO, however, broke this completely, as the code >> has no hesitation about making multiple entries of that kind. After >> rather cursorily looking at that code, I'm leaning to the position >> that DEPENDENCY_INTERNAL_AUTO is broken-by-design and needs to be >> nuked from orbit. In the cases where it's being used, such as >> partitioned indexes, I think that probably the right design is one >> DEPENDENCY_INTERNAL dependency on the partition master index, and >> then one DEPENDENCY_AUTO dependency on the matching partitioned table. > As I recall, the problem with that approach is that you can't drop the > partition when a partitioned index exists, because it follows the link > to the parent index and tries to drop that. Hm. Still, I can't believe that it's appropriate for a partitioned index to have exactly the same kind of dependency on the master index as it does on the associated table. regards, tom lane
Re: PSA: we lack TAP test coverage on NetBSD and OpenBSD
On 2019-01-17 23:23, Tom Lane wrote: Yeah, you might've been able to get by with OpenBSD/NetBSD's default semaphore settings before, but they really only let one postmaster run at a time; and the TAP tests want to start more than one. For me it seems to work to append this to /etc/sysctl.conf: kern.seminfo.semmni=100 kern.seminfo.semmns=2000 and either reboot, or install those settings manually with sysctl. Looks that way. I've increased the values and rebooted the machines. Let's hope 5th time is the charm :-) /Mikael
Re: Feature: temporary materialized views
On 1/11/19 8:47 PM, Mitar wrote: Thanks for doing the review! I did some functional testing today and everything seems to work as expected other than that the tab completion for psql seems to be missing. Andreas
Re: Fixing findDependentObjects()'s dependency on scan order (regressions in DROP diagnostic messages)
On 2019-Jan-17, Tom Lane wrote: > Alvaro Herrera writes: > > On 2019-Jan-17, Tom Lane wrote: > >> DEPENDENCY_INTERNAL_AUTO, however, broke this completely, as the code > >> has no hesitation about making multiple entries of that kind. After > >> rather cursorily looking at that code, I'm leaning to the position > >> that DEPENDENCY_INTERNAL_AUTO is broken-by-design and needs to be > >> nuked from orbit. In the cases where it's being used, such as > >> partitioned indexes, I think that probably the right design is one > >> DEPENDENCY_INTERNAL dependency on the partition master index, and > >> then one DEPENDENCY_AUTO dependency on the matching partitioned table. > > > As I recall, the problem with that approach is that you can't drop the > > partition when a partitioned index exists, because it follows the link > > to the parent index and tries to drop that. > > Hm. Still, I can't believe that it's appropriate for a partitioned index > to have exactly the same kind of dependency on the master index as it > does on the associated table. So there are three necessary features: * The partition can be dropped, which takes down the index partition * The master index can be dropped, which takes down the index partition * The index partition must never be allowed to be dropped on its own. There is a symmetry to these that led me to have the same kind of dependency from the index partition to the other two. I'm now wondering if it would work to have the one from index partition to table partition as DEPENDENCY_AUTO and the one from index partition to parent index as DEPENDENCY_INTERNAL_AUTO. I stared at a lot of possible dependency graphs, but I'm not sure if this one was one of them. -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
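In terms of the existing dependency-recording API, the variant being wondered about in the last paragraph would amount to something like the fragment below (variable names are invented, and assume the three addresses were filled in with ObjectAddressSet); whether it actually satisfies all three requirements is precisely the open question:

ObjectAddress idxpart;      /* the index partition */
ObjectAddress tablepart;    /* the table partition it indexes */
ObjectAddress parentidx;    /* the partitioned (master) index */

/* dropping the table partition takes the index partition down with it */
recordDependencyOn(&idxpart, &tablepart, DEPENDENCY_AUTO);

/*
 * dropping the master index takes it down too, and a direct DROP of the
 * index partition should get redirected to the master index
 */
recordDependencyOn(&idxpart, &parentidx, DEPENDENCY_INTERNAL_AUTO);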
Re: Protect syscache from bloating with negative cache entries
On 18/01/2019 08:48, Bruce Momjian wrote: On Thu, Jan 17, 2019 at 11:33:35AM -0500, Robert Haas wrote: The flaw in your thinking, as it seems to me, is that in your concern for "the likelihood that cache flushes will simply remove entries we'll soon have to rebuild," you're apparently unwilling to consider the possibility of workloads where cache flushes will remove entries we *won't* soon have to rebuild. Every time that issue gets raised, you seem to blow it off as if it were not a thing that really happens. I can't make sense of that position. Is it really so hard to imagine a connection pooler that switches the same connection back and forth between two applications with different working sets? Or a system that keeps persistent connections open even when they are idle? Do you really believe that a connection that has not accessed a cache entry in 10 minutes still derives more benefit from that cache entry than it would from freeing up some memory? Well, I think everyone agrees there are workloads that cause undesired cache bloat. What we have not found is a solution that doesn't cause code complexity or undesired overhead, or one that >1% of users will know how to use. Unfortunately, because we have not found something we are happy with, we have done nothing. I agree LRU can be expensive. What if we do some kind of clock sweep and expiration like we do for shared buffers? I think the trick is figuring how frequently to do the sweep. What if we mark entries as unused every 10 queries, mark them as used on first use, and delete cache entries that have not been used in the past 10 queries.

If you take that approach, then this number should be configurable. What if I had 12 common queries I used in rotation?

The ARM3 processor cache logic was to simply eject an entry at random, as Acorn obviously felt that the silicon required to have a more sophisticated algorithm would reduce the cache size too much! I upgraded my Acorn Archimedes that had an 8MHz bus, from an 8MHz ARM2 to a 25MHz ARM3. That is a clock rate improvement of about 3 times. However BASIC programs ran about 7 times faster, which I put down to the ARM3 having a cache.

Obviously for Postgres this is not directly relevant, but I think it suggests that it may be worth considering replacing cache items at random, as there are no pathological corner cases and the logic is very simple.

Cheers, Gavin
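For comparison, the random-replacement policy mentioned above needs no usage tracking at all; a toy version over a fixed-size table (nothing to do with the actual catcache layout, and release_entry() is an invented callback) is just:

#include <stdlib.h>

#define CACHE_SLOTS 1024

extern void release_entry(void *entry);    /* invented: frees the evictee */

static void *cache_slots[CACHE_SLOTS];

/* Store a new entry, evicting a randomly chosen victim if its slot is taken. */
static void
cache_store_random(void *entry)
{
    int         victim = rand() % CACHE_SLOTS;

    if (cache_slots[victim] != NULL)
        release_entry(cache_slots[victim]);
    cache_slots[victim] = entry;
}

Eviction is O(1) with no sweep to schedule, which is the simplicity argument; the price is that a hot entry is as likely to be thrown out as a cold one.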
Re: PSA: we lack TAP test coverage on NetBSD and OpenBSD
On 2019-01-17 23:37, Mikael Kjellström wrote: On 2019-01-17 23:23, Tom Lane wrote: Yeah, you might've been able to get by with OpenBSD/NetBSD's default semaphore settings before, but they really only let one postmaster run at a time; and the TAP tests want to start more than one. For me it seems to work to append this to /etc/sysctl.conf: kern.seminfo.semmni=100 kern.seminfo.semmns=2000 and either reboot, or install those settings manually with sysctl. Looks that way. I've increased the values and rebooted the machines. Let's hope 5th time is the charm :-) Nope! But it looks like in NetBSD the options are called: netbsd7-pgbf# sysctl -a | grep semmn kern.ipc.semmni = 10 kern.ipc.semmns = 60 kern.ipc.semmnu = 30 so I will try and set that in /etc/sysctl.conf and reboot and see what happens. /Mikael
Re: problems with foreign keys on partitioned tables
On 2019-Jan-09, Amit Langote wrote: > 1. Foreign keys of partitions stop working correctly after being detached > from the parent table > This happens because the action triggers defined on the PK relation (pk) > refers to p as the referencing relation. On detaching p1 from p, p1's > data is no longer accessible to that trigger. Ouch. > To fix this problem, we need create action triggers on PK relation > that refer to p1 when it's detached (unless such triggers already > exist which might be true in some cases). Attached patch 0001 shows > this approach. Hmm, okay. I'm not in love with the idea that such triggers might already exist -- seems unclean. We should remove redundant action triggers when we attach a table as a partition, no? -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: PSA: we lack TAP test coverage on NetBSD and OpenBSD
On 2019-01-17 23:54, Mikael Kjellström wrote: But it looks like in NetBSD the options are called: netbsd7-pgbf# sysctl -a | grep semmn kern.ipc.semmni = 10 kern.ipc.semmns = 60 kern.ipc.semmnu = 30 so I will try and set that in /etc/sysctl.conf and reboot and see what happens. That seems to have done the trick: netbsd7-pgbf# sysctl -a | grep semmn kern.ipc.semmni = 100 kern.ipc.semmns = 2000 kern.ipc.semmnu = 30 I just started another run on sidewinder (NetBSD 7), let's see how that goes. but the OpenBSD machine went further and now fails on: pgbenchCheck instead. Is that the failure you expected to get? /Mikael
Re: Libpq support to connect to standby server as priority
> On Wed, 16 Jan 2019 at 01:02, Tatsuo Ishii wrote: > >> > From: Tatsuo Ishii [mailto:is...@sraoss.co.jp] >> >> But pg_is_in_recovery() returns true even for a promoting standby. So >> >> you have to wait and retry to send pg_is_in_recovery() until it >> >> finishes the promotion to find out it is now a primary. I am not sure >> >> if backend out to be responsible for this process. If not, libpq would >> >> need to handle it but I doubt it would be possible. >> > >> > Yes, the application needs to retry connection attempts until success. >> That's not different from PgJDBC and other DBMSs. >> >> I don't know what PgJDBC is doing, however I think libpq needs to do >> more than just retrying. >> >> 1) Try to find a node on which pg_is_in_recovery() returns false. If >>found, then we assume that is the primary. We also assume that >>other nodes are standbys. done. >> >> 2) If there's no node on which pg_is_in_recovery() returns false, then >>we need to retry until we find it. To not retry forever, there >>should be a timeout counter parameter. >> >> > IIRC this is essentially what pgJDBC does. Thanks for clarifying that. Pgpool-II also does that too. Seems like a common technique to find out a primary node. Best regards, -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese:http://www.sraoss.co.jp
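For readers following along, step 1) can be done from the client side today with plain libpq; a minimal sketch is below (the retry/timeout loop of step 2) and proper error reporting are omitted, and the conninfo strings are whatever the application would supply):

#include <string.h>
#include <libpq-fe.h>

/*
 * Probe each candidate in turn and return a connection to the first
 * server reporting pg_is_in_recovery() = false, i.e. the primary.
 */
static PGconn *
find_primary(const char *const *conninfos, int nconn)
{
    for (int i = 0; i < nconn; i++)
    {
        PGconn     *conn = PQconnectdb(conninfos[i]);

        if (conn && PQstatus(conn) == CONNECTION_OK)
        {
            PGresult   *res = PQexec(conn, "SELECT pg_is_in_recovery()");

            if (PQresultStatus(res) == PGRES_TUPLES_OK &&
                strcmp(PQgetvalue(res, 0, 0), "f") == 0)
            {
                PQclear(res);
                return conn;        /* found the primary */
            }
            PQclear(res);
        }
        if (conn)
            PQfinish(conn);
    }
    return NULL;    /* no primary visible yet; caller waits and retries */
}

The question for this thread is whether libpq itself should grow this loop (plus the timeout handling) rather than leaving it to every driver and pooler to reimplement.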
Re: PSA: we lack TAP test coverage on NetBSD and OpenBSD
On 2019-01-18 00:00, Mikael Kjellström wrote: I just started another run on sidewinder (NetBSD 7), let's see how that goes. but the OpenBSD machine went further and now fails on: pgbenchCheck instead. Is that the failure you expected to get? And now also the NetBSD machine failed on pgbenchCheck. https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=sidewinder&dt=2019-01-17%2022%3A57%3A14 should I leave it as it is for now? /Mikael
Re: PSA: we lack TAP test coverage on NetBSD and OpenBSD
Mikael Kjellström writes: >> But it looks like in NetBSD the options are called: Sorry about that, I copied-and-pasted from the openbsd machine I was looking at without remembering that netbsd is just a shade different. > but the OpenBSD machine went further and now fails on: > pgbenchCheck instead. > Is that the failure you expected to get? Yup, sure is. Thanks! regards, tom lane
Re: Fixing findDependentObjects()'s dependency on scan order (regressions in DROP diagnostic messages)
Alvaro Herrera writes: > On 2019-Jan-17, Tom Lane wrote: >> Hm. Still, I can't believe that it's appropriate for a partitioned index >> to have exactly the same kind of dependency on the master index as it >> does on the associated table. > So there are three necessary features: > * The partition can be dropped, which takes down the index partition > * The master index can be dropped, which takes down the index partition > * The index partition must never be allowed to be dropped on its own. > There is a symmetry to these that led me to have the same kind of > dependency from the index partition to the other two. It's symmetric as long as you suppose that the above are the only requirements. However, there's another requirement, which is that if you do try to drop the index partition directly, we would like the error message to suggest dropping the master index, not the table. The only way to be sure about what will be suggested is if there can be only one "owning object". Also, I am suspicious that this implementation fails on point 3 anyway. It looks to me like if a recursive drop reaches the index partition from a dependency other than either the table partition or the master index, it will let the index partition be dropped by itself. Ordinarily, this would likely not matter because the index partition would share any other dependencies (opclasses, for instance) with one or the other owning object. But I don't think it's too hard to construct cases where that's not true. If nothing else, it looks like ALTER OBJECT DEPENDS ON EXTENSION can be used to attach a random dependency to just about anything. regards, tom lane
Re: PSA: we lack TAP test coverage on NetBSD and OpenBSD
Mikael Kjellström writes: > And now also the NetBSD machine failed on pgbenchCheck. Indeed, as expected. > should I leave it as it is for now? Please. I'll push a fix for the broken test case in a bit --- I just wanted to confirm that somebody else's machines agreed that it's broken. regards, tom lane
Re: [PATCH] get rid of StdRdOptions, use individual binary reloptions representation for each relation kind instead
You introduced new macros IsHeapRelation and IsViewRelation, but I don't want to introduce such API. Such things have been heavily contested and I don't want to have one more thing to worry about for this patch, so please just put the relkind directly in the code. On 2019-Jan-07, Nikolay Shaplov wrote: > I also reworked some comments. BTW do you know what is this comments spoke > about: > > * RelationGetFillFactor() and RelationGetTargetPageFreeSpace() can only > > * be applied to relations that use this format or a superset for > > * private options data. > > It is speaking about some old times when there can be some other type of > options? 'cause I do not understand what it is speaking about. > I've removed it, I hope I did not remove anything important. As I understand it's talking about the varlenas being StdRdOptions and not anything else. I think extensibility could cause some relkinds to use different options. -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
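To spell out what "put the relkind directly in the code" looks like at a call site, the test would be written against rel->rd_rel->relkind and the RELKIND_* constants rather than a new wrapper macro; an illustrative fragment (not a quote from the patch) follows:

/* instead of IsHeapRelation(relation) / IsViewRelation(relation): */
switch (relation->rd_rel->relkind)
{
    case RELKIND_RELATION:
    case RELKIND_TOASTVALUE:
    case RELKIND_MATVIEW:
        options = heap_reloptions(relation->rd_rel->relkind, datum, false);
        break;
    case RELKIND_VIEW:
        options = view_reloptions(datum, false);
        break;
    default:
        options = NULL;
        break;
}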
Re: Fixing findDependentObjects()'s dependency on scan order (regressions in DROP diagnostic messages)
On Thu, Jan 17, 2019 at 3:20 PM Tom Lane wrote: > Alvaro Herrera writes: > > There is a symmetry to these that led me to have the same kind of > > dependency from the index partition to the other two. > > It's symmetric as long as you suppose that the above are the only > requirements. However, there's another requirement, which is that > if you do try to drop the index partition directly, we would like > the error message to suggest dropping the master index, not the > table. The only way to be sure about what will be suggested is > if there can be only one "owning object". +1. This is certainly a necessary requirement. Absurd error messages are not okay. -- Peter Geoghegan
Re: PSA: we lack TAP test coverage on NetBSD and OpenBSD
On 2019-01-18 00:31, Tom Lane wrote: =?UTF-8?Q?Mikael_Kjellstr=c3=b6m?= writes: And now also the NetBSD machine failed on pgbenchCheck. Indeed, as expected. Ok. should I leave it as it is for now? Please. I'll push a fix for the broken test case in a bit --- I just wanted to confirm that somebody else's machines agreed that it's broken. Ok, I will leave it on then. /Mikael
Re: draft patch for strtof()
> "Andrew" == Andrew Gierth writes: Andrew> Because it turns out that Windows (at least the version running Andrew> on Appveyor) completely fucks this up; strtof() is apparently Andrew> returning infinity or zero _without setting errno_ for values Andrew> out of range for float: input of "10e70" returns +inf with no Andrew> error, input of "10e-70" returns (exactly) 0.0 with no error. This bug turns out to be dependent on compiler/SDK versions, not surprisingly. So far I have figured out how to invoke these combinations on appveyor: VS2013 / SDK 7.1 (as per cfbot): fails VS2015 / SDK 8.1: works Trying to figure out how to get other combinations to test. -- Andrew (irc:RhodiumToad)
Re: Fixing findDependentObjects()'s dependency on scan order (regressions in DROP diagnostic messages)
I wrote: > Also, I am suspicious that this implementation fails on point 3 > anyway ... If nothing else, it looks like ALTER OBJECT DEPENDS ON > EXTENSION can be used to attach a random dependency to just > about anything. Yup: regression=# drop table if exists idxpart cascade; DROP TABLE regression=# create table idxpart (a int) partition by range (a); CREATE TABLE regression=# create index on idxpart (a); CREATE INDEX regression=# create table idxpart1 partition of idxpart for values from (0) to (10); CREATE TABLE regression=# drop index idxpart1_a_idx; -- no way ERROR: cannot drop index idxpart1_a_idx because index idxpart_a_idx requires it HINT: You can drop index idxpart_a_idx instead. regression=# \d idxpart1 Table "public.idxpart1" Column | Type | Collation | Nullable | Default +-+---+--+- a | integer | | | Partition of: idxpart FOR VALUES FROM (0) TO (10) Indexes: "idxpart1_a_idx" btree (a) regression=# create extension cube; CREATE EXTENSION regression=# alter index idxpart1_a_idx depends on extension cube; ALTER INDEX regression=# drop extension cube; DROP EXTENSION regression=# \d idxpart1 Table "public.idxpart1" Column | Type | Collation | Nullable | Default +-+---+--+- a | integer | | | Partition of: idxpart FOR VALUES FROM (0) TO (10) regards, tom lane
Re: Libpq support to connect to standby server as priority
On Thu, 17 Jan 2019 at 18:03, Tatsuo Ishii wrote: > > On Wed, 16 Jan 2019 at 01:02, Tatsuo Ishii wrote: > > > >> > From: Tatsuo Ishii [mailto:is...@sraoss.co.jp] > >> >> But pg_is_in_recovery() returns true even for a promoting standby. So > >> >> you have to wait and retry to send pg_is_in_recovery() until it > >> >> finishes the promotion to find out it is now a primary. I am not sure > >> >> if backend out to be responsible for this process. If not, libpq > would > >> >> need to handle it but I doubt it would be possible. > >> > > >> > Yes, the application needs to retry connection attempts until success. > >> That's not different from PgJDBC and other DBMSs. > >> > >> I don't know what PgJDBC is doing, however I think libpq needs to do > >> more than just retrying. > >> > >> 1) Try to find a node on which pg_is_in_recovery() returns false. If > >>found, then we assume that is the primary. We also assume that > >>other nodes are standbys. done. > >> > >> 2) If there's no node on which pg_is_in_recovery() returns false, then > >>we need to retry until we find it. To not retry forever, there > >>should be a timeout counter parameter. > >> > >> > > IIRC this is essentially what pgJDBC does. > > Thanks for clarifying that. Pgpool-II also does that too. Seems like a > common technique to find out a primary node. > > Checking the code I see we actually use show transaction_read_only. Sorry for the confusion Dave Cramer da...@postgresintl.com www.postgresintl.com > >
Re: Fixing findDependentObjects()'s dependency on scan order (regressions in DROP diagnostic messages)
On Thu, Jan 17, 2019 at 12:42 PM Tom Lane wrote: > So I poked around for awhile with running the regression tests under > ignore_system_indexes. There seem to be a number of issues involved > here. To a significant extent, they aren't bugs, at least not according > to the original conception of the dependency code: it was not a design > goal that different dependencies of the same object-to-be-deleted would > be processed in a fixed order. I agree that it's the exceptional cases that are of concern here. The vast majority of the changes you'll see with "ignore_system_indexes=on" are noise. > Now, perhaps we should make such stability a design goal, as it'd allow > us to get rid of some "suppress the cascade outputs" hacks in the > regression tests. But it's a bit of a new feature. If we wanted to > do that, I'd be inclined to do it by absorbing all the pg_depend entries > for a particular object into an ObjectAddress array and then sorting > them before we process them. The main stumbling block here is "what > would the sort order be?". The best idea I can come up with offhand > is to sort by OID, which at least for regression test purposes would > mean objects would be listed/processed more or less in creation order. I think that we might as well have a stable order. Maybe an explicit sort step is unnecessary -- we can actually rely on scan order, while accepting you'll get a different order with "ignore_system_indexes=on" (though without getting substantively different/incorrect messages). I'm slightly concerned that an explicit sort step might present difficulties in extreme cases. How much memory are we prepared to allocate, just to get a stable order? It probably won't really matter what the specific order is, once the current problems (the DEPENDENCY_INTERNAL_AUTO issue and the issue you'll fix with DEPFLAG_IS_SUBOBJECT) are handled in a direct manner. As I've pointed out a couple of times already, we can add a 4 byte tie-breaker column to both pg_depend indexes without increasing the size of the on-disk representation, since the extra space is already lost to alignment (we could even add a new 4 byte column to the table without any storage overhead, if that happened to make sense). What is the likelihood that somebody will ever find a better use for this alignment padding? These two indexes are typically the largest system catalog indexes by far, so the opportunity cost matters. I don't think that the direct cost (more cycles) is worth worrying about, though. Nobody has added a pg_depend column since it was first introduced back in 2002. -- Peter Geoghegan
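If the explicit sort route is taken, the mechanism itself is small; a sketch of a comparator over the absorbed ObjectAddress array, ordering by OID as suggested upthread with class and sub-object id as tie-breakers, could be:

static int
object_address_oid_cmp(const void *a, const void *b)
{
    const ObjectAddress *oa = (const ObjectAddress *) a;
    const ObjectAddress *ob = (const ObjectAddress *) b;

    if (oa->objectId != ob->objectId)
        return (oa->objectId < ob->objectId) ? -1 : 1;
    if (oa->classId != ob->classId)
        return (oa->classId < ob->classId) ? -1 : 1;
    return oa->objectSubId - ob->objectSubId;
}

/* after absorbing the pg_depend scan into deps[0 .. ndeps-1]: */
qsort(deps, ndeps, sizeof(ObjectAddress), object_address_oid_cmp);

The memory concern above is about holding the deps array at all, not about the sort itself, which is cheap relative to the catalog scans.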
RE: Libpq support to connect to standby server as priority
From: Dave Cramer [mailto:p...@fastcrypt.com] > >> 2) If there's no node on which pg_is_in_recovery() returns false, > then > >>we need to retry until we find it. To not retry forever, there > >>should be a timeout counter parameter. > Checking the code I see we actually use show transaction_read_only. Also, does PgJDBC really repeat connection attempts for a user-specified duration? Having a quick look at the code, it seemed to try each host once in a while loop. Regards Takayuki Tsunakawa
Re: Libpq support to connect to standby server as priority
> On Thu, 17 Jan 2019 at 18:03, Tatsuo Ishii wrote: > >> > On Wed, 16 Jan 2019 at 01:02, Tatsuo Ishii wrote: >> > >> >> > From: Tatsuo Ishii [mailto:is...@sraoss.co.jp] >> >> >> But pg_is_in_recovery() returns true even for a promoting standby. So >> >> >> you have to wait and retry to send pg_is_in_recovery() until it >> >> >> finishes the promotion to find out it is now a primary. I am not sure >> >> >> if backend out to be responsible for this process. If not, libpq >> would >> >> >> need to handle it but I doubt it would be possible. >> >> > >> >> > Yes, the application needs to retry connection attempts until success. >> >> That's not different from PgJDBC and other DBMSs. >> >> >> >> I don't know what PgJDBC is doing, however I think libpq needs to do >> >> more than just retrying. >> >> >> >> 1) Try to find a node on which pg_is_in_recovery() returns false. If >> >>found, then we assume that is the primary. We also assume that >> >>other nodes are standbys. done. >> >> >> >> 2) If there's no node on which pg_is_in_recovery() returns false, then >> >>we need to retry until we find it. To not retry forever, there >> >>should be a timeout counter parameter. >> >> >> >> >> > IIRC this is essentially what pgJDBC does. >> >> Thanks for clarifying that. Pgpool-II also does that too. Seems like a >> common technique to find out a primary node. >> >> > Checking the code I see we actually use show transaction_read_only. > > Sorry for the confusion So if all PostgreSQL servers returns transaction_read_only = on, how does pgJDBC find the primary node? Best regards, -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese:http://www.sraoss.co.jp
Re: PSA: we lack TAP test coverage on NetBSD and OpenBSD
Fabien COELHO writes: >> I am, TBH, inclined to fix this by removing that test case rather >> than teaching it another spelling to accept. I think it's very >> hard to make the case that tests like this one are anything but >> a waste of developer and buildfarm time. When they are also a >> portability hazard, it's time to cut our losses. (I also note >> that this test has caused us problems before, cf 869aa40a2 and >> 933851033.) > I'd rather keep it by simply adding the "|unknown" alternative. 30 years > of programming have taught me that testing limit & error cases is useful, > although you never know when it will be proven so. Sorry, I don't buy this line of argument. Reasonable test design requires making cost/benefit tradeoffs: the cost to run the test over and over, and the cost to maintain the test itself (e.g. fix portability issues in it) have to be balanced against the probability of it finding something useful. I judge that the chance of this particular test finding something is small, and I've had quite enough of the maintenance costs. Just to point up that we're still not clearly done with the maintenance costs of supposing that we know how every version of getopt_long will spell this error message, I note that my Linux box seems to have two variants of it: $ pgbench -z pgbench: invalid option -- 'z' Try "pgbench --help" for more information. $ pgbench --z pgbench: unrecognized option '--z' Try "pgbench --help" for more information. of which the "invalid" alternative is also not in our list right now. Who's to say how many more versions of getopt_long are out there, or what the maintainers thereof might do in the future? > I agree that some tests can be useless, but I do not think that it applies > to this one. This test also checks that under a bad option pgbench stops > with an appropriate 1 exit status. It's possible that it's worth the trouble to check for exit status 1, but I entirely fail to see the point of checking exactly what is the spelling of a message that is issued by code not under our control. Looking closer at the test case: [ 'bad option', '-h home -p 5432 -U calvin -d --bad-option', [ qr{(unrecognized|illegal) option}, qr{--help.*more information} ] ], ISTM that just removing the first qr{} pattern, and checking only that we get the text that *is* under our control, is a reasonable compromise here. regards, tom lane
Re: pgsql: Remove references to Majordomo
On Thu, Jan 17, 2019 at 01:04:44PM +, Magnus Hagander wrote: > Remove references to Majordomo > > Lists are not handled by Majordomo anymore and haven't been for a while, > so remove the reference and instead direct people to the list server. Wouldn't it be better to also switch the references to pgsql-bugs in all the C code for the different --help outputs? -- Michael
Re: Libpq support to connect to standby server as priority
On Thu, 17 Jan 2019 at 19:15, Tatsuo Ishii wrote: > > On Thu, 17 Jan 2019 at 18:03, Tatsuo Ishii wrote: > > > >> > On Wed, 16 Jan 2019 at 01:02, Tatsuo Ishii > wrote: > >> > > >> >> > From: Tatsuo Ishii [mailto:is...@sraoss.co.jp] > >> >> >> But pg_is_in_recovery() returns true even for a promoting > standby. So > >> >> >> you have to wait and retry to send pg_is_in_recovery() until it > >> >> >> finishes the promotion to find out it is now a primary. I am not > sure > >> >> >> if backend out to be responsible for this process. If not, libpq > >> would > >> >> >> need to handle it but I doubt it would be possible. > >> >> > > >> >> > Yes, the application needs to retry connection attempts until > success. > >> >> That's not different from PgJDBC and other DBMSs. > >> >> > >> >> I don't know what PgJDBC is doing, however I think libpq needs to do > >> >> more than just retrying. > >> >> > >> >> 1) Try to find a node on which pg_is_in_recovery() returns false. If > >> >>found, then we assume that is the primary. We also assume that > >> >>other nodes are standbys. done. > >> >> > >> >> 2) If there's no node on which pg_is_in_recovery() returns false, > then > >> >>we need to retry until we find it. To not retry forever, there > >> >>should be a timeout counter parameter. > >> >> > >> >> > >> > IIRC this is essentially what pgJDBC does. > >> > >> Thanks for clarifying that. Pgpool-II also does that too. Seems like a > >> common technique to find out a primary node. > >> > >> > > Checking the code I see we actually use show transaction_read_only. > > > > Sorry for the confusion > > So if all PostgreSQL servers returns transaction_read_only = on, how > does pgJDBC find the primary node? > > well preferSecondary would return a connection. I'm curious; under what circumstances would the above occur? Regards, Dave
Re: Libpq support to connect to standby server as priority
On Thu, 17 Jan 2019 at 19:09, Tsunakawa, Takayuki < tsunakawa.ta...@jp.fujitsu.com> wrote: > From: Dave Cramer [mailto:p...@fastcrypt.com] > > >> 2) If there's no node on which pg_is_in_recovery() returns > false, > > then > > >> we need to retry until we find it. To not retry forever, > there > > >> should be a timeout counter parameter. > > > Checking the code I see we actually use show transaction_read_only. > > Also, does PgJDBC really repeat connection attempts for a user-specified > duration? Having a quick look at the code, it seemed to try each host once > in a while loop. > You are correct; looking at the code again, on the initial connection attempt we only try once. Dave Cramer da...@postgresintl.com www.postgresintl.com
Re: Libpq support to connect to standby server as priority
>> >> >> > From: Tatsuo Ishii [mailto:is...@sraoss.co.jp] >> >> >> >> But pg_is_in_recovery() returns true even for a promoting >> standby. So >> >> >> >> you have to wait and retry to send pg_is_in_recovery() until it >> >> >> >> finishes the promotion to find out it is now a primary. I am not >> sure >> >> >> >> if backend out to be responsible for this process. If not, libpq >> >> would >> >> >> >> need to handle it but I doubt it would be possible. >> >> >> > >> >> >> > Yes, the application needs to retry connection attempts until >> success. >> >> >> That's not different from PgJDBC and other DBMSs. >> >> >> >> >> >> I don't know what PgJDBC is doing, however I think libpq needs to do >> >> >> more than just retrying. >> >> >> >> >> >> 1) Try to find a node on which pg_is_in_recovery() returns false. If >> >> >>found, then we assume that is the primary. We also assume that >> >> >>other nodes are standbys. done. >> >> >> >> >> >> 2) If there's no node on which pg_is_in_recovery() returns false, >> then >> >> >>we need to retry until we find it. To not retry forever, there >> >> >>should be a timeout counter parameter. >> >> >> >> >> >> >> >> > IIRC this is essentially what pgJDBC does. >> >> >> >> Thanks for clarifying that. Pgpool-II also does that too. Seems like a >> >> common technique to find out a primary node. >> >> >> >> >> > Checking the code I see we actually use show transaction_read_only. >> > >> > Sorry for the confusion >> >> So if all PostgreSQL servers returns transaction_read_only = on, how >> does pgJDBC find the primary node? >> >> well preferSecondary would return a connection. This is not my message :-) > I'm curious; under what circumstances would the above occur? Former primary goes down and one of standbys is promoting but it is not promoted to new primary yet. Best regards, -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese:http://www.sraoss.co.jp
Re: Libpq support to connect to standby server as priority
On Thu, 17 Jan 2019 at 19:38, Tatsuo Ishii wrote: > >> >> >> > From: Tatsuo Ishii [mailto:is...@sraoss.co.jp] > >> >> >> >> But pg_is_in_recovery() returns true even for a promoting > >> standby. So > >> >> >> >> you have to wait and retry to send pg_is_in_recovery() until it > >> >> >> >> finishes the promotion to find out it is now a primary. I am > not > >> sure > >> >> >> >> if backend out to be responsible for this process. If not, > libpq > >> >> would > >> >> >> >> need to handle it but I doubt it would be possible. > >> >> >> > > >> >> >> > Yes, the application needs to retry connection attempts until > >> success. > >> >> >> That's not different from PgJDBC and other DBMSs. > >> >> >> > >> >> >> I don't know what PgJDBC is doing, however I think libpq needs to > do > >> >> >> more than just retrying. > >> >> >> > >> >> >> 1) Try to find a node on which pg_is_in_recovery() returns false. > If > >> >> >>found, then we assume that is the primary. We also assume that > >> >> >>other nodes are standbys. done. > >> >> >> > >> >> >> 2) If there's no node on which pg_is_in_recovery() returns false, > >> then > >> >> >>we need to retry until we find it. To not retry forever, there > >> >> >>should be a timeout counter parameter. > >> >> >> > >> >> >> > >> >> > IIRC this is essentially what pgJDBC does. > >> >> > >> >> Thanks for clarifying that. Pgpool-II also does that too. Seems like > a > >> >> common technique to find out a primary node. > >> >> > >> >> > >> > Checking the code I see we actually use show transaction_read_only. > >> > > >> > Sorry for the confusion > >> > >> So if all PostgreSQL servers returns transaction_read_only = on, how > >> does pgJDBC find the primary node? > >> > >> well preferSecondary would return a connection. > > This is not my message :-) > > > I'm curious; under what circumstances would the above occur? > > Former primary goes down and one of standbys is promoting but it is > not promoted to new primary yet. > seems like JDBC might have some work to do...Thanks I'm going to wait to implement until we resolve this discussion Dave > >
Re: Fixing findDependentObjects()'s dependency on scan order (regressions in DROP diagnostic messages)
Peter Geoghegan writes: > On Thu, Jan 17, 2019 at 12:42 PM Tom Lane wrote: >> Now, perhaps we should make such stability a design goal, as it'd allow >> us to get rid of some "suppress the cascade outputs" hacks in the >> regression tests. But it's a bit of a new feature. If we wanted to >> do that, I'd be inclined to do it by absorbing all the pg_depend entries >> for a particular object into an ObjectAddress array and then sorting >> them before we process them. The main stumbling block here is "what >> would the sort order be?". The best idea I can come up with offhand >> is to sort by OID, which at least for regression test purposes would >> mean objects would be listed/processed more or less in creation order. > I think that we might as well have a stable order. Maybe an explicit > sort step is unnecessary -- we can actually rely on scan order, while > accepting you'll get a different order with "ignore_system_indexes=on" > (though without getting substantively different/incorrect messages). Yeah, that's the policy we've followed so far, but I remain concerned about its effects on the regression tests. There are a lot of places where we print full DROP CASCADE output because "it hasn't failed yet". I fear every one of those is a gotcha that's waiting to bite us. Also, is index scan order really guaranteed for equal-keyed items? It isn't today, and I didn't think your patches were going to make it so. > I'm slightly concerned that an explicit sort step might present > difficulties in extreme cases. How much memory are we prepared to > allocate, just to get a stable order? We're going to stick all these items into an ObjectAddress array anyway, so at worst it'd be 2X growth, most likely a lot less since we'd only be sorting one level of dependency at a time. > As I've pointed out a couple of times already, we can add a 4 byte > tie-breaker column to both pg_depend indexes without increasing the > size of the on-disk representation, since the extra space is already > lost to alignment (we could even add a new 4 byte column to the table > without any storage overhead, if that happened to make sense). Meh ... where do you get the 4-byte value from? regards, tom lane
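For concreteness, a sketch of the kind of sort step discussed above, assuming the dependent objects for one level have already been collected into an array. The stand-in struct and comparator are illustrative only (the backend's real ObjectAddress lives in the catalog headers); this is not a proposed patch:

/*
 * Illustrative only: sort collected dependency entries so that DROP CASCADE
 * processes (and reports) them in a stable order.  In the backend this would
 * operate on the existing ObjectAddress type; a minimal stand-in is declared
 * here so the sketch is self-contained.
 */
#include <stdlib.h>

typedef unsigned int Oid;

typedef struct ObjectAddressStandIn
{
    Oid     classId;        /* catalog the object lives in */
    Oid     objectId;       /* OID of the object itself */
    int     objectSubId;    /* column number, or 0 */
} ObjectAddressStandIn;

/* Order by object OID (roughly creation order), breaking ties on catalog and sub-id */
static int
object_address_cmp(const void *a, const void *b)
{
    const ObjectAddressStandIn *oa = (const ObjectAddressStandIn *) a;
    const ObjectAddressStandIn *ob = (const ObjectAddressStandIn *) b;

    if (oa->objectId != ob->objectId)
        return (oa->objectId < ob->objectId) ? -1 : 1;
    if (oa->classId != ob->classId)
        return (oa->classId < ob->classId) ? -1 : 1;
    return oa->objectSubId - ob->objectSubId;
}

static void
sort_dependent_objects(ObjectAddressStandIn *objects, size_t nobjects)
{
    /* one level of dependencies at a time, as discussed above */
    qsort(objects, nobjects, sizeof(ObjectAddressStandIn), object_address_cmp);
}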
Re: PSA: we lack TAP test coverage on NetBSD and OpenBSD
On Thu, Jan 17, 2019 at 07:21:08PM -0500, Tom Lane wrote: > Sorry, I don't buy this line of argument. Reasonable test design requires > making cost/benefit tradeoffs: the cost to run the test over and over, > and the cost to maintain the test itself (e.g. fix portability issues in > it) have to be balanced against the probability of it finding something > useful. I judge that the chance of this particular test finding something > is small, and I've had quite enough of the maintenance costs. Yes, I agree with Tom's line of thought here. It seems to me that just dropping this part of the test is fine. -- Michael
Re: Libpq support to connect to standby server as priority
>> > I'm curious; under what circumstances would the above occur? >> >> Former primary goes down and one of standbys is promoting but it is >> not promoted to new primary yet. >> > > seems like JDBC might have some work to do...Thanks > > I'm going to wait to implement until we resolve this discussion If you need some input from me regarding finding a primary node, please say so. While working on the Pgpool-II project, I learned the necessity the hard way. Best regards, -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese:http://www.sraoss.co.jp
Re: Libpq support to connect to standby server as priority
On Thu, 17 Jan 2019 at 19:56, Tatsuo Ishii wrote: > >> > I'm curious; under what circumstances would the above occur? > >> > >> Former primary goes down and one of standbys is promoting but it is > >> not promoted to new primary yet. > >> > > > > seems like JDBC might have some work to do...Thanks > > > > I'm going to wait to implement until we resolve this discussion > > If you need some input from me regarding finding a primary node, > please say so. While working on Pgpool-II project, I learned the > necessity in a hard way. > > I would really like to have a consistent way of doing this, and consistent terms for the connection parameters. That said, yes, I would like input from you. Thanks, Dave
Re: Fixing findDependentObjects()'s dependency on scan order (regressions in DROP diagnostic messages)
On Thu, Jan 17, 2019 at 4:40 PM Tom Lane wrote: > Yeah, that's the policy we've followed so far, but I remain concerned > about its effects on the regression tests. There are a lot of places > where we print full DROP CASCADE output because "it hasn't failed yet". > I fear every one of those is a gotcha that's waiting to bite us. Right. I don't want to have to play whack-a-mole with this later on, and risk suppressing the wrong thing. > Also, is index scan order really guaranteed for equal-keyed items? > It isn't today, and I didn't think your patches were going to make > it so. The idea is that you'd have an extra column, so they wouldn't be equal-keyed (the keys that the scan was interested in would be duplicated, but we could still rely on the ordering). Are you suggesting that there'd be a stability issue regardless? My patch does guarantee order for equal-keyed items, since heap TID is treated as just another attribute, at least as far as the nbtree key space is concerned. The big unknown is how exactly that works out when the regression tests are run in parallel, on a busy system, since heap TID isn't just another column outside of nbtree. I think that this will probably cause test flappiness if we don't nail it down now. That was certainly my experience before I nailed it down in my own tentative way. My sense is that without addressing this issue, we're going to be sensitive to concurrent heap TID recycling by VACUUM in a way that might become an annoyance. I see no reason to not go fix it once and for all, since the vast majority of all problems are around the two pg_depend indexes. > We're going to stick all these items into an ObjectAddress array anyway, > so at worst it'd be 2X growth, most likely a lot less since we'd only > be sorting one level of dependency at a time. It sounds like we don't have a good reason to not just sort them explicitly, then. I'm happy to go that way. I mostly just wanted to be sure that you were aware that we could add a tie-breaker column without any storage overhead. > > As I've pointed out a couple of times already, we can add a 4 byte > > tie-breaker column to both pg_depend indexes without increasing the > > size of the on-disk representation, since the extra space is already > > lost to alignment (we could even add a new 4 byte column to the table > > without any storage overhead, if that happened to make sense). > > Meh ... where do you get the 4-byte value from? In the kludgey patch that I posted, the 4-byte value is manufactured artificially within a backend in descending order. That may have a slight advantage over object oid, even after the pg_depend correctness issues are addressed. A fixed order within a backend or originating transaction seems like it might avoid a few remaining instability issues. Not sure. I seem to recall there being some disagreement between you and Andres on this very point (is object oid a perfectly stable tie-breaker in practice?) on a similar thread from 2017. -- Peter Geoghegan
Re: Fixing findDependentObjects()'s dependency on scan order (regressions in DROP diagnostic messages)
On Thu, Jan 17, 2019 at 5:09 PM Peter Geoghegan wrote: > In the kludgey patch that I posted, the 4-byte value is manufactured > artificially within a backend in descending order. That may have a > slight advantage over object oid, even after the pg_depend correctness > issues are addressed. A fixed order within a backend or originating > transaction seems like it might avoid a few remaining instability > issues. Not sure. I seem to recall there being some disagreement > between you and Andres on this very point (is object oid a perfectly > stable tie-breaker in practice?) on a similar thread from 2017. Here are your remarks about it on that 2017 thread: https://www.postgresql.org/message-id/11852.1501610262%40sss.pgh.pa.us -- Peter Geoghegan
Re: [HACKERS] Block level parallel vacuum
On Tue, Jan 15, 2019 at 6:00 PM Masahiko Sawada wrote: > > Rebased. > I started reviewing the patch; I haven't finished my review yet. Following are some of my comments. +PARALLEL N + + + Execute index vacuum and cleanup index in parallel with I doubt that users can understand the terms index vacuum and cleanup index. Maybe it needs some more detailed information. - VACOPT_DISABLE_PAGE_SKIPPING = 1 << 7 /* don't skip any pages */ + VACOPT_PARALLEL = 1 << 7, /* do lazy VACUUM in parallel */ + VACOPT_DISABLE_PAGE_SKIPPING = 1 << 8 /* don't skip any pages */ +} VacuumOptionFlag; Any specific reason for not adding it as the last member of the enum? -typedef enum VacuumOption +typedef enum VacuumOptionFlag { I don't find the new name very good; how about VacuumFlags? +typedef struct VacuumOption +{ How about VacuumOptions? This structure can contain all the options provided to the vacuum operation. + vacopt1->flags |= vacopt2->flags; + if (vacopt2->flags == VACOPT_PARALLEL) + vacopt1->nworkers = vacopt2->nworkers; + pfree(vacopt2); + $$ = vacopt1; + } As the above code shows, only the last specified number of parallel workers is taken into account; can we explain that in the docs? postgres=# vacuum (parallel 2, verbose) tbl; With verbose, no parallel-worker-related information is available. I feel that information should be given even when it is not a parallel vacuum. Regards, Haribabu Kommi Fujitsu Australia
Re: Feature: temporary materialized views
Hi! On Thu, Jan 17, 2019 at 9:53 AM Andreas Karlsson wrote: > > What is the stumbling block to just leaving that alone? > > I think the issue Mitar ran into is that the temporary materialized view > is created in the rStartup callback of the receiver which happens after > SECURITY_RESTRICTED_OPERATION is set in ExecCreateTableAs(), so the > creation of the view itself is denied. Yes, the error without that change is: ERROR: cannot create temporary table within security-restricted operation Mitar -- http://mitar.tnode.com/ https://twitter.com/mitar_m
Re: Feature: temporary materialized views
Hi! On Thu, Jan 17, 2019 at 2:40 PM Andreas Karlsson wrote: > I did some functional testing today and everything seems to work as > expected other than that the tab completion for psql seems to be missing. Thanks. I can add those as soon as I figure out how. :-) So what are the next steps here besides tab completion? Is it OK to remove that security check? If I understand correctly, there is some general refactoring of the code that Tom is proposing, but I am not sure I am able to do or understand that. Mitar -- http://mitar.tnode.com/ https://twitter.com/mitar_m
Shouldn't current_schema() be at least PARALLEL RESTRICTED?
Hi all, While working on the recent issues with 2PC and temporary objects I have added a test case based on current_schema() for the first time in history, and the buildfarm complained about it, as mentioned here: https://www.postgresql.org/message-id/20190118005949.gd1...@paquier.xyz This has been causing crake and lapwing to complain about current_schema() failing to create a temporary schema, which can happen when it is invoked for the first time in a session: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=crake&dt=2019-01-18%2000%3A34%3A20 https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=lapwing&dt=2019-01-18%2001%3A20%3A01 Here is the problem: SET search_path TO 'pg_temp'; BEGIN; SELECT current_schema() ~ 'pg_temp' AS is_temp_schema; - is_temp_schema - - t -(1 row) - +ERROR: cannot create temporary tables during a parallel operation PREPARE TRANSACTION 'twophase_search'; -ERROR: cannot PREPARE a transaction that has operated on temporary namespace current_schemas() also has this problem. For now I have stabilized the test by making sure that non-parallel plans get used, which makes the buildfarm happy; still, that's just a workaround in my opinion. One of the reasons why current_schema() can create temporary schemas is that there are strong dependencies with search_path and the way sessions use it, hence it seems to me that it would be better to mark the function at least parallel restricted on HEAD to avoid any unstable behavior. Thoughts? -- Michael
RE: Libpq support to connect to standby server as priority
From: Laurenz Albe [mailto:laurenz.a...@cybertec.at] > I think that transaction_read_only is good. > > If it is set to false, we are sure to be on a replication primary or > stand-alone server, which is enough to know for the load balancing use case. As Tatsuo-san said, setting default_transaction_read_only leads to a misjudgement of the primary. > I deem it unlikely that someone will set default_transaction_read_only to > FALSE and then complain that the feature is not working as expected, but > again > I cannot prove that claim. I wonder what default_transaction_read_only exists for. For making the database read-only by default and allowing only specific users to write to the database with "CREATE/ALTER USER SET default_transaction_read_only = false"? I'm sorry to repeat myself, but anyway, I think we need a method to connect to a standby, as originally desired, because the primary instance may be read-only by default while only limited users update data. That's for reducing the burden on the primary and minimizing the impact on users who update data. For example, * run data reporting on the standby * backup the database from the standby * cascade replication from the standby > As Robert said, transaction_read_only might even give you the option to > use the feature for more than just load balancing between replication master > and standby. What use case are you thinking of? If you want to load balance the workload between the primary and standbys, we can follow PgJDBC -- targetServerType=any. Regards Takayuki Tsunakawa
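As a point of reference for the naming discussion, libpq (since PostgreSQL 10) already supports asking for a writable server across a list of hosts via target_session_attrs=read-write, which is based on the same transaction_read_only check discussed above. A minimal sketch in C, with placeholder host names:

#include <stdio.h>
#include <libpq-fe.h>

int
main(void)
{
    /*
     * node1/node2 are placeholder host names.  With
     * target_session_attrs=read-write, libpq checks each candidate
     * connection (using SHOW transaction_read_only) and keeps only one
     * that is not read-only, i.e. a primary or stand-alone server.
     */
    PGconn *conn = PQconnectdb(
        "host=node1,node2 dbname=postgres target_session_attrs=read-write");

    if (PQstatus(conn) != CONNECTION_OK)
    {
        fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
        PQfinish(conn);
        return 1;
    }
    printf("connected to a writable server: %s\n", PQhost(conn));
    PQfinish(conn);
    return 0;
}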
Fix function name in comment in vacuumlazy.c
Hi, The attached small patch fixes an incorrect reference to the function name heap_vacuum_rel in a comment. s/vacuum_heap_rel/heap_vacuum_rel/ Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Thread-unsafe coding in ecpg
I've found that a couple of different OpenBSD 6.4 machines fall over badly in the ecpg regression tests, with output like test sql/parser ... ok test thread/thread... stdout stderr FAILED (test process was terminated by signal 6: Abort trap) test thread/thread_implicit ... stdout FAILED (test process was terminated by signal 10: Bus error) test thread/prep ... ok (test process was terminated by signal 10: Bus error) test thread/alloc ... stderr FAILED (test process was terminated by signal 6: Abort trap) test thread/descriptor... ok It's somewhat variable as to which tests fail, but it's always thread tests. Examining the core dumps shows traces like #0 thrkill () at -:3 #1 0x0c04f427dd6e in _libc_abort () at /usr/src/lib/libc/stdlib/abort.c:51 #2 0x0c04f425f7e9 in wrterror (d=Variable "d" is not available. ) at /usr/src/lib/libc/stdlib/malloc.c:291 #3 0x0c04f42628fb in find_chunknum (d=Variable "d" is not available. ) at /usr/src/lib/libc/stdlib/malloc.c:1043 #4 0x0c04f425fe23 in ofree (argpool=Variable "argpool" is not available. ) at /usr/src/lib/libc/stdlib/malloc.c:1359 #5 0x0c04f425f8ec in free (ptr=0xc04df0e26e0) at /usr/src/lib/libc/stdlib/malloc.c:1419 #6 0x0c04f427ec83 in freegl (oldgl=0xc05a022d080) at /usr/src/lib/libc/locale/setlocale.c:32 #7 0x0c04f427eb49 in _libc_setlocale (category=4, locname=0xc059b605180 "C") at /usr/src/lib/libc/locale/setlocale.c:177 #8 0x0c0531a6f955 in ecpg_do_epilogue (stmt=0xc0587bb0c00) at execute.c:1986 #9 0x0c0531a6fa65 in ecpg_do (lineno=Variable "lineno" is not available. ) at execute.c:2018 #10 0x0c0531a6fb31 in ECPGdo (lineno=Variable "lineno" is not available. ) at execute.c:2037 #11 0x0c02a9f00b19 in test_thread (arg=Variable "arg" is not available. ) at thread.pgc:131 #12 0x0c04b180b26e in _rthread_start (v=Variable "v" is not available. ) at /usr/src/lib/librthread/rthread.c:96 #13 0x0c04f42ba77b in __tfork_thread () at /usr/src/lib/libc/arch/amd64/sys/tfork_thread.S:75 #14 0x in ?? () or this one: #0 _libc_strlcpy (dst=0x2c61e133ee0 "C", src=0xdfdfdfdfdfdfdfdf , dsize=256) at /usr/src/lib/libc/string/strlcpy.c:36 #1 0x02c61de99a71 in _libc_setlocale (category=4, locname=0x0) from /usr/lib/libc.so.92.5 #2 0x02c5ae2693a8 in ecpg_do_prologue (lineno=59, compat=0, force_indicator=1, connection_name=0x0, questionmarks=false, statement_type=ECPGst_execute, query=0x2c333701418 "i", args=0x2c61a96f6d0, stmt_out=0x2c61a96f5e0) at execute.c:1776 #3 0x02c5ae269a20 in ecpg_do (lineno=Variable "lineno" is not available. ) at execute.c:2001 #4 0x02c5ae269b31 in ECPGdo (lineno=Variable "lineno" is not available. ) at execute.c:2037 #5 0x02c333600b47 in fn (arg=Variable "arg" is not available. ) at prep.pgc:59 #6 0x02c56a00b26e in _rthread_start (v=Variable "v" is not available. ) at /usr/src/lib/librthread/rthread.c:96 #7 0x02c61ded577b in __tfork_thread () at /usr/src/lib/libc/arch/amd64/sys/tfork_thread.S:75 #8 0x in ?? () The common denominator is always a call to setlocale(), and that generally is calling malloc or free or some other libc function that is unhappy. When output appears on stderr, it's usually free complaining about double-frees or some such. So my conclusion is that this version of setlocale() has some thread-safety issues. I was all set to go file a bug report when I noticed this in the POSIX spec for setlocale: The setlocale() function need not be thread-safe. 
as well as this: The global locale established using setlocale() shall only be used in threads for which no current locale has been set using uselocale() or whose current locale has been set to the global locale using uselocale(LC_GLOBAL_LOCALE). IOW, not only is setlocale() not necessarily thread-safe itself, but using it to change locales in a multithread program is seriously unsafe because of concurrent effects on other threads. Therefore, it's plain crazy for ecpg to be calling setlocale() inside threaded code. It looks to me like what ecpg is doing is trying to defend itself against non-C LC_NUMERIC settings, which is laudable, but this implementation of that is totally unsafe. Don't know what's the best way out of this. The simplest thing would be to just remove that code and document that you'd better run ecpg in LC_NUMERIC locale, but it'd be nice if we could do better. regards, tom lane
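One possible "better" approach, on platforms that provide it, is POSIX 2008's newlocale()/uselocale(), which changes the locale of the calling thread only and never touches the global locale or other threads. A rough sketch of the idea (not the actual ecpg code, and not portable everywhere; Windows would need its own mechanism such as _configthreadlocale):

#define _XOPEN_SOURCE 700       /* expose newlocale()/uselocale() on glibc */
#include <locale.h>             /* note: macOS declares these in <xlocale.h> */
#include <stdio.h>

/*
 * Sketch: format a double with a guaranteed '.' decimal separator without
 * calling setlocale(), by switching only the calling thread's locale.
 */
static int
format_double_c_locale(char *buf, size_t buflen, double val)
{
    locale_t    c_locale;
    locale_t    old_locale;
    int         len;

    c_locale = newlocale(LC_NUMERIC_MASK, "C", (locale_t) 0);
    if (c_locale == (locale_t) 0)
        return -1;

    old_locale = uselocale(c_locale);   /* affects this thread only */
    len = snprintf(buf, buflen, "%g", val);
    uselocale(old_locale);              /* restore the thread's previous locale */

    freelocale(c_locale);
    return len;
}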
Re: current_logfiles not following group access and instead follows log_file_mode permissions
On Thu, Jan 17, 2019 at 5:49 AM Stephen Frost wrote: > Greetings, > > * Tom Lane (t...@sss.pgh.pa.us) wrote: > > Stephen Frost writes: > > > * Michael Paquier (mich...@paquier.xyz) wrote: > > >> On Tue, Jan 15, 2019 at 10:53:30AM -0500, Tom Lane wrote: > > >>> On reflection, maybe the problem is not that we're giving the file > > >>> the wrong permissions, but that we're putting it in the wrong place? > > > > > I'm not really sure putting it into the logfile directory is such a hot > > > idea as users might have set up external log file rotation of files in > > > that directory. Of course, in that case they'd probably signal PG > right > > > afterwards and PG would go write out a new file, but it still seems > > > pretty awkward. I'm not terribly against solving this issue that way > > > either though, but I tend to think the originally proposed patch is > more > > > sensible. > > > > I dunno, I think that the current design was made without any thought > > whatsoever about the log-files-outside-the-data-directory case. If > > you're trying to set things up that way, it's because you want to give > > logfile read access to people who shouldn't be able to look into the > > data directory proper. That makes current_logfiles pretty useless > > to such people, as it's now designed. > > ... or you just want to move the log files to a more sensible location > than the data directory. The justification for log_file_mode existing > is because you might want to have log files with different privileges, > but that's quite a different thing. > Thanks for sharing your opinions. The current_logfiles file stores metadata about the log files currently being written to; it is not itself a log file, so giving it the log files' permissions may not be correct. > Now, if the expectation is that current_logfiles is just an internal > > working file that users shouldn't access directly, then this argument > > is wrong --- but then why is it documented in user-facing docs? > > I really couldn't say why it's documented in the user-facing docs, and > for my 2c I don't really think it should be- there's a function to get > that information. Sprinkling the data directory with files for users to > access directly doesn't exactly fit my view of what a good API looks > like. > > The fact that there isn't any discussion about where that file actually > lives does make me suspect you're right that log files outside the data > directory wasn't really contemplated. > The only case I can think of for a user reading this file directly is when the server is not available, but I can't find any scenario where that is actually required. > > If we're going to accept the patch as-is, then it logically follows > > that we should de-document current_logfiles, because we're taking the > > position that it's an internal temporary file not meant for user access. > > ... and hopefully we'd get rid of it one day entirely. > If it has no use when the server is offline, it would be better to remove that file and provide the current log file name some other way. With group access mode, the default value of log_file_mode is changed; the attached patch reflects that in the docs. Regards, Haribabu Kommi Fujitsu Australia
Re: pg_dump multi VALUES INSERT
On Fri, 18 Jan 2019 at 01:15, Surafel Temesgen wrote: > The attache patch use your method mostly I disagree with the "mostly" part. As far as I can see, you took the idea and then made a series of changes to completely break it. For bonus points, you put back my comment change to make it incorrect again. Here's what I got after applying your latest patch: $ pg_dump --table=t --inserts --rows-per-insert=4 postgres [...] INSERT INTO public.t VALUES (1); ) INSERT INTO public.t VALUES (, ( 2); ) INSERT INTO public.t VALUES (, ( 3); ) INSERT INTO public.t VALUES (, ( 4); ); INSERT INTO public.t VALUES (5); ) INSERT INTO public.t VALUES (, ( 6); ) INSERT INTO public.t VALUES (, ( 7); ) INSERT INTO public.t VALUES (, ( 8); ); INSERT INTO public.t VALUES (9); ) ; I didn't test, but I'm pretty sure that's not valid INSERT syntax. I'd suggest taking my changes and doing the plumbing work to tie the rows_per_statement into the command line arg instead of how I left it hardcoded as 3. >> +When using --inserts, this allows the maximum >> number >> +of rows per INSERT statement to be specified. >> +This setting defaults to 1. >> > i change it too except "This setting defaults to 1" because it doesn't have > default value. > 1 row per statement means --inserts option . If it does not default to 1 then what happens when the option is not specified? -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
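To make the intended output shape concrete, the batching logic amounts to emitting the INSERT prefix once per group of rows and separating the parenthesized row values with commas. A stand-alone sketch in C of that loop, using the same 9-rows / 4-rows-per-statement scenario as above (placeholder table and values, not the patch's actual code, which writes through pg_dump's archive layer and quotes values properly):

#include <stdio.h>

/*
 * Sketch: emit "INSERT INTO t VALUES (...), (...), ...;" in groups of
 * rows_per_insert rows.  Only the statement boundaries are illustrated.
 */
static void
dump_table_as_multirow_inserts(int nrows, int rows_per_insert)
{
    int         rows_in_stmt = 0;

    for (int row = 1; row <= nrows; row++)
    {
        if (rows_in_stmt == 0)
            printf("INSERT INTO public.t VALUES\n");
        else
            printf(",\n");

        printf("\t(%d)", row);      /* placeholder for the row's column values */
        rows_in_stmt++;

        if (rows_in_stmt == rows_per_insert || row == nrows)
        {
            printf(";\n");
            rows_in_stmt = 0;
        }
    }
}

int
main(void)
{
    /* 9 rows, 4 rows per INSERT -> three statements: 4 + 4 + 1 rows */
    dump_table_as_multirow_inserts(9, 4);
    return 0;
}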
Re: Tid scan improvements
Hi all, I am a bit stuck and I think it's best to try to explain where. I'm still rebasing the patches for the changes Tom made to support parameterised TID paths for joins. While the addition of join support itself does not really touch the same code, the modernisation -- in particular, returning a list of RestrictInfos rather than raw quals -- does rewrite quite a bit of tidpath.c. The original code returned: List (with OR semantics) CTID = ? or CTID = ANY (...) or IS CURRENT OF (more items) That changed recently to return: List (with OR semantics) RestrictInfo CTID = ? or ... (more items) My last set of patches extended the tidqual extraction to pull out lists (with AND semantics) of range quals of the form CTID < ?, etc. Each list of more than one item was converted into an AND clause before being added to the tidqual list; a single range qual can be added to tidquals as is. This required looking across multiple RestrictInfos at the top level, for example: - "WHERE ctid > ? AND ctid < ?" would arrive at tidpath as a list of two RestrictInfos, from which we extract a single tidqual in the form of an AND clause. - "WHERE ctid = ? OR (ctid > ? AND ctid < ?)" arrives as only one RestrictInfo, but we extract two tidquals (an OpExpr, and an AND clause). The code could also ignore additional unusable quals from a list of top-level RestrictInfos, or from a list of quals from an AND clause, for example: - "WHERE foo = ? AND ctid > ? AND ctid < ?" gives us the single tidqual "ctid > ? AND ctid < ?". - "WHERE (ctid = ? AND bar = ?) OR (foo = ? AND ctid > ? AND ctid < ?)" gives us the two tidquals "ctid = ?" and "ctid > ? AND ctid < ?". As the extracted tidquals no longer match the original query quals, they aren't removed from scan_clauses in createplan.c, and hence are correctly checked by the filter. Aside: The analogous situation with an indexed user attribute "x" behaves a bit differently: - "WHERE x = ? OR (x > ? AND x < ?)", won't use a regular index scan, but might use a bitmap index scan. My patch uses the same path type and executor for all extractable tidquals. This worked pretty well, but I am finding it difficult to reimplement it in the new tidpath.c code. In the query information given to the path generator, there is no existing RestrictInfo relating to the whole expression "ctid > ? AND ctid < ?". I am still learning about RestrictInfos, but my understanding is it doesn't make sense to have a RestrictInfo for an AND clause, anyway; you're supposed to have them for the sub-expressions of it. And it doesn't seem a good idea to try to create new RestrictInfos in the path generation just to pass the tidquals back to plan creation. They're complicated objects. There's also the generation of scan_clauses in create_tidscan_plan (createplan.c:3107). This now uses RestrictInfos -- I'd image we'd need each AND clause to be wrapped in a RestrictInfo to be able to check it properly. To summarise, I'm not sure what kind of structure I should add to the tidquals list to represent a compound range expression. Maybe it's better to create a different path (either a new path type, or a flag in TidPath to say what kind of quals are attached) ? Edmund
Re: draft patch for strtof()
This one builds ok on appveyor with at least three different VS versions. Though I've not tried the exact combination of commands used by cfbot... yet. -- Andrew (irc:RhodiumToad) diff --git a/configure b/configure index 06fc3c6835..e3176e24e9 100755 --- a/configure +++ b/configure @@ -15802,6 +15802,19 @@ esac fi +ac_fn_c_check_func "$LINENO" "strtof" "ac_cv_func_strtof" +if test "x$ac_cv_func_strtof" = xyes; then : + $as_echo "#define HAVE_STRTOF 1" >>confdefs.h + +else + case " $LIBOBJS " in + *" strtof.$ac_objext "* ) ;; + *) LIBOBJS="$LIBOBJS strtof.$ac_objext" + ;; +esac + +fi + case $host_os in diff --git a/configure.in b/configure.in index 4efb912c4d..bdaab717d7 100644 --- a/configure.in +++ b/configure.in @@ -1703,6 +1703,7 @@ AC_REPLACE_FUNCS(m4_normalize([ strlcat strlcpy strnlen + strtof ])) case $host_os in diff --git a/src/backend/utils/adt/float.c b/src/backend/utils/adt/float.c index 117ded8d1d..31f53878a2 100644 --- a/src/backend/utils/adt/float.c +++ b/src/backend/utils/adt/float.c @@ -104,13 +104,39 @@ is_infinite(double val) /* * float4in - converts "num" to float4 + * + * Note that this code now uses strtof(), where it used to use strtod(). + * + * The motivation for using strtof() is to avoid a double-rounding problem: + * for certain decimal inputs, if you round the input correctly to a double, + * and then round the double to a float, the result is incorrect in that it + * does not match the result of rounding the decimal value to float directly. + * + * One of the best examples is 7.038531e-26: + * + * 0xAE43FDp-107 = 7.03853069185120912085...e-26 + * midpoint 7.038531022281...e-26 + * 0xAE43FEp-107 = 7.03853130814879132477...e-26 + * + * making 0xAE43FDp-107 the correct float result, but if you do the conversion + * via a double, you get + * + * 0xAE43FD.7FF8p-107 = 7.038530907487...e-26 + * midpoint 7.038530964884...e-26 + * 0xAE43FD.8000p-107 = 7.038531022281...e-26 + * 0xAE43FD.8008p-107 = 7.038531137076...e-26 + * + * so the value rounds to the double exactly on the midpoint between the two + * nearest floats, and then rounding again to a float gives the incorrect + * result of 0xAE43FEp-107. + * */ Datum float4in(PG_FUNCTION_ARGS) { char *num = PG_GETARG_CSTRING(0); char *orig_num; - double val; + float val; char *endptr; /* @@ -135,7 +161,7 @@ float4in(PG_FUNCTION_ARGS) "real", orig_num))); errno = 0; - val = strtod(num, &endptr); + val = strtof(num, &endptr); /* did we not see anything that looks like a double? */ if (endptr == num || errno != 0) @@ -143,14 +169,14 @@ float4in(PG_FUNCTION_ARGS) int save_errno = errno; /* - * C99 requires that strtod() accept NaN, [+-]Infinity, and [+-]Inf, + * C99 requires that strtof() accept NaN, [+-]Infinity, and [+-]Inf, * but not all platforms support all of these (and some accept them * but set ERANGE anyway...) Therefore, we check for these inputs - * ourselves if strtod() fails. + * ourselves if strtof() fails. * * Note: C99 also requires hexadecimal input as well as some extended * forms of NaN, but we consider these forms unportable and don't try - * to support them. You can use 'em if your strtod() takes 'em. + * to support them. You can use 'em if your strtof() takes 'em. */ if (pg_strncasecmp(num, "NaN", 3) == 0) { @@ -195,8 +221,17 @@ float4in(PG_FUNCTION_ARGS) * precision). We'd prefer not to throw error for that, so try to * detect whether it's a "real" out-of-range condition by checking * to see if the result is zero or huge. 
+ * + * Use isinf() rather than HUGE_VALF on VS2013 because it generates + * a spurious overflow warning for -HUGE_VALF. */ - if (val == 0.0 || val >= HUGE_VAL || val <= -HUGE_VAL) + if (val == 0.0 || +#if defined(_MSC_VER) && (_MSC_VER < 1900) +isinf(val) +#else +(val >= HUGE_VALF || val <= -HUGE_VALF) +#endif +) ereport(ERROR, (errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE), errmsg("\"%s\" is out of range for type real", @@ -232,13 +267,7 @@ float4in(PG_FUNCTION_ARGS) errmsg("invalid input syntax for type %s: \"%s\"", "real", orig_num))); - /* - * if we get here, we have a legal double, still need to check to see if - * it's a legal float4 - */ - check_float4_val((float4) val, isinf(val), val == 0); - - PG_RETURN_FLOAT4((float4) val); + PG_RETURN_FLOAT4(val); } /* diff --git a/src/include/pg_config.h.in b/src/include/pg_config.h.in index 9d99816eae..84ca929439 100644 --- a/src/include/pg_config.h.in +++ b/src/include/pg_config.h.in @@ -555,6 +555,9 @@ /* Define to 1 if you have the `strsignal' function. */ #undef HAVE_STRSIGNAL +/* Define to 1 if you have the `strtof' function. */ +#undef HAVE_STRTOF + /* Define to 1 if you have the `strtoll' function. */ #undef HAVE_STRTOLL diff --git a/src/include/pg_con
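The double-rounding case documented in the float4in comment above can be reproduced with a few lines of C; on a platform whose strtof() rounds correctly, the two values printed below differ in the last bit. This is a demonstration of the problem, not part of the patch:

#include <stdio.h>
#include <stdlib.h>

int
main(void)
{
    const char *num = "7.038531e-26";

    /* round decimal -> double -> float: can land on the wrong float */
    float via_double = (float) strtod(num, NULL);

    /* round decimal -> float directly: the correctly rounded result */
    float direct = strtof(num, NULL);

    printf("via strtod then cast: %.9e (%a)\n", via_double, via_double);
    printf("via strtof directly:  %.9e (%a)\n", direct, direct);
    printf("%s\n", via_double == direct ? "same" : "different (double rounding)");
    return 0;
}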
Re: ArchiveEntry optional arguments refactoring
On Wed, 16 Jan 2019 at 17:45, Dmitry Dolgov <9erthali...@gmail.com> wrote: > > Hi, > > During the discussion in [1] an idea about refactoring ArchiveEntry was > suggested. The reason is that currently this function has significant number > of > arguments that are "optional", and every change that has to deal with it > introduces a lot of useless diffs. In the thread, mentioned above, such an > example is tracking current table access method, and I guess "Remove WITH > OIDS" > commit 578b229718e is also similar. > > Proposed idea is to refactor out all/optional arguments into a separate data > structure, so that adding/removing a new argument wouldn't change that much of > unrelated code. Then for every invocation of ArchiveEntry this structure needs > to be prepared before the call, or as Andres suggested: > > ArchiveEntry((ArchiveArgs){.tablespace = 3, >.dumpFn = somefunc, >...}); I didn't know we could do it this way. I thought we would have to declare a variable and have to initialize fields with non-const values separately. This looks nice. We could even initialize fields with non-const values. +1 from me. I think, we could use the same TocEntry structure as parameter, rather than a new structure. Most of the arguments already resemble fields of this structure. Also, we could pass pointer to that structure : ArchiveEntry( &(TocEntry){.tablespace = 3, .dumpFn = somefunc, ...}); -- Thanks, -Amit Khandekar EnterpriseDB Corporation The Postgres Database Company
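For reference, the C99 feature under discussion is a compound literal combined with designated initializers; the fields can be initialized with non-constant run-time values, and any fields left out are zero-initialized. A tiny stand-alone example with made-up stand-ins for the pg_dump structures (not the real ArchiveEntry/TocEntry definitions):

#include <stdio.h>

/* made-up stand-ins for the pg_dump structures under discussion */
typedef struct DemoArgs
{
    const char *tablespace;
    const char *owner;
    int         dump_id;
} DemoArgs;

static void
demo_entry(DemoArgs args)
{
    printf("tablespace=%s owner=%s dump_id=%d\n",
           args.tablespace ? args.tablespace : "(null)",
           args.owner ? args.owner : "(null)",
           args.dump_id);
}

int
main(void)
{
    int next_id = 42;           /* a non-constant, run-time value */

    /*
     * C99 compound literal with designated initializers: fields not named
     * here (owner) are zero-initialized, and next_id shows that run-time
     * values are fine.
     */
    demo_entry((DemoArgs){.tablespace = "pg_default",
                          .dump_id = next_id});
    return 0;
}

The same syntax also works when passing a pointer to the compound literal, as suggested above, e.g. some_func(&(DemoArgs){.tablespace = "pg_default"}).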
Re: Pluggable Storage - Andres's take
On Tue, 15 Jan 2019 at 17:58, Dmitry Dolgov <9erthali...@gmail.com> wrote: > > > On Tue, Jan 15, 2019 at 10:52 AM Amit Khandekar > > wrote: > > > > Need to bump K_VERS_MINOR as well. > > I've bumped it up, but somehow this change escaped the previous version. Now > should be there, thanks! > > > On Mon, 14 Jan 2019 at 18:36, Amit Khandekar wrote: > > > +static void _selectTableAccessMethod(ArchiveHandle *AH, const char > > > *tablespace); > > > tablespace => tableam > > > > This is yet to be addressed. > > Fixed. Thanks, the patch looks good to me. Of course there's the other thread about ArchiveEntry arguments which may alter this patch, but otherwise, I have no more comments on this patch. > > Also I guess another attached patch should address the psql part, namely > displaying a table access method with \d+ and possibility to hide it with a > psql variable (HIDE_TABLEAM, but I'm open for suggestion about the name). Will have a look at this one. -- Thanks, -Amit Khandekar EnterpriseDB Corporation The Postgres Database Company
Re: Fix function name in comment in vacuumlazy.c
On Fri, Jan 18, 2019 at 12:50:15PM +0900, Masahiko Sawada wrote: > Attached small patch fixes the function name heap_vacuum_rel in the comment. > > s/vacuum_heap_rel/heap_vacuum_rel/ Fixed, thanks. -- Michael
Re: pg_dump multi VALUES INSERT
On Thu, Jan 17, 2019 at 5:15 AM Surafel Temesgen wrote: > On Fri, Jan 4, 2019 at 3:08 PM David Rowley > wrote: >> >> On Mon, 31 Dec 2018 at 18:58, Surafel Temesgen wrote: >> >> >> 2. Is --insert-multi a good name? What if they do --insert-multi=1? >> That's not very "multi". Is --rows-per-insert better? >> > > --rows-per-insert is better for me . Some thoughts/suggestions: + int dump_inserts_multiple; The option name uses rows, seems like this should mirror that and be named "dump_inserts_max_rows" + + --rows-per-insert + + +When using --rows-per-insert, this allows the maximum number +of rows per INSERT statement to be specified. + + + "When using ..." - no other option description uses this redundant language and this should not either. Just say what it does. This specifies the maximum number of rows (default 1) that will be attached to each INSERT command generated by the --inserts or --column-inserts options; exactly one of which must be specified as well. (see my note at the end) + printf(_(" --rows-per-insertnumber of row per INSERT command\n")); (maximum?) number of row[s] per INSERT command + qr/\Qpg_dump: option --on-conflict-do-nothing requires option --inserts , --rows-per-insert or --column-inserts\E/, + 'pg_dump: option --on-conflict-do-nothing requires option --inserts , --rows-per-insert or --column-inserts'); You don't put spaces on both sides of the comma, just after; and add a comma before the "or" (I think...not withstanding the below) I suggest we require that --rows-per-insert be dependent upon exactly one of --inserts or --column-inserts being set and not let it be set by itself (in which case the original message for --on-conflict-do-nothing is OK). David J.
RE: speeding up planning with partitions
Imai-san, From: Amit Langote [mailto:langote_amit...@lab.ntt.co.jp] > On 2019/01/09 11:08, Imai, Yoshikazu wrote: > > I wonder why force_custom_plan is faster than auto after applied the patch. > > > > When we use PREPARE-EXECUTE, a generic plan is created and used if its > cost is > > cheaper than creating and using a custom plan with plan_cache_mode='auto', > > while a custom plan is always created and used with > plan_cache_mode='force_custom_plan'. > > So one can think the difference in above results is because of creating > or > > using a generic plan. > > > > I checked how many times a generic plan is created during executing pgbench > and > > found a generic plan is created only once and custom plans are created > at other > > times with plan_cache_mode='auto'. I also checked the time of creating > a > > generic plan, but it didn't take so much(250ms or so with 4096 partitions). > So > > the time of creating a generic plan does not affect the performance. > > > > Currently, a generic plan is created at sixth time of executing EXECUTE > query. > > I changed it to more later (ex. at 400,000th time of executing EXECUTE > query on > > master with 4096 partitions, because 7000TPS x 60sec=420,000 > transactions are > > run while executing pgbench.), then there are almost no difference between > auto > > and force_custom_plan. I think that creation of a generic plan affects > the time > > of executing queries which are ordered after creating generic plan. > > > > If my assumption is right, we can expect some additional process is > occurred at > > executing queries ordered after creating a generic plan, which results > in auto is > > slower than force_custom_plan because of additional process. But looking > at > > above results, on master with 4096 partitions, auto is faster than > force_custom_plan. > > So now I am confused. > > > > Do you have any ideas what does affect the performance? > > Are you saying that, when using auto mode, all executions of the query > starting from 7th are slower than the first 5 executions? That is, the > latency of creating and using a custom plan increases *after* a generic > plan is created and discarded on the 6th execution of the query? If so, > that is inexplicable to me. Isn't CheckCachedPlan() (and AcquireExecutorLocks() therein) called in every EXECUTE after the 6th one due to some unknown issue? Does choose_custom_plan() always return true after the 6th EXECUTE? Regards Takayuki Tsunakawa
Re: problems with foreign keys on partitioned tables
On 2019/01/18 7:54, Alvaro Herrera wrote: > On 2019-Jan-09, Amit Langote wrote: > >> 1. Foreign keys of partitions stop working correctly after being detached >> from the parent table > >> This happens because the action triggers defined on the PK relation (pk) >> refers to p as the referencing relation. On detaching p1 from p, p1's >> data is no longer accessible to that trigger. > > Ouch. > >> To fix this problem, we need create action triggers on PK relation >> that refer to p1 when it's detached (unless such triggers already >> exist which might be true in some cases). Attached patch 0001 shows >> this approach. > > Hmm, okay. I'm not in love with the idea that such triggers might > already exist -- seems unclean. We should remove redundant action > triggers when we attach a table as a partition, no? OK, I agree. I have updated the patch to make things work that way. With the patch: create table pk (a int primary key); create table p (a int references pk) partition by list (a); -- this query shows the action triggers on the referenced rel ('pk'), name -- of the constraint that the trigger is part of and the foreign key rel -- ('p', etc.) select tgrelid::regclass as pkrel, c.conname as fkconname, tgconstrrelid::regclass as fkrel from pg_trigger t, pg_constraint c where tgrelid = 'pk'::regclass and tgconstraint = c.oid; pkrel │ fkconname │ fkrel ───┼───┼─── pk│ p_a_fkey │ p pk│ p_a_fkey │ p (2 rows) create table p1 ( a int references pk, foreign key (a) references pk (a) ON UPDATE CASCADE ON DELETE CASCADE DEFERRABLE, foreign key (a) references pk (a) MATCH FULL ON UPDATE CASCADE ON DELETE CASCADE ) partition by list (a); -- p1_a_fkey on 'p1' is equivalent to p_a_fkey on 'p', but they're not -- attached yet select tgrelid::regclass as pkrel, conname as fkconname, tgconstrrelid::regclass as fkrel from pg_trigger t, pg_constraint c where tgrelid = 'pk'::regclass and tgconstraint = c.oid; pkrel │ fkconname │ fkrel ───┼┼─── pk│ p_a_fkey │ p pk│ p_a_fkey │ p pk│ p1_a_fkey │ p1 pk│ p1_a_fkey │ p1 pk│ p1_a_fkey1 │ p1 pk│ p1_a_fkey1 │ p1 pk│ p1_a_fkey2 │ p1 pk│ p1_a_fkey2 │ p1 (8 rows) create table p11 (like p1, foreign key (a) references pk); -- again, p11_a_fkey, p1_a_fkey, and p_a_fkey are equivalent select tgrelid::regclass as pkrel, conname as fkconname, tgconstrrelid::regclass as fkrel from pg_trigger t, pg_constraint c where tgrelid = 'pk'::regclass and tgconstraint = c.oid; pkrel │ fkconname │ fkrel ───┼┼─── pk│ p_a_fkey │ p pk│ p_a_fkey │ p pk│ p1_a_fkey │ p1 pk│ p1_a_fkey │ p1 pk│ p1_a_fkey1 │ p1 pk│ p1_a_fkey1 │ p1 pk│ p1_a_fkey2 │ p1 pk│ p1_a_fkey2 │ p1 pk│ p11_a_fkey │ p11 pk│ p11_a_fkey │ p11 (10 rows) alter table p1 attach partition p11 for values in (1); -- p1_a_fkey and p11_a_fkey merged, so triggers for the latter dropped select tgrelid::regclass as pkrel, conname as fkconname, tgconstrrelid::regclass as fkrel from pg_trigger t, pg_constraint c where tgrelid = 'pk'::regclass and tgconstraint = c.oid; pkrel │ fkconname │ fkrel ───┼┼─── pk│ p_a_fkey │ p pk│ p_a_fkey │ p pk│ p1_a_fkey │ p1 pk│ p1_a_fkey │ p1 pk│ p1_a_fkey1 │ p1 pk│ p1_a_fkey1 │ p1 pk│ p1_a_fkey2 │ p1 pk│ p1_a_fkey2 │ p1 (8 rows) -- p_a_fkey and p1_a_fkey merged, so triggers for the latter dropped alter table p attach partition p1 for values in (1); select tgrelid::regclass as pkrel, conname as fkconname, tgconstrrelid::regclass as fkrel from pg_trigger t, pg_constraint c where tgrelid = 'pk'::regclass and tgconstraint = c.oid; pkrel │ fkconname │ fkrel ───┼┼─── pk│ p_a_fkey │ p pk│ p_a_fkey │ p pk│ p1_a_fkey1 │ p1 
pk│ p1_a_fkey1 │ p1 pk│ p1_a_fkey2 │ p1 pk│ p1_a_fkey2 │ p1 (6 rows) alter table p detach partition p1; -- p1_a_fkey needs its own triggers again select tgrelid::regclass as pkrel, conname as fkconname, tgconstrrelid::regclass as fkrel from pg_trigger t, pg_constraint c where tgrelid = 'pk'::regclass and tgconstraint = c.oid; pkrel │ fkconname │ fkrel ───┼┼─── pk│ p_a_fkey │ p pk│ p_a_fkey │ p pk│ p1_a_fkey1 │ p1 pk│ p1_a_fkey1 │ p1 pk│ p1_a_fkey2 │ p1 pk│ p1_a_fkey2 │ p1 pk│ p1_a_fkey │ p1 pk│ p1_a_fkey │ p1 (8 rows) alter table p1 detach partition p11; -- p11_a_fkey needs its own triggers again select tgrelid::regclass as pkrel, conname as fkconname, tgconstrrelid::regclass as fkrel from pg_trigger t, pg_constraint c where tgrelid = 'pk'::regclass and tgconstraint = c.oid; pkrel │ fkconname │ fkrel ───┼┼─── pk│ p_a_fkey │ p pk│ p_a_fkey │ p pk│ p1_a_fkey1 │ p1 pk│ p1_a_fkey1 │ p1 pk│ p1_a_fkey2 │ p1 pk│ p1_a_fkey2 │ p1 pk│ p1_a_fkey │ p1 pk│ p1_a_fke
Re: speeding up planning with partitions
Tsunakawa-san, On 2019/01/18 14:12, Tsunakawa, Takayuki wrote: > From: Amit Langote [mailto:langote_amit...@lab.ntt.co.jp] >> Are you saying that, when using auto mode, all executions of the query >> starting from 7th are slower than the first 5 executions? That is, the >> latency of creating and using a custom plan increases *after* a generic >> plan is created and discarded on the 6th execution of the query? If so, >> that is inexplicable to me. > > Isn't CheckCachedPlan() (and AcquireExecutorLocks() therein) called in every > EXECUTE after 6th one due to some unknow issue? CheckCachedPlan is only called if choose_custom_plan() returns false resulting in generic plan being created/reused. With plan_cache_mode = auto, I see it always returns true, because a custom plan containing a single partition to scan is way cheaper than the generic plan. > Does choose_custom_plan() always return true after 6th EXECUTE? Yes. Thanks, Amit