Re: GSoC 2019
Hi! > On 8 Jan 2019, at 22:57, Alexander Korotkov > wrote: > > On Tue, Jan 8, 2019 at 1:06 AM Stephen Frost wrote: >> All the entries are marked with '2018' to indicate they were pulled from >> last year. If the project from last year is still relevant, please >> update it to be '2019' and make sure to update all of the information >> (in particular, make sure to list yourself as a mentor and remove the >> other mentors, as appropriate). > > I can confirm that I'm ready to mentor projects, where I'm listed as > potential mentor. I've updated the GiST API and amcheck project years and removed mentors (except Alexander). Please put your names back if you still wish to mentor these projects. Also, we are planning to add a new WAL-G project; Vladimir Leskov is now combining multiple WAL-G tasks into a single project. Vladimir did the 2018 WAL-G project during a Yandex internship, so I'll remove that project from the page. Best regards, Andrey Borodin.
Re: [Sender Address Forgery]Re: error message when subscription target is a partitioned table
On 2019/01/10 14:25, Michael Paquier wrote: > On Tue, Jan 08, 2019 at 04:42:35PM +0900, Michael Paquier wrote: >> I can see your point, though I would stick with >> ERRCODE_WRONG_OBJECT_TYPE for consistency with the existing code and >> because the code is intended to not work on anything else than plain >> tables, at least now. > > Attached are my suggestions shaped as a patch. Thoughts? Thanks for updating the patch, and sorry I couldn't reply sooner. Your rewritten version is perhaps fine, although I remain a bit concerned that some users might be puzzled when they see this error, that is, if they interpret the message as "it's impossible to use a partitioned table as a logical replication target". Thanks, Amit
Re: [Sender Address Forgery]Re: error message when subscription target is a partitioned table
> > On Tue, Jan 08, 2019 at 04:42:35PM +0900, Michael Paquier wrote: > >> I can see your point, though I would stick with > >> ERRCODE_WRONG_OBJECT_TYPE for consistency with the existing code and > >> because the code is intended to not work on anything else than plain > >> tables, at least now. > > > > Attached are my suggestions shaped as a patch. Thoughts? > > Thanks for updating the patch, and sorry I couldn't reply sooner. > > Your rewritten version is perhaps fine, although I remain a bit concerned > that some users might be puzzled when they see this error, that is, if > they interpret the message as "it's impossible to use a partitioned table > as a logical replication target". > > From [documentation]( https://www.postgresql.org/docs/current/logical-replication-restrictions.html ) : > Attempts to replicate tables other than base tables will result in an error. That's basically what I had understood about logical replication... Cheers, Lætitia -- Adopt the eco-attitude: print this mail only if really necessary
Re: [Sender Address Forgery]Re: error message when subscription target is a partitioned table
Hi, On 2019/01/10 17:46, Arkhena wrote: >> Your rewritten version is perhaps fine, although I remain a bit concerned >> that some users might be puzzled when they see this error, that is, if >> they interpret the message as "it's impossible to use a partitioned table >> as a logical replication target". >> >> > From [documentation]( > https://www.postgresql.org/docs/current/logical-replication-restrictions.html > ) : >> Attempts to replicate tables other than base tables will result in an > error. > > That's basically what I had understood about logical replication... Ah, if the documentation contains such a description then maybe it's fine. The reason I started this thread is due to this Stack Overflow question: https://stackoverflow.com/questions/53554727/logical-replication-and-declarative-partitioning-in-postgresql-11 So, it appears that there may be an element of surprise involved in encountering such an error (despite the documentation). Thanks, Amit
Re: [HACKERS] pgbench - allow to store select results into variables
Hello Alvaro, Here are my proposed final changes. Thanks again for improving the patch! I noticed that you were allocating the prefixes for all cases even when there were no cset/gset in the command; I changed it to delay the allocation until needed. Ok, why not. I also noticed you were skipping the Meta enum dance for no good reason; Indeed. I think that the initial version of the patch was before the "dance" was added, and it did not keep up with it. added that makes conditionals simpler. The read_response routine seemed misplaced and I gave it a name in a style closer to the rest of the pgbench code. Fine. Also, you were missing to free the ->lines pqexpbuffer when the command is discarded. I grant that the free_command() stuff might be bogus since it's only tested with a command that's barely initialized, but it seems better to make it bogus in this way (fixable if we ever extend its use) than to forever leak memory silently. Ok. However, I switched "pg_free" to "termPQExpBuffer", which seems more appropriate, even if it just does the same thing. I also ensured that prefixes are allocated & freed. I've commented about expr which is not freed. I didn't test this beyond running "make check". That's a good start. I'm not keen on having the command array size checked and updated *after* the command is appended, even if the initial allocation ensures that there is no overflow, but I let it as is. Further tests did not yield any new issue. Attached a v29 with the above minor changes wrt your version. -- Fabien.diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml index b5e3a62a33..cc369c423e 100644 --- a/doc/src/sgml/ref/pgbench.sgml +++ b/doc/src/sgml/ref/pgbench.sgml @@ -954,6 +954,91 @@ pgbench options d + + + \cset [prefix] + + + + + This command may be used to end SQL queries, replacing an embedded + semicolon (\;) within a compound SQL command. + + + + When this command is used, the preceding SQL query is expected to + return one row, the columns of which are stored into variables named after + column names, and prefixed with prefix if provided. + + + + The following example sends four queries as one compound SQL command, + inducing one message sent at the protocol level. + The result of the first query is stored into variable one, + the results of the third query are stored into variables z_three + and z_four, + whereas the results of the other queries are discarded. + +-- compound of four queries +SELECT 1 AS one \cset +SELECT 2 AS two \; +SELECT 3 AS three, 4 AS four \cset z_ +SELECT 5; + + + + + +\cset does not work when empty SQL queries appear +within a compound SQL command. + + + + + + + + \gset [prefix] + + + + + This commands may be used to end SQL queries, replacing a final semicolon + (;). + + + + When this command is used, the preceding SQL query is expected to + return one row, the columns of which are stored into variables named after + column names, and prefixed with prefix if provided. + + + + The following example puts the final account balance from the first query + into variable abalance, and fills variables + p_two and p_three + with integers from the last query. + The result of the second query is discarded. + +UPDATE pgbench_accounts + SET abalance = abalance + :delta + WHERE aid = :aid + RETURNING abalance \gset +-- compound of two queries +SELECT 1 \; +SELECT 2 AS two, 3 AS three \gset p_ + + + + + +\gset does not work when empty SQL queries appear +within a compound SQL command. 
+ + + + + \if expression \elif expression diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c index 649297ae4f..8bac939ff8 100644 --- a/src/bin/pgbench/pgbench.c +++ b/src/bin/pgbench/pgbench.c @@ -482,6 +482,8 @@ typedef enum MetaCommand META_SETSHELL,/* \setshell */ META_SHELL, /* \shell */ META_SLEEP, /* \sleep */ + META_CSET, /* \cset */ + META_GSET, /* \gset */ META_IF, /* \if */ META_ELIF, /* \elif */ META_ELSE, /* \else */ @@ -499,16 +501,39 @@ typedef enum QueryMode static QueryMode querymode = QUERY_SIMPLE; static const char *QUERYMODE[] = {"simple", "extended", "prepared"}; -typedef struct +/* + * struct Command represents one command in a script. + * + * lines The raw, possibly multi-line command text. Variable substitution + *not applied. + * first_line A short, single-line extract of 'lines', for error reporting. + * type SQL_COMMAND or META_COMMAND + * meta The type of meta-command, or META_NONE if command is SQL + * argc Number of arguments of the command, 0 if not yet processed. + * argv Command arguments, the first of which i
Re: speeding up planning with partitions
On Thu, 10 Jan 2019 at 21:41, Amit Langote wrote: > In the v13 that I will try to post tomorrow, I will have hopefully > addressed David's and Imai-san's review comments. Thank you both! I'd been looking at v11's 0002 and started on 0003 too. I'll include my notes so far if you're about to send a v13. v11 0002 18. There's a missing case in the following code. I understand that makeNode() will 0 the member here, but that does not stop the various other initialisations that set the default value for the field. Below there's a missing case where parent != NULL && parent->rtekind != RTE_RELATION. You might be better off just always zeroing the field below "rel->partitioned_child_rels = NIL;" + + /* + * For inheritance child relations, we also set inh_root_parent. + * Note that 'parent' might itself be a child (a sub-partitioned + * partition), in which case we simply use its value of + * inh_root_parent. + */ + if (parent->rtekind == RTE_RELATION) + rel->inh_root_parent = parent->inh_root_parent > 0 ? + parent->inh_root_parent : + parent->relid; } else + { rel->top_parent_relids = NULL; + rel->inh_root_parent = 0; + } 19. Seems strange to have a patch that adds a new field that is unused. I also don't quite understand yet why top_parent_relids can't be used instead. I think I recall being confused about that before, so maybe it's worth writing a comment to mention why it cannot be used. v11 0003 20. This code looks wrong: + /* + * expand_inherited_tables may have proved that the relation is empty, so + * check if it's so. + */ + else if (rte->inh && !IS_DUMMY_REL(rel)) Likely you'll want: else if (rte->inh) { if (IS_DUMMY_REL(rel)) return; // other stuff } otherwise, you'll end up in the else clause when you shouldn't be. 21. is -> was + * The child rel's RelOptInfo is created during + * expand_inherited_tables(). */ childrel = find_base_rel(root, childRTindex); since you're talking about something that already happened. I'll continue looking at v12. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: BUG #15446: Crash on ALTER TABLE
On Wed, 9 Jan 2019 at 23:24, Andrew Dunstan wrote: > Here's another attempt. For the rewrite case it kept the logic of the > previous patch to clear all the missing attributes, but if we're not > rewriting we reconstruct the missing value according to the new type > settings. > Looks good to me. Regards, Dean
Re: Query with high planning time at version 11.1 compared versions 10.5 and 11.0
On Thu, Jan 10, 2019 at 7:12 AM Amit Langote wrote: > Fujita-san, > > On 2019/01/09 20:20, Etsuro Fujita wrote: > > (2019/01/09 9:30), Amit Langote wrote: > >> So, while the patch at [1] can take care of this issue as I also > mentioned > >> upthread, I was trying to come up with a solution that can be > back-patched > >> to PG 11. The patch I posted above is one such solution and as Ashutosh > >> points out it's perhaps not the best, because it can result in > potentially > >> creating many copies of the same child EC member if we do it in > joinrel.c, > >> as the patch proposes. I will try to respond to the concerns he raised > in > >> the next week if possible. > > > > Thanks for working on this! > > > > I like your patch in general. I think one way to address Ashutosh's > > concerns would be to use the consider_partitionwise_join flag: > originally, > > that was introduced for partitioned relations to show that they can be > > partitionwise-joined, but I think that flag could also be used for > > non-partitioned relations to show that they have been set up properly for > > partitionwise-joins, and I think by checking that flag we could avoid > > creating those copies for child dummy rels in try_partitionwise_join. > > Ah, that's an interesting idea. > > If I understand the original design of it correctly, > consider_partitionwise_join being true for a given relation (simple or > join) means that its RelOptInfo contains properties to consider it to be > joined with another relation (simple or join) using partitionwise join > mechanism. Partitionwise join will occur between the pair if the other > relation also has relevant properties (hence its > consider_partitionwise_join set to true) and properties on the two sides > match. > > Though this will solve a problem for performance when partition-wise join is not possible, we still have the same problem when partition-wise join is possible. And that problem really happens because our inheritance mechanism requires expression translation from parent to child everywhere. That consumes memory, eats CPU cycles and generally downgrades performance of partition related query planning. I think a better way would be to avoid these translations and use Parent var to represent a Var of the child being dealt with. That will be a massive churn on inheritance based planner code, but it will improve planning time for queries involving thousands of partitions. -- Best Wishes, Ashutosh Bapat
Policy on cross-posting to multiple lists
Has the policy on cross-posting to multiple lists been hardened recently? The "Crash on ALTER TABLE" thread [1] started on -bugs, but Andrew's message on 8 Jan with an initial proposed patch and my response later that day both CC'ed -hackers and seem to have been rejected, and so are missing from the archives. In that case, it's not a big deal because subsequent replies included the text from the missing messages, so it's still possible to follow the discussion, but I wanted to check whether this was an intentional change of policy. If so, it seems a bit harsh to flat-out reject these messages. My prior understanding was that cross-posting, while generally discouraged, does still sometimes have value. [1] https://www.postgresql.org/message-id/flat/CAEZATCVqksrnXybSaogWOzmVjE3O-NqMSpoHDuDw9_7mhNpeLQ%40mail.gmail.com#2c25e9a783d4685912dcef8b3f3edd63 Regards, Dean
Re: Making WAL receiver startup rely on GUC context for primary_conninfo and primary_slot_name
Hello > I was just double-checking the code, and it is possible to remove the > part from RequestXLogStreaming() which sets slotname and conninfo to > '\0', so I removed that part from my local branch to simplify the > code. Without both ready_to_display and the cleanup in RequestXLogStreaming, do we still avoid showing an outdated PrimaryConnInfo in the case where the walreceiver has just started but has not connected yet? While waiting in walrcv_connect, for example. regards, Sergei
Re: [Sender Address Forgery]Re: error message when subscription target is a partitioned table
On Thu, Jan 10, 2019 at 05:58:10PM +0900, Amit Langote wrote: > The reason I started this thread is due to this Stack Overflow question: > > https://stackoverflow.com/questions/53554727/logical-replication-and-declarative-partitioning-in-postgresql-11 > > So, it appears that there may be an element of surprise involved in > encountering such an error (despite the documentation). Improving the user experience is definitely a good thing in my opinion because the current error message can be confusing, so you were right to start this thread. Still, I don't agree that classifying those relkinds as not supported is right, both for consistency with the code that has existed for two years and because of the way the code is designed to work: rows are replicated on a per-physically-defined relation basis. -- Michael signature.asc Description: PGP signature
Re: Making WAL receiver startup rely on GUC context for primary_conninfo and primary_slot_name
On Thu, Jan 10, 2019 at 01:10:16PM +0300, Sergei Kornilov wrote: > Without both ready_to_display and cleaning in RequestXLogStreaming > we do not show outdated PrimaryConnInfo in case walreceiver just > started, but does not connected yet? While waiting walrcv_connect > for example. Good point. There is a gap between the moment the WAL receiver PID is set when the WAL receiver starts up and the moment where the different fields are reset to 0 which is not good as it could be possible that the user sees the information from the previous WAL receiver, so we should move the memset calls when the PID is set, reaching a state where the PID is alive but where there is no connection yet. The port number needs a similar treatment. Looking closer at the code, it seems to me that it would be a good thing if we update the shared memory state when the WAL receiver dies, so as not only the PID is set to 0, but all connection-related information gets the call. With all those comments I am finishing with the updated attached. Thoughts? -- Michael diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index 9823b75767..4191b23621 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -11857,8 +11857,7 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess, tli, curFileTLI); } curFileTLI = tli; - RequestXLogStreaming(tli, ptr, PrimaryConnInfo, - PrimarySlotName); + RequestXLogStreaming(tli, ptr); receivedUpto = 0; } diff --git a/src/backend/replication/walreceiver.c b/src/backend/replication/walreceiver.c index 2e90944ad5..7e3ff63a44 100644 --- a/src/backend/replication/walreceiver.c +++ b/src/backend/replication/walreceiver.c @@ -188,9 +188,7 @@ DisableWalRcvImmediateExit(void) void WalReceiverMain(void) { - char conninfo[MAXCONNINFO]; char *tmp_conninfo; - char slotname[NAMEDATALEN]; XLogRecPtr startpoint; TimeLineID startpointTLI; TimeLineID primaryTLI; @@ -248,10 +246,6 @@ WalReceiverMain(void) walrcv->pid = MyProcPid; walrcv->walRcvState = WALRCV_STREAMING; - /* Fetch information required to start streaming */ - walrcv->ready_to_display = false; - strlcpy(conninfo, (char *) walrcv->conninfo, MAXCONNINFO); - strlcpy(slotname, (char *) walrcv->slotname, NAMEDATALEN); startpoint = walrcv->receiveStart; startpointTLI = walrcv->receiveStartTLI; @@ -262,6 +256,16 @@ WalReceiverMain(void) /* Report the latch to use to awaken this process */ walrcv->latch = &MyProc->procLatch; + /* + * Reset all connection information when the PID is set, which makes + * the information visible at SQL level, still we are not connected + * yet. 
+ */ + memset(walrcv->conninfo, 0, MAXCONNINFO); + memset(walrcv->slotname, 0, NAMEDATALEN); + memset(walrcv->sender_host, 0, NI_MAXHOST); + walrcv->sender_port = 0; + SpinLockRelease(&walrcv->mutex); /* Arrange to clean up at walreceiver exit */ @@ -291,32 +295,36 @@ WalReceiverMain(void) /* Unblock signals (they were blocked when the postmaster forked us) */ PG_SETMASK(&UnBlockSig); + if (PrimaryConnInfo == NULL || PrimaryConnInfo[0] == '\0') + ereport(ERROR, +(errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("cannot connect to the primary server as \"primary_conninfo\" is not defined"))); + /* Establish the connection to the primary for XLOG streaming */ EnableWalRcvImmediateExit(); - wrconn = walrcv_connect(conninfo, false, "walreceiver", &err); + wrconn = walrcv_connect(PrimaryConnInfo, false, "walreceiver", &err); if (!wrconn) ereport(ERROR, (errmsg("could not connect to the primary server: %s", err))); DisableWalRcvImmediateExit(); /* - * Save user-visible connection string. This clobbers the original - * conninfo, for security. Also save host and port of the sender server - * this walreceiver is connected to. + * Save user-visible connection string. Also save host and port of the + * sender server this walreceiver is connected to. */ tmp_conninfo = walrcv_get_conninfo(wrconn); walrcv_get_senderinfo(wrconn, &sender_host, &sender_port); SpinLockAcquire(&walrcv->mutex); - memset(walrcv->conninfo, 0, MAXCONNINFO); if (tmp_conninfo) strlcpy((char *) walrcv->conninfo, tmp_conninfo, MAXCONNINFO); - memset(walrcv->sender_host, 0, NI_MAXHOST); + if (PrimarySlotName) + strlcpy((char *) walrcv->slotname, PrimarySlotName, NAMEDATALEN); + if (sender_host) strlcpy((char *) walrcv->sender_host, sender_host, NI_MAXHOST); walrcv->sender_port = sender_port; - walrcv->ready_to_display = true; SpinLockRelease(&walrcv->mutex); if (tmp_conninfo) @@ -387,7 +395,8 @@ WalReceiverMain(void) */ options.logical = false; options.startpoint = startpoint; - options.slotname = slotname[0] != '\0' ? slotname : NULL; + options.slotname = (PrimarySlotName && PrimarySlotName[0] != '\0') ? + PrimarySlotName : NULL; options.proto.physical.startpointTLI = start
Re: Making WAL receiver startup rely on GUC context for primary_conninfo and primary_slot_name
Hi Thank you, the patch looks good to me. regards, Sergei
Re: Query with high planning time at version 11.1 compared versions 10.5 and 11.0
On Thu, Jan 10, 2019 at 6:49 PM Ashutosh Bapat wrote: > Though this will solve a problem for performance when partition-wise join is > not possible, we still have the same problem when partition-wise join is > possible. And that problem really happens because our inheritance mechanism > requires expression translation from parent to child everywhere. That > consumes memory, eats CPU cycles and generally downgrades performance of > partition related query planning. I think a better way would be to avoid > these translations and use Parent var to represent a Var of the child being > dealt with. That will be a massive churn on inheritance based planner code, > but it will improve planning time for queries involving thousands of > partitions. Yeah, it would be nice going forward to overhaul inheritance planning such that parent-to-child Var translation is not needed, especially where no pruning can occur or many partitions remain even after pruning. Thanks, Amit
Strange wording in advanced tutorial
The file header in the advanced tutorial has what seems like incorrect (or at least odd) wording: “Tutorial on advanced more PostgreSQL features”. The attached patch changes it to “more advanced”, which I think was the intention. I can willingly admit that I had never even noticed the tutorial directory until I stumbled across it yesterday. The commit introducing the above wording is by now old enough to drive. cheers ./daniel tutorial_wording.patch Description: Binary data
Re: PostgreSQL vs SQL/XML Standards
Hello Pavel, On 09.11.2018 07:07, Pavel Stehule wrote: I used your patch and append regress tests. I checked the result against Oracle. I checked the patch with Chap cases. The patch fixes handling of boolean, number types which are mentioned in the wiki. I have a few comments related to the code and the documentation. I attached the patch, which fixes it. There is an example in the documentation: SELECT xmltable.* FROM xmlelements, XMLTABLE('/root' PASSING data COLUMNS element text); element -- Hello2a2 bbbCC With the patch XMLTABLE returns a different result now. The copy_and_safe_free_xmlchar() function should be hidden by #ifdef USE_LIBXML, otherwise I get an error if I build Postgres without --with-libxml. There is a comment within XmlTableGetValue(). I changed it, mainly I used Markus's patch from the related thread mentioned by Alvaro. Please see the changes in the patch. -- Arthur Zakirov Postgres Professional: http://www.postgrespro.com Russian Postgres Company diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index 90d67f1acf..06f3f69073 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -10999,9 +10999,9 @@ $$ AS data; SELECT xmltable.* FROM xmlelements, XMLTABLE('/root' PASSING data COLUMNS element text); - element --- - Hello2a2 bbbCC + element +- + Hello2a2 bbbxxxCC ]]> diff --git a/src/backend/utils/adt/xml.c b/src/backend/utils/adt/xml.c index a83882f5de..df7f0cc20d 100644 --- a/src/backend/utils/adt/xml.c +++ b/src/backend/utils/adt/xml.c @@ -4427,6 +4427,7 @@ XmlTableFetchRow(TableFuncScanState *state) #endif /* not USE_LIBXML */ } +#ifdef USE_LIBXML /* * Copy XmlChar string to PostgreSQL memory. Ensure releasing of * source xmllib string. */ @@ -4455,6 +4456,7 @@ copy_and_safe_free_xmlchar(xmlChar *str) return result; } +#endif /* * XmlTableGetValue @@ -4504,9 +4506,9 @@ XmlTableGetValue(TableFuncScanState *state, int colnum, /* * There are four possible cases, depending on the number of nodes * returned by the XPath expression and the type of the target column: - * a) XPath returns no nodes. b) One node is returned, and column is - * of type XML. c) One node, column type other than XML. d) Multiple - * nodes are returned. + * a) XPath returns no nodes. b) The target type is XML (return all + * as XML). For non-XML types: c) One node (return content). + * d) Multiple nodes (error). */ if (xpathobj->type == XPATH_NODESET) {
Re: Remove Deprecated Exclusive Backup Mode
Stephen Frost wrote: > * Peter Eisentraut (peter.eisentr...@2ndquadrant.com) wrote: > > Clearly, not having to do that at all is better, but if this is all > > there is to it, then I'm confused by the characterizations of how awful > > and terrible this feature is and how we must rush to remove it. > > It's not all there is to it though. > > This issue leads to extended downtime regularly and is definitely a huge > 'gotcha' for users, even if you don't want to call it outright broken, Only if PostgreSQL crashes regularly, right? Yours, Laurenz Albe
Re: Remove Deprecated Exclusive Backup Mode
Greetings, * Laurenz Albe (laurenz.a...@cybertec.at) wrote: > Stephen Frost wrote: > > * Peter Eisentraut (peter.eisentr...@2ndquadrant.com) wrote: > > > Clearly, not having to do that at all is better, but if this is all > > > there is to it, then I'm confused by the characterizations of how awful > > > and terrible this feature is and how we must rush to remove it. > > > > It's not all there is to it though. > > > > This issue leads to extended downtime regularly and is definitely a huge > > 'gotcha' for users, even if you don't want to call it outright broken, > > Only if PostgreSQL crashes regularly, right? This happens anytime a backup is in progress and the system crashes in any way: PostgreSQL going down, the container Postgres is running in or the server Postgres is running on going down, or the filesystem that Postgres is running on going away, etc. There's certainly no shortage of cases where this can happen. Thanks! Stephen signature.asc Description: PGP signature
Re: PostgreSQL vs SQL/XML Standards
Hi čt 10. 1. 2019 v 14:00 odesílatel Arthur Zakirov napsal: > Hello Pavel, > > On 09.11.2018 07:07, Pavel Stehule wrote: > > I used your patch and append regress tests. I checked the result against > > Oracle. > > I checked the patch with Chap cases. The patch fixes handling of > boolean, number types which mentioned in the wiki. > > I have a few comments related to the code and the documentation. I > attached the patch, which fixes it. > > There is an example in the documentation: > > SELECT xmltable.* >FROM xmlelements, XMLTABLE('/root' PASSING data COLUMNS element text); > element > -- > Hello2a2 bbbCC > > With the patch XMLTABLE returns different result now. > > copy_and_safe_free_xmlchar() function should be hid by #ifdef > USE_LIBXML, otherwise I get an error if I build the Postgres without > --with-libxml. > > There is a comment within XmlTableGetValue(). I changed it, mainly I > used Markus patch from the related thread mentioned by Alvaro. > > Please see the changes in the patch. > I merged your changes, and fixed regress tests. Thank you for patch Regards Pavel > -- > Arthur Zakirov > Postgres Professional: http://www.postgrespro.com > Russian Postgres Company > diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index 90d67f1acf..06f3f69073 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -10999,9 +10999,9 @@ $$ AS data; SELECT xmltable.* FROM xmlelements, XMLTABLE('/root' PASSING data COLUMNS element text); - element --- - Hello2a2 bbbCC + element +- + Hello2a2 bbbxxxCC ]]> diff --git a/src/backend/utils/adt/xml.c b/src/backend/utils/adt/xml.c index 1cec168b2a..df7f0cc20d 100644 --- a/src/backend/utils/adt/xml.c +++ b/src/backend/utils/adt/xml.c @@ -3674,15 +3674,15 @@ SPI_sql_row_to_xmlelement(uint64 rownum, StringInfo result, char *tablename, #ifdef USE_LIBXML /* - * Convert XML node to text (dump subtree in case of element, - * return value otherwise) + * Convert XML node to text (dump subtree), for attribute and text + * returns escaped text. */ static text * xml_xmlnodetoxmltype(xmlNodePtr cur, PgXmlErrorContext *xmlerrcxt) { xmltype*result; - if (cur->type == XML_ELEMENT_NODE) + if (cur->type != XML_ATTRIBUTE_NODE && cur->type != XML_TEXT_NODE) { xmlBufferPtr buf; xmlNodePtr cur_copy; @@ -4427,6 +4427,37 @@ XmlTableFetchRow(TableFuncScanState *state) #endif /* not USE_LIBXML */ } +#ifdef USE_LIBXML +/* + * Copy XmlChar string to PostgreSQL memory. Ensure releasing of + * source xmllib string. + */ +static char * +copy_and_safe_free_xmlchar(xmlChar *str) +{ + char *result; + + if (str) + { + PG_TRY(); + { + result = pstrdup((char *) str); + } + PG_CATCH(); + { + xmlFree(str); + PG_RE_THROW(); + } + PG_END_TRY(); + xmlFree(str); + } + else + result = NULL; + + return result; +} +#endif + /* * XmlTableGetValue * Return the value for column number 'colnum' for the current row. If @@ -4475,9 +4506,9 @@ XmlTableGetValue(TableFuncScanState *state, int colnum, /* * There are four possible cases, depending on the number of nodes * returned by the XPath expression and the type of the target column: - * a) XPath returns no nodes. b) One node is returned, and column is - * of type XML. c) One node, column type other than XML. d) Multiple - * nodes are returned. + * a) XPath returns no nodes. b) The target type is XML (return all + * as XML). For non-XML types: c) One node (return content). + * d) Multiple nodes (error). 
*/ if (xpathobj->type == XPATH_NODESET) { @@ -4490,85 +4521,72 @@ XmlTableGetValue(TableFuncScanState *state, int colnum, { *isnull = true; } - else if (count == 1 && typid == XMLOID) - { -text *textstr; - -/* simple case, result is one value */ -textstr = xml_xmlnodetoxmltype(xpathobj->nodesetval->nodeTab[0], - xtCxt->xmlerrcxt); -cstr = text_to_cstring(textstr); - } - else if (count == 1) + else { -xmlChar*str; -xmlNodePtr node; - -/* - * Most nodes (elements and even attributes) store their data - * in children nodes. If they don't have children nodes, it - * means that they are empty (e.g. ). Text nodes and - * CDATA sections are an exception: they don't have children - * but have content in the Text/CDATA node itself. - */ -node = xpathobj->nodesetval->nodeTab[0]; -if (node->type != XML_CDATA_SECTION_NODE && - node->type != XML_TEXT_NODE) - node = node->xmlChildrenNode; - -str = xmlNodeListGetString(xtCxt->doc, node, 1); -if (str != NULL) +if (typid == XMLOID) { - PG_TRY(); - { - cstr = pstrdup((char *) str); - } - PG_CATCH(); + text *textstr; + StringInfoData str; + int i; + + /* Concatenate serialized values */ + initStringInfo(&str);
Re: [HACKERS] PATCH: multivariate histograms and MCV lists
On Wed, 9 Jan 2019 at 15:40, Tomas Vondra wrote: > On 1/8/19 3:18 PM, Dean Rasheed wrote: > > So actually, the estimate for a group of values will be either the MCV > > item's frequency (if the MCV item is kept), or (roughly) the MCV > > item's base_frequency (if the MCV item is not kept). That suggests > > that we should simply keep items that are significantly more or less > > common than the item's base frequency -- i.e., keep rule (b) and ditch > > rule (a). > > > > Hmmm, but won't that interfere with how we extrapolate the > MCV estimate to the non-MCV part? Currently the patch does what you > proposed, i.e. > > other_sel = simple_sel - mcv_basesel; > > I'm worried that if we only include the items that are significantly > more or less common than the base frequency, it may skew the other_sel > estimate. > I don't see how that would skew other_sel. Items close to the base frequency would also tend to be close to simple_sel, making other_sel approximately zero, so excluding them should have little effect. However... Re-reading the thread where we enhanced the per-column MCV stats last year [1], it was actually the case that an algorithm based on just looking at the relative standard error worked pretty well for a very wide range of data distributions. The final algorithm chosen in analyze_mcv_list() was only a marginal improvement on that, and was directly based upon the fact that, in the univariate statistics case, all the values not included in the MCV list are assigned the same selectivity. However, that's not the case for multivariate stats, because each group not included in the multivariate MCV list gets assigned a different selectivity based on its per-column stats. So perhaps what we should do for multivariate stats is simply use the relative standard error approach (i.e., reuse the patch in [2] with a 20% RSE cutoff). That had a lot of testing at the time, against a wide range of data distributions, and proved to be very good, not to mention being very simple. That approach would encompass both groups more and less common than the base frequency, because it relies entirely on the group appearing enough times in the sample to infer that any errors on the resulting estimates will be reasonably well controlled. It wouldn't actually look at the base frequency at all in deciding which items to keep. Moreover, if the group appears sufficiently often in the sample to justify being kept, each of the individual column values must also appear at least that often as well, which means that the errors on the base frequency estimate are also well controlled. That was one of my concerns about other algorithms such as "keep items significantly more or less common than the base frequency" -- in the less common case, there's no lower bound on the number of occurrences seen, and so no guarantee that the errors are kept under control. Regards, Dean [1] https://www.postgresql.org/message-id/flat/CAMkU%3D1yvdGvW9TmiLAhz2erFnvnPFYHbOZuO%2Ba%3D4DVkzpuQ2tw%40mail.gmail.com [2] https://www.postgresql.org/message-id/CAEZATCUEmHCZeOHJN8JO5O9LK_VuFeCecy_AxTk7S_2SmLXeyw%40mail.gmail.com
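To make the 20% RSE cutoff concrete, here is a minimal sketch of the kind of test it implies (the function name and the exact variance formula are illustrative assumptions, not necessarily what the earlier patch used):

#include <math.h>
#include <stdbool.h>

/*
 * Decide whether an MCV item seen 'count' times in a sample of
 * 'samplerows' rows (drawn from a table of 'totalrows' rows) is kept:
 * keep it only if the relative standard error of its estimated
 * frequency is below 20%.  The variance used here is the textbook one
 * for a sample proportion with a finite-population correction.
 */
bool
keep_mcv_item(double count, double samplerows, double totalrows)
{
	double		freq;
	double		variance;
	double		rse;

	if (count <= 0 || samplerows <= 0)
		return false;
	if (samplerows >= totalrows)
		return true;			/* whole table sampled, frequency is exact */

	freq = count / samplerows;

	/* variance of the estimated frequency (finite-population correction) */
	variance = freq * (1.0 - freq) / samplerows *
		(totalrows - samplerows) / (totalrows - 1.0);

	rse = sqrt(variance) / freq;

	return rse < 0.20;
}

Ignoring the finite-population correction, this works out to requiring roughly 25 or more occurrences of a group in the sample before it is kept, which is where the error control comes from.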
Re: [HACKERS] PATCH: multivariate histograms and MCV lists
On 1/10/19 4:20 PM, Dean Rasheed wrote: > On Wed, 9 Jan 2019 at 15:40, Tomas Vondra > wrote: >> On 1/8/19 3:18 PM, Dean Rasheed wrote: >>> So actually, the estimate for a group of values will be either the MCV >>> item's frequency (if the MCV item is kept), or (roughly) the MCV >>> item's base_frequency (if the MCV item is not kept). That suggests >>> that we should simply keep items that are significantly more or less >>> common than the item's base frequency -- i.e., keep rule (b) and ditch >>> rule (a). >>> >> >> Hmmm, but won't that interfere with how we extrapolate the >> MCV estimate to the non-MCV part? Currently the patch does what you >> proposed, i.e. >> >> other_sel = simple_sel - mcv_basesel; >> >> I'm worried that if we only include the items that are significantly >> more or less common than the base frequency, it may skew the other_sel >> estimate. >> > > I don't see how that would skew other_sel. Items close to the base > frequency would also tend to be close to simple_sel, making other_sel > approximately zero, so excluding them should have little effect. Oh, I see. You're right, those items should contribute very little to other_sel; I should have realized that. > However... > > Re-reading the thread where we enhanced the per-column MCV stats last > year [1], it was actually the case that an algorithm based on just > looking at the relative standard error worked pretty well for a very > wide range of data distributions. > > The final algorithm chosen in analyze_mcv_list() was only a marginal > improvement on that, and was directly based upon the fact that, in the > univariate statistics case, all the values not included in the MCV > list are assigned the same selectivity. However, that's not the case > for multivariate stats, because each group not included in the > multivariate MCV list gets assigned a different selectivity based on > its per-column stats. > > So perhaps what we should do for multivariate stats is simply use the > relative standard error approach (i.e., reuse the patch in [2] with a > 20% RSE cutoff). That had a lot of testing at the time, against a wide > range of data distributions, and proved to be very good, not to > mention being very simple. > > That approach would encompass both groups more and less common than > the base frequency, because it relies entirely on the group appearing > enough times in the sample to infer that any errors on the resulting > estimates will be reasonably well controlled. It wouldn't actually > look at the base frequency at all in deciding which items to keep. > > Moreover, if the group appears sufficiently often in the sample to > justify being kept, each of the individual column values must also > appear at least that often as well, which means that the errors on the > base frequency estimate are also well controlled. That was one of my > concerns about other algorithms such as "keep items significantly more > or less common than the base frequency" -- in the less common case, > there's no lower bound on the number of occurrences seen, and so no > guarantee that the errors are kept under control. > Yep, that looks like a great approach. Simple and tested. I'll try tweaking the patch accordingly over the weekend. Thanks! -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Policy on cross-posting to multiple lists
Dean Rasheed writes: > Has the policy on cross-posting to multiple lists been hardened recently? Not that I've heard of. > The "Crash on ALTER TABLE" thread [1] started on -bugs, but Andrew's > message on 8 Jan with an initial proposed patch and my response later > that day both CC'ed -hackers and seem to have been rejected, and so > are missing from the archives. I've done the same recently, without problems. I'd suggest inquiring on the pgsql-www list; project infrastructure issues are not really on-topic here. regards, tom lane
Re: Policy on cross-posting to multiple lists
On Fri, Jan 11, 2019 at 12:58 AM Tom Lane wrote: > Dean Rasheed writes: > > The "Crash on ALTER TABLE" thread [1] started on -bugs, but Andrew's > > message on 8 Jan with an initial proposed patch and my response later > > that day both CC'ed -hackers and seem to have been rejected, and so > > are missing from the archives. > > I've done the same recently, without problems. I'd suggest inquiring > on the pgsql-www list; project infrastructure issues are not really > on-topic here. Fwiw, an email I sent yesterday was rejected similarly because I'd tried to send it to both pgsql-hackers and pgsql-performance. I mentioned about that when I resent the same email successfully [1] after dropping pgsql-performance from the list of recipients. Thanks, Amit [1] https://www.postgresql.org/message-id/96720c99-ffa0-01ad-c594-0504c8eda708%40lab.ntt.co.jp
Re: [HACKERS] pgbench - allow to store select results into variables
On 2019-Jan-10, Fabien COELHO wrote: > However, I switched "pg_free" to "termPQExpBuffer", which seems more > appropriate, even if it just does the same thing. I also ensured that > prefixes are allocated & freed. I've commented about expr which is not > freed. Oops, of course, thanks. > I'm not keen on having the command array size checked and updated *after* > the command is appended, even if the initial allocation ensures that there > is no overflow, but I let it as is. It was already done that way, only it was done in two places rather than one. I just refactored it. (In fairness, I think the assignment of the new command to the array could also be done in one place instead of two, but it seems slightly clearer like this.) > Attached a v29 with the above minor changes wrt your version. Thanks, pushed. I fixed a couple of very minor issues in the docs. Now let's see how the buildfarm likes this ... -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] PATCH: multivariate histograms and MCV lists
On Wed, 26 Dec 2018 at 22:09, Tomas Vondra wrote: > > Attached is an updated version of the patch - rebased and fixing the > warnings reported by Thomas Munro. > Here are a few random review comments based on what I've read so far: On the CREATE STATISTICS doc page, the syntax in the new examples added to the bottom of the page is incorrect. E.g., instead of CREATE STATISTICS s2 WITH (mcv) ON (a, b) FROM t2; it should read CREATE STATISTICS s2 (mcv) ON a, b FROM t2; I think perhaps there should also be a short explanatory sentence after each example (as in the previous one) just to explain what the example is intended to demonstrate. E.g., for the new MCV example, perhaps say These statistics give the planner more detailed information about the specific values that commonly appear in the table, as well as an upper bound on the selectivities of combinations of values that do not appear in the table, allowing it to generate better estimates in both cases. I don't think there's a need for too much detail there, since it's explained more fully elsewhere, but it feels like it needs a little more just to explain the purpose of the example. There is additional documentation in perform.sgml that needs updating -- about what kinds of stats the planner keeps. Those docs are actually quite similar to the ones in planstats.sgml. It seems the former focus more on what stats the planner stores, while the latter focus on how the planner uses those stats. In func.sgml, the docs for pg_mcv_list_items need extending to include the base frequency column. Similarly for the example query in planstats.sgml. Tab-completion for the CREATE STATISTICS statement should be extended for the new kinds. Looking at mcv_update_match_bitmap(), it's called 3 times (twice recursively from within itself), and I think the pattern for calling it is a bit messy. E.g., /* by default none of the MCV items matches the clauses */ bool_matches = palloc0(sizeof(char) * mcvlist->nitems); if (or_clause(clause)) { /* OR clauses assume nothing matches, initially */ memset(bool_matches, STATS_MATCH_NONE, sizeof(char) * mcvlist->nitems); } else { /* AND clauses assume everything matches, initially */ memset(bool_matches, STATS_MATCH_FULL, sizeof(char) * mcvlist->nitems); } /* build the match bitmap for the OR-clauses */ mcv_update_match_bitmap(root, bool_clauses, keys, mcvlist, bool_matches, or_clause(clause)); the comment for the AND case directly contradicts the initial comment, and the final comment is wrong because it could be an AND clause. For a NOT clause it does: /* by default none of the MCV items matches the clauses */ not_matches = palloc0(sizeof(char) * mcvlist->nitems); /* NOT clauses assume nothing matches, initially */ memset(not_matches, STATS_MATCH_FULL, sizeof(char) * mcvlist->nitems); /* build the match bitmap for the NOT-clause */ mcv_update_match_bitmap(root, not_args, keys, mcvlist, not_matches, false); so the second comment is wrong. I understand the evolution that led to this function existing in this form, but I think that it can now be refactored into a "getter" function rather than an "update" function. I.e., something like mcv_get_match_bitmap() which first allocates the array to be returned and initialises it based on the passed-in value of is_or.
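For illustration, the getter might look roughly like this (the signature, type names and the elided matching logic are assumptions based on the code quoted above, not a finished implementation):

/*
 * Allocate and initialise the match bitmap here, based on is_or,
 * instead of at every call site, then apply the clauses to it.
 */
static char *
mcv_get_match_bitmap(PlannerInfo *root, List *clauses, Bitmapset *keys,
					 MCVList *mcvlist, bool is_or)
{
	char	   *matches = palloc0(sizeof(char) * mcvlist->nitems);

	/* OR starts from "nothing matches", AND from "everything matches" */
	memset(matches, is_or ? STATS_MATCH_NONE : STATS_MATCH_FULL,
		   sizeof(char) * mcvlist->nitems);

	/* ... walk 'clauses' and update 'matches', as the current code does ... */

	return matches;
}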
That way, all the calling sites can be simplified to one-liners like /* get the match bitmap for the AND/OR clause */ bool_matches = mcv_get_match_bitmap(root, bool_clauses, keys, mcvlist, or_clause(clause)); In the previous discussion around UpdateStatisticsForTypeChange(), the consensus appeared to be that we should just unconditionally drop all extended statistics when ALTER TABLE changes the type of an included column (just as we do for per-column stats), since such a type change can rewrite the data in arbitrary ways, so there's no reason to assume that the old stats are still valid. I think it makes sense to extract that as a separate patch to be committed ahead of these ones, and I'd also argue for back-patching it. That's it for now. I'll try to keep reviewing if time permits. Regards, Dean
Re: speeding up planning with partitions
On 2019-Jan-10, Amit Langote wrote: > Here's v12, which is more or less same as v11 but with the order of > patches switched so that the code movement patch is first. Note that the > attached v12-0001 contains no functional changes (but there's tiny note in > the commit message mentioning the addition of a tiny function which is > just old code). Pushed 0001 with some minor tweaks, thanks. -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] pgbench - allow to store select results into variables
Alvaro Herrera writes: > Now let's see how the buildfarm likes this ... This \cset thing seems like an incredibly badly thought out kluge. What is its excuse to live? regards, tom lane
Re: [HACKERS] pgbench - allow to store select results into variables
On 2019-Jan-10, Tom Lane wrote: > Alvaro Herrera writes: > > Now let's see how the buildfarm likes this ... > > This \cset thing seem like an incredibly badly thought out kluge. > What is its excuse to live? The reason is that you can set variables from several queries in one network trip. We can take it out I guess, but my impression was that we already pretty much had a consensus that it was wanted. -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] pgbench - allow to store select results into variables
Alvaro Herrera writes: > On 2019-Jan-10, Tom Lane wrote: >> This \cset thing seem like an incredibly badly thought out kluge. >> What is its excuse to live? > The reason is that you can set variables from several queries in one > network trip. So who needs that? Just merge the queries, if it's so important that you avoid multiple round trips. > We can take it out I guess, but my impression was that we already pretty > much had a consensus that it was wanted. Maybe if the implementation weren't a pile of junk it'd be all right, but as-is this is a mess. The dependency on counting \; in particular is setting me off, because that has little if anything to do with the number of query results to be expected. I imagine the argument will be that nobody would write the sort of queries that break that assumption in a pgbench script; but I don't find that kind of design to be up to project standards, especially not when the argument for the feature is tissue-thin in the first place. regards, tom lane
Re: [HACKERS] pgbench - allow to store select results into variables
On 2019-Jan-10, Tom Lane wrote: > Alvaro Herrera writes: > > On 2019-Jan-10, Tom Lane wrote: > >> This \cset thing seem like an incredibly badly thought out kluge. > >> What is its excuse to live? > > > The reason is that you can set variables from several queries in one > > network trip. > > So who needs that? Just merge the queries, if it's so important that > you avoid multiple round trips. Hmm, I suppose that's true. > > We can take it out I guess, but my impression was that we already pretty > > much had a consensus that it was wanted. > > Maybe if the implementation weren't a pile of junk it'd be all right, > but as-is this is a mess. The dependency on counting \; in particular > is setting me off, because that has little if anything to do with the > number of query results to be expected. I imagine the argument will > be that nobody would write the sort of queries that break that assumption > in a pgbench script; but I don't find that kind of design to be up > to project standards, especially not when the argument for the feature > is tissue-thin in the first place. There's a lot of the new code in pgbench that can be simplified if we remove \cset. I'll leave time for others to argue for or against cset, and then act accordingly. -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Commitfest delayed: extend it?
Folks, We're 10 days into the Commitfest, the first few having been the new year, with people maybe paying attention to other things. I'd like to propose extending this CF by some period, maybe as long as ten days, so people get all the opportunities they might have had if it had started on time. Best, David. -- David Fetter http://fetter.org/ Phone: +1 415 235 3778 Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
Re: Commitfest delayed: extend it?
David Fetter writes: > We're 10 days into the Commitfest, the first few having been the new > year, with people maybe paying attention to other things. > I'd like to propose extending this CF by some period, maybe as long > as ten days, so people get all the opportunities they might have had > if it had started on time. I think it *did* start on time, at least people have been acting like it was on. It just wasn't very official. regards, tom lane
Re: [Proposal] Add accumulated statistics
On Thu, Dec 20, 2018 at 8:48 PM Yotsunaga, Naoki wrote: > If so, is not that the number of wait events is useful information? My theory is that the number of wait events is NOT useful information, or at least not nearly as useful as the results of a sampling approach. The data that LWLOCK_STATS produce are downright misleading -- they lead you to think that the bottlenecks are in different places than they really are, because the locks that produce the most waiting can be 5th or 10th in terms of the number of wait events. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
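As a toy illustration of that point (numbers entirely invented, not measurements), a lock that waits very often but very briefly can dominate the event counts while contributing far less total waiting than a lock that shows up only rarely in the counts, which is what a sampling-based profile would surface instead:

#include <stdio.h>

int
main(void)
{
	struct
	{
		const char *lock;
		long		waits;
		double		avg_wait_usec;
	}			locks[] = {
		{"lock A (frequent, short waits)", 1000000, 2.0},
		{"lock B (rare, long waits)", 5000, 5000.0},
	};

	for (int i = 0; i < 2; i++)
		printf("%-32s events: %8ld  total wait: %8.0f ms\n",
			   locks[i].lock, locks[i].waits,
			   locks[i].waits * locks[i].avg_wait_usec / 1000.0);

	return 0;
}

Ranked by event count, lock A looks like the problem; ranked by total wait time (which is roughly what sampling measures), lock B is an order of magnitude worse.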
Re: Commitfest delayed: extend it?
On 2019-Jan-10, Tom Lane wrote: > David Fetter writes: > > We're 10 days into the Commitfest, the first few having been the new > > year, with people maybe paying attention to other things. > > I'd like to propose extending this CF by some period, maybe as long > > as ten days, so people get all the opportunities they might have had > > if it had started on time. > > I think it *did* start on time, at least people have been acting like > it was on. It just wasn't very official. It has definitely started, at least for me :-) We're going to have a bit of a triage session in the FOSDEM dev's meeting, on Jan 31st, right at the end. I think that will be a good opportunity to give some final cleanup, and we should close it then or shortly thereafter. -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Improve selectivity estimate for range queries
On Fri, Dec 21, 2018 at 11:50 AM Tom Lane wrote: > A smaller-footprint way to fix the immediate problem might be to > change the values of DEFAULT_INEQ_SEL and friends so that they're > even less likely to be matched by accident. That is, instead of > 0., use 0.342 or some such. It might > seem that that's just moving the problem around, but I think it > might be possible to show that such a value couldn't be computed > by scalarltsel given a histogram with no more than 1 members. > (I haven't tried to actually prove that, but it seems intuitive > that the set of possible results would be quantized with no more > than about 5 digits precision. That's not a dumb idea, but it seems pretty unprincipled to me, and to be honest I'm kind of surprised that you're not proposing something cleaner. If you admit the need to distinguish between real estimates and last-ditch ones, why shouldn't we have an explicit representation of that rather than trying to pick a last-ditch value that is unlikely to be confused with a real one? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: add_partial_path() may remove dominated path but still in use
On Wed, Jan 9, 2019 at 12:44 AM Kohei KaiGai wrote: > So, is it sufficient if set_rel_pathlist_hook is just relocated in > front of the generate_gather_paths? > If we have no use case for the second hook, here is little necessity > to have the post_rel_pathlist_hook() here. > (At least, PG-Strom will use the first hook only.) +1. That seems like the best way to be consistent with the principle that we need to have all the partial paths before generating any Gather paths. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: NOTIFY and pg_notify performance when deduplicating notifications
Julien Demoor writes: > [ postgres-notify-all-v8.patch ] I took a quick look through this. A few comments: * I find the proposed syntax extension for NOTIFY to be fairly bizarre; it's unlike the way that we handle options for any other utility statement. It's also non-orthogonal, since you can't specify a collapse mode without giving a payload string. I think possibly a better answer is the way that we've been adding optional parameters to VACUUM and suchlike recently: NOTIFY [ (collapse = off/on) ] channel [ , payload ] This'd be more extensible if we ever find a need for other options, too. * I'm also unimpressed with the idea of having a "maybe" collapse mode. What's that supposed to do? It doesn't appear to be different from "always", so why not just reduce this to a boolean collapse-or-not flag? * The documentation doesn't agree with the code, since it fails to mention "always" mode. * I was kind of disappointed to find that the patch doesn't really do anything to fix the performance problem for duplicate notifies. The thread title had led me to hope for more ;-). I wonder if we couldn't do something involving hashing. OTOH, it's certainly arguable that that would be an independent patch, and that this one should be seen as a feature patch ("make NOTIFY's behavior for duplicate notifies more flexible and more clearly specified") rather than a performance patch. regards, tom lane
Re: [HACKERS] pgbench - allow to store select results into variables
Hello Tom, So who needs that? Just merge the queries, if it's so important that you avoid multiple round trips. Pgbench is about testing (measuring) performance in various settings and realistic scenarii: queries prepared or not, possibly combined, and so on. As postgres allows sending several queries in one message, I think it interesting to be able to test the performance impact of doing that (the answer is very significant, esp wrt latency), and it should not preclude getting the results out, as a client application could do. Queries can be "merged", but ISTM the syntax is not especially friendly (UNION, SELECT of SELECT, CROSS JOIN; not sure which one you had in mind...) nor representative of what a client application would really do, and it would mess with readability, message size, planning and so on. Also, compound queries need to all return one row, but this constraint is needed for variables. We can take it out I guess, but my impression was that we already pretty much had a consensus that it was wanted. Maybe if the implementation weren't a pile of junk it'd be all right, Thanks:-) The dependency on counting \; in particular is setting me off, because that has little if anything to do with the number of query results to be expected. The kludge is because there is a kludge (aka optimization :-) server side to silently ignore empty queries. On "SELECT 1 \; /**/ \; SELECT 2 ;" the server sends two results back instead of 3, whereas it should logically return an empty result on the empty query. Now pgbench could detect that there is an empty query (possibly skipping comments and so on); an early version of the patch did that AFAICR, but the code did not seem worth it; it seemed cleaner to just document the restriction, so it was removed. I imagine the argument will be that nobody would write the sort of queries that break that assumption in a pgbench script; Detecting empty queries is possible, although the code for doing that is kind of ugly and would look bad in the lexer, where it seemed desirable to make minimal changes. but I don't find that kind of design to be up to project standards, especially not when the argument for the feature is tissue-thin in the first place. The "first place" is to be able to implement more realistic scenarii, and to make getting results into variables orthogonal to combined queries. Although I'm not especially thrilled by the resulting syntax, the point is to provide a feature pertinent to performance testing, not to have an incredibly well designed syntax. It just follows the existing backslash approach used by psql & pgbench, which has the significant advantage that it is mostly SQL with a few things around. -- Fabien.
Re: MERGE SQL statement for PG12
On Thu, Jan 3, 2019 at 2:11 AM Pavan Deolasee wrote: > On Tue, Nov 27, 2018 at 4:48 PM Pavan Deolasee > wrote: >> In the meanwhile, I am posting a rebased version. > > Another rebase on the current master. I feel like there has been some other thread where this was discussed, but I can't find it right now. I think that the "query construction" logic in transformMergeStmt is fundamentally the wrong way to do this. I think several others have said the same. And I don't think this should be considered for commit until that is rewritten. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] pgbench - allow to store select results into variables
Hello Alvaro,

> There's a lot of the new code in pgbench that can be simplified if we
> remove \cset.

I'm not very happy with the resulting syntax, but IMO the feature is useful. My initial design was to copy PL/pgSQL "into" with some "\into" orthogonal to \; and ;, but the implementation was not especially nice and I was told to use psql's \gset approach, which I did.

If we do not provide \cset, then combined queries and getting results are not orthogonal, although from a performance testing point of view an application could do both, and the point is to allow pgbench to test the performance impact of doing that.

There are other existing restrictions which are arbitrary, eg you cannot use prepared statements with combined queries. I wish not to add more restrictions of this kind, which are not "up to project standard" in my opinion. I may try to remove this particular restriction in the future.

Not many people know that queries can be combined, but if you are interested in latency that is really an interesting option, and being able to check how much can be gained from doing that is a point of a tool like pgbench.

-- Fabien.
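P.S. For reference, the kind of script the patch is meant to enable looks roughly like this (exact syntax per the patch under discussion, so subject to change):

    \set aid random(1, 100000 * :scale)
    SELECT abalance FROM pgbench_accounts WHERE aid = :aid \gset
    UPDATE pgbench_accounts SET abalance = :abalance + 1 WHERE aid = :aid;

i.e. the first SELECT's result is stored into the :abalance variable and reused by the following statement, and \cset would do the same for the non-final queries of a combined \;-separated command.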
Re: speeding up planning with partitions
On Fri, 11 Jan 2019 at 06:56, Alvaro Herrera wrote:
> Pushed 0001 with some minor tweaks, thanks.

Thanks for pushing. I had looked at 0001 last night and there wasn't much to report, just:

v12 0001

1. I see you've moved translate_col_privs() out of prepunion.c into appendinfo.c. This required making it an external function, but its only use is in inherit.c, so would it not be better to put it there and keep it static?

2. The following two lines I think need to swap their order.

+#include "utils/rel.h"
+#include "utils/lsyscache.h"

Both are pretty minor details but thought I'd post them anyway.

-- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: Improve selectivity estimate for range queries
Robert Haas writes:
> On Fri, Dec 21, 2018 at 11:50 AM Tom Lane wrote:
>> A smaller-footprint way to fix the immediate problem might be to
>> change the values of DEFAULT_INEQ_SEL and friends so that they're
>> even less likely to be matched by accident. That is, instead of
>> 0.3333333333333333, use 0.342 or some such.

> That's not a dumb idea, but it seems pretty unprincipled to me, and to
> be honest I'm kind of surprised that you're not proposing something
> cleaner.

The problem is the invasiveness of such a change (large) vs the benefit (not so large). The upthread patch attempted to add a separate signaling path, but it was very incomplete --- and yet both I and Horiguchi-san thought it was already too messy.

Maybe at some point we'll go over to something reasonably principled, like adding confidence intervals to all selectivity estimates. That would be *really* invasive but perhaps would bring enough benefit to justify itself. But the current patch is just attempting to fix one extremely narrow pain point, and if that is all it's doing it should have a commensurately small footprint. So I don't think the submitted patch looks good from a cost/benefit standpoint.

regards, tom lane
Re: WIP: Avoid creation of the free space map for small tables
On Wed, Jan 9, 2019 at 10:50 PM Amit Kapila wrote: > Thanks, Mithun for performance testing, it really helps us to choose > the right strategy here. Once John provides next version, it would be > good to see the results of regular pgbench (read-write) runs (say at > 50 and 300 scale factor) and the results of large copy. I don't think > there will be any problem, but we should just double check that. Attached is v12 using the alternating-page strategy. I've updated the comments and README as needed. In addition, I've -handled a possible stat() call failure during pg_upgrade -added one more assertion -moved the new README material into a separate paragraph -added a comment to FSMClearLocalMap() about transaction abort -corrected an outdated comment that erroneously referred to extension rather than creation -fleshed out the draft commit messages From 854ba27740ed26e0d3e89f6228186f2b0914b21e Mon Sep 17 00:00:00 2001 From: John Naylor Date: Thu, 10 Jan 2019 16:37:57 -0500 Subject: [PATCH v12 1/2] Avoid creation of the free space map for small heap relations. Previously, all heaps had FSMs. For very small tables, this means that the FSM took up more space than the heap did. This is wasteful, so now we refrain from creating the FSM for heaps with 4 pages or fewer. If the last known target block has insufficient space, we still try to insert into some other page before before giving up and extending the relation, since doing otherwise leads to table bloat. Testing showed that trying every page penalized performance slightly, so we compromise and try every other page. This way, we visit at most two pages. Any pages with wasted free space become visible at next relation extension, so we still control table bloat. As a bonus, directly attempting one or two pages can even be faster than consulting the FSM would have been. If a heap with a FSM is truncated back to below the threshold, the FSM stays around and can be used as usual. --- contrib/pageinspect/expected/page.out | 77 +++ contrib/pageinspect/sql/page.sql | 33 +-- doc/src/sgml/storage.sgml | 13 +- src/backend/access/brin/brin.c| 2 +- src/backend/access/brin/brin_pageops.c| 10 +- src/backend/access/heap/hio.c | 47 ++-- src/backend/access/transam/xact.c | 14 ++ src/backend/commands/vacuumlazy.c | 17 +- src/backend/storage/freespace/README | 13 +- src/backend/storage/freespace/freespace.c | 262 +- src/backend/storage/freespace/indexfsm.c | 6 +- src/include/storage/freespace.h | 9 +- src/test/regress/expected/fsm.out | 62 + src/test/regress/parallel_schedule| 6 + src/test/regress/serial_schedule | 1 + src/test/regress/sql/fsm.sql | 46 16 files changed, 516 insertions(+), 102 deletions(-) create mode 100644 src/test/regress/expected/fsm.out create mode 100644 src/test/regress/sql/fsm.sql diff --git a/contrib/pageinspect/expected/page.out b/contrib/pageinspect/expected/page.out index 3fcd9fbe6d..83e5910453 100644 --- a/contrib/pageinspect/expected/page.out +++ b/contrib/pageinspect/expected/page.out @@ -1,48 +1,69 @@ CREATE EXTENSION pageinspect; -CREATE TABLE test1 (a int, b int); -INSERT INTO test1 VALUES (16777217, 131584); -VACUUM test1; -- set up FSM +CREATE TABLE test_rel_forks (a int); +-- Make sure there are enough blocks in the heap for the FSM to be created. +INSERT INTO test_rel_forks SELECT g from generate_series(1,1) g; +-- set up FSM and VM +VACUUM test_rel_forks; -- The page contents can vary, so just test that it can be read -- successfully, but don't keep the output. 
-SELECT octet_length(get_raw_page('test1', 'main', 0)) AS main_0; +SELECT octet_length(get_raw_page('test_rel_forks', 'main', 0)) AS main_0; main_0 8192 (1 row) -SELECT octet_length(get_raw_page('test1', 'main', 1)) AS main_1; -ERROR: block number 1 is out of range for relation "test1" -SELECT octet_length(get_raw_page('test1', 'fsm', 0)) AS fsm_0; +SELECT octet_length(get_raw_page('test_rel_forks', 'main', 100)) AS main_100; +ERROR: block number 100 is out of range for relation "test_rel_forks" +SELECT octet_length(get_raw_page('test_rel_forks', 'fsm', 0)) AS fsm_0; fsm_0 --- 8192 (1 row) -SELECT octet_length(get_raw_page('test1', 'fsm', 1)) AS fsm_1; - fsm_1 - 8192 -(1 row) - -SELECT octet_length(get_raw_page('test1', 'vm', 0)) AS vm_0; +SELECT octet_length(get_raw_page('test_rel_forks', 'fsm', 10)) AS fsm_10; +ERROR: block number 10 is out of range for relation "test_rel_forks" +SELECT octet_length(get_raw_page('test_rel_forks', 'vm', 0)) AS vm_0; vm_0 -- 8192 (1 row) -SELECT octet_length(get_raw_page('test1', 'vm', 1)) AS vm_1; -ERROR: block number 1 is out of range for relation "test1" +SELECT octet_length(get_raw_page('test_rel_forks', 'vm', 1)) AS vm_1; +ERROR: block number 1 is out of range for relation
Re: unnecessary creation of FSM files during bootstrap mode
Thought I'd ping... (sorry for the top post) On Sat, Dec 15, 2018 at 12:02 AM Amit Kapila wrote: > > As pointed out by John Naylor [1], it seems during bootstrap mode, we > are always creating FSM files which are not required. In commit's > b9d01fe288 and 3908473c80, we have added some code where we allowed > the creation of files during mdopen even if they didn't exist during > the bootstrap mode. The comments in the code say: "During bootstrap, > there are cases where a system relation will be accessed (by internal > backend processes) before the bootstrap script nominally creates it." > I am sure this will be the case when that code is added but is it > required today? While going through commit 3908473c80, I came across > below comment: > > - * During bootstrap processing, we skip that check, because pg_time, > - * pg_variable, and pg_log get created before their .bki file entries > - * are processed. > - */ > + fd = FileNameOpenFile(path, O_RDWR | O_CREAT | O_EXCL | PG_BINARY, 0600); > > The system tables mentioned in above commit are not present today, so > do we really need that code and even if it is required shall we do it > only for 'main' or 'init' forks? > > Tom, as you are a committer of the commits b9d01fe288 and 3908473c80, > do you remember anything in this regard? > > > [1] - > https://www.postgresql.org/message-id/CAJVSVGVtf%2B-2sQGVyEZJQh15dpVicpFA6BBiPLugaD4oBEaiHg%40mail.gmail.com > > -- > With Regards, > Amit Kapila. > EnterpriseDB: http://www.enterprisedb.com >
Re: [PATCH] kNN for btree
Hi!

I have a couple more notes regarding this patch.

1) There are two loops over scan keys determining the scan strategy: the existing loop in _bt_first(), and the one in the new function _bt_select_knn_search_strategy(). It's kind of redundant that we have to process scan keys twice for knn searches. I think scan key processing should be merged into one loop.

2) We currently support knn ordering only by the first index key. Supporting multiple knn keys would require a significant rework of the B-tree traversal algorithm, making it closer to GiST and SP-GiST. So, I think supporting only one knn key is a reasonable restriction, at least for now. But we could support single-key knn ordering in more cases. For instance, the knn search in "SELECT * FROM tbl WHERE a = const1 ORDER BY b <-> const2" could benefit from an (a, b) B-tree index. So, we can support knn search on the n-th key if there are equality scan keys for the [1, n-1] index keys.

I've already discussed these issues with Nikita personally. Hopefully, a new version will be published soon.

-- Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
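P.S. To make point 2 concrete with a made-up example (index definition and constants invented purely for illustration): given an index

    CREATE INDEX ON tbl (a, b);

the extension described above would let a query like

    SELECT * FROM tbl WHERE a = 1 ORDER BY b <-> 100 LIMIT 10;

use that composite index for the knn ordering, because the equality condition on the first key confines the scan to a range within which the second key is already ordered.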
Re: MERGE SQL statement for PG12
On Thu, Jan 10, 2019 at 1:15 PM Robert Haas wrote: > I feel like there has been some other thread where this was discussed, > but I can't find it right now. I think that the "query construction" > logic in transformMergeStmt is fundamentally the wrong way to do this. > I think several others have said the same. And I don't think this > should be considered for commit until that is rewritten. I agree with that. I think that it's worth acknowledging that Pavan is in a difficult position with this patch. As things stand, Pavan really needs input from a couple of people to put the query construction stuff on the right path, and that input has not been forthcoming. I'm not suggesting that anybody has failed to meet an obligation to Pavan to put time in here, or that anybody has suggested that this is down to a failure on Pavan's part. I'm merely pointing out that Pavan is in an unenviable position with this patch, through no fault of his own, and despite a worthy effort. I hope that he sticks it out, because that seems to be what it takes to see something like this through. I don't know how to do better at that. -- Peter Geoghegan
Re: unnecessary creation of FSM files during bootstrap mode
John Naylor writes:
> Thought I'd ping...

Sorry, I'd not been paying attention to this thread.

> On Sat, Dec 15, 2018 at 12:02 AM Amit Kapila wrote:
>> As pointed out by John Naylor [1], it seems during bootstrap mode, we
>> are always creating FSM files which are not required. In commit's
>> b9d01fe288 and 3908473c80, we have added some code where we allowed
>> the creation of files during mdopen even if they didn't exist during
>> the bootstrap mode. The comments in the code say: "During bootstrap,
>> there are cases where a system relation will be accessed (by internal
>> backend processes) before the bootstrap script nominally creates it."
>> I am sure this will be the case when that code is added but is it
>> required today?

I'm surprised to hear that it isn't, but if we get through initdb then it must not be. The special case was certainly still necessary as of 3908473c80, for the BKI_BOOTSTRAP catalogs. It didn't bother me at the time that those were special --- how are you going to create pg_class, in particular, without cheating like mad? But I guess we must have cleaned up something in higher levels of bootstrapping to the point where those do get mdcreate'd in advance of being opened.

It's also possible that you just aren't exercising the cases where trouble occurs. In particular, noting this bit in InsertOneValue():

    /*
     * We use ereport not elog here so that parameters aren't evaluated unless
     * the message is going to be printed, which generally it isn't
     */
    ereport(DEBUG4,
            (errmsg_internal("inserted -> %s",
                             OidOutputFunctionCall(typoutput, values[i]))));

I'd counsel running initdb under DEBUG4 or higher before deciding you're out of the woods. (If that does turn out to be a problem, and it's the only problem, we could discuss whether we really need that debug message at all.)

regards, tom lane
Remove all "INTERFACE ROUTINES" style comments
Hi,

A number of postgres files have sections like heapam's

 * INTERFACE ROUTINES
 *      relation_open   - open any relation by relation OID
 *      relation_openrv - open any relation specified by a RangeVar
 *      relation_close  - close any relation
 *      heap_open       - open a heap relation by relation OID
 *      heap_openrv     - open a heap relation specified by a RangeVar
 *      heap_close      - (now just a macro for relation_close)
 *      heap_beginscan  - begin relation scan
 *      heap_rescan     - restart a relation scan
 *      heap_endscan    - end relation scan
 *      heap_getnext    - retrieve next tuple in scan
 *      heap_fetch      - retrieve tuple with given tid
 *      heap_insert     - insert tuple into a relation
 *      heap_multi_insert - insert multiple tuples into a relation
 *      heap_delete     - delete a tuple from a relation
 *      heap_update     - replace a tuple in a relation with another tuple
 *      heap_sync       - sync heap, for when no WAL has been written

They're often out-of-date, and I personally never found them to be useful. A few people, including yours truly, have been removing a few here and there when overhauling a subsystem to avoid having to update and then adjust them.

I think it might be a good idea to just do that for all of them at once. Having to consider separately committing a removal, updating them without fixing preexisting issues, or just leaving them outdated on a regular basis imo is a useless distraction.

Comments?

Greetings,

Andres Freund
Re: What to name the current heap after pluggable storage / what to rename?
Hi,

On 2018-12-19 14:21:29 -0500, Robert Haas wrote:
> On Tue, Dec 18, 2018 at 11:17 PM Andres Freund wrote:
> > Another would be to be aggressive in renaming, and deconflict by
> > renaming functions like heap_create[_with_catalog] etc to sound more
> > accurate. I think that has some appeal, because a lot of those names
> > aren't describing their tasks particularly well.
>
> I like that option.

I'd like to start doing that by moving the functions in the following heapam.h block elsewhere:

/* in heap/heapam.c */
extern Relation relation_open(Oid relationId, LOCKMODE lockmode);
extern Relation try_relation_open(Oid relationId, LOCKMODE lockmode);
extern Relation relation_openrv(const RangeVar *relation, LOCKMODE lockmode);
extern Relation relation_openrv_extended(const RangeVar *relation,
                                         LOCKMODE lockmode, bool missing_ok);
extern void relation_close(Relation relation, LOCKMODE lockmode);

extern Relation heap_open(Oid relationId, LOCKMODE lockmode);
extern Relation heap_openrv(const RangeVar *relation, LOCKMODE lockmode);
extern Relation heap_openrv_extended(const RangeVar *relation,
                                     LOCKMODE lockmode, bool missing_ok);

ISTM that the first block would best belong into new files like access/rel[ation].h and access/common/rel[ation].h. I think the second set should be renamed to be table_open() (with backward compat redirects, there's way way too many references) and should go into access/table.h and access/table/table.c alongside tableam.[ch], but I could also see just putting them into relation.[ch].

Comments?

Greetings,

Andres Freund
Re: Remove all "INTERFACE ROUTINES" style comments
On Fri, Jan 11, 2019 at 12:58 PM Andres Freund wrote: > A number of postgres files have sections like heapam's > > * INTERFACE ROUTINES > * relation_open - open any relation by relation OID > * relation_openrv - open any relation specified by a RangeVar > * relation_close - close any relation > * heap_open - open a heap relation by relation OID > * heap_openrv - open a heap relation specified by a RangeVar > * heap_close - (now just a macro for relation_close) > * heap_beginscan - begin relation scan > * heap_rescan - restart a relation scan > * heap_endscan- end relation scan > * heap_getnext- retrieve next tuple in scan > * heap_fetch - retrieve tuple with given tid > * heap_insert - insert tuple into a relation > * heap_multi_insert - insert multiple tuples into a relation > * heap_delete - delete a tuple from a relation > * heap_update - replace a tuple in a relation with another tuple > * heap_sync - sync heap, for when no WAL has been written > > They're often out-of-date, and I personally never found them to be > useful. A few people, including yours truly, have been removing a few > here and there when overhauling a subsystem to avoid having to update > and then adjust them. > > I think it might be a good idea to just do that for all at once. Having > to consider separately committing a removal, updating them without > fixing preexisting issues, or just leaving them outdated on a regular > basis imo is a usless distraction. > > Comments? +1 -- Thomas Munro http://www.enterprisedb.com
RE: [Proposal] Add accumulated statistics
On Thu, Jan 10, 2019 at 8:42 PM, Robert Haas wrote:

Thanks for the comments.

> or at least not nearly as useful the results of a sampling approach.

I agree with your opinion, because it can't be asserted that a wait event is a bottleneck just because the number of wait events is large. The same thing is mentioned in the Oracle documentation; it also suggests that it is important to acquire the waiting time for that.
https://docs.oracle.com/en/database/oracle/oracle-database/18/tgdba/measuring-database-performance.html#GUID-811E9E65-C64A-4028-A90E-102BBFF6E68F
5.2.3 Using Wait Events without Timed Statistics

> The data that LWLOCK_STATS produce are downright misleading

Is that so? I do not know the need for this function.

---
Naoki Yotsunaga
Re: Making WAL receiver startup rely on GUC context for primary_conninfo and primary_slot_name
On Thu, Jan 10, 2019 at 02:20:27PM +0300, Sergei Kornilov wrote:
> Thank you, patch looks good for me.

Thanks Sergei for the review. I have been looking at the patch again this morning with a fresh set of eyes and the thing looks to be in committable shape (the GUC values for the NULL checks are a bit inconsistent in the last patch by the way, so I have fixed them locally).

I'll do an extra pass on it in the next couple of days and commit. Then let's move on with the reloadable portions.

-- Michael
Re: Making WAL receiver startup rely on GUC context for primary_conninfo and primary_slot_name
Hi, On 2019-01-11 09:34:28 +0900, Michael Paquier wrote: > On Thu, Jan 10, 2019 at 02:20:27PM +0300, Sergei Kornilov wrote: > > Thank you, patch looks good for me. > > Thanks Sergei for the review. I have been looking at the patch again > this morning with a fresh set of eyes and the thing looks in a > committable shape (the GUC values for NULL checks are a bit > inconsistent in the last patch by the way, so I have fixed them > locally). > > I'll do an extra pass on it in the next couple of days and commit. > Then let's move on with the reloadable portions. I still think this whole direction of accessing the GUC in walreceiver is a bad idea and shouldn't be pursued further. There's definite potential for startup process and WAL receiver having different states of GUCs, the code doesn't get meaningfully simpler, the GUC value checks in walreceiver make for horrible reporting up the chain. Greetings, Andres Freund
Re: Making WAL receiver startup rely on GUC context for primary_conninfo and primary_slot_name
On Thu, Jan 10, 2019 at 04:41:47PM -0800, Andres Freund wrote:
> I still think this whole direction of accessing the GUC in walreceiver
> is a bad idea and shouldn't be pursued further. There's definite
> potential for startup process and WAL receiver having different states
> of GUCs, the code doesn't get meaningfully simpler, the GUC value checks
> in walreceiver make for horrible reporting up the chain.

Did you notice the set of messages from upthread? The code *gets* simpler by removing ready_to_display and the need to manipulate the non-clobbered connection string sent directly from the startup process. In my opinion that's a clear gain. We also gain the possibility for monitoring tools to detect that a WAL receiver has started but is not yet connected.

-- Michael
Re: Using Btree to Provide Sorting on Suffix Keys with LIMIT
On Sat, Dec 29, 2018 at 6:50 PM David Rowley wrote:
> > Areas of extension: (given index `(a, b, c)`) include `a = 1 and b in
> > (...) order by c` and `a in (...) and b = 1 order by c` (and further
> > similar derivations with increasing numbers of equality quals).
>
> I don't quite understand this the above paragraph, but I assume this
> would be possible to do with some new index am routine which allowed
> multiple different values for the initial key.

I'm confused about James' use of the term "new index am". I guess he just meant new support function or something?

> > Note: Another (loosely) related problem is that sorting can't
> > currently take advantage of cases where an index provides a partial
> > (prefix of requested pathkeys) ordering.
>
> There has been a patch [2] around for about 4 years now that does
> this. I'm unsure of the current status, other than not yet committed.
>
> [1] https://www.postgresql.org/message-id/flat/7f70bd5a-5d16-e05c-f0b4-2fdfc8873489%40BlueTreble.com
> [2] https://www.postgresql.org/message-id/flat/CAPpHfds1waRZ=NOmueYq0sx1ZSCnt+5QJvizT8ndT2=etze...@mail.gmail.com

I can see why you'd mention these two, but I also expected you to mention the skip scan project, since that involves pushing down knowledge about how the index is to be accessed to the index am (at least, I assume that it does), and skipping leading attributes to use the sort order from a suffix attribute. Actually, the partial sort idea that you linked to is more or less a dual of skip scan, at least to my mind (the former *extends* the sort order by adding a suffix tie-breaker, while the latter *skips* a leading attribute to get to an interesting suffix attribute).

The way James constructed his example suggested that there'd be some kind of natural locality, that we'd expect to be able to take advantage of at execution time when the new strategy is favorable. I'm not sure if that was intended -- James?

I think it might help James to construct a more obviously realistic/practical motivating example. I'm perfectly willing to believe that this idea would help his real world queries, and having an example that can easily be played with is helpful in other ways. But I'd like to know why this idea is important in detail, since I think that it would help me to place it in the wider landscape of ideas that are like this. Placing it in that wider landscape, and figuring out next steps at a high level seem to be the problem right now.

-- Peter Geoghegan
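P.S. For anyone else following along, the kind of query James is talking about is, given an index on (a, b, c) and using invented constants purely for illustration:

    SELECT * FROM tbl WHERE a = 1 AND b IN (1, 2, 3) ORDER BY c LIMIT 10;

Today that generally needs a sort (or reads far more index entries than necessary), even though each (a, b) group is already ordered by c within the index; the proposal is to merge the per-group orderings instead.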
Re: Making WAL receiver startup rely on GUC context for primary_conninfo and primary_slot_name
On 2019-01-11 09:50:49 +0900, Michael Paquier wrote: > On Thu, Jan 10, 2019 at 04:41:47PM -0800, Andres Freund wrote: > > I still think this whole direction of accessing the GUC in walreceiver > > is a bad idea and shouldn't be pursued further. There's definite > > potential for startup process and WAL receiver having different states > > of GUCs, the code doesn't get meaningfully simpler, the GUC value checks > > in walreceiver make for horrible reporting up the chain. > > Did you notice the set of messages from upthread? The code *gets* > simpler by removing ready_to_display and the need to manipulate the > non-clobbered connection string sent directly from the startup > process. It's a minor simplification for code that's existed for quite a while. Not worth it.
Re: Commitfest delayed: extend it?
On Thu, Jan 10, 2019 at 05:44:34PM -0300, Alvaro Herrera wrote:
> It has definitely started, at least for me :-)

Yes, there is no point in extending or delaying it.

> We're going to have a bit of a triage session in the FOSDEM dev's
> meeting, on Jan 31st, right at the end. I think that will be a good
> opportunity to give some final cleanup, and we should close it then or
> shortly thereafter.

A lot of folks are going to be there then (though not me). If it is possible to get the CF closed more or less on time using this method, that would be nice.

-- Michael
Re: Making WAL receiver startup rely on GUC context for primary_conninfo and primary_slot_name
On Thu, Jan 10, 2019 at 04:52:53PM -0800, Andres Freund wrote:
> It's a minor simplification for code that's existed for quite a
> while. Not worth it.

Meh, I disagree for the ready_to_display part as it has been around for a long time:

commit: 9ed551e0a4fdb046d8e5ea779adfd3e184d5dc0d
author: Alvaro Herrera
date: Wed, 29 Jun 2016 16:57:17 -0400
Add conninfo to pg_stat_wal_receiver

I was honestly unhappy from the start with it because there was no actual way to have the WAL receiver *not* save the original connection string. In my opinion, getting rid of it is worth it because this really simplifies the way the WAL receiver data visibility is handled at SQL level and we stop using the same field again and again for different purposes, consolidating the code.

For the reloading part, we also make the WAL receiver rely on the GUC context and stop it if there is a change in primary_conninfo and primary_slot_name. It would be inconsistent to rely on the GUC context for one thing, and the startup process for the other one. Adding a safeguard to fail WAL receiver startup is actually more of a sanity check than anything else (that could be an assertion, but for forks a failure is of better benefit).

At this stage, I would like to hear more opinions before doing or not doing something. Alvaro, you got involved in the introduction of ready_to_display. Peter, you have committed the merge of the recovery params. Perhaps you have an opinion to share?

-- Michael
RE: [Proposal] Add accumulated statistics
From: Robert Haas [mailto:robertmh...@gmail.com]
> My theory is that the number of wait events is NOT useful information,
> or at least not nearly as useful the results of a sampling approach.
> The data that LWLOCK_STATS produce are downright misleading -- they
> lead you to think that the bottlenecks are in different places than
> they really are, because the locks that produce the most waiting can
> be 5th or 10th in terms of the number of wait events.

I understood you're saying that the number of waits alone does not necessarily indicate the bottleneck, because a wait with fewer counts but a longer time can take a large portion of the entire SQL execution time. So, wait time is also useful. I think that's why Oracle describes, and MySQL provides, precise counts and times without sampling.

Hasn't LWLOCK_STATS been helpful for PG developers? IIRC, it was used to pinpoint bottlenecks and evaluate the patches to improve shared buffers, WAL buffers, ProcArray, etc.

Regards
Takayuki Tsunakawa
Re: Making WAL receiver startup rely on GUC context for primary_conninfo and primary_slot_name
On Fri, Jan 11, 2019 at 10:09:04AM +0900, Michael Paquier wrote:
> On Thu, Jan 10, 2019 at 04:52:53PM -0800, Andres Freund wrote:
> > It's a minor simplification for code that's existed for quite a
> > while. Not worth it.
>
> Meh, I disagree for the ready_to_display part as it has been around
> for a long time:
> commit: 9ed551e0a4fdb046d8e5ea779adfd3e184d5dc0d
> author: Alvaro Herrera
> date: Wed, 29 Jun 2016 16:57:17 -0400
> Add conninfo to pg_stat_wal_receiver

This should read "ready_to_display has not been around for a long time" :)

-- Michael
Re: speeding up planning with partitions
On 2019/01/11 2:56, Alvaro Herrera wrote: > On 2019-Jan-10, Amit Langote wrote: > >> Here's v12, which is more or less same as v11 but with the order of >> patches switched so that the code movement patch is first. Note that the >> attached v12-0001 contains no functional changes (but there's tiny note in >> the commit message mentioning the addition of a tiny function which is >> just old code). > > Pushed 0001 with some minor tweaks, thanks. Thank you for the tweaks and committing. Regards, Amit
Re: [Sender Address Forgery]Re: error message when subscription target is a partitioned table
On 2019/01/10 19:27, Michael Paquier wrote: > On Thu, Jan 10, 2019 at 05:58:10PM +0900, Amit Langote wrote: >> The reason I started this thread is due to this Stack Overflow question: >> >> https://stackoverflow.com/questions/53554727/logical-replication-and-declarative-partitioning-in-postgresql-11 >> >> So, it appears that there may be an element of surprise involved in >> encountering such an error (despite the documentation). > > Improving the user experience is definitely a good thing in my > opinion because the current error message can be confusing, so you > were right to start this thread. Still I don't agree that classifying > those relkinds as not supported is right either for consistency with > the code existing for two years and for the way the code is designed > to work as rows are replicated on a per-physically-defined relation > basis. Okay, I withdraw my objection to the wording proposed by you. Thanks, Amit
Re: Commitfest delayed: extend it?
On Fri, Jan 11, 2019 at 09:53:16AM +0900, Michael Paquier wrote: > On Thu, Jan 10, 2019 at 05:44:34PM -0300, Alvaro Herrera wrote: > > It has definitely started, at least for me :-) > > Yes, there is no point in extending or delaying it. > > > We're going to have a bit of a triage session in the FOSDEM dev's > > meeting, on Jan 31st, right at the end. I think that will be a > > good opportunity to give some final cleanup, and we should close > > it then or shortly thereafter. > > A lot of folks are going to be there then (not me)? If it is > possible to get the CF closed more or less on time using this method > that would be nice. Consensus having been reached, I'll aim for 1/31 or 2/1 at the latest. Best, David. -- David Fetter http://fetter.org/ Phone: +1 415 235 3778 Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
Re: speeding up planning with partitions
Sorry, I hadn't read this email before sending my earlier "thank you for committing" email. On 2019/01/11 6:47, David Rowley wrote: > On Fri, 11 Jan 2019 at 06:56, Alvaro Herrera wrote: >> Pushed 0001 with some minor tweaks, thanks. > > Thanks for pushing. I had looked at 0001 last night and there wasn't > much to report, just: > > v12 0001 > > 1. I see you've moved translate_col_privs() out of prepunion.c into > appendinfo.c. This required making it an external function, but it's > only use is in inherit.c, so would it not be better to put it there > and keep it static? Actually, I *was* a bit puzzled where to put it. I tend to agree with you now that it might be define it locally within inherit.c as it might not be needed elsewhere. Let's hear what Alvaro thinks. I'm attaching a patch anyway. > 2. The following two lines I think need to swap their order. > > +#include "utils/rel.h" > +#include "utils/lsyscache.h" Oops, fixed. > Both are pretty minor details but thought I'd post them anyway. Thank you for reporting. Attached find the patch. Regards, Amit diff --git a/src/backend/optimizer/util/appendinfo.c b/src/backend/optimizer/util/appendinfo.c index d48e3a09b3..2f23e7bf49 100644 --- a/src/backend/optimizer/util/appendinfo.c +++ b/src/backend/optimizer/util/appendinfo.c @@ -15,13 +15,12 @@ #include "postgres.h" #include "access/htup_details.h" -#include "access/sysattr.h" #include "nodes/makefuncs.h" #include "nodes/nodeFuncs.h" #include "optimizer/appendinfo.h" #include "parser/parsetree.h" -#include "utils/rel.h" #include "utils/lsyscache.h" +#include "utils/rel.h" #include "utils/syscache.h" @@ -167,58 +166,6 @@ make_inh_translation_list(Relation oldrelation, Relation newrelation, } /* - * translate_col_privs - * Translate a bitmapset representing per-column privileges from the - * parent rel's attribute numbering to the child's. - * - * The only surprise here is that we don't translate a parent whole-row - * reference into a child whole-row reference. That would mean requiring - * permissions on all child columns, which is overly strict, since the - * query is really only going to reference the inherited columns. Instead - * we set the per-column bits for all inherited columns. - */ -Bitmapset * -translate_col_privs(const Bitmapset *parent_privs, - List *translated_vars) -{ - Bitmapset *child_privs = NULL; - boolwhole_row; - int attno; - ListCell *lc; - - /* System attributes have the same numbers in all tables */ - for (attno = FirstLowInvalidHeapAttributeNumber + 1; attno < 0; attno++) - { - if (bms_is_member(attno - FirstLowInvalidHeapAttributeNumber, - parent_privs)) - child_privs = bms_add_member(child_privs, - attno - FirstLowInvalidHeapAttributeNumber); - } - - /* Check if parent has whole-row reference */ - whole_row = bms_is_member(InvalidAttrNumber - FirstLowInvalidHeapAttributeNumber, - parent_privs); - - /* And now translate the regular user attributes, using the vars list */ - attno = InvalidAttrNumber; - foreach(lc, translated_vars) - { - Var*var = lfirst_node(Var, lc); - - attno++; - if (var == NULL)/* ignore dropped columns */ - continue; - if (whole_row || - bms_is_member(attno - FirstLowInvalidHeapAttributeNumber, - parent_privs)) - child_privs = bms_add_member(child_privs, - var->varattno - FirstLowInvalidHeapAttributeNumber); - } - - return child_privs; -} - -/* * adjust_appendrel_attrs * Copy the specified query or expression and translate Vars referring to a * parent rel to refer to the corresponding child rel instead. 
We also diff --git a/src/backend/optimizer/util/inherit.c b/src/backend/optimizer/util/inherit.c index 350e6afe27..db474acbc5 100644 --- a/src/backend/optimizer/util/inherit.c +++ b/src/backend/optimizer/util/inherit.c @@ -15,6 +15,7 @@ #include "postgres.h" #include "access/heapam.h" +#include "access/sysattr.h" #include "catalog/partition.h" #include "catalog/pg_inherits.h" #include "miscadmin.h" @@ -38,6 +39,8 @@ static void expand_single_inheritance_child(PlannerInfo *root, PlanRowMark *top_parentrc, Relation childrel, List **appinfos, RangeTblEntry **childrte_p,
Re: add_partial_path() may remove dominated path but still in use
On Fri, Jan 11, 2019 at 5:52, Robert Haas wrote:
> > On Wed, Jan 9, 2019 at 12:44 AM Kohei KaiGai wrote:
> > So, is it sufficient if set_rel_pathlist_hook is just relocated in
> > front of the generate_gather_paths?
> > If we have no use case for the second hook, here is little necessity
> > to have the post_rel_pathlist_hook() here.
> > (At least, PG-Strom will use the first hook only.)
>
> +1. That seems like the best way to be consistent with the principle
> that we need to have all the partial paths before generating any
> Gather paths.

Patch was updated, just for relocation of the set_rel_pathlist_hook. Please check it.

Thanks,
-- HeteroDB, Inc / The PG-Strom Project KaiGai Kohei

pgsql-bugfix-relocation-path-set_rel_pathlist_hook.v10.patch Description: Binary data
pgsql-bugfix-relocation-path-set_rel_pathlist_hook.v11.patch Description: Binary data
Re: Query with high planning time at version 11.1 compared versions 10.5 and 11.0
(2019/01/10 21:23), Amit Langote wrote:
> On Thu, Jan 10, 2019 at 6:49 PM Ashutosh Bapat wrote:
>> Though this will solve a problem for performance when partition-wise
>> join is not possible, we still have the same problem when
>> partition-wise join is possible. And that problem really happens
>> because our inheritance mechanism requires expression translation
>> from parent to child everywhere. That consumes memory, eats CPU
>> cycles and generally downgrades performance of partition related
>> query planning. I think a better way would be to avoid these
>> translations and use Parent var to represent a Var of the child
>> being dealt with. That will be a massive churn on inheritance based
>> planner code, but it will improve planning time for queries
>> involving thousands of partitions.
>
> Yeah, it would be nice going forward to overhaul inheritance planning
> such that parent-to-child Var translation is not needed, especially
> where no pruning can occur or many partitions remain even after
> pruning.

I agree on that point, but I think that's an improvement for a future release rather than a fix for the issue reported on this thread.

Best regards,
Etsuro Fujita
RE: Improve selectivity estimate for range queries
Hi,

Thanks for the comments, and I'm sorry for the late reply.

> From: Tom Lane [mailto:t...@sss.pgh.pa.us]
> Sent: Friday, January 11, 2019 7:04 AM
>
> Robert Haas writes:
> > On Fri, Dec 21, 2018 at 11:50 AM Tom Lane wrote:
> >> A smaller-footprint way to fix the immediate problem might be to
> >> change the values of DEFAULT_INEQ_SEL and friends so that they're
> >> even less likely to be matched by accident. That is, instead of
> >> 0.3333333333333333, use 0.342 or some such.
>
> > That's not a dumb idea, but it seems pretty unprincipled to me, and to
> > be honest I'm kind of surprised that you're not proposing something
> > cleaner.
>
> The problem is the invasiveness of such a change (large) vs the benefit
> (not so large). The upthread patch attempted to add a separate signaling
> path, but it was very incomplete --- and yet both I and Horiguchi-san
> thought it was already too messy.
>
> Maybe at some point we'll go over to something reasonably principled,
> like adding confidence intervals to all selectivity estimates. That
> would be *really* invasive but perhaps would bring enough benefit to
> justify itself. But the current patch is just attempting to fix one
> extremely narrow pain point, and if that is all it's doing it should
> have a commensurately small footprint. So I don't think the submitted
> patch looks good from a cost/benefit standpoint.

Yes, I agree with you. Indeed the patch I attached is insufficient in cost-effectiveness. However, I want to solve the problem that real estimates can happen to be equal to the default values, as in this case, even though it's a narrow pain point. So I tried distinguishing explicitly between real estimates and the defaults, as Robert said.

The idea Tom proposed and Horiguchi-san tried seems reasonable, but I'm concerned whether range queries really cannot match 0.342 (or some such) by accident in any environment. Is what Horiguchi-san did enough to prove that?

Best regards,
Yuzuko Hosoya
NTT Open Source Software Center
Re: Ryu floating point output patch
Rebased this patch onto current master. No functional changes; just fixed up the copyright dates and a couple of stray missing UINT64CONSTs. Regression tests still fail because how to fix them depends on the issues below. Likewise docs are not changed yet for the same reason. Decisions that need to be made in order to proceed: 1. Should exact output become the default, or is it more important to preserve the historical output? I will argue very strongly that the default output should be exact. 2. How far do we need to go to support existing uses of extra_float_digits? For example, do we need a way for clients to request that they get the exact same output as previously for extra_float_digits=2 or 3, rather than just assuming that any round-trip-exact value will do? (this would likely mean adding a new GUC, rather than basing everything off extra_float_digits as the patch does now) 3. Do we need to do anything about how conversions from floats to numeric work? At present they _ignore_ extra_float_digits (presumably on immutability grounds) and convert using only DBL_DIG digits of precision. I have no very strong position on this but incline to keeping the existing behavior, though possibly it would be good to add functions to get the round-trip value and possibly even the true exact value. (It would be sort of nice if CAST(floatval AS numeric(400,18)) or similar could work as you'd expect, but currently we treat that as floatval::numeric::numeric(400,18) so the cast function doesn't know about the eventual typmod.) 4. Can we allow declaration-after-statement please? That would allow keeping this code significantly closer to its upstream. -- Andrew (irc:RhodiumToad) diff --git a/src/backend/utils/adt/float.c b/src/backend/utils/adt/float.c index 117ded8d1d..6c6f031b4a 100644 --- a/src/backend/utils/adt/float.c +++ b/src/backend/utils/adt/float.c @@ -21,6 +21,7 @@ #include "catalog/pg_type.h" #include "common/int.h" +#include "common/ryu.h" #include "libpq/pqformat.h" #include "miscadmin.h" #include "utils/array.h" @@ -31,7 +32,7 @@ /* Configurable GUC parameter */ -int extra_float_digits = 0; /* Added to DBL_DIG or FLT_DIG */ +int extra_float_digits = 1; /* Added to DBL_DIG or FLT_DIG */ /* Cached constants for degree-based trig functions */ static bool degree_consts_set = false; @@ -252,6 +253,12 @@ float4out(PG_FUNCTION_ARGS) char *ascii = (char *) palloc(32); int ndig = FLT_DIG + extra_float_digits; + if (extra_float_digits > 0) + { + ryu_f2s_buffered(num, ascii); + PG_RETURN_CSTRING(ascii); + } + (void) pg_strfromd(ascii, 32, ndig, num); PG_RETURN_CSTRING(ascii); } @@ -468,6 +475,12 @@ float8out_internal(double num) char *ascii = (char *) palloc(32); int ndig = DBL_DIG + extra_float_digits; + if (extra_float_digits > 0) + { + ryu_d2s_buffered(num, ascii); + return ascii; + } + (void) pg_strfromd(ascii, 32, ndig, num); return ascii; } diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c index f81e0424e7..404ba2bfa0 100644 --- a/src/backend/utils/misc/guc.c +++ b/src/backend/utils/misc/guc.c @@ -2631,11 +2631,12 @@ static struct config_int ConfigureNamesInt[] = {"extra_float_digits", PGC_USERSET, CLIENT_CONN_LOCALE, gettext_noop("Sets the number of digits displayed for floating-point values."), gettext_noop("This affects real, double precision, and geometric data types. 
" - "The parameter value is added to the standard number of digits " - "(FLT_DIG or DBL_DIG as appropriate).") + "A zero or negative parameter value is added to the standard " + "number of digits (FLT_DIG or DBL_DIG as appropriate). " + "Any value greater than zero selects round-trip-safe output.") }, &extra_float_digits, - 0, -15, 3, + 1, -15, 3, NULL, NULL, NULL }, diff --git a/src/common/Makefile b/src/common/Makefile index d0c2b970eb..7e6e1237c5 100644 --- a/src/common/Makefile +++ b/src/common/Makefile @@ -44,9 +44,11 @@ override CPPFLAGS += -DVAL_LIBS="\"$(LIBS)\"" override CPPFLAGS := -DFRONTEND -I. -I$(top_srcdir)/src/common $(CPPFLAGS) LIBS += $(PTHREAD_LIBS) -OBJS_COMMON = base64.o config_info.o controldata_utils.o exec.o file_perm.o \ - ip.o keywords.o kwlookup.o link-canary.o md5.o pg_lzcompress.o \ - pgfnames.o psprintf.o relpath.o \ +# If you add objects here, see also src/tools/msvc/Mkvcbuild.pm + +OBJS_COMMON = base64.o config_info.o controldata_utils.o d2s.o exec.o f2s.o \ + file_perm.o ip.o keywords.o kwlookup.o link-canary.o md5.o \ + pg_lzcompress.o pgfnames.o psprintf.o relpath.o \ rmtree.o saslprep.o scram-common.o string.o unicode_norm.o \ username.o wait_error.o diff --git a/src/common/d2s.c b/src/common/d2s.c new file mode 100644 index 00..1c63fb198a --- /dev/null +++ b/src/common/d2s.c @@ -0,0 +1,961 @@ +/*--- + * + * Ryu floating-point ou
How does the planner determine plan_rows ?
Hi, I created some empty tables and run ` EXPLAIN ANALYZE` on `SELECT * `. I found the results have different row numbers, but the tables are all empty. =# CREATE TABLE t1(id INT, data INT); =# EXPLAIN ANALYZE SELECT * FROM t1; Seq Scan on t1 (cost=0.00..32.60 rows=2260 width=8) (actual time=0.003..0.003 rows=0 loops=1) =# CREATE TABLE t2(data VARCHAR); =# EXPLAIN ANALYZE SELECT * FROM t2; Seq Scan on t2 (cost=0.00..23.60 rows=1360 width=32) (actual time=0.002..0.002 rows=0 loops=1) =# CREATE TABLE t3(id INT, data VARCHAR); =# EXPLAIN ANALYZE SELECT * FROM t3; Seq Scan on t3 (cost=0.00..22.70 rows=1270 width=36) (actual time=0.001..0.001 rows=0 loops=1) I found this behavior unexpected. I'm still trying to find out how/where the planner determines the plan_rows. Any help will be appreciated! Thank you, Donald Dong
Re: New function pg_stat_statements_reset_query() to reset statistics of a specific query
On Thu, Jan 10, 2019 at 12:11 PM Haribabu Kommi wrote: > On Thu, Jan 10, 2019 at 2:32 PM Amit Kapila wrote: >> >> I am happy with this patch now, attached is the new version of the >> patch where I have slightly modified the commit message, see if that >> looks fine to you. I will push this patch tomorrow morning (after >> again going through it) unless someone has any objections or see any >> other issues. > > > Thanks for changes. LGTM. > Pushed. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: Remove all "INTERFACE ROUTINES" style comments
Thomas Munro writes: > On Fri, Jan 11, 2019 at 12:58 PM Andres Freund wrote: >> A number of postgres files have sections like heapam's >> * INTERFACE ROUTINES >> >> They're often out-of-date, and I personally never found them to be >> useful. A few people, including yours truly, have been removing a few >> here and there when overhauling a subsystem to avoid having to update >> and then adjust them. >> I think it might be a good idea to just do that for all at once. > +1 I agree we don't maintain them well, so it'd be better to remove them, as long as we make sure any useful info gets transferred someplace else (like per-function header comments). regards, tom lane
Re: How does the planner determine plan_rows ?
> "Donald" == Donald Dong writes: Donald> Hi, Donald> I created some empty tables and run ` EXPLAIN ANALYZE` on Donald> `SELECT * `. I found the results have different row numbers, Donald> but the tables are all empty. Empty tables are something of a special case, because the planner doesn't assume that they will _stay_ empty, and using an estimate of 0 or 1 rows would tend to create a distorted plan that would likely blow up in runtime as soon as you insert a second row. The place to look for info would be estimate_rel_size in optimizer/util/plancat.c, from which you can see that empty tables get a default size estimate of 10 pages. Thus: Donald> =# CREATE TABLE t1(id INT, data INT); Donald> =# EXPLAIN ANALYZE SELECT * FROM t1; Donald> Seq Scan on t1 (cost=0.00..32.60 rows=2260 width=8) (actual Donald> time=0.003..0.003 rows=0 loops=1) An (int,int) tuple takes about 36 bytes, so you can get about 226 of them on a page, so 10 pages is 2260 rows. Donald> =# CREATE TABLE t2(data VARCHAR); Donald> =# EXPLAIN ANALYZE SELECT * FROM t2; Donald> Seq Scan on t2 (cost=0.00..23.60 rows=1360 width=32) (actual Donald> time=0.002..0.002 rows=0 loops=1) Size of a varchar with no specified length isn't known, so the planner determines an average length of 32 by the time-honoured method of rectal extraction (see get_typavgwidth in lsyscache.c), making 136 rows per page. -- Andrew (irc:RhodiumToad)
Re: How does the planner determine plan_rows ?
Thank you for the great explanation! > On Jan 10, 2019, at 7:48 PM, Andrew Gierth > wrote: > >> "Donald" == Donald Dong writes: > > Donald> Hi, > Donald> I created some empty tables and run ` EXPLAIN ANALYZE` on > Donald> `SELECT * `. I found the results have different row numbers, > Donald> but the tables are all empty. > > Empty tables are something of a special case, because the planner > doesn't assume that they will _stay_ empty, and using an estimate of 0 > or 1 rows would tend to create a distorted plan that would likely blow > up in runtime as soon as you insert a second row. > > The place to look for info would be estimate_rel_size in > optimizer/util/plancat.c, from which you can see that empty tables get > a default size estimate of 10 pages. Thus: > > Donald> =# CREATE TABLE t1(id INT, data INT); > Donald> =# EXPLAIN ANALYZE SELECT * FROM t1; > Donald> Seq Scan on t1 (cost=0.00..32.60 rows=2260 width=8) (actual > Donald> time=0.003..0.003 rows=0 loops=1) > > An (int,int) tuple takes about 36 bytes, so you can get about 226 of > them on a page, so 10 pages is 2260 rows. > > Donald> =# CREATE TABLE t2(data VARCHAR); > Donald> =# EXPLAIN ANALYZE SELECT * FROM t2; > Donald> Seq Scan on t2 (cost=0.00..23.60 rows=1360 width=32) (actual > Donald> time=0.002..0.002 rows=0 loops=1) > > Size of a varchar with no specified length isn't known, so the planner > determines an average length of 32 by the time-honoured method of rectal > extraction (see get_typavgwidth in lsyscache.c), making 136 rows per > page. > > -- > Andrew (irc:RhodiumToad)
Re: How does the planner determine plan_rows ?
Donald Dong writes: > I created some empty tables and run ` EXPLAIN ANALYZE` on `SELECT * `. I found > the results have different row numbers, but the tables are all empty. This isn't a terribly interesting case, since you've neither loaded any data nor vacuumed/analyzed the table, but ... > I found this behavior unexpected. I'm still trying to find out how/where the > planner > determines the plan_rows. ... estimate_rel_size() in plancat.c is where to look to find out about the planner's default estimates when it's lacking hard data. regards, tom lane
Re: speeding up planning with partitions
On Thu, 10 Jan 2019 at 21:41, Amit Langote wrote:
> Here's v12, which is more or less same as v11 but with the order of
> patches switched so that the code movement patch is first. Note that the
> attached v12-0001 contains no functional changes (but there's tiny note in
> the commit message mentioning the addition of a tiny function which is
> just old code).

A few more comments based on reading v12.

v12 0002:

1. Missing braces around the else clause. (Should be consistent with the "if" above)

+ if (!has_live_children)
+ {
+  /*
+   * All children were excluded by constraints, so mark the relation
+   * ass dummy. We must do this in this phase so that the rel's
+   * dummy-ness is visible when we generate paths for other rels.
+   */
+  set_dummy_rel_pathlist(rel);
+ }
+ else
+  /*
+   * Set a non-zero value here to cope with the caller's requirement
+   * that non-dummy relations are actually not empty. We don't try to
+   * be accurate here, because we're not going to create a path that
+   * combines the children outputs.
+   */
+  rel->rows = 1;

v12 0004:

2. I wonder if there's a better way, instead of doing this:

+ if (child_rel1 == NULL)
+  child_rel1 = build_dummy_partition_rel(root, rel1, cnt_parts);
+ if (child_rel2 == NULL)
+  child_rel2 = build_dummy_partition_rel(root, rel2, cnt_parts);

maybe add some logic in populate_joinrel_with_paths() to allow NULL rels to mean dummy rels. There's a bit of a problem there as the joinrel has already been created by that time, but perhaps populate_joinrel_with_paths is a better place to decide if the dummy rel needs to be built or not.

3. I wonder if there's a better way to handle what build_dummy_partition_rel() does. I see it sets the child's relid to the parent's relid and makes up a fake AppendRelInfo and puts it in the parent's slot. What's going to happen when the parent is not the top-level parent? It'll already have an AppendRelInfo set. I had thought something like the following could break this, but of course, it does not since we build the dummy rel for the pruned sub_parent2, so we don't hit the NULL relation case when doing the next level. i.e. we only make dummies for the top level, never dummies of joinrels. Does that not mean that the if (parent->top_parent_relids) will always be false in build_dummy_partition_rel() and it'll only ever get rtekind == RTE_RELATION?

drop table if exists parent;
create table parent (id int, a int, b text, c float) partition by range (a);
create table sub_parent1 (b text, c float, a int, id int) partition by range (a);
create table sub_parent2 (c float, b text, id int, a int) partition by range (a);
alter table parent attach partition sub_parent1 for values from (0) to (10);
alter table parent attach partition sub_parent2 for values from (10) to (20);
create table child11 (id int, b text, c float, a int);
create table child12 (id int, b text, c float, a int);
create table child21 (id int, b text, c float, a int);
create table child22 (id int, b text, c float, a int);
alter table sub_parent1 attach partition child11 for values from (0) to (5);
alter table sub_parent1 attach partition child12 for values from (5) to (10);
alter table sub_parent2 attach partition child21 for values from (10) to (15);
alter table sub_parent2 attach partition child22 for values from (15) to (20);

insert into parent values(0,1,'test',100.0);

select * from parent p1 inner join parent p2 on p1.a=p2.a where p1.id < 10;

4. How are dummy rels handled in grouping_planner()?
I see you have this: - if (IS_DUMMY_REL(planned_rel)) + if (!parent_rte->inh || IS_DUMMY_REL(planned_rel)) { grouping_planner(root, false, planned_rel, 0.0); return; With the above example I tried to see how it was handled by doing: postgres=# update parent set c = c where a = 333; server closed the connection unexpectedly This probably means the server terminated abnormally before or while processing the request. I didn't look into what's causing the crash. 5. Wondering why you removed the if (childOID != parentOID) from: - if (childOID != parentOID && RELATION_IS_OTHER_TEMP(newrelation)) - { - heap_close(newrelation, lockmode); - continue; - } Isn't that releasing the only lock we hold on the rel defined in the query? I tested with: -- session 1 create temp table a1(a int); create temp table a2(a int) inherits(a1); -- session 2 select oid::regclass from pg_class where relname = 'a1'; oid -- pg_temp_3.a1 (1 row) explain select * from pg_temp_3.a1; WARNING: you don't own a lock of type AccessShareLock QUERY PLAN -- Result (cost=0.00..0.00 rows=0 width=4) One-Time Filter: false (2 rows) 6. expand_planner_arrays() zeros a portion of the append_rel_array even if it just palloc0'd the array. While it's not a bug, it is repeat work. It should be okay to move the Memset() up to the repalloc(). 7. I see get_relation_info() grew an extra parameter. Would it now be better just
Re: Ryu floating point output patch
Does the project have an established view on non-ASCII names in sources or docs? AFAICS [1], the name of the algorithm may be Ryū. -Chap [1] https://dl.acm.org/citation.cfm?id=3192369
Re: Ryu floating point output patch
Hi, On 2019-01-10 23:02:01 -0500, Chapman Flack wrote: > Does the project have an established view on non-ASCII names in > sources or docs? > > AFAICS [1], the name of the algorithm may be Ryū. I think it'd be a really bad idea to start having non-ascii filenames, and I quite doubt that I'm alone in that. - Andres
Re: Ryu floating point output patch
Andres Freund writes: > On 2019-01-10 23:02:01 -0500, Chapman Flack wrote: >> Does the project have an established view on non-ASCII names in >> sources or docs? >> AFAICS [1], the name of the algorithm may be Ryū. > I think it'd be a really bad idea to start having non-ascii > filenames, and I quite doubt that I'm alone in that. Non-ASCII filenames seem right out. I thought the question was about whether we even want to have non-ASCII characters within source code (my view: avoid if possible) or in docs (we do, but it's better if you can make it into html entities). regards, tom lane
Acceptable/Best formatting of callbacks (for pluggable storage)
Hi,

The pluggable storage patchset has a large struct full of callbacks, and a bunch of wrapper functions for calling those callbacks. While starting to polish the patchset, I tried to make the formatting nice.

By default pgindent yields formatting like:

/*
 * API struct for a table AM. Note this must be allocated in a
 * server-lifetime manner, typically as a static const struct, which then gets
 * returned by FormData_pg_am.amhandler.
 */
typedef struct TableAmRoutine
{
    NodeTag     type;
...
    void        (*relation_set_new_filenode) (Relation relation,
                                              char persistence,
                                              TransactionId *freezeXid,
                                              MultiXactId *minmulti);
...

static inline void
table_set_new_filenode(Relation rel, char persistence,
                       TransactionId *freezeXid,
                       MultiXactId *minmulti)
{
    rel->rd_tableam->relation_set_new_filenode(rel, persistence,
                                               freezeXid, minmulti);
}

which isn't particularly pretty, especially because there are callbacks with longer names than the example above. Unfortunately pgindent prevents formatting the callbacks like:

    void        (*relation_set_new_filenode) (
        Relation relation,
        char persistence,
        TransactionId *freezeXid,
        MultiXactId *minmulti);

or something in that vein. What does work, however, is:

    void        (*relation_set_new_filenode)
                (Relation relation,
                 char persistence,
                 TransactionId *freezeXid,
                 MultiXactId *minmulti);

I.e. putting the opening ( of the parameter list onto a separate line yields somewhat usable formatting. This also has the advantage that the arguments of all callbacks line up, making it a bit easier to scan. Similarly, to reduce the indentation, especially for callbacks with long names and/or with longer parameter names, we can do:

static inline void
table_set_new_filenode(Relation rel, char persistence,
                       TransactionId *freezeXid,
                       MultiXactId *minmulti)
{
    rel->rd_tableam->relation_set_new_filenode
        (rel, persistence, freezeXid, minmulti);
}

So, putting the parameter list, both in use and declaration, entirely onto a new line yields decent formatting with pgindent. But it's kinda weird. I can't really come up with a better alternative, and after a few minutes it looks pretty reasonable. Comments? Better alternatives? Greetings, Andres Freund
Re: Query with high planning time at version 11.1 compared versions 10.5 and 11.0
Fujita-san, On 2019/01/10 15:07, Etsuro Fujita wrote: > Amit-san, > > (2019/01/10 10:41), Amit Langote wrote: >> On 2019/01/09 20:20, Etsuro Fujita wrote: >>> I like your patch in general. I think one way to address Ashutosh's >>> concerns would be to use the consider_partitionwise_join flag: originally, >>> that was introduced for partitioned relations to show that they can be >>> partitionwise-joined, but I think that flag could also be used for >>> non-partitioned relations to show that they have been set up properly for >>> partitionwise-joins, and I think by checking that flag we could avoid >>> creating those copies for child dummy rels in try_partitionwise_join. >> >> Ah, that's an interesting idea. >> >> If I understand the original design of it correctly, >> consider_partitionwise_join being true for a given relation (simple or >> join) means that its RelOptInfo contains properties to consider it to be >> joined with another relation (simple or join) using partitionwise join >> mechanism. Partitionwise join will occur between the pair if the other >> relation also has relevant properties (hence its >> consider_partitionwise_join set to true) and properties on the two sides >> match. > > Actually, the flag being true just means that the tlist for a given > partitioned relation (simple or join) doesn't contain any whole-row Vars. > And if two given partitioned relations having the flag being true have > additional properties to be joined using the PWJ technique, then we try to > do PWJ for those partitioned relations. I see. Thanks for the explanation. > (The name of the flag isn't > good? If so, that would be my fault because I named that flag.) If it's really just to store the fact that the relation's targetlist contains expressions that partitionwise join currently cannot handle, then setting it like this in set_append_rel_size seems a bit misleading: if (enable_partitionwise_join && rel->reloptkind == RELOPT_BASEREL && rte->relkind == RELKIND_PARTITIONED_TABLE && rel->attr_needed[InvalidAttrNumber - rel->min_attr] == NULL) rel->consider_partitionwise_join = true; Sorry, I wasn't around to comment on the patch which got committed in 7cfdc77023a, but checking the value of enable_partitionwise_join and other things in set_append_rel_size() to set the value of consider_partitionwise_join seems a bit odd to me. Perhaps, consider_partitionwise_join should be initially set to true for a relation (actually, to rel->part_scheme != NULL) and only set it to false if the relation's targetlist is found to contain unsupported expressions. That way, it becomes easier to think what it means imho. I think enable_partitionwise_join should only be checked in relnode.c or joinrels.c. I've attached a patch to show what I mean. Can you please take a look? If you think that this patch is a good idea, then you'll need to explicitly set consider_partitionwise_join to false for a dummy partition rel in set_append_rel_size(), because the assumption of your patch that such partition's rel's consider_partitionwise_join would be false (as initialized with the current code) would be broken by my patch. But that might be a good thing to do anyway as it will document the special case usage of consider_partitionwise_join variable more explicitly, assuming you'll be adding a comment describing why it's being set to false explicitly. >> That's a loaded meaning and abusing it to mean something else can be >> challenged, but we can live with that if properly documented. 
Speaking of >> which: >> >> /* used by partitionwise joins: */ >> bool consider_partitionwise_join; /* consider >> partitionwise join >> * paths? (if partitioned >> rel) */ >> >> Maybe, mention here how it will be abused in back-branches for >> non-partitioned relations? > > Will do. Thank you. >>> Please find attached an updated version of the patch. I modified your >>> version so that building tlists for child dummy rels are also postponed >>> until after they actually participate in partitionwise-joins, to avoid >>> that possibly-useless work as well. I haven't done any performance tests >>> yet though. >> >> Thanks for updating the patch. I tested your patch (test setup described >> below) and it has almost the same performance as my previous version: >> 552ms (vs. 41159ms on HEAD vs. 253ms on PG 10) for the query also >> mentioned below. > > Thanks for that testing! > > I also tested the patch with your script: > > 253.559 ms (vs. 85776.515 ms on HEAD vs. 206.066 ms on PG 10) Oh, PG 11 doesn't appear as bad compared to PG 10 with your numbers as it did on my machine. That's good anyway. Thanks, Amit diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c index 1999cd4436..24900ce593 100644 --- a/src/backend/optimizer/path/allpaths.c +++ b/src/backend/optimizer/path/allpaths.c @@ -1305,15 +1
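To make the initialization idea described above concrete, here is a rough sketch; it is only an illustration of the suggestion (the exact placement of each step is an assumption), not the attached patch:

    /* Initialize from whether the rel is partitioned at all. */
    rel->consider_partitionwise_join = (rel->part_scheme != NULL);

    /*
     * Later, once attr_needed has been filled in, turn it off only if the
     * rel's targetlist needs something partitionwise join can't handle yet
     * (currently, whole-row Vars).
     */
    if (rel->attr_needed[InvalidAttrNumber - rel->min_attr] != NULL)
        rel->consider_partitionwise_join = false;

    /*
     * enable_partitionwise_join itself would then be consulted only where
     * join planning decides whether to actually attempt a partitionwise
     * join (relnode.c / joinrels.c), not in set_append_rel_size().
     */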
Re: doc: where best to add ~ 400 words to the manual?
On 12/30/18 22:23, Chapman Flack wrote: > I would like to add material to the manual, detailing the differences > between the XPath 1.0 language supported in PG, and the XQuery/XPath 2.0+ > expected by the standard. > ... > I have thought of: > ... > 3. A sect2 at the very end of functions-xml, linked to from a short remark > at the top. (3) seems to be supported by (the only) precedent, the "Limits and Compatibility" sect3 within functions-posix-regexp. So, absent objection, I'll work on a Limits and Compatibility sect2 within functions-xml. -Chap
Re: Query with high planning time at version 11.1 compared versions 10.5 and 11.0
On 2019/01/11 11:21, Etsuro Fujita wrote: > (2019/01/10 21:23), Amit Langote wrote: >> On Thu, Jan 10, 2019 at 6:49 PM Ashutosh Bapat >> wrote: >>> Though this will solve a problem for performance when partition-wise >>> join is not possible, we still have the same problem when >>> partition-wise join is possible. And that problem really happens >>> because our inheritance mechanism requires expression translation from >>> parent to child everywhere. That consumes memory, eats CPU cycles and >>> generally downgrades performance of partition related query planning. I >>> think a better way would be to avoid these translations and use Parent >>> var to represent a Var of the child being dealt with. That will be a >>> massive churn on inheritance based planner code, but it will improve >>> planning time for queries involving thousands of partitions. >> >> Yeah, it would be nice going forward to overhaul inheritance planning >> such that parent-to-child Var translation is not needed, especially >> where no pruning can occur or many partitions remain even after >> pruning. > > I agree on that point, but I think that's an improvement for a future > release rather than a fix for the issue reported on this thread. Agreed. Improving planning performance for large number of partitions even in the absence of pruning is a good goal to pursue for future versions, as is being discussed in some other threads [1]. Thanks, Amit [1] https://www.postgresql.org/message-id/0A3221C70F24FB45833433255569204D1FB60AE5%40G01JPEXMBYT05
Re: Displaying and dumping of table access methods
On Wed, 9 Jan 2019 at 14:30, Dmitry Dolgov <9erthali...@gmail.com> wrote: > > > On Tue, Jan 8, 2019 at 6:34 PM Andres Freund wrote: > > > > On 2019-01-08 11:30:56 +0100, Peter Eisentraut wrote: > > > On 08/01/2019 00:56, Andres Freund wrote: > > > > A patch at [2] adds display of a table's access method to \d+ - but that > > > > means that running the tests with a different default table access > > > > method (e.g. using PGOPTIONS='-c default_table_access_method=...) > > > > there'll be a significant number of test failures, even though the test > > > > results did not meaningfully differ. > > > > > > For psql, a variable that hides the access method if it's the default. > > > > Yea, I think that seems the least contentious solution. Don't like it > > too much, but it seems better than the alternative. I wonder if we want > > one for multiple regression related issues, or whether one specifically > > about table AMs is more appropriate. I lean towards the latter. > > Are there any similar existing regression related issues? If no, then probably > the latter indeed makes more sense. > > > > > Similarly, if pg_dump starts to dump table access methods either > > > > unconditionally, or for all non-heap AMS, the pg_dump tests fail due to > > > > unimportant differences. > > > > > > For pg_dump, track and set the default_table_access_method setting > > > throughout the dump (similar to how default_with_oids was handled, I > > > believe). > > > > Yea, that's similar to that, and I think that makes sense. > > Yes, sounds like a reasonable approach, I can proceed with it. Dmitry, I believe you have taken the pg_dump part only. If that's right, I can proceed with the psql part. Does that sound right ? > -- Thanks, -Amit Khandekar EnterpriseDB Corporation The Postgres Database Company
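For reference, the pg_dump side being discussed could mirror the existing default_with_oids tracking, roughly along these lines. The function name, the currTableAm field, and their placement are assumptions for illustration only, not code from a patch:

    static void
    _selectTableAccessMethod(ArchiveHandle *AH, const char *tableam)
    {
        /* emit a SET only when the wanted AM differs from the current one */
        if (AH->currTableAm && strcmp(AH->currTableAm, tableam) == 0)
            return;

        ahprintf(AH, "SET default_table_access_method = %s;\n\n",
                 fmtId(tableam));

        free(AH->currTableAm);
        AH->currTableAm = pg_strdup(tableam);
    }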
Re: What to name the current heap after pluggable storage / what to rename?
On Fri, Jan 11, 2019 at 11:05 AM Andres Freund wrote: > Hi, > > On 2018-12-19 14:21:29 -0500, Robert Haas wrote: > > On Tue, Dec 18, 2018 at 11:17 PM Andres Freund > wrote: > > > Another would be to be aggressive in renaming, and deconflict by > > > renaming functions like heap_create[_with_catalog] etc to sound more > > > accurate. I think that has some appeal, because a lot of those names > > > aren't describing their tasks particularly well. > > > > I like that option. > > I'd like to start doing that by moving the functions in the following > heapam.h block elsewhere: > > /* in heap/heapam.c */ > extern Relation relation_open(Oid relationId, LOCKMODE lockmode); > extern Relation try_relation_open(Oid relationId, LOCKMODE lockmode); > extern Relation relation_openrv(const RangeVar *relation, LOCKMODE > lockmode); > extern Relation relation_openrv_extended(const RangeVar *relation, > LOCKMODE lockmode, bool > missing_ok); > extern void relation_close(Relation relation, LOCKMODE lockmode); > > extern Relation heap_open(Oid relationId, LOCKMODE lockmode); > extern Relation heap_openrv(const RangeVar *relation, LOCKMODE lockmode); > extern Relation heap_openrv_extended(const RangeVar *relation, > LOCKMODE lockmode, bool > missing_ok); > > ISTM that the first block would best belong into new files like > access/rel[ation].h and access/common/rel[ation].h. I think the second > set should be renamed to be table_open() (with backward compat > redirects, there's way way too many references) and should go into > access/table.h access/table/table.c alongside tableam.[ch], but I could > also see just putting them into relation.[ch]. > > Comments? > Yes, that will be good. Regards, Haribabu Kommi Fujitsu Australia
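As a sketch of what that split plus the backward-compatibility redirects could look like (the macro form of the redirects is an assumption; only the proposed new names and header locations come from the message above):

    /* access/table.h -- table-level opens under their new names */
    extern Relation table_open(Oid relationId, LOCKMODE lockmode);
    extern Relation table_openrv(const RangeVar *relation, LOCKMODE lockmode);
    extern void     table_close(Relation relation, LOCKMODE lockmode);

    /*
     * Backward-compatibility redirects, so the very many existing
     * heap_open() callers keep compiling while they are migrated.
     */
    #define heap_open(rid, lockmode)    table_open(rid, lockmode)
    #define heap_openrv(rv, lockmode)   table_openrv(rv, lockmode)
    #define heap_close(rel, lockmode)   table_close(rel, lockmode)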
Re: doc: where best to add ~ 400 words to the manual?
On 01/10/19 23:48, Chapman Flack wrote: > On 12/30/18 22:23, Chapman Flack wrote: > (3) seems to be supported by (the only) precedent, the "Limits and > Compatibility" sect3 within functions-posix-regexp. > > So, absent objection, I'll work on a Limits and Compatibility sect2 > within functions-xml. One fly in the ointment: tables of contents seem to list sect1 and sect2 elements, but not sect3. So the "Limits and Compatibility" sect3 under functions-posix-regexp does not clutter the overall "Functions and Operators" ToC, but a "Limits and Compatibility" sect2 under functions-xml would be visible there. Tucking it as a sect3 within an existing sect2 would prevent that, but it is common information that pertains to more than one of them, so that doesn't seem quite right. Or is it ok to have a "Limits and Compatibility" visible in the ToC under XML Functions? -Chap
RE: speeding up planning with partitions
Hi David, On Thu, Jan 10, 2019 at 4:02 PM, David Rowley wrote: > 3. I wonder if there's a better way to handle what > build_dummy_partition_rel() does. I see it sets the child's relid to the > parent's relid and makes up a fake AppendRelInfo and puts it in the > parent's slot. What's going to happen when the parent is not the > top-level parent? It'll already have an AppendRelInfo set. ... > > select * from parent p1 inner join parent p2 on p1.a=p2.a where p1.id < 10; I think there is a mistake in the select SQL: "p1.id < 10" doesn't prune any partition, because in your definition the tables are partitioned by column "a". Isn't that right? > Does that not mean that the if (parent->top_parent_relids) will always > be false in build_dummy_partition_rel() and it'll only ever get > rtekind == RTE_RELATION? At least, I checked that the if (parent->top_parent_relids) condition can be true when I execute the SQL below. select * from parent p1 inner join parent p2 on p1.a=p2.a where p1.id < 15; I couldn't check the other points you mentioned, but I also think build_dummy_partition_rel() needs more consideration; its logic seemed complicated when I was looking at this area. Amit, I also realized there are some mistakes in the comments around this function. + * build_dummy_partition_rel + * Build a RelOptInfo and AppendRelInfo for a pruned partition s/and AppendRelInfo/and an AppendRelInfo/ +* Now we'll need a (no-op) AppendRelInfo for parent, because we're +* setting the dummy partition's relid to be same as the parent's. s/a \(no-op\) AppendRelInfo/an \(no-op\) AppendRelInfo/ -- Yoshikazu Imai
Re: New vacuum option to do only freezing
On Thu, Jan 10, 2019 at 2:45 PM Masahiko Sawada wrote: > > On Thu, Jan 10, 2019 at 5:23 AM Bossart, Nathan wrote: > > > > Hi, > > > > On 1/8/19, 7:03 PM, "Masahiko Sawada" wrote: > > > Attached the updated version patch incorporated all comments I got. > > > > Thanks for the new patch! > > > > > * Instead of freezing xmax I've changed the behaviour of the new > > > option (DISABLE_INDEX_CLEANUP) so that it sets dead tuples as dead > > > instead of as unused and skips both index vacuum and index cleanup. > > > That is, we remove the storage of dead tuple but leave dead itemids > > > for the index lookups. These are removed by the next vacuum execution > > > enabling index cleanup. Currently HOT-pruning doesn't set the root of > > > the chain as unused even if the whole chain is dead. Since setting > > > tuples as dead removes storage the freezing xmax is no longer > > > required. > > > > I tested this option with a variety of cases (HOT, indexes, etc.), and > > it seems to work well. I haven't looked too deeply into the > > implications of using LP_DEAD this way, but it seems like a reasonable > > approach at first glance. > > Thank you for reviewing the patch! > > > > > + > > +DISABLE_INDEX_CLEANUP > > + > > + > > + VACUUM removes dead tuples and prunes HOT-updated > > + tuples chain for live tuples on table. If the table has any dead > > tuple > > + it removes them from both table and indexes for re-use. With this > > + option VACUUM marks tuples as dead (i.e., it > > doesn't > > + remove tuple storage) and disables removing dead tuples from indexes. > > + This is suitable for avoiding transaction ID wraparound but not > > + sufficient for avoiding index bloat. This option cannot be used in > > + conjunction with FULL option. > > + > > + > > + > > > > I think we should clarify the expected behavior when this option is > > used on a table with no indexes. We probably do not want to fail, as > > this could disrupt VACUUM commands that touch several tables. Also, > > we don't need to set tuples as dead instead of unused, which appears > > to be what this patch is actually doing: > > > > + if (nindexes > 0 && disable_index_cleanup) > > + lazy_set_tuples_dead(onerel, blkno, buf, > > vacrelstats); > > + else > > + lazy_vacuum_page(onerel, blkno, buf, 0, > > vacrelstats, &vmbuffer); > > > > In this case, perhaps we should generate a log statement that notes > > that DISABLE_INDEX_CLEANUP is being ignored and set > > disable_index_cleanup to false during processing. > > Agreed. How about output a NOTICE message before calling > lazy_scan_heap() in lazy_vacuum_rel()? > > > > > + if (disable_index_cleanup) > > + ereport(elevel, > > + (errmsg("\"%s\": marked %.0f row > > versions in %u pages as dead", > > + > > RelationGetRelationName(onerel), > > + tups_vacuumed, > > vacuumed_pages))); > > + else > > + ereport(elevel, > > + (errmsg("\"%s\": removed %.0f row > > versions in %u pages", > > + > > RelationGetRelationName(onerel), > > + tups_vacuumed, > > vacuumed_pages))); > > > > Should the first log statement only be generated when there are also > > indexes? > > You're right. Will fix. > > > > > +static void > > +lazy_set_tuples_dead(Relation onerel, BlockNumber blkno, Buffer buffer, > > + LVRelStats *vacrelstats) > > > > This function looks very similar to lazy_vacuum_page(). Perhaps the > > two could be combined? > > > > Yes, I intentionally separed them as I was concerned the these > functions have different assumptions and usage. But the combining them > also could work. Will change it. 
> Attached the updated patch. Please review it. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center v4-0001-Add-DISABLE_INDEX_CLEANUP-option-to-VACUUM-comman.patch Description: Binary data
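For readers following the thread, the essential difference between the two heap-pass code paths discussed above is how the dead tuples' line pointers are stamped. A condensed sketch of the idea of combining them (the loop shape and variable names are assumptions; the page and item-pointer macros used are the existing ones):

    for (i = 0; i < ndead; i++)
    {
        ItemId      itemid = PageGetItemId(page, deadoffsets[i]);

        if (disable_index_cleanup)
            ItemIdSetDead(itemid);      /* storage is freed but the line
                                         * pointer stays, so index entries
                                         * still have something to point to */
        else
            ItemIdSetUnused(itemid);    /* normal path: index entries are
                                         * removed, so the slot can be
                                         * reused right away */
    }
    PageRepairFragmentation(page);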
Re: speeding up planning with partitions
On 2019/01/11 11:07, Amit Langote wrote: > On 2019/01/11 6:47, David Rowley wrote: >> On Fri, 11 Jan 2019 at 06:56, Alvaro Herrera >> wrote: >>> Pushed 0001 with some minor tweaks, thanks. >> >> Thanks for pushing. I had looked at 0001 last night and there wasn't >> much to report, just: >> >> v12 0001 >> >> 1. I see you've moved translate_col_privs() out of prepunion.c into >> appendinfo.c. This required making it an external function, but it's >> only use is in inherit.c, so would it not be better to put it there >> and keep it static? > > Actually, I *was* a bit puzzled where to put it. I tend to agree with you > now that it might be define it locally within inherit.c as it might not be > needed elsewhere. Let's hear what Alvaro thinks. I'm attaching a patch > anyway. > >> 2. The following two lines I think need to swap their order. >> >> +#include "utils/rel.h" >> +#include "utils/lsyscache.h" > > Oops, fixed. > >> Both are pretty minor details but thought I'd post them anyway. > > Thank you for reporting. > > Attached find the patch. Looking around a bit more, I started thinking even build_child_join_sjinfo doesn't belong in appendinfo.c (just to be clear, it was defined in prepunion.c before this commit), so maybe we should move it to joinrels.c and instead export adjust_child_relids that's required by it from appendinfo.c. There's already adjust_child_relids_multilevel in appendinfo.h, so having adjust_child_relids next to it isn't too bad. At least not as bad as appendinfo.c exporting build_child_join_sjinfo for joinrels.c to use. I have updated the patch. Thanks, Amit diff --git a/src/backend/optimizer/path/joinrels.c b/src/backend/optimizer/path/joinrels.c index 38eeb23d81..db18feccfe 100644 --- a/src/backend/optimizer/path/joinrels.c +++ b/src/backend/optimizer/path/joinrels.c @@ -46,6 +46,9 @@ static void try_partitionwise_join(PlannerInfo *root, RelOptInfo *rel1, List *parent_restrictlist); static int match_expr_to_partition_keys(Expr *expr, RelOptInfo *rel, bool strict_op); +static SpecialJoinInfo *build_child_join_sjinfo(PlannerInfo *root, + SpecialJoinInfo *parent_sjinfo, + Relids left_relids, Relids right_relids); /* @@ -1582,3 +1585,45 @@ match_expr_to_partition_keys(Expr *expr, RelOptInfo *rel, bool strict_op) return -1; } + +/* + * Construct the SpecialJoinInfo for a child-join by translating + * SpecialJoinInfo for the join between parents. left_relids and right_relids + * are the relids of left and right side of the join respectively. 
+ */ +static SpecialJoinInfo * +build_child_join_sjinfo(PlannerInfo *root, SpecialJoinInfo *parent_sjinfo, + Relids left_relids, Relids right_relids) +{ + SpecialJoinInfo *sjinfo = makeNode(SpecialJoinInfo); + AppendRelInfo **left_appinfos; + int left_nappinfos; + AppendRelInfo **right_appinfos; + int right_nappinfos; + + memcpy(sjinfo, parent_sjinfo, sizeof(SpecialJoinInfo)); + left_appinfos = find_appinfos_by_relids(root, left_relids, + &left_nappinfos); + right_appinfos = find_appinfos_by_relids(root, right_relids, + &right_nappinfos); + + sjinfo->min_lefthand = adjust_child_relids(sjinfo->min_lefthand, + left_nappinfos, left_appinfos); + sjinfo->min_righthand = adjust_child_relids(sjinfo->min_righthand, + right_nappinfos, + right_appinfos); + sjinfo->syn_lefthand = adjust_child_relids(sjinfo->syn_lefthand, + left_nappinfos, left_appinfos); + sjinfo->syn_righthand = adjust_child_relids(sjinfo->syn_righthand, + right_nappinfos, + right_appinfos); + sjinfo->semi_rhs_exprs = (List *) adjust_appendrel_attrs(root, + (Node *) sjinfo->semi_rhs_exprs, + right_nappinfos, +
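The header side of that move would be small; roughly, the following declarations would sit together in appendinfo.h (a sketch of the idea, not the attached patch):

    /*
     * appendinfo.h: export the single-level helper next to the existing
     * multi-level one, so joinrels.c can translate relids itself.
     */
    extern Relids adjust_child_relids(Relids relids, int nappinfos,
                                      AppendRelInfo **appinfos);
    extern Relids adjust_child_relids_multilevel(PlannerInfo *root, Relids relids,
                                                 Relids child_relids,
                                                 Relids top_parent_relids);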
RE: speeding up planning with partitions
On Thu, Jan 10, 2019 at 6:10 PM, Imai, Yoshikazu wrote: > > Does that not mean that the if (parent->top_parent_relids) will always > > be false in build_dummy_partition_rel() and it'll only ever get > > rtekind == RTE_RELATION? > > At least, I checked that the if (parent->top_parent_relids) condition can be true > when I execute the SQL below. > > select * from parent p1 inner join parent p2 on p1.a=p2.a where p1.id < 15; Sorry, I also made a mistake. What I actually executed was: select * from parent p1 inner join parent p2 on p1.a=p2.a where p1.a < 15; -- Yoshikazu Imai
Re: [Sender Address Forgery]Re: error message when subscription target is a partitioned table
On Fri, Jan 11, 2019 at 10:50:32AM +0900, Amit Langote wrote: > Okay, I withdraw my objection to the wording proposed by you. Thanks. I can commit this version if there are no objections then. -- Michael signature.asc Description: PGP signature
Re: How does the planner determine plan_rows ?
> On Jan 10, 2019, at 8:01 PM, Tom Lane wrote: > > ... estimate_rel_size() in plancat.c is where to look to find out > about the planner's default estimates when it's lacking hard data. Thank you! Now I see how the planner uses the row estimates to compute the cost and generate the best plan. To me, tracing the function calls is not a simple task. I'm using cscope, and I use printf when I'm not entirely sure. I was considering using gdb, but I'm having issues referencing the source code in gdb. I'm very interested in learning how the professionals explore the codebase!
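In case it helps, one common workflow is to attach gdb to the backend serving an interactive psql session and tell it where the sources live; this assumes the server was built with debug symbols (e.g. configure --enable-debug, ideally with -O0). The PID and paths below are placeholders:

    -- in psql: find the PID of this session's backend
    select pg_backend_pid();

    $ gdb -p <pid from above>
    (gdb) directory /path/to/postgresql/src/backend
    (gdb) break estimate_rel_size
    (gdb) continue
    -- now run the query in psql; gdb stops inside the planner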