Re: Improve performance of pg_strtointNN functions
On Sun, 4 Dec 2022 at 03:19, David Rowley wrote:
> Pushed with some small adjustments.

Ah, I see that you changed the overflow test, and I realise that I forgot
to answer your question about why I wrote that as 1 - INT_MIN / 10 over on
the other thread.

The reason is that we need to detect whether tmp * base will exceed
-INT_MIN, not INT_MAX, since we're accumulating the absolute value of a
signed integer. So the right test is

    tmp >= 1 - INT_MIN / base

or equivalently

    tmp > -(INT_MIN / base)

I used the first form, because it didn't require extra parentheses, but
that doesn't really matter. The point is that, in general, that's not the
same as

    tmp > INT_MAX / base

though it happens to be the same for base = 10, because INT_MIN and
INT_MAX aren't divisible by 10. It will break when base is a power of 2
though, so although it's not broken now, it's morally wrong, and it risks
breaking when Peter commits his patch.
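To make the distinction concrete, here is a small standalone program (an
illustration only, not the actual PostgreSQL code) showing the one value
of tmp for which the two tests disagree when the base is a power of 2:

#include <limits.h>
#include <stdint.h>
#include <stdio.h>

int
main(void)
{
    const int   base = 16;          /* INT_MIN and INT_MAX divide evenly */
    uint32_t    tmp = 0x8000000;    /* == -(INT_MIN / 16) == 134217728 */

    /* Wrong: flags overflow even though tmp * 16 == 0x80000000 == |INT_MIN| */
    printf("tmp > INT_MAX / base      : %d\n",
           tmp > (uint32_t) (INT_MAX / base));

    /* Right: only flags overflow once tmp * base would exceed |INT_MIN| */
    printf("tmp >= 1 - INT_MIN / base : %d\n",
           tmp >= (uint32_t) (1 - INT_MIN / base));

    /* Prints 1 then 0: the first test rejects the value that INT_MIN needs. */
    return 0;
}

For base = 10 the two tests agree, which is why nothing is actually broken
today.

Regards,
Dean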
RE: Perform streaming logical transactions by background workers and parallel apply
On Friday, December 2, 2022 7:27 PM Kuroda, Hayato/黒田 隼人 wrote:
> Dear Hou,
>
> Thanks for making the patch. Followings are my comments for v54-0003 and
> 0004.

Thanks for the comments!

> 0003
>
> pa_free_worker()
>
> + /* Unlink any files that were needed to serialize partial changes. */
> + if (winfo->serialize_changes)
> +     stream_cleanup_files(MyLogicalRepWorker->subid, winfo->shared->xid);
>
> I think this part is not needed, because the LA cannot reach here if
> winfo->serialize_changes is true. Moreover stream_cleanup_files() is done
> in pa_free_worker_info().

Removed.

> LogicalParallelApplyLoop()
>
> The parallel apply worker wakes up every 0.1s even if we are in the
> PARTIAL_SERIALIZE mode. Do you have idea to reduce that?

The parallel apply worker usually will wait on the stream lock after
entering PARTIAL_SERIALIZE mode.

> + pa_spooled_messages();
>
> Comments are needed here, like "Changes may be serialize...".

Added.

> pa_stream_abort()
>
> + /*
> +  * Reopen the file and set the file position to the saved position.
> +  */
> + if (reopen_stream_fd)
> + {
> +     char    path[MAXPGPATH];
> +
> +     changes_filename(path, MyLogicalRepWorker->subid, xid);
> +     stream_fd = BufFileOpenFileSet(&MyParallelShared->fileset,
> +                                    path, O_RDONLY, false);
> +     BufFileSeek(stream_fd, fileno, offset, SEEK_SET);
> + }
>
> MyParallelShared->serialize_changes may be used instead of
> reopen_stream_fd.

This code has been removed.

> + /*
> +  * It is possible that while sending this change to parallel apply
> +  * worker we need to switch to serialize mode.
> +  */
> + if (winfo->serialize_changes)
> +     pa_set_fileset_state(winfo->shared, FS_READY);
>
> There are three same parts in the code, can we combine them into a common
> part?

This code has been slightly refactored.

> apply_spooled_messages()
>
> + /*
> +  * Break the loop if the parallel apply worker has finished applying
> +  * the transaction. The parallel apply worker should have closed the
> +  * file before committing.
> +  */
> + if (am_parallel_apply_worker() &&
> +     MyParallelShared->xact_state == PARALLEL_TRANS_FINISHED)
> +     goto done;
>
> I think pfree(buffer) and pfree(s2.data) should not be skipped.
> And this part should be below "nchanges++;".

buffer and s2.data were allocated in the toplevel transaction's context
and will be automatically freed soon when handling STREAM COMMIT.

> 0004
>
> set_subscription_retry()
>
> + LockSharedObject(SubscriptionRelationId, MySubscription->oid, 0,
> +                  AccessShareLock);
>
> I think AccessExclusiveLock should be acquired instead of AccessShareLock.
> In AlterSubscription(), LockSharedObject(AccessExclusiveLock) seems to be
> used.

Changed.

Best regards,
Hou zj
RE: Perform streaming logical transactions by background workers and parallel apply
On Friday, December 2, 2022 4:59 PM Peter Smith wrote:
> Here are my review comments for patch v54-0001.

Thanks for the comments!

> ==
>
> FILE: .../replication/logical/applyparallelworker.c
>
> 1b.
>
> + *
> + * This file contains routines that are intended to support setting up,
> + * using and tearing down a ParallelApplyWorkerInfo which is required to
> + * communicate among leader and parallel apply workers.
>
> "that are intended to support" -> "for"

I find the current wording consistent with the comments atop
vacuumparallel.c and execParallel.c, so I didn't change this one.

> 3. pa_setup_dsm
>
> +/*
> + * Set up a dynamic shared memory segment.
> + *
> + * We set up a control region that contains a fixed-size worker info
> + * (ParallelApplyWorkerShared), a message queue, and an error queue.
> + *
> + * Returns true on success, false on failure.
> + */
> +static bool
> +pa_setup_dsm(ParallelApplyWorkerInfo *winfo)
>
> IMO that's confusing to say "fixed-sized worker info" when it's referring
> to the ParallelApplyWorkerShared structure and not the other
> ParallelApplyWorkerInfo.
>
> Might be better to say:
>
> "a fixed-size worker info (ParallelApplyWorkerShared)" -> "a fixed-size
> struct (ParallelApplyWorkerShared)"

The ParallelApplyWorkerShared is also a kind of information that is shared
between workers, so I am fine with the current wording. Or maybe just
"fixed-size info"?

> ~~~
>
> 12. pa_clean_subtrans
>
> +/* Reset the list that maintains subtransactions. */
> +void
> +pa_clean_subtrans(void)
> +{
> +    subxactlist = NIL;
> +}
>
> Maybe a more informative function name would be pa_reset_subxactlist()?

I thought the current name is more consistent with pa_start_subtrans.

> ~~~
>
> 17. apply_handle_stream_stop
>
> + case TRANS_PARALLEL_APPLY:
> +     elog(DEBUG1, "applied %u changes in the streaming chunk",
> +          parallel_stream_nchanges);
> +
> +     /*
> +      * By the time parallel apply worker is processing the changes in
> +      * the current streaming block, the leader apply worker may have
> +      * sent multiple streaming blocks. This can lead to parallel apply
> +      * worker start waiting even when there are more chunk of streams
> +      * in the queue. So, try to lock only if there is no message left
> +      * in the queue. See Locking Considerations atop
> +      * applyparallelworker.c.
> +      */
>
> SUGGESTION (minor rewording)
>
> By the time the parallel apply worker is processing the changes in the
> current streaming block, the leader apply worker may have sent multiple
> streaming blocks. To prevent the parallel apply worker from waiting
> unnecessarily, try to lock only if there is no message left in the queue.
> See Locking Considerations atop applyparallelworker.c.

Didn't change this one, per Amit's comment.

> 21. apply_worker_clean_exit
>
> I wasn't sure if calling this a 'clean' exit meant anything much.
>
> How about:
> - apply_worker_proc_exit, or
> - apply_worker_exit

I thought "clean" means that the exit code is 0 (proc_exit(0)) and the
exit is not due to any ERROR. I am not sure if proc_exit or exit is
better.

I have addressed the other comments in the new version of the patch.

Best regards,
Hou zj
RE: Avoid streaming the transaction which are skipped (in corner cases)
On Saturday, December 3, 2022 7:37 PM Amit Kapila wrote:
> On Tue, Nov 29, 2022 at 12:23 PM houzj.f...@fujitsu.com wrote:
>> On Tuesday, November 29, 2022 12:08 PM Dilip Kumar wrote:
>>
>> I have a few comments about the patch.
>>
>> 1.
>>
>> 1.1.
>> - /* For streamed transactions notify the remote node about the abort. */
>> - if (rbtxn_is_streamed(txn))
>> -     rb->stream_abort(rb, txn, lsn);
>> + /* the transaction which is being skipped shouldn't have been streamed */
>> + Assert(!rbtxn_is_streamed(txn));
>>
>> 1.2
>> -     rbtxn_is_serialized(txn))
>> +     rbtxn_is_serialized(txn) &&
>> +     rbtxn_has_streamable_change(txn))
>>         ReorderBufferStreamTXN(rb, toptxn);
>>
>> In the above two places, I think we should do the check for the
>> top-level transaction (e.g. toptxn) because the patch only sets the flag
>> for the top-level transaction.
>
> Among these, the first one seems okay because it will check both the
> transaction and its subtransactions from that path and none of those
> should be marked as streamed. I have fixed the second one in the attached
> patch.
>
>> 2.
>>
>> + /*
>> +  * If there are any streamable changes getting queued then get the top
>> +  * transaction and mark it as having a streamable change. This is
>> +  * required for streaming in-progress transactions; the in-progress
>> +  * transaction will not be selected for streaming unless it has at
>> +  * least one streamable change.
>> +  */
>> + if (change->action == REORDER_BUFFER_CHANGE_INSERT ||
>> +     change->action == REORDER_BUFFER_CHANGE_UPDATE ||
>> +     change->action == REORDER_BUFFER_CHANGE_DELETE ||
>> +     change->action == REORDER_BUFFER_CHANGE_INTERNAL_SPEC_INSERT ||
>> +     change->action == REORDER_BUFFER_CHANGE_TRUNCATE)
>>
>> I think that a transaction that contains REORDER_BUFFER_CHANGE_MESSAGE
>> can also be considered as streamable. Is there a reason that we don't
>> check it here?
>
> No, I don't see any reason not to do this check for
> REORDER_BUFFER_CHANGE_MESSAGE.
>
> Apart from the above, I have slightly adjusted the comments in the
> attached. Do let me know what you think of the attached.

Thanks for updating the patch. It looks good to me.

Best regards,
Hou zj
Re: Error-safe user functions
On 2022-12-03 Sa 16:46, Tom Lane wrote:
> Andrew Dunstan writes:
>> Great. Let's hope we can get this settled early next week and then we
>> can get to work on the next tranche of functions, those that will let
>> the SQL/JSON work restart.
>
> OK, here's a draft proposal. I should start out by acknowledging that
> this steals a great deal from Nikita's original patch as well as yours,
> though I editorialized heavily.
>
> 0001 is the core infrastructure and documentation for the feature.
> (I didn't bother breaking it down further than that.)
>
> 0002 fixes boolin and int4in. That is the work that we're going to
> have to replicate in an awful lot of places, and I am pleased by how
> short-and-sweet it is. Of course, stuff like the datetime functions
> might be more complex to adapt.
>
> Then 0003 is a quick-hack version of COPY that is able to exercise
> all this. I did not bother with the per-column flags as you had
> them, because I'm not sure if they're worth the trouble compared
> to a simple boolean; in any case we can add that refinement later.
> What I did add was a WARN_ON_ERROR option that exercises the ability
> to extract the error message after a soft error. I'm not proposing
> that as a shippable feature, it's just something for testing.

Overall I think this is pretty good, and I hope we can settle on it
quickly.

> I think there are just a couple of loose ends here:
>
> 1. Bikeshedding on my name choices is welcome. I know Robert is
> dissatisfied with "ereturn", but I'm content with that so I didn't
> change it here.

I haven't got anything better than ereturn. details_please seems more
informal than our usual style. details_wanted maybe?

> 2. Everybody has struggled with just where to put the declaration
> of the error context structure. The most natural home for it
> probably would be elog.h, but that's out because it cannot depend
> on nodes.h, and the struct has to be a Node type to conform to
> the fmgr safety guidelines. What I've done here is to drop it
> in nodes.h, as we've done with a couple of other hard-to-classify
> node types; but I can't say I'm satisfied with that.
>
> Other plausible answers seem to be:
>
> * Drop it in fmgr.h. The only real problem is that historically
> we've not wanted fmgr.h to depend on nodes.h either. But I'm not
> sure how strong the argument for that really is/was. If we did
> do it like that we could clean up a few kluges, both in this patch
> and pre-existing (fmNodePtr at least could go away).
>
> * Invent a whole new header just for this struct. But then we're
> back to the question of what to call it. Maybe something along the
> lines of utils/elog_extras.h ?

Maybe a new header misc_nodes.h?

Soon after we get this done I think we'll find we need to extend this to
non-input functions. But that can wait a short while.
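For anyone following along without the patches, a minimal sketch of what
the 0002-style conversion of an input function might look like, based on
the ereturn() name discussed above (the exact signature and usage in the
actual patch may differ):

Datum
boolin(PG_FUNCTION_ARGS)
{
    const char *in_str = PG_GETARG_CSTRING(0);
    bool        result;

    if (parse_bool(in_str, &result))
        PG_RETURN_BOOL(result);

    /*
     * Soft error: if fcinfo->context carries the error-saving node
     * described above, this records the details and returns (Datum) 0
     * instead of throwing.
     */
    ereturn(fcinfo->context, (Datum) 0,
            (errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
             errmsg("invalid input syntax for type %s: \"%s\"",
                    "boolean", in_str)));
}

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com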
Re: Questions regarding distinct operation implementation
On 04/12/22 02:27, David Rowley wrote:
> To make this work when rows can exit the window frame seems
> significantly harder. Likely a hash table would be a better data
> structure to remove records from, but then how are you going to spill
> the hash table to disk when it reaches work_mem? As David J mentions,
> it seems like you'd need a hash table with a counter to track how many
> times a given value appears and only remove it from the table once that
> counter reaches 0. Unsure how you're going to constrain that to not use
> more than work_mem though.

Interesting problem. Hashtables created in normal aggregates (AGG_HASHED
mode) may provide some reference, as they have hashagg_spill_tuple, but I
am not aware of any prior implementation of a hashtable with a counter and
spilling. My major concern is: if we go the tuplesort route (the
without-ORDER-BY case), would we be handicapped in the future if we want
ORDER BY or more features?

> Are there any other databases which support DISTINCT window aggregates
> with an ORDER BY in the window clause?

Oracle supports distinct window aggregates, albeit without an ORDER BY
clause. The rest of the databases I've tried (MySQL / SQL Server Express)
don't support DISTINCT in window aggregates at all, so those get ruled out
as well.

> If you were to limit this to only working with the query you mentioned
> in [1], i.e PARTITION BY without an ORDER BY, then you only need to
> aggregate once per partition per aggregate and you only need to do that
> once all of the tuples for the partition are in the tuplestore. It
> seems to me like you could add all the records to a tuplesort and then
> sort by the DISTINCT column then aggregate everything except for
> consecutive duplicates. You can then aggregate any other aggregates
> which share the same DISTINCT column, otherwise, you just destroy the
> tuplesort and rinse and repeat for the next aggregate.

This looks like the way to go that would ensure the main use case of
portability from Oracle.

> If you were to limit this to only working with the query you mentioned
> in [1], i.e PARTITION BY without an ORDER BY,

I need to dig deeper into the ORDER BY case.
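For concreteness, the kind of query under discussion (with hypothetical
table and column names) is a distinct aggregate over a window with
PARTITION BY but no ORDER BY, which PostgreSQL currently rejects:

    -- Illustrative only; PostgreSQL currently reports
    -- "DISTINCT is not implemented for window functions" for this.
    SELECT depname,
           count(DISTINCT salary) OVER (PARTITION BY depname)
    FROM empsalary;

--
Regards,
Ankit Kumar Pandey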
Re: Error-safe user functions
Andrew Dunstan writes:
> On 2022-12-03 Sa 16:46, Tom Lane wrote:
>> 1. Bikeshedding on my name choices is welcome. I know Robert is
>> dissatisfied with "ereturn", but I'm content with that so I didn't
>> change it here.
>
> details_please seems more informal than our usual style. details_wanted
> maybe?

Yeah, Corey didn't like that either. "details_wanted" works for me.

> Soon after we get this done I think we'll find we need to extend this to
> non-input functions. But that can wait a short while.

I'm curious to know exactly which other use-cases you foresee.
It wouldn't be a bad idea to write some draft code to verify
that this mechanism will work conveniently for them.

			regards, tom lane
[PATCH] Add .idea to gitignore for JetBrains CLion
From 6d10dafdd7c7789eddd7fd72ca22dfde74febe23 Mon Sep 17 00:00:00 2001
From: Ali Sajjad
Date: Sun, 4 Dec 2022 06:03:11 -0800
Subject: [PATCH] Add .idea to gitignore for JetBrains CLion

---
 .gitignore | 1 +
 1 file changed, 1 insertion(+)

diff --git a/.gitignore b/.gitignore
index 1c0f3e5e35..7118b90f25 100644
--- a/.gitignore
+++ b/.gitignore
@@ -31,6 +31,7 @@ win32ver.rc
 *.exe
 lib*dll.def
 lib*.pc
+**/.idea

 # Local excludes in root directory
 /GNUmakefile
--
2.34.1

Probably I'm the first one building PostgreSQL on CLion.
Re: [PATCH] Add .idea to gitignore for JetBrains CLion
I searched the commitfest app and someone has already submitted this
change.

On Sun, Dec 4, 2022 at 7:28 AM Sayyid Ali Sajjad Rizavi wrote:
> From 6d10dafdd7c7789eddd7fd72ca22dfde74febe23 Mon Sep 17 00:00:00 2001
> From: Ali Sajjad
> Date: Sun, 4 Dec 2022 06:03:11 -0800
> Subject: [PATCH] Add .idea to gitignore for JetBrains CLion
>
> ---
>  .gitignore | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/.gitignore b/.gitignore
> index 1c0f3e5e35..7118b90f25 100644
> --- a/.gitignore
> +++ b/.gitignore
> @@ -31,6 +31,7 @@ win32ver.rc
>  *.exe
>  lib*dll.def
>  lib*.pc
> +**/.idea
>
>  # Local excludes in root directory
>  /GNUmakefile
> --
> 2.34.1
>
> Probably I'm the first one building PostgreSQL on CLion.
Re: [PATCH] Add .idea to gitignore for JetBrains CLion
Sayyid Ali Sajjad Rizavi writes:
> +**/.idea

Our policy is that the in-tree .gitignore files should only hide
files that are build artifacts of standard build processes.
Something like this belongs in your personal ~/.gitexclude,
instead.

(BTW, perhaps we should remove the entries targeting ".sl"
extensions? AFAIK that was only for HP-UX, which is now
desupported.)

			regards, tom lane
Re: Error-safe user functions
On 2022-12-04 Su 10:25, Tom Lane wrote:
> Andrew Dunstan writes:
>> On 2022-12-03 Sa 16:46, Tom Lane wrote:
>>> 1. Bikeshedding on my name choices is welcome. I know Robert is
>>> dissatisfied with "ereturn", but I'm content with that so I didn't
>>> change it here.
>
>> details_please seems more informal than our usual style. details_wanted
>> maybe?
>
> Yeah, Corey didn't like that either. "details_wanted" works for me.
>
>> Soon after we get this done I think we'll find we need to extend this
>> to non-input functions. But that can wait a short while.
>
> I'm curious to know exactly which other use-cases you foresee.
> It wouldn't be a bad idea to write some draft code to verify
> that this mechanism will work conveniently for them.

The SQL/JSON patches at [1] included fixes for some numeric and datetime
conversion functions as well as various input functions, so that's a
fairly immediate need. More generally, I can see uses for error-free
casts, something like, say

    CAST(foo AS bar ON ERROR blurfl)

cheers

andrew

[1] https://www.postgresql.org/message-id/f54ebd2b-0e67-d1c6-4ff7-5d732492d1a0%40postgrespro.ru

--
Andrew Dunstan
EDB: https://www.enterprisedb.com
Re: Questions regarding distinct operation implementation
On 12/4/22 14:34, Ankit Kumar Pandey wrote:
> On 04/12/22 02:27, David Rowley wrote:
>> If you were to limit this to only working with the query you mentioned
>> in [1], i.e PARTITION BY without an ORDER BY, then you only need to
>> aggregate once per partition per aggregate and you only need to do that
>> once all of the tuples for the partition are in the tuplestore. It
>> seems to me like you could add all the records to a tuplesort and then
>> sort by the DISTINCT column then aggregate everything except for
>> consecutive duplicates. You can then aggregate any other aggregates
>> which share the same DISTINCT column, otherwise, you just destroy the
>> tuplesort and rinse and repeat for the next aggregate.
>
> This looks like the way to go that would ensure the main use case of
> portability from Oracle.

The goal should not be portability from Oracle, but adherence to the
standard.

--
Vik Fearing
Re: Error-safe user functions
On 12/4/22 17:21, Andrew Dunstan wrote:
> More generally, I can see uses for error-free casts, something like, say
>
>     CAST(foo AS bar ON ERROR blurfl)

What I am proposing for inclusion in the standard is basically the same as
what JSON does:

<cast specification> ::=
    CAST <left paren> <cast operand> AS <cast target>
        [ FORMAT <cast template> ]
        [ <cast error behavior> ON ERROR ]
    <right paren>

<cast error behavior> ::=
    ERROR
  | NULL
  | DEFAULT <value expression>

Once/If I get that in, I will be pushing to get that syntax in postgres as
well.
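If adopted, usage might look something like this (hypothetical examples;
no such CAST syntax exists in PostgreSQL today):

    -- Hypothetical, assuming the proposed <cast error behavior> above:
    SELECT CAST('not a number' AS integer NULL ON ERROR);       -- NULL
    SELECT CAST('not a number' AS integer DEFAULT 0 ON ERROR);  -- 0
    SELECT CAST('not a number' AS integer ERROR ON ERROR);      -- today's error

--
Vik Fearing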
Re: Questions regarding distinct operation implementation
On 04/12/22 22:25, Vik Fearing wrote:
> On 12/4/22 14:34, Ankit Kumar Pandey wrote:
>> This looks like the way to go that would ensure the main use case of
>> portability from Oracle.
>
> The goal should not be portability from Oracle, but adherence to the
> standard.

Yes, Vik. You are right. Wrong remark from my side.

--
Regards,
Ankit Kumar Pandey
Re: Generate pg_stat_get_* functions with Macros
On Sun, Dec 04, 2022 at 06:07:37AM +0100, Drouvot, Bertrand wrote:
> On 12/3/22 9:16 PM, Nathan Bossart wrote:
>> Thanks. I editorialized a bit in the attached v3. I'm not sure that my
>> proposed names for the macros are actually an improvement. WDYT?
>
> Thanks! I do prefer the macro definition ordering that you're proposing
> (it makes pgstatfuncs.c "easier" to read).
>
> As for the names, I think it's better to replace "TAB" with "REL" (like
> in the attached v4): the reason is that those macros will be used in [1]
> for both tables and indexes stats (and so we'd have to replace "TAB"
> with "REL" in [1]). Having "REL" already in place reduces the changes
> that will be needed in [1].

Alright. I marked this as ready-for-committer.
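For readers without the patch at hand, the macros being discussed are of
roughly this shape (a sketch only; the exact bodies and names are in the
attached v4):

/*
 * Expands to the whole pg_stat_get_* function for one int64 counter in a
 * relation's stats entry; e.g. PG_STAT_GET_RELENTRY_INT64(tuples_inserted)
 * defines pg_stat_get_tuples_inserted().
 */
#define PG_STAT_GET_RELENTRY_INT64(stat)                            \
Datum                                                               \
CppConcat(pg_stat_get_,stat)(PG_FUNCTION_ARGS)                      \
{                                                                   \
    Oid         relid = PG_GETARG_OID(0);                           \
    int64       result;                                             \
    PgStat_StatTabEntry *tabentry;                                  \
                                                                    \
    if ((tabentry = pgstat_fetch_stat_tabentry(relid)) == NULL)     \
        result = 0;                                                 \
    else                                                            \
        result = (int64) (tabentry->stat);                          \
                                                                    \
    PG_RETURN_INT64(result);                                        \
}

PG_STAT_GET_RELENTRY_INT64(tuples_inserted)

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com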
Re: CREATE INDEX CONCURRENTLY on partitioned index
On Sat, Dec 03, 2022 at 07:13:30PM +0400, Ilya Gladyshev wrote:
> Hi,
>
> Thank you Justin and Alexander for working on this. I have reviewed and
> tested the latest patch, it works well; the problems mentioned
> previously are all fixed. I like the idea of sharing code of reindex
> and index, but I have noticed some peculiarities as a user.
>
> The reporting is somewhat confusing as it switches to reporting for
> reindex concurrently while building child indexes; this should be fixed
> with the simple patch I have attached. Another thing that I have
> noticed is that REINDEX, which is used under the hood, creates new
> indexes with suffix _ccnew, and if the index building fails, the
> indexes that could not be built will have the name with _ccnew suffix.
> This can actually be seen in your test:
>
> ERROR: could not create unique index "idxpart2_a_idx2_ccnew"
>
> I find it quite confusing and I don't think that this is the expected
> behavior (if it is, I think it should be documented, like it is for
> REINDEX). As an example of problems that it might entail, DROP INDEX
> will not drop all the invalid indexes in the inheritance tree, because
> it will leave _ccnew indexes in place, which is ok for reindex
> concurrently, but that's not how C-I-C works now. I think that fixing
> this problem requires some heavy code rewrite and I'm not quite sure

This behavior is fixed. I re-factored and re-implemented to use
DefineIndex() for building indexes concurrently rather than reindexing.
That makes the patch smaller, actually, and has the added benefit of
splitting off the "Concurrently" part of DefineIndex() into a separate
function.

This currently handles partitions with a loop around the whole CIC
implementation, which means that things like WaitForLockers() happen once
for each index, the same as REINDEX CONCURRENTLY on a partitioned table.
Contrast that with ReindexRelationConcurrently(), which handles all the
indexes on a table in one pass by looping around indexes within each
phase.

BTW, it causes the patch to fail to apply in cfbot when you send an
additional (002) supplementary patch without including the original (001)
patch. You can name it *.txt to avoid the issue.
https://wiki.postgresql.org/wiki/Cfbot#Which_attachments_are_considered_to_be_patches.3F

Thanks for looking.

--
Justin

From e25b15173f4ce939efa54426e369b6996129ff59 Mon Sep 17 00:00:00 2001
From: Justin Pryzby
Date: Sat, 6 Jun 2020 17:42:23 -0500
Subject: [PATCH] Allow CREATE INDEX CONCURRENTLY on partitioned table

---
 doc/src/sgml/ddl.sgml                  |   4 +-
 doc/src/sgml/ref/create_index.sgml     |   9 --
 src/backend/commands/indexcmds.c       | 172 +
 src/test/regress/expected/indexing.out | 127 +-
 src/test/regress/sql/indexing.sql      |  26 +++-
 5 files changed, 268 insertions(+), 70 deletions(-)

diff --git a/doc/src/sgml/ddl.sgml b/doc/src/sgml/ddl.sgml
index 38618de01c5..cd72b455447 100644
--- a/doc/src/sgml/ddl.sgml
+++ b/doc/src/sgml/ddl.sgml
@@ -4163,9 +4163,7 @@ ALTER TABLE measurement ATTACH PARTITION measurement_y2008m02
 so that they are applied automatically to the entire hierarchy. This is
 very convenient, as not only will the existing partitions become indexed, but
-also any partitions that are created in the future will. One limitation is
-that it's not possible to use the CONCURRENTLY
-qualifier when creating such a partitioned index. To avoid long lock
+also any partitions that are created in the future will. To avoid long lock
 times, it is possible to use CREATE INDEX ON ONLY
 the partitioned table; such an index is marked invalid, and the partitions
 do not get the index applied automatically. The indexes on partitions can
diff --git a/doc/src/sgml/ref/create_index.sgml b/doc/src/sgml/ref/create_index.sgml
index 40986aa502f..fc8cda655f0 100644
--- a/doc/src/sgml/ref/create_index.sgml
+++ b/doc/src/sgml/ref/create_index.sgml
@@ -692,15 +692,6 @@ Indexes:
 cannot.
 
-
-Concurrent builds for indexes on partitioned tables are currently not
-supported. However, you may concurrently build the index on each
-partition individually and then finally create the partitioned index
-non-concurrently in order to reduce the time where writes to the
-partitioned table will be locked out. In this case, building the
-partitioned index is a metadata only operation.
-
diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index b5b860c3abf..cfab45b9992 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -92,6 +92,11 @@ static char *ChooseIndexName(const char *tabname, Oid namespaceId,
 							 bool primary, bool isconstraint);
 static char *ChooseIndexNameAddition(List *colnames);
 static List *ChooseIndexColumnNames(List *indexElems);
+static void DefineIndexConcurrentInternal(Oid relationId,
+										  Oid
Re: [PATCH] Add .idea to gitignore for JetBrains CLion
> On 4 Dec 2022, at 16:35, Tom Lane wrote:
>
> Sayyid Ali Sajjad Rizavi writes:
>> +**/.idea
>
> Our policy is that the in-tree .gitignore files should only hide
> files that are build artifacts of standard build processes.
> Something like this belongs in your personal ~/.gitexclude,
> instead.

Since this comes up every now and again, I wonder if it's worth
documenting this in our .gitignore along the lines of:

--- a/.gitignore
+++ b/.gitignore
@@ -1,3 +1,7 @@
+# This contains ignores for build artifacts from standard builds,
+# auxiliary files from local workflows should be ignored locally
+# with $GIT_DIR/info/exclude
+

> (BTW, perhaps we should remove the entries targeting ".sl"
> extensions? AFAIK that was only for HP-UX, which is now
> desupported.)

+1. Grepping through the .gitignores in the tree didn't reveal anything
else that seemed to have outlived its usefulness.

--
Daniel Gustafsson
https://vmware.com/
Re: [PATCH] Add .idea to gitignore for JetBrains CLion
Daniel Gustafsson writes:
>> On 4 Dec 2022, at 16:35, Tom Lane wrote:
>> Our policy is that the in-tree .gitignore files should only hide
>> files that are build artifacts of standard build processes.
>> Something like this belongs in your personal ~/.gitexclude,
>> instead.
>
> Since this comes up every now and again, I wonder if it's worth
> documenting this in our .gitignore along the lines of:

Good idea.

>> (BTW, perhaps we should remove the entries targeting ".sl"
>> extensions? AFAIK that was only for HP-UX, which is now
>> desupported.)
>
> +1. Grepping through the .gitignores in the tree didn't reveal anything
> else that seemed to have outlived its usefulness.

I'll make it so. Thanks for checking.

			regards, tom lane
Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?)
On Tue, Nov 29, 2022 at 5:13 PM Melanie Plageman wrote:
> Thanks for the review, Maciek!
>
> I've attached a new version 39 of the patch which addresses your docs
> feedback from this email as well as docs feedback from Andres in [1] and
> Justin in [2].

This looks great! Just a couple of minor comments.

> You are right: reused is a normal, expected part of strategy
> execution. And you are correct: the idea behind reusing existing
> strategy buffers instead of taking buffers off the freelist is to leave
> those buffers for blocks that we might expect to be accessed more than
> once.
>
> In practice, however, if you happen to not be using many shared buffers,
> and then do a large COPY, for example, you will end up doing a bunch of
> writes (in order to reuse the strategy buffers) that you perhaps didn't
> need to do at that time had you leveraged the freelist. I think the
> decision about which tradeoff to make is quite contentious, though.

Thanks for the explanation--that makes sense.

> On Mon, Nov 7, 2022 at 1:26 PM Maciek Sakrejda wrote:
>> Alternately, what do you think about pulling equivalencies to existing
>> views out of the main column descriptions, and adding them after the
>> main table as a sort of footnote? Most view docs don't have anything
>> like that, but pg_stat_replication does and it might be a good pattern
>> to follow.
>>
>> Thoughts?
>
> Thanks for including a patch!
> In the attached v39, I've taken your suggestion of flattening some of
> the lists and done some rewording as well. I have also moved the note
> about equivalence with pg_stat_statements columns to the
> pg_stat_statements documentation. The result is quite a bit different
> than what I had before, so I would be interested to hear your thoughts.
>
> My concern with the blue "note" section like you mentioned is that it
> would be harder to read the lists of backend types than it was in the
> tabular format.

Oh, I wasn't thinking of doing a separate "note": just additional
paragraphs of text after the table (like what pg_stat_replication has
before its "note", or the brief comment after the pg_stat_archiver
table). But I think the updated docs work also.

+       The context or location of an IO operation.

maybe "...of an IO operation:" (colon) instead?

+       default. Future values could include those derived from
+       XLOG_BLCKSZ, once WAL IO is tracked in this view, and
+       constant multipliers once non-block-oriented IO (e.g. temporary
+       file IO) is tracked here.

I know Lukas had commented that we should communicate that the goal is to
eventually provide relatively comprehensive I/O stats in this view (you do
that in the view description and I think that works), and this is sort of
along those lines, but I think speculative documentation like this is not
all that helpful. I'd drop this last sentence. Just my two cents.

+       evicted in io_context
+       buffer pool and io_object
+       temp relation counts the number of times a block of
+       data from an existing local buffer was evicted in order to replace
+       it with another block, also in local buffers.

Doesn't this follow from the first sentence of the column description? I
think we could drop this, no?

Otherwise, the docs look good to me.

Thanks,
Maciek
Re: Improve performance of pg_strtointNN functions
On Sun, 4 Dec 2022 at 22:53, Dean Rasheed wrote:
> Ah, I see that you changed the overflow test, and I realise that I
> forgot to answer your question about why I wrote that as 1 - INT_MIN /
> 10 over on the other thread.
>
> The reason is that we need to detect whether tmp * base will exceed
> -INT_MIN, not INT_MAX, since we're accumulating the absolute value of
> a signed integer.

I think I'd been too focused on the simplicity of that expression and also
the base 10 part. I saw that everything worked in base 10 and failed to
give enough forward thought to other bases. I now see that it was
wrong-headed to code it the way I had it. Thanks for pointing this out.

I've just pushed a fix.

David
Re: restoring user id and SecContext before logging error in ri_PlanCheck
On Wed, Nov 02, 2022 at 08:00:58AM -0700, Zhihong Yu wrote:
> Looking down in ri_PerformCheck(), I see there may be a case where an
> error from SPI_execute_snapshot() would skip restoring the UID.
>
> @@ -2405,13 +2405,19 @@ ri_PerformCheck(const RI_ConstraintInfo *riinfo,
>  						   SECURITY_NOFORCE_RLS);
>
>  	/* Finally we can run the query. */
> -	spi_result = SPI_execute_snapshot(qplan,
> -									  vals, nulls,
> -									  test_snapshot, crosscheck_snapshot,
> -									  false, false, limit);
> -
> -	/* Restore UID and security context */
> -	SetUserIdAndSecContext(save_userid, save_sec_context);
> +	PG_TRY();
> +	{
> +		spi_result = SPI_execute_snapshot(qplan,
> +										  vals, nulls,
> +										  test_snapshot, crosscheck_snapshot,
> +										  false, false, limit);
> +	}
> +	PG_FINALLY();
> +	{
> +		/* Restore UID and security context */
> +		SetUserIdAndSecContext(save_userid, save_sec_context);
> +	}
> +	PG_END_TRY();

After an error, AbortSubTransaction() or AbortTransaction() will restore
userid and sec_context. That makes such changes unnecessary.
Re: Add LSN along with offset to error messages reported for WAL file read/write/validate header failures
On Sat, Dec 03, 2022 at 09:07:38AM +0530, Bharath Rupireddy wrote:
> Yes, I removed those changes. Even if someone sees an offset of a
> record within a WAL file elsewhere, they have the new utility function
> (0002) pg_walfile_offset_lsn().
>
> I'm attaching the v4 patch set for further review.

+ * Compute an LSN and timline ID given a WAL file name and decimal byte offset.

s/timline/timeline/, exactly two times.

+	Datum		values[2] = {0};
+	bool		isnull[2] = {0};

I would not hardcode the number of attributes of the record returned.

Regarding pg_walfile_offset_lsn(), I am not sure that this is the best
move we can do, as it is possible to compute an LSN from 0/0 with just a
segment number, say:

    select '0/0'::pg_lsn + :segno * setting::int + :offset
    from pg_settings where name = 'wal_segment_size';

+	resultTupleDesc = CreateTemplateTupleDesc(2);
+	TupleDescInitEntry(resultTupleDesc, (AttrNumber) 1, "lsn",
+					   PG_LSNOID, -1, 0);
+	TupleDescInitEntry(resultTupleDesc, (AttrNumber) 2, "timeline_id",
+					   INT4OID, -1, 0);

Let's use get_call_result_type() to get the TupleDesc and to avoid a
duplication between pg_proc.dat and this code.

Hence I would tend to let XLogFromFileName do the job, while having a SQL
function that is just a thin wrapper around it that returns the segment
TLI and its number, leaving the offset out of the equation as well as this
new XLogIdFromFileName().

--
Michael
Re: Error-safe user functions
On Sun, Dec 04, 2022 at 06:01:33PM +0100, Vik Fearing wrote:
> Once/If I get that in, I will be pushing to get that syntax in postgres
> as well.

If I may ask, how long would it take to know if this grammar would be
integrated in the standard or not?

--
Michael
Re: Improve tab completion for ALTER EXTENSION ADD/DROP
On Sat, Dec 03, 2022 at 05:34:57PM +, Matheus Alcantara wrote:
> I've tested your patch on current master and it seems to be working
> properly.
>
> I'm starting to review some patches here; let's see what more experienced
> hackers have to say, but as far as I can tell it is working as expected.

+	/* ALTER EXTENSION ADD|DROP */
+	else if (Matches("ALTER", "EXTENSION", MatchAny, "ADD|DROP"))
+		COMPLETE_WITH("ACCESS METHOD", "AGGREGATE", "CAST", "COLLATION",
+					  "CONVERSION", "DOMAIN", "EVENT TRIGGER", "FOREIGN",
+					  "FUNCTION", "MATERIALIZED VIEW", "OPERATOR",
+					  "PROCEDURAL LANGUAGE", "PROCEDURE", "LANGUAGE",
+					  "ROUTINE", "SCHEMA", "SEQUENCE", "SERVER", "TABLE",
+					  "TEXT SEARCH", "TRANSFORM FOR", "TYPE", "VIEW");
+
+	/* ALTER EXTENSION ADD|DROP FOREIGN */
+	else if (Matches("ALTER", "EXTENSION", MatchAny, "ADD|DROP", "FOREIGN"))
+		COMPLETE_WITH("DATA WRAPPER", "TABLE");

The DROP could be matched with the objects that are actually part of the
so-said extension?

--
Michael
doc: add missing "id" attributes to extension packaging page
Hi

On this page:

    https://www.postgresql.org/docs/current/extend-extensions.html

three of the sections are missing an "id" attribute; the attached patch
adds these. Noticed when trying to create a stable link to one of the
affected sections.

Regards

Ian Barwick

From c0dec489ec5b088d7e16b814aef9750ff271c606 Mon Sep 17 00:00:00 2001
From: Ian Barwick
Date: Mon, 5 Dec 2022 10:41:05 +0900
Subject: [PATCH v1] doc: add "id" attributes to extension documentation

Some of the sections here:

    https://www.postgresql.org/docs/current/extend-extensions.html

were missing "id" attributes.
---
 doc/src/sgml/extend.sgml | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/doc/src/sgml/extend.sgml b/doc/src/sgml/extend.sgml
index 46e873a166..ba89cf5a28 100644
--- a/doc/src/sgml/extend.sgml
+++ b/doc/src/sgml/extend.sgml
@@ -628,7 +628,7 @@ RETURNS anycompatible AS ...
 dropping the whole extension.
-
+
 Extension Files
@@ -1063,7 +1063,7 @@ SELECT pg_catalog.pg_extension_config_dump('my_config', 'WHERE NOT standard_entr
-
+
 Extension Updates
@@ -1144,7 +1144,7 @@ SELECT * FROM pg_extension_update_paths('extension_name
-
+
 Installing Extensions Using Update Scripts

base-commit: 71cb84ec69a38444c48bd8d3b5451b2da157848b
--
2.31.1
RE: Partial aggregates pushdown
Hi Mr. Pyhalov.

> Attaching minor fixes. I haven't proof-read all comments (but perhaps,
> they need attention from some native speaker).

Thank you. I fixed these according to your patch. And I have proof-read
all comments and messages.

> Tested it with queries from
> https://github.com/swarm64/s64da-benchmark-toolkit, works as expected.

Thank you for the additional tests.

Sincerely yours,
Yuuki Fujii

--
Yuuki Fujii
Information Technology R&D Center Mitsubishi Electric Corporation

Attachment: 0001-Partial-aggregates-push-down-v16.patch
Re: docs: add missing id elements for developer GUCs
On Sat, Dec 03, 2022 at 03:58:19PM +0900, Ian Lawrence Barwick wrote:
> A few of the developer option GUCs were missing the "id" attribute
> in their markup, making it impossible to link to them directly.

True enough that the other developer GUCs do that, so applied and
backpatched down to 11.

--
Michael
Re: generic plans and "initial" pruning
On Fri, Dec 2, 2022 at 7:40 PM Amit Langote wrote:
> On Thu, Dec 1, 2022 at 9:43 PM Amit Langote wrote:
>> On Thu, Dec 1, 2022 at 8:21 PM Alvaro Herrera wrote:
>>> On 2022-Dec-01, Amit Langote wrote:
>>>> Hmm, how about keeping the [Merge]Append's parent relation's RT index
>>>> in the PartitionPruneInfo and passing it down to
>>>> ExecInitPartitionPruning() from ExecInit[Merge]Append() for
>>>> cross-checking? Both Append and MergeAppend already have a
>>>> 'apprelids' field that we can save a copy of in the
>>>> PartitionPruneInfo. Tried that in the attached delta patch.
>>>
>>> Ah yeah, that sounds about what I was thinking. I've merged that in
>>> and pushed to github, which had a strange pg_upgrade failure on
>>> Windows mentioning log files that were not captured by the CI tooling.
>>> So I pushed another one trying to grab those files, in case it wasn't
>>> an one-off failure. It's running now:
>>> https://cirrus-ci.com/task/5857239638999040
>>>
>>> If all goes well with this run, I'll get this 0001 pushed.
>>
>> Thanks for pushing 0001.
>>
>> Rebased 0002 attached.
>
> Thought it might be good for PartitionPruneResult to also have
> root_parent_relids that matches with the corresponding
> PartitionPruneInfo. ExecInitPartitionPruning() does a sanity check
> that the root_parent_relids of a given pair of PartitionPrune{Info |
> Result} match.
>
> Posting the patch separately as the attached 0002, just in case you
> might think that the extra cross-checking would be an overkill.

Rebased over 92c4dafe1eed and fixed some factual mistakes in the comment
above ExecutorDoInitialPruning().

--
Thanks, Amit Langote
EDB: http://www.enterprisedb.com

Attachments:
    v27-0001-Optimize-AcquireExecutorLocks-by-locking-only-un.patch
    v27-0002-Add-root_parent_relids-to-PartitionPruneResult.patch
Re: Collation version tracking for macOS
On Tue, Nov 29, 2022 at 7:51 PM Jeff Davis wrote:
> On Sat, 2022-11-26 at 18:27 +1300, Thomas Munro wrote:
>> On Thu, Nov 24, 2022 at 5:48 PM Thomas Munro wrote:
>>> On Thu, Nov 24, 2022 at 3:07 PM Jeff Davis wrote:
>>>> I'd vote for 1 on the grounds that it's easier to document and
>>>> understand a single collation version, which comes straight from
>>>> ucol_getVersion(). This approach makes it a separate problem to
>>>> find the collation version among whatever libraries the admin can
>>>> provide; but adding some observability into the search should
>>>> mitigate any confusion.
>>>
>>> OK, it sounds like I should code that up next.
>>
>> Here's the first iteration.
>
> Thank you.

Thanks for the review. Responses further down. And thanks also for the
really interesting discussion about how the version numbers work (or in
some cases, don't work...), and practical packaging and linking problems.

To have a hope of making something happen for PG16, which I think means we
need a serious contender patch in the next few weeks, we really need to
make some decisions. I enjoyed trying out search-by-collversion, but it's
still not my favourite. On the ballot we have two main questions:

1. Should we commit to search-by-collversion, or one of the explicit
library version ideas, and if the latter, which?

2. Should we try to support being specific about minor versions (in
various different ways according to the choice made for #1)?

My tentative votes are:

1. I think we should seriously consider provider = ICU63. I still think
search-by-collversion is a little too magical, even though it clearly can
be made to work. Of the non-magical systems, I think encoding the choice
of library into the provider name would avoid the need to add a second
confusing "X_version" concept alongside our existing "X_version" columns
in catalogues and DDL syntax, while still making it super clear what is
going on. This would include adding DDL commands so you can do ALTER
DATABASE/COLLATION ... PROVIDER = ICU63 to make warnings go away.

2. I think we should ignore minor versions for now (other than reporting
them in the relevant introspection functions), but not make any choices
that would prevent us from changing our mind about that in a later
release. For example, having two levels of specificity ICU and ICU68 in
the libver-in-provider-name design wouldn't preclude us from adding
support for ICU68_2 later.

I haven't actually tried that design out in code yet, but I'm willing to
try to code that up very soon. So no new patch from me yet. Does anyone
else want to express a view?

> Proposed changes:
>
> * I attached a first pass of some documentation.

Thanks. Looks pretty good, and much of it would stay if we changed to one
of the other models.

> * Should be another GUC to turn WARNING into an ERROR. Useful at least
> for testing; perhaps too dangerous for production.

OK, will add that into the next version.

> * The libraries should be loaded in a more deliberate order. The "*"
> should be expanded in a descending fashion so that later versions are
> preferred.

Yeah, I agree.

> * GUCs should be validated.

Will do.

> * Should validate that loaded library has expected version.

Will do.

> * We need to revise or remove pg_collation_actual_version() and
> pg_database_collation_actual_version().

I never liked that use of the word "actual"...

> * The GUCs are PGC_SUSET, but don't take effect because
> icu_library_list_fully_loaded is never reset.

True. Just rough edges because I was trying to prototype
search-by-collversion fast. Will consider this for the next version.

> * The extra collations you're adding at bootstrap time are named based
> on the library major version. I suppose it might be more "proper" to
> name them based on the collation version, but that would be more
> verbose, so I won't advocate for that. Just pointing it out.

Ah, yes, the ones with names like "en-US-x-icu68". I agree that made a
little less sense in the search-by-collversion patch. Maybe we wouldn't
want these at all in the search-by-collversion model. But I think they're
perfect the way they are in the provider = ICU68 model. The other idea I
considered ages ago was that we could use namespaces: you could
"icu68.en-US", or just "en-US" in some contexts to get what your search
path sees, but that all seemed a little too cute and not really like
anything else we do with system-created catalogues, so I gave that idea
up.

> * It looks hard (or impossible) to mix multiple ICU libraries with the
> same major version and different minor versions. That's because, e.g.,
> libicui18n.so.63.1 links against libicuuc.63 and libicudata.63, and
> when you install ICU 63.2, those dependencies get clobbered with the
> 63.2 versions. That fails the sanity check I proposed above about the
> library version number matching the requested library version number.
> And it also just seems wrong.
Re: Add LSN along with offset to error messages reported for WAL file read/write/validate header failures
On Mon, Dec 5, 2022 at 6:34 AM Michael Paquier wrote:
> Regarding pg_walfile_offset_lsn(), I am not sure that this is the best
> move we can do, as it is possible to compute an LSN from 0/0 with just
> a segment number, say:
>
>     select '0/0'::pg_lsn + :segno * setting::int + :offset
>     from pg_settings where name = 'wal_segment_size';

Nice.

> +	resultTupleDesc = CreateTemplateTupleDesc(2);
> +	TupleDescInitEntry(resultTupleDesc, (AttrNumber) 1, "lsn",
> +					   PG_LSNOID, -1, 0);
> +	TupleDescInitEntry(resultTupleDesc, (AttrNumber) 2, "timeline_id",
> +					   INT4OID, -1, 0);
>
> Let's use get_call_result_type() to get the TupleDesc and to avoid a
> duplication between pg_proc.dat and this code.
>
> Hence I would tend to let XLogFromFileName do the job, while having a
> SQL function that is just a thin wrapper around it that returns the
> segment TLI and its number, leaving the offset out of the equation as
> well as this new XLogIdFromFileName().

So, a SQL function pg_dissect_walfile_name (or some other better name)
given a WAL file name returns the tli and seg number. Then
pg_walfile_offset_lsn can just be a SQL-defined function (in
system_functions.sql) using this new function and pg_settings. If this
understanding is correct, it looks good to me at this point. That said,
let's also hear from others.
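If it helps, a rough sketch of what that SQL-defined wrapper could look
like (hypothetical names and output columns; the actual function name and
signature are still under discussion):

    -- Assumes a C function pg_dissect_walfile_name(text) returning
    -- (timeline_id int4, segment_number numeric), per the idea above.
    CREATE FUNCTION pg_walfile_offset_lsn(file_name text, file_offset int4)
    RETURNS pg_lsn
    LANGUAGE SQL
    AS $$
        SELECT '0/0'::pg_lsn + d.segment_number * s.setting::int + file_offset
        FROM pg_dissect_walfile_name(file_name) AS d,
             pg_settings s
        WHERE s.name = 'wal_segment_size';
    $$;

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com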
Re: Avoid streaming the transaction which are skipped (in corner cases)
On Sun, Dec 4, 2022 at 5:14 PM houzj.f...@fujitsu.com wrote:
> On Saturday, December 3, 2022 7:37 PM Amit Kapila wrote:
>> Apart from the above, I have slightly adjusted the comments in the
>> attached. Do let me know what you think of the attached.
>
> Thanks for updating the patch. It looks good to me.

I feel the function name ReorderBufferLargestTopTXN() is slightly
misleading because it also checks some of the streaming properties (like
whether the TXN has partial changes and whether it contains any streamable
change). Shall we rename it to ReorderBufferLargestStreamableTopTXN() or
something like that?

The other point to consider is whether we need to have a test case for
this patch. I think before this patch if the size of DDL changes in a
transaction exceeds logical_decoding_work_mem, the empty streams will be
output in the plugin but after this patch, there won't be any such stream.

--
With Regards,
Amit Kapila.
Re: Allow placeholders in ALTER ROLE w/o superuser
On Thu, Dec 1, 2022 at 6:14 AM Alexander Korotkov wrote:
> On Wed, Nov 23, 2022 at 1:53 AM Steve Chavez wrote:
>> So from my side this all looks good!
>
> Thank you for your feedback.
>
> The next revision of the patch is attached. It contains code
> improvements, comments and documentation. I'm going to also write some
> tests. pg_db_role_setting doesn't seem to be well-covered with tests. I
> will probably need to write a new module in src/test/modules to check
> how placeholders interact with dynamically defined GUCs.

Another revision of the patch is attached. It's fixed now that USER SET
values can't be used for PGC_SUSET parameters. Tests are added. That
required a new module, test_pg_db_role_setting, to check dynamically
defined GUCs.
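For anyone who hasn't followed the thread, a hypothetical example of the
feature (the role and GUC names here are invented):

    -- A placeholder GUC set per-role; the USER SET flag records that the
    -- value was set by the role itself, so it is applied at session start
    -- with the target role's privileges rather than requiring superuser
    -- validation.
    ALTER ROLE app_user
        SET my_app.api_endpoint = 'https://example.org' USER SET;

--
Regards,
Alexander Korotkov

Attachment: 0001-USER-SET-parameters-for-pg_db_role_setting-v3.patch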
Re: Avoid streaming the transaction which are skipped (in corner cases)
On Mon, Dec 5, 2022 at 8:59 AM Amit Kapila wrote:
> I feel the function name ReorderBufferLargestTopTXN() is slightly
> misleading because it also checks some of the streaming properties
> (like whether the TXN has partial changes and whether it contains any
> streamable change). Shall we rename it to
> ReorderBufferLargestStreamableTopTXN() or something like that?

Yes, that makes sense.

> The other point to consider is whether we need to have a test case for
> this patch. I think before this patch if the size of DDL changes in a
> transaction exceeds logical_decoding_work_mem, the empty streams will
> be output in the plugin but after this patch, there won't be any such
> stream.

Yes, we can do that. I will make these two changes.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
Re: Optimize common expressions in projection evaluation
> the need for this code seems not that great. But as to the code itself
> I'm unable to properly judge.

A simplified version of my use case is like this:

    CREATE FOREIGN TABLE ft(rawdata json);
    INSERT INTO tbl SELECT (convert_func(rawdata)).* FROM ft;

We have a foreign data source that can emit json data in different
formats, and we need a different convert_func to extract the actual fields
out of each. The client knows which function to use, but as each json
value may have hundreds of columns, the performance is very poor.

Best regards,
Peifeng Qiu
Re: Optimize common expressions in projection evaluation
On Sun, Dec 4, 2022 at 9:00 PM Peifeng Qiu wrote:
>> the need for this code seems not that great. But as to the code itself
>> I'm unable to properly judge.
>
> A simplified version of my use case is like this:
>
>     CREATE FOREIGN TABLE ft(rawdata json);
>     INSERT INTO tbl SELECT (convert_func(rawdata)).* FROM ft;

Which is properly written as the following, using lateral, which also
avoids the problem you describe:

    INSERT INTO tbl
    SELECT func_call.*
    FROM ft
    JOIN LATERAL convert_func(ft.rawdata) AS func_call ON true;

David J.
Re: Bug in row_number() optimization
On Fri, 2 Dec 2022 at 00:21, Sergey Shinderuk wrote:
> Maybe I'm missing something, but the previous call to spool_tuples()
> might have read extra tuples (if the tuplestore spilled to disk), and
> after switching to WINDOWAGG_PASSTHROUGH_STRICT mode we nevertheless
> would loop through these extra tuples and call ExecProject if only to
> increment winstate->currentpos.

The tuples which are spooled in the WindowAgg node are the ones from the
WindowAgg's subnode. Since these don't contain the results of the
WindowFunc, then I don't think there's any issue with what's stored in any
of the spooled tuples. What matters is what we pass along to the node
that's reading from the WindowAgg.

If we NULL out the memory where we store the WindowFunc (and maybe an
Aggref) results then the ExecProject in ExecWindowAgg() will no longer
fill the WindowAgg's output slot with the address of free'd memory (or
some stale byval value which has lingered for byval return type
WindowFuncs).

Since the patch I sent sets the context's ecxt_aggnulls to true, it means
that when we do the ExecProject(), the EEOP_WINDOW_FUNC in ExecInterpExpr
(or the JIT equivalent) will put an SQL NULL in the *output* slot for the
WindowAgg node. The same is true for EEOP_AGGREFs as the WindowAgg node
that we are running in WINDOWAGG_PASSTHROUGH mode could also contain
normal aggregate functions, not just WindowFuncs.
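For context, the run-condition optimization at issue applies to queries of
this general shape (an illustrative example, not taken from the thread):

    -- The outer "rn <= 10" qual becomes the WindowAgg's run condition,
    -- so once it turns false the node switches to a pass-through mode
    -- and stops evaluating row_number() for the remaining rows.
    SELECT *
    FROM (SELECT x, row_number() OVER (ORDER BY x) AS rn
          FROM tab) AS t
    WHERE rn <= 10;

David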
Re: Add LSN along with offset to error messages reported for WAL file read/write/validate header failures
On Mon, Dec 05, 2022 at 08:48:25AM +0530, Bharath Rupireddy wrote:
> So, a SQL function pg_dissect_walfile_name (or some other better name)
> given a WAL file name returns the tli and seg number. Then
> pg_walfile_offset_lsn can just be a SQL-defined function (in
> system_functions.sql) using this new function and pg_settings. If this
> understanding is correct, it looks good to me at this point.

I would do without the SQL function that looks at pg_settings, FWIW.

> That said, let's also hear from others.

Sure. Perhaps my set of suggestions will not get the majority, though..

--
Michael
Re: Bug in row_number() optimization
On Mon, 28 Nov 2022 at 22:59, Sergey Shinderuk wrote:
> On 28.11.2022 03:23, David Rowley wrote:
>> On Sat, 26 Nov 2022 at 05:19, Tom Lane wrote:
>>> It's pretty unlikely that this would work during an actual index
>>> scan. I'm fairly sure that btree (and other index AMs) have
>>> hard-wired assumptions that index operators are strict.
>>
>> If we're worried about that then we could just restrict this
>> optimization to only work with strict quals.
>
> Not sure this is necessary if btree operators must be strict anyway.

I'd rather see the func_strict() test in there. You've already
demonstrated you can get wrong results with a non-strict operator. I'm not
disputing that it sounds like a broken operator class or not. I just want
to ensure we don't leave any holes open for this optimisation to return
incorrect results.

David
Re: Optimize common expressions in projection evaluation
Peifeng Qiu writes:
>> the need for this code seems not that great. But as to the code itself
>> I'm unable to properly judge.
>
> A simplified version of my use case is like this:
>
>     CREATE FOREIGN TABLE ft(rawdata json);
>     INSERT INTO tbl SELECT (convert_func(rawdata)).* FROM ft;

It might be worth noting that the code as we got it from Berkeley could do
this scenario without multiple evaluations of convert_func(). Memory is
foggy, but I believe it involved essentially a two-level targetlist.
Unfortunately, the scheme was impossibly baroque and buggy, so we
eventually ripped it out altogether in favor of the multiple-evaluation
behavior you see today. I think that commit 62e29fe2e might have been what
ripped it out, but I'm not quite sure. It's about the right time-frame,
anyway.

I mention this because trying to reverse-engineer this situation in
execExpr seems seriously ugly and inefficient, even assuming you can make
it non-buggy. The right solution has to involve never expanding foo().*
into duplicate function calls in the first place, which is the way it used
to be. Maybe if you dug around in those twenty-year-old changes you could
get some inspiration.

I tend to agree with David that LATERAL offers a good-enough solution in
most cases ... but it is annoying that we accept this syntax and then
pessimize it.

			regards, tom lane
Re: Optimize common expressions in projection evaluation
On Mon, Dec 5, 2022 at 5:28 AM Tom Lane wrote:
> I tend to agree with David that LATERAL offers a good-enough
> solution in most cases ... but it is annoying that we accept
> this syntax and then pessimize it.

I agree. There is a perfect solution, like "don't use .*", but on the
other hand any supported syntax should be optimized well, or we should
raise some warning about the negative performance impact. Today there are
a lot of baroque technologies in the stack, so it is hard to remember all
good practices and it is hard for newbies to absorb this knowledge. We
should reduce possible performance traps when it is possible.

Regards

Pavel
Re: Bug in row_number() optimization
On Thu, 1 Dec 2022 at 21:18, Richard Guo wrote:
>> +    if (!func_strict(opexpr->opfuncid))
>> +        return false;
>>
>> Should return true instead?
>
> Yeah, you're right.  This should be a thinko.

Yeah, oops. That's wrong. I've adjusted that in the attached.

I'm keen to move along and push the fix for this bug. If there are no
objections to the method in the attached, and to also adding the
restriction that limits the optimization to strict OpExprs only, then
I'm going to push this, likely about 24 hours from now.

David

diff --git a/src/backend/executor/nodeWindowAgg.c b/src/backend/executor/nodeWindowAgg.c
index 81ba024bba..110c594edd 100644
--- a/src/backend/executor/nodeWindowAgg.c
+++ b/src/backend/executor/nodeWindowAgg.c
@@ -2300,7 +2300,27 @@ ExecWindowAgg(PlanState *pstate)
 					continue;
 				}
 				else
+				{
 					winstate->status = WINDOWAGG_PASSTHROUGH;
+
+					/*
+					 * Otherwise, if we're not the top-window, we'd better
+					 * NULLify the aggregate results, else by-ref result
+					 * types will point to freed memory.  We can get away
+					 * without storing the final result of the WindowFunc
+					 * because in the planner we insisted that the
+					 * runcondition is strict.  Setting these to NULL will
+					 * ensure the top-level WindowAgg filter will always
+					 * filter out the rows produced in this WindowAgg when
+					 * not in WINDOWAGG_RUN state.
+					 */
+					numfuncs = winstate->numfuncs;
+					for (i = 0; i < numfuncs; i++)
+					{
+						econtext->ecxt_aggvalues[i] = (Datum) 0;
+						econtext->ecxt_aggnulls[i] = true;
+					}
+				}
 			}
 			else
 			{
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 03ee6dc832..595870ea30 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -2399,6 +2399,24 @@ check_and_push_window_quals(Query *subquery, RangeTblEntry *rte, Index rti,
 	if (list_length(opexpr->args) != 2)
 		return true;
 
+	/*
+	 * Currently, we restrict this optimization to strict OpExprs.  The reason
+	 * for this is that during execution, once the runcondition becomes false,
+	 * we stop evaluating WindowFuncs in WindowAgg nodes and since we're no
+	 * longer updating them, we NULLify the aggregate and window aggregate
+	 * results to prevent by-ref result typed window aggregates from pointing
+	 * to free'd memory for byref WindowFuncs and storing stale values for
+	 * byval WindowFuncs (remember that window functions such as row_number()
+	 * return a byref type on non-FLOAT8PASSBYVAL builds).  Upper-level
+	 * WindowAgg nodes may make reference to these results in their filter
+	 * clause and we can ensure the filter remains false by making sure the
+	 * operator is strict and by performing the NULLification in the executor
+	 * when the runcondition first becomes false.
+	 */
+	set_opfuncid(opexpr);
+	if (!func_strict(opexpr->opfuncid))
+		return true;
+
 	/*
 	 * Check for plain Vars that reference window functions in the subquery.
 	 * If we find any, we'll ask find_window_run_conditions() if 'opexpr' can
Re: Optimize common expressions in projection evaluation
Pavel Stehule writes:
> On Mon, 5 Dec 2022 at 5:28, Tom Lane wrote:
>> I tend to agree with David that LATERAL offers a good-enough
>> solution in most cases ... but it is annoying that we accept
>> this syntax and then pessimize it.

> I agree. The perfect solution is simply not to use .*, but on the other
> hand, any supported syntax should either be optimized well or raise some
> warning about its negative performance impact.

Yeah.  I wonder if we could get the parser to automatically transform

	SELECT (foo(t.x)).* FROM tab t

into

	SELECT f.* FROM tab t, LATERAL foo(t.x) f

There are probably cases where this doesn't work or changes the
semantics subtly, but I suspect it could be made to work in most
cases of practical interest.

			regards, tom lane
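(To make the multiple-evaluation behaviour concrete, here is a minimal
sketch using a volatile function; the sequence, type, and function names
are invented for illustration:)

    CREATE SEQUENCE seq;
    CREATE TYPE two_ints AS (a bigint, b bigint);
    CREATE FUNCTION f() RETURNS two_ints AS
      $$ SELECT nextval('seq'), nextval('seq') $$ LANGUAGE sql VOLATILE;

    SELECT (f()).*;         -- expands to (f()).a, (f()).b: two calls of
                            -- f(), so this can yield (1,4)
    SELECT g.* FROM f() g;  -- a single call: yields (1,2)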
Re: Optimize common expressions in projection evaluation
On Sun, Dec 4, 2022 at 9:37 PM Pavel Stehule wrote:
> On Mon, 5 Dec 2022 at 5:28, Tom Lane wrote:
>> Peifeng Qiu writes:
>> >> the need for this code seems not that great. But as to the code
>> >> itself I'm unable to properly judge.
>>
>> I mention this because trying to reverse-engineer this situation
>> in execExpr seems seriously ugly and inefficient, even assuming
>> you can make it non-buggy.  The right solution has to involve never
>> expanding foo().* into duplicate function calls in the first place,
>> which is the way it used to be.  Maybe if you dug around in those
>> twenty-year-old changes you could get some inspiration.
>>
>> I tend to agree with David that LATERAL offers a good-enough
>> solution in most cases ... but it is annoying that we accept
>> this syntax and then pessimize it.
>
> I agree. The perfect solution is simply not to use .*, but on the other
> hand, any supported syntax should either be optimized well or raise some
> warning about its negative performance impact.

Yeah, I phrased that poorly.  I agree that having this problem solved
would be beneficial to the project.  But, and this is probably a bit
unfair to the OP, given that Tom decided to implement LATERAL after
years of hearing complaints, I doubted this patch was going to be
acceptable.  The benefit/cost ratio of fixing this in code just doesn't
seem high enough to try at this point.  But I'd be happy to be proven
wrong.  And I readily admit both a lack of knowledge, and that as time
has passed maybe we've improved the foundations enough to implement
something here.

Otherwise, we can maybe improve the documentation.  There isn't any way
(that the project would accept anyway) to actually warn the user at
runtime.  If we go that route we should probably just disallow the
syntax outright.

David J.
Re: [PATCH] Add native windows on arm64 support
On Fri, Dec 02, 2022 at 11:09:15AM +0000, Niyas Sait wrote:
> I've attached a new revision of the patch (v5) and includes following
> changes,
>
> 1. Add support for meson build system
> 2. Extend MSVC scripts to handle ARM64 platform.
> 3. Add arm64 definition of spin_delay function.
> 4. Exclude arm_acle.h import with MSVC compiler.

Hmm.  There are still a few things that need some attention here:

- USE_SSE42_CRC32C_WITH_RUNTIME_CHECK should not be set for aarch64.

- This is missing updates for ICU.  Looking at the upstream code,
Build.Windows.ProjectConfiguration.props uses libARM64 and binARM64
for the output library and binary paths.

- This is missing updates for krb5.  For this case, I am seeing no
traces of packages for aarch64, so I guess that we could just fail
hard until someone cares enough to ping us about what to do here.

- There were zero changes in the docs, but we need to update at least
the section about the architectures supported for the 64-bit builds.

- Last comes OpenSSL, which supports amd64_arm64 as a build target
(see NOTES-WINDOWS.md), and the library names are libssl.lib and
libcrypto.lib.  Looking at
https://slproweb.com/products/Win32OpenSSL.html, there are
experimental builds for arm64 with OpenSSL 3.0.  Niyas or somebody
else, could you look at the contents of lib/VC/ and see what we could
rely on for the debug builds after installing this MSI?  We should
rely on something like lib/VC/sslcrypto64MD.lib or
lib/VC/sslcrypto32MD.lib, but for arm64.

With meson gaining in maturity, perhaps that's not the most urgent
thing, as we will likely remove src/tools/msvc/ soon, but I'd rather
get this right anyway, as much as I can, to avoid an incorrect state
in the tree at any point in its history.

-        USE_ARMV8_CRC32C_WITH_RUNTIME_CHECK => undef,
+        USE_ARMV8_CRC32C_WITH_RUNTIME_CHECK => $self->{platform} eq "ARM64" ? : 1 : undef,

Did you actually test this patch?  This won't work at all with Perl;
note the stray double colon after the question mark.

For now, please find attached an updated patch with all the fixes I
could come up with.
--
Michael

From 9cb858cfdb25c3a119a8b28321cf3218eac282c5 Mon Sep 17 00:00:00 2001
From: Michael Paquier
Date: Mon, 5 Dec 2022 14:09:29 +0900
Subject: [PATCH v6] Enable postgres native build for windows-arm64 platform

Following changes are included
- Extend MSVC scripts to handle ARM64 platform
- Add arm64 definition of spin_delay function
- Exclude arm_acle.h import for MSVC
- Add support for meson build
---
 src/include/storage/s_lock.h      | 10 +-
 src/port/pg_crc32c_armv8.c        |  2 ++
 doc/src/sgml/install-windows.sgml |  3 ++-
 meson.build                       | 33 +++
 src/tools/msvc/MSBuildProject.pm  | 16 +++
 src/tools/msvc/Mkvcbuild.pm       |  9 +++--
 src/tools/msvc/Solution.pm        | 30 ++--
 src/tools/msvc/gendef.pl          |  8
 8 files changed, 80 insertions(+), 31 deletions(-)

diff --git a/src/include/storage/s_lock.h b/src/include/storage/s_lock.h
index 8b19ab160f..ab6a6e0281 100644
--- a/src/include/storage/s_lock.h
+++ b/src/include/storage/s_lock.h
@@ -708,13 +708,21 @@ typedef LONG slock_t;
 #define SPIN_DELAY() spin_delay()
 
 /* If using Visual C++ on Win64, inline assembly is unavailable.
- * Use a _mm_pause intrinsic instead of rep nop.
+ * Use _mm_pause (x64) or __isb (arm64) intrinsic instead of rep nop.
  */
 #if defined(_WIN64)
 static __forceinline void
 spin_delay(void)
 {
+#ifdef _M_ARM64
+	/*
+	 * See spin_delay aarch64 inline assembly definition above for details
+	 * ref: https://learn.microsoft.com/en-us/cpp/intrinsics/arm64-intrinsics#BarrierRestrictions
+	 */
+	__isb(_ARM64_BARRIER_SY);
+#else
 	_mm_pause();
+#endif
 }
 #else
 static __forceinline void
diff --git a/src/port/pg_crc32c_armv8.c b/src/port/pg_crc32c_armv8.c
index 9e301f96f6..981718752f 100644
--- a/src/port/pg_crc32c_armv8.c
+++ b/src/port/pg_crc32c_armv8.c
@@ -14,7 +14,9 @@
  */
 #include "c.h"
 
+#ifndef _MSC_VER
 #include <arm_acle.h>
+#endif
 
 #include "port/pg_crc32c.h"
 
diff --git a/doc/src/sgml/install-windows.sgml b/doc/src/sgml/install-windows.sgml
index bbd4960e7b..3f865d7d3b 100644
--- a/doc/src/sgml/install-windows.sgml
+++ b/doc/src/sgml/install-windows.sgml
@@ -352,7 +352,8 @@ $ENV{MSBFLAGS}="/m";
   Special Considerations for 64-Bit Windows
 
-   PostgreSQL will only build for the x64 architecture on 64-bit Windows.
+   PostgreSQL will only build for the x64 and ARM64 architectures on 64-bit
+   Windows.
 
diff --git a/meson.build b/meson.build
index 725e10d815..e354ad7650 100644
--- a/meson.build
+++ b/meson.build
@@ -1944,7 +1944,13 @@ int main(void)
 
 elif host_cpu == 'arm' or host_cpu == 'aarch64'
 
-  prog = '''
+  if cc.get_id() == 'msvc'
+    cdata.set('USE_ARMV8_CRC32C', false)
+    cdata.set('USE_ARMV8_CRC32C_WITH_RUNTIME_CHECK', 1)
+    have_optimized_crc = true
+  else
+
+    prog = '''
 #include <arm_acle.h>
 
 int main(void)
@@ -1960,18 +1966,
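(Presumably the v5 line quoted above was meant to read as follows; this is
a guess at the intent, with only the stray colon removed, shown here as a
self-contained fragment:)

    my $platform = "ARM64";    # stand-in for $self->{platform}

    my %definitions = (
        USE_ARMV8_CRC32C_WITH_RUNTIME_CHECK =>
            ($platform eq "ARM64") ? 1 : undef,
    );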
Re: generic plans and "initial" pruning
On Mon, Dec 5, 2022 at 12:00 PM Amit Langote wrote:
> On Fri, Dec 2, 2022 at 7:40 PM Amit Langote wrote:
> > Thought it might be good for PartitionPruneResult to also have
> > root_parent_relids that matches the corresponding
> > PartitionPruneInfo.  ExecInitPartitionPruning() does a sanity check
> > that the root_parent_relids of a given pair of PartitionPrune{Info |
> > Result} match.
> >
> > Posting the patch separately as the attached 0002, just in case you
> > might think that the extra cross-checking would be overkill.
>
> Rebased over 92c4dafe1eed and fixed some factual mistakes in the
> comment above ExecutorDoInitialPruning().

Sorry, I had forgotten to git-add hunks including some cosmetic changes
in that one.  Here's another version.

--
Thanks, Amit Langote
EDB: http://www.enterprisedb.com

v28-0002-Add-root_parent_relids-to-PartitionPruneResult.patch
Description: Binary data

v28-0001-Optimize-AcquireExecutorLocks-by-locking-only-un.patch
Description: Binary data
Re: Failed Assert while pgstat_unlink_relation
At Thu, 1 Dec 2022 19:23:28 -0800, Andres Freund wrote in
> > AtEOXact_PgStat_DroppedStats() visits a relation that has been dropped
> > then wiped (when CLOBBER_FREED_MEMORY) by AtEOXact_RelationCache()
> > called just before.  Since the relcache content directly pointed to by
> > the stats module is lost in this case, the stats side cannot defend
> > itself from this.
> >
> > Perhaps RelationDestroyRelation() needs to do pgstat_drop_entry(), or
> > AtEOXact_PgStat_DroppedStats() should be called before
> > AtEOXact_RelationCache(), the latter of which is simpler.  I think we
> > need to test this case, too.
>
> This doesn't strike me as the right fix. What do you think about my patch at
> https://postgr.es/m/20221128210908.hyffmljjylj447nu%40awork3.anarazel.de ,
> leaving the quibbles around error handling aside?

Yeah, I didn't like what my patch does...

> Afaict it fixes the issue.

Hmm.  I see that it works for this specific case, but I don't understand
why it is generally safe.  The in-xact created relation t1 happened to be
scanned during the CREATE RULE, and a stats entry was attached.  So the
stats entry loses t1 at rollback, then crashes.  Thus, if I understand it
correctly, it seems to me that just unlinking the stats from t1 (when the
relkind is changed) works.  But the fix doesn't change the behavior in
the relkind-not-changing cases.  If an in-xact-created table gets a stats
entry, the relcache entry for t1 is refreshed to a table relation again,
and then the transaction rolls back, a crash will happen for the same
reason.  I'm not sure if there is such a case actually.

When I tried to check that behavior further, I found that CREATE RULE is
no longer allowed..

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center
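(A sketch of the kind of sequence described in this thread; the exact
details of the original reproducer may differ, and converting a table to
a view this way has since been disallowed:)

    BEGIN;
    CREATE TABLE t1 (a int);
    -- t1 gets scanned while the rule is created, attaching a stats entry;
    -- the rule also changes t1's relkind from table to view:
    CREATE RULE "_RETURN" AS ON SELECT TO t1 DO INSTEAD
        SELECT 1 AS a;
    ROLLBACK;  -- the relcache entry is wiped before the stats entry is
               -- unlinked, which is where the assertion failure appears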
Question regarding "Make archiver process an auxiliary process. commit"
Hi,

I see that in the archiver code, in the function pgarch_MainLoop, the
archiver sleeps for a certain time or until there's a signal.  The time
it sleeps for is represented by:

    timeout = PGARCH_AUTOWAKE_INTERVAL - (curtime - last_copy_time);

It so happens that last_copy_time and curtime are always set at the same
time, which always makes timeout equal (actually roughly equal) to
PGARCH_AUTOWAKE_INTERVAL.

I see that this behaviour was introduced as part of commit
d75288fb27b8fe0a926aaab7d75816f091ecdc27.  The discussion thread is:
https://postgr.es/m/20180629.173418.190173462.horiguchi.kyot...@lab.ntt.co.jp

The change was introduced in v31, with the following comment in the
discussion thread:

- pgarch_MainLoop start the loop with wakened = true when both notified
  or timed out. Otherwise time_to_stop is set and exits from the loop
  immediately. So the variable wakened is actually useless. Removed it.

This behaviour was different before commit
d75288fb27b8fe0a926aaab7d75816f091ecdc27, where the archiver kept track
of how much time had elapsed since last_copy_time in case there was a
signal, resulting in a smaller subsequent value of timeout, until timeout
reached zero.  That also avoided calling pgarch_ArchiverCopyLoop before
PGARCH_AUTOWAKE_INTERVAL had elapsed in case there was an intermittent
signal.

With the current changes it may be okay to always sleep for
PGARCH_AUTOWAKE_INTERVAL, but that means curtime and last_copy_time are
no longer needed.  I would like to validate whether my understanding is
correct, and ask which of the behaviours we would like to retain.

Thanks & Regards,
Sravan Velagandula
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
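(A standalone sketch, not the actual pgarch.c code, of the pre-commit
behaviour described above; the printf stands in for the real copy cycle:)

    #include <stdio.h>
    #include <time.h>
    #include <unistd.h>

    #define PGARCH_AUTOWAKE_INTERVAL 60	/* seconds, as in pgarch.c */

    int
    main(void)
    {
        time_t		last_copy_time = time(NULL);

        for (;;)
        {
            time_t		curtime = time(NULL);
            int			timeout = PGARCH_AUTOWAKE_INTERVAL -
                (int) (curtime - last_copy_time);

            if (timeout <= 0)
            {
                /* The full interval has elapsed: do a copy cycle, reset. */
                printf("would call pgarch_ArchiverCopyLoop()\n");
                last_copy_time = time(NULL);
                continue;
            }

            /*
             * sleep() can return early when a signal arrives; because
             * timeout is recomputed from last_copy_time above, the next
             * iteration sleeps only for the remainder of the interval
             * rather than restarting a fresh full interval.
             */
            sleep((unsigned int) timeout);
        }
        return 0;
    }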
Re: Add LZ4 compression in pg_dump
On Sat, Dec 03, 2022 at 11:45:30AM +0900, Michael Paquier wrote:
> While this is correct in checking that the contents are compressed
> under --with-zlib, this also removes the coverage where we make sure
> that this command is able to complete under --without-zlib without
> compressing any of the table data files.  Hence my point from
> upthread: this test had better not use compile_option, but change
> glob_pattern depending on if the build uses zlib or not.

In short, I mean something like the attached.  I have named the flag
content_patterns, and switched it to an array so that we can check
that toc.dat is always uncompressed and that the other data files are
compressed only when the build supports it.

> In order to check this behavior with defaults_custom_format, perhaps
> we could just remove the -Z6 from it or add an extra command for its
> default behavior?

This is slightly more complicated, as there is just one file generated
for the compression and non-compression cases, so I have left that as
it is for now.
--
Michael

From 5c583358caed5598fec9abea6750ff7fbd98d269 Mon Sep 17 00:00:00 2001
From: Michael Paquier
Date: Mon, 5 Dec 2022 16:04:57 +0900
Subject: [PATCH v15] Provide coverage for pg_dump default compression for dir
 format

The restore program will succeed regardless of whether the dumped
output was compressed or not.  This commit implements a portable way
to check the contents of the directory via Perl's built-in filename
expansion.
---
 src/bin/pg_dump/t/002_pg_dump.pl | 29 +++++++++++++++++++++++++----
 1 file changed, 25 insertions(+), 4 deletions(-)

diff --git a/src/bin/pg_dump/t/002_pg_dump.pl b/src/bin/pg_dump/t/002_pg_dump.pl
index 709db0986d..9796d2667f 100644
--- a/src/bin/pg_dump/t/002_pg_dump.pl
+++ b/src/bin/pg_dump/t/002_pg_dump.pl
@@ -36,6 +36,9 @@ my $tempdir = PostgreSQL::Test::Utils::tempdir;
 # to test pg_restore's ability to parse manually compressed files
 # that otherwise pg_dump does not compress on its own (e.g. *.toc).
 #
+# content_patterns is an optional array consisting of strings compilable
+# with glob() to check the files generated after a dump.
+#
 # restore_cmd is the pg_restore command to run, if any.  Note
 # that this should generally be used when the pg_dump goes to
 # a non-text file and that the restore can then be used to
@@ -46,6 +49,10 @@ my $tempdir = PostgreSQL::Test::Utils::tempdir;
 # database and then pg_dump *that* database (or something along
 # those lines) to validate that part of the process.
 
+my $supports_icu = ($ENV{with_icu} eq 'yes');
+my $supports_lz4 = check_pg_config("#define USE_LZ4 1");
+my $supports_gzip = check_pg_config("#define HAVE_LIBZ 1");
+
 my %pgdump_runs = (
 	binary_upgrade => {
 		dump_cmd => [
@@ -213,6 +220,9 @@ my %pgdump_runs = (
 	},
 
 	# Do not use --no-sync to give test coverage for data sync.
+	# By default, the directory format compresses its contents
+	# when the code is compiled with gzip support, and leaves things
+	# uncompressed when not compiled with it.
 	defaults_dir_format => {
 		test_key => 'defaults',
 		dump_cmd => [
@@ -224,6 +234,11 @@ my %pgdump_runs = (
 			"--file=$tempdir/defaults_dir_format.sql",
 			"$tempdir/defaults_dir_format",
 		],
+		content_patterns => ["$tempdir/defaults_dir_format/toc.dat",
+			$supports_gzip ?
+			"$tempdir/defaults_dir_format/*.dat.gz" :
+			"$tempdir/defaults_dir_format/*.dat",
+		],
 	},
 
 	# Do not use --no-sync to give test coverage for data sync.
@@ -3920,10 +3935,6 @@ if ($collation_check_stderr !~ /ERROR: /)
 	$collation_support = 1;
 }
 
-my $supports_icu = ($ENV{with_icu} eq 'yes');
-my $supports_lz4 = check_pg_config("#define USE_LZ4 1");
-my $supports_gzip = check_pg_config("#define HAVE_LIBZ 1");
-
 # ICU doesn't work with some encodings
 my $encoding = $node->safe_psql('postgres', 'show server_encoding');
 $supports_icu = 0 if $encoding eq 'SQL_ASCII';
@@ -4153,6 +4164,16 @@ foreach my $run (sort keys %pgdump_runs)
 		command_ok(\@full_compress_cmd, "$run: compression commands");
 	}
 
+	if ($pgdump_runs{$run}->{content_patterns})
+	{
+		my $content_patterns = $pgdump_runs{$run}->{content_patterns};
+		foreach my $content_pattern (@{$content_patterns})
+		{
+			my @glob_output = glob($content_pattern);
+			is(scalar(@glob_output) > 0, 1, "$run: content check for $content_pattern");
+		}
+	}
+
 	if ($pgdump_runs{$run}->{restore_cmd})
 	{
 		$node->command_ok(\@{ $pgdump_runs{$run}->{restore_cmd} },
-- 
2.38.1
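(For readers unfamiliar with the idiom, the added check boils down to the
following standalone sketch; the directory path is a stand-in:)

    use Test::More;

    my $tempdir = "/tmp/pg_dump_test";   # stand-in for the test's temp dir

    # glob() expands the shell-style pattern; a non-empty result means that
    # pg_dump actually produced at least one matching file.
    my @matches = glob("$tempdir/defaults_dir_format/*.dat.gz");
    ok(scalar(@matches) > 0, "compressed data files were created");

    done_testing();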
Re: Generate pg_stat_get_* functions with Macros
Hi,

On 12/4/22 6:32 PM, Nathan Bossart wrote:
> On Sun, Dec 04, 2022 at 06:07:37AM +0100, Drouvot, Bertrand wrote:
>> On 12/3/22 9:16 PM, Nathan Bossart wrote:
>>> Thanks.  I editorialized a bit in the attached v3.  I'm not sure that
>>> my proposed names for the macros are actually an improvement.  WDYT?
>>
>> Thanks! I do prefer the macro definition ordering that you're proposing
>> (that makes pgstatfuncs.c "easier" to read).
>>
>> As far as the names go, I think it's better to replace "TAB" with "REL"
>> (like in v4 attached): the reason is that those macros will be used in
>> [1] for both table and index stats (and so we'd have to replace "TAB"
>> with "REL" in [1]).  Having "REL" already in place reduces the changes
>> that will be needed in [1].
>
> Alright.  I marked this as ready-for-committer.

Thanks!
--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
Re: Generate pg_stat_get_* functions with Macros
On Mon, Dec 05, 2022 at 08:27:15AM +0100, Drouvot, Bertrand wrote:
> On 12/4/22 6:32 PM, Nathan Bossart wrote:
>> Alright.  I marked this as ready-for-committer.
>
> Thanks!

Well, that's kind of nice:
 5 files changed, 139 insertions(+), 396 deletions(-)
And I like removing code, so..

In the same area, I am counting a total of 21 (?) pgstat routines for
databases that rely on pgstat_fetch_stat_dbentry() while returning an
int64.  This would lead to more cleanup.
--
Michael
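(A sketch of the kind of macro that could fold those database-stat
functions together, in the spirit of the relation-stat macros discussed
in this thread. The macro name is hypothetical, and it assumes the
backend environment plus a counter field named xact_commit in
PgStat_StatDBEntry:)

    #define PG_STAT_GET_DBENTRY_INT64(stat)							\
    Datum															\
    CppConcat(pg_stat_get_db_,stat)(PG_FUNCTION_ARGS)				\
    {																\
    	Oid			dbid = PG_GETARG_OID(0);						\
    	int64		result;											\
    	PgStat_StatDBEntry *dbentry = pgstat_fetch_stat_dbentry(dbid); \
    																\
    	if (dbentry == NULL)										\
    		result = 0;												\
    	else														\
    		result = (int64) (dbentry->stat);						\
    																\
    	PG_RETURN_INT64(result);									\
    }

    /* would define pg_stat_get_db_xact_commit(oid) */
    PG_STAT_GET_DBENTRY_INT64(xact_commit)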