Re: CREATE SUBSCRIPTION - add missing test case
On Tue, 20 Aug 2024 at 08:21, Peter Smith wrote: > > On Fri, Aug 16, 2024 at 2:15 PM vignesh C wrote: > > > > Thanks for the review. > > > > > I agree currently there is no test to hit this code. I'm not sure if > > this is the correct location for the test, should it be included in > > the 008_diff_schema.pl file? > > Yes, that is a better home for this test. Done as suggested in > attached patch v2. Thanks, this looks good to me. Regards, Vignesh
Re: Fix memory counter update in reorderbuffer
Hi, On Fri, Aug 16, 2024 at 12:22 AM Shlok Kyal wrote: > > On Wed, 7 Aug 2024 at 11:48, Amit Kapila wrote: > > > > On Wed, Aug 7, 2024 at 7:42 AM Masahiko Sawada > > wrote: > > > > > > On Tue, Aug 6, 2024 at 1:12 PM Amit Kapila > > > wrote: > > > > > > > > On Sat, Aug 3, 2024 at 1:21 AM Masahiko Sawada > > > > wrote: > > > > > > > > > > I found a bug in the memory counter update in reorderbuffer. It was > > > > > introduced by commit 5bec1d6bc5e, so pg17 and master are affected. > > > > > > > > > > In ReorderBufferCleanupTXN() we zero the transaction size and then > > > > > free the transaction entry as follows: > > > > > > > > > > /* Update the memory counter */ > > > > > ReorderBufferChangeMemoryUpdate(rb, NULL, txn, false, txn->size); > > > > > > > > > > /* deallocate */ > > > > > ReorderBufferReturnTXN(rb, txn); > > > > > > > > > > > > > Why do we need to zero the transaction size explicitly? Shouldn't it > > > > automatically become zero after freeing all the changes? > > > > > > It will become zero after freeing all the changes. However, since > > > updating the max-heap when freeing each change could bring some > > > overhead, we freed the changes without updating the memory counter, > > > and then zeroed it. > > > > > > > I think this should be covered in comments as it is not apparent. > > > > > > > > > BTW, commit 5bec1d6bc5e also introduced > > > > ReorderBufferChangeMemoryUpdate() in ReorderBufferTruncateTXN() which > > > > is also worth considering while fixing the reported problem. It may > > > > not have the same problem as you have reported but we can consider > > > > whether setting txn size as zero explicitly is required or not. > > > > > > The reason why it introduced ReorderBufferChangeMemoryUpdate() is the > > > same as I mentioned above. And yes, as you mentioned, it doesn't have > > > the same problem that I reported here. 
> > > > > > > I checked again and found that ReorderBufferResetTXN() first calls > > ReorderBufferTruncateTXN() and then ReorderBufferToastReset(). After > > that, it also tries to free spec_insert change which should have the > > same problem. So, what saves this path from the same problem? > > I tried testing this scenario and I was able to reproduce the crash in > HEAD with this scenario. I have created a patch for the testcase. > I also tested the same scenario with the latest patch shared by > Sawada-san in [1]. And confirm that this resolves the issue. Thank you for testing the patch! I'm slightly hesitant to add a test under src/test/subscription since it's a bug in ReorderBuffer and not specific to logical replication. If we reasonably cannot add a test under contrib/test_decoding, I'm okay with adding it under src/test/subscription. I've attached the updated patch with the commit message (but without a test case for now). Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com v3-0001-Fix-memory-counter-update-in-ReorderBuffer.patch Description: Binary data
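For readers following along, the batched counter update being discussed can be illustrated with a simplified sketch. The names below are illustrative stand-ins, not the real reorderbuffer API (which also maintains a max-heap keyed on transaction size):

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Illustrative stand-ins for the change list and the buffer-wide
 * memory counter (rb->size in the real code). */
typedef struct Change
{
	size_t		size;
	struct Change *next;
} Change;

static size_t rb_total_size;	/* buffer-wide memory counter */

static void
cleanup_txn(Change *head, size_t txn_size)
{
	/*
	 * Free every change WITHOUT touching rb_total_size: updating the
	 * counter (and re-adjusting the max-heap) once per change would
	 * add per-change overhead.
	 */
	while (head != NULL)
	{
		Change *next = head->next;

		free(head);
		head = next;
	}

	/* ...then subtract the transaction's whole size in one step. */
	rb_total_size -= txn_size;
}
```

The bug being fixed is essentially one of ordering: the counter must be adjusted for the transaction exactly once, before the transaction entry itself is returned, on every path that frees changes in bulk — including the ReorderBufferResetTXN() path found above.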
Re: define PG_REPLSLOT_DIR
On Tue, 20 Aug 2024 17:47:57 +0900 Michael Paquier wrote: > On Mon, Aug 19, 2024 at 02:11:55PM +, Bertrand Drouvot wrote: > > I made the changes for pg_tblspc in pg_combinebackup.c as the number of occurrences > > are greater than the "pg_wal" ones and we were to define PG_TBLSPC_DIR in any > > case. > > > > Please find attached the related patches. > > No real objection about the replslot and pg_logical bits. > > - *$PGDATA/pg_tblspc/spcoid/PG_MAJORVER_CATVER/dboid/relfilenumber > + * $PGDATA/PG_TBLSPC_DIR/spcoid/PG_MAJORVER_CATVER/dboid/relfilenumber > > For the tablespace parts, I am not sure that I would update the > comments to reflect the variables, TBH. Somebody reading the comments > would need to refer back to pg_tblspc/ in the header.

I also think that it is not necessary to change the comments even for pg_replslot.

- * Each replication slot gets its own directory inside the $PGDATA/pg_replslot
+ * Each replication slot gets its own directory inside the $PGDATA/PG_REPLSLOT_DIR

For example, I found that comments in xlog.c use "pg_wal" even though XLOGDIR is used in the code, as below, and I don't see any problem with this.

> static void
> ValidateXLOGDirectoryStructure(void)
> {
>     char        path[MAXPGPATH];
>     struct stat stat_buf;
>
>     /* Check for pg_wal; if it doesn't exist, error out */
>     if (stat(XLOGDIR, &stat_buf) != 0 ||
>         !S_ISDIR(stat_buf.st_mode))

Should the following also be rewritten using sizeof(PG_REPLSLOT_DIR)?

    struct stat statbuf;
    char        path[MAXPGPATH * 2 + 12];

Regards, Yugo Nagata -- Yugo Nagata
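To make the question concrete: sizeof on a string-literal macro counts the terminating NUL, which is exactly where the hand-counted "+ 12" comes from (strlen("pg_replslot") + 1). A minimal sketch, with MAXPGPATH and PG_REPLSLOT_DIR as stand-ins for the real definitions:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAXPGPATH       1024            /* stand-in for pg_config_manual.h */
#define PG_REPLSLOT_DIR "pg_replslot"   /* stand-in for the proposed define */

/*
 * sizeof(PG_REPLSLOT_DIR) counts the terminating NUL, so
 * "MAXPGPATH * 2 + sizeof(PG_REPLSLOT_DIR)" replaces the magic
 * "MAXPGPATH * 2 + 12" without any hand-counted constant.
 */
static void
build_slot_path(char *buf, size_t buflen, const char *slotname)
{
	snprintf(buf, buflen, "%s/%s", PG_REPLSLOT_DIR, slotname);
}
```

If the directory name ever changed length, a sizeof-based declaration would track it automatically, whereas the literal "12" would silently become wrong.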
Re: Track the amount of time waiting due to cost_delay
Hi, On Mon, Jul 01, 2024 at 04:59:25AM +, Bertrand Drouvot wrote: > Hi, > > On Fri, Jun 28, 2024 at 08:07:39PM +, Imseih (AWS), Sami wrote: > > > 46ebdfe164 will interrupt the leaders sleep every time a parallel workers > > > reports > > > progress, and we currently don't handle interrupts by restarting the > > > sleep with > > > the remaining time. nanosleep does provide the ability to restart with > > > the remaining > > > time [1], but I don't think it's worth the effort to ensure more accurate > > > vacuum delays for the leader process. > > > > After discussing offline with Bertrand, it may be better to have > > a solution to deal with the interrupts and allows the sleep to continue to > > completion. This will simplify this patch and will be useful > > for other cases in which parallel workers need to send a message > > to the leader. This is the thread [1] for that discussion. > > > > [1] > > https://www.postgresql.org/message-id/01000190606e3d2a-116ead16-84d2-4449-8d18-5053da66b1f4-00%40email.amazonses.com > > > > Yeah, I think it would make sense to put this thread on hold until we know > more > about [1] (you mentioned above) outcome. As it looks like we have a consensus not to wait on [0] (as reducing the number of interrupts makes sense on its own), then please find attached v4, a rebase version (that also makes clear in the doc that that new field might show slightly old values, as mentioned in [1]). 
[0]: https://www.postgresql.org/message-id/flat/01000190606e3d2a-116ead16-84d2-4449-8d18-5053da66b1f4-00%40email.amazonses.com [1]: https://www.postgresql.org/message-id/ZruMe-ppopQX4uP8%40nathan Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com

From 90196125d1262095d02f0df74bb6cab0d03c75ff Mon Sep 17 00:00:00 2001 From: Bertrand Drouvot Date: Mon, 24 Jun 2024 08:43:26 + Subject: [PATCH v4] Report the total amount of time that vacuum has been delayed due to cost delay

This commit adds one column: time_delayed to the pg_stat_progress_vacuum system view to show the total amount of time in milliseconds that vacuum has been delayed. This uses the new parallel message type for progress reporting added by f1889729dd. In the case of a parallel worker, to avoid the leader being interrupted too frequently (while it might be sleeping for cost delay), the report is done only if the last report was done more than 1 second ago. Having a time-based-only approach to throttle the reporting of the parallel workers sounds reasonable. Indeed, when deciding about the throttling: 1. The number of parallel workers should not come into play: 1.1) the more parallel workers are used, the less the impact of the leader on the vacuum index phase duration/workload is (because the work is distributed over more processes). 1.2) the fewer parallel workers there are, the less the leader will be interrupted (fewer parallel workers would report their delayed time). 2. The cost limit should not come into play as that value is distributed proportionally among the parallel workers (so we're back to the previous point). 3. The cost delay does not come into play as the leader could be interrupted at the beginning, the middle or whatever part of the wait and we are more interested in the frequency of the interrupts. 3. 
A 1 second reporting "throttling" looks like a reasonable threshold as: 3.1 the idea is to have a significant impact when the leader could have been interrupted say hundred/thousand times per second. 3.2 it does not make that much sense for any tools to sample pg_stat_progress_vacuum multiple times per second (so a one second reporting granularity seems ok). Bump catversion because this changes the definition of pg_stat_progress_vacuum. --- doc/src/sgml/monitoring.sgml | 13 src/backend/catalog/system_views.sql | 2 +- src/backend/commands/vacuum.c | 49 src/include/catalog/catversion.h | 2 +- src/include/commands/progress.h | 1 + src/test/regress/expected/rules.out | 3 +- 6 files changed, 67 insertions(+), 3 deletions(-) 23.5% doc/src/sgml/ 4.2% src/backend/catalog/ 63.4% src/backend/commands/ 4.6% src/include/ 4.0% src/test/regress/expected/ diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml index 55417a6fa9..d87604331a 100644 --- a/doc/src/sgml/monitoring.sgml +++ b/doc/src/sgml/monitoring.sgml @@ -6307,6 +6307,19 @@ FROM pg_stat_get_backend_idset() AS backendid; cleaning up indexes. + + + + time_delayed bigint + + + Total amount of time spent in milliseconds waiting due to vacuum_cost_delay + or autovacuum_vacuum_cost_delay. In case of parallel + vacuum the reported time is across all the workers and the leader. This + column is updated at a 1 Hz frequency (one time per second) so could show + slightly old values.
Re: Restart pg_usleep when interrupted
Hi, On Tue, Aug 13, 2024 at 11:40:27AM -0500, Nathan Bossart wrote: > On Tue, Aug 13, 2024 at 11:07:46AM -0500, Imseih (AWS), Sami wrote: > > Having to add special handling to space out instrumentation > > directly in vacuum_delay_point seems very odd to me. I don't > > think vacuum_delay_point should have to worry about this. > > > > Also, > > 1/ what is an appropriate interval to collect these stats? > > 2/ What if there are other callers in the future that wish > > to instrument parallel vacuum workers? they will need to implement > > similar logic. > > None of this seems intractable to me. 1 Hz seems like an entirely > reasonable place to start, and it is very easy to change (or to even make > configurable). pg_stat_progress_vacuum might show slightly old values in > this column, but that should be easy enough to explain in the docs if we > are really concerned about it. If other callers want to do something > similar, maybe we should add a more generic implementation in > backend_progress.c. > As it looks like we have a consensus that reducing the number of interrupts also makes sense, I just provided a rebase version of the 1 Hz version (see [0], that also makes clear in the doc that the new field might show slightly old values). [0]: https://www.postgresql.org/message-id/ZsSQnS9OW9EWSOk4%40ip-10-97-1-34.eu-west-3.compute.internal Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
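The restart-with-remaining-time behavior referenced in the quoted thread can be sketched with nanosleep's second argument. This is a hypothetical interrupt-resistant sleep, not the committed pg_usleep code:

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <errno.h>
#include <time.h>

/*
 * Hypothetical interrupt-resistant sleep: nanosleep(2) reports the
 * unslept time in 'rem' when it fails with EINTR, so looping on that
 * value sleeps for the full requested duration despite signals.
 */
static void
sleep_full_usec(long microsec)
{
	struct timespec req;
	struct timespec rem;

	req.tv_sec = microsec / 1000000L;
	req.tv_nsec = (microsec % 1000000L) * 1000L;

	while (nanosleep(&req, &rem) == -1 && errno == EINTR)
		req = rem;		/* restart with whatever time is left */
}
```

Note the trade-off discussed in the thread: with this shape the sleep always runs to completion, so any instrumentation (such as progress reports from parallel workers) no longer shortens the leader's cost-delay sleeps.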
Re: generic plans and "initial" pruning
On Tue, Aug 20, 2024 at 1:39 AM Robert Haas wrote: > On Fri, Aug 16, 2024 at 8:36 AM Amit Langote wrote: > > So it is possible for the executor to try to run a plan that has > > become invalid since it was created, so... > > I'm not sure what the "so what" here is. I meant that if the executor has to deal with broken plans anyway, we might as well lean into that fact by choosing not to handle only the cached plan case in a certain way. Yes, I understand that that's not a good justification. > > One perhaps crazy idea [1]: > > > > What if we remove AcquireExecutorLocks() and move the responsibility > > of taking the remaining necessary locks into the executor (those on > > any inheritance children that are added during planning and thus not > > accounted for by AcquirePlannerLocks()), like the patch already does, > > but don't make it also check if the plan has become invalid, which it > > can't do anyway unless it's from a CachedPlan. That means we instead > > let the executor throw any errors that occur when trying to > > initialize the plan because of the changes that have occurred to the > > objects referenced in the plan, like what is happening in the above > > example. If that case is going to be rare anyway, why spend energy on > > checking the validity and replanning, especially if that's not an easy > > thing to do as we're finding out. In the above example, we could say > > that it's a user error to create a rule like that, so it should not > > happen in practice, but when it does, the executor seems to deal with > > it correctly by refusing to execute a broken plan. Perhaps it's more > > worthwhile to make the executor behave correctly in the face of plan > > invalidation than teach the rest of the system to deal with the > > executor throwing its hands up when it runs into an invalid plan? > > Again, I think this may be a crazy line of thinking but just wanted to > > get it out there. > > I don't know whether this is crazy or not. 
I think there are two > issues. One, the set of checks that we have right now might not be > complete, and we might just not have realized that because it happens > infrequently enough that we haven't found all the bugs. If that's so, > then a change like this could be a good thing, because it might force > us to fix stuff we should be fixing anyway. I have a feeling that some > of the checks you hit there were added as bug fixes long after the > code was written originally, so my confidence that we don't have more > bugs isn't especially high. This makes sense. > And two, it matters a lot how frequent the errors will be in practice. > I think we normally try to replan rather than let a stale plan be used > because we want to not fail, because users don't like failure. If the > design you propose here would make failures more (or less) frequent, > then that's a problem (or awesome). I think we'd modify plancache.c to postpone the locking of only prunable relations (i.e., partitions), so we're looking at only a handful of concurrent modifications that are going to cause execution errors. That's because we disallow many DDL modifications of partitions unless they are done via recursion from the parent, so the space of errors in practice would be smaller compared to if we were to postpone *all* cached plan locks to ExecInitNode() time. DROP INDEX a_partion_only_index comes to mind as something that might cause an error. I've not tested whether other partition-only constraints can cause unsafe behaviors. Perhaps we can add the check for CachedPlan.is_valid after every table_open() and index_open() in the executor that takes a lock, or at all the places we discussed previously, and throw an error (say: "cached plan is no longer valid") if it's false. That's better than running into some random error by soldiering ahead with the plan's initialization / execution, but still a loss in terms of user experience because we're adding a new failure mode, however rare. 
-- Thanks, Amit Langote
Re: Remaining reference to _PG_fini() in ldap_password_func
Heikki Linnakangas writes: > On 20/08/2024 07:46, Michael Paquier wrote: >> How about removing it like in the attached to be consistent? > +1. There's also a prototype for _PG_fini() in fmgr.h, let's remove that > too. +1. I think the fmgr.h prototype may have been left there deliberately to avoid breaking extension code, but it's past time to clean it up. regards, tom lane
Re: generic plans and "initial" pruning
On Tue, Aug 20, 2024 at 3:21 AM Robert Haas wrote: > On Mon, Aug 19, 2024 at 1:52 PM Tom Lane wrote: > > Robert Haas writes: > > > But that seems somewhat incidental to what this thread is about. > > > > Perhaps. But if we're running into issues related to that, it might > > be good to set aside the long-term goal for a bit and come up with > > a cleaner answer for intra-session locking. That could allow the > > pruning problem to be solved more cleanly in turn, and it'd be > > an improvement even if not. > > Maybe, but the pieces aren't quite coming together for me. Solving > this would mean that if we execute a stale plan, we'd be more likely > to get a good error and less likely to get a bad, nasty-looking > internal error, or a crash. That's good on its own terms, but we don't > really want user queries to produce errors at all, so I don't think > we'd feel any more free to rearrange the order of operations than we > do today. Yeah, it's unclear whether executing a potentially stale plan is an acceptable tradeoff compared to replanning, especially if it occurs rarely. Personally, I would prefer that it is. > > > Do you have a view on what the way forward might be? > > > > I'm fresh out of ideas at the moment, other than having a hope that > > divide-and-conquer (ie, solving subproblems first) might pay off. > > Fair enough, but why do you think that the original approach of > creating a data structure from within the plan cache mechanism > (probably via a call into some new executor entrypoint) and then > feeding that through to ExecutorRun() time can't work? That would be ExecutorStart(). The data structure need not be referenced after ExecInitNode(). > Is it possible > you latched onto some non-optimal decisions that the early versions of > the patch made, rather than there being a fundamental problem with the > concept? 
> > I actually thought the do-it-at-executorstart-time approach sounded > pretty good, even though we might have to abandon planstate tree > initialization partway through, right up until Amit started talking > about moving ExecutorStart() from PortalRun() to PortalStart(), which > I have a feeling is going to create a bigger problem than we can > solve. I think if we want to save that approach, we should try to > figure out if we can teach the plancache to replan one query from a > list without replanning the others, which seems like it might allow us > to keep the order of major operations unchanged. Otherwise, it makes > sense to me to have another go at the other approach, at least to make > sure we understand clearly why it can't work. +1 -- Thanks, Amit Langote
Re: why is pg_upgrade's regression run so slow?
On 2024-08-19 Mo 8:00 AM, Alexander Lakhin wrote: Hello Andrew, 29.07.2024 13:54, Andrew Dunstan wrote: On 2024-07-29 Mo 4:00 AM, Alexander Lakhin wrote: And another interesting fact is that TEMP_CONFIG is apparently ignored by `meson test regress/regress`. For example, with temp.config containing invalid settings, `meson test pg_upgrade/002_pg_upgrade` fails, but `meson test regress/regress` passes just fine. Well, that last fact explains the discrepancy I originally complained about. How the heck did that happen? It looks like we just ignored its use in Makefile.global.in :-( Please look at the attached patch (partially based on ideas from [1]) for meson.build, that aligns it with `make` in regard to use of TEMP_CONFIG. Maybe it could be implemented via a separate meson option, but that would also require modifying at least the buildfarm client... [1] https://www.postgresql.org/message-id/CAN55FZ304Kp%2B510-iU5-Nx6hh32ny9jgGn%2BOB5uqPExEMK1gQQ%40mail.gmail.com I think this is the way to go. The patch LGTM. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
Re: Some questions about PostgreSQL’s design.
On 20/08/2024 11:46, 陈宗志 wrote:

I've recently started exploring PostgreSQL implementation. I used to be a MySQL InnoDB developer, and I find the PostgreSQL community feels a bit strange. There are some areas where they've done really well, but there are also some obvious issues that haven't been improved. For example, the B-link tree implementation in PostgreSQL is particularly elegant, and the code is very clean. But there are some clear areas that could be improved but haven't been addressed, like the double memory problem where the buffer pool and page cache store the same page, or using full-page writes to deal with torn page writes instead of something like InnoDB's double write buffer. It seems like these issues have clear solutions, such as using direct I/O like InnoDB instead of buffered I/O, or using a double write buffer instead of relying on the full-page write approach. Can anyone explain why?

There are pros and cons. With direct I/O, you cannot take advantage of the kernel page cache anymore, so it becomes important to tune shared_buffers more precisely. That's a downside: the system requires more tuning. For many applications, squeezing the last ounce of performance just isn't that important. There are also scaling issues with the Postgres buffer cache, which might need to be addressed first.

With double write buffering, there are also pros and cons. It also requires careful tuning. And replaying WAL that contains full-page images can be much faster, because you can write new page images "blindly" without reading the old pages first. We have WAL prefetching now, which alleviates that, but it's no panacea.

In summary, those are good solutions but they're not obviously better in all circumstances.

However, the PostgreSQL community's mailing list is truly a treasure trove, where you can find really interesting discussions. For instance, this discussion on whether lock coupling is needed for B-link trees, etc. 
https://www.postgresql.org/message-id/flat/CALJbhHPiudj4usf6JF7wuCB81fB7SbNAeyG616k%2Bm9G0vffrYw%40mail.gmail.com Yep, there are old threads and patches for double write buffers and direct IO too :-). -- Heikki Linnakangas Neon (https://neon.tech)
Re: Add new protocol message to change GUCs for usage with future protocol-only GUCs
On Mon, Aug 19, 2024 at 1:54 PM Jelte Fennema-Nio wrote: > My point is that the code that breaks, actually wants to be broken in this > case. I'll turn this around then and assume for a moment that this is true: no matter what the use cases are, they all want to be broken for correctness. If this version change is allowed to break both the endpoints and any intermediaries on the connection, why have we chosen 30001 as the new reported version as opposed to, say, 4? Put another way: for a middlebox on the connection (which may be passively observing, but also maybe actively adding new messages to the stream), what is guaranteed to remain the same in the protocol across a minor version bump? Hopefully the answer isn't "nothing"? --Jacob
Re: define PG_REPLSLOT_DIR
On 2024-Aug-19, Bertrand Drouvot wrote: > diff --git a/src/include/common/relpath.h b/src/include/common/relpath.h > index 6f006d5a93..a6cb091635 100644 > --- a/src/include/common/relpath.h > +++ b/src/include/common/relpath.h > @@ -33,6 +33,10 @@ typedef Oid RelFileNumber; > #define TABLESPACE_VERSION_DIRECTORY "PG_" PG_MAJORVERSION "_" \ > > CppAsString2(CATALOG_VERSION_NO) > > +#define PG_TBLSPC_DIR "pg_tblspc" This one is missing some commentary along the lines of "This must not be changed, unless you want to break every tool in the universe". As is, it's quite tempting. > +#define PG_TBLSPC_DIR_SLASH PG_TBLSPC_DIR "/" I would make this simply "pg_tblspc/", since it's not really possible to change pg_tblspc anyway. Also, have a comment explaining why we have it. -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/ "Los dioses no protegen a los insensatos. Éstos reciben protección de otros insensatos mejor dotados" (Luis Wu, Mundo Anillo)
Re: Improve pg_re_throw: check if sigjmp_buf is valid and report error
Hi On Mon, Aug 19, 2024 at 10:12 PM Tom Lane wrote: > > We have had multiple instances of code "return"ing out of a PG_TRY, > so I fully agree that some better way to detect that would be good. > But maybe we ought to think about static analysis for that. I have some static analysis scripts for detecting this kind of problem (of mis-using PG_TRY). Not sure if my scripts are helpful here but I would like to share them. - A clang plugin for detecting unsafe control flow statements in PG_TRY. https://github.com/higuoxing/clang-plugins/blob/main/lib/ReturnInPgTryBlockChecker.cpp - Same as above, but in CodeQL[^1] script. https://github.com/higuoxing/postgres.ql/blob/main/return-in-PG_TRY.ql - A CodeQL script for detecting the missing of volatile qualifiers (objects have been changed between the setjmp invocation and longjmp call should be qualified with volatile). https://github.com/higuoxing/postgres.ql/blob/main/volatile-in-PG_TRY.ql Andres also has some compiler hacking to detect return statements in PG_TRY[^2]. [^1]: https://codeql.github.com/ [^2]: https://www.postgresql.org/message-id/20230113054900.b7onkvwtkrykeu3z%40awork3.anarazel.de
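For context, the hazard these checkers look for can be modeled with plain sigsetjmp. This is a simplified stand-in for the real elog.h macros: PG_TRY() saves the previous exception-stack pointer and PG_END_TRY() restores it, so a plain return inside the block skips the restore and leaves PG_exception_stack pointing at a dead stack frame:

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <setjmp.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified model of elog.h's PG_TRY machinery; not the real macros. */
static sigjmp_buf *exception_stack = NULL;

static bool
leaky_function(void)
{
	sigjmp_buf	local_sigjmp_buf;
	sigjmp_buf *save_exception_stack = exception_stack;

	if (sigsetjmp(local_sigjmp_buf, 0) == 0)	/* "PG_TRY()" */
	{
		exception_stack = &local_sigjmp_buf;
		return true;	/* BUG: skips the restore below ("PG_END_TRY()") */
	}
	exception_stack = save_exception_stack;		/* "PG_CATCH()/PG_END_TRY()" */
	return false;
}
```

After leaky_function() returns, the next error that longjmps through exception_stack would land on a frame that no longer exists — which is why both runtime checks and static analysis for this pattern are attractive.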
Re: Add new protocol message to change GUCs for usage with future protocol-only GUCs
On Tue, 20 Aug 2024 at 15:48, Jacob Champion wrote: > Put another way: for a middlebox on the connection (which may be > passively observing, but also maybe actively adding new messages to > the stream), what is guaranteed to remain the same in the protocol > across a minor version bump? Hopefully the answer isn't "nothing"? I think primarily we do a minor version bump because a major version bump would cause existing Postgres servers to throw an error for the connection attempt (and we don't want that). While for a minor version bump they will instead send a NegotiateProtocolVersion message. In practical terms I think that means for a minor version bump the format of the StartupMessage cannot be changed. Changing anything else is fair game for a minor protocol version bump. I think we probably would not want to change the format of ErrorResponse and NoticeResponse, since those can be sent by the server before the NegotiateProtocolVersion message. But I don't even think that is strictly necessary, as long as clients would be able to parse both the old and new versions. Note that this definition arises from code and behaviour introduced in ae65f6066dc3 in 2017. And PQprotocolVersion was introduced in efc3a25bb0 in 2003. So anyone starting to use the PQprotocolVersion function in between 2003 and 2017 had no way of knowing that there would ever be a thing called a "minor" version, in which anything about the protocol could be changed except for the StartupMessage.
Re: generic plans and "initial" pruning
On Tue, Aug 20, 2024 at 9:00 AM Amit Langote wrote: > I think we'd modify plancache.c to postpone the locking of only > prunable relations (i.e., partitions), so we're looking at only a > handful of concurrent modifications that are going to cause execution > errors. That's because we disallow many DDL modifications of > partitions unless they are done via recursion from the parent, so the > space of errors in practice would be smaller compared to if we were to > postpone *all* cached plan locks to ExecInitNode() time. DROP INDEX > a_partion_only_index comes to mind as something that might cause an > error. I've not tested if other partition-only constraints can cause > unsafe behaviors. This seems like a valid point to some extent, but in other contexts we've had discussions about how we don't actually guarantee all that much uniformity between a partitioned table and its partitions, and it's been questioned whether we made the right decisions there. So I'm not entirely sure that the surface area for problems here will be as narrow as you're hoping -- I think we'd need to go through all of the ALTER TABLE variants and think it through. But maybe the problems aren't that bad. It does seem like constraints can change the plan. Imagine the partition had a CHECK(false) constraint before and now doesn't, or something. -- Robert Haas EDB: http://www.enterprisedb.com
Re: Test 041_checkpoint_at_promote.pl faild in installcheck due to missing injection_points
Maxim Orlov writes: > So, my point here: installcheck-world doesn't > work on the current master branch until I explicitly install > the injection_points extension. In my view, it's a bit weird, since > neither test_decoding, pg_stat_statements nor pg_prewarm demand it. Ugh. The basic issue here is that "make install-world" doesn't install anything from underneath src/test/modules, which I recall as being an intentional decision. Rather than poking a hole in that policy for injection_points, I wonder if we should move it to contrib. regards, tom lane
Re: Fsync (flush) all inserted WAL records
Dear All, I would propose a new function like GetXLogInsertRecPtr(), but with some modifications (please see the attached patch). The result LSN can be passed to XLogFlush() safely. I believe it will not raise an error in any case. XLogFlush(GetXLogLastInsertEndRecPtr()) will flush (fsync) all records already inserted at that moment. It is what I would like to get. I'm not sure we need a SQL function counterpart for this new C function, but it is not a big deal to implement. With best regards, Vitaly On Monday, August 19, 2024 09:35 MSK, Michael Paquier wrote: On Wed, Aug 07, 2024 at 06:00:45PM +0300, Aleksander Alekseev wrote: > Assuming the function has value, as you claim, I see no reason not to > expose it similarly to pg_current_wal_*(). On top of that you will > have to test-cover it anyway. The easiest way to do it will be to have > an SQL-wrapper. I cannot be absolutely sure without seeing a patch, but adding SQL functions in this area is usually very useful for monitoring purposes of external solutions. -- Michael From ba82d6c6f8570fbbff14b4b52fa7720122bfb8ad Mon Sep 17 00:00:00 2001 From: Vitaly Davydov Date: Tue, 20 Aug 2024 18:03:11 +0300 Subject: [PATCH] Add function to return the end LSN of the last inserted WAL record --- src/backend/access/transam/xlog.c | 19 +++ src/include/access/xlog.h | 1 + 2 files changed, 20 insertions(+) diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index ee0fb0e28f..1430aea6d5 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -9425,6 +9425,25 @@ GetXLogWriteRecPtr(void) return LogwrtResult.Write; } +/* + * Get the end pointer of the last inserted WAL record. + * The returned value will differ from the current insert pointer + * returned by GetXLogInsertRecPtr() if the last WAL record ends + * up at a page boundary. 
+ */ +XLogRecPtr +GetXLogLastInsertEndRecPtr(void) +{ + XLogCtlInsert *Insert = &XLogCtl->Insert; + uint64 current_bytepos; + + SpinLockAcquire(&Insert->insertpos_lck); + current_bytepos = Insert->CurrBytePos; + SpinLockRelease(&Insert->insertpos_lck); + + return XLogBytePosToEndRecPtr(current_bytepos); +} + /* * Returns the redo pointer of the last checkpoint or restartpoint. This is * the oldest point in WAL that we still need, if we have to restart recovery. diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h index 083810f5b4..e98a825642 100644 --- a/src/include/access/xlog.h +++ b/src/include/access/xlog.h @@ -226,6 +226,7 @@ extern RecoveryState GetRecoveryState(void); extern bool XLogInsertAllowed(void); extern XLogRecPtr GetXLogInsertRecPtr(void); extern XLogRecPtr GetXLogWriteRecPtr(void); +extern XLogRecPtr GetXLogLastInsertEndRecPtr(void); extern uint64 GetSystemIdentifier(void); extern char *GetMockAuthenticationNonce(void); -- 2.34.1
Re: Add new protocol message to change GUCs for usage with future protocol-only GUCs
On 20/08/2024 16:48, Jacob Champion wrote: On Mon, Aug 19, 2024 at 1:54 PM Jelte Fennema-Nio wrote: My point is that the code that breaks, actually wants to be broken in this case. I'll turn this around then and assume for a moment that this is true: no matter what the use cases are, they all want to be broken for correctness. If this version change is allowed to break both the endpoints and any intermediaries on the connection, why have we chosen 30001 as the new reported version as opposed to, say, 4? That's not a completely crazy idea, it crossed my mind too. And since we already decided to skip protocol number 3.1, how about we jump directly to 3.4. That way:

protocol version | PQProtocolVersion()
2                | 2 (in old unsupported library versions)
3.0              | 3
3.4              | 4
3.5              | 5

and so forth. This kind of assumes we'll never bump the major protocol version again. But if we do, we could jump to 4 at that point. Put another way: for a middlebox on the connection (which may be passively observing, but also maybe actively adding new messages to the stream), what is guaranteed to remain the same in the protocol across a minor version bump? Hopefully the answer isn't "nothing"? I don't think we can give any future guarantees like that. If you have a middlebox on the connection, it needs to fully understand all the protocol versions it supports. It cannot safely pass through protocol version 3.5 without knowing what changed between 3.4 and 3.5. If the middlebox only knows about protocol version 3.4, it should respond with a NegotiateProtocolVersion packet to downgrade to 3.4, even if both ends of the connection could speak 3.5. That seems a bit tangential to the PQprotocolVersion() function though. A middlebox like that would probably not use libpq. I'm actually not sure exactly what an application would use PQprotocolVersion() for. To check if a feature exists or not? 
None of the features discussed so far really need an application to check that, but if we introduce one, I think we'd want to add a better feature-check function for that purpose. Something like "bool PQsupportsFeature(conn, const char *feature_name)" perhaps. If we introduce optional protocol features rather than bump protocol version in the future, we'll need a different mechanism than PQprotocolVersion() anyway. -- Heikki Linnakangas Neon (https://neon.tech)
Re: Add new protocol message to change GUCs for usage with future protocol-only GUCs
On Tue, Aug 20, 2024 at 11:24 AM Heikki Linnakangas wrote:
> That's not a completely crazy idea, it crossed my mind too. And since we
> already decided to skip protocol number 3.1, how about we jump directly
> to 3.4. That way:
>
> protocol |
> version  | PQProtocolVersion()
>
> 2        | 2 (in old unsupported library versions)
> 3.0      | 3
> 3.4      | 4
> 3.5      | 5
>
> and so forth.
>
> This kind of assumes we'll never bump the major protocol version again.
> But if we do, we could jump to 4 at that point.

I personally like this less than both (a) adding a new function and (b) redefining the existing function as Jelte proposes. It just seems too clever to me. -- Robert Haas EDB: http://www.enterprisedb.com
Re: Add new protocol message to change GUCs for usage with future protocol-only GUCs
On Tue, 20 Aug 2024 at 17:46, Robert Haas wrote: > I personally like this less than both (a) adding a new function and > (b) redefining the existing function as Jelte proposes. It just seems > too clever to me. Agreed, I'm not really seeing a benefit of returning 4 instead of 30004. Both are new numbers that are higher than 3, so on existing code they would have the same impact. But any new code would be more readable when using version >= 30004 imho.
Re: Add new protocol message to change GUCs for usage with future protocol-only GUCs
On Tue, Aug 20, 2024 at 11:53 AM Jelte Fennema-Nio wrote: > On Tue, 20 Aug 2024 at 17:46, Robert Haas wrote: > > I personally like this less than both (a) adding a new function and > > (b) redefining the existing function as Jelte proposes. It just seems > > too clever to me. > > Agreed, I'm not really seeing a benefit of returning 4 instead of > 30004. Both are new numbers that are higher than 3, so on existing > code they would have the same impact. But any new code would be more > readable when using version >= 30004 imho. Yes. And the major * 10000 + minor convention is used in other places already, for PG versions, so it might already be familiar to some people. I think if we're going to redefine an existing function, we might as well just redefine it as you propose -- or perhaps even redefine it to return major * 10000 + minor always, instead of having the strange exception for 3.0. I think I'm still on the side of not redefining it, but if we're going to redefine it, I think we should do what seems most elegant/logical and just accept that some code may break. -- Robert Haas EDB: http://www.enterprisedb.com
Re: Test 041_checkpoint_at_promote.pl faild in installcheck due to missing injection_points
I wrote: > Ugh. The basic issue here is that "make install-world" doesn't > install anything from underneath src/test/modules, which I recall > as being an intentional decision. Rather than poking a hole in > that policy for injection_points, I wonder if we should move it > to contrib. ... which would also imply writing documentation and so forth, and it'd mean that injection_points starts to show up in end-user installations. (That would happen with the alternative choice of hacking install-world to include src/test/modules/injection_points, too.) While you could argue that that'd be helpful for extension authors who'd like to use injection_points in their own tests, I'm not sure that it's where we want to go with that module. It's only meant as test scaffolding, and I don't think we've analyzed the implications of some naive user installing it. We do, however, need to preserve the property that installcheck works after install-world. I'm starting to think that maybe the 041 test should be hacked to silently skip if it doesn't find injection_points available. (We could then remove some of the makefile hackery that's supporting the current behavior.) Probably the same needs to happen in each other test script that's using injection_points --- I imagine that Maxim's test is simply failing here first. regards, tom lane
Re: define PG_REPLSLOT_DIR
Hi,

On Tue, Aug 20, 2024 at 10:15:44AM -0400, Alvaro Herrera wrote:
> On 2024-Aug-19, Bertrand Drouvot wrote:
>
> > diff --git a/src/include/common/relpath.h b/src/include/common/relpath.h
> > index 6f006d5a93..a6cb091635 100644
> > --- a/src/include/common/relpath.h
> > +++ b/src/include/common/relpath.h
> > @@ -33,6 +33,10 @@ typedef Oid RelFileNumber;
> >  #define TABLESPACE_VERSION_DIRECTORY "PG_" PG_MAJORVERSION "_" \
> >  							CppAsString2(CATALOG_VERSION_NO)
> >
> > +#define PG_TBLSPC_DIR "pg_tblspc"
>
> This one is missing some commentary along the lines of "This must not be
> changed, unless you want to break every tool in the universe". As is,
> it's quite tempting.

Yeah, makes sense, thanks.

> > +#define PG_TBLSPC_DIR_SLASH PG_TBLSPC_DIR "/"
>
> I would make this simply "pg_tblspc/", since it's not really possible to
> change pg_tblspc anyway. Also, have a comment explaining why we have it.

Please find attached v3 that:

- takes care of your comments (and also removed the use of PG_TBLSPC_DIR in RELATIVE_PG_TBLSPC_DIR).
- removes the new macros from the comments (see Michael's and Yugo-San's comments in [0] resp. [1]).
- adds a missing sizeof() (see [1]).
- implements Ashutosh's idea of adding a new SLOT_DIRNAME_ARGS (see [2]). It's done in 0002 (I used REPLSLOT_DIR_ARGS though).
- fixed a macro usage in ReorderBufferCleanupSerializedTXNs() (was not at the right location, discovered while implementing 0002).
[0]: https://www.postgresql.org/message-id/ZsRYPcOtoqbWzjGG%40paquier.xyz
[1]: https://www.postgresql.org/message-id/20240820213048.207aade6a75e0dc1fe4d1067%40sraoss.co.jp
[2]: https://www.postgresql.org/message-id/CAExHW5vkjxuvyQ1fPPnuDW4nAT5jqox09ie36kciOV2%2BrhjbHA%40mail.gmail.com

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

>From 12398b01fccb7437c4fb35282c576c317b9a7578 Mon Sep 17 00:00:00 2001
From: Bertrand Drouvot
Date: Wed, 14 Aug 2024 09:16:21 +0000
Subject: [PATCH v3 1/6] Define PG_REPLSLOT_DIR

Replace most of the places where "pg_replslot" is used in .c files with
a new PG_REPLSLOT_DIR define. The places where it is not done is for
consistency with the existing PG_STAT_TMP_DIR define.
---
 src/backend/backup/basebackup.c               |  3 +-
 .../replication/logical/reorderbuffer.c       | 17 -
 src/backend/replication/slot.c                | 36 +--
 src/backend/utils/adt/genfile.c               |  3 +-
 src/bin/initdb/initdb.c                       |  2 +-
 src/bin/pg_rewind/filemap.c                   |  2 +-
 src/include/replication/slot.h                |  2 ++
 7 files changed, 35 insertions(+), 30 deletions(-)
 27.5% src/backend/replication/logical/
 60.8% src/backend/replication/
  3.9% src/backend/utils/adt/
  4.2% src/bin/
  3.3% src/

diff --git a/src/backend/backup/basebackup.c b/src/backend/backup/basebackup.c
index 01b35e26bd..de16afac74 100644
--- a/src/backend/backup/basebackup.c
+++ b/src/backend/backup/basebackup.c
@@ -36,6 +36,7 @@
 #include "port.h"
 #include "postmaster/syslogger.h"
 #include "postmaster/walsummarizer.h"
+#include "replication/slot.h"
 #include "replication/walsender.h"
 #include "replication/walsender_private.h"
 #include "storage/bufpage.h"
@@ -161,7 +162,7 @@ static const char *const excludeDirContents[] =
 	 * even if the intention is to restore to another primary. See backup.sgml
 	 * for a more detailed description.
	 */
-	"pg_replslot",
+	PG_REPLSLOT_DIR,
 
	/* Contents removed on startup, see dsm_cleanup_for_mmap(). */
	PG_DYNSHMEM_DIR,
diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
index 00a8327e77..78166b6ab7 100644
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -4571,9 +4571,9 @@ ReorderBufferCleanupSerializedTXNs(const char *slotname)
 	DIR		   *spill_dir;
 	struct dirent *spill_de;
 	struct stat statbuf;
-	char		path[MAXPGPATH * 2 + 12];
+	char		path[MAXPGPATH * 2 + sizeof(PG_REPLSLOT_DIR)];
 
-	sprintf(path, "pg_replslot/%s", slotname);
+	sprintf(path, "%s/%s", PG_REPLSLOT_DIR, slotname);
 
 	/* we're only handling directories here, skip if it's not ours */
 	if (lstat(path, &statbuf) == 0 && !S_ISDIR(statbuf.st_mode))
@@ -4586,14 +4586,14 @@ ReorderBufferCleanupSerializedTXNs(const char *slotname)
 		if (strncmp(spill_de->d_name, "xid", 3) == 0)
 		{
 			snprintf(path, sizeof(path),
-					 "pg_replslot/%s/%s", slotname,
+					 "%s/%s/%s", PG_REPLSLOT_DIR, slotname,
 					 spill_de->d_name);
 
 			if (unlink(path) != 0)
 				ereport(ERROR,
 						(errcode_for_file_access(),
-						 errmsg("could not remove file \"%s\" during removal of pg_replslot/%s/xid*: %m",
-								path, slotname)));
+						 errmsg("could not remove file \"%s\" during removal of %s/%s/xid*: %m",
+								path, PG_REPLSLOT_DIR,
Re: define PG_REPLSLOT_DIR
Hi, On Tue, Aug 20, 2024 at 09:30:48PM +0900, Yugo Nagata wrote: > Should be the follwing also rewritten using sizeof(PG_REPLSLOT_DIR)? > >struct stat statbuf; > charpath[MAXPGPATH * 2 + 12]; > > Yeah, done in v3 that I just shared up-thread (also removed the macros from the comments). Thanks! Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: ANALYZE ONLY
Hi Michael, Thanks for starting this thread. I've also spent a bit of time on this after reading your first thread on this issue [1] Michael Harris wrote on Tue, 20 Aug 2024 at 08:52: > The problem is that giving an ANALYZE command targeting a partitioned table > causes it to update statistics for the partitioned table AND all the > individual > partitions. There is currently no option to prevent it from including the > partitions. > > This is wasteful for our application: for one thing the autovacuum > has already analyzed the individual partitions; for another most of > the partitions > will have had no changes, so they don't need to be analyzed repeatedly. > I agree that it's a waste to analyze partitions when they're already analyzed by autovacuum. It would be nice to have a way to run analyze only on a partitioned table without its partitions. > I took some measurements when running ANALYZE on one of our tables. It > took approx > 4 minutes to analyze the partitioned table, then 29 minutes to analyze the > partitions. We have hundreds of these tables, so the cost is very > significant. > I quickly tweaked the code a bit to exclude partitions when a partitioned table is being analyzed. I can confirm that there is a significant gain even on a simple case like a partitioned table with 10 partitions and 1M rows in each partition. 1. Would such a feature be welcomed? Are there any traps I might not > have thought of? > > 2. The existing ANALYZE command has the following structure: > > ANALYZE [ ( option [, ...] ) ] [ table_and_columns [, ...] ] > > It would be easiest to add ONLY as another option, but that > doesn't look quite > right to me - surely the ONLY should be attached to the table name. > An alternative would be: > > ANALYZE [ ( option [, ...] ) ] [ONLY] [ table_and_columns [, ...] ] > I lean toward adding this as an option instead of a new keyword in the ANALYZE grammar.
To me, it would be easier to have this option and then give the names of partitioned tables, as opposed to typing ONLY before each partitioned table. But we should think of another name, as ONLY is used differently (attached to the table name, as you mentioned) in other contexts. I've also been thinking about how this new option should affect inheritance tables. Should it have no impact on them, or only analyze the parent table without taking child tables into account? There are two records for an inheritance parent table in pg_statistic, one row for only the parent table and a second row including children. We might only analyze the parent table if this new "ONLY" option is specified. I'm not sure if that would be something users would need or not, but I think this option should behave similarly for both partitioned tables and inheritance tables. If we decide to go with only partitioned tables and not care about inheritance, then naming this option SKIP_PARTITIONS as Jelte suggested sounds fine. But that name wouldn't work if this option affects inheritance tables. Thanks, -- Melih Mutlu Microsoft
Re: define PG_REPLSLOT_DIR
Hi, On Tue, Aug 20, 2024 at 12:06:52PM +, Bertrand Drouvot wrote: > Hi, > > On Tue, Aug 20, 2024 at 05:41:48PM +0900, Michael Paquier wrote: > > On Tue, Aug 20, 2024 at 11:10:46AM +0530, Ashutosh Bapat wrote: > > > Since these are all related changes, doing them at once might make it > > > faster. You may use multiple commits (one for each change) > > > > Doing multiple commits with individual definitions for each path would > > be the way to go for me. All that is mechanical, still that feels > > slightly cleaner. > > Right, that's what v2 has done. If there is a need for v3 then I'll add one > dedicated patch for Ashutosh's proposal in the v3 series. Ashutosh's idea is implemented in v3 that I just shared up-thread (I used REPLSLOT_DIR_ARGS though). Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: ANALYZE ONLY
Melih Mutlu wrote on Tue, 20 Aug 2024 at 19:26: > Hi Michael, > > Thanks for starting this thread. I've also spent a bit of time on this after > reading your first thread on this issue [1] > Forgot to add the reference [1] [1] https://www.postgresql.org/message-id/cadofcaxvbd0ygp_eac9chmzsoosai3jcfbcnyva3j0rrdrv...@mail.gmail.com
Re: Test 041_checkpoint_at_promote.pl faild in installcheck due to missing injection_points
On 2024-Aug-20, Tom Lane wrote: > We do, however, need to preserve the property that installcheck > works after install-world. I'm starting to think that maybe > the 041 test should be hacked to silently skip if it doesn't > find injection_points available. Yeah, I like this option. Injection points have to be explicitly enabled in configure, so skipping that test when injection_points can't be found seems reasonable. This also suggests that EXTRA_INSTALL should have injection_points only when the option is enabled. I've been curious about what exactly this Makefile line achieves:

    export enable_injection_points enable_injection_points

-- Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/
Re: Add new protocol message to change GUCs for usage with future protocol-only GUCs
On Tue, Aug 20, 2024 at 9:02 AM Robert Haas wrote: > Yes. And the major * 1 + minor convention is used in other places > already, for PG versions, so it might already be familiar to some > people. > I'm wondering why we are indicating that minor versions of the protocol are even a real thing. We should just use integer version numbers. We are on 3. The next one is 4 (the trailing .0 is just historical cruft just like with our 3-digit PostgreSQL version number). v18 libpq-based clients, if they attempt to connect using v4 and fail, will try again using the v3 connection. That will retain status quo behavior when something like a connection pooler doesn't understand the new reality. We can add a libpq option to prevent this auto-downgrade behavior. At some point users will want to use something other than the v3 current tooling supports and will put pressure on those tools to change. In the mean-time our out-of-the-box behavior continues to work using the v3 protocol. Feature detection sounds great, and maybe we want to go there eventually, but everyone understands progressive enhancement represented by version numbering. A given major server version would only support a fixed and unchanging set of protocol versions between 3 and N. On the client, if N = 7 then libpq would be able to choose both 7 and 3 as the version it tries out-of-the-box. We can use a libpq parameter to allow the client to specify something between 4 and 6 (which may fail depending on poolers and what-not). If the chain of servers supports protocol version negotiation then the attempt to connect using 7 can be auto-downgraded to anything between 3 and 6 (saving effort of a failed attempt and establishing a new one.) Leaving the client the option to specify a minimum version of the protocol it can accept. David J.
Re: Add new protocol message to change GUCs for usage with future protocol-only GUCs
On Tue, Aug 20, 2024 at 12:42 PM David G. Johnston wrote: > I'm wondering why we are indicating that minor versions of the protocol are > even a real thing. Because that concept is already a part of the existing wire protocol. -- Robert Haas EDB: http://www.enterprisedb.com
Re: [Bug Fix]standby may crash when switching-over in certain special cases
On Mon, 19 Aug 2024 16:43:09 +0800 px shi wrote: > Hi, hackers, > > I've recently encountered an issue where a standby crashes when > reconnecting to a new primary after a switchover under certain conditions. > Here's a procedure of the crash scenario: > > > 1) We have three instances: one primary and two standbys (s1 and s2, both > using streaming replication). > > > 2) The primary crashed when the standby’s `flushed_lsn` was at the > beginning of a WAL segment (e.g., `0/2200`). Both s1 and s2 logged the > following: > >``` > >FATAL: could not connect to the primary server... > >LOG: waiting for WAL to become available at 0/22ED > >``` > > > 3) s1 was promoted to the new primary, s1 logged the following: > >``` > >LOG: received promote request > >LOG: redo done at 0/21FFFEE8 > >LOG: selected new timeline ID: 2 > >``` > > > 4) s2's `primary_conninfo` was updated to point to s1, s2 logged the > following: > >``` > >LOG: received SIGHUP, reloading configuration files > >LOG: parameter "primary_conninfo" changed to ... > >``` > > > 5) s2 began replication with s1 and attempted to fetch `0/2200` on > timeline 2, s2 logged the following: > >``` > >LOG: fetching timeline history file for timeline 2 from primary server > >FATAL: could not start WAL streaming: ERROR: requested starting point > 0/2200 on timeline 1 is not this server's history > >DETAIL: This server's history forked from timeline 1 at 0/21E0. > >LOG: started streaming WAL from primary at 0/2200 on timeline 2 > >``` > > > 6) WAL record mismatch caused the walreceiver process to terminate, s2 > logged the following: > >``` > >LOG: invalid contrecord length 10 (expected 213) at 0/21E0 > >FATAL: terminating walreceiver process due to administrator command > >``` > > > 7) s2 then attempted to fetch `0/2100` on timeline 2. 
However, the > startup process failed to open the WAL file before it was created, leading > to a crash: > >``` > >PANIC: could not open file "pg_wal/00020021": No such > file or directory > >LOG: startup process was terminated by signal 6: Aborted > >``` > > > In this scenario, s2 attempts replication twice. First, it starts from > `0/2200` on timeline 2, setting `walrcv->flushedUpto` to `0/2200`. > But when a mismatch occurs, the walreceiver process is terminated. On the > second attempt, replication starts from `0/2100` on timeline 2. The > startup process expects the WAL file to exist because WalRcvFlushRecPtr() > returns `0/2200`, but the file is not found, causing s2's startup > process to crash. > > I think it should check recptr and walrcv->flushedUpto in > RequestXLogStreaming, so I created a patch for it. Is s1 a cascading standby of s2? If instead s1 and s2 are both standbys of the primary server, it is not surprising that s2 has progressed further than s1 when the primary fails. I believe this is a case where you should use pg_rewind. Even if flushedUpto is reset as proposed in your patch, s2 might already have applied a WAL record that s1 has not processed yet, and there would be no guarantee that subsequent applies succeed. Regards, Yugo Nagata -- Yugo Nagata
Re: Partial aggregates pushdown
On Tue, Aug 20, 2024 at 10:07:32AM +0200, Jelte Fennema-Nio wrote: > On Thu, 15 Aug 2024 at 23:12, Bruce Momjian wrote: > > Third, I would like to show a more specific example to clarify what is > > being considered above. If we look at MAX(), we can have FDWs return > > the max for each FDW, and the coordinator can chose the highest value. > > This is the patch 1 listed above. These can return the > > pg_aggregate.aggtranstype data type using the pg_type.typoutput text > > output. > > > > The second case is for something like AVG(), which must return the SUM() > > and COUNT(), and we currently have no way to return multiple text values > > on the wire. For patch 0002, we have the option of creating functions > > that can do this and record them in new pg_attribute columns, or we can > > create a data type with these functions, and assign the data type to > > pg_aggregate.aggtranstype. > > > > Is that accurate? > > It's close to accurate, but not entirely. Patch 1 would actually > solves some AVG cases too, because some AVG implementations use an SQL > array type to store the transtype instead of an internal type. And by > using an SQL array type we *can* send multiple text values on the > wire. See below for a list of those aggregates: Okay, so we can do MAX easily, and AVG if the count can be represented as the same data type as the sum? Is that correct? Our only problem is that something like AVG(interval) can't use an array because arrays have to have the same data type for all array elements, and an interval can't represent a count? -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com Only you can decide what is important to you.
Re: Some questions about PostgreSQL’s design.
On Tue, Aug 20, 2024 at 04:46:54PM +0300, Heikki Linnakangas wrote: > There are pros and cons. With direct I/O, you cannot take advantage of the > kernel page cache anymore, so it becomes important to tune shared_buffers > more precisely. That's a downside: the system requires more tuning. For many > applications, squeezing the last ounce of performance just isn't that > important. There are also scaling issues with the Postgres buffer cache, > which might need to be addressed first. > > With double write buffering, there are also pros and cons. It also requires > careful tuning. And replaying WAL that contains full-page images can be much > faster, because you can write new page images "blindly" without reading the > old pages first. We have WAL prefetching now, which alleviates that, but > it's no panacea. 陈宗志, you might find this blog post helpful: https://momjian.us/main/blogs/pgblog/2017.html#June_5_2017 -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com Only you can decide what is important to you.
Re: Add new protocol message to change GUCs for usage with future protocol-only GUCs
On Tue, 20 Aug 2024 at 18:42, David G. Johnston wrote: > v18 libpq-based clients, if they attempt to connect using v4 and fail, will > try again using the v3 connection. That will retain status quo behavior when > something like a connection pooler doesn't understand the new reality. Having connection latency double when connecting to an older Postgres is something I'd very much like to avoid. Reconnecting functionally retains the status quo, but it doesn't retain the expected perf characteristics. By using a minor protocol version we can easily avoid this connection latency issue.
Re: Add new protocol message to change GUCs for usage with future protocol-only GUCs
On Tue, Aug 20, 2024 at 9:44 AM Robert Haas wrote: > On Tue, Aug 20, 2024 at 12:42 PM David G. Johnston > wrote: > > I'm wondering why we are indicating that minor versions of the protocol > are even a real thing. > > Because that concept is already a part of the existing wire protocol. > > Right... " If the major version requested by the client is not supported by the server, the connection will be rejected ... If the minor version requested by the client is not supported by the server ... the server may either reject the connection or may respond with a NegotiateProtocolVersion message containing the highest minor protocol version which it supports. The client may then choose either to continue with the connection using the specified protocol version or to abort the connection. " So basically my proposal amounted to making every update a "major version update" and changing the behavior surrounding NegotiateProtocolVersion so it applies to major version differences. I'll stand by that change in definition. The current one doesn't seem all that useful anyway, and as we only have a single version, definitely hasn't been materially implemented. Otherwise, at some point a client that knows both v3 and v4 will exist and its connection will be rejected instead of downgraded by a v3-only server even though such a downgrade would be possible. I suspect we'd go ahead and change the rule then - so why not just do so now, while getting rid of the idea that minor versions are a thing. I suppose we could leave minor versions for patch releases of the main server version - which still leaves the first new feature of a release incrementing the major version. That would be incidental to changing how we handle major versions. David J.
Re: Add new protocol message to change GUCs for usage with future protocol-only GUCs
On Tue, Aug 20, 2024 at 7:26 AM Jelte Fennema-Nio wrote: > In practical terms I think that means for a minor version bump the > format of the StartupMessage cannot be changed. Changing anything else > is fair game for a minor protocol version bump. I may be in a tiny minority here, but when I combine that statement with your opinion from way upthread that > IMHO, we > should get to a state where protocol minor version bumps are so > low-risk that we can do them whenever we add message types then I don't see this effort ending up in a healthy place or with a happy ecosystem. Pick any IETF-managed protocol, add on the statement "we get to change anything we want in a minor version, and we reserve the right to do it every single year", and imagine the chaos for anyone who doesn't have power over both servers and clients. To me it seems that what you're proposing is indistinguishable from what most other protocols would consider a major version bump; it's just that you (reasonably) want existing clients to be able to negotiate multiple major versions in one round trip. --Jacob
Re: Partial aggregates pushdown
On Tue, 20 Aug 2024 at 18:50, Bruce Momjian wrote: > Okay, so we can do MAX easily, and AVG if the count can be represented > as the same data type as the sum? Is that correct? Our only problem is > that something like AVG(interval) can't use an array because arrays have > to have the same data type for all array elements, and an interval can't > represent a count? Close, but still not completely correct. AVG(bigint) can also not be supported by patch 1, because the sum and the count for that are both stored using an int128. So we'd need an array of int128, and there's currently no int128 SQL type.
Re: Add new protocol message to change GUCs for usage with future protocol-only GUCs
On Tue, Aug 20, 2024 at 1:02 PM David G. Johnston wrote: > So basically my proposal amounted to making every update a "major version > update" and changing the behavior surrounding NegotiateProtocolVersion so it > applies to major version differences. I'll stand by that change in > definition. The current one doesn't seem all that useful anyway, and as we > only have a single version, definitely hasn't been materially implemented. > Otherwise, at some point a client that knows both v3 and v4 will exist and > its connection will be rejected instead of downgraded by a v3-only server > even though such a downgrade would be possible. I suspect we'd go ahead and > change the rule then - so why not just do so now, while getting rid of the > idea that minor versions are a thing. > > I suppose we could leave minor versions for patch releases of the main server > version - which still leaves the first new feature of a release incrementing > the major version. That would be incidental to changing how we handle major > versions. I don't see how this makes life any better for anyone. At some point in the future we may decide to make a protocol change that is big and breaks a lot of stuff, but the current goals are all to make minor changes that break as little stuff as possible. I think it's appropriate to call the latter a "minor" change and the former a "major" change. If we adopted this proposal, then we could end up in a situation where versions 3 through 17 are all mostly compatible and then version 18 is something totally different. It sounds much better to me to have versions 3.0 through 3.14 and then eventually 4.0. This is also what the person who designed the current protocol version numbering scheme seems to have had in mind, even if the implementation to make it a reality has been a bit lacking. -- Robert Haas EDB: http://www.enterprisedb.com
Re: Add new protocol message to change GUCs for usage with future protocol-only GUCs
On Tue, Aug 20, 2024 at 10:03 AM Jacob Champion < jacob.champ...@enterprisedb.com> wrote: > On Tue, Aug 20, 2024 at 7:26 AM Jelte Fennema-Nio > wrote: > > In practical terms I think that means for a minor version bump the > > format of the StartupMessage cannot be changed. Changing anything else > > is fair game for a minor protocol version bump. > > I may be in a tiny minority here, but when I combine that statement > with your opinion from way upthread that > > > IMHO, we > > should get to a state where protocol minor version bumps are so > > low-risk that we can do them whenever we add message types > > To me it seems that what you're proposing is indistinguishable from > what most other protocols would consider a major version bump; it's > just that you (reasonably) want existing clients to be able to > negotiate multiple major versions in one round trip. > > This makes more sense to me - a major version change is one where the server fails to understand the incoming message(s) to the point that it cannot make decisions based upon contents. Framed up this way the two-part versioning works just fine and I concur that PQprotocolVersion() should go ahead and report 30000 + minor (starting at 30002) with 3.0 remaining as-is since apparently negotiation down to 3.0 is possible here if the intermediate and/or final server have such ability. Still, instead of just failing immediately if 30002 is specified and rejected, falling back to trying 3.0 - unless configured to either not do that or to only do 3.0 - is advised to help with the transition. David J.
Re: Add new protocol message to change GUCs for usage with future protocol-only GUCs
On Tue, Aug 20, 2024 at 8:24 AM Heikki Linnakangas wrote: > > Put another way: for a middlebox on the connection (which may be > > passively observing, but also maybe actively adding new messages to > > the stream), what is guaranteed to remain the same in the protocol > > across a minor version bump? Hopefully the answer isn't "nothing"? > > I don't think we can give any future guarantees like that. If you have a > middlebox on the connection, it needs to fully understand all the > protocol versions it supports. (GMail has catastrophically unthreaded this conversation for me, so apologies if I start responding out of order) Many protocols provide the list of assumptions that intermediates are allowed to make within a single group of compatible versions, even as the protocol gets extended. If we choose to provide those, then our "major version" gains really useful semantics. See also the brief "criticality" tangent upthread. > That seems a bit tangential to the PQprotocolVersion() function though. > A middlebox like that would probably not use libpq. It's applicable to the use case I was talking about with Jelte. A libpq client dropping down to the socket level is relying on (implicit, currently undocumented/undecided, possibly incorrect!) intermediary guarantees that the protocol provides for a major version. I'm hoping we can provide some, since we haven't broken anything yet. If we decide we can't, then so be it -- things will break either way -- but it's still strange to me that we'd be okay with literally zero forward compatibility and still call that a "minor version". --Jacob
Re: Add new protocol message to change GUCs for usage with future protocol-only GUCs
On Tue, Aug 20, 2024 at 10:31 AM Jacob Champion < jacob.champ...@enterprisedb.com> wrote:
> If we decide we can't, then so be it -- things will
> break either way -- but it's still strange to me that we'd be okay
> with literally zero forward compatibility and still call that a "minor
> version".

Semantic versioning guidelines are not something we are following, especially here. Our protocol version is really just two-part, just like our server major version used to be. We just happen to have named both parts here, unlike with the historical server major version. We have never implemented a protocol change during a minor server version update; the protocol doesn't have (though maybe it needs?) a patch version part.

David J.
Re: Add new protocol message to change GUCs for usage with future protocol-only GUCs
On Tue, Aug 20, 2024 at 10:42 AM David G. Johnston wrote: > Semantic versioning guidelines are not something we are following, especially > here. I understand; the protocol is ours, and we'll do whatever we do in the end. I'm claiming that we can choose to provide semantics, and if we do, those semantics will help people who are not here on the list to defend their use cases. --Jacob
Re: Add new protocol message to change GUCs for usage with future protocol-only GUCs
On Tue, Aug 20, 2024 at 10:46 AM Jacob Champion < jacob.champ...@enterprisedb.com> wrote: > On Tue, Aug 20, 2024 at 10:42 AM David G. Johnston > wrote: > > Semantic versioning guidelines are not something we are following, > especially here. > > I understand; the protocol is ours, and we'll do whatever we do in the > end. I'm claiming that we can choose to provide semantics, and if we > do, those semantics will help people who are not here on the list to > defend their use cases. > > I was mostly just responding to your surprise given that we have a track-record here. I agree that our existing effective policy isn't all that well documented, namely as to when the major component might change, and the fact that the minor component does not represent a "bug fix release". David J.
Re: Improving the notation for ecpg.addons rules
I wrote:
> Michael Paquier writes:
>> It looks like %replace_line expects all its elements to have one space
>> between each token, still this is not enforced with a check across its
>> hardcoded elements?

> Yeah, I was wondering about that. I wouldn't do it exactly like
> that, but with a check that the entry gets matched somewhere.

Here's a patch for that (again based on the other patch series). This did not turn up anything interesting, but it's probably worth keeping.

			regards, tom lane

diff --git a/src/interfaces/ecpg/preproc/parse.pl b/src/interfaces/ecpg/preproc/parse.pl
index 98d44d4bf2..998822ce73 100644
--- a/src/interfaces/ecpg/preproc/parse.pl
+++ b/src/interfaces/ecpg/preproc/parse.pl
@@ -33,7 +33,9 @@ GetOptions(
 
 # These hash tables define additional transformations to apply to
-# grammar rules.
+# grammar rules.  For bug-detection purposes, we count usages of
+# each hash table entry in a second hash table, and verify that
+# all the entries get used.
 
 # Substitutions to apply to tokens whenever they are seen in a rule.
 my %replace_token = (
@@ -44,6 +46,8 @@ my %replace_token = (
 	'IDENT' => 'ecpg_ident',
 	'PARAM' => 'ecpg_param',);
 
+my %replace_token_used;
+
 # This hash can provide a result type to override "void" for nonterminals
 # that need that, or it can specify 'ignore' to cause us to skip the rule
 # for that nonterminal.  (In either case, ecpg.trailer had better provide
@@ -68,6 +72,8 @@ my %replace_types = (
 	'plassign_target' => 'ignore',
 	'plassign_equals' => 'ignore',);
 
+my %replace_types_used;
+
 # This hash provides an "ignore" option or substitute expansion for any
 # rule or rule alternative.  The hash key is the same "concattokens" tag
 # used for lookup in ecpg.addons.
@@ -111,6 +117,8 @@ my %replace_line = (
 		'PREPARE prepared_name prep_type_clause AS PreparableStmt',
 	'var_nameColId' => 'ECPGColId');
 
+my %replace_line_used;
+
 # Declare assorted state variables.
@@ -198,6 +206,30 @@ foreach (keys %addons)
 	die "addon rule $_ was matched multiple times\n" if $addons{$_}{used} > 1;
 }
 
+# Likewise cross-check that entries in our internal hash tables match something.
+foreach (keys %replace_token)
+{
+	die "replace_token entry $_ was never used\n"
+	  if !defined($replace_token_used{$_});
+	# multiple use of a replace_token entry is fine
+}
+
+foreach (keys %replace_types)
+{
+	die "replace_types entry $_ was never used\n"
+	  if !defined($replace_types_used{$_});
+	die "replace_types entry $_ was matched multiple times\n"
+	  if $replace_types_used{$_} > 1;
+}
+
+foreach (keys %replace_line)
+{
+	die "replace_line entry $_ was never used\n"
+	  if !defined($replace_line_used{$_});
+	die "replace_line entry $_ was matched multiple times\n"
+	  if $replace_line_used{$_} > 1;
+}
+
 # Read the backend grammar.
 
 sub main
@@ -400,6 +432,7 @@ sub main
 			# Apply replace_token substitution if we have one.
 			if (exists $replace_token{ $arr[$fieldIndexer] })
 			{
+				$replace_token_used{ $arr[$fieldIndexer] }++;
 				$arr[$fieldIndexer] = $replace_token{ $arr[$fieldIndexer] };
 			}
@@ -425,6 +458,7 @@ sub main
 				&& $replace_types{$non_term_id} eq 'ignore')
 			{
 				# We'll ignore this nonterminal and rule altogether.
+				$replace_types_used{$non_term_id}++;
 				$copymode = 0;
 				next line;
 			}
@@ -451,6 +485,7 @@ sub main
 				  . $replace_types{$non_term_id} . ' '
 				  . $non_term_id;
 				add_to_buffer('types', $tstr);
+				$replace_types_used{$non_term_id}++;
 			}
 
 			# Emit the target part of the rule.
@@ -616,8 +651,10 @@ sub emit_rule
 	# apply replace_line substitution if any
 	my $rep = $replace_line{$tag};
-	if ($rep)
+	if (defined $rep)
 	{
+		$replace_line_used{$tag}++;
+
 		if ($rep eq 'ignore')
 		{
 			return 0;
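[Editor's note: the cross-checking pattern in the patch above (count each hit on a substitution table, then complain about entries that never matched) is a generally useful way to catch dead or mistyped table entries. An illustrative Python sketch with hypothetical table contents, not the parse.pl code:]

```python
from collections import Counter

# A substitution table plus a usage counter, mirroring the
# %replace_token / %replace_token_used pairing in the patch.
replace_token = {"IDENT": "ecpg_ident", "PARAM": "ecpg_param"}
used = Counter()

def substitute(tokens):
    out = []
    for t in tokens:
        if t in replace_token:
            used[t] += 1          # record that this entry matched
            t = replace_token[t]
        out.append(t)
    return out

result = substitute(["SELECT", "IDENT", "FROM", "IDENT"])

# Cross-check: an entry that never matched is probably a typo in the table.
unused = [k for k in replace_token if used[k] == 0]

assert result == ["SELECT", "ecpg_ident", "FROM", "ecpg_ident"]
assert unused == ["PARAM"]  # would trigger a "never used" error
```

The payoff is the same as in the patch: a table entry that silently stops matching (e.g. after a grammar change upstream) is reported instead of rotting unnoticed.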
Re: MultiXact\SLRU buffers configuration
On 2024-Aug-19, Andrey M. Borodin wrote: > > On 5 Jul 2024, at 23:18, Andrey M. Borodin wrote: > > > > Alvaro, please find attached the test. > > I’ve addressed some of Michael’s comments in a nearby thread: removed > > extra load, made injection point names lowercase, fixed some grammar > > issues. > > I’ve made several runs on Github to test stability [0, 1, 2, 4]. CI seems to be stable. OK, I've made some minor adjustments and pushed. CI seemed OK for me; let's see what the BF has to say. -- Álvaro Herrera — Breisgau, Deutschland — https://www.EnterpriseDB.com/
Re: ANALYZE ONLY
On Tue, Aug 20, 2024 at 1:52 AM Michael Harris wrote: > 2. The existing ANALYZE command has the following structure: > > ANALYZE [ ( option [, ...] ) ] [ table_and_columns [, ...] ] > > It would be easiest to add ONLY as another option, but that > doesn't look quite > right to me - surely the ONLY should be attached to the table name? > An alternative would be: > > ANALYZE [ ( option [, ...] ) ] [ONLY] [ table_and_columns [, ...] ] > > Any feedback or advice would be great. I like trying to use ONLY somehow. -- Robert Haas EDB: http://www.enterprisedb.com
Re: Partial aggregates pushdown
On Tue, Aug 20, 2024 at 07:03:56PM +0200, Jelte Fennema-Nio wrote: > On Tue, 20 Aug 2024 at 18:50, Bruce Momjian wrote: > > Okay, so we can do MAX easily, and AVG if the count can be represented > > as the same data type as the sum? Is that correct? Our only problem is > > that something like AVG(interval) can't use an array because arrays have > > to have the same data type for all array elements, and an interval can't > > represent a count? > > Close, but still not completely correct. AVG(bigint) can also not be > supported by patch 1, because the sum and the count for that both > stored using an int128. So we'd need an array of int128, and there's > currently no int128 SQL type. Okay. Have we considered having the FDW return a record: SELECT (oid, relname) FROM pg_class LIMIT 1; row - (2619,pg_statistic) SELECT pg_typeof((oid, relname)) FROM pg_class LIMIT 1; pg_typeof --- record SELECT pg_typeof(oid) FROM pg_class LIMIT 1; pg_typeof --- oid SELECT pg_typeof(relname) FROM pg_class LIMIT 1; pg_typeof --- name -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com Only you can decide what is important to you.
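[Editor's note: to make the partial-aggregate idea in this thread concrete: for AVG, the partial state each remote server ships back is a (sum, count) pair, and the requesting server combines those pairs. A toy Python sketch of that combine step, not the FDW mechanism itself:]

```python
def avg_partial(values):
    # Partial state for AVG: a (sum, count) pair, analogous to the
    # record/composite transmission being discussed in the thread.
    return (sum(values), len(values))

def avg_combine(states):
    # Combine partial states from all shards into the final average.
    total = sum(s for s, _ in states)
    count = sum(c for _, c in states)
    return total / count

shards = [[1, 2, 3], [4, 5]]
states = [avg_partial(s) for s in shards]
assert states == [(6, 3), (9, 2)]
assert avg_combine(states) == 3.0
```

This also illustrates why AVG(bigint) is awkward for the array-based approach mentioned above: the sum and count components would both need an int128 element type, which has no SQL-level representation, whereas a record can in principle carry differently typed fields.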
Re: Use read streams in pg_visibility
On Tue, Aug 13, 2024 at 03:22:27PM +0300, Nazir Bilal Yavuz wrote: > 2- collect_corrupt_items() > > This one is more complicated. The read stream callback function loops > until it finds a suitable block to read. So, if the callback returns > an InvalidBlockNumber; it means that the stream processed all possible > blocks and the stream should be finished. There is ~3% timing > improvement with this change. I started the server with the default > settings and created a 6 GB table. Then run 100 times > pg_check_visible() by clearing the OS cache between each run. > > The downside of this approach is there are too many "vmbuffer is valid > but BufferGetBlockNumber(*vmbuffer) is not equal to mapBlock, so > vmbuffer needs to be read again" cases in the read stream version (700 > vs 20 for the 6 GB table). This is caused by the callback function of > the read stream reading a new vmbuf while getting next block numbers. > So, vmbuf is wrong when we are checking visibility map bits that might > have changed while we were acquiring the page lock. Did the test that found 700 "read again" use a different procedure than the one shared as setup.sh down-thread? Using that script alone, none of the vm bits are set, so I'd not expect any heap page reads. The "vmbuffer needs to be read again" regression fits what I would expect the v1 patch to do with a table having many vm bits set. In general, I think the fix is to keep two heap Buffer vars, so the callback can work on one vmbuffer while collect_corrupt_items() works on another vmbuffer. Much of the time, they'll be the same buffer. It could be as simple as that, but you could consider further optimizations like these: - Use ReadRecentBuffer() or a similar technique, to avoid a buffer mapping lookup when the other Buffer var already has the block you want. - [probably not worth it] Add APIs for pg_visibility.c to tell read_stream.c to stop calling the ReadStreamBlockNumberCB for awhile. 
This could help if nonzero vm bits are infrequent, causing us to visit few heap blocks per vm block. For example, if one block out of every 33000 is all-visible, every heap block we visit has a different vmbuffer. It's likely not optimal for the callback to pin and unpin 20 vmbuffers, then have collect_corrupt_items() pin and unpin the same 20 vmbuffers. pg_visibility could notice this trend and request a stop of the callbacks until more of the heap block work completes. If pg_visibility is going to be the only place in the code with this use case, it's probably not worth carrying the extra API just for pg_visibility. However, if we get a stronger use case later, pg_visibility could be another beneficiary. > +/* > + * Callback function to get next block for read stream object used in > + * collect_visibility_data() function. > + */ > +static BlockNumber > +collect_visibility_data_read_stream_next_block(ReadStream *stream, > + >void *callback_private_data, > + >void *per_buffer_data) > +{ > + struct collect_visibility_data_read_stream_private *p = > callback_private_data; > + > + if (p->blocknum < p->nblocks) > + return p->blocknum++; > + > + return InvalidBlockNumber; This is the third callback that just iterates over a block range, after pg_prewarm_read_stream_next_block() and copy_storage_using_buffer_read_stream_next_block(). While not a big problem, I think it's time to have a general-use callback for block range scans. The quantity of duplicate code is low, but the existing function names are long and less informative than a behavior-based name. 
> +static BlockNumber > +collect_corrupt_items_read_stream_next_block(ReadStream *stream, > + > void *callback_private_data, > + > void *per_buffer_data) > +{ > + struct collect_corrupt_items_read_stream_private *p = > callback_private_data; > + > + for (; p->blocknum < p->nblocks; p->blocknum++) > + { > + boolcheck_frozen = false; > + boolcheck_visible = false; > + > + if (p->all_frozen && VM_ALL_FROZEN(p->rel, p->blocknum, > p->vmbuffer)) > + check_frozen = true; > + if (p->all_visible && VM_ALL_VISIBLE(p->rel, p->blocknum, > p->vmbuffer)) > + check_visible = true; > + if (!check_visible && !check_frozen) > + continue; If a vm has zero bits set, this loop will scan the entire vm before returning. Hence, this loop deserves a CHECK_FOR_INTERRUPTS() or a comment about how VM_ALL_FROZEN/VM_ALL_VISIBLE reaches a CHECK_FOR_INTERRUPTS(). > @@ -687,6 +734,20 @@ collect_corrupt_i
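[Editor's note: the "block range scan" callback contract Noah describes (return successive block numbers, then InvalidBlockNumber to end the stream) can be sketched briefly. Illustrative Python, not the read_stream.c API:]

```python
INVALID_BLOCK_NUMBER = 0xFFFFFFFF  # sentinel, mirroring InvalidBlockNumber

def make_block_range_callback(start, nblocks):
    """Return a callback that yields successive block numbers in
    [start, start + nblocks), then the sentinel.  This is the shared
    shape of the three per-caller callbacks mentioned above."""
    state = {"next": start}

    def callback():
        if state["next"] < start + nblocks:
            b = state["next"]
            state["next"] += 1
            return b
        return INVALID_BLOCK_NUMBER

    return callback

cb = make_block_range_callback(0, 3)
assert [cb(), cb(), cb(), cb()] == [0, 1, 2, INVALID_BLOCK_NUMBER]
```

A single general-purpose callback of this shape would replace the duplicated per-caller versions, as suggested in the review.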
Re: Adding clarification to description of IPC wait events XactGroupUpdate and ProcArrayGroupUpdate
On Tue, Aug 20, 2024 at 02:12:25PM +0530, Amit Kapila wrote: > +1 for Nathan's version. It is quite close to the previous version, > for which we haven't heard any complaints since they were introduced. Committed, thanks. -- nathan
Re: type cache cleanup improvements
On Mon, Aug 5, 2024 at 4:16 AM Alexander Korotkov wrote: > I've revised the patchset. First of all, I've re-ordered the patches. > > 0001-0002 (former 0002-0003) > Comprises hash_search_with_hash_value() function and its application > to avoid full hash iteration in InvalidateAttoptCacheCallback() and > TypeCacheTypCallback(). I think this is quite a straightforward > optimization without negative side effects. I've revised comments, > commit message and did some code beautification. I'm going to push > this if no objections. > > 0003 (former 0001) > I've revised this patch. I think the main concern expressed in the > thread about this patch is that we don't have an invalidation mechanism > for the relid => typid map. Finally, due to oid wraparound, the same relids > could get reused. That could lead to invalid entries in the map about > existing relids and typeids. This is rather messy, but I don't think > this could cause a material bug. The map's items are used only for > cache invalidation. Extra invalidation doesn't cause a bug. If a type > with the same relid gets cached, then the corresponding map item will be > overridden, so no missing invalidation. However, I see the following > reasons for keeping the relid => typid map in a consistent state. > > 1) As the main use-case for this optimization is a flood of temporary > tables, it would be nice not to let the relid => typid map bloat in this > case. I see that TypeCacheHash would get bloated, because its entries > are never deleted. However, I would prefer not to make this situation > even worse. > 2) In the future we may find some more use-cases for the relid => typid map > besides cache invalidation. Keeping it in a consistent state could be > an advantage then. > > In the attached patch, I'm keeping the relid => typid map when the > corresponding typentry has either TCFLAGS_HAVE_PG_TYPE_DATA, or > TCFLAGS_OPERATOR_FLAGS, or tupdesc. Thus, when a temporary table gets > deleted, we invalidate the map item.
> > It will be also nice to get rid of iteration over all the cached > domain types in TypeCacheRelCallback(). However, this typically > shouldn't be a problem since domain types are less tended to bloat. > Domain types are created manually, unlike composite types which are > automatically created for every temporary table. We will probably > need to optimize this in future, but I don't feel this to be necessary > in present patch. > > I think the revised 0003 requires review. The rebased remaining patch is attached. -- Regards, Alexander Korotkov Supabase v8-0001-Avoid-looping-over-all-type-cache-entries-in-Type.patch Description: Binary data
Re: Restart pg_usleep when interrupted
> As it looks like we have a consensus that reducing the number of interrupts also
> makes sense, I just provided a rebase version of the 1 Hz version (see [0], that
> also makes clear in the doc that the new field might show slightly old values).

That makes sense. However, I suspect the "1 Hz" code will no longer be needed if CF entry 5118 [1], mentioned by Thomas [2] a few days back, goes through. Maybe this extra work can be removed if [1] goes through. What do you think?

With regards to CF 5118 and what it means to the current discussion, below are my thoughts.

I tested with both CF 5118 [1] and the cost-delay tracking patch. With that in place, pg_usleep is able to sleep for the full requested duration, as mentioned by Thomas [3]. This is because certain interrupts like Parallel Message and others are no longer signaled by SIGUSR1, but with latches.

From this discussion, there is a desire for a sleep function that:
1/ Sleeps for the full duration of the requested time
2/ Continues to handle important interrupts during the sleep

While something like CF 5118 will take care of point #1, it will not deal with #2. Also, the v11 [4] patch does not do #2 either. So I think in the sleep loop, we need a C_F_I call. The same type of loop can also be used to call WaitForSingleObject. If CF 5118 gets committed, we will still need a similar loop that calls C_F_I, but the function will need to call WaitLatchUs [5].

Thoughts?

-- Sami

[1] https://commitfest.postgresql.org/49/5118/
[2] https://www.postgresql.org/message-id/CA%2BhUKG%2Bf-nEc_SowDLW1JMUa6Of5sCK-JZ%3Dv-KhL1xgXk83fiw%40mail.gmail.com
[3] https://www.postgresql.org/message-id/CA%2BhUKGKpo3fM%3DrnfdxHqt%2BjNGh_zUNcL1ob4hMsb%3DjFfKn9Aag%40mail.gmail.com
[4] https://www.postgresql.org/message-id/e3297b5e-0b33-4f45-afcd-4b00ba0b3547%40gmail.com
[5] https://www.postgresql.org/message-id/CA+hUKGKVbJE59JkwnUj5XMY+-rzcTFciV9vVC7i=lufwpds...@mail.gmail.com
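[Editor's note: the sleep-until-deadline loop with an interrupt check described above can be sketched as follows. Illustrative Python with an injectable clock so the example runs instantly; handle_interrupts stands in for CHECK_FOR_INTERRUPTS, and none of the names come from the actual patch:]

```python
def sleep_full_duration(duration, now, sleep_once, handle_interrupts):
    """Sleep for `duration` in total, resuming after interruptions.

    `now()` returns the current time, `sleep_once(t)` sleeps up to t but
    may return early if interrupted, and `handle_interrupts()` services
    pending interrupts each time around the loop."""
    deadline = now() + duration
    while True:
        handle_interrupts()
        remaining = deadline - now()
        if remaining <= 0:
            break
        sleep_once(remaining)

# Simulated clock: every sleep_once call is "interrupted" after 1 unit,
# so a 3-unit sleep takes three resumed legs.
t = [0.0]
calls = []
def now(): return t[0]
def sleep_once(d):
    step = min(d, 1.0)
    calls.append(step)
    t[0] += step
def handle_interrupts(): pass

sleep_full_duration(3.0, now, sleep_once, handle_interrupts)
assert t[0] == 3.0
assert calls == [1.0, 1.0, 1.0]
```

Computing the remaining time from a fixed deadline (rather than re-sleeping the original duration) is what keeps the total sleep equal to the request, no matter how many interrupts arrive.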
Re: Add new protocol message to change GUCs for usage with future protocol-only GUCs
On Tue, 20 Aug 2024 at 19:02, David G. Johnston wrote:
> So basically my proposal amounted to making every update a "major version
> update" and changing the behavior surrounding NegotiateProtocolVersion so it
> applies to major version differences. I'll stand by that change in
> definition. The current one doesn't seem all that useful anyway, and as we
> only have a single version, definitely hasn't been materially implemented.
> Otherwise, at some point a client that knows both v3 and v4 will exist and
> its connection will be rejected instead of downgraded by a v3-only server
> even though such a downgrade would be possible. I suspect we'd go ahead and
> change the rule then - so why not just do so now, while getting rid of the
> idea that minor versions are a thing.

If we decide never to change the format of the StartupMessage again (which may be an okay thing to decide), then I agree it would make sense to update the existing supported servers ASAP to be able to send back a NegotiateProtocolVersion message if they receive a 4.x StartupMessage and the server only supports up to 3.x.

However, even if we do that, I don't think it makes sense to start using the 4.0 version straight away, because many older postgres servers would still throw an error when receiving the 4.x request. By using a 3.x version we are able to avoid those errors in the existing ecosystem. Basically, I think we should probably wait ~5 years again until we actually use a 4.0 version. I.e., I don't see serious benefits to using 4.0. The main benefit you seem to describe is: "it's theoretically cleaner to use major version bumps". And there is a serious downside: "seriously breaking the existing ecosystem".

> I suppose we could leave minor versions for patch releases of the main server
> version - which still leaves the first new feature of a release incrementing
> the major version. That would be incidental to changing how we handle major
> versions.
Having a Postgres server patch update change the protocol version that the server supports sounds pretty scary to me.
Re: Add new protocol message to change GUCs for usage with future protocol-only GUCs
On Tue, 20 Aug 2024 at 19:31, Jacob Champion wrote: > It's applicable to the use case I was talking about with Jelte. A > libpq client dropping down to the socket level is relying on > (implicit, currently undocumented/undecided, possibly incorrect!) > intermediary guarantees that the protocol provides for a major > version. I'm hoping we can provide some, since we haven't broken > anything yet. If we decide we can't, then so be it -- things will > break either way -- but it's still strange to me that we'd be okay > with literally zero forward compatibility and still call that a "minor > version". I think one compatibility guarantee that we might want to uphold is something like the following: After completing the initial connection setup, a server should only send new message types or new fields on existing message types when the client has specifically advertised support for that message type in one of two ways: 1. By configuring a specific protocol parameter 2. By sending a new message type or using a new field that implicitly advertises support for the new message type/fields. In this case the message should be of a request-response style, the server cannot assume that after the request-response communication happened this new message type is still supported by the client. The reasoning for this was discussed a while back upthread: This would be to allow a connection pooler (like PgBouncer) to have a pool of the highest minor version that the pooler supports e.g 3.8, but then it could still hand out these connections to clients that connected to the pooler using a lower version. Without having these clients receive messages that they do not support. Another way of describing this guarantee: If a client connects using 3.8 and configures no protocol parameters, the client needs to handle anything 3.8 specific that the handshake requires (such as longer cancel token). 
But then after that handshake it can choose to send only 3.0 packets and expect to receive only 3.0 packets back.
Re: Taking into account syncrep position in flush_lsn reported by apply worker
On 14/08/2024 16:54, Arseny Sher wrote:
> On 8/13/24 06:35, Amit Kapila wrote:
>> On Mon, Aug 12, 2024 at 3:43 PM Arseny Sher wrote:
>>> Sorry for the poor formatting of the message above, this should be better:
>>>
>>> Hey. Currently synchronous_commit is disabled for the logical apply worker
>>> on the ground that the reported flush_lsn includes only locally flushed
>>> data, so the slot (publisher) preserves everything higher than this, and
>>> so in case of subscriber restart no data is lost. However, imagine that
>>> the subscriber is made highly available by a standby to which synchronous
>>> replication is enabled. Then the reported flush_lsn is ignorant of this
>>> synchronous replication progress, and in case of failover data loss may
>>> occur if the subscriber managed to ack flush_lsn ahead of syncrep.
>>
>> Won't the same be achieved by enabling the synchronous_commit
>> parameter for a subscription?
>
> Nope, because it would force WAL flush and wait for replication to the
> standby in the apply worker, slowing it down. The logic missing currently
> is not to wait for the synchronous commit, but still mind its progress in
> the flush_lsn reporting.

I think this patch makes sense. I'm not sure we've actually made any promises on it, but it feels wrong that the slot's LSN might be advanced past the LSN that has been acknowledged by the replica, if synchronous replication is configured. I see little downside in making that promise.

+	/*
+	 * If synchronous replication is configured, take into account its position.
+	 */
+	if (SyncRepStandbyNames != NULL && SyncRepStandbyNames[0] != '\0')
+	{
+		LWLockAcquire(SyncRepLock, LW_SHARED);
+		local_flush = Min(local_flush, WalSndCtl->lsn[SYNC_REP_WAIT_FLUSH]);
+		LWLockRelease(SyncRepLock);
+	}
+

Should probably use the SyncStandbysDefined() macro here. Or check WalSndCtl->sync_standbys_defined like SyncRepWaitForLSN() does; not sure which would be more appropriate here.

Should the synchronous_commit setting also affect this?
Please also check if the docs need to be updated, or if a paragraph should be added somewhere on this behavior. A TAP test case would be nice. Not sure how complicated it will be, but if not too complicated, it'd be nice to include it in check-world. -- Heikki Linnakangas Neon (https://neon.tech)
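[Editor's note: the core of the proposed behavior is clamping the reported flush position with Min(). A toy sketch of that invariant, in illustrative Python rather than the patch's C:]

```python
def reported_flush_lsn(local_flush, syncrep_flush, syncrep_enabled):
    # Report the smaller of the locally flushed LSN and the LSN
    # acknowledged by the synchronous standby, so the publisher's slot
    # never advances past what the standby has durably received.
    if syncrep_enabled:
        return min(local_flush, syncrep_flush)
    return local_flush

# Standby lags behind the local flush: report the standby's position.
assert reported_flush_lsn(0x5000, 0x4000, True) == 0x4000
# No synchronous replication configured: report the local flush as before.
assert reported_flush_lsn(0x5000, 0x4000, False) == 0x5000
```

If the subscriber fails over to its standby, the publisher's slot then still retains everything past the standby's acknowledged position, which is exactly the data-loss window the patch closes.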
Re: Remove dependence on integer wrapping
I've combined all the current proposed changes into one patch. I've also introduced signed versions of the negation functions into int.h to avoid relying on multiplication.

-- 
nathan

From 2364ba4028f879a22b9f69f999aee3ea9c013ec0 Mon Sep 17 00:00:00 2001
From: Nathan Bossart
Date: Tue, 20 Aug 2024 16:12:39 -0500
Subject: [PATCH v26 1/1] Remove dependence on -fwrapv semantics in more places.

---
 src/backend/nodes/bitmapset.c          |  2 +-
 src/backend/utils/adt/date.c           |  6 +++-
 src/backend/utils/adt/formatting.c     | 28 +--
 src/backend/utils/hash/dynahash.c      |  6 ++--
 src/include/common/int.h               | 48 ++
 src/test/regress/expected/date.out     |  2 ++
 src/test/regress/expected/horology.out |  4 +++
 src/test/regress/expected/union.out    | 43 +++
 src/test/regress/sql/date.sql          |  1 +
 src/test/regress/sql/horology.sql      |  2 ++
 src/test/regress/sql/union.sql         |  9 +
 11 files changed, 143 insertions(+), 8 deletions(-)

diff --git a/src/backend/nodes/bitmapset.c b/src/backend/nodes/bitmapset.c
index cd05c642b0..d37a997c0e 100644
--- a/src/backend/nodes/bitmapset.c
+++ b/src/backend/nodes/bitmapset.c
@@ -67,7 +67,7 @@
  * we get zero.
  *--
  */
-#define RIGHTMOST_ONE(x) ((signedbitmapword) (x) & -((signedbitmapword) (x)))
+#define RIGHTMOST_ONE(x) ((bitmapword) (x) & (~((bitmapword) (x)) + 1))
 
 #define HAS_MULTIPLE_ONES(x)	((bitmapword) RIGHTMOST_ONE(x) != (x))
 
diff --git a/src/backend/utils/adt/date.c b/src/backend/utils/adt/date.c
index 9c854e0e5c..0782e84776 100644
--- a/src/backend/utils/adt/date.c
+++ b/src/backend/utils/adt/date.c
@@ -257,7 +257,11 @@ make_date(PG_FUNCTION_ARGS)
 	if (tm.tm_year < 0)
 	{
 		bc = true;
-		tm.tm_year = -tm.tm_year;
+		if (pg_neg_s32_overflow(tm.tm_year, &tm.tm_year))
+			ereport(ERROR,
+					(errcode(ERRCODE_DATETIME_FIELD_OVERFLOW),
+					 errmsg("date field value out of range: %d-%02d-%02d",
+							tm.tm_year, tm.tm_mon, tm.tm_mday)));
 	}
 
 	dterr = ValidateDate(DTK_DATE_M, false, false, bc, &tm);
diff --git a/src/backend/utils/adt/formatting.c b/src/backend/utils/adt/formatting.c
index 68069fcfd3..76bb3a79b5 100644
--- a/src/backend/utils/adt/formatting.c
+++ b/src/backend/utils/adt/formatting.c
@@ -77,6 +77,7 @@
 
 #include "catalog/pg_collation.h"
 #include "catalog/pg_type.h"
+#include "common/int.h"
 #include "common/unicode_case.h"
 #include "common/unicode_category.h"
 #include "mb/pg_wchar.h"
@@ -3809,7 +3810,12 @@ DCH_from_char(FormatNode *node, const char *in, TmFromChar *out,
 					ereturn(escontext,,
 							(errcode(ERRCODE_INVALID_DATETIME_FORMAT),
 							 errmsg("invalid input string for \"Y,YYY\"")));
-				years += (millennia * 1000);
+				if (pg_mul_s32_overflow(millennia, 1000, &millennia) ||
+					pg_add_s32_overflow(years, millennia, &years))
+					ereturn(escontext,,
+							(errcode(ERRCODE_INVALID_DATETIME_FORMAT),
+							 errmsg("invalid input string for \"Y,YYY\"")));
+
 				if (!from_char_set_int(&out->year, years, n, escontext))
 					return;
 				out->yysz = 4;
@@ -4797,11 +4803,27 @@ do_to_timestamp(text *date_txt, text *fmt, Oid collid, bool std,
 		if (tmfc.bc)
 			tmfc.cc = -tmfc.cc;
 		if (tmfc.cc >= 0)
+		{
 			/* +1 because 21st century started in 2001 */
-			tm->tm_year = (tmfc.cc - 1) * 100 + 1;
+			/* tm->tm_year = (tmfc.cc - 1) * 100 + 1 */
+			if (pg_mul_s32_overflow(tmfc.cc - 1, 100, &tm->tm_year) ||
+				pg_add_s32_overflow(tm->tm_year, 1, &tm->tm_year))
+				ereport(ERROR,
+						(errcode(ERRCODE_DATETIME_VALUE_OUT_OF_RANGE),
+						 errmsg("date out of range: \"%s\"",
+								text_to_cstring(date_txt))));
+		}
 		else
+		{
 			/* +1 because year == 599 is 600 BC */
+			/* tm->tm_year = tmfc.cc * 10
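[Editor's note: the reason a dedicated overflow-checking negation like pg_neg_s32_overflow is needed at all is that INT32_MIN is the one int32 value whose negation is not representable. A sketch that simulates the int32 contract in Python, where integers don't overflow, so the bound must be checked explicitly; the function name and return shape here are illustrative, not the int.h API:]

```python
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1

def neg_s32_overflow(a):
    """Return (overflowed, result) for 32-bit signed negation.

    Only INT32_MIN has no 32-bit negation: -(-2**31) == 2**31,
    which exceeds INT32_MAX by one in two's complement."""
    if a == INT32_MIN:
        return True, None
    return False, -a

assert neg_s32_overflow(5) == (False, -5)
assert neg_s32_overflow(INT32_MIN) == (True, None)
assert neg_s32_overflow(INT32_MAX) == (False, -INT32_MAX)
```

This is exactly the case the make_date() hunk above guards: a year of INT32_MIN would previously rely on wraparound when negated, whereas the patched code reports a datetime field overflow instead.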
Re: Add new protocol message to change GUCs for usage with future protocol-only GUCs
On Tue, Aug 20, 2024 at 12:55 PM Jelte Fennema-Nio wrote: > Another way of describing this guarantee: If a client connects using > 3.8 and configures no protocol parameters, the client needs to handle > anything 3.8 specific that the handshake requires (such as longer > cancel token). But then after that handshake it can choose to send > only 3.0 packets and expect to receive only 3.0 packets back. That guarantee (if adopted) would also make it possible for my use case to proceed correctly, since a libpq client can still speak 3.0 packets on the socket safely. But in that case, PQprotocolVersion should keep returning 3, because there's an explicit reason to care about the major version by itself. --Jacob
configure failures on chipmunk
Hi. I noticed chipmunk is failing in configure: checking whether the C compiler works... no configure: error: in `/home/pgbfarm/buildroot/HEAD/pgsql.build': configure: error: C compiler cannot create executables You may want to give it a look. Thanks! -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/ "This is what I like so much about PostgreSQL. Most of the surprises are of the "oh wow! That's cool" Not the "oh shit!" kind. :)" Scott Marlowe, http://archives.postgresql.org/pgsql-admin/2008-10/msg00152.php
Re: [PATCH] Add additional extended protocol commands to psql: \parse and \bind
On Thu, 25 Jul 2024 at 08:45, Anthonin Bonnefoy wrote: > +1 keeping this as a separate command and using \bind_named. \bind has > a different behaviour as it also parses the query so keeping them as > separate commands would probably avoid some confusion. +1 on naming it \bind_named @Anthonin are you planning to update the patch accordingly?
Re: Add new protocol message to change GUCs for usage with future protocol-only GUCs
On Tue, 20 Aug 2024 at 23:48, Jacob Champion wrote: > That guarantee (if adopted) would also make it possible for my use > case to proceed correctly, since a libpq client can still speak 3.0 > packets on the socket safely. Not necessarily (at least not how I defined it). If a protocol parameter has been configured (possibly done by default by libpq), then that might not be the case anymore. So, you'd also need to compare the current values of the protocol parameters to their expected value. > But in that case, PQprotocolVersion > should keep returning 3, because there's an explicit reason to care > about the major version by itself. I agree that there's a reason to care about the major version then, but it might still be better to fail hard for people that care about protocol details. Because when writing the code that checked PQprotocolVersion there was no definition at all of what could change during a minor version bump. So there was no possibility to reason for that person if they should be notified of a minor version bump or not. So notifying the author of the code by failing the check would be erring on the safe side, because maybe they would need to update their code that depends on the protocol details. If not, and they then realize that our guarantee is strong enough for their use case, then they could replace their check with something like: PQprotocolVersion() >= 3 && PQprotocolVersion() < 4 To be clear, the argument for changing PQprotocolVersion() is definitely less strong if we'd provide such a guarantee. But I don't think the problem is completely gone either.
Re: Remaining reference to _PG_fini() in ldap_password_func
On Tue, Aug 20, 2024 at 09:05:54AM -0400, Tom Lane wrote: > Heikki Linnakangas writes: >> +1. There's also a prototype for _PG_fini() in fmgr.h, let's remove that >> too. > > +1. I think the fmgr.h prototype may have been left there > deliberately to avoid breaking extension code, but it's past > time to clean it up. Okay, I've removed all that then. Thanks for the feedback. -- Michael signature.asc Description: PGP signature
Re: Vacuum statistics
Hi! Thank you very much for your review! Sorry for my late response I was overwhelmed by tasks. On 16.08.2024 14:12, jian he wrote: On Thu, Aug 15, 2024 at 4:49 PM Alena Rybakina wrote: Hi! I've applied all the v5 patches. 0002 and 0003 have white space errors. + +Number of times blocks of this index were already found +in the buffer cache by vacuum operations, so that a read was not necessary +(this only includes hits in the +&project; buffer cache, not the operating system's file system cache) + +Number of times blocks of this table were already found +in the buffer cache by vacuum operations, so that a read was not necessary +(this only includes hits in the +&project; buffer cache, not the operating system's file system cache) + "&project;" represents a sgml file placeholder name as "project" and puts all the content of "project.sgml" to system-views.sgml. but you don't have "project.sgml". you may check doc/src/sgml/filelist.sgml or doc/src/sgml/ref/allfiles.sgml for usage of "&place_holder;". so you can change it to "project", otherwise doc cannot build. src/backend/commands/dbcommands.c we have: /* * If built with appropriate switch, whine when regression-testing * conventions for database names are violated. But don't complain during * initdb. */ #ifdef ENFORCE_REGRESSION_TEST_NAME_RESTRICTIONS if (IsUnderPostmaster && strstr(dbname, "regression") == NULL) elog(WARNING, "databases created by regression test cases should have names including \"regression\""); #endif so in src/test/regress/sql/vacuum_tables_and_db_statistics.sql you need to change dbname: CREATE DATABASE statistic_vacuum_database; CREATE DATABASE statistic_vacuum_database1; + + The view pg_stat_vacuum_indexes will contain + one row for each index in the current database (including TOAST + table indexes), showing statistics about vacuuming that specific index. 
+ TOAST should TOAST + /* Build a tuple descriptor for our result type */ + if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE) + elog(ERROR, "return type must be a row type"); maybe change to ereport(ERROR, (errcode(ERRCODE_DATATYPE_MISMATCH), errmsg("return type must be a row type"))); Later I found out "InitMaterializedSRF(fcinfo, 0);" already did all the work. Much of the code can be gotten rid of; please check the attached. I agree with your suggestions for improving the code. I will add this in the next version of the patch.

#define EXTVACHEAPSTAT_COLUMNS 27
#define EXTVACIDXSTAT_COLUMNS 19
#define EXTVACDBSTAT_COLUMNS 15
#define EXTVACSTAT_COLUMNS Max(EXTVACHEAPSTAT_COLUMNS, EXTVACIDXSTAT_COLUMNS)
static Oid CurrentDatabaseId = InvalidOid;

We already defined MyDatabaseId in src/include/miscadmin.h, so why do we need "static Oid CurrentDatabaseId = InvalidOid;"? Also, src/backend/utils/adt/pgstatfuncs.c already included "miscadmin.h". Hmm, Tom Lane added "miscadmin.h", or I didn't notice something. Could you point this out, please? We used CurrentDatabaseId to output statistics on tables from another database, so we need to replace it with a different default value. But I want to rewrite this patch to display table statistics only for the current database, that is, this part will be removed in the future. In my opinion, it would be more correct. In the following code, one function has 2 return statements?

/*
 * Get the vacuum statistics for the heap tables.
 */
Datum
pg_stat_vacuum_tables(PG_FUNCTION_ARGS)
{
    return pg_stats_vacuum(fcinfo, PGSTAT_EXTVAC_HEAP, EXTVACHEAPSTAT_COLUMNS);
    PG_RETURN_NULL();
}

/*
 * Get the vacuum statistics for the indexes.
 */
Datum
pg_stat_vacuum_indexes(PG_FUNCTION_ARGS)
{
    return pg_stats_vacuum(fcinfo, PGSTAT_EXTVAC_INDEX, EXTVACIDXSTAT_COLUMNS);
    PG_RETURN_NULL();
}

/*
 * Get the vacuum statistics for the database. 
 */
Datum
pg_stat_vacuum_database(PG_FUNCTION_ARGS)
{
    return pg_stats_vacuum(fcinfo, PGSTAT_EXTVAC_DB, EXTVACDBSTAT_COLUMNS);
    PG_RETURN_NULL();
}

You are right - the second return is superfluous. I'll fix it. in pg_stats_vacuum:

if (type == PGSTAT_EXTVAC_INDEX || type == PGSTAT_EXTVAC_HEAP)
{
    Oid relid = PG_GETARG_OID(1);

    /* Load table statistics for specified database. */
    if (OidIsValid(relid))
    {
        tabentry = fetch_dbstat_tabentry(dbid, relid);
        if (tabentry == NULL || tabentry->vacuum_ext.type != type)
            /* Table don't exists or isn't an heap relation. */
            PG_RETURN_NULL();

        tuplestore_put_for_relation(relid, tupstore, tupdesc, tabentry, ncolumns);
    }
Re: Vacuum statistics
We check it there: "tabentry->vacuum_ext.type != type". Or were you talking about something else? On 19.08.2024 12:32, jian he wrote: in pg_stats_vacuum

if (type == PGSTAT_EXTVAC_INDEX || type == PGSTAT_EXTVAC_HEAP)
{
    Oid relid = PG_GETARG_OID(1);

    /* Load table statistics for specified database. */
    if (OidIsValid(relid))
    {
        tabentry = fetch_dbstat_tabentry(dbid, relid);
        if (tabentry == NULL || tabentry->vacuum_ext.type != type)
            /* Table don't exists or isn't an heap relation. */
            PG_RETURN_NULL();

        tuplestore_put_for_relation(relid, rsinfo, tabentry);
    }
    else
    {
    }

So for functions pg_stat_vacuum_indexes and pg_stat_vacuum_tables, it seems you didn't check "relid"'s relkind; you may need to use get_rel_relkind. -- Regards, Alena Rybakina Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
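As a quick SQL-level illustration of the relkind distinction being pointed at here (the C code would use get_rel_relkind(); the values below are the standard pg_class.relkind codes — 'r' for an ordinary table, 'i' for an index, with 'p'/'I' for the partitioned variants):

```sql
-- A table-statistics function should reject OIDs whose relkind is not a
-- table, and an index-statistics function should reject non-index OIDs.
SELECT relname, relkind FROM pg_class WHERE oid = 'pg_class'::regclass;
--  relname  | relkind
--  pg_class | r

SELECT relname, relkind FROM pg_class WHERE oid = 'pg_class_oid_index'::regclass;
--       relname       | relkind
--  pg_class_oid_index | i
```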
Re: Vacuum statistics
I think the count above also includes the system tables of the database, but I'll double-check it. Thank you for your review! On 19.08.2024 19:28, Ilia Evdokimov wrote: Are you certain that all tables are included in `pg_stat_vacuum_tables`? I'm asking because of the following:

SELECT count(*) FROM pg_stat_all_tables;
 count
-------
   108
(1 row)

SELECT count(*) FROM pg_stat_vacuum_tables;
 count
-------
    20
(1 row)

-- Regards, Alena Rybakina Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
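One way to verify that hypothesis (assuming the patch's pg_stat_vacuum_tables view exposes schemaname and relname columns the way pg_stat_all_tables does) is to diff the two views and see whether the missing rows are all catalog relations:

```sql
-- List relations present in pg_stat_all_tables but absent from the
-- patch's pg_stat_vacuum_tables; if the system-tables hypothesis is
-- right, these should all be pg_catalog / information_schema relations.
SELECT schemaname, relname
FROM pg_stat_all_tables
EXCEPT
SELECT schemaname, relname
FROM pg_stat_vacuum_tables
ORDER BY 1, 2;
```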
Re: configure failures on chipmunk
On Wed, Aug 21, 2024 at 9:48 AM Alvaro Herrera wrote: > Hi. I noticed chipmunk is failing in configure: > > checking whether the C compiler works... no > configure: error: in `/home/pgbfarm/buildroot/HEAD/pgsql.build': > configure: error: C compiler cannot create executables One of the runs shows: ./configure: line 4202: 28268 Bus error $CC -c $CFLAGS $CPPFLAGS conftest.$ac_ext 1>&5
Re: ANALYZE ONLY
On Wed, 21 Aug 2024 at 06:41, Robert Haas wrote: > I like trying to use ONLY somehow. Do you mean as an ANALYZE command option, i.e. ANALYZE (only) table; or as a table modifier like gram.y's extended_relation_expr? Making it a command option means that the option would apply to all tables listed, whereas if it was more like an extended_relation_expr, the option would be applied per table listed in the command. 1. ANALYZE ONLY ptab, ptab2; -- gather stats on ptab but not on its partitions but get stats on ptab2 and stats on its partitions too. 2. ANALYZE ONLY ptab, ONLY ptab2; -- gather stats on ptab and ptab2 without doing that on any of their partitions. Whereas: "ANALYZE (ONLY) ptab, ptab2;" would always give you the behaviour of #2. If we did it as a per-table option, then we'd need to consider what should happen if someone did: "VACUUM ONLY parttab;". Probably silently doing nothing wouldn't be good. Maybe a warning, akin to what's done in: postgres=# analyze (skip_locked) a; WARNING: skipping analyze of "a" --- lock not available David
Re: ANALYZE ONLY
David Rowley writes: > On Wed, 21 Aug 2024 at 06:41, Robert Haas wrote: >> I like trying to use ONLY somehow. > Do you mean as an ANALYZE command option, i.e. ANALYZE (only) table; > or as a table modifier like gram.y's extended_relation_expr? > Making it a command option means that the option would apply to all > tables listed, whereas if it was more like an extended_relation_expr, > the option would be applied per table listed in the command. > 1. ANALYZE ONLY ptab, ptab2; -- gather stats on ptab but not on its > partitions but get stats on ptab2 and stats on its partitions too. > 2. ANALYZE ONLY ptab, ONLY ptab2; -- gather stats on ptab and ptab2 > without doing that on any of their partitions. FWIW, I think that's the right approach, for consistency with the way that ONLY works in DML. regards, tom lane
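For reference, a sketch of the existing per-table ONLY behavior in DML and queries that this consistency argument appeals to (table names here are illustrative):

```sql
CREATE TABLE ptab (a int) PARTITION BY LIST (a);
CREATE TABLE ptab_p1 PARTITION OF ptab FOR VALUES IN (1);
INSERT INTO ptab VALUES (1);

SELECT count(*) FROM ptab;       -- 1: the parent's result includes its partitions
SELECT count(*) FROM ONLY ptab;  -- 0: the partitioned parent itself stores no rows

UPDATE ONLY ptab SET a = a;      -- UPDATE 0: partitions are skipped
```

Under the proposed syntax, "ANALYZE ONLY ptab;" would follow the same per-table convention: gather statistics on ptab itself without recursing into its partitions.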
Re: Test 041_checkpoint_at_promote.pl faild in installcheck due to missing injection_points
On Tue, Aug 20, 2024 at 12:10:08PM -0400, Tom Lane wrote: > ... which would also imply writing documentation and so forth, > and it'd mean that injection_points starts to show up in end-user > installations. (That would happen with the alternative choice of > hacking install-world to include src/test/modules/injection_points, > too.) While you could argue that that'd be helpful for extension > authors who'd like to use injection_points in their own tests, I'm > not sure that it's where we want to go with that module. It's only > meant as test scaffolding, and I don't think we've analyzed the > implications of some naive user installing it. The original line of thought here is that if I were to write a test that relies on injection points for a critical bug fix, then I'd rather not have to worry about the hassle of maintaining user-facing documentation to get the work done. The second is that we should be able to tweak the module or even break its ABI as much as we want in stable branches to accommodate the tests, which is what a test module is good for. The same ABI argument kind of stands as well for the backend portion, but we'll see where it goes. > We do, however, need to preserve the property that installcheck > works after install-world. I'm starting to think that maybe > the 041 test should be hacked to silently skip if it doesn't > find injection_points available. (We could then remove some of > the makefile hackery that's supporting the current behavior.) > Probably the same needs to happen in each other test script > that's using injection_points --- I imagine that Maxim's > test is simply failing here first. Yeah, we could do that. -- Michael
Re: MultiXact\SLRU buffers configuration
On Tue, Aug 20, 2024 at 02:37:34PM -0400, Alvaro Herrera wrote: > OK, I've made some minor adjustments and pushed. CI seemed OK for me, > let's see what does the BF have to say. I see that you've gone the way with the SQL function doing a load(). Would it be worth switching the test to rely on the two macros for load and caching instead? I've mentioned that previously but never got down to present a patch for the sake of this test. This requires some more tweaks in the module to disable the stats when loaded through a GUC, and two shmem callbacks, then the test is able to work correctly. Please see attached. Thoughts? -- Michael From db4b4601a6e930a374f6d03b622f66589d620fe2 Mon Sep 17 00:00:00 2001 From: Michael Paquier Date: Wed, 21 Aug 2024 08:17:19 +0900 Subject: [PATCH] Refactor injection point test with loading This relies on the two macros INJECTION_POINT_LOAD and INJECTION_POINT_CACHED to trigger the test. This adds to the test module injection_points a GUC to be able to disable injection point stats, in case a cached point is run in a critical section, and a code path to initialize the shmem state data of the module when loading the module. 
--- src/backend/access/transam/multixact.c| 5 +- .../injection_points/injection_points.c | 80 ++- .../injection_points/injection_stats.c| 8 +- .../injection_points/injection_stats.h| 3 + .../injection_points/injection_stats_fixed.c | 4 +- src/test/modules/test_slru/t/001_multixact.pl | 5 +- 6 files changed, 93 insertions(+), 12 deletions(-) diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c index 14c2b929e2..d2d0298ac3 100644 --- a/src/backend/access/transam/multixact.c +++ b/src/backend/access/transam/multixact.c @@ -855,6 +855,9 @@ MultiXactIdCreateFromMembers(int nmembers, MultiXactMember *members) } } + /* Load the injection point before entering the critical section */ + INJECTION_POINT_LOAD("multixact-create-from-members"); + /* * Assign the MXID and offsets range to use, and make sure there is space * in the OFFSETs and MEMBERs files. NB: this routine does @@ -869,7 +872,7 @@ MultiXactIdCreateFromMembers(int nmembers, MultiXactMember *members) */ multi = GetNewMultiXactId(nmembers, &offset); - INJECTION_POINT("multixact-create-from-members"); + INJECTION_POINT_CACHED("multixact-create-from-members"); /* Make an XLOG entry describing the new MXID. */ xlrec.mid = multi; diff --git a/src/test/modules/injection_points/injection_points.c b/src/test/modules/injection_points/injection_points.c index 4e775c7ec6..8b14f5fc19 100644 --- a/src/test/modules/injection_points/injection_points.c +++ b/src/test/modules/injection_points/injection_points.c @@ -28,6 +28,7 @@ #include "storage/lwlock.h" #include "storage/shmem.h" #include "utils/builtins.h" +#include "utils/guc.h" #include "utils/injection_point.h" #include "utils/memutils.h" #include "utils/wait_event.h" @@ -68,7 +69,12 @@ typedef struct InjectionPointCondition */ static List *inj_list_local = NIL; -/* Shared state information for injection points. */ +/* + * Shared state information for injection points. 
+ * + * This state data can be initialized in two ways: dynamically with a DSM + * or when loading the module. + */ typedef struct InjectionPointSharedState { /* Protects access to other fields */ @@ -97,8 +103,16 @@ extern PGDLLEXPORT void injection_wait(const char *name, /* track if injection points attached in this process are linked to it */ static bool injection_point_local = false; +/* GUC variable */ +bool inj_stats_enabled = true; + +/* Shared memory init callbacks */ +static shmem_request_hook_type prev_shmem_request_hook = NULL; +static shmem_startup_hook_type prev_shmem_startup_hook = NULL; + /* - * Callback for shared memory area initialization. + * Routine for shared memory area initialization, used as a callback + * when initializing dynamically with a DSM or when loading the module. */ static void injection_point_init_state(void *ptr) @@ -111,8 +125,51 @@ injection_point_init_state(void *ptr) ConditionVariableInit(&state->wait_point); } +/* Shared memory initialization when loading module */ +static void +injection_shmem_request(void) +{ + Size size; + + if (prev_shmem_request_hook) + prev_shmem_request_hook(); + + size = MAXALIGN(sizeof(InjectionPointSharedState)); + RequestAddinShmemSpace(size); +} + +static void +injection_shmem_startup(void) +{ + bool found; + + if (prev_shmem_startup_hook) + prev_shmem_startup_hook(); + + /* reset in case this is a restart within the postmaster */ + inj_state = NULL; + + /* Create or attach to the shared memory state */ + LWLockAcquire(AddinShmemInitLock, LW_EXCLUSIVE); + + inj_state = ShmemInitStruct("injection_points", +sizeof(InjectionPointSharedState), +&found); + + if (!found) + { + /* + * First time through, so initialize. This is shared with the + * dynami
Re: MultiXact\SLRU buffers configuration
On 2024-Aug-21, Michael Paquier wrote: > I see that you've gone the way with the SQL function doing a load(). > Would it be worth switching the test to rely on the two macros for > load and caching instead? I've mentioned that previously but never > got down to present a patch for the sake of this test. Hmm, I have no opinion on which way is best. You probably have a better sense of what's better for the injections point interface, so I'm happy to defer to you on this. > + /* reset in case this is a restart within the postmaster */ > + inj_state = NULL; I'm not sure that this assignment actually accomplishes anything ... I don't understand what do the inj_stats_enabled stuff have to do with this patch. I suspect it's a git operation error, ie., you seem to have squashed two different things together. -- Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/ "Industry suffers from the managerial dogma that for the sake of stability and continuity, the company should be independent of the competence of individual employees." (E. Dijkstra)
Re: POC, WIP: OR-clause support for indexes
Hi! On Thu, Aug 15, 2024 at 10:13 PM Alena Rybakina wrote: > On 07.08.2024 04:11, Alexander Korotkov wrote: > > On Mon, Aug 5, 2024 at 11:24 PM Alena Rybakina > > wrote: > >> Ok, thank you for your work) > >> > >> I think we can leave only the two added libraries in the first patch, > >> others are superfluous. > > Thank you. > > I also have fixed some grammar issues. > > While reviewing the patch, I can't understand one part of the code where > we check the comparability of restrictinfos. > > /* RestrictInfo parameters dmust match parent */ > if (subRinfo->is_pushed_down != rinfo->is_pushed_down || > subRinfo->is_clone != rinfo->is_clone || > subRinfo->security_level != rinfo->security_level || > !bms_equal(subRinfo->required_relids, > rinfo->required_relids) || > !bms_equal(subRinfo->incompatible_relids, > rinfo->incompatible_relids) || > !bms_equal(subRinfo->outer_relids, rinfo->outer_relids)) > return NULL; > > I didn't find a place in the optimizer where required_relids, > incompatible_relids and outer_relids become different. Each > make_restrictinfo function takes arguments from > parent data. > > I disabled this check and the regression tests passed. This code is > needed for security verification, may I clarify? Thank you for pointing this out. I've rechecked the life cycle of those parameters. make_restrictinfo() makes them initially equal (except required_relids, which might be narrower for sub-clauses). Later changes like adjust_appendrel_attrs_mutator() apply equally to both parent and children. So, I've turned this into an assert check. > In the last patch I corrected the libraries - one of them was not in > alphabetical order. Thank you! Also, I converted the check you introduced in the previous message to op_in_opfamily(), and introduced a collation check similar to match_opclause_to_indexcol(). 
-- Regards, Alexander Korotkov Supabase v35-0002-Teach-bitmap-path-generation-about-transforming-.patch Description: Binary data v35-0001-Transform-OR-clauses-to-SAOP-s-during-index-matc.patch Description: Binary data
Re: Buf fix: update-po for PGXS does not work
On 2023-Oct-27, Ryo Matsumura (Fujitsu) wrote: > Hi hackers, > > I found that 'make update-po' for PGXS does not work. > Even if execute 'make update-po', but xx.po.new is not generated. > I don't test and check for meson build system, but I post it tentatively. > > I attached patch and test set. > 'update-po' tries to find *.po files $top_srcdir, but there is no po-file in > PGXS system because $top_srcdir is install directory. Thanks. I think you have the order of the ifdef nest backwards; even in the PGXS case we should have "ALL_LANGUAGES = $(AVAIL_LANGUAGES)" unless we're making update-po. Here I present it the other way around. Regards -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/ "World domination is proceeding according to plan"(Andrew Morton) >From 7c1e3d06f4ad09500cc233c91fc4108ef58ae638 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?=C3=81lvaro=20Herrera?= Date: Tue, 20 Aug 2024 20:38:15 -0400 Subject: [PATCH v2] fix update-po for the PGXS case Author: Ryo Matsumura Discussion: https://postgr.es/m/tycpr01mb113164770fb0b0be6ed21e68ee8...@tycpr01mb11316.jpnprd01.prod.outlook.com --- src/nls-global.mk | 5 + 1 file changed, 5 insertions(+) diff --git a/src/nls-global.mk b/src/nls-global.mk index dfff472cb3..73a6db10a1 100644 --- a/src/nls-global.mk +++ b/src/nls-global.mk @@ -142,8 +142,13 @@ init-po: po/$(CATALOG_NAME).pot # For performance reasons, only calculate these when the user actually # requested update-po or a specific file. ifneq (,$(filter update-po %.po.new,$(MAKECMDGOALS))) +ifdef PGXS +ALL_LANGUAGES := $(shell find . -name '*.po' -print | sed 's,^.*/\([^/]*\).po$$,\1,' | LC_ALL=C sort -u) +all_compendia := $(shell find . 
-name '*.po' -print | LC_ALL=C sort) +else ALL_LANGUAGES := $(shell find $(top_srcdir) -name '*.po' -print | sed 's,^.*/\([^/]*\).po$$,\1,' | LC_ALL=C sort -u) all_compendia := $(shell find $(top_srcdir) -name '*.po' -print | LC_ALL=C sort) +endif else ALL_LANGUAGES = $(AVAIL_LANGUAGES) all_compendia = FORCE -- 2.39.2
Re: [Bug Fix]standby may crash when switching-over in certain special cases
Yugo Nagata wrote on Wed, Aug 21, 2024 at 00:49: > > > > Is s1 a cascading standby of s2? If otherwise s1 and s2 is the standbys > of > > the primary server respectively, it is not surprising that s2 has > progressed > > far than s1 when the primary fails. I believe that this is the case you > should > > use pg_rewind. Even if flushedUpto is reset as proposed in your patch, > s2 might > > already have applied a WAL record that s1 has not processed yet, and > there > > would be no gurantee that subsecuent applys suceed. > > Thank you for your response. In my scenario, s1 and s2 are both standbys of the primary server; s1 is a synchronous standby and s2 is an asynchronous standby. You mentioned that if s2's replay progress is ahead of s1's, pg_rewind should be used. However, what I'm trying to address is an issue where s2 crashes during replay after s1 has been promoted to primary, even though s2's progress hasn't surpassed s1's. Regards, Pixian Shi
Re: Conflict detection and logging in logical replication
On 8/6/24 4:15 AM, Zhijie Hou (Fujitsu) wrote: Thanks for the idea! I thought about few styles based on the suggested format, what do you think about the following ? Thanks for proposing formats. Before commenting on the specifics, I do want to ensure that we're thinking about the following for the log formats: 1. For the PostgreSQL logs, we'll want to ensure we do it in a way that's as convenient as possible for people to parse the context from scripts. 2. Semi-related, I still think the simplest way to surface this info to a user is through a "pg_stat_..." view or similar catalog mechanism (I'm less opinionated on the how outside of we should make it available via SQL). 3. We should ensure we're able to convey to the user these details about the conflict: * What time it occurred on the local server (which we'd have in the logs) * What kind of conflict it is * What table the conflict occurred on * What action caused the conflict * How the conflict was resolved (ability to include source/origin info) With that said: --- Version 1 --- LOG: CONFLICT: insert_exists; DESCRIPTION: remote INSERT violates unique constraint "uniqueindex" on relation "public.test". DETAIL: Existing local tuple (a, b, c) = (2, 3, 4) xid=123,origin="pub",timestamp=xxx; remote tuple (a, b, c) = (2, 4, 5). LOG: CONFLICT: update_differ; DESCRIPTION: updating a row with key (a, b) = (2, 4) on relation "public.test" was modified by a different source. DETAIL: Existing local tuple (a, b, c) = (2, 3, 4) xid=123,origin="pub",timestamp=xxx; remote tuple (a, b, c) = (2, 4, 5). LOG: CONFLICT: update_missing; DESCRIPTION: did not find the row with key (a, b) = (2, 4) on "public.test" to update. DETAIL: remote tuple (a, b, c) = (2, 4, 5). I agree with Amit's downthread comment, I think this tries to put much too much info on the LOG line, and it could be challenging to parse. --- Version 2 It moves most the details to the DETAIL line compared to version 1. 
--- LOG: CONFLICT: insert_exists on relation "public.test". DETAIL: Key (a)=(1) already exists in unique index "uniqueindex", which was modified by origin "pub" in transaction 123 at 2024xxx; Existing local tuple (a, b, c) = (1, 3, 4), remote tuple (a, b, c) = (1, 4, 5). LOG: CONFLICT: update_differ on relation "public.test". DETAIL: Updating a row with key (a, b) = (2, 4) that was modified by a different origin "pub" in transaction 123 at 2024xxx; Existing local tuple (a, b, c) = (2, 3, 4); remote tuple (a, b, c) = (2, 4, 5). LOG: CONFLICT: update_missing on relation "public.test". DETAIL: Did not find the row with key (a, b) = (2, 4) to update; Remote tuple (a, b, c) = (2, 4, 5). I like the brevity of the LOG line, while it still provides a lot of info. I think we should choose the words/punctuation in the DETAIL carefully so it's easy for scripts to ultimately parse (even if we expose info in pg_stat, people may need to refer to older log files to understand how a conflict resolved). --- Version 3 It is similar to the style in the current patch, I only added the key value for differ and missing conflicts without outputting the complete remote/local tuple value. --- LOG: conflict insert_exists detected on relation "public.test". DETAIL: Key (a)=(1) already exists in unique index "uniqueindex", which was modified by origin "pub" in transaction 123 at 2024xxx. LOG: conflict update_differ detected on relation "public.test". DETAIL: Updating a row with key (a, b) = (2, 4), which was modified by a different origin "pub" in transaction 123 at 2024xxx. LOG: conflict update_missing detected on relation "public.test" DETAIL: Did not find the row with key (a, b) = (2, 4) to update. I think outputting the remote/local tuple value may be a parameter we need to think about (with the desired outcome of trying to avoid another parameter). 
I have a concern about unintentionally leaking data (and I understand that someone with access to the logs does have a broad ability to view data); I'm less concerned about the size of the logs, as conflicts in a well-designed system should be rare (though a conflict storm could fill up the logs; at that point there are likely other issues to contend with). Thanks, Jonathan
optimize hashjoin
Avoid unnecessarily forming and deforming tuples. In TPC-H tests, this speeds HashJoin up by up to ~2x. diff --git a/src/backend/executor/nodeHash.c b/src/backend/executor/nodeHash.c index 61480733a1..2dad0c8a55 100644 --- a/src/backend/executor/nodeHash.c +++ b/src/backend/executor/nodeHash.c @@ -1627,6 +1627,23 @@ ExecHashTableInsert(HashJoinTable hashtable, { boolshouldFree; MinimalTuple tuple = ExecFetchSlotMinimalTuple(slot, &shouldFree); + + ExecHashTableInsertTuple(hashtable, tuple, hashvalue); + + if (shouldFree) + heap_free_minimal_tuple(tuple); +} + +/* + * ExecHashTableInsert + * insert a tuple into the hash table depending on the hash value + * it may just go to a temp file for later batches + */ +void +ExecHashTableInsertTuple(HashJoinTable hashtable, +MinimalTuple tuple, +uint32 hashvalue) +{ int bucketno; int batchno; @@ -1701,9 +1718,6 @@ ExecHashTableInsert(HashJoinTable hashtable, &hashtable->innerBatchFile[batchno], hashtable); } - - if (shouldFree) - heap_free_minimal_tuple(tuple); } /* @@ -1777,12 +1791,10 @@ retry: * tuples that belong in the current batch once growth has been disabled. 
*/ void -ExecParallelHashTableInsertCurrentBatch(HashJoinTable hashtable, - TupleTableSlot *slot, - uint32 hashvalue) +ExecParallelHashTableInsertCurrentBatchTuple(HashJoinTable hashtable, + MinimalTuple tuple, + uint32 hashvalue) { - boolshouldFree; - MinimalTuple tuple = ExecFetchSlotMinimalTuple(slot, &shouldFree); HashJoinTuple hashTuple; dsa_pointer shared; int batchno; @@ -1798,6 +1810,21 @@ ExecParallelHashTableInsertCurrentBatch(HashJoinTable hashtable, HeapTupleHeaderClearMatch(HJTUPLE_MINTUPLE(hashTuple)); ExecParallelHashPushTuple(&hashtable->buckets.shared[bucketno], hashTuple, shared); +} + +/* + * like ExecParallelHashTableInsertCurrentBatchTuple, + * but this function accept a TupleTableSlot + */ +void +ExecParallelHashTableInsertCurrentBatch(HashJoinTable hashtable, + TupleTableSlot *slot, + uint32 hashvalue) +{ + boolshouldFree; + MinimalTuple tuple = ExecFetchSlotMinimalTuple(slot, &shouldFree); + + ExecParallelHashTableInsertCurrentBatchTuple(hashtable, tuple, hashvalue); if (shouldFree) heap_free_minimal_tuple(tuple); diff --git a/src/backend/executor/nodeHashjoin.c b/src/backend/executor/nodeHashjoin.c index 5f4073eabd..002098f129 100644 --- a/src/backend/executor/nodeHashjoin.c +++ b/src/backend/executor/nodeHashjoin.c @@ -194,10 +194,10 @@ static TupleTableSlot *ExecHashJoinOuterGetTuple(PlanState *outerNode, static TupleTableSlot *ExecParallelHashJoinOuterGetTuple(PlanState *outerNode, HashJoinState *hjstate, uint32 *hashvalue); -static TupleTableSlot *ExecHashJoinGetSavedTuple(HashJoinState *hjstate, - BufFile *file, - uint32 *hashvalue, - TupleTableSlot *tupleSlot); +static MinimalTuple ExecHashJoinGetSavedTuple(HashJoinState *hjstate, + BufFile *file, + uint32 *hashvalue, + StringInfo buf); static bool ExecHashJoinNewBatch(HashJoinState *hjstate); static bool ExecParallelHashJoinNewBatch(HashJoinState *hjstate); static void ExecParallelHashJoinPartitionOuter(HashJoinState *hjstate); @@ -831,6 +831,7 @@ ExecInitHashJoin(HashJoin *node, 
EState *estate, int eflag
Re: ANALYZE ONLY
On Tue, Aug 20, 2024 at 6:53 PM David Rowley wrote: > On Wed, 21 Aug 2024 at 06:41, Robert Haas wrote: > > I like trying to use ONLY somehow. > > Do you mean as an ANALYZE command option, i.e. ANALYZE (only) table; > or as a table modifier like gram.y's extended_relation_expr? The table modifier idea seems nice to me. > If we did it as a per-table option, then we'd need to consider what > should happen if someone did: "VACUUM ONLY parttab;". Probably > silently doing nothing wouldn't be good. Maybe a warning, akin to > what's done in: > > postgres=# analyze (skip_locked) a; > WARNING: skipping analyze of "a" --- lock not available Perhaps. I'm not sure about this part. -- Robert Haas EDB: http://www.enterprisedb.com
Re: Logical Replication of sequences
Hi Vignesh, Here are my only review comments for the latest patch set. v20240820-0003. nit - missing period for comment in FetchRelationStates nit - typo in function name 'ProcessSyncingTablesFoSync' == Kind Regards, Peter Smith. Fujitsu Australia diff --git a/src/backend/replication/logical/syncutils.c b/src/backend/replication/logical/syncutils.c index b8f9300..705a330 100644 --- a/src/backend/replication/logical/syncutils.c +++ b/src/backend/replication/logical/syncutils.c @@ -96,7 +96,7 @@ SyncProcessRelations(XLogRecPtr current_lsn) break; case WORKERTYPE_TABLESYNC: - ProcessSyncingTablesFoSync(current_lsn); + ProcessSyncingTablesForSync(current_lsn); break; case WORKERTYPE_APPLY: @@ -143,7 +143,7 @@ FetchRelationStates(bool *started_tx) *started_tx = true; } - /* Fetch tables that are in non-ready state */ + /* Fetch tables that are in non-ready state. */ rstates = GetSubscriptionRelations(MySubscription->oid, true); /* Allocate the tracking info in a permanent memory context. */ diff --git a/src/backend/replication/logical/tablesync.c b/src/backend/replication/logical/tablesync.c index ad92b84..c753f45 100644 --- a/src/backend/replication/logical/tablesync.c +++ b/src/backend/replication/logical/tablesync.c @@ -237,7 +237,7 @@ wait_for_worker_state_change(char expected_state) * SYNCDONE and finish. 
*/ void -ProcessSyncingTablesFoSync(XLogRecPtr current_lsn) +ProcessSyncingTablesForSync(XLogRecPtr current_lsn) { SpinLockAcquire(&MyLogicalRepWorker->relmutex); diff --git a/src/include/replication/worker_internal.h b/src/include/replication/worker_internal.h index 24f74ab..6504b70 100644 --- a/src/include/replication/worker_internal.h +++ b/src/include/replication/worker_internal.h @@ -264,7 +264,7 @@ extern void UpdateTwoPhaseState(Oid suboid, char new_state); extern bool FetchRelationStates(bool *started_tx); extern bool WaitForRelationStateChange(Oid relid, char expected_state); -extern void ProcessSyncingTablesFoSync(XLogRecPtr current_lsn); +extern void ProcessSyncingTablesForSync(XLogRecPtr current_lsn); extern void ProcessSyncingTablesForApply(XLogRecPtr current_lsn); extern void SyncProcessRelations(XLogRecPtr current_lsn); extern void SyncInvalidateRelationStates(Datum arg, int cacheid,
RE: Conflict detection and logging in logical replication
On Wednesday, August 21, 2024 9:33 AM Jonathan S. Katz wrote: > On 8/6/24 4:15 AM, Zhijie Hou (Fujitsu) wrote: > > > Thanks for the idea! I thought about few styles based on the suggested > > format, what do you think about the following ? > > Thanks for proposing formats. Before commenting on the specifics, I do want to > ensure that we're thinking about the following for the log formats: > > 1. For the PostgreSQL logs, we'll want to ensure we do it in a way that's as > convenient as possible for people to parse the context from scripts. Yeah. And I personally think the current log format is OK for parsing purposes. > > 2. Semi-related, I still think the simplest way to surface this info to a > user is > through a "pg_stat_..." view or similar catalog mechanism (I'm less > opinionated > on the how outside of we should make it available via SQL). We have a patch (v19-0002) in this thread to collect conflict stats and display them in the view, and the patch is under review. Storing it into a catalog needs more analysis, as we may need to add additional logic to clean up old conflict data in that catalog table. I think we can consider it as a future improvement. > > 3. We should ensure we're able to convey to the user these details about the > conflict: > > * What time it occurred on the local server (which we'd have in the logs) > * What kind of conflict it is > * What table the conflict occurred on > * What action caused the conflict > * How the conflict was resolved (ability to include source/origin info) I think all of the above are already covered in the current conflict log. The exception is that we do not yet support resolving the conflict, so we don't log the resolution. > > > I think outputting the remote/local tuple value may be a parameter we need to > think about (with the desired outcome of trying to avoid another parameter). 
I > have a concern about unintentionally leaking data (and I understand that > someone with access to the logs does have a broad ability to view data); I'm > less concerned about the size of the logs, as conflicts in a well-designed > system should be rare (though a conflict storm could fill up the logs, likely > there are other issues to contend with at that point). We could use an option to control this, but the tuple value is already output in some existing cases (e.g. partition checks, table constraint checks, views with check constraints, unique violations), and those cases test the current user's privileges to decide whether to output the tuple or not. So, I think it's OK to display the tuple for conflicts. Best Regards, Hou zj
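For context on the "already output in some existing cases" point, the tuple key already appears today in, for example, a plain unique-violation error (a minimal reproduction):

```sql
CREATE TABLE t (a int PRIMARY KEY);
INSERT INTO t VALUES (1);
INSERT INTO t VALUES (1);
-- ERROR:  duplicate key value violates unique constraint "t_pkey"
-- DETAIL:  Key (a)=(1) already exists.
```

So logging the conflicting key for replication conflicts would be consistent with the level of detail the server already emits for constraint violations.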
Re: [BUG] Fix DETACH with FK pointing to a partitioned table fails
On 2024-Aug-20, Jehan-Guillaume de Rorthais wrote: > I'm back on this issue as well. I've started poking at this patch to review it, > test it, challenge it and then report here. > > I'll try to check if some other issues might have been lost or forgotten along > the way as well. Thanks, much appreciated, looking forward to your feedback. -- Álvaro Herrera, Breisgau, Deutschland — https://www.EnterpriseDB.com/
Re: Conflict detection and logging in logical replication
On Tue, Aug 20, 2024 at 4:45 PM Zhijie Hou (Fujitsu) wrote: > > Here are the remaining patches. > > 0001 adds additional doc to explain the log format. Thanks for the patch. Please find a few comments on 0001: 1) +Key (column_name, ...)=(column_name, ...); existing local tuple (column_name, ...)=(column_name, ...); remote tuple (column_name, ...)=(column_name, ...); replica identity (column_name, ...)=(column_name, ...). -- column_name --> column_value everywhere to the right of '='? 2) + Note that for an + update operation, the column value of the new tuple may be NULL if the + value is unchanged. -- Shall we mention the toast value here? In all other cases, we get a full new tuple. 3) + The key section in the second sentence of the DETAIL line includes the key values of the tuple that already exists in the local relation for insert_exists or update_exists conflicts. -- Shall we mention that the key is the column value violating a unique constraint? Something like this: The key section in the second sentence of the DETAIL line includes the key values of the local tuple that violates the unique constraint for insert_exists or update_exists conflicts. 4) Shall we give an example LOG message at the end? thanks Shveta
Re: MultiXact\SLRU buffers configuration
On Tue, Aug 20, 2024 at 08:13:12PM -0400, Alvaro Herrera wrote: > I don't understand what do the inj_stats_enabled stuff have to do with > this patch. I suspect it's a git operation error, ie., you seem to have > squashed two different things together. Sorry, I should have split that for clarity (one patch for the GUC, one to change the test to use CACHED/LOAD). It is not an error though: if we don't have a controlled way to disable the stats of the module, then the test would fail when calling the cached callback because we'd try to allocate some memory for the dshash entry in pgstats. The second effect of initializing the shmem state of the module with shared_preload_libraries is that condition variables are set up for the sake of the test, removing the dependency on the SQL load() call. Both are OK, but I'd prefer introducing one use case for these two macros in the tree, so that these can be used as a reference in the future when developing new tests. -- Michael signature.asc Description: PGP signature
Re: Fix memory counter update in reorderbuffer
On Tue, Aug 20, 2024 at 5:55 PM Masahiko Sawada wrote: > > On Fri, Aug 16, 2024 at 12:22 AM Shlok Kyal wrote: > > > > Thank you for testing the patch! > > I'm slightly hesitant to add a test under src/test/subscription since > it's a bug in ReorderBuffer and not specific to logical replication. > If we reasonably cannot add a test under contrib/test_decoding, I'm > okay with adding it under src/test/subscription. > Sounds like a reasonable approach. > I've attached the updated patch with the commit message (but without a > test case for now). > The patch LGTM except for one minor comment. + /* All changes must be returned */ + Assert(txn->size == 0); Isn't it better to say: "All changes must be deallocated." in the above comment? -- With Regards, Amit Kapila.
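[Editor's note] The trade-off discussed earlier in this thread, freeing all of a transaction's changes without per-change counter updates and then zeroing the counter once, can be sketched in Python. All names here are made up for illustration; the real logic lives in C in reorderbuffer.c:

```python
# Hypothetical sketch of the counter-update strategy discussed above:
# instead of decrementing the memory counter (and re-adjusting the
# max-heap) once per change, free all changes first and then update the
# counter in a single step for the whole transaction.

class Txn:
    def __init__(self, change_sizes):
        self.changes = list(change_sizes)
        self.size = sum(change_sizes)   # tracked memory usage

def cleanup_txn(txn):
    # Free the changes WITHOUT touching the counter per change (cheap)...
    txn.changes.clear()
    # ...then zero the counter once for the whole transaction.
    txn.size = 0
    # All changes must be deallocated (mirrors the Assert in the patch).
    assert txn.size == 0

txn = Txn([100, 250, 50])
cleanup_txn(txn)
print(txn.size)  # 0
```

The single batched update is what makes the explicit zeroing (and the Assert that Amit comments on) necessary, since the counter is no longer maintained change by change during cleanup.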
Re: define PG_REPLSLOT_DIR
If this isn't in commitfest already, please add it there. On Tue, Aug 20, 2024 at 9:58 PM Bertrand Drouvot wrote: > > Hi, > > On Tue, Aug 20, 2024 at 12:06:52PM +, Bertrand Drouvot wrote: > > Hi, > > > > On Tue, Aug 20, 2024 at 05:41:48PM +0900, Michael Paquier wrote: > > > On Tue, Aug 20, 2024 at 11:10:46AM +0530, Ashutosh Bapat wrote: > > > > Since these are all related changes, doing them at once might make it > > > > faster. You may use multiple commits (one for each change) > > > > > > Doing multiple commits with individual definitions for each path would > > > be the way to go for me. All that is mechanical, still that feels > > > slightly cleaner. > > > > Right, that's what v2 has done. If there is a need for v3 then I'll add one > > dedicated patch for Ashutosh's proposal in the v3 series. > > Ashutosh's idea is implemented in v3 that I just shared up-thread (I used > REPLSLOT_DIR_ARGS though). > > Regards, > > -- > Bertrand Drouvot > PostgreSQL Contributors Team > RDS Open Source Databases > Amazon Web Services: https://aws.amazon.com -- Best Wishes, Ashutosh Bapat
Re: define PG_REPLSLOT_DIR
Hi, On Wed, Aug 21, 2024 at 10:14:20AM +0530, Ashutosh Bapat wrote: > If this isn't in commitfest already, please add it there. > Done in [0]. [0]: https://commitfest.postgresql.org/49/5193/ Thanks! Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
RE: Conflict detection and logging in logical replication
Dear Hou, Thanks for updating the patch! I think the patch is mostly good. Here are minor comments. 0001: 01. ``` + +LOG: conflict detected on relation "schemaname.tablename": conflict=conflict_type +DETAIL: detailed explaination. ... + ``` I don't think the label is correct. The label should be used for the actual example output, not for explaining the format. I checked several files like amcheck.sgml and auto-explain.sgml and func.sgml and they seemed to follow the rule. 02. ``` + + The key section in the second sentence of the ... ``` I would prefer that the section name be quoted. 0002: 03. ``` -#include "replication/logicalrelation.h" ``` Just to confirm - this removal is not related to the feature but is just an improvement, right? Best regards, Hayato Kuroda FUJITSU LIMITED
Re: Conflict detection and logging in logical replication
On Tue, Aug 20, 2024 at 4:45 PM Zhijie Hou (Fujitsu) wrote: > > Here are the remaining patches. > > 0001 adds additional doc to explain the log format. > 0002 collects statistics about conflicts in logical replication. > 0002 has not changed since I last reviewed it. It seems all my old comments are addressed. One trivial thing: I feel that in the doc, we should mention that any conflict resulting in an apply error will be counted in both apply_error_count and the corresponding _count. What do you think? thanks Shveta
Disallow USING clause when altering type of generated column
A USING clause when altering the type of a generated column does not make sense. It would write the output of the USING clause into the converted column, which would violate the generation expression. This patch adds a check to error out if this is specified. There was a test for this, but that test errored out for a different reason, so it was not effective. discovered by Jian He at [0] [0]: https://www.postgresql.org/message-id/cacjufxegpytfe79hbsmeoboivfnnprsw7gjvk67m1x2mqgg...@mail.gmail.com
Re: Taking into account syncrep position in flush_lsn reported by apply worker
On Wed, Aug 21, 2024 at 2:25 AM Heikki Linnakangas wrote: > > On 14/08/2024 16:54, Arseny Sher wrote: > > On 8/13/24 06:35, Amit Kapila wrote: > >> On Mon, Aug 12, 2024 at 3:43 PM Arseny Sher wrote: > >>> > >>> Sorry for the poor formatting of the message above, this should be > >>> better: > >>> > >>> Hey. Currently synchronous_commit is disabled for logical apply worker > >>> on the ground that reported flush_lsn includes only locally flushed data > >>> so slot (publisher) preserves everything higher than this, and so in > >>> case of subscriber restart no data is lost. However, imagine that > >>> subscriber is made highly available by standby to which synchronous > >>> replication is enabled. Then reported flush_lsn is ignorant of this > >>> synchronous replication progress, and in case of failover data loss may > >>> occur if subscriber managed to ack flush_lsn ahead of syncrep. > >> > >> Won't the same can be achieved by enabling the synchronous_commit > >> parameter for a subscription? > > > > Nope, because it would force WAL flush and wait for replication to the > > standby in the apply worker, slowing down it. The logic missing > > currently is not to wait for the synchronous commit, but still mind its > > progress in the flush_lsn reporting. > > I think this patch makes sense. I'm not sure we've actually made any > promises on it, but it feels wrong that the slot's LSN might be advanced > past the LSN that's been has been acknowledged by the replica, if > synchronous replication is configured. I see little downside in making > that promise. > One possible downside of such a promise could be that the publisher may slow down for sync replication because it has to wait for all the configured sync_standbys of subscribers to acknowledge the LSN. I don't know how many applications can be impacted due to this if we do it by default but if we feel there won't be any such cases or they will be in the minority then it is okay to proceed with this. 
-- With Regards, Amit Kapila.
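[Editor's note] The logic being proposed in this thread, reporting a flush_lsn that minds syncrep progress without actually waiting on it, might look roughly like the following. This is purely illustrative: LSNs are modeled as integers and every name is made up:

```python
def reportable_flush_lsn(local_flush_lsn, syncrep_ack_lsn, syncrep_enabled):
    """Return the LSN the apply worker may safely report to the publisher.

    With synchronous replication configured on the subscriber, the
    publisher should not discard WAL beyond what the subscriber's standby
    has acknowledged, so report the minimum of the two positions. Note
    that nothing here blocks: the worker does not wait for the standby,
    it merely caps what it reports.
    """
    if syncrep_enabled:
        return min(local_flush_lsn, syncrep_ack_lsn)
    return local_flush_lsn

# Local flush is ahead of the standby's ack: report only up to the ack.
print(reportable_flush_lsn(0x1000, 0x0800, True))   # 2048
# Without syncrep, report the local flush position as before.
print(reportable_flush_lsn(0x1000, 0x0800, False))  # 4096
```

This also makes Amit's concern visible: with syncrep enabled, the reported LSN can lag behind local flush, so the publisher retains WAL (and slots advance) only as fast as the slowest configured sync standby acknowledges.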
Re: MultiXact\SLRU buffers configuration
On Wed, Aug 21, 2024 at 12:46:31PM +0900, Michael Paquier wrote: > Sorry, I should have split that for clarity (one patch for the GUC, > one to change the test to use CACHED/LOAD). It is not an error > though: if we don't have a controlled way to disable the stats of the > module, then the test would fail when calling the cached callback > because we'd try to allocate some memory for the dshash entry in > pgstats. > > The second effect of initializing the shmem state of the module with > shared_preload_libraries is that condition variables are set up for the > sake of the test, removing the dependency on the SQL load() call. > Both are OK, but I'd prefer introducing one use case for these two > macros in the tree, so that these can be used as a reference in the > future when developing new tests. In short, here is a better patch set, with 0001 and 0002 introducing the pieces that the test would need to be able to use the LOAD() and CACHED() macros in 0003: - 0001: Add shmem callbacks to initialize shmem state of injection_points with shared_preload_libraries. - 0002: Add a GUC to control if the stats of the module are enabled. By default, they are disabled as they are only needed in the TAP test of injection_points for the stats. - 0003: Update the SLRU test to use INJECTION_POINT_LOAD and INJECTION_POINT_CACHED with injection_points loaded via shared_preload_libraries, removing the call to injection_points_load() in the perl test. What do you think? -- Michael From 48a3b8b469ec52d184999d2617134ec680db378f Mon Sep 17 00:00:00 2001 From: Michael Paquier Date: Wed, 21 Aug 2024 15:12:08 +0900 Subject: [PATCH 1/3] injection_points: Add initialization of shmem state at loading This commit adds callbacks to initialize the shared memory state of the module when loaded with shared_preload_libraries. This is necessary for an upcoming change for a SLRU test, that relies on a critical section where no memory allocation can happen. 
Initializing the shared memory state of the module while loading provides a control on the timing where this data is set up. --- .../injection_points/injection_points.c | 65 ++- 1 file changed, 62 insertions(+), 3 deletions(-) diff --git a/src/test/modules/injection_points/injection_points.c b/src/test/modules/injection_points/injection_points.c index 4e775c7ec6..8d83d8c401 100644 --- a/src/test/modules/injection_points/injection_points.c +++ b/src/test/modules/injection_points/injection_points.c @@ -68,7 +68,12 @@ typedef struct InjectionPointCondition */ static List *inj_list_local = NIL; -/* Shared state information for injection points. */ +/* + * Shared state information for injection points. + * + * This state data can be initialized in two ways: dynamically with a DSM + * or when loading the module. + */ typedef struct InjectionPointSharedState { /* Protects access to other fields */ @@ -97,8 +102,13 @@ extern PGDLLEXPORT void injection_wait(const char *name, /* track if injection points attached in this process are linked to it */ static bool injection_point_local = false; +/* Shared memory init callbacks */ +static shmem_request_hook_type prev_shmem_request_hook = NULL; +static shmem_startup_hook_type prev_shmem_startup_hook = NULL; + /* - * Callback for shared memory area initialization. + * Routine for shared memory area initialization, used as a callback + * when initializing dynamically with a DSM or when loading the module. 
*/ static void injection_point_init_state(void *ptr) @@ -111,8 +121,51 @@ injection_point_init_state(void *ptr) ConditionVariableInit(&state->wait_point); } +/* Shared memory initialization when loading module */ +static void +injection_shmem_request(void) +{ + Size size; + + if (prev_shmem_request_hook) + prev_shmem_request_hook(); + + size = MAXALIGN(sizeof(InjectionPointSharedState)); + RequestAddinShmemSpace(size); +} + +static void +injection_shmem_startup(void) +{ + bool found; + + if (prev_shmem_startup_hook) + prev_shmem_startup_hook(); + + /* reset in case this is a restart within the postmaster */ + inj_state = NULL; + + /* Create or attach to the shared memory state */ + LWLockAcquire(AddinShmemInitLock, LW_EXCLUSIVE); + + inj_state = ShmemInitStruct("injection_points", +sizeof(InjectionPointSharedState), +&found); + + if (!found) + { + /* + * First time through, so initialize. This is shared with the + * dynamic initialization using a DSM. + */ + injection_point_init_state(inj_state); + } + + LWLockRelease(AddinShmemInitLock); +} + /* - * Initialize shared memory area for this module. + * Initialize shared memory area for this module through DSM. */ static void injection_init_shmem(void) @@ -463,6 +516,12 @@ _PG_init(void) if (!process_shared_preload_libraries_in_progress) return; + /* Shared memory initialization */ + prev_shmem_request_hook = shmem_request_hook; + shmem_request_hook = injection_shmem_request; + prev_shmem_startup_hook = shmem_startup_hook; + shmem_startup
Re: configure failures on chipmunk
On 21/08/2024 01:42, Thomas Munro wrote: On Wed, Aug 21, 2024 at 9:48 AM Alvaro Herrera wrote: Hi. I noticed chipmunk is failing in configure: checking whether the C compiler works... no configure: error: in `/home/pgbfarm/buildroot/HEAD/pgsql.build': configure: error: C compiler cannot create executables One of the runs shows: ./configure: line 4202: 28268 Bus error $CC -c $CFLAGS $CPPFLAGS conftest.$ac_ext 1>&5 Yeah, I've been slowly upgrading it to a new debian/raspbian version, and "ccache" started segfaulting, not sure why. I compiled ccache from sources, and now it seems to work when I run it on its own. Not sure why the buildfarm run still failed. -- Heikki Linnakangas Neon (https://neon.tech)
Re: Conflict detection and logging in logical replication
Here are some review comments for the v19-0001 docs patch. The content seemed reasonable, but IMO it should be presented quite differently. 1. Use sub-sections I expect this logical replication "Conflicts" section is going to evolve into something much bigger. Surely, it's not going to be one humongous page of details, so it will be a section with lots of subsections like all the other in Chapter 29. IMO, you should be writing the docs in that kind of structure from the beginning. For example, I'm thinking something like below (this is just an example - surely lots more subsections will be needed for this topic): 29.6 Conflicts 29.6.1. Conflict types 29.6.2. Logging format 29.6.3. Examples Specifically, this v19-0001 patch information should be put into a subsection like the 29.6.2 shown above. ~~~ 2. Markup + +LOG: conflict detected on relation "schemaname.tablename": conflict=conflict_type +DETAIL: detailed explaination. +Key (column_name, ...)=(column_name, ...); existing local tuple (column_name, ...)=(column_name, ...); remote tuple (column_name, ...)=(column_name, ...); replica identity (column_name, ...)=(column_name, ...). + IMO this should be using markup more like the SQL syntax references. - e.g. I suggest instead of - e.g. I suggest all the substitution parameters (e.g. detailed explanation, conflict_type, column_name, ...) in the log should use and use those markups again later in these docs instead of ~ nit - typo /explaination/explanation/ ~ nit - The amount of scrolling needed makes this LOG format too hard to see. Try to wrap it better so it can fit without being so wide. ~~~ 3. Restructure the list + + I suggest restructuring all this to use a nested list like: LOG - conflict_type DETAIL - detailed_explanation - key - existing_local_tuple - remote_tuple - replica_identity Doing this means you can remove a great deal of the unnecessary junk words like "of the first sentence in the DETAIL", and "sentence of the DETAIL line" etc. 
The result will be much less text but much simpler text too. == Kind Regards, Peter Smith. Fujitsu Australia
The json_table function returns an incorrect column type
When testing the json_table function, it was discovered that specifying FORMAT JSON in the column definition clause and applying this column to the JSON_OBJECT function results in an output that differs from Oracle's output. The SQL statement is as follows: SELECT JSON_OBJECT('config' VALUE config) FROM JSON_TABLE( '[{"type":1, "order":1, "config":{"empno":1001, "ename":"Smith", "job":"CLERK", "sal":1000}}]', '$[*]' COLUMNS ( config varchar(100) FORMAT JSON PATH '$.config' ) ); The execution results of PostgreSQL are as follows: json_object --- {"config" : "{\"job\": \"CLERK\", \"sal\": 1000, \"empno\": 1001, \"ename\": \"Smith\"}"} (1 row) The execution results of Oracle are as follows: JSON_OBJECT('CONFIG'VALUECONFIG) - {"config":{"empno":1001,"ename":"Smith","job":"CLERK","sal":1000}} 1 row selected. Elapsed: 00:00:00.00 In PostgreSQL, the return value of the json_table function is treated as plain text, and double quotes are escaped with a backslash. In Oracle, the return value of the json_table function is treated as a JSON document, and the double quotes within it are not escaped with a backslash. Based on the above observation, if the FORMAT JSON option is specified in the column definition clause of the json_table function, the return type should be JSON, rather than a specified type like VARCHAR(100).
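[Editor's note] The difference between the two outputs above comes down to whether the extracted value is embedded as a JSON fragment or as plain text that must be re-escaped. Python's json module shows the same effect; this is only an illustration of the escaping behavior, unrelated to the server code:

```python
import json

config = {"empno": 1001, "ename": "Smith", "job": "CLERK", "sal": 1000}

# Treating the extracted value as plain text: the inner JSON document is
# just a string, so its double quotes get backslash-escaped when it is
# embedded (this mirrors the PostgreSQL output above).
as_text = json.dumps({"config": json.dumps(config)})

# Treating it as a JSON value: it is embedded as a nested object with no
# escaping (this mirrors the Oracle output above).
as_json = json.dumps({"config": config})

print(as_text)  # {"config": "{\"empno\": 1001, ...}"}
print(as_json)  # {"config": {"empno": 1001, ...}}
```

In the first case, a consumer must parse the payload twice (once to get the string, once to parse the string as JSON); in the second, the nested object is directly usable, which is what FORMAT JSON is expected to deliver.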
Re: Doc limitation update proposal: include out-of-line OID usage per TOAST-ed columns
On Mon, Aug 12, 2024 at 11:01 AM John Naylor wrote: > > Extra explicitness might be good. "Response time" seems like a > networking concept, so possibly ", in turn slowing down INSERT/UPDATE > statements." I'm inclined to commit that way in a couple days, barring > further comments. This is done.
Re: Incremental View Maintenance, take 2
On Tue, 20 Aug 2024 at 02:14, Kirill Reshke wrote: > > > == Major suggestions. > > 1) At first glance, working with this IVM/IMMV infrastructure feels > really unintuitive about what servers actually do for query execution. > I do think it will be much better for the user experience to add more > EXPLAIN output about the IVM work done inside IVM triggers. This way it is much > clearer which part is working slowly, and hence which index should be created, > etc. > > 2) The kernel code for IVM lacks the ability to be extended for > further IVM optimizations. One example is the foreign key optimization > described here [1]. I'm not saying we should implement this within this > patchset, but we surely should pave the way for it. I don't have any > good suggestions for how to do this though. > > 3) I don't really think the SQL design is good. CREATE [INCREMENTAL] M.V. > is too ad-hoc. I would prefer CREATE M.V. with (maintain_incr=true). > (The reloption name is just an example.) > This way we can change a regular M.V. to IVM and vice versa via ALTER > M.V. SET *reloptions* - a type of syntax that is already present in the > PostgreSQL core. > One little follow-up here. Why do we handle prepstate visibility the way it is currently done? Can we instead export the snapshot in the BEFORE trigger, save it somewhere, and use it afterwards? -- Best regards, Kirill Reshke
Re: Test 041_checkpoint_at_promote.pl faild in installcheck due to missing injection_points
On Mon, 19 Aug 2024 at 19:10, Tom Lane wrote: > src/test/recovery/README points out that > > Also, to use "make installcheck", you must have built and installed > contrib/pg_prewarm, contrib/pg_stat_statements and contrib/test_decoding > in addition to the core code. > > I suspect this needs some additional verbiage about also installing > src/test/modules/injection_points if you've enabled injection points. > > (I think we haven't noticed because most people just use "make check" > instead.) > OK, many thanks for a comprehensive explanation! -- Best regards, Maxim Orlov.
Re: Doc limitation update proposal: include out-of-line OID usage per TOAST-ed columns
On Tue, Aug 20, 2024 at 9:03 AM John Naylor wrote: > This is done. Cool! Thanks John and Robert! :) -J.
Re: GUC names in messages
CFBot reported some failures, so I have attached the rebased patch set v9*. I'm hopeful the majority of these might be pushed to avoid more rebasing... == Kind Regards, Peter Smith. Fujitsu Australia v9-0001-Add-quotes-for-GUCs-bool.patch Description: Binary data v9-0005-Add-quotes-for-GUCs-enum.patch Description: Binary data v9-0003-Add-quotes-for-GUCs-real.patch Description: Binary data v9-0004-Add-quotes-for-GUCs-string.patch Description: Binary data v9-0002-Add-quotes-for-GUCs-int.patch Description: Binary data v9-0008-GUC-names-make-common-translatable-messages.patch Description: Binary data v9-0006-GUC-names-fix-case-intervalstyle.patch Description: Binary data v9-0007-GUC-names-fix-case-datestyle.patch Description: Binary data
Re: ANALYZE ONLY
On Tue, 20 Aug 2024 at 07:52, Michael Harris wrote: > 1. Would such a feature be welcomed? Are there any traps I might not > have thought of? I think this sounds like a reasonable feature. > 2. The existing ANALYZE command has the following structure: > > ANALYZE [ ( option [, ...] ) ] [ table_and_columns [, ...] ] > > It would be easiest to add ONLY as another option, but that > doesn't look quite > right to me - surely the ONLY should be attached to the table name? > An alternative would be: > > ANALYZE [ ( option [, ...] ) ] [ONLY] [ table_and_columns [, ...] ] > > Any feedback or advice would be great. Personally I'd prefer a new option to be added. But I agree ONLY isn't a good name then. Maybe something like SKIP_PARTITIONS.
Re: Provide a pg_truncate_freespacemap function
On Sunday, 21 July 2024 at 07:39:13 UTC+2, Tom Lane wrote: > Fujii Masao writes: > >> On Wednesday, 6 March 2024 at 20:28:44 CET, Stephen Frost wrote: > >>> I agree that this would generally be a useful thing to have. > Sorry for the late reply, as I was not available during the late summer. > Personally, I want to push back on whether this has any legitimate > use-case. Even if the FSM is corrupt, it should self-heal over > time, and I'm not seeing the argument why truncating it would > speed convergence towards correct values. Worse, in the interim > where you don't have any FSM, you will suffer table bloat because > insertions will be appended at the end of the table. So this > looks like a foot-gun, and the patch's lack of user-visible > documentation surely does nothing to make it safer. > (The analogy to pg_truncate_visibility_map seems forced. > If you are in a situation with a trashed visibility map, > you are probably getting wrong query answers, and truncating > the map will make that better. But a trashed FSM doesn't > result in incorrect output, and zeroing it will make things > worse not better.) Now that the other patch for self-healing is in, I agree it may not be that useful. I'm withdrawing the patch and will keep it in mind if we encounter other FSM issues in the future. Best regards, -- Ronan Dunklau
Re: Partial aggregates pushdown
On Thu, 15 Aug 2024 at 23:12, Bruce Momjian wrote: > Third, I would like to show a more specific example to clarify what is > being considered above. If we look at MAX(), we can have FDWs return > the max for each FDW, and the coordinator can chose the highest value. > This is the patch 1 listed above. These can return the > pg_aggregate.aggtranstype data type using the pg_type.typoutput text > output. > > The second case is for something like AVG(), which must return the SUM() > and COUNT(), and we currently have no way to return multiple text values > on the wire. For patch 0002, we have the option of creating functions > that can do this and record them in new pg_attribute columns, or we can > create a data type with these functions, and assign the data type to > pg_aggregate.aggtranstype. > > Is that accurate? It's close to accurate, but not entirely. Patch 1 would actually solve some AVG cases too, because some AVG implementations use an SQL array type to store the transtype instead of an internal type. And by using an SQL array type we *can* send multiple text values on the wire. 
See below for a list of those aggregates: > select p.oid::regprocedure from pg_aggregate a join pg_proc p on a.aggfnoid = p.oid where aggfinalfn != 0 and aggtranstype::regtype not in ('internal', 'anyenum', 'anyelement', 'anyrange', 'anyarray', 'anymultirange'); oid ─── avg(integer) avg(smallint) avg(real) avg(double precision) avg(interval) var_pop(real) var_pop(double precision) var_samp(real) var_samp(double precision) variance(real) variance(double precision) stddev_pop(real) stddev_pop(double precision) stddev_samp(real) stddev_samp(double precision) stddev(real) stddev(double precision) regr_sxx(double precision,double precision) regr_syy(double precision,double precision) regr_sxy(double precision,double precision) regr_avgx(double precision,double precision) regr_avgy(double precision,double precision) regr_r2(double precision,double precision) regr_slope(double precision,double precision) regr_intercept(double precision,double precision) covar_pop(double precision,double precision) covar_samp(double precision,double precision) corr(double precision,double precision) (28 rows) And to be clear, these are in addition to the MAX type of aggregates you were describing: > select p.oid::regprocedure from pg_aggregate a join pg_proc p on a.aggfnoid = p.oid where aggfinalfn = 0 and aggtranstype::regtype not in ('internal', 'anyenum', 'anyelement', 'anyrange', 'anyarray', 'anymultirange'); oid ─── sum(integer) sum(smallint) sum(real) sum(double precision) sum(money) sum(interval) max(bigint) max(integer) max(smallint) max(oid) max(real) max(double precision) max(date) max(time without time zone) max(time with time zone) max(money) max(timestamp without time zone) max(timestamp with time zone) max(interval) max(text) max(numeric) max(character) max(tid) max(inet) max(pg_lsn) max(xid8) min(bigint) min(integer) min(smallint) min(oid) min(real) min(double precision) min(date) min(time without time zone) min(time with time zone) min(money) min(timestamp without time 
zone) min(timestamp with time zone) min(interval) min(text) min(numeric) min(character) min(tid) min(inet) min(pg_lsn) min(xid8) count("any") count() regr_count(double precision,double precision) bool_and(boolean) bool_or(boolean) every(boolean) bit_and(smallint) bit_or(smallint) bit_xor(smallint) bit_and(integer) bit_or(integer) bit_xor(integer) bit_and(bigint) bit_or(bigint) bit_xor(bigint) bit_and(bit) bit_or(bit) bit_xor(bit) xmlagg(xml) (65 rows)
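[Editor's note] The AVG() case described above, where each remote node returns a partial (sum, count) transition state that the coordinator combines, can be sketched as follows. All names are hypothetical; the real pushdown would happen in postgres_fdw and the executor:

```python
def partial_avg(values):
    # Each FDW/remote node computes a transition state, not the final avg.
    # For AVG this is a (sum, count) pair, which fits in an SQL array type
    # and can therefore be sent as text on the wire.
    return (sum(values), len(values))

def combine_avg(states):
    # The coordinator merges the partial states and applies the final
    # function (division) exactly once.
    total = sum(s for s, _ in states)
    count = sum(c for _, c in states)
    return total / count

# Two "shards" of data; the combined result equals AVG over all rows.
shard_states = [partial_avg([10, 20]), partial_avg([30, 40, 50])]
print(combine_avg(shard_states))  # 30.0
```

This is why MAX-style aggregates (no finalfn, combinable by simple comparison) are the easy case, while AVG-style aggregates only work when their transition state can be serialized, which the SQL-array-transtype implementations listed above already allow.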
Re: [BUG] Fix DETACH with FK pointing to a partitioned table fails
On Wed, 7 Aug 2024 18:50:10 -0400 Alvaro Herrera wrote: > On 2024-Jul-26, Tender Wang wrote: > > > On Friday, 26 July 2024 at 14:57, Junwang Zhao wrote: > > > > > There is a bug report[0] Tender comments might be the same issue as > > > this one, but I tried Alvaro's and mine patch, neither could solve > > > that problem, I did not tried Tender's earlier patch thought. I post > > > the test script below in case you are interested. > > > > My earlier patch should handle Alexander reported case. But I did not > > do more test. I'm not sure that wether or not has dismatching between > > pg_constraint and pg_trigger. > > > > I aggred with Alvaro said that "requires a much more invasive > > solution". > > Here's the patch which, as far as I can tell, fixes all the reported > problems (other than the one in bug 18541, for which I proposed an > unrelated fix in that thread[1]). If you can double-check, I would very > much appreciate that. Also, I think the test cases the patch adds > reflect the provided examples sufficiently, but if we're still failing > to cover some, please let me know. I'm back on this issue as well. I've started poking at this patch to review it, test it, challenge it and then report here. I'll try to check if some other issues might have been lost or forgotten along the way as well. In the meantime, thank you Alvaro, Tender and Junwang for your work, time, research and patches!