Re: Little cleanup of ShmemInit function names
On 29/08/2024 04:06, Alvaro Herrera wrote: I have had many a chance to visit english.stackexchange.net on account of something I read in pgsql lists or code comments. I see what you did there :-). Committed, with "which see". Thanks everyone! -- Heikki Linnakangas Neon (https://neon.tech)
Re: Disallow USING clause when altering type of generated column
On 22.08.24 10:49, Peter Eisentraut wrote: On 22.08.24 09:59, Yugo NAGATA wrote: Although ERRCODE_INVALID_TABLE_DEFINITION is used for an error on changing the type of an inherited column, I guess that is because it prevents breaking consistency between inherited and inheriting tables as a result of the command. In this sense, maybe, ERRCODE_INVALID_COLUMN_DEFINITION is proper here, because this check is to prevent inconsistency between columns in a tuple. Yes, that was my thinking. I think of ERRCODE_FEATURE_NOT_SUPPORTED as "we could add it in the future", but that does not seem to apply here. + (errcode(ERRCODE_INVALID_TABLE_DEFINITION), + errmsg("cannot specify USING when altering type of generated column"), + errdetail("Column \"%s\" is a generated column.", colName))); Do you think ERRCODE_INVALID_TABLE_DEFINITION is more proper than ERRCODE_INVALID_COLUMN_DEFINITION in this case? COLUMN seems better here. Committed and backpatched, with that adjustment.
Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM
On Wed, Aug 28, 2024 at 3:14 AM Jeff Davis wrote: > > On Mon, 2024-08-26 at 14:18 -0700, Jeff Davis wrote: > > 0001 implementation issues: > > > > * We need default implementations for AMs that don't implement the > > new > > APIs, so that the AM will still function even if it only defines the > > single-tuple APIs. If we need to make use of the AM's multi_insert > > method (I'm not sure we do), then the default methods would need to > > handle that as well. (I thought a previous version had these default > > implementations -- is there a reason they were removed?) > > On second thought, it would be easier to just have the caller check > whether the AM supports the multi-insert path; and if not, fall back to > the single-tuple path. The single-tuple path is needed anyway for cases > like before-row triggers. Up until v21, the default implementation existed, see https://www.postgresql.org/message-id/CALj2ACX90L5Mb5Vv%3DjsvhOdZ8BVsfpZf-CdCGhtm2N%2BbGUCSjg%40mail.gmail.com. I then removed it in v22 to keep the code simple. IMO, every caller branching out in the code like if (rel->rd_tableam->tuple_modify_buffer_insert != NULL) then multi insert; else single insert; doesn't look good. IMO, the default implementation approach keeps things simple, and it can eventually be removed in the *near* future. Thoughts? One change in the default implementation I would make from that of v21 is to assign the default callbacks in GetTableAmRoutine() itself to avoid if .. else if .. else in the table_modify_XXX(). -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Redux: Throttle WAL inserts before commit
On Tue, Aug 27, 2024 at 12:51 PM Shirisha Shirisha wrote: > > Hello hackers, > > This is an attempt to resurrect the thread [1] to throttle WAL inserts > before the point of commit. > > Background: > > Transactions, on commit, wait for replication and make sure WAL is > flushed up to the commit LSN on the standby, when synchronous_commit is on. Hi Shirisha, Just to let you know, there was a more recent attempt at that in [1] in Jan 2023, also with a resurrection attempt there in Nov 2023 by Tomas. Those patches seemed to have received plenty of attention back then and were also based on SyncRepWaitForLSN(), but somehow maybe we ran out of steam and there was not that much interest back then. Maybe you could post a review there (for Tomas's more recent patch), if it helps your use case even today. That way it could get some traction again? -Jakub Wartak.
Re: type cache cleanup improvements
On Mon, Aug 26, 2024 at 11:26 AM Alexander Korotkov wrote: > > On Mon, Aug 26, 2024 at 9:37 AM Andrei Lepikhov wrote: > > On 25/8/2024 23:22, Alexander Korotkov wrote: > > > On Sun, Aug 25, 2024 at 10:21 PM Alexander Korotkov > > >>> (This Assert is introduced by c14d4acb8.) > > >> > > >> Thank you for noticing. I'm checking this. > > > > > > I didn't take into account that TypeCacheEntry could be invalidated > > > while lookup_type_cache() does syscache lookups. When I realized that, > > > I was curious how it currently works. It appears that type > > > cache invalidation mostly only clears the flags while the values > > > remain in place and are still available to the lookup_type_cache() caller. > > > TypeCacheEntry.tupDesc is invalidated directly, and it is guaranteed > > > to survive only because we don't do any syscache lookups for composite > > > data types later in lookup_type_cache(). I'm becoming less of a fan of how > > > this works... I think these aspects need to be at least documented > > > in detail. > > > > > > Regarding c14d4acb8, it appears to require redesign. I'm going to revert > > > it. > > Sorry, but I don't understand your point. > > Let's refocus on the problem at hand. The issue arose when the > > TypeCacheTypCallback and the TypeCacheRelCallback were executed in > > sequence within InvalidateSystemCachesExtended. > > The first callback cleaned the flags TCFLAGS_HAVE_PG_TYPE_DATA and > > TCFLAGS_CHECKED_DOMAIN_CONSTRAINTS. But the call of the second callback > > checks the typentry->tupDesc and, because it wasn't NULL, attempted to > > remove this record a second time. > > I think there is no case for redesign, but we have a mess in > > insertion/deletion conditions. > > Yes, it's possible to repair the current approach. But we need to do > this correctly, not just "not failing with current usages". 
Then we > need to call insert_rel_type_cache_if_needed() not just when we set > the TCFLAGS_HAVE_PG_TYPE_DATA flag, but every time we set any of > TCFLAGS_OPERATOR_FLAGS or tupDesc. That's a lot of places, not as > simple and elegant as it was planned. This is why I wonder if there > is a better approach. > > Secondly, I'm not terribly happy with the current state of the type cache. > The caller of lookup_type_cache() might get already-invalidated data. > This is probably OK, because the caller probably holds locks on dependent > objects to guarantee that the relevant properties of the type actually > persist. At the very least this should be documented, but it doesn't > seem to be. Setting of tupdesc is sensitive to its order of execution. > That feels quite fragile to me, and not documented either. I think > this area needs improvements before we push additional functionality > there. I see fdd965d074 added proper handling of concurrent invalidation for the relation cache. If a concurrent invalidation occurs, we retry building the relation descriptor. Thus, we end up returning a valid relation descriptor to the caller. I wonder if we can take the same approach for the type cache. That would make the whole type cache more consistent and less fragile. Also, this patch would be simpler. -- Regards, Alexander Korotkov Supabase
Re: Add contrib/pg_logicalsnapinspect
On Wed, Aug 28, 2024 at 1:25 AM Bertrand Drouvot wrote: > > On Mon, Aug 26, 2024 at 07:05:27PM +0530, Amit Kapila wrote: > > On Thu, Aug 22, 2024 at 5:56 PM Bertrand Drouvot > > wrote: > > > > > > 2. The SnapBuildOnDisk and SnapBuild structs are now exposed to public. > > > Means > > > we should now pay much more attention when changing their contents but I > > > think > > > it's worth it. > > > > > > > Is it possible to avoid exposing these structures? Can we expose some > > function from snapbuild.c that provides the required information? > > Yeah, that's an option if we don't want to expose those structs to public. > > I think we could 1/ create a function that would return a formed HeapTuple, or > 2/ we could create multiple functions (about 15) that would return the values > we are interested in. > > I think 2/ is fine as it would give more flexibility (no need to retrieve a whole > tuple if one is interested in only one value). > True, but OTOH, each time we add a new field to these structures, a new function has to be exposed. I don't have a strong opinion on this but seeing current use cases, it seems okay to expose a single function. > What do you think? Did you have something else in mind? > On similar lines, we can also provide a function to get the slot's on-disk data. IIRC, Bharath had previously proposed a tool to achieve the same. It is fine if we don't want to add that as part of this patch but I mentioned it because by having that we can have a set of functions to view logical decoding data. -- With Regards, Amit Kapila.
Re: Allow logical failover slots to wait on synchronous replication
On Thu, Aug 29, 2024 at 2:31 AM John H wrote: > > Hi Amit, > > On Mon, Aug 26, 2024 at 11:00 PM Amit Kapila wrote: > > I wanted a simple test where in the first case you use > > synchronous_standby_names = 'ANY 3 (A,B,C,D,E)' and in the second case > > use standby_slot_names = A_slot, B_slot, C_slot, D_slot, E_slot. You > > can try some variations of it as well. The idea is that even if the > > performance is less for synchronous_standby_names configuration, we > > should be able to document it. This will help users to decide what is > > ... > > What is the difference between "Test failover_slots with > > synchronized_standby_slots = 'rr_1, rr_2, > > > rr_3, rr_4, rr_5'" and "Test failover_slots waiting on sync_rep no new > > > shared cache"? I want to know what configuration did you used for > > > synchronous_standby_names in the latter case. > > Sorry for the confusion due to the bad naming of the test cases, let > me rephrase. > All three tests had synchronous_standby_names = 'ANY 3 (A,B,C,D,E)' > set with synchronous_commit = 'on', and failover_slots = 'on' > for the 10 logical slots. > > # Test failover_slots with synchronized_standby_slots = 'rr_1, rr_2, > rr_3, rr_4, rr_5' > This is the test you wanted where the logical clients are waiting on > all 5 slots to acknowledge the change since > 'synchronized_standby_slots' takes priority when set. > > # Test failover_slots sync rep no cache > This test has 'synchronized_standby_slots' commented out, and without > relying on the new cache introduced in 0003. > Logical clients will wait on synchronous_standby_names in this case. > > # Test failover slots with additional shared cache > This test also has 'synchronized_standby_slots' commented out, and > logical clients will wait on the LSNs > reported from synchronous_standby_names but it relies on a new cache > to reduce contention on SyncRepLock. 
> > > The idea is that even if the > > performance is less for synchronous_standby_names configuration, we > > should be able to document it. This will help users to decide what is > > best for them. > > Makes sense. > > > I am also not sure, especially as the test results didn't show much > > improvement and the code also becomes a bit complicated. BTW, in the > > 0003 version in the below code: > > That's fair, I've updated to be more in line with 0002. > > > + /* Cache values to reduce contention */ > > + LWLockAcquire(SyncRepLock, LW_SHARED); > > + memcpy((XLogRecPtr *) walsndctl->cached_lsn, (XLogRecPtr *) > > walsndctl->lsn, sizeof(lsn)); > > + LWLockRelease(SyncRepLock); > > > > Which mode lsn is being copied? I am not sure if I understood this > > part of the code. > > All of the mode LSNs are being copied in case SyncRepWaitMode changes in > the next iteration. I've removed that part but kept: > > > + memcpy(lsn, (XLogRecPtr *) walsndctl->lsn, sizeof(lsn)); > > as suggested by Bertrand to avoid the for loop updating values one-by-one. > > Here's what's logged after the memcpy: > > 2024-08-28 19:41:13.798 UTC [1160413] LOG: lsn[0] after memcpy is: > 279/752C7FF0 > 2024-08-28 19:41:13.798 UTC [1160413] LOG: lsn[1] after memcpy is: > 279/752C7F20 > 2024-08-28 19:41:13.798 UTC [1160413] LOG: lsn[2] after memcpy is: > 279/752C7F20 > > > In the 0002 version, in the following code [1], you are referring to > > LSN mode which is enabled for logical walsender irrespective of the > > mode used by the physical walsender. It is possible that they are > > always the same but that is not evident from the code or comments in > > the patch. > > They are almost always the same, I tried to indicate that with the > following comment in the patch, but I could make it more explicit? 
> > /* Initialize value in case SIGHUP changing to SYNC_REP_NO_WAIT */ > > At the beginning we set > > > int mode = SyncRepWaitMode; > > At this time, the logical walsender mode it's checking against is the > same as what the physical walsenders are using. > It's possible that this mode is no longer the same when we execute the > following check: > > > if (lsn[mode] >= wait_for_lsn) > > because of a SIGHUP to synchronous_commit that changes SyncRepWaitMode > to some other value > > We cache the value instead of > > if (lsn[SyncRepWaitMode] >= wait_for_lsn) > > because SYNC_REP_NO_WAIT is -1. If SyncRepWaitMode is set to this, it > leads to out-of-bounds access. > > I've attached a new patch that removes the shared cache introduced in 0003. > Thanks for the patch. A few comments and queries: 1) + static XLogRecPtr lsn[NUM_SYNC_REP_WAIT_MODE]; We should name it 'lsns' as there are multiple. 2) + for (i = 0; i < NUM_SYNC_REP_WAIT_MODE; i++) + { + lsn[i] = InvalidXLogRecPtr; + } Can we do it like below, similar to what you have done at another place: memset(lsn, InvalidXLogRecPtr, sizeof(lsn)); 3) + if (!initialized) + { + for (i = 0; i < NUM_SYNC_REP_WAIT_MODE; i++) + { + lsn[i] = InvalidXLogRecPtr; + } + } I do not see 'initialized' set to TRUE anywhere. Can you please elaborate?
Make printtup a bit faster
Usually I see printtup in the perf report with a noticeable ratio. Take "SELECT * FROM pg_class" for example, we can see: 85.65% 3.25% postgres postgres [.] printtup The high-level design of printtup is: 1. A pre-allocated StringInfo DR_printtup.buf is used to store data for each tuple. 2. For each datum in the tuple, it calls the type-specific out function and gets a cstring. 3. After getting the cstring, we figure out the "len" and add both len and 'data' into DR_printtup.buf. 4. After all the datums are handled, socket_putmessage copies them into PqSendBuffer. 5. When the usage of PqSendBuffer reaches PqSendBufferSize, the send syscall is used to send the data to the client (copying it from user space to kernel space again). Part of the slowness is caused by "memcpy", "strlen" and palloc in the out function. 8.35% 8.35% postgres libc.so.6 [.] __strlen_avx2 4.27% 0.00% postgres libc.so.6 [.] __memcpy_avx_unaligned_erms 3.93% 3.93% postgres postgres [.] palloc (part of them caused by the out function) 5.70% 5.70% postgres postgres [.] AllocSetAlloc (part of them caused by printtup.) My high-level proposal is to define a type-specific print function like: oidprint(Datum datum, StringInfo buf) textprint(Datum datum, StringInfo buf) This function should append both data and len into buf directly. For the oidprint case, we can avoid: (a) the dedicated palloc in the oid out function; (b) the memcpy from that memory into DR_printtup.buf. For the textprint case, we can additionally avoid (c) the strlen, since we can figure out the length from varlena.vl_len. int2/4/8/timestamp/date/time are similar to oid, and numeric, varchar are similar to text. This almost covers all the commonly used types. Hard-coding the relationship between commonly used types and {type}print function OIDs doesn't look great, but adding a new attribute in pg_type looks too aggressive. Anyway, this is the next topic to talk about. 
If a type's print function is not defined, we can still use the out function (and PrinttupAttrInfo caches FmgrInfo rather than FunctionCallInfo, so there is some optimization in this step as well). This proposal covers steps 2 & 3. If we want to do something more aggressive, we can let the xxxprint functions print to PqSendBuffer directly, but this is more complex and needs some infrastructure changes. The memcpy in step 4 is: "1.27% __memcpy_avx_unaligned_erms" in my above case. What do you think? -- Best Regards Andy Fan
Re: Add contrib/pg_logicalsnapinspect
Hi, On Thu, Aug 29, 2024 at 02:51:36PM +0530, Amit Kapila wrote: > On Wed, Aug 28, 2024 at 1:25 AM Bertrand Drouvot > wrote: > > > > On Mon, Aug 26, 2024 at 07:05:27PM +0530, Amit Kapila wrote: > > > On Thu, Aug 22, 2024 at 5:56 PM Bertrand Drouvot > > > wrote: > > > > > > > > 2. The SnapBuildOnDisk and SnapBuild structs are now exposed to public. > > > > Means > > > > we should now pay much more attention when changing their contents but > > > > I think > > > > it's worth it. > > > > > > > > > > Is it possible to avoid exposing these structures? Can we expose some > > > function from snapbuild.c that provides the required information? > > > > Yeah, that's an option if we don't want to expose those structs to public. > > > > I think we could 1/ create a function that would return a formed HeapTuple, > > or > > 2/ we could create multiple functions (about 15) that would return the > > values > > we are interested in. > > > > I think 2/ is fine as it would give more flexibility (no need to retrieve a > > whole > > tuple if one is interested in only one value). > > > > True, but OTOH, each time we add a new field to these structures, a > new function has to be exposed. I don't have a strong opinion on this > but seeing current use cases, it seems okay to expose a single > function. Yeah, that's fair. And now I'm wondering if we need an extra module. I think we could "simply" expose 2 new functions in core, thoughts? > > What do you think? Did you have something else in mind? > > > > On similar lines, we can also provide a function to get the slot's > on-disk data. Yeah, having a way to expose the data from the disk fully makes sense to me. > IIRC, Bharath had previously proposed a tool to achieve > the same. It is fine if we don't want to add that as part of this > patch but I mentioned it because by having that we can have a set of > functions to view logical decoding data. > That's right. 
I think this one would be simple enough to expose as one or two functions in core too (and probably would not need an extra module). Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: PG_TEST_EXTRA and meson
On Wed, Aug 28, 2024 at 8:46 PM Nazir Bilal Yavuz wrote: > > Also, I think TEST 110 and 170 do not look correct to me. In the > current way, we do not pass PG_TEST_EXTRA to the make command. > > 110 should be: > 'cd $XID_MODULE_DIR && PG_TEST_EXTRA=xid_wraparound make check' > instead of 'PG_TEST_EXTRA=xid_wraparound cd $XID_MODULE_DIR && make > check' > > 170 should be: > 'cd $XID_MODULE_DIR && PG_TEST_EXTRA="" make check && cd $PGDir' > instead of 'PG_TEST_EXTRA="" cd $XID_MODULE_DIR && make check && cd > $PGDir' > You are right. Jacob did point that out, but I didn't fix all the places back then. Here's the updated script. -- Best Wishes, Ashutosh Bapat test_pg_test_extra.sh Description: application/shellscript
Re: Conflict Detection and Resolution
On Fri, Aug 23, 2024 at 10:39 AM shveta malik wrote: > > Please find issues which need some thoughts and approval for > time-based resolution and clock-skew. > > 1) > Time-based conflict resolution and two-phase transactions: > > Time-based conflict resolution (last_update_wins) is the one > resolution which will not result in data divergence, considering > clock skew is taken care of. But when it comes to two-phase > transactions, it might not be the case. For a two-phase transaction, we > do not have the commit timestamp when the changes are being applied. Thus > for time-based comparison, initially it was decided to use the prepare > timestamp, but it may result in data divergence. Please see the > example at [1]. > > Example at [1] is a tricky situation, and thus in the initial draft, > we decided to restrict usage of 2pc and CDR together. The plan is: > > a) During Create subscription, if the user has given last_update_wins > resolver for any conflict_type and 'two_phase' is also enabled, we > ERROR out. > b) During Alter subscription, if the user tries to update resolver to > 'last_update_wins' but 'two_phase' is enabled, we error out. > > Another solution could be to save both prepare_ts and commit_ts. And > when any txn comes for conflict resolution, we first check if > prepare_ts is available; if so, use that, else use commit_ts. Availability of > prepare_ts would indicate it was a prepared txn, and thus even if it is > committed, we should use prepare_ts for comparison for consistency. > This will have some overhead of storing prepare_ts along with > commit_ts. But if the number of prepared txns is reasonably small, > this overhead should be less. > Yet another idea is that if the conflict is detected and the resolution strategy is last_update_wins, then from that point we start writing all the changes to a file, similar to what we do for streaming mode, and only once commit_prepared arrives, we read and apply the changes. That will solve this problem. 
> We currently plan to go with restricting 2pc and last_update_wins > together, unless others have different opinions. > Sounds reasonable but we should add comments on the possible solution like the one I have mentioned so that we can extend it afterwards. > ~~ > > 2) > parallel apply worker and conflict-resolution: > As discussed in [2] (see last paragraph in [2]), for streaming of > in-progress transactions by parallel worker, we do not have > commit-timestamp with each change and thus it makes sense to disable > parallel apply worker with CDR. The plan is to not start parallel > apply worker if 'last_update_wins' is configured for any > conflict_type. > The other idea is that we can let the changes be written to a file if any conflict is detected and then at commit time let the remaining changes be applied by the apply worker. This can introduce some complexity, so similar to two_pc we can extend this functionality later. > ~~ > > 3) > parallel apply worker and clock skew management: > Regarding clock-skew management as discussed in [3], we will wait for > the local clock to come within tolerable range during 'begin' rather > than before 'commit'. And this wait needs commit-timestamp in the > beginning, thus we plan to restrict starting pa-worker even when > clock-skew related GUCs are configured. > > Earlier we had restricted both 2pc and parallel worker start > when detect_conflict was enabled, but now since the detect_conflict > parameter is removed, we will change the implementation to restrict > all 3 above cases when last_update_wins is configured. When the > changes are done, we will post the patch. > At this stage, we are not sure how we want to deal with clock skew. There is an argument that clock skew should be handled outside the database, so we can probably have the clock-skew-related stuff in a separate patch. > ~~ > > 4) > > Earlier when 'detect_conflict' was enabled, we were giving a WARNING if > 'track_commit_timestamp' was not enabled. 
This was during CREATE and > ALTER subscription. Now with this parameter removed, this WARNING has > also been removed. But I think we need to bring back this WARNING. > Currently the default resolver set may work without > 'track_commit_timestamp', but when the user gives CONFLICT RESOLVER in > create-sub or alter-sub explicitly, configuring them to > non-default values (or, say, any values; it does not matter if a few are > defaults), we may still emit this warning to alert the user: > > 2024-07-26 09:14:03.152 IST [195415] WARNING: conflict detection > could be incomplete due to disabled track_commit_timestamp > 2024-07-26 09:14:03.152 IST [195415] DETAIL: Conflicts update_differ > and delete_differ cannot be detected, and the origin and commit > timestamp for the local row will not be logged. > > Thoughts? > > If we emit this WARNING during each resolution, then it may flood our > log files, thus it seems better to emit it during create or alter > subscription instead of during resolution. > Sounds reasonable. -- W
Re: ANALYZE ONLY
Hi Michael, Michael Harris wrote on Fri, 23 Aug 2024 at 13:01: > V2 of the patch is attached. Thanks for updating the patch. I have a few more minor pieces of feedback.

-ANALYZE [ ( option [, ...] ) ] [ table_and_columns [, ...] ]
+ANALYZE [ ( option [, ...] ) ] [ [ ONLY ] table_and_columns [, ...] ]

I believe moving "[ ONLY ]" to the definitions of table_and_columns below, as you did with "[ * ]", might be better to be consistent with other places (see [1]).

+ if ((options & VACOPT_VACUUM) && is_partitioned_table && !include_children)

There are also some issues with coding conventions in some places (e.g. the space between "!" and "include_children" above). I think running pgindent would resolve such issues in most places. [1] https://www.postgresql.org/docs/16/sql-createpublication.html Regards, -- Melih Mutlu Microsoft
Re: Make printtup a bit faster
On Thu, 29 Aug 2024 at 21:40, Andy Fan wrote: > > > Usually I see printtup in the perf-report with a noticeable ratio. > Part of the slowness is caused by "memcpy", "strlen" and palloc in > outfunction. Yeah, it's a pretty inefficient API from a performance point of view. > My high level proposal is define a type specific print function like: > > oidprint(Datum datum, StringInfo buf) > textprint(Datum datum, StringInfo buf) I think what we should do instead is make the output functions take a StringInfo and just pass it the StringInfo where we'd like the bytes written. That of course would require rewriting all the output functions for all the built-in types, so not a small task. Extensions make that job harder. I don't think it would be good to force extensions to rewrite their output functions, so perhaps some wrapper function could help us align the APIs for extensions that have not been converted yet. There's a similar problem with input functions not having knowledge of the input length. You only have to look at textin() to see how useful that could be. Fixing that would probably make COPY FROM horrendously faster. Team that up with SIMD for the delimiter char search and COPY would get a bit better still. Neil Conway did propose the SIMD part in [1], but it's just not nearly as good as it could be when having to still perform the strlen() calls. I had planned to work on this for PG18, but I'd be happy for some assistance if you're willing. David [1] https://postgr.es/m/caow5syb1hprqkrzjcsrcp1eauqzzy+njz-awbboumogjhjs...@mail.gmail.com
Re: Segfault in jit tuple deforming on arm64 due to LLVM issue
On Wed, Aug 28, 2024 at 12:24 AM Thomas Munro wrote: > 2. I tested against LLVM 10-18, and found that 10 and 11 lack some > needed symbols. So I just hid this code from them. Even though our > stable branches support those and even older versions, I am not sure > if it's worth trying to do something about that for EOL'd distros that > no one has ever complained about. I am willing to try harder if > someone thinks that's important... I would also assume that people using arm64 are more likely to use recent versions than not. I've done some additional tests on different LLVM versions with both the unpatched version (to make sure the crash was triggered) and the patched version. I'm attaching the test scripts I used as reference. They target a Kubernetes pod since it was the easiest way for me to get a test Ubuntu Jammy environment: - setup_pod.sh: Install necessary packages, get multiple LLVM versions, fetch and compile master and patched versions of postgres on different LLVM versions - run_test.sh: go through all LLVM versions for both unpatched and patched postgres to run the test_script.sh - test_script.sh: run inside the pod to set up the db with the necessary tables and check if the crash happens This generated the following output: Test unpatched version on LLVM 19, : Crash triggered Test unpatched version on LLVM 18, libLLVM-18.so.18.1: Crash triggered Test unpatched version on LLVM 17, libLLVM-17.so.1: Crash triggered Test unpatched version on LLVM 16, libLLVM-16.so.1: Crash triggered Test unpatched version on LLVM 15, libLLVM-15.so.1: Crash triggered Test unpatched version on LLVM 14, libLLVM-14.so.1: Crash triggered Test unpatched version on LLVM 13, libLLVM-13.so.1: Crash triggered Test patched version on LLVM 19, : Query ran successfully Test patched version on LLVM 18, libLLVM-18.so.18.1: Query ran successfully Test patched version on LLVM 17, libLLVM-17.so.1: Query ran successfully Test patched version on LLVM 16, libLLVM-16.so.1: Query ran successfully Test 
patched version on LLVM 15, libLLVM-15.so.1: Query ran successfully Test patched version on LLVM 14, libLLVM-14.so.1: Query ran successfully Test patched version on LLVM 13, libLLVM-13.so.1: Query ran successfully I print the libLLVM library linked to llvmjit.so in the output to double-check whether I tested on the correct version. The LLVM 19 package only provides static libraries (probably because it's still a release candidate?) so it shows as empty in the output. There was no LLVM 12 available when using the llvm.sh script so I couldn't test it. As for the results, all pre-patch PG builds crashed as expected while the patched version was able to run the query successfully. > Next, I think we should wait to see if the LLVM project commits that > PR, this so that we can sync with their 19.x stable branch, instead of > using code from a PR. Our next minor release is in November, so we > have some time. If they don't commit it, we can consider it anyway: I > mean, it's crashing all over the place in production, and we see that > other projects are shipping this code already. The PR[1] just received an approval and it sounds like they are OK to eventually merge it. 
[1] https://github.com/llvm/llvm-project/pull/71968

#!/bin/bash
# Don't want coredumps
ulimit -c 0
# Stop running PG
/var/lib/postgresql/.local/bin/pg_ctl -D /var/lib/postgresql/db_data stop &>> /dev/null
# Clean db_data
rm -rf /var/lib/postgresql/db_data
# Create new dbdata
/var/lib/postgresql/.local/bin/initdb -D /var/lib/postgresql/db_data -Apeer > /dev/null
echo "port = 5432
shared_buffers='4GB'
" > /var/lib/postgresql/db_data/postgresql.conf
# Start and setup test partitioned table
/var/lib/postgresql/.local/bin/pg_ctl -D /var/lib/postgresql/db_data start -l /tmp/pg.log > /dev/null
/var/lib/postgresql/.local/bin/pgbench -i --partitions=128 2> /dev/null
# Run the query to trigger the issue
/var/lib/postgresql/.local/bin/psql options=-cjit_above_cost=0 -c 'SELECT count(bid) from pgbench_accounts;' &>> /dev/null
exit_code=$?
if (( $exit_code == 2 )); then
    echo "Crash triggered"
elif (( $exit_code == 0 )); then
    echo "Query ran successfully"
else
    echo "Other issue while running the query"
fi

#!/bin/bash
set -eu
TEST_POD=$(kubectl get pod -l app="test_pg" -nsre -o=jsonpath='{.items[*].metadata.name}')
LLVM_VERSIONS=(19 18 17 16 15 14 13)

get_llvm_version() {
    kubectl exec "$pod" -- bash -c "ldd /var/lib/postgresql/.local/lib/llvmjit.so | grep libLLVM | awk '{print \$1}'"
}

run_test_on_pod() {
    local pod="$1"
    kubectl cp test_script.sh "$pod":/var/lib/postgresql/
    for version in ${LLVM_VERSIONS[@]}; do
        build_dir="~/postgres_build/build_${version}"
        kubectl exec "$pod" -- bash -c "su postgres -c \"
            cd $build_dir;
            make -j19 install &>> /dev/null
        \""
        echo -n "Test unpatched version on LLVM $version, $(get_llvm_version): "
        kubectl exec "$pod" -- bash -c "su postgres -c \"
            cd;
            bash test_script.sh
        \""
    done
    for version in
Re: Eager aggregation, take 3
On Wed, Aug 28, 2024 at 10:26 PM Richard Guo wrote: > Yeah, I'm concerned about this too. In addition to the inaccuracies > in aggregation estimates, our estimates for joins are sometimes not > very accurate either. All of this is likely to result in regressions > with eager aggregation in some cases. Currently I don't have a good > answer to this problem. Maybe we can run some benchmarks first and > investigate the regressions discovered on a case-by-case basis to better > understand the specific issues. While it's true that we can make mistakes during join estimation, I believe aggregate estimation tends to be far worse. -- Robert Haas EDB: http://www.enterprisedb.com
[BUG] Security bugs affected version detected.
Our tool has detected that PostgreSQL versions REL9_6_18 through REL9_6_24 may also be affected by the vulnerability CVE-2022-2625. The vulnerability database does not include these versions, and the fix does not appear to have been applied on the REL9_6 branch. Is there a need to backport the patch for CVE-2022-2625?
Re: Streaming read-ready sequential scan code
Hello Thomas,

29.08.2024 01:16, Thomas Munro wrote:
> Yeah. That's quite interesting, and must destabilise that
> simple-minded demo. I'm curious to know exactly what contention is
> causing that (about 3/4 of a millisecond that I don't see and now I
> want to know what it's waiting for), but it's a very crude test
> lacking timer resolution in the earlier messages, and it's an
> unrelated topic and a distraction. Perhaps it explains why you saw
> two different behaviours in Q15 with the patch and I didn't, though.
> Really it shouldn't be so sensitive to such variations, it's
> obviously a terrible plan, and TPC-DS needs a planner hacker
> mega-brain applied to it; I'm going to try to nerd-snipe one...

I looked at two perf profiles of such out-of-sync processes and found no extra calls whatsoever in the slow one; it just has the number of perf samples increased proportionally. That made me suspect CPU frequency scaling... Indeed, with the "performance" governor set and the boost mode disabled, I'm now seeing much more stable numbers (I do this tuning before running performance tests, but I had forgotten about it when I ran your test, my bad). I'm sorry for the noise and the distraction.

Still, now I can confirm your results. Without the patch, two parallel workers gave "Buffers: shared hit=217 / Buffers: shared hit=226" 10 times out of 10. And with the patch, I've got "shared hit=219 / shared hit=224" 3 times, "shared hit=220 / shared hit=223" 4 times, and "shared hit=221 / shared hit=222" 3 times out of 10.

On b7b0f3f27~1, my results are:
"shared hit=219 / shared hit=224": 2
"shared hit=220 / shared hit=223": 3
"shared hit=221 / shared hit=222": 4
"shared hit=218 / shared hit=225": 1

Best regards,
Alexander
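The benchmarking tuning Alexander describes (performance governor, boost disabled) can be sketched as follows. The sysfs paths are assumptions that depend on the Linux cpufreq driver in use, and by default the helper only prints what it would do:

```shell
#!/bin/bash
# Sketch: pin CPU frequency before running timing-sensitive benchmarks.
# Paths assume the Linux cpufreq sysfs interface; adjust for your driver.
DRY_RUN=${DRY_RUN:-1}   # default to printing, since writing needs root

apply() {
    # apply VALUE FILE: write VALUE to FILE, or just print in dry-run mode
    if [ "$DRY_RUN" = 1 ]; then
        echo "would run: echo $1 > $2"
    else
        echo "$1" > "$2"
    fi
}

shopt -s nullglob
for gov in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor; do
    apply performance "$gov"
done
# Turbo/boost control differs per driver; intel_pstate exposes no_turbo
if [ -e /sys/devices/system/cpu/intel_pstate/no_turbo ]; then
    apply 1 /sys/devices/system/cpu/intel_pstate/no_turbo
fi
```

Run with DRY_RUN=0 as root to actually apply the settings, and remember to restore the previous governor afterwards.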
Re: [BUG] Security bugs affected version detected.
> On 29 Aug 2024, at 14:54, James Watt wrote:
>
> Our tool have detected that postgre in the version of REL9_6_18~
> REL9_6_24 may also affected by the vulnerability CVE-2022-2625. The
> vulnerability database does not include these versions and you may not
> fix it in the REL9_6 branch. Is there a need to backport the patch of
> CVE-2022-2625?

9.6 was EOL at the time CVE-2022-2625 was announced and thus wasn't considered for a backport of the fix; the project only applies fixes to supported versions. Anyone still running 9.6 in production is strongly advised to upgrade to a supported version.

--
Daniel Gustafsson
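As an operational aside, checking for an EOL server can be scripted against server_version_num. A minimal sketch follows; the numeric threshold here is an assumption for illustration, so check the current versioning policy for the real cutoff:

```shell
# Sketch: warn when a server is past end-of-life. The threshold
# (120000, i.e. PostgreSQL 12) is illustrative, not authoritative.
check_eol() {
    local version_num="$1"
    if [ "$version_num" -lt 120000 ]; then
        echo "unsupported major version ($version_num): plan an upgrade"
    else
        echo "supported ($version_num)"
    fi
}

# Typically fed from: psql -XtAc "SHOW server_version_num"
check_eol 90624    # 9.6.24
check_eol 160004   # 16.4
```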
Re: Eager aggregation, take 3
On Wed, Aug 28, 2024 at 11:38 PM Tender Wang wrote:
> I upload EXPLAIN(COSTS ON, ANALYZE) test to [1].
> I ran the same query three times, and I chose the third time result.
> You can check 19_off_explain.out and 19_on_explain.out.

So, in 19_off_explain.out, we got this:

->  Finalize GroupAggregate  (cost=666986.48..667015.35 rows=187 width=142) (actual time=272.649..334.318 rows=900 loops=1)
    ->  Gather Merge  (cost=666986.48..667010.21 rows=187 width=142) (actual time=272.644..333.847 rows=901 loops=1)
        ->  Partial GroupAggregate  (cost=665986.46..665988.60 rows=78 width=142) (actual time=266.379..267.476 rows=300 loops=3)
            ->  Sort  (cost=665986.46..665986.65 rows=78 width=116) (actual time=266.367..266.583 rows=5081 loops=3)

And in 19_on_explain.out, we got this:

->  Finalize GroupAggregate  (cost=666987.03..666989.77 rows=19 width=142) (actual time=285.018..357.374 rows=900 loops=1)
    ->  Gather Merge  (cost=666987.03..666989.25 rows=19 width=142) (actual time=285.000..352.793 rows=15242 loops=1)
        ->  Sort  (cost=665987.01..665987.03 rows=8 width=142) (actual time=273.391..273.580 rows=5081 loops=3)
            ->  Nested Loop  (cost=665918.00..665986.89 rows=8 width=142) (actual time=252.667..269.719 rows=5081 loops=3)
                ->  Nested Loop  (cost=665917.85..665985.43 rows=8 width=157) (actual time=252.656..264.755 rows=5413 loops=3)
                    ->  Partial GroupAggregate  (cost=665917.43..665920.10 rows=82 width=150) (actual time=252.643..255.627 rows=5413 loops=3)
                        ->  Sort  (cost=665917.43..665917.64 rows=82 width=124) (actual time=252.636..252.927 rows=5413 loops=3)

So, the patch was expected to cause the number of rows passing through the Gather Merge to decrease from 187 to 19, but actually caused the number of rows passing through the Gather Merge to increase from 901 to 15242.
When the PartialAggregate was positioned at the top of the join tree, it reduced the number of rows from 5081 to 300; but when it was pushed down below two joins, it didn't reduce the row count at all, and the subsequent two joins reduced it by less than 10%. Now, you could complain about the fact that the Parallel Hash Join isn't well-estimated here, but my question is: why does the planner think that the PartialAggregate should go specifically here? In both plans, the PartialAggregate isn't expected to change the row count. And if that is true, then it's going to be cheapest to do it at the point where the joins have reduced the row count to the minimum value. Here, that would be at the top of the plan tree, where we have only 5081 estimated rows, but instead, the patch chooses to do it as soon as we have all of the grouping columns, when we still have 5413 rows. I don't understand why that path wins on cost, unless it's just that the paths compare fuzzily the same, in which case it kind of goes to my earlier point about not really having the statistics to know which way is actually going to be better.

--
Robert Haas
EDB: http://www.enterprisedb.com
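The placement argument can be made concrete with a toy calculation. This is not the planner's actual cost model, just the per-worker row counts from the two plans in this thread: if a PartialAggregate does not reduce the row count, its work is roughly proportional to the rows that reach it, so the cheapest position is where the joins have shrunk the input most.

```shell
# Toy illustration using the (actual) per-worker row counts quoted above.
rows_at_top=5081        # rows entering the aggregate when placed above the joins
rows_pushed_down=5413   # rows entering the aggregate when pushed below two joins

# With roughly constant per-row aggregation work, relative cost is just rows.
extra_rows=$(( rows_pushed_down - rows_at_top ))
echo "pushing the aggregate down processes $extra_rows extra rows"
if (( rows_pushed_down > rows_at_top )); then
    echo "placing the aggregate at the top of the join tree is cheaper here"
fi
```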
Re: Add contrib/pg_logicalsnapinspect
On Thu, Aug 29, 2024 at 3:44 PM Bertrand Drouvot wrote: > > Yeah that's fair. And now I'm wondering if we need an extra module. I think > we could "simply" expose 2 new functions in core, thoughts? > > > > What do you think? Did you have something else in mind? > > > > > > > On similar lines, we can also provide a function to get the slot's > > on-disk data. > > Yeah, having a way to expose the data from the disk makes fully sense to me. > > > IIRC, Bharath had previously proposed a tool to achieve > > the same. It is fine if we don't want to add that as part of this > > patch but I mentioned it because by having that we can have a set of > > functions to view logical decoding data. > > That's right. I think this one would be simply enough to expose one or two > functions in core too (and probably would not need an extra module). +1 for functions in core unless this extra module pg_logicalsnapinspect works as a tool to be helpful even when the server is down. FWIW, I wrote pg_replslotdata as a tool, not as an extension for reading on-disk replication slot data to help when the server is down - https://www.postgresql.org/message-id/flat/CALj2ACW0rV5gWK8A3m6_X62qH%2BVfaq5hznC%3Di0R5Wojt5%2Byhyw%40mail.gmail.com. When the server is running, pg_get_replication_slots() pretty much gives the on-disk contents. -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
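For context on the on-disk side of this discussion: serialized logical decoding snapshots live under pg_logical/snapshots in the data directory, in files named after an LSN with a .snap suffix. A trivial sketch of locating them (the helper name is ours, not part of any proposed API):

```shell
# Sketch: list the serialized snapshot files that in-core functions
# (or the proposed module) would decode. Filenames look like <LSN>.snap.
list_snapshots() {
    local pgdata="$1"
    ls "$pgdata"/pg_logical/snapshots/ 2>/dev/null
}

# Example (path is an assumption):
# list_snapshots /var/lib/postgresql/data
```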
Re: Use streaming read API in ANALYZE
On Sat, Aug 24, 2024 at 5:31 AM Thomas Munro wrote:
> On Thu, Aug 22, 2024 at 7:31 PM Mats Kindahl wrote:
> > The alternate version proposed by Nazir allows you to decide which
> > interface to use.
> > Reverting the patch entirely would also solve the problem.

After digging through the code a little more I discovered that there actually is another option: move the ReadStream struct into read_stream.h.

> > Passing down the block sampler and the strategy to scan_begin() and
> > moving the ReadStream setup in analyze.c into initscan() in heapam.c,
> > but this requires adding new parameters to this function.
> > Having accessors that allow you to get the block sampler and strategy
> > from the ReadStream object.
>
> I'm a bit confused about how it can make sense to use the same
> BlockSampler with a side relation/fork. Could you point me at the
> code?

Sorry, that was a bit unclear. The intention was not to re-use the block sampler but to set up a new one with parameters from the original block sampler, which would require access to it. (The strategy is less of a problem since only one is used.)

To elaborate on the situation:

For the TAM in question we have two different storage areas, both of which are heaps. Both relations use the same attributes "publicly" (they are internally different, but we transform them to look the same). One of the relations is the "default" one and is stored in rd_rel. In order to run ANALYZE, we need to sample blocks from both relations, in slightly different ways.

With the old interface, we faked the number of blocks in the relation_size() callback and claimed that there were N + M blocks. When then being asked about a block by block number, we could easily pick the correct relation and just forward the call. With the new ReadStream API, a read stream is (automatically) set up on the "default" relation, but we can set up a separate read stream inside the TAM for the other relation.
However, the difficulty is in setting it up correctly: We cannot use the "fake number of block"-trick since the read stream does not only compute the block number, but actually tries to read the buffer in the relation provided when setting up the read stream, so a block number outside the range of this relation will not be found since it is in a different relation. If we could create our own read stream with both relations, that could be solved and we could just implement the same logic, but direct it to the correct relations depending on where we want to read the block. Unless I am mistaken, there is already support for this since there is an array of in-progress I/O and it would be trivial to extend this with more relations+forks, if you have access to the structure definition. The ReadStream struct is, however, an opaque struct so it's hard to hack around with it. Just making the struct declaration public would potentially solve a lot of problems here. (See attached patch, which is close to the minimum of what is needed to allow extension writers to tweak the contents.) Since both relations are using the same attributes with the same "analyzability", having that information would be useful to compute the targrows for setting up the additional stream, but it is computed in do_analyze_rel() and not further propagated, so it needs to be re-computed if we want to set up a separate read-stream. > > It would be great if this could be fixed before the PG17 release now > that 27bc1772fc8 was reverted. > > Ack. Thinking... > Right now I think that just making the ReadStream struct available in the header file is the best approach. It is a safe and low-risk fix (so something that can be added to a beta) and will allow extension writers to hack to their hearts' contents. In addition to that, being able to select what interface to use would also help. > Random thought: is there a wiki page or something where we can find > out about all the table AM projects? 
> For the successor to 27bc1772fc8, I hope they'll be following along.

At this point, unfortunately not, we are quite early in this. Once I have something, I'll share.

--
Best wishes,
Mats Kindahl, Timescale

From ea4bb194e0dcccac8465b3aa13950f721bde3860 Mon Sep 17 00:00:00 2001
From: Mats Kindahl
Date: Thu, 29 Aug 2024 10:39:34 +0200
Subject: Make ReadStream struct non-opaque

Move the ReadStream struct and two utility functions from read_stream.c
to read_stream.h to allow extensions to modify the structure if they
need to.
---
 src/backend/storage/aio/read_stream.c |  59 ---
 src/include/storage/read_stream.h     | 105 ++
 2 files changed, 105 insertions(+), 59 deletions(-)

diff --git a/src/backend/storage/aio/read_stream.c b/src/backend/storage/aio/read_stream.c
index a83c18c2a4..bf2a679037 100644
--- a/src/backend/storage/aio/read_stream.c
+++ b/src/backend/storage/aio/read_stream.c
@@ -97,65 +97,6 @@
 #include "utils/rel.h"
 #include "utils/spccache.h"
 
-typedef struct InProgressIO
-{
-	int16		buffer_index;
-	Read
Re: Switching XLog source from archive to streaming when primary available
Hi, Thanks for looking into this. On Fri, Aug 23, 2024 at 5:03 AM John H wrote: > > For a motivation aspect I can see this being useful > synchronous_replicas if you have commit set to flush mode. In synchronous replication setup, until standby finishes fetching WAL from the archive, the commits on the primary have to wait which can increase the query latency. If the standby can connect to the primary as soon as the broken connection is restored, it can fetch the WAL soon and transaction commits can continue on the primary. Is my understanding correct? Is there anything more to this? I talked to Michael Paquier at PGConf.Dev 2024 and got some concerns about this feature for dealing with changing timelines. I can't think of them right now. And, there were some cautions raised upthread - https://www.postgresql.org/message-id/20240305020452.GA3373526%40nathanxps13 and https://www.postgresql.org/message-id/ZffaQt7UbM2Q9kYh%40paquier.xyz. > So +1 on feature, easier configurability, although thinking about it > more you could probably have the restore script be smarter and provide > non-zero exit codes periodically. Interesting. Yes, the restore script has to be smarter to detect the broken connections and distinguish whether the server is performing just the archive recovery/PITR or streaming from standby. Not doing it right, perhaps, can cause data loss (?). > The patch needs to be rebased but I tested this against an older 17 build. Will rebase soon. > > + ereport(DEBUG1, > > + errmsg_internal("switched WAL source from %s to %s after %s", > > + xlogSourceNames[oldSource], > > Not sure if you're intentionally changing from DEBUG1 from DEBUG2. Will change. > > * standby and increase the replication lag on primary. > > Do you mean "increase replication lag on standby"? > nit: reading from archive *could* be faster since you could in theory > it's not single-processed/threaded. Yes. 
I think we can just say "All of these can impact the recovery performance on + * standby and increase the replication lag." > > However, > > + * exhaust all the WAL present in pg_wal before switching. If successful, > > + * the state machine moves to XLOG_FROM_STREAM state, otherwise it falls > > + * back to XLOG_FROM_ARCHIVE state. > > I think I'm missing how this happens. Or what "successful" means. If I'm > reading > it right, no matter what happens we will always move to > XLOG_FROM_STREAM based on how > the state machine works? Please have a look at some discussion upthread on exhausting pg_wal before switching - https://www.postgresql.org/message-id/20230119005014.GA3838170%40nathanxps13. Even today, the standby exhausts pg_wal before switching to streaming from the archive. > I tested this in a basic RR setup without replication slots (e.g. log > shipping) where the > WAL is available in the archive but the primary always has the WAL > rotated out and > 'streaming_replication_retry_interval = 1'. This leads the RR to > become stuck where it stops fetching from > archive and loops between XLOG_FROM_PG_WAL and XLOG_FROM_STREAM. Nice catch. This is a problem. One idea is to disable streaming_replication_retry_interval feature for slot-less streaming replication - either when primary_slot_name isn't specified disallow the GUC to be set in assign_hook or when deciding to switch the wal source. Thoughts? -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
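On the "smarter restore script" idea raised upthread, a minimal sketch of a restore_command helper that fails fast when a segment is missing, so recovery can move on (for example to streaming) instead of retrying the archive forever. The layout and names here are assumptions, not a recommended production script:

```shell
# Hypothetical restore_command helper; the server substitutes the WAL
# file name for %f and the destination path for %p when invoking it.
restore_wal() {
    local archive_dir="$1" fname="$2" dest="$3"
    if [ -f "$archive_dir/$fname" ]; then
        cp "$archive_dir/$fname" "$dest"
    else
        # A non-zero exit tells recovery this segment is (currently)
        # unavailable in the archive, letting the state machine move on.
        return 1
    fi
}

# In postgresql.conf one might wire it up as (paths assumed):
# restore_command = '/usr/local/bin/restore_wal.sh /mnt/archive %f %p'
```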
Re: Virtual generated columns
On Thu, Aug 29, 2024 at 8:15 PM Peter Eisentraut wrote: > > > > drop table if exists comment_test cascade; > > CREATE TABLE comment_test ( > >id int, > >positive_col int GENERATED ALWAYS AS (22) CHECK (positive_col > 0), > >positive_col1 int GENERATED ALWAYS AS (22) stored CHECK (positive_col > > > 0) , > >indexed_col int, > >CONSTRAINT comment_test_pk PRIMARY KEY (id)); > > CREATE INDEX comment_test_index ON comment_test(indexed_col); > > ALTER TABLE comment_test ALTER COLUMN positive_col1 SET DATA TYPE text; > > ALTER TABLE comment_test ALTER COLUMN positive_col SET DATA TYPE text; > > the last query should work just fine? > > I played with this and I don't see anything wrong with the current > behavior. I noticed that in your test case > > >positive_col1 int GENERATED ALWAYS AS (22) stored CHECK > (positive_col > 0) , > > you have the wrong column name in the check constraint. I'm not sure if > that was intentional. > That's my mistake. sorry for the noise. On Wed, Aug 21, 2024 at 6:52 PM Dean Rasheed wrote: > > > Another argument for doing it that way round is to not add too many > extra cycles to the processing of existing queries that don't > reference generated expressions. ISTM that this patch is potentially > adding quite a lot of additional overhead -- it looks like, for every > Var in the tree, it's calling get_attgenerated(), which involves a > syscache lookup to see if that column is a generated expression (which > most won't be). Ideally, we should be trying to do the minimum amount > of extra work in the common case where there are no generated > expressions. > > Regards, > Dean > > The new patch does some rebasing and contains various fixes to the > issues you presented. As I mentioned, I'll look into improving the > rewriting. based on your latest patch (v4-0001-Virtual-generated-columns.patch), I did some minor cosmetic code change and tried to address get_attgenerated overhead. 
Basically, expand_generated_columns_in_query and expand_generated_columns_in_expr first collect the (reloid, attnum) pairs that have the generated_virtual flag into an expand_generated_context; later, expand_generated_columns_mutator uses the collected information. Dealing with whole-row references within expand_generated_columns_mutator seems tricky; I will try that later.

v4-0001-Virtual-generated-columns_minorchange.no-cfbot
Description: Binary data
Re: pg_verifybackup: TAR format backup verification
On Sat, Aug 24, 2024 at 2:02 AM Robert Haas wrote:
>
> On Wed, Aug 21, 2024 at 7:08 AM Amul Sul wrote:
> []
> Then the result verifies. But I feel like we should have some test
> cases that do this kind of stuff so that there is automated
> verification. In fact, the current patch seems to have no negative
> test cases at all. I think we should test all the cases in
> 003_corruption.pl with tar format backups as well as with plain format
> backups, which we could do by untarring one of the archives, messing
> something up, and then retarring it. I also think we should have some
> negative test case specific to tar-format backup. I suggest running a
> coverage analysis and trying to craft test cases that hit as much of
> the code as possible. There will probably be some errors you can't
> hit, but you should try to hit the ones you can.

Done. I've added a few tests that extract, modify, and repack the tar files, mainly base.tar, skipping tablespace.tar since it mostly duplicates those tests. I've also updated 002_algorithm.pl to cover tests for tar backups.

> > 0011 patch: Regarding function names:
> >
> > 1. named the function verify_tar_backup_file() to align with
> > verify_plain_backup_file(), but it does not perform the complete
> > verification as verify_plain_backup_file does. Not sure if it is the
> > right name.
>
> I was thinking of something like precheck_tar_backup_file().

Done.

> > 2. verify_tar_file_contents() is the second and final part of tar
> > backup verification. Should its name be aligned with
> > verify_tar_backup_file()? I'm unsure what the best name would be.
> > Perhaps verify_tar_backup_file_final(), but then
> > verify_tar_backup_file() would need to be renamed to something like
> > verify_tar_backup_file_initial(), which might be too lengthy.
>
> verify_tar_file_contents() actually verifies the contents of all the
> tar files, not just one, so the name is a bit confusing. Maybe
> verify_all_tar_files().

Done.

> > 3. verify_tar_contents() is the core of verify_tar_file_contents()
> > that handles the actual verification. I'm unsure about the current
> > naming. Should we rename it to something like
> > verify_tar_contents_core()? It wouldn't be an issue if we renamed
> > verify_tar_file_contents() as pointed out in point #2.
>
> verify_one_tar_file()?

Done.

> But with those renames, I think you really start to see why I'm not
> very comfortable with verify_backup_directory(). The tar and plain
> format cases aren't really doing the same thing. We're just gluing
> them into a single function anyway.

Agreed. I can see the discomfort -- added a new function.

> I am also still uncomfortable with the way you've refactored some of
> this so that we end up with very small amounts of code far away from
> other code that they influence. Like you end up with this:
>
> /* Check the backup manifest entry for this file. */
> m = verify_manifest_entry(context, relpath, sb.st_size);
>
> /* Validate the pg_control information */
> if (should_verify_control_data(context->manifest, m))
> ...
> if (show_progress && !context->skip_checksums &&
> should_verify_checksum(m))
>
> But verify_manifest_entry can return NULL or it can set m->bad and
> either of those change the result of should_verify_control_data() and
> should_verify_checksum(), but none of that is obvious when you just
> look at this. Admittedly, the code in master isn't brilliant in terms
> of making it obvious what's happening either, but I think this is
> worse. I'm not really sure what I think we should do about that yet,
> but I'm uncomfortable with it.

I am not sure if I fully understand the concern, but I see it differently. The verify_manifest_entry function returns an entry, m, that the caller doesn't need to worry about, as it simply passes it to subsequent routines or macros that are aware of the possible inputs -- whether it's NULL, m->bad, or something else.
Regards,
Amul

From 4f98ffa42916fe179e9a87b9043393b6449f1705 Mon Sep 17 00:00:00 2001
From: Amul Sul
Date: Wed, 14 Aug 2024 10:42:37 +0530
Subject: [PATCH v13 06/12] Refactor: split verify_backup_file() function and
 rename it.

The function verify_backup_file() has now been renamed to
verify_plain_backup_file() to make it clearer that it is specifically
used for verifying files in a plain backup. Similarly, in a future
patch, we would have a verify_tar_backup_file() function for verifying
TAR backup files.

In addition to that, moved the manifest entry verification code into a
new function called verify_manifest_entry() so that it can be reused
for tar backup verification. If verify_manifest_entry() doesn't find an
entry, it reports an error as before and returns NULL to the caller.
This is why a NULL check is added to should_verify_checksum().
---
 src/bin/pg_verifybackup/pg_verifybackup.c | 58 +++
 src/bin/pg_verifybackup/pg_verifybackup.h |  6 ++-
 2 files changed,
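The untar/modify/retar recipe discussed in this thread for negative tests can be sketched in shell. File names here are illustrative; the real tests would pick files tracked by the backup manifest:

```shell
# Sketch: corrupt one member of a tar-format backup, then repack it so
# pg_verifybackup should report a checksum/size mismatch for that file.
corrupt_tar_member() {
    local tarfile="$1" member="$2" scratch
    scratch=$(mktemp -d)
    tar -xf "$tarfile" -C "$scratch"
    echo "garbage" >> "$scratch/$member"   # modify the extracted copy
    tar -cf "$tarfile" -C "$scratch" .
    rm -rf "$scratch"
}

# Example (hypothetical paths):
# corrupt_tar_member backup/base.tar postgresql.conf
```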
Add parallel columns for seq scan and index scan on pg_stat_all_tables and _indexes
Hello,

This patch was a bit discussed on [1], and with more details on [2]. It introduces four new columns in pg_stat_all_tables:

* parallel_seq_scan
* last_parallel_seq_scan
* parallel_idx_scan
* last_parallel_idx_scan

and two new columns in pg_stat_all_indexes:

* parallel_idx_scan
* last_parallel_idx_scan

As Benoit said yesterday, the intent is to help administrators evaluate the usage of parallel workers in their databases and help configuring parallelization usage.

A test script (test.sql) is attached. You can execute it with "psql -Xef test.sql your_database" (your_database should not contain a t1 table as it will be dropped and recreated). Here is its result, a bit commented:

DROP TABLE IF EXISTS t1;
DROP TABLE
CREATE TABLE t1 (id integer);
CREATE TABLE
INSERT INTO t1 SELECT generate_series(1, 10_000_000);
INSERT 0 10000000
VACUUM ANALYZE t1;
VACUUM
SELECT relname, seq_scan, last_seq_scan, parallel_seq_scan, last_parallel_seq_scan
FROM pg_stat_user_tables WHERE relname='t1'
-[ RECORD 1 ]----------+---
relname                | t1
seq_scan               | 0
last_seq_scan          |
parallel_seq_scan      | 0
last_parallel_seq_scan |

==> no scan at all, the table has just been created

SELECT * FROM t1 LIMIT 1;
 id
----
  1
(1 row)

SELECT pg_sleep(1);
SELECT relname, seq_scan, last_seq_scan, parallel_seq_scan, last_parallel_seq_scan
FROM pg_stat_user_tables WHERE relname='t1'
-[ RECORD 1 ]----------+------------------------------
relname                | t1
seq_scan               | 1
last_seq_scan          | 2024-08-29 15:43:17.377182+02
parallel_seq_scan      | 0
last_parallel_seq_scan |

==> one sequential scan, no parallelization

SELECT count(*) FROM t1;
  count
----------
 10000000
(1 row)

SELECT pg_sleep(1);
SELECT relname, seq_scan, last_seq_scan, parallel_seq_scan, last_parallel_seq_scan
FROM pg_stat_user_tables WHERE relname='t1'
-[ RECORD 1 ]----------+------------------------------
relname                | t1
seq_scan               | 4
last_seq_scan          | 2024-08-29 15:43:18.504533+02
parallel_seq_scan      | 3
last_parallel_seq_scan | 2024-08-29 15:43:18.504533+02

==> one parallel sequential scan
==> I use the default configuration, so parallel_leader_participation = on, max_parallel_workers_per_gather = 2
==> meaning 3 parallel sequential scans (1 leader, two workers)
==> take note that seq_scan was also incremented... we didn't change the previous behaviour for this column

CREATE INDEX ON t1(id);
CREATE INDEX
SELECT indexrelname,idx_scan,last_idx_scan,parallel_idx_scan,last_parallel_idx_scan,idx_tup_read,idx_tup_fetch
FROM pg_stat_user_indexes WHERE relname='t1'
-[ RECORD 1 ]----------+----------
indexrelname           | t1_id_idx
idx_scan               | 0
last_idx_scan          |
parallel_idx_scan      | 0
last_parallel_idx_scan |
idx_tup_read           | 0
idx_tup_fetch          | 0

==> no scan at all, the index has just been created

SELECT * FROM t1 WHERE id=15;
 id
----
 15
(1 row)

SELECT pg_sleep(1);
SELECT indexrelname,idx_scan,last_idx_scan,parallel_idx_scan,last_parallel_idx_scan,idx_tup_read,idx_tup_fetch
FROM pg_stat_user_indexes WHERE relname='t1'
-[ RECORD 1 ]----------+------------------------------
indexrelname           | t1_id_idx
idx_scan               | 1
last_idx_scan          | 2024-08-29 15:43:22.020853+02
parallel_idx_scan      | 0
last_parallel_idx_scan |
idx_tup_read           | 1
idx_tup_fetch          | 0

==> one index scan, no parallelization

SELECT * FROM t1 WHERE id BETWEEN 10 AND 40;
SELECT pg_sleep(1);
 pg_sleep
----------

(1 row)

SELECT indexrelname,idx_scan,last_idx_scan,parallel_idx_scan,last_parallel_idx_scan,idx_tup_read,idx_tup_fetch
FROM pg_stat_user_indexes WHERE relname='t1'
-[ RECORD 1 ]----------+------------------------------
indexrelname           | t1_id_idx
idx_scan               | 2
last_idx_scan          | 2024-08-29 15:43:23.136665+02
parallel_idx_scan      | 0
last_parallel_idx_scan |
idx_tup_read           | 32
idx_tup_fetch          | 0

==> another index scan, no parallelization

SELECT count(*) FROM t1 WHERE id BETWEEN 10 AND 40;
 count
-------
    31
(1 row)

SELECT pg_sleep(1);
SELECT indexrelname,idx_scan,last_idx_scan,parallel_idx_scan,last_parallel_idx_scan,idx_tup_read,idx_tup_fetch
FROM pg_stat_user_indexes WHERE relname='t1'
-[ RECORD 1 ]----------+-----------------------------
indexrelname           | t1_id_idx
idx_scan               | 5
last_idx_scan          | 2024-08-29 15:43:24.16057+02
parallel_idx_scan      | 3
last_parallel_idx_scan | 2024-08-29 15:43:24.16057+02
idx_tup_read           | 63
idx_tup_fetch          | 0

==> one parallel index scan
==> same thing, 3 parallel index scans (1 leader, two workers)
==> also, take note that idx_scan was also incremented... we didn't change the previous behaviour for this column

First time I had to add new columns to a statistics catalog. I'm actually not sure that we were right to change pg_proc.dat manually. We'l
Re: Add contrib/pg_logicalsnapinspect
Hi, On Thu, Aug 29, 2024 at 06:33:19PM +0530, Bharath Rupireddy wrote: > On Thu, Aug 29, 2024 at 3:44 PM Bertrand Drouvot > wrote: > > > > That's right. I think this one would be simply enough to expose one or two > > functions in core too (and probably would not need an extra module). > > +1 for functions in core unless this extra module > pg_logicalsnapinspect works as a tool to be helpful even when the > server is down. Thanks for the feedback! I don't see any use case where it could be useful when the server is down. So, I think I'll move forward with in core functions (unless someone has a different opinion). Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Removing log_cnt from pg_sequence_read_tuple()
On Thu, Aug 29, 2024 at 02:11:22PM +0900, Michael Paquier wrote: > Okay, here is a v2 of the patch using this name for the function. LGTM -- nathan
Re: 039_end_of_wal: error in "xl_tot_len zero" test
On Thu, Aug 29, 2024 at 01:55:27AM -0400, Tom Lane wrote: > Thomas Munro writes: >> The fix I propose to commit shortly is just the first of those new >> lines, to homogenise the initial state. See attached. The previous >> idea works too, I think, but this bigger hammer is more obviously >> removing variation. > > +1, but a comment explaining the need for the pg_switch_wal call > seems in order. +1 -- nathan
Re: Partitioned tables and [un]loggedness
On Thu, Aug 29, 2024 at 03:44:45PM +0900, Michael Paquier wrote: > On Tue, Aug 27, 2024 at 04:01:58PM -0500, Nathan Bossart wrote: >> My current thinking is that it would be better to disallow marking >> partitioned tables as LOGGED/UNLOGGED and continue to have users explicitly >> specify what they want for each partition. It'd still probably be good to >> expand the documentation, but a clear ERROR when trying to set a >> partitioned table as UNLOGGED would hopefully clue folks in. > > The addition of the new LOGGED keyword is not required if we limit > ourselves to an error when defining UNLOGGED, so if we drop this > proposal, let's also drop this part entirely and keep DefineRelation() > simpler. +1 > Actually, is really issuing an error the best thing we can > do after so many years allowing this grammar flavor to go through, > even if it is perhaps accidental? relpersistence is marked correctly > for partitioned tables, it's just useless. Expanding the > documentation sounds fine to me, one way or the other, to tell what > happens with partitioned tables. IMHO continuing to allow partitioned tables to be marked UNLOGGED just preserves the illusion that it does something. An ERROR could help dispel that misconception. -- nathan
Re: pgstattuple: fix free space calculation
On 8/23/24 12:51, Frédéric Yhuel wrote:
> On 8/23/24 12:02, Rafia Sabih wrote:
> > On the other hand, this got me thinking about the purpose of this
> > space information. If we want to understand that there's still some
> > space for the tuples in a page, then using PageGetExactFreeSpace is
> > not doing justice in case of heap page, because we will not be able
> > to add any more tuples there if there are already
> > MaxHeapTuplesPerPage tuples there.
>
> We won't be able to add, but we will be able to update a tuple in
> this page.

Sorry, that's not true. So in this marginal case we have free space that's unusable in practice. No INSERT or UPDATE (HOT or not) is possible inside the page. I don't know what pgstattuple should do in this case.

However, we should never encounter this case in practice (maybe on some exotic architectures with strange alignment behavior?). As I said, I can't fit more than 226 tuples per page on my machine, while MaxHeapTuplesPerPage is 291. Am I missing something? Besides, pgstattuple isn't mission critical, is it?

So I think we should just use PageGetExactFreeSpace().

Here is a v3 patch. It's the same as v2, I only removed the last paragraph in the commit message.

Thank you Rafia and Andreas for your review and test.

Best regards,
Frédéric

From f749d4ca2d6881a916c6ca2a3b882bb2740188d4 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Fr=C3=A9d=C3=A9ric=20Yhuel?=
Date: Wed, 21 Aug 2024 15:05:11 +0200
Subject: [PATCH v3] pgstattuple: use PageGetExactFreeSpace() instead of
 PageGetHeapFreeSpace() or PageGetFreeSpace()

We want the exact free space, we don't care if there's enough room for
another line pointer.
---
 contrib/pgstattuple/pgstatapprox.c | 4 ++--
 contrib/pgstattuple/pgstatindex.c  | 2 +-
 contrib/pgstattuple/pgstattuple.c  | 6 +++---
 3 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/contrib/pgstattuple/pgstatapprox.c b/contrib/pgstattuple/pgstatapprox.c
index c84c642355..b63a9932d7 100644
--- a/contrib/pgstattuple/pgstatapprox.c
+++ b/contrib/pgstattuple/pgstatapprox.c
@@ -107,11 +107,11 @@ statapprox_heap(Relation rel, output_type *stat)
 		page = BufferGetPage(buf);
 
 		/*
-		 * It's not safe to call PageGetHeapFreeSpace() on new pages, so we
+		 * It's not safe to call PageGetExactFreeSpace() on new pages, so we
 		 * treat them as being free space for our purposes.
 		 */
 		if (!PageIsNew(page))
-			stat->free_space += PageGetHeapFreeSpace(page);
+			stat->free_space += PageGetExactFreeSpace(page);
 		else
 			stat->free_space += BLCKSZ - SizeOfPageHeaderData;

diff --git a/contrib/pgstattuple/pgstatindex.c b/contrib/pgstattuple/pgstatindex.c
index 5c06ba6db4..1b6b768cf8 100644
--- a/contrib/pgstattuple/pgstatindex.c
+++ b/contrib/pgstattuple/pgstatindex.c
@@ -311,7 +311,7 @@ pgstatindex_impl(Relation rel, FunctionCallInfo fcinfo)
 			max_avail = BLCKSZ - (BLCKSZ - ((PageHeader) page)->pd_special + SizeOfPageHeaderData);
 			indexStat.max_avail += max_avail;
-			indexStat.free_space += PageGetFreeSpace(page);
+			indexStat.free_space += PageGetExactFreeSpace(page);
 			indexStat.leaf_pages++;

diff --git a/contrib/pgstattuple/pgstattuple.c b/contrib/pgstattuple/pgstattuple.c
index 3bd8b96197..7e2a7262a3 100644
--- a/contrib/pgstattuple/pgstattuple.c
+++ b/contrib/pgstattuple/pgstattuple.c
@@ -372,7 +372,7 @@ pgstat_heap(Relation rel, FunctionCallInfo fcinfo)
 		buffer = ReadBufferExtended(rel, MAIN_FORKNUM, block,
 									RBM_NORMAL, hscan->rs_strategy);
 		LockBuffer(buffer, BUFFER_LOCK_SHARE);
-		stat.free_space += PageGetHeapFreeSpace((Page) BufferGetPage(buffer));
+		stat.free_space += PageGetExactFreeSpace((Page) BufferGetPage(buffer));
 		UnlockReleaseBuffer(buffer);
 		block++;
 	}
@@ -385,7 +385,7 @@ pgstat_heap(Relation rel, FunctionCallInfo fcinfo)
 		buffer = ReadBufferExtended(rel, MAIN_FORKNUM, block,
 									RBM_NORMAL, hscan->rs_strategy);
 		LockBuffer(buffer, BUFFER_LOCK_SHARE);
-		stat.free_space += PageGetHeapFreeSpace((Page) BufferGetPage(buffer));
+		stat.free_space += PageGetExactFreeSpace((Page) BufferGetPage(buffer));
 		UnlockReleaseBuffer(buffer);
 		block++;
 	}
@@ -565,7 +565,7 @@ pgstat_index_page(pgstattuple_type *stat, Page page,
 {
 	OffsetNumber i;
 
-	stat->free_space += PageGetFreeSpace(page);
+	stat->free_space += PageGetExactFreeSpace(page);
 
 	for (i = minoff; i <= maxoff; i = OffsetNumberNext(i))
 	{
-- 
2.39.2
Re: Parallel workers stats in pg_stat_database
Hi,

This is a new patch which:

* fixes some typos
* changes the execgather / execgathermerge code so that the stats are accumulated in EState and inserted in pg_stat_database only once, during ExecutorEnd
* adds tests (very ugly, but I could get the parallel plan to be stable across make check executions.)

On 8/28/24 17:10, Benoit Lobréau wrote:
> Hi hackers,
>
> This patch introduces four new columns in pg_stat_database:
>
> * parallel_worker_planned
> * parallel_worker_launched
> * parallel_maint_worker_planned
> * parallel_maint_worker_launched
>
> The intent is to help administrators evaluate the usage of parallel workers in their databases and help with sizing max_worker_processes, max_parallel_workers, or max_parallel_maintenance_workers.
>
> Here is a test script:
>
> psql << _EOF_
> -- Index creation
> DROP TABLE IF EXISTS test_pql;
> CREATE TABLE test_pql(i int, j int);
> INSERT INTO test_pql SELECT x,x FROM generate_series(1,100) as F(x);
>
> -- 0 planned / 0 launched
> EXPLAIN (ANALYZE) SELECT 1;
>
> -- 2 planned / 2 launched
> EXPLAIN (ANALYZE) SELECT i, avg(j) FROM test_pql GROUP BY i;
>
> SET max_parallel_workers TO 1;
> -- 4 planned / 1 launched
> EXPLAIN (ANALYZE) SELECT i, avg(j) FROM test_pql GROUP BY i
> UNION
> SELECT i, avg(j) FROM test_pql GROUP BY i;
>
> RESET max_parallel_workers;
> -- 1 planned / 1 launched
> CREATE INDEX ON test_pql(i);
>
> SET max_parallel_workers TO 0;
> -- 1 planned / 0 launched
> CREATE INDEX ON test_pql(j);
> -- 1 planned / 0 launched
> CREATE INDEX ON test_pql(i, j);
>
> SET maintenance_work_mem TO '96MB';
> RESET max_parallel_workers;
> -- 2 planned / 2 launched
> VACUUM (VERBOSE) test_pql;
>
> SET max_parallel_workers TO 1;
> -- 2 planned / 1 launched
> VACUUM (VERBOSE) test_pql;
>
> -- TOTAL: parallel workers: 6 planned / 3 launched
> -- TOTAL: parallel maint workers: 7 planned / 4 launched
> _EOF_
>
> And the output in pg_stat_database on a fresh server without any configuration change except those in the script:
>
> [local]:5445 postgres@postgres=# SELECT datname, parallel_workers_planned, parallel_workers_launched, parallel_maint_workers_planned, parallel_maint_workers_launched FROM pg_stat_database WHERE datname = 'postgres' \gx
> -[ RECORD 1 ]-------------------+---------
> datname                         | postgres
> parallel_workers_planned        | 6
> parallel_workers_launched       | 3
> parallel_maint_workers_planned  | 7
> parallel_maint_workers_launched | 4
>
> Thanks to: Jehan-Guillaume de Rorthais, Guillaume Lelarge and Franck Boudehen for the help and motivation boost.
>
> ---
> Benoit Lobréau
> Consultant
> http://dalibo.com

-- 
Benoit Lobréau
Consultant
http://dalibo.com

From 0338cfb11ab98594b2f16d143b505e269566bb6e Mon Sep 17 00:00:00 2001
From: benoit
Date: Wed, 28 Aug 2024 02:27:13 +0200
Subject: [PATCH] Adds four parallel workers stat columns to pg_stat_database

* parallel_workers_planned
* parallel_workers_launched
* parallel_maint_workers_planned
* parallel_maint_workers_launched
---
 doc/src/sgml/monitoring.sgml                 | 36
 src/backend/access/brin/brin.c               |  4 +++
 src/backend/access/nbtree/nbtsort.c          |  4 +++
 src/backend/catalog/system_views.sql         |  4 +++
 src/backend/commands/vacuumparallel.c        |  5 +++
 src/backend/executor/execMain.c              |  5 +++
 src/backend/executor/execUtils.c             |  3 ++
 src/backend/executor/nodeGather.c            |  3 ++
 src/backend/executor/nodeGatherMerge.c       |  3 ++
 src/backend/utils/activity/pgstat_database.c | 36
 src/backend/utils/adt/pgstatfuncs.c          | 12 +++
 src/include/catalog/pg_proc.dat              | 20 +++
 src/include/nodes/execnodes.h                |  3 ++
 src/include/pgstat.h                         |  7
 src/test/regress/expected/rules.out          |  4 +++
 src/test/regress/expected/stats.out          | 17 +
 src/test/regress/sql/stats.sql               | 14
 17 files changed, 180 insertions(+)

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 55417a6fa9..8c4b11c11d 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -3508,6 +3508,42 @@ description | Waiting for a newly initialized WAL file to reach durable storage

[doc markup stripped in transit; the added entries are:]

+  parallel_workers_planned bigint
+  Number of parallel workers planned by queries on this database
+
+  parallel_workers_launched bigint
+  Number of parallel workers obtained by queries on this database
+
+  parallel_maint_workers_planned bigint
+  Number of parallel workers planned by utilities on this database
+
+  parallel_maint_workers_launched bigint
+  Number of parallel workers obtained by utilities on this database

   stats_reset timestamp with time zone

diff --git a
Re: Make printtup a bit faster
David Rowley writes:
> [ redesign I/O function APIs ]
> I had planned to work on this for PG18, but I'd be happy for some
> assistance if you're willing.

I'm skeptical that such a thing will ever be practical. To avoid breaking un-converted data types, all the call sites would have to support both old and new APIs. To avoid breaking non-core callers, all the I/O functions would have to support both old and new APIs. That probably adds enough overhead to negate whatever benefit you'd get.

regards, tom lane
Re: race condition in pg_class
On Thu, Aug 29, 2024 at 8:11 PM Noah Misch wrote:
>
> On Tue, Aug 20, 2024 at 11:59:45AM +0300, Heikki Linnakangas wrote:
> > My order of preference is: 2, 1, 3.
>
> I kept tuple locking responsibility in heapam.c. That's simpler and better
> for modularity, but it does mean we release+acquire after any xmax wait.
> Before, we avoided that if the next genam.c scan found the same TID. (If the
> next scan finds the same TID, the xmax probably aborted.) I think DDL aborts
> are rare enough to justify simplifying as this version does. I don't expect
> anyone to notice the starvation outside of tests built to show it. (With
> previous versions, one can show it with a purpose-built test that commits
> instead of aborting, like the "001_pgbench_grant@9" test.)
>
> This move also loses the optimization of unpinning before XactLockTableWait().
> heap_update() doesn't optimize that way, so that's fine.
>
> The move ended up more like (1), though I did do
> s/systable_inplace_update_scan/systable_inplace_update_begin/ like in (2). I
> felt that worked better than (2) to achieve lock release before
> CacheInvalidateHeapTuple(). Alternatives that could be fine:

From a consistency point of view, I find it cleaner if we can have all the heap_inplace_lock and heap_inplace_unlock in the same set of functions. So here those would be the systable_inplace_* functions.

> - In the cancel case, call both systable_inplace_update_cancel and
>   systable_inplace_update_end. _finish or _cancel would own unlock, while
>   _end would own systable_endscan().

What happens to CacheInvalidateHeapTuple() in this approach? I think it will still need to be brought to the genam.c layer if we are releasing the lock in systable_inplace_update_finish.

> - Hoist the CacheInvalidateHeapTuple() up to the genam.c layer. While
>   tolerable now, this gets less attractive after the inplace160 patch from
>   https://postgr.es/m/flat/20240523000548.58.nmi...@google.com

I skimmed through the inplace160 patch.
It wasn't clear to me why this becomes less attractive with the patch. I see there is a new CacheInvalidateHeapTupleInPlace, but that looks like it would be called while holding the lock. And then there is an AcceptInvalidationMessages, which can perhaps be moved to the genam.c layer too. Is the concern that one invalidation call will be in the heapam layer and the other will be in the genam layer?

Also, I have a small question from inplace120. I see that in all the places we check resultRelInfo->ri_needLockTagTuple, we can just call IsInplaceUpdateRelation(resultRelInfo->ri_RelationDesc). Is there a big advantage to storing a separate bool field? Also, there is another write to ri_RelationDesc in CatalogOpenIndexes in src/backend/catalog/indexing.c. I think ri_needLockTagTuple needs to be set there also to keep it consistent with ri_RelationDesc. Please let me know if I am missing something about the usage of the new field.

Thanks & Regards,
Nitin Motiani
Google
Re: RFC: Additional Directory for Extensions
On Aug 27, 2024, at 22:24, Craig Ringer wrote: > `pg_config` only cares about compile-time settings, so I would not > expect its output to change. Right, of course that’s its original purpose, but extensions depend on it to determine where to install extensions. Not just PGXS, but also pgrx and various Makefile customizations I’ve seen in the wild. > I suspect we'd have to add PGXS extension-path awareness if going for > per-extension subdir support. I'm not sure it makes sense to teach > `pg_config` about this, since it'd need to have a different mode like > >pg_config --extension myextname --extension-sharedir > > since the extension's "sharedir" is > $basedir/extensions/myextension/share or whatever. Right. PGXS would just need to know where to put the directory for an extension. There should be a default for the project, and then it can be overridden with something like DESTDIR (but without full paths under that prefix). > Supporting this looks to be a bit intrusive in the makefiles, > requiring them to differentiate between "share dir for extensions" and > "share dir for !extensions", etc. I'm not immediately sure if it can > be done in a way that transparently converts unmodified extension PGXS > makefiles to target the new paths; it might require an additional > define, or use of new variables and an ifdef block to add > backwards-compat to the extension makefile for building on older > postgres. Yeah, might just have to be an entirely new thing, though it sure would be nice for existing PGXS-using Makefiles to do the right thing. Maybe for the new version of the server with the proposed new pattern it would dispatch to the new thing somehow without modifying all the rest of its logic. > Right. The proposed structure is rather a bigger change than I was > thinking when I suggested supporting an extension search path not just > a single extra path. 
But it's also a cleaner proposal; the > per-extension directory would make it easier to ensure that the > extension control file, sql scripts, and dynamic library all match the > same extension and version if multiple ones are on the path. Which is > desirable when doing things like testing a new version of an in-core > extension. 💯 > Right. And I've been on the receiving end of having a small, focused > patch derailed by others jumping in and scope-exploding it into > something completely different to solve a much wider but related > problem. I’m not complaining, I would definitely prefer to see consensus on a cleaner proposal along the lines we’ve discussed and a commitment to time from parties able to get it done in time for v18. I’m willing to help where I can with my baby C! Failing that, we can fall back on the destdir patch. > I'm definitely not trying to stand in the way of progress with this; I > just want to make sure that it doesn't paint us into a corner that > prevents a more general solution from being adopted later. That's why > I'm suggesting making the config a multi-value string (list of paths) > and raising a runtime "ERROR: searching multiple paths for extensions > not yet supported" or something if >1 path configured. > > If that doesn't work, no problem. I think the logic would have to be different, so they’d be different GUCs with their own semantics. But if the core team and committers are on board with the general idea of search paths and per-extension directory organization, it would be best to avoid getting stuck with maintaining the current patch’s GUC. OTOH, we could get it committed now and revert it later if we get the better thing done and committed. >> I think we should get some clarity on the proposal, and then consensus, as >> you say. I say “get some clarity” because my proposal doesn’t require >> recursing, and I’m not sure why it’d be needed. > > From what you and Gabriele are discussing (which I agree with), it wouldn’t. 
Ah, great. I’ll try to put some thought into a more formal proposal in a new thread next week. Unless you or Gabriele beat me to it 😂.

Best,

David
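To make the discussion concrete, here is a purely hypothetical sketch of what such a configuration could look like. The GUC name, the `$system` token, and the per-extension directory layout are all illustrative; none of it is settled or implemented:

```
# hypothetical multi-value GUC; with more than one entry, a first
# implementation could raise
#   ERROR: searching multiple paths for extensions not yet supported
extension_search_path = '$system, /opt/myorg/pg-extensions'

# hypothetical per-extension layout under one search-path entry,
# keeping control file, SQL scripts, and shared library together:
#
#   /opt/myorg/pg-extensions/myextension/
#       myextension.control
#       share/myextension--1.0.sql
#       lib/myextension.so
```

Keeping each extension's control file, scripts, and library under one directory is what makes it easy to guarantee they all belong to the same version, which is the main benefit discussed above.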
Re: pl/pgperl Patch for adding $_FN detail just like triggers have for $_TD
On 2024-08-28 We 5:53 PM, Mark Murawski wrote:
> Hi Hackers!
>
> This would be version v1 of this feature.
>
> Basically, the subject says it all: pl/pgperl patch for being able to tell which function you're in. This is a hashref, so it will be possible to populate new and exciting other details in the future as the need arises.
>
> This also greatly improves logging capabilities for things like catching warnings, because as it stands right now, there's no information that can assist with locating the source of a warning like this:
>
> # tail -f /var/log/postgresql.log
> *** GOT A WARNING - Use of uninitialized value $prefix in concatenation (.) or string at (eval 531) line 48.
>
> Now, with $_FN you can do this:
>
> CREATE OR REPLACE FUNCTION throw_warning() RETURNS text LANGUAGE plperlu AS $function$
> use warnings;
> use strict;
> use Data::Dumper;
> $SIG{__WARN__} = sub { elog(NOTICE, Dumper($_FN)); print STDERR "In Function: $_FN->{name}: $_[0]\n"; };
> my $a;
> print "$a"; # uninit!
> return undef;
> $function$;
>
> This patch is against 12, which is still our production branch. This could easily be patched against newer releases as well.
>
> I've been using this code in production now for about 3 years; it's greatly helped track down issues. And there shouldn't be anything platform-specific here; it's all regular perl API.
>
> I'm not sure about adding testing. This is my first postgres patch, so any guidance on adding regression testing would be appreciated.
>
> The rationale for this has come from the need to know the source function name, and we've typically resorted to things like this in the past:
>
> CREATE OR REPLACE FUNCTION throw_warning() RETURNS text LANGUAGE plperlu AS $function$
> my $function_name = 'throw_warning';
> $SIG{__WARN__} = sub { print STDERR "In Function: $function_name: $_[0]\n"; }
> $function$;
>
> We've literally had to copy/paste this all over, and it's something that postgres should just 'give you' since it knows the name already, just like when triggers pass you $_TD with all the pertinent information.
>
> A wishlist item would be for postgres plperl to automatically prepend the function name and schema when throwing perl warnings so you don't have to do your own __WARN__ handler, but this is the next best thing.

I'm not necessarily opposed to this, but the analogy to $_TD isn't really apt. You can't know the trigger data at compile time, whereas you can know the function's name at compile time, using just the mechanism you find irksome. And if we're going to do it for plperl, shouldn't we do it for other PLs?

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com
SQL:2023 JSON simplified accessor support
Hello Hackers,

I’ve attached a patch to start adding SQL:2023 JSON simplified accessor support. This allows accessing JSON or JSONB fields using dot notation (e.g., colname.field.field...), similar to composite types.

Currently, PostgreSQL uses nonstandard syntax like colname->x->y for JSON and JSONB, and colname['blah'] for JSONB. These existing syntaxes predate the standard. Oracle already supports the standard dot notation syntax [1].

The full specification for the JSON simplified accessor format is as follows:

<JSON simplified accessor> ::=
    <value expression primary> <JSON simplified accessor op chain>

<JSON simplified accessor op chain> ::=
    <JSON simplified accessor op>
    | <JSON simplified accessor op chain> <JSON simplified accessor op>

<JSON simplified accessor op> ::=
    <JSON member accessor>
    | <JSON wildcard member accessor>
    | <JSON array accessor>
    | <JSON wildcard array accessor>
    | <JSON item method>

I’ve implemented the member and array accessors and attached two alternative patches:

1. v1-0001-Add-JSON-JSONB-simplified-accessor.patch: This patch enables dot access to JSON object fields and subscript access to indexed JSON array elements by converting "." and "[]" indirection into a JSON_QUERY JsonFuncExpr node.

2. v2-0001-Transform-JSON-dot-access-to-arrow-operator.txt: This alternative patch implements dot access to JSON object fields by transforming the "." indirection into a "->" operator.

The upside of the v1 patch is that it strictly aligns with the SQL standard, which specifies that the simplified access is equivalent to:

JSON_QUERY (VEP, 'lax $.JC' WITH CONDITIONAL ARRAY WRAPPER NULL ON EMPTY NULL ON ERROR)

However, the performance of JSON_QUERY might be suboptimal due to function call overhead. Therefore, I implemented the v2 alternative using the "->" operator.

There is some uncertainty about the semantics of conditional array wrappers. Currently, there is at least one subtle difference between the "->" operator and JSON_QUERY, as shown:

postgres=# select '{"a": 42}'::json->'a';
 ?column?
----------
 42
(1 row)

postgres=# select json_query('{"a": 42}'::json, 'lax $.a' with conditional array wrapper null on empty null on error);
 json_query
------------
 [42]
(1 row)

JSON_QUERY encloses the JSON value 42 in brackets, which may be a bug, as Peter noted [2].
If there are no other semantic differences, we could implement simple access without using JSON_QUERY to avoid function call overhead.

I aim to first enable standard dot notation access to JSON object fields. Both patches implement this, and I’m also open to alternative approaches. For subscripting access to jsonb array elements, jsonb already supports this via the subscripting handler interface. In the v1 patch, I added json support using JSON_QUERY, but I can easily adapt this for the v2 patch using the -> operator. I did not leverage the subscripting handler interface for json because implementing the fetch/assign functions for json seems challenging for plain text. Let me know if you have a different approach in mind.

Finally, I have not implemented wildcard or item method accessors yet and would appreciate input on their necessity.

[1] https://docs.oracle.com/en/database/oracle/oracle-database/21/adjsn/simple-dot-notation-access-to-json-data.html#GUID-7249417B-A337-4854-8040-192D5CEFD576
[2] https://www.postgresql.org/message-id/8022e067-818b-45d3-8fab-6e0d94d03...@eisentraut.org

From 99ee1cbc45ebd76ce21881bda95933d8a5a104c6 Mon Sep 17 00:00:00 2001
From: Alexandra Wang
Date: Thu, 15 Aug 2024 02:11:33 -0700
Subject: [PATCH v2] Transform JSON dot access to arrow operator

Enabled dot-notation access to JSON/JSONB objects by making it syntactic sugar for the "->" operator in ParseFuncOrColumn() for args of JSON/JSONB type.

JSON array access via subscripting is not yet supported in this patch, but can be implemented similarly by creating an OpExpr for the json_array_element "->" operator.
Note that the output of the "->" operators is not wrapped in brackets, which differs from the SQL standard specification for the JSON simplified accessor equivalence shown below:

JSON_QUERY (VEP, 'lax $.JC' WITH CONDITIONAL ARRAY WRAPPER NULL ON EMPTY NULL ON ERROR)
---
 src/backend/parser/parse_func.c     | 57 +---
 src/include/catalog/pg_operator.dat |  4 +-
 src/include/parser/parse_type.h     |  1 +
 src/test/regress/expected/json.out  | 67 +
 src/test/regress/expected/jsonb.out | 55 +++
 src/test/regress/sql/json.sql       | 20 +
 src/test/regress/sql/jsonb.sql      | 17
 7 files changed, 214 insertions(+), 7 deletions(-)

diff --git a/src/backend/parser/parse_func.c b/src/backend/parser/parse_func.c
index 9b23344a3b..431c9883f2 100644
--- a/src/backend/parser/parse_func.c
+++ b/src/backend/parser/parse_func.c
@@ -33,6 +33,8 @@
 #include "utils/builtins.h"
 #include "utils/lsyscache.h"
 #include "utils/syscache.h"
+#include "parser/parse_oper.h"
+#include "catalog/pg_operator_d.h"

 /* Possible error codes from LookupFuncNameInternal */
@@ -48,6 +50,8 @@ static void unify_hypothetical_args(ParseState *pstate,
 static Oid	FuncNameAsType(List *funcname);
 static Node *ParseComplexProjection(ParseState *pstate, const char *funcname,
Re: Pgstattuple on Sequences: Seeking Community Feedback on Potential Patch
Hi all,

Please find attached the patch that re-enables support for sequences within the pgstattuple extension. I have also included the necessary test cases for sequences, implemented in the form of regress tests.

Here is the commitfest link for the same: https://commitfest.postgresql.org/49/5215/

Regards
Ayush Vatsa
AWS

From 95ab0aa019e1cfec73bc94448faeafeac67b434e Mon Sep 17 00:00:00 2001
From: Ayush Vatsa
Date: Thu, 29 Aug 2024 21:40:29 +0530
Subject: [PATCH v1] Introduced support for sequences back in pgstattuple extension
---
 contrib/pgstattuple/expected/pgstattuple.out | 24 ++++
 contrib/pgstattuple/pgstattuple.c            |  6 +++-
 contrib/pgstattuple/sql/pgstattuple.sql      | 10 ++++
 3 files changed, 39 insertions(+), 1 deletion(-)

diff --git a/contrib/pgstattuple/expected/pgstattuple.out b/contrib/pgstattuple/expected/pgstattuple.out
index 283856e109..1e79fd9036 100644
--- a/contrib/pgstattuple/expected/pgstattuple.out
+++ b/contrib/pgstattuple/expected/pgstattuple.out
@@ -273,6 +273,30 @@ select pgstathashindex('test_partition_hash_idx');
  (4,8,0,1,0,0,0,100)
 (1 row)

+-- test on sequence
+-- only pgstattuple and pg_relpages should work
+create sequence serial;
+select * from pgstattuple('serial');
+ table_len | tuple_count | tuple_len | tuple_percent | dead_tuple_count | dead_tuple_len | dead_tuple_percent | free_space | free_percent
+-----------+-------------+-----------+---------------+------------------+----------------+--------------------+------------+--------------
+      8192 |           1 |        41 |           0.5 |                0 |              0 |                  0 |       8104 |        98.93
+(1 row)
+
+select pgstatindex('serial');
+ERROR:  relation "serial" is not a btree index
+select pgstatginindex('serial');
+ERROR:  relation "serial" is not a GIN index
+select pgstathashindex('serial');
+ERROR:  relation "serial" is not a hash index
+select pg_relpages('serial');
+ pg_relpages
+-------------
+           1
+(1 row)
+
+select * from pgstattuple_approx('serial');
+ERROR:  relation "serial" is of wrong relation kind
+DETAIL:  This operation is not supported for sequences.
 drop table test_partitioned;
 drop view test_view;
 drop foreign table test_foreign_table;

diff --git a/contrib/pgstattuple/pgstattuple.c b/contrib/pgstattuple/pgstattuple.c
index 3bd8b96197..13c70c4152 100644
--- a/contrib/pgstattuple/pgstattuple.c
+++ b/contrib/pgstattuple/pgstattuple.c
@@ -323,7 +323,11 @@ pgstat_heap(Relation rel, FunctionCallInfo fcinfo)
 	pgstattuple_type stat = {0};
 	SnapshotData SnapshotDirty;

-	if (rel->rd_rel->relam != HEAP_TABLE_AM_OID)
+	/**
+	 * Sequences don't fall under heap AM but are still
+	 * allowed for obtaining tuple-level statistics.
+	 */
+	if (rel->rd_rel->relam != HEAP_TABLE_AM_OID && rel->rd_rel->relkind != RELKIND_SEQUENCE)
 		ereport(ERROR,
 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 				 errmsg("only heap AM is supported")));

diff --git a/contrib/pgstattuple/sql/pgstattuple.sql b/contrib/pgstattuple/sql/pgstattuple.sql
index b08c31c21b..ea121df0c7 100644
--- a/contrib/pgstattuple/sql/pgstattuple.sql
+++ b/contrib/pgstattuple/sql/pgstattuple.sql
@@ -119,6 +119,16 @@ create index test_partition_hash_idx on test_partition using hash (a);
 select pgstatindex('test_partition_idx');
 select pgstathashindex('test_partition_hash_idx');

+-- test on sequence
+-- only pgstattuple and pg_relpages should work
+create sequence serial;
+select * from pgstattuple('serial');
+select pgstatindex('serial');
+select pgstatginindex('serial');
+select pgstathashindex('serial');
+select pg_relpages('serial');
+select * from pgstattuple_approx('serial');
+
 drop table test_partitioned;
 drop view test_view;
 drop foreign table test_foreign_table;
-- 
2.41.0
Re: pl/pgperl Patch for adding $_FN detail just like triggers have for $_TD
On 8/29/24 11:56, Andrew Dunstan wrote:
> On 2024-08-28 We 5:53 PM, Mark Murawski wrote:
>> [... full original message quoted above ...]
>
> I'm not necessarily opposed to this, but the analogy to $_TD isn't really apt. You can't know the trigger data at compile time, whereas you can know the function's name at compile time, using just the mechanism you find irksome. And if we're going to do it for plperl, shouldn't we do it for other PLs?
>
> cheers
>
> andrew
>
> --
> Andrew Dunstan
> EDB: https://www.enterprisedb.com

Hi Andrew,

Thanks for the feedback.

1) Why is this not similar to $_TD? It literally operates identically. At run-time it passes you $_TD for triggers; same here for functions. This is all run-time. What exactly is the issue you're trying to point out?

2) I would agree that other PLs should get the same detail. I don't know the other ones, as I've been working only in pl/perl.
Re: Showing primitive index scan count in EXPLAIN ANALYZE (for skip scan and SAOP scans)
On Wed, Aug 28, 2024 at 9:25 AM Robert Haas wrote: > On Tue, Aug 27, 2024 at 3:04 PM Tom Lane wrote: > I kind of had that reaction too initially, but I think that was mostly > because "Primitive Index Scans" seemed extremely unclear. I think > "Index Searches" is pretty comprehensible, honestly. Why shouldn't > someone be able to figure out what that means? Worth noting that Lukas Fittl made a point of prominently highlighting the issue with how this works when he explained the Postgres 17 nbtree work: https://pganalyze.com/blog/5mins-postgres-17-faster-btree-index-scans And no, I wasn't asked to give any input to the blog post. Lukas has a general interest in making the system easier to understand for ordinary users. Presumably that's why he zeroed in on this one aspect of the work. It's far from an esoteric implementation detail. -- Peter Geoghegan
Re: Pgstattuple on Sequences: Seeking Community Feedback on Potential Patch
On Thu, Aug 29, 2024 at 10:17:57PM +0530, Ayush Vatsa wrote:
> Please find attached the patch that re-enables
> support for sequences within the pgstattuple extension.
> I have also included the necessary test cases for
> sequences, implemented in the form of regress tests.

Thanks. Robert, do you have any concerns with this?

+select * from pgstattuple('serial');
+ table_len | tuple_count | tuple_len | tuple_percent | dead_tuple_count | dead_tuple_len | dead_tuple_percent | free_space | free_percent
+-----------+-------------+-----------+---------------+------------------+----------------+--------------------+------------+--------------
+      8192 |           1 |        41 |           0.5 |                0 |              0 |                  0 |       8104 |        98.93
+(1 row)

I'm concerned that some of this might be platform-dependent and make the test unstable. Perhaps we should just select count(*) here.

+	/**
+	 * Sequences don't fall under heap AM but are still
+	 * allowed for obtaining tuple-level statistics.
+	 */

I think we should be a bit more descriptive here, like the comment in verify_heapam.c:

	/*
	 * Sequences always use heap AM, but they don't show that in the catalogs.
	 * Other relkinds might be using a different AM, so check.
	 */

-- 
nathan
Re: Pgstattuple on Sequences: Seeking Community Feedback on Potential Patch
On Thu, Aug 29, 2024 at 12:36:35PM -0500, Nathan Bossart wrote:
> +select * from pgstattuple('serial');
> + table_len | tuple_count | tuple_len | tuple_percent | dead_tuple_count | dead_tuple_len | dead_tuple_percent | free_space | free_percent
> +-----------+-------------+-----------+---------------+------------------+----------------+--------------------+------------+--------------
> +      8192 |           1 |        41 |           0.5 |                0 |              0 |                  0 |       8104 |        98.93
> +(1 row)
>
> I'm concerned that some of this might be platform-dependent and make the
> test unstable. Perhaps we should just select count(*) here.

Sure enough, the CI testing for 32-bit is failing here [0].

[0] https://api.cirrus-ci.com/v1/artifact/task/4798423386292224/testrun/build-32/testrun/pgstattuple/regress/regression.diffs

-- 
nathan
Re: proposal: schema variables
Hi

On Tue, 27 Aug 2024 at 16:52, Laurenz Albe wrote:
> On Tue, 2024-08-27 at 08:52 +0200, Pavel Stehule wrote:
>> I can throw 200KB from another 300KB patch set which can be better for
>> review, but it can be harder to maintain this patch set. I'll try it,
>> and I'll send a reduced version.
>
> That was not a criticism, and I think the way you split up the patch set
> right now is as good as it probably gets. Ideally, one could say something
> like "we need at least patches 1 to 4, 5 and 6 are optional, but desirable,
> all others can easily be deferred to a later time".

It was mentioned here more times (I thought).

1..4 are fundamental
5..6 are optional (6 is just tests)
the others can be deferred (5..6 can be deferred too)

Without support for temporary session variables, it is not too probable that one would repeatedly CREATE and DROP the same variable in one session, so memory usage can be similar to today's workarounds; but unlike the workarounds, session variables use types and access rights.

Note - plpgsql doesn't try to delete compiled code from its cache immediately either, so the behaviour implemented in 1..4 is "similar" to the plpgsql function cache.

14 .. allows you to use session variables as arguments of CALL or EXECUTE statements, and variables can be used in plpgsql simple expressions
15 .. variables don't block parallelism
16 .. the result of a plpgsql simple expression can be assigned to session variables
17 .. expressions with session variables can be inlined in SQL

14..17 are performance related.

7 - was requested by some people - some functionality can possibly be replaced by plpgsql_check. It has only 40 rows - it just raises a warning or error when some conditions are true.

Regards

Pavel

> Yours,
> Laurenz Albe
Re: Pgstattuple on Sequences: Seeking Community Feedback on Potential Patch
> I'm concerned that some of this might be platform-dependent and make the
> test unstable. Perhaps we should just select count(*) here.

> Sure enough, the CI testing for 32-bit is failing here [0].

Thanks for catching that! I wasn't aware of this earlier.

> I think we should be a bit more descriptive here

Regarding the comment, I've tried to make it more descriptive and simpler than the existing one in verify_heapam.c. Here’s the comment I propose:

	/*
	 * Sequences implicitly use the heap AM, even though it's not explicitly
	 * recorded in the catalogs. For other relation kinds, verify that the AM
	 * is heap; otherwise, raise an error.
	 */

Please let me know if this still isn’t clear enough; then I can make further revisions in line with verify_heapam.c. The patch with all the changes is attached.

Regards
Ayush Vatsa
AWS

v1-0001-Introduced-support-for-sequences-back-in-pgstattu.patch
Description: Binary data
Re: [PATCH] Add CANONICAL option to xmlserialize
On Tue, 27. 8. 2024 at 13:57, Jim Jones wrote:
>
> On 26.08.24 16:59, Pavel Stehule wrote:
> >
> > 1. what about behaviour of NO INDENT - the implementation is not too
> > old, so it can be changed if we want (I think), and it is better to do
> > early than too late
>
> While checking the feasibility of removing indentation with NO INDENT I
> may have found a bug in XMLSERIALIZE ... INDENT.
> xmlSaveToBuffer seems to ignore elements if there are whitespaces
> between them:
>
> SELECT xmlserialize(DOCUMENT '42' AS text INDENT);
> xmlserialize
> - >+ >42+ > +
> (1 row)
>
> SELECT xmlserialize(DOCUMENT ' 42 '::xml AS text INDENT);
> xmlserialize
> 42 +
> (1 row)
>
> I'll take a look at it.

+1

> Regarding removing indentation: yes, it would be possible with libxml2.
> The question is if it would be right to do so.
>
> > 2. Are we able to implement SQL/XML syntax with libxml2?
> >
> > 3. Are we able to implement Oracle syntax with libxml2? And are there
> > benefits other than higher possible compatibility?
>
> I guess it would be beneficial if you're migrating from oracle to
> postgres - or the other way around. It certainly wouldn't hurt, but so
> far I personally had little use for oracle's extra xmlserialize
> features.
>
> > 4. Can there be some possible collision (functionality, syntax) with
> > CANONICAL?
>
> I couldn't find anything in the SQL/XML spec that might refer to
> canonical xml.
>
> > 5. SQL/XML XMLSERIALIZE supports other target types than varchar. I
> > can imagine XMLSERIALIZE with CANONICAL to bytea (then we don't need
> > to force database encoding). Does it make sense? Are the results
> > comparable?
>
> As of pg16 bytea is not supported. Currently the type can be character,
> character varying, or text - also their other flavours like 'name'.

I know, but theoretically, there can be some benefit for CANONICAL if pg supports bytea there. A lot of databases still use non-utf8 encoding.
It is a more theoretical question - if pg supports different types there in the future (because of SQL/XML or Oracle), then can CANONICAL be used without limits, or can CANONICAL be used just for text? And are you sure that you can compare text vs text, instead of xml vs xml?

+SELECT xmlserialize(CONTENT doc AS text CANONICAL) = xmlserialize(CONTENT doc AS text CANONICAL WITH COMMENTS) FROM xmltest_serialize;
+ ?column?
+--
+ t
+ t
+(2 rows)

Maybe I am a little bit confused by these regress tests, because in the end this is not too useful - you compare two identical XML documents, and WITH COMMENTS and WITHOUT COMMENTS is tested elsewhere. I tried to find the point of this test. Better to use really different documents (columns) instead.

Regards

Pavel

> --
> Jim
Primary and standby setting cross-checks
Currently, if you configure a hot standby server with a smaller max_connections setting than the primary, the server refuses to start up:

LOG:  entering standby mode
FATAL:  recovery aborted because of insufficient parameter settings
DETAIL:  max_connections = 10 is a lower setting than on the primary server, where its value was 100.
HINT:  You can restart the server after making the necessary configuration changes.

Or if you change the setting in the primary while the standby is running, replay pauses:

WARNING:  hot standby is not possible because of insufficient parameter settings
DETAIL:  max_connections = 100 is a lower setting than on the primary server, where its value was 200.
CONTEXT:  WAL redo at 2/E1D8 for XLOG/PARAMETER_CHANGE: max_connections=200 max_worker_processes=8 max_wal_senders=10 max_prepared_xacts=0 max_locks_per_xact=64 wal_level=logical wal_log_hints=off track_commit_timestamp=off
LOG:  recovery has paused
DETAIL:  If recovery is unpaused, the server will shut down.
HINT:  You can then restart the server after making the necessary configuration changes.
CONTEXT:  WAL redo at 2/E1D8 for XLOG/PARAMETER_CHANGE: max_connections=200 max_worker_processes=8 max_wal_senders=10 max_prepared_xacts=0 max_locks_per_xact=64 wal_level=logical wal_log_hints=off track_commit_timestamp=off

Both of these are rather unpleasant behavior.

I thought I could get rid of that limitation with my CSN snapshot patch [1], because it gets rid of the fixed-size known-assigned XIDs array, but there's a second reason for these limitations: the setting is also used to ensure that the standby has enough space in the lock manager to hold the AccessExclusiveLocks possibly taken by transactions in the primary.

So firstly, I think that's a bad tradeoff. In the vast majority of cases, you would not run out of lock space anyway if you just started up the system. Secondly, that cross-check of settings doesn't fully prevent the problem.
It ensures that the lock tables are large enough to accommodate all the locks you could possibly hold in the primary, but that doesn't take into account any additional locks held by read-only queries in the hot standby. So if you have queries running in the standby that take a lot of locks, this can happen anyway:

2024-08-29 21:44:32.634 EEST [668327] FATAL:  out of shared memory
2024-08-29 21:44:32.634 EEST [668327] HINT:  You might need to increase "max_locks_per_transaction".
2024-08-29 21:44:32.634 EEST [668327] CONTEXT:  WAL redo at 2/FD40FCC8 for Standby/LOCK: xid 996 db 5 rel 154045
2024-08-29 21:44:32.634 EEST [668327] WARNING:  you don't own a lock of type AccessExclusiveLock
2024-08-29 21:44:32.634 EEST [668327] LOG:  RecoveryLockHash contains entry for lock no longer recorded by lock manager: xid 996 database 5 relation 154045
TRAP: failed Assert("false"), File: "../src/backend/storage/ipc/standby.c", Line: 1053, PID: 668327
postgres: startup recovering 0001000200FD(ExceptionalCondition+0x6e)[0x556a4588396e]
postgres: startup recovering 0001000200FD(+0x44156e)[0x556a4571356e]
postgres: startup recovering 0001000200FD(StandbyReleaseAllLocks+0x78)[0x556a45712738]
postgres: startup recovering 0001000200FD(ShutdownRecoveryTransactionEnvironment+0x15)[0x556a45712685]
postgres: startup recovering 0001000200FD(shmem_exit+0x111)[0x556a457062e1]
postgres: startup recovering 0001000200FD(+0x434132)[0x556a45706132]
postgres: startup recovering 0001000200FD(proc_exit+0x59)[0x556a45706079]
postgres: startup recovering 0001000200FD(errfinish+0x278)[0x556a45884708]
postgres: startup recovering 0001000200FD(LockAcquireExtended+0xa46)[0x556a45719386]
postgres: startup recovering 0001000200FD(StandbyAcquireAccessExclusiveLock+0x11d)[0x556a4571330d]
postgres: startup recovering 0001000200FD(standby_redo+0x70)[0x556a45713690]
postgres: startup recovering 0001000200FD(PerformWalRecovery+0x7b3)[0x556a4547d313]
postgres: startup recovering 0001000200FD(StartupXLOG+0xac3)[0x556a4546dae3]
postgres: startup recovering 0001000200FD(StartupProcessMain+0xe8)[0x556a45693558]
postgres: startup recovering 0001000200FD(+0x3ba95d)[0x556a4568c95d]
postgres: startup recovering 0001000200FD(+0x3bce41)[0x556a4568ee41]
postgres: startup recovering 0001000200FD(PostmasterMain+0x116e)[0x556a4568eaae]
postgres: startup recovering 0001000200FD(+0x2f960e)[0x556a455cb60e]
/lib/x86_64-linux-gnu/libc.so.6(+0x27c8a)[0x7f10ef042c8a]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x85)[0x7f10ef042d45]
postgres: startup recovering 0001000200FD(_start+0x21)[0x556a453af011]
2024-08-29 21:44:32.641 EEST [668324] LOG:  startup process (PID 668327) was terminated by signal 6: Aborted
2024-08-29 21:44:32.641 EEST [668324] LOG:  terminating any other active
Re: Pgstattuple on Sequences: Seeking Community Feedback on Potential Patch
On Fri, Aug 30, 2024 at 12:17:47AM +0530, Ayush Vatsa wrote: > The patch with all the changes is attached. Looks generally reasonable to me. -- nathan
Re: Allow logical failover slots to wait on synchronous replication
Hi Shveta,

Thanks for reviewing it so quickly.

On Thu, Aug 29, 2024 at 2:35 AM shveta malik wrote:
>
> Thanks for the patch. Few comments and queries:
>
> 1)
> + static XLogRecPtr lsn[NUM_SYNC_REP_WAIT_MODE];
>
> We shall name it as 'lsns' as there are multiple.

This follows the same naming convention in walsender_private.h.

> 2)
>
> + for (i = 0; i < NUM_SYNC_REP_WAIT_MODE; i++)
> + {
> +     lsn[i] = InvalidXLogRecPtr;
> + }
>
> Can we do it like below similar to what you have done at another place:
> memset(lsn, InvalidXLogRecPtr, sizeof(lsn));

I don't think memset works in this case? Well, I think it *technically* works, but I'm not sure it's something worth optimizing. If I understand correctly, memset takes a char for the value, not an XLogRecPtr (uint64). So it would have to be something like:

memset(lsn, 0, sizeof(lsn));

InvalidXLogRecPtr is defined as 0, so I think it works, but there's an implicit dependency here for correctness.

> 3)
> + if (!initialized)
> + {
> +     for (i = 0; i < NUM_SYNC_REP_WAIT_MODE; i++)
> +     {
> +         lsn[i] = InvalidXLogRecPtr;
> +     }
> + }
>
> I do not see 'initialized' set to TRUE anywhere. Can you please
> elaborate the intent here?

You're right, I thought I fixed this. Will update.

> 4)
> + int mode = SyncRepWaitMode;
> + int i;
> +
> + if (!initialized)
> + {
> +     for (i = 0; i < NUM_SYNC_REP_WAIT_MODE; i++)
> +     {
> +         lsn[i] = InvalidXLogRecPtr;
> +     }
> + }
> + if (mode == SYNC_REP_NO_WAIT)
> +     return true;
>
> I do not understand this code well. As Amit also pointed out, 'mode'
> may change. When we initialize 'mode' lets say SyncRepWaitMode is
> SYNC_REP_NO_WAIT but by the time we check 'if (mode ==
> SYNC_REP_NO_WAIT)', SyncRepWaitMode has changed to say
> SYNC_REP_WAIT_FLUSH, if so, then we will wrongly return true from
> here. Is that a possibility? ProcessConfigFile() is in the caller, and
> thus we may end up using the wrong mode.

Yes, it's possible for mode to change.
In my comment to Amit in the other thread, I said I think we have to store the mode and base our execution of this logic on it, ignoring SyncRepWaitMode changing due to ProcessConfigFile/SIGHUP for one iteration. We can store the value of mode later, so something like:

> if (SyncRepWaitMode == SYNC_REP_NO_WAIT)
>     return true;
> mode = SyncRepWaitMode;
> if (lsn[mode] >= wait_for_lsn)
>     return true;

But it's the same issue: when you check lsn[mode], SyncRepWaitMode could have changed to something else, so you always have to initialize the value and will always have this discrepancy.

I'm skeptical that end users are changing SyncRepWaitMode in their database clusters, as it has implications for their durability, and I would assume they run with the same durability guarantees.

Thanks,

--
John Hsu - Amazon Web Services
Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM
On Thu, 2024-08-29 at 12:55 +0530, Bharath Rupireddy wrote:
> IMO, every caller branching out in the code like
> if (rel->rd_tableam->tuple_modify_buffer_insert != NULL) then multi
> insert; else single insert; doesn't look good. IMO, the default
> implementation approach keeps things simple which eventually can be
> removed in *near* future. Thoughts?

I believe we need the branching in the caller anyway:

1. If there is a BEFORE row trigger with a volatile function, the visibility rules [1] mean that the function should see changes from all the rows inserted so far in this command, which won't work if they are still in the buffer.

2. Similarly, for an INSTEAD OF row trigger, the visibility rules say that the function should see all previous rows inserted.

3. If there are volatile functions in the target list or WHERE clause, the same visibility semantics apply.

4. If there's a "RETURNING ctid" clause, we need to either come up with a way to return the tuples after flushing, or we need to use the single-tuple path. (Similarly in the future when we support UPDATE ... RETURNING, as Matthias pointed out.)

If we need two paths in each caller anyway, it seems cleaner to just wrap the check for tuple_modify_buffer_insert in table_modify_buffer_enabled().

We could perhaps use one path and then force a batch size of one or something, which is an alternative, but we have to be careful not to introduce a regression (and it still requires a solution for #4).

Regards,
Jeff Davis

[1] https://www.postgresql.org/docs/devel/trigger-datachanges.html
Add parallel columns for pg_stat_statements
Hello,

This patch was a bit discussed on [1], and with more details on [2]. It's based on another patch sent in 2022 (see [3]).

It introduces seven new columns in pg_stat_statements:

* parallelized_queries_planned: number of times the query has been planned to be parallelized,
* parallelized_queries_launched: number of times the query has been executed with parallelization,
* parallelized_workers_planned: number of parallel workers planned for this query,
* parallelized_workers_launched: number of parallel workers launched for this query,
* parallelized_nodes: number of parallelized nodes,
* parallelized_nodes_all_workers: number of parallelized nodes which got all their requested workers,
* parallelized_nodes_no_worker: number of parallelized nodes which got none of their requested workers.

As Benoit said yesterday, the intent is to help administrators evaluate the usage of parallel workers in their databases and help configure parallelization usage.

A test script (test2.sql) is attached. You can execute it with "psql -Xef test2.sql your_database" (your_database should not contain a t1 table, as it will be dropped and recreated).
Here is its result, a bit commented:

CREATE EXTENSION IF NOT EXISTS pg_stat_statements;
CREATE EXTENSION
SELECT pg_stat_statements_reset();
   pg_stat_statements_reset
---
 2024-08-29 18:00:35.314557+02
(1 row)

DROP TABLE IF EXISTS t1;
DROP TABLE
CREATE TABLE t1 (id integer);
CREATE TABLE
INSERT INTO t1 SELECT generate_series(1, 10_000_000);
INSERT 0 1000
VACUUM ANALYZE t1;
VACUUM
SELECT query, parallelized_queries_planned, parallelized_queries_launched,
  parallelized_workers_planned, parallelized_workers_launched,
  parallelized_nodes, parallelized_nodes_all_workers,
  parallelized_nodes_no_worker
FROM pg_stat_statements WHERE query LIKE 'SELECT%t1%'
(0 rows)

SELECT * FROM t1 LIMIT 1;
 id
  1
(1 row)

SELECT pg_sleep(1);
SELECT query, parallelized_queries_planned, parallelized_queries_launched,
  parallelized_workers_planned, parallelized_workers_launched,
  parallelized_nodes, parallelized_nodes_all_workers,
  parallelized_nodes_no_worker
FROM pg_stat_statements WHERE query LIKE 'SELECT%t1%'
-[ RECORD 1 ]--+--
query                          | SELECT * FROM t1 LIMIT $1
parallelized_queries_planned   | 0
parallelized_queries_launched  | 0
parallelized_workers_planned   | 0
parallelized_workers_launched  | 0
parallelized_nodes             | 0
parallelized_nodes_all_workers | 0
parallelized_nodes_no_worker   | 0

==> no parallelization

SELECT count(*) FROM t1;
 count
--
 1000
(1 row)

SELECT pg_sleep(1);
SELECT query, parallelized_queries_planned, parallelized_queries_launched,
  parallelized_workers_planned, parallelized_workers_launched,
  parallelized_nodes, parallelized_nodes_all_workers,
  parallelized_nodes_no_worker
FROM pg_stat_statements WHERE query LIKE 'SELECT%t1%'
-[ RECORD 1 ]--+--
query                          | SELECT count(*) FROM t1
parallelized_queries_planned   | 1
parallelized_queries_launched  | 1
parallelized_workers_planned   | 2
parallelized_workers_launched  | 2
parallelized_nodes             | 1
parallelized_nodes_all_workers | 1
parallelized_nodes_no_worker   | 0
-[ RECORD 2 ]--+--
query                          | SELECT * FROM t1 LIMIT $1
parallelized_queries_planned   | 0
parallelized_queries_launched  | 0
parallelized_workers_planned   | 0
parallelized_workers_launched  | 0
parallelized_nodes             | 0
parallelized_nodes_all_workers | 0
parallelized_nodes_no_worker   | 0

==> one parallelized query
==> I have the default configuration, so 2 for max_parallel_worker_per_gather
==> hence two workers, with one node with all workers

SET max_parallel_workers_per_gather TO 5;
SET
SELECT count(*) FROM t1;
 count
--
 1000
(1 row)

SELECT pg_sleep(1);
SELECT query, parallelized_queries_planned, parallelized_queries_launched,
  parallelized_workers_planned, parallelized_workers_launched,
  parallelized_nodes, parallelized_nodes_all_workers,
  parallelized_nodes_no_worker
FROM pg_stat_statements WHERE query LIKE 'SELECT%t1%'
-[ RECORD 1 ]--+--
query                          | SELECT count(*) FROM t1
parallelized_queries_planned   | 2
parallelized_queries_launched  | 2
parallelized_workers_planned   | 6
parallelized_workers_launched  | 6
parallelized_nodes             | 2
parallelized_nodes_all_workers | 2
parallelized_nodes_no_worker   | 0
-[ RECORD 2 ]--+--
query                          | SELECT * FROM t1 LIMIT $1
parallelized_queries_planned   | 0
parallelized_queries_launched  | 0
parallelized_workers_planned   | 0
parallelized_workers_launched  | 0
parallelized_nodes             | 0
parallelized_nodes_all_workers | 0
parallelized_nodes_no_worker   | 0

==> another parallelized query ==> with
Re: [PATCH] pg_stat_activity: make slow/hanging authentication more visible
On Sun, Jun 30, 2024 at 10:48 AM Noah Misch wrote:
> > Would anyone like me to be more aggressive, and create a pgstat entry
> > as soon as we have the opening transaction? Or... as soon as a
> > connection is made?
>
> All else being equal, I'd like backends to have one before taking any lmgr
> lock or snapshot.

v3-0003 pushes the pgstat creation as far back as I felt comfortable, right after the PGPROC registration by InitProcessPhase2(). That function does lock the ProcArray, but if it gets held forever due to some bug, you won't be able to use pg_stat_activity to debug it anyway. And with this ordering, pg_stat_get_activity() will be able to retrieve the proc entry by PID without a race.

This approach ends up registering an early entry for more cases than the original patchset. For example, autovacuum and other background workers will now briefly get their own "authenticating" state, which seems like it could potentially confuse people. Should I rename the state, or am I overthinking it?

> You could release the xmin before calling PAM or LDAP. If you've copied all
> relevant catalog content to local memory, that's fine to do.

I played with the xmin problem a little bit, but I've shelved it for now. There's probably a way to do that safely; I just don't understand enough about the invariants to do it. For example, there's a comment later on that says

 * We established a catalog snapshot while reading pg_authid and/or
 * pg_database;

and I'm a little nervous about invalidating the snapshot halfway through that process. Even if PAM and LDAP don't rely on pg_authid or other shared catalogs today, shouldn't they be allowed to in the future, without being coupled to the InitPostgres implementation order? And I don't think we can move the pg_database checks before authentication.

As for the other patches, I'll ping Andrew about 0001, and 0004 remains in its original WIP state. Anyone excited about that wait event idea?

Thanks!
--Jacob v3-0002-Test-Cluster-let-background_psql-work-asynchronou.patch Description: Binary data v3-0001-BackgroundPsql-handle-empty-query-results.patch Description: Binary data v3-0003-pgstat-report-in-earlier-with-STATE_AUTHENTICATIN.patch Description: Binary data v3-0004-WIP-report-external-auth-calls-as-wait-events.patch Description: Binary data
Re: pg_trgm comparison bug on cross-architecture replication due to different char implementation
On Sun, May 19, 2024 at 6:46 AM Noah Misch wrote:
>
> On Wed, May 15, 2024 at 02:56:54PM +0900, Masahiko Sawada wrote:
> >
> > How about extending amcheck to support GIN and GIst indexes so that it
> > can detect potential data incompatibility due to changing 'char' to
> > 'unsigned char'? I think these new tests would be useful also for
> > users to check if they really need to reindex indexes due to such
> > changes. Also we fix pg_trgm so that it uses 'unsigned char' in PG18.
> > Users who upgraded to PG18 can run the new amcheck tests on the
> > primary as well as the standby.
>
> If I were standardizing pg_trgm on one or the other notion of "char", I would
> choose signed char, since I think it's still the majority. More broadly, I
> see these options to fix pg_trgm:
>
> 1. Change to signed char. Every arm64 system needs to scan pg_trgm indexes.
> 2. Change to unsigned char. Every x86 system needs to scan pg_trgm indexes.

Even though it's true that signed char systems are the majority, it would not be acceptable to force the need to scan pg_trgm indexes on unsigned char systems.

> 3. Offer both, as an upgrade path. For example, pg_trgm could have separate
>    operator classes gin_trgm_ops and gin_trgm_ops_unsigned. Running
>    pg_upgrade on an unsigned-char system would automatically map v17
>    gin_trgm_ops to v18 gin_trgm_ops_unsigned. This avoids penalizing any
>    architecture with upgrade-time scans.

Very interesting idea. How can new v18 users use the correct operator class? I don't want to require users to specify the correct signed or unsigned operator classes when creating a GIN index. Maybe we need to dynamically use the correct compare function for the same operator class depending on the char signedness. But is it possible to do it on the extension (e.g. pg_trgm) side?

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
Re: Significant Execution Time Difference Between PG13.14 and PG16.4 for Query on information_schema Tables.
Richard Guo writes:
> On Thu, Aug 29, 2024 at 4:47 AM Tom Lane wrote:
>> In the meantime, I think this test case is mighty artificial,
>> and it wouldn't bother me any to just take it out again for the
>> time being.

> Yeah, I think we can remove the 't1.two+t2.two' test case if we go
> with your proposed patch to address this performance regression.

Here's a polished-up patchset for that. I made the memoize test removal a separate patch because (a) it only applies to master and (b) it seems worth calling out as something we might be able to revert later.

I found one bug in the draft patch: add_nulling_relids only processes Vars of level zero, so we have to apply it before, not after, adjusting the Vars' levelsup. An alternative could be to add a levelsup parameter to add_nulling_relids, but that seemed like unnecessary complication.

regards, tom lane

From 8fafdc4852b8f2164286e6863219eb6b4d267639 Mon Sep 17 00:00:00 2001
From: Tom Lane
Date: Thu, 29 Aug 2024 16:25:23 -0400
Subject: [PATCH v1 1/2] Remove one memoize test case added by commit 069d0ff02.

This test case turns out to depend on the assumption that a non-Var subquery output that's underneath an outer join will always get wrapped in a PlaceHolderVar. But that behavior causes performance regressions in some cases compared to what happened before v16. The next commit will avoid inserting a PHV in the same cases where pre-v16 did, and that causes get_memoized_path to not detect that a memoize plan could be used.

Commit this separately, in hopes that we can restore the test after making get_memoized_path smarter. (It's failing to find memoize plans in adjacent cases where no PHV was ever inserted, so there is definitely room for improvement there.)
Discussion: https://postgr.es/m/cag1ps1xvntzcekk24oufmklpvdp2vjt-d+f2aocwbw_v3ke...@mail.gmail.com
---
 src/test/regress/expected/memoize.out | 30 ---
 src/test/regress/sql/memoize.sql      | 11 --
 2 files changed, 41 deletions(-)

diff --git a/src/test/regress/expected/memoize.out b/src/test/regress/expected/memoize.out
index df2ca5ba4e..9ee09fe2f5 100644
--- a/src/test/regress/expected/memoize.out
+++ b/src/test/regress/expected/memoize.out
@@ -160,36 +160,6 @@ WHERE s.c1 = s.c2 AND t1.unique1 < 1000;
  1000 | 9.5000
 (1 row)
 
--- Try with LATERAL references within PlaceHolderVars
-SELECT explain_memoize('
-SELECT COUNT(*), AVG(t1.twenty) FROM tenk1 t1 LEFT JOIN
-LATERAL (SELECT t1.two+t2.two AS c1, t2.unique1 AS c2 FROM tenk1 t2) s ON TRUE
-WHERE s.c1 = s.c2 AND t1.unique1 < 1000;', false);
-       explain_memoize
---
- Aggregate (actual rows=1 loops=N)
-   ->  Nested Loop (actual rows=1000 loops=N)
-         ->  Seq Scan on tenk1 t1 (actual rows=1000 loops=N)
-               Filter: (unique1 < 1000)
-               Rows Removed by Filter: 9000
-         ->  Memoize (actual rows=1 loops=N)
-               Cache Key: t1.two
-               Cache Mode: binary
-               Hits: 998  Misses: 2  Evictions: Zero  Overflows: 0  Memory Usage: NkB
-               ->  Seq Scan on tenk1 t2 (actual rows=1 loops=N)
-                     Filter: ((t1.two + two) = unique1)
-                     Rows Removed by Filter:
-(12 rows)
-
--- And check we get the expected results.
-SELECT COUNT(*), AVG(t1.twenty) FROM tenk1 t1 LEFT JOIN
-LATERAL (SELECT t1.two+t2.two AS c1, t2.unique1 AS c2 FROM tenk1 t2) s ON TRUE
-WHERE s.c1 = s.c2 AND t1.unique1 < 1000;
- count |        avg
---
-  1000 | 9.
-(1 row)
-
 -- Ensure we do not omit the cache keys from PlaceHolderVars
 SELECT explain_memoize('
 SELECT COUNT(*), AVG(t1.twenty) FROM tenk1 t1 LEFT JOIN
diff --git a/src/test/regress/sql/memoize.sql b/src/test/regress/sql/memoize.sql
index 059bec5f4f..2eaeb1477a 100644
--- a/src/test/regress/sql/memoize.sql
+++ b/src/test/regress/sql/memoize.sql
@@ -85,17 +85,6 @@ SELECT COUNT(*), AVG(t1.twenty) FROM tenk1 t1 LEFT JOIN
 LATERAL (SELECT t1.two+1 AS c1, t2.unique1 AS c2 FROM tenk1 t2) s ON TRUE
 WHERE s.c1 = s.c2 AND t1.unique1 < 1000;
 
--- Try with LATERAL references within PlaceHolderVars
-SELECT explain_memoize('
-SELECT COUNT(*), AVG(t1.twenty) FROM tenk1 t1 LEFT JOIN
-LATERAL (SELECT t1.two+t2.two AS c1, t2.unique1 AS c2 FROM tenk1 t2) s ON TRUE
-WHERE s.c1 = s.c2 AND t1.unique1 < 1000;', false);
-
--- And check we get the expected results.
-SELECT COUNT(*), AVG(t1.twenty) FROM tenk1 t1 LEFT JOIN
-LATERAL (SELECT t1.two+t2.two AS c1, t2.unique1 AS c2 FROM tenk1 t2) s ON TRUE
-WHERE s.c1 = s.c2 AND t1.unique1 < 1000;
-
 -- Ensure we do not omit the cache keys from PlaceHolderVars
 SELECT explain_memoize('
 SELECT COUNT(*), AVG(t1.twenty) FROM tenk1 t1 LEFT JOIN
-- 
2.43.5

From 7b48394838797489e7ab869f97ca06449fdbcee3 Mon Sep 17 00:00:00 2001
F
Re: pl/pgperl Patch for adding $_FN detail just like triggers have for $_TD
On 2024-08-29 Th 1:01 PM, Mark Murawski wrote:
> On 8/29/24 11:56, Andrew Dunstan wrote:
>> On 2024-08-28 We 5:53 PM, Mark Murawski wrote:
>>> Hi Hackers!
>>>
>>> This would be version v1 of this feature.
>>>
>>> Basically, the subject says it all: a pl/pgperl patch for being able to
>>> tell which function you're in. This is a hashref, so it will be possible
>>> to populate new and exciting other details in the future as the need
>>> arises.
>>>
>>> This also greatly improves logging capabilities for things like catching
>>> warnings, because as it stands right now, there's no information that
>>> can assist with locating the source of a warning like this:
>>>
>>> # tail -f /var/log/postgresql.log
>>> *** GOT A WARNING - Use of uninitialized value $prefix in concatenation (.) or string at (eval 531) line 48.
>>>
>>> Now, with $_FN you can do this:
>>>
>>> CREATE OR REPLACE FUNCTION throw_warning() RETURNS text LANGUAGE plperlu AS $function$
>>>
>>> use warnings;
>>> use strict;
>>> use Data::Dumper;
>>>
>>> $SIG{__WARN__} = sub {
>>>   elog(NOTICE, Dumper($_FN));
>>>   print STDERR "In Function: $_FN->{name}: $_[0]\n";
>>> };
>>>
>>> my $a;
>>> print "$a"; # uninit!
>>> return undef;
>>>
>>> $function$;
>>>
>>> This patch is against 12, which is still our production branch. This
>>> could easily also be patched against newer releases.
>>>
>>> I've been using this code in production now for about 3 years; it's
>>> greatly helped track down issues. And there shouldn't be anything
>>> platform-specific here, it's all regular Perl API.
>>>
>>> I'm not sure about adding testing. This is my first postgres patch, so
>>> any guidance on adding regression testing would be appreciated.
>>>
>>> The rationale for this has come from the need to know the source
>>> function name, and we've typically resorted to things like this in the
>>> past:
>>>
>>> CREATE OR REPLACE FUNCTION throw_warning() RETURNS text LANGUAGE plperlu AS $function$
>>>
>>> my $function_name = 'throw_warning';
>>> $SIG{__WARN__} = sub { print STDERR "In Function: $function_name: $_[0]\n"; }
>>>
>>> $function$;
>>>
>>> We've literally had to copy/paste this all over, and it's something that
>>> postgres should just 'give you', since it knows the name already, just
>>> like when triggers pass you $_TD with all the pertinent information.
>>>
>>> A wishlist item would be for postgres plperl to automatically prepend
>>> the function name and schema when throwing perl warnings, so you don't
>>> have to do your own __WARN__ handler, but this is the next best thing.
>>
>> I'm not necessarily opposed to this, but the analogy to $_TD isn't really
>> apt. You can't know the trigger data at compile time, whereas you can
>> know the function's name at compile time, using just the mechanism you
>> find irksome. And if we're going to do it for plperl, shouldn't we do it
>> for other PLs?
>
> Hi Andrew,
>
> Thanks for the feedback.
>
> 1) Why is this not similar to _TD? It literally operates identically. At
> run-time it passes you $_TD for triggers. Same here for functions. This
> is all run-time. What exactly is the issue you're trying to point out?

It's not the same as the trigger data case because the function name is knowable at compile time, as in fact you have demonstrated. You just find it a bit inconvenient to have to code for that knowledge. By contrast, trigger data is ONLY knowable at run time. But I don't see that it is such a heavy burden to have to write

my $funcname = "foo";

or similar in your function.

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com
Re: Switching XLog source from archive to streaming when primary available
Hi, On Thu, Aug 29, 2024 at 6:32 AM Bharath Rupireddy wrote: > In synchronous replication setup, until standby finishes fetching WAL > from the archive, the commits on the primary have to wait which can > increase the query latency. If the standby can connect to the primary > as soon as the broken connection is restored, it can fetch the WAL > soon and transaction commits can continue on the primary. Is my > understanding correct? Is there anything more to this? > Yup, if you're running with synchronous_commit = 'on' with synchronous_replicas, then you can have the replica continue streaming changes into pg_wal faster than WAL replay so commits may be unblocked faster. > I talked to Michael Paquier at PGConf.Dev 2024 and got some concerns > about this feature for dealing with changing timelines. I can't think > of them right now. I'm not sure what the risk would be if the WAL/history files we sync from streaming is the same as we replay from archive. > And, there were some cautions raised upthread - > https://www.postgresql.org/message-id/20240305020452.GA3373526%40nathanxps13 > and https://www.postgresql.org/message-id/ZffaQt7UbM2Q9kYh%40paquier.xyz. Yup agreed. I need to understand this area a lot better before I can do a more in-depth review. > Interesting. Yes, the restore script has to be smarter to detect the > broken connections and distinguish whether the server is performing > just the archive recovery/PITR or streaming from standby. Not doing it > right, perhaps, can cause data loss (?). I don't think there would be data-loss, only replay is stuck/slows down. It wouldn't be any different today if the restore-script returned a non-zero exit status. The end-user could configure their restore-script to return a non-zero status, based on some condition, to move to streaming. > > > However, > > > + * exhaust all the WAL present in pg_wal before switching. 
If successful, > > > + * the state machine moves to XLOG_FROM_STREAM state, otherwise it falls > > > + * back to XLOG_FROM_ARCHIVE state. > > > > I think I'm missing how this happens. Or what "successful" means. If I'm > > reading > > it right, no matter what happens we will always move to > > XLOG_FROM_STREAM based on how > > the state machine works? > > Please have a look at some discussion upthread on exhausting pg_wal > before switching - > https://www.postgresql.org/message-id/20230119005014.GA3838170%40nathanxps13. > Even today, the standby exhausts pg_wal before switching to streaming > from the archive. > I'm getting caught on the word "successful". My rough understanding of WaitForWALToBecomeAvailable is that once you're in XLOG_FROM_PG_WAL, if it was unsuccessful for whatever reason, it will still transition to XLOG_FROM_STREAMING. It does not loop back to XLOG_FROM_ARCHIVE if XLOG_FROM_PG_WAL fails. > Nice catch. This is a problem. One idea is to disable > streaming_replication_retry_interval feature for slot-less streaming > replication - either when primary_slot_name isn't specified disallow > the GUC to be set in assign_hook or when deciding to switch the wal > source. Thoughts? I don't think it's dependent on slot-less streaming. You would also run into the issue if the WAL is no longer there on the primary, which can occur with 'max_slot_wal_keep_size' as well. IMO the guarantee we need to make is that when we transition from XLOG_FROM_STREAMING to XLOG_FROM_ARCHIVE for a "fresh start", we should attempt to restore from archive at least once. I think this means that wal_source_switch_state should be reset back to SWITCH_TO_STREAMING_NONE whenever we transition to XLOG_FROM_ARCHIVE. We've attempted the switch to streaming once, so let's not continually re-try if it failed. Thanks, -- John Hsu - Amazon Web Services
Re: pl/pgperl Patch for adding $_FN detail just like triggers have for $_TD
On 8/29/24 16:54, Andrew Dunstan wrote: On 2024-08-29 Th 1:01 PM, Mark Murawski wrote: On 8/29/24 11:56, Andrew Dunstan wrote: On 2024-08-28 We 5:53 PM, Mark Murawski wrote: Hi Hackers! This would be version v1 of this feature. Basically, the subject says it all: pl/pgperl patch for being able to tell which function you're in. This is a hashref, so it will be possible to populate new and exciting other details in the future as the need arises. This also greatly improves logging capabilities for things like catching warnings, because as it stands right now, there's no information that can assist with locating the source of a warning like this: # tail -f /var/log/postgresql.log *** GOT A WARNING - Use of uninitialized value $prefix in concatenation (.) or string at (eval 531) line 48. Now, with $_FN you can do this: CREATE OR REPLACE FUNCTION throw_warning() RETURNS text LANGUAGE plperlu AS $function$ use warnings; use strict; use Data::Dumper; $SIG{__WARN__} = sub { elog(NOTICE, Dumper($_FN)); print STDERR "In Function: $_FN->{name}: $_[0]\n"; }; my $a; print "$a"; # uninit! return undef; $function$ ; This patch is against 12, which is still our production branch. It could easily be patched against newer releases as well. I've been using this code in production now for about 3 years; it's greatly helped track down issues. And there shouldn't be anything platform-specific here; it's all regular Perl API. I'm not sure about adding testing. This is my first postgres patch, so any guidance on adding regression testing would be appreciated. 
The rationale for this has come from the need to know the source function name, and we've typically resorted to things like this in the past: CREATE OR REPLACE FUNCTION throw_warning() RETURNS text LANGUAGE plperlu AS $function$ my $function_name = 'throw_warning'; $SIG{__WARN__} = sub { print STDERR "In Function: $function_name: $_[0]\n"; } $function$ ; We've literally had to copy/paste this all over, and it's something that postgres should just 'give you' since it knows the name already, just like when triggers pass you $_TD with all the pertinent information. A wishlist item would be for postgres plperl to automatically prepend the function name and schema when throwing perl warnings so you don't have to do your own __WARN__ handler, but this is the next best thing. I'm not necessarily opposed to this, but the analogy to $_TD isn't really apt. You can't know the trigger data at compile time, whereas you can know the function's name at compile time, using just the mechanism you find irksome. And if we're going to do it for plperl, shouldn't we do it for other PLs? Hi Andrew, Thanks for the feedback. 1) Why is this not similar to _TD? It literally operates identically. At run-time it passes you $_TD for triggers. Same here for functions. This is all run-time. What exactly is the issue you're trying to point out? It's not the same as the trigger data case because the function name is knowable at compile time, as in fact you have demonstrated. You just find it a bit inconvenient to have to code for that knowledge. By contrast, trigger data is ONLY knowable at run time. But I don't see that it is such a heavy burden to have to write my $funcname = "foo"; or similar in your function. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com Understood, regarding knowability. Trigger data is definitely going to be very dynamic in that regard. No, it's not a heavy burden to hard-code the function name. 
But what my ideal goal would be is this: CREATE OR REPLACE FUNCTION throw_warning() RETURNS text LANGUAGE plperlu AS $function$ use 'PostgresWarnHandler'; # <--- imagine this registers a WARN handler and outputs $_FN->{name} for you as part of the final warning my $a; print $a; etc and then there's nothing 'hard coded' regarding the name of the function, anywhere. It just seems nonsensical that postgres plperl can't send you the name of your registered function, and there's absolutely no way to get it other than hard-coding the name (redundantly, considering you already know the name when you're defining the function in the first place). But even better would be catching the warn at the plperl level, prepending the function name to the warn message, and then you only need: CREATE OR REPLACE FUNCTION throw_warning() RETURNS text LANGUAGE plperlu AS $function$ my $a; print $a; etc And then in this hypothetical situation -- magic ensues, and you're left with this: # tail -f /var/log/postgresql.log *** GOT A WARNING - Use of uninitialized value $a in concatenation (.) or string in function public.throw_warning() line 1
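The "runtime already knows the name" idea is easy to model outside Perl. Below is a Python analogue of the hypothetical PostgresWarnHandler described above — a wrapper that prefixes any warning raised inside a function's body with that function's name, much as the proposal would prepend $_FN->{name}. It is illustrative only; plperl itself would do this at the interpreter level:

```python
import functools
import sys
import warnings

def with_warn_context(func):
    """Prefix warnings raised inside func with the function's own name.

    The wrapper, not the function body, supplies the name -- mirroring the
    point in the thread that the runtime already knows it.
    """
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        with warnings.catch_warnings(record=True) as caught:
            warnings.simplefilter("always")   # capture every warning raised
            result = func(*args, **kwargs)
        for w in caught:
            print(f"In function {func.__name__}: {w.message}", file=sys.stderr)
        return result
    return wrapper
```

A decorated function then needs no hard-coded copy of its own name; the context arrives "for free", which is exactly the convenience being asked for.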
Re: [PATCH] Add CANONICAL option to xmlserialize
On 29.08.24 20:50, Pavel Stehule wrote: > > I know, but theoretically, there can be some benefit for CANONICAL if > pg supports bytea there. Lot of databases still use non utf8 encoding. > > It is a more theoretical question - if pg supports different types > there in future (because SQL/XML or Oracle), then CANONICAL can be > used without limit, I like the idea of extending the feature to support bytea. I can definitely take a look at it, but perhaps in another patch? This change would most likely involve transformXmlSerialize in parse_expr.c, and I'm not sure of the impact in other usages of XMLSERIALIZE. > or CANONICAL can be used just for text? And you are sure, so you can > compare text X text, instead xml X xml? Yes, currently it only supports varchar or text - and their cousins. The idea is to format the xml and serialize it as text in a way that they can be compared based on their content, independently of how they were written, e.g. '' is equal to ''. > > +SELECT xmlserialize(CONTENT doc AS text CANONICAL) = > xmlserialize(CONTENT doc AS text CANONICAL WITH COMMENTS) FROM > xmltest_serialize; > + ?column? > +-- > + t > + t > +(2 rows) > > Maybe I am a little bit confused by these regress tests, because at > the end it is not too useful - you compare two identical XML, and WITH > COMMENTS and WITHOUT COMMENTS is tested elsewhere. I tried to search > for a sense of this test. Better to use really different documents > (columns) instead. Yeah, I can see that it's confusing. In this example I actually just wanted to test that the default option of CANONICAL is CANONICAL WITH COMMENTS, even if you don't mention it. In the docs I mentioned it like this: "The optional parameters WITH COMMENTS (which is the default) or WITH NO COMMENTS, respectively, keep or remove XML comments from the given document." Perhaps I should rephrase it? Or maybe a comment in the regression tests would suffice? Thanks a lot for the input! -- Jim
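As a side note, Python's standard library ships W3C XML canonicalization, which gives a quick feel for the equivalence being aimed at here — documents written differently compare equal once canonicalized, and comment handling parallels WITH COMMENTS vs WITH NO COMMENTS. This is a sketch of the concept only, not of the patch's code path:

```python
from xml.etree.ElementTree import canonicalize

# Attribute order and empty-element syntax are normalized away,
# so these two spellings canonicalize to the same text.
a = '<foo bar="1" baz="2"/>'
b = '<foo baz="2" bar="1"></foo>'
assert canonicalize(a) == canonicalize(b)

# canonicalize() drops comments by default; with_comments=True keeps them,
# analogous to the WITH COMMENTS / WITH NO COMMENTS options.
doc = '<root><!-- note --><x/></root>'
assert canonicalize(doc, with_comments=True) != canonicalize(doc)
```

(Note that Python's default is the opposite of the patch's: the patch makes WITH COMMENTS the default.)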
Re: [BUG?] XMLSERIALIZE( ... INDENT) won't work with blank nodes
On 28.08.24 10:19, Jim Jones wrote: > Hi, > > While testing a feature reported by Pavel in this thread[1] I realized > that elements containing whitespaces between them won't be indented with > XMLSERIALIZE( ... INDENT) > > SELECT xmlserialize(DOCUMENT '42' AS text INDENT); > > xmlserialize > - > + > 42+ > + > > (1 row) > > SELECT xmlserialize(DOCUMENT ' 42 '::xml AS text > INDENT); > xmlserialize > > 42 + > > (1 row) > > > Other products have a different approach[2] > > Perhaps simply setting xmltotext_with_options' parameter > "perserve_whitespace" to false when XMLSERIALIZE(.. INDENT) would do the > trick. > > doc = xml_parse(data, xmloption_arg, !indent ? true : false, > GetDatabaseEncoding(), > &parsed_xmloptiontype, &content_nodes, > (Node *) &escontext); > > > (diff attached) > > SELECT xmlserialize(DOCUMENT ' 42 '::xml AS text > INDENT); > xmlserialize > - >+ >42+ > + > > (1 row) > > If this is indeed the way to go I can update the regression tests accordingly. > > Best, > Just created a CF entry for this: https://commitfest.postgresql.org/49/5217/ v1 attached includes regression tests. -- Jim From e474cdc457a91e96432f5a14c69b4ecbc0345579 Mon Sep 17 00:00:00 2001 From: Jim Jones Date: Thu, 29 Aug 2024 19:41:02 +0200 Subject: [PATCH v1] Bug fix for XMLSERIALIZE(...INDENT) for xml containing blank nodes This fixes a bug that let XMLSERIALIZE(... INDENT) to ignore blank nodes. It basically sets xml_parse's parameter 'preserve_whitespace' to false if the INDENT flag was used in XMLSERIALIZE. New regression tests are also included. 
--- src/backend/utils/adt/xml.c | 10 -- src/test/regress/expected/xml.out | 19 +++ src/test/regress/expected/xml_1.out | 11 +++ src/test/regress/expected/xml_2.out | 19 +++ src/test/regress/sql/xml.sql| 3 +++ 5 files changed, 60 insertions(+), 2 deletions(-) diff --git a/src/backend/utils/adt/xml.c b/src/backend/utils/adt/xml.c index 447e72b21e..1cd4929870 100644 --- a/src/backend/utils/adt/xml.c +++ b/src/backend/utils/adt/xml.c @@ -677,8 +677,14 @@ xmltotext_with_options(xmltype *data, XmlOptionType xmloption_arg, bool indent) } #ifdef USE_LIBXML - /* Parse the input according to the xmloption */ - doc = xml_parse(data, xmloption_arg, true, GetDatabaseEncoding(), + /* + * Parse the input according to the xmloption + * preserve_whitespace is set to false in case the function should + * return an indented xml, otherwise libxml2 will ignore the elements + * that contain whitespaces between them. + */ + doc = xml_parse(data, xmloption_arg, !indent ? true : false, + GetDatabaseEncoding(), &parsed_xmloptiontype, &content_nodes, (Node *) &escontext); if (doc == NULL || escontext.error_occurred) diff --git a/src/test/regress/expected/xml.out b/src/test/regress/expected/xml.out index 93a79cda8f..6f073101a1 100644 --- a/src/test/regress/expected/xml.out +++ b/src/test/regress/expected/xml.out @@ -663,6 +663,25 @@ SELECT xmlserialize(CONTENT '42' AS text t (1 row) +-- indent xml strings containing blank nodes +SELECT xmlserialize(DOCUMENT ' ' AS text INDENT); + xmlserialize +-- ++ + + + + + +(1 row) + +SELECT xmlserialize(CONTENT 'text node ' AS text INDENT); + xmlserialize +-- + text node + ++ + + + +(1 row) + SELECT xml 'bar' IS DOCUMENT; ?column? -- diff --git a/src/test/regress/expected/xml_1.out b/src/test/regress/expected/xml_1.out index 9323b84ae2..d26e10441e 100644 --- a/src/test/regress/expected/xml_1.out +++ b/src/test/regress/expected/xml_1.out @@ -443,6 +443,17 @@ ERROR: unsupported XML feature LINE 1: SELECT xmlserialize(CONTENT '42<... 
^ DETAIL: This functionality requires the server to be built with libxml support. +-- indent xml strings containing blank nodes +SELECT xmlserialize(DOCUMENT ' ' AS text INDENT); +ERROR: unsupported XML feature +LINE 1: SELECT xmlserialize(DOCUMENT ' '... + ^ +DETAIL: This functionality requires the server to be built with libxml support. +SELECT xmlserialize(CONTENT 'text node ' AS text INDENT); +ERROR: unsupported XML feature +LINE 1: SELECT xmlserialize(CONTENT 'text node ... + ^ +DETAIL: This functionality requires the server to be built with libxml support. SELECT xml 'bar' IS DOCUMENT; ERROR: unsupported XML feature LINE 1: SELECT xml 'bar' IS DOCUMENT; diff --git a/src/test/regress/expected/xml_2.out b/src/test/regress/expected/xml_2.out index f956322c69..7b154da4ba 100644 --- a/src/test/regress/expected/xml_2.out +++ b/src/test/regress/expected/xml_2.out @@ -649,6 +649,25 @@ S
Re: allowing extensions to control planner behavior
On Wed, 2024-08-28 at 19:25 -0400, Tom Lane wrote: > This does not seem remarkably problematic to me, given Robert's > proposal of a bitmask of allowed plan types per RelOptInfo. > You just do something like > > rel->allowed_plan_types = DESIRED_PLAN_TYPE; > > The names of the bits you aren't setting are irrelevant to you. I don't see that in the code yet, so I assume you are referring to the comment at [1]? I still like my idea to generalize the pathkey infrastructure, and Robert asked for other approaches to consider. It would allow us to hold onto multiple paths for longer, similar to pathkeys, which might offer some benefits or simplifications. Regards, Jeff Davis [1] https://www.postgresql.org/message-id/CA+TgmoZQyVxnRU--4g2bJonJ8RyJqNi2CHpy-=nwwbtnpaj...@mail.gmail.com
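Tom's point — that an extension only sets the bits it cares about and can ignore the names of the rest — can be illustrated with a toy bitmask. The bit names below are hypothetical, not from any posted patch:

```python
from enum import IntFlag

class PlanType(IntFlag):
    # Hypothetical per-RelOptInfo plan-type bits, for illustration only.
    SEQSCAN    = 1 << 0
    INDEXSCAN  = 1 << 1
    BITMAPSCAN = 1 << 2
    NESTLOOP   = 1 << 3
    HASHJOIN   = 1 << 4
    MERGEJOIN  = 1 << 5

# An extension that only wants index scans and hash joins just sets those
# bits; the names of the bits it doesn't set are irrelevant to it.
allowed_plan_types = PlanType.INDEXSCAN | PlanType.HASHJOIN
assert PlanType.HASHJOIN in allowed_plan_types
assert PlanType.SEQSCAN not in allowed_plan_types
```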
Re: Removing log_cnt from pg_sequence_read_tuple()
On Thu, Aug 29, 2024 at 09:28:49AM -0500, Nathan Bossart wrote: > On Thu, Aug 29, 2024 at 02:11:22PM +0900, Michael Paquier wrote: > > Okay, here is a v2 of the patch using this name for the function. > > LGTM Thanks, applied that, after one tweak for the #define name. -- Michael
Re: Make printtup a bit faster
David Rowley writes: Hello David, >> My high level proposal is define a type specific print function like: >> >> oidprint(Datum datum, StringInfo buf) >> textprint(Datum datum, StringInfo buf) > I think what we should do instead is make the output functions take a > StringInfo and just pass it the StringInfo where we'd like the bytes > written. > > That of course would require rewriting all the output functions for > all the built-in types, so not a small task. Extensions make that job > harder. I don't think it would be good to force extensions to rewrite > their output functions, so perhaps some wrapper function could help us > align the APIs for extensions that have not been converted yet. I have a similar concern to Tom's that this method looks too aggressive. That's why I said: "If a type's print function is not defined, we can still use the out function." AND "Hard-coding the relationship between [commonly] used types and {type}print function OIDs looks not cool; adding a new attribute in pg_type looks too aggressive, however. Anyway this is the next topic to talk about." What would be the extra benefit we redesign all the out functions? > There's a similar problem with input functions not having knowledge of > the input length. You only have to look at textin() to see how useful > that could be. Fixing that would probably make COPY FROM horrendously > faster. Team that up with SIMD for the delimiter char search and COPY > go a bit better still. Neil Conway did propose the SIMD part in [1], > but it's just not nearly as good as it could be when having to still > perform the strlen() calls. OK, I think I understand the need to make the input function know the input length, and it's good to know about the SIMD part for the delimiter char search. strlen is a delimiter char search (for '\0') as well. I'm not sure whether strlen is implemented with SIMD, but if not, why not? > I had planned to work on this for PG18, but I'd be happy for some > assistance if you're willing. 
I see you have done much amazing work with cache-line-friendly data structure design, branch prediction optimization, and SIMD optimization. I'd like to try this myself. I'm not sure if I can meet the target; what if we handle the out and in functions separately (possibly by different people)? -- Best Regards Andy Fan
Re: Make printtup a bit faster
On Fri, 30 Aug 2024 at 03:33, Tom Lane wrote: > > David Rowley writes: > > [ redesign I/O function APIs ] > > I had planned to work on this for PG18, but I'd be happy for some > > assistance if you're willing. > > I'm skeptical that such a thing will ever be practical. To avoid > breaking un-converted data types, all the call sites would have to > support both old and new APIs. To avoid breaking non-core callers, > all the I/O functions would have to support both old and new APIs. > That probably adds enough overhead to negate whatever benefit you'd > get. Scepticism is certainly good when it comes to such a large API change. I don't want to argue with you, but I'd like to state a few things about why I think you're wrong on this... So, we currently return cstrings in our output functions. Let's take jsonb_out() as an example, to build that cstring, we make a *new* StringInfoData on *every call* inside JsonbToCStringWorker(). That gives you 1024 bytes before you need to enlarge it. However, it's maybe not all bad as we have some size estimations there to call enlargeStringInfo(), only that's a bit wasteful as it does a repalloc() which memcpys the freshly allocated 1024 bytes allocated in initStringInfo() when it doesn't yet contain any data. After jsonb_out() has returned and we have the cstring, only we forgot the length of the string, so most places will immediately call strlen() or do that indirectly via appendStringInfoString(). For larger JSON documents, that'll likely require pulling cachelines back into L1 again. I don't know how modern CPU cacheline eviction works, but if it was as simple as FIFO then the strlen() would flush all those cachelines only for memcpy() to have to read them back again for output strings larger than L1. If we rewrote all of core's output functions to use the new API, then the branch to test the function signature would be perfectly predictable and amount to an extra cmp and jne/je opcode. 
So, I just don't agree with the overheads negating the benefits comment. You're probably off by 1 order of magnitude at the minimum, and for medium/large varlena types likely 2-3+ orders. Even a simple int4out requires a palloc()/memcpy. If we were outputting lots of data, e.g. in a COPY operation, the output buffer would seldom need to be enlarged as it would quickly adjust to the correct size. For the input functions, the possible gains are extensive too. textin() is a good example: it uses cstring_to_text(), but could be changed to use cstring_to_text_with_len(). Knowing the input string length also opens the door to SIMD. Take int4in() as an example: if pg_strtoint32_safe() knew its input length, there are a bunch of prechecks that could be done with either 64-bit SWAR or with SIMD. For example, if you knew you had an 8-char string of decimal digits then converting that to an int32 is quite cheap. It's impossible to overflow an int32 with 8 decimal digits, so no overflow checks need to be done until there are at least 10 decimal digits. ca6fde922 seems like a good enough example of the possible gains of SIMD vs byte-at-a-time processing. I saw some queries go 4x faster there, and that was me trying to keep the JSON document sizes realistic. Byte-at-a-time is just not enough to saturate RAM speed. Take DDR5, for example: Wikipedia says it has a bandwidth of 32–64 GB/s, so unless we discover room-temperature superconductors, we're not going to see any massive jump in clock speeds any time soon, and with 5 or 6GHz CPUs, there's just no way to get anywhere near that bandwidth by processing byte-at-a-time. For some sort of naive strcpy() type function, you're going to need at least a cmp and mov, and even if those were latency=1 (which they're not, see [1]), you can only do 2.5 billion of those two per second on a 5GHz processor. I've not tested, but hypothetically (assuming latency=1) that amounts to processing 2.5GB/s, i.e. 
a long way from DDR5 RAM speed and that's not taking into account having to increment pointers to the next byte on each loop. David [1] https://www.agner.org/optimize/instruction_tables.pdf
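The "8 decimal digits" case mentioned above is a well-known SWAR trick: load the 8 ASCII bytes as one 64-bit word and combine adjacent digit lanes with three multiplications instead of a per-byte loop. The sketch below models in Python integer arithmetic what C code with a uint64 load would do; the constants follow the standard published technique and are not taken from PostgreSQL:

```python
def parse_eight_digits_swar(chars: bytes) -> int:
    """Parse exactly 8 ASCII decimal digits without a per-byte loop."""
    assert len(chars) == 8 and chars.isdigit()
    M = (1 << 64) - 1                       # emulate 64-bit wraparound
    val = int.from_bytes(chars, "little")
    val -= 0x3030303030303030               # each byte: '0'..'9' -> 0..9
    # 2561 = 10*256 + 1: merge adjacent digits into 0..99 per byte pair.
    val = ((val * 2561) >> 8) & 0x00FF00FF00FF00FF
    # 6553601 = 100*65536 + 1: merge pairs into 0..9999 per 16-bit lane.
    val = ((val * 6553601) >> 16) & 0x0000FFFF0000FFFF
    # Final merge of the two 4-digit halves into the full value.
    return ((val * ((10000 << 32) | 1)) & M) >> 32
```

As the thread notes, 8 digits cannot overflow an int32, so a fast path like this needs no overflow checks; a C implementation would simply memcpy the 8 bytes into a uint64 and use the same constants.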
Re: Make printtup a bit faster
On Fri, 30 Aug 2024 at 12:10, Andy Fan wrote: > What would be the extra benefit we redesign all the out functions? If I've understood your proposal correctly, it sounds like you want to invent a new "print" output function for each type to output the Datum onto a StringInfo, if that's the case, what would be the point of having both versions? If there's anywhere we call output functions where the resulting value isn't directly appended to a StringInfo, then we could just use a temporary StringInfo to obtain the cstring and its length. David
Re: Make printtup a bit faster
David Rowley writes: > On Fri, 30 Aug 2024 at 12:10, Andy Fan wrote: >> What would be the extra benefit we redesign all the out functions? > > If I've understood your proposal correctly, it sounds like you want to > invent a new "print" output function for each type to output the Datum > onto a StringInfo, Mostly yes, but not for [each type at once], just for the [commonly used types], like int2/4/8, float4/8, date/time/timestamp, text/.. and so on. > if that's the case, what would be the point of having both versions? The biggest benefit would be compatibility. In my opinion, the print function (which doesn't need to be in pg_type at all) is an optional optimization: in some performance-critical paths we can replace the out function with the print function, as in printtup. If such a performance-critical path finds a type without a print function defined, it just keeps the old way. Kind of like the support function for procs, but for data types. This way, the changes would be much smaller and could be made step by step. > If there's anywhere we call output functions > where the resulting value isn't directly appended to a StringInfo, > then we could just use a temporary StringInfo to obtain the cstring > and its length. I think this is true, but it requests some caller's code change. -- Best Regards Andy Fan
Re: Make printtup a bit faster
On Fri, 30 Aug 2024 at 13:04, Andy Fan wrote: > > David Rowley writes: > > If there's anywhere we call output functions > > where the resulting value isn't directly appended to a StringInfo, > > then we could just use a temporary StringInfo to obtain the cstring > > and its length. > > I think this is true, but it requests some caller's code change. Yeah, calling code would need to be changed to take advantage of the new API, however, the differences in which types support which API could be hidden inside OutputFunctionCall(). That function could just fake up a StringInfo for any types that only support the old cstring API. That means we don't need to add handling for both cases everywhere we need to call the output function. It's possible that could make some operations slightly slower when only the old API is available, but then maybe not as we do now have read-only StringInfos. Maybe the StringInfoData.data field could just be set to point to the given cstring using initReadOnlyStringInfo() rather than doing appendBinaryStringInfo() onto yet another buffer for the old API. David
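The fallback David describes — hide the API difference inside OutputFunctionCall() so call sites never branch — can be pictured as follows. The names and signatures are hypothetical Python stand-ins, not the C API; the point is only that the dispatch lives in one adapter:

```python
class TypeInfo:
    """Hypothetical stand-in for a type's I/O entry; names are assumptions."""
    def __init__(self, out_fn, print_fn=None):
        self.out_fn = out_fn        # old API: datum -> standalone string
        self.print_fn = print_fn    # new API: (datum, buffer) -> None, or absent

def output_function_call(type_info, datum, out_buf: bytearray) -> None:
    if type_info.print_fn is not None:
        # New API: append straight into the caller's buffer, no extra copy.
        type_info.print_fn(datum, out_buf)
    else:
        # Old cstring API: build the string, then copy it in -- the one
        # extra memcpy the read-only-StringInfo idea would avoid.
        out_buf += type_info.out_fn(datum).encode()

# Both kinds of type look identical to the caller:
new_style = TypeInfo(out_fn=str, print_fn=lambda d, buf: buf.extend(str(d).encode()))
old_style = TypeInfo(out_fn=str)
buf = bytearray()
output_function_call(new_style, 42, buf)
output_function_call(old_style, 7, buf)
```

The read-only StringInfo refinement mentioned above would go one step further for the old-API branch: instead of copying, wrap the returned string in a read-only buffer view.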
Re: Make printtup a bit faster
David Rowley writes: > On Fri, 30 Aug 2024 at 13:04, Andy Fan wrote: >> >> David Rowley writes: >> > If there's anywhere we call output functions >> > where the resulting value isn't directly appended to a StringInfo, >> > then we could just use a temporary StringInfo to obtain the cstring >> > and its length. >> >> I think this is true, but it requests some caller's code change. > > Yeah, calling code would need to be changed to take advantage of the > new API, however, the differences in which types support which API > could be hidden inside OutputFunctionCall(). That function could just > fake up a StringInfo for any types that only support the old cstring > API. That means we don't need to add handling for both cases > everywhere we need to call the output function. We can do this, but then the printtup case (which represents a performance-critical path) would still need to be changed to bypass OutputFunctionCall(), since with the fake StringInfo a memcpy is needed again, IIUC. Besides the above, my major concern about your proposal is that it needs to change [all the types' output functions at once], which is too aggressive for me. In a fresh setup without any extensions created, "SELECT count(*) FROM pg_type" already returns 627. So the other part of my previous reply is more important to me. It is great that both of us feel the current strategy is not good for performance :) -- Best Regards Andy Fan
Re: Introduce XID age and inactive timeout based replication slot invalidation
Hi, here are some review comments for patch v43-0001. == Commit message 1. ... introduces a GUC allowing users set inactive timeout. ~ 1a. You should give the name of the new GUC in the commit message. 1b. /set/to set/ == doc/src/sgml/config.sgml GUC "replication_slot_inactive_timeout" 2. Invalidates replication slots that are inactive for longer than specified amount of time nit - suggest use similar wording as the prior GUC (wal_sender_timeout): Invalidate replication slots that are inactive for longer than this amount of time. ~ 3. This invalidation check happens either when the slot is acquired for use or during a checkpoint. The time since the slot has become inactive is known from its inactive_since value using which the timeout is measured. nit - the wording is too complicated. suggest: The timeout check occurs when the slot is next acquired for use, or during a checkpoint. The slot's 'inactive_since' field value is when the slot became inactive. ~ 4. Note that the inactive timeout invalidation mechanism is not applicable for slots on the standby that are being synced from a primary server (whose synced field is true). nit - that word "whose" seems ambiguous. suggest: (e.g. the standby slot has 'synced' field true). == doc/src/sgml/system-views.sgml 5. inactive_timeout means that the slot has been inactive for the duration specified by replication_slot_inactive_timeout parameter. nit - suggestion ("longer than"): ... the slot has been inactive for longer than the duration specified by the replication_slot_inactive_timeout parameter. == src/backend/replication/slot.c 6. /* Maximum number of invalidation causes */ -#define RS_INVAL_MAX_CAUSES RS_INVAL_WAL_LEVEL +#define RS_INVAL_MAX_CAUSES RS_INVAL_INACTIVE_TIMEOUT IMO this #define belongs in the slot.h, immediately below where the enum is defined. ~~~ 7. ReplicationSlotAcquire: I had a fundamental question about this logic. 
IIUC the purpose of the patch was to invalidate replication slots that have been inactive for too long. So, it makes sense to me that some periodic processing (e.g. CheckPointReplicationSlots) might do a sweep over all the slots, and invalidate the too-long-inactive ones that it finds. OTOH, it seemed quite strange to me that the patch logic is also detecting and invalidating inactive slots during the ReplicationSlotAcquire function. This is kind of saying "ERROR - sorry, because this was inactive for too long you can't have it" at the very moment that you wanted to use it again! IIUC such a slot would be invalidated by the function InvalidatePossiblyObsoleteSlot(), but therein lies my doubt -- how can the slot be considered as "obsolete" when we are in the very act of trying to acquire/use it? I guess it might be argued this is not so different to the scenario of attempting to acquire a slot that had been invalidated momentarily before during checkpoint processing. But, somehow that scenario seems more like bad luck to me, versus ReplicationSlotAcquire() deliberately invalidating something we *know* is wanted. ~ 8. + ereport(ERROR, + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), + errmsg("can no longer get changes from replication slot \"%s\"", + NameStr(s->data.name)), + errdetail("This slot has been invalidated because it was inactive since %s for more than %d seconds specified by \"replication_slot_inactive_timeout\".", +timestamptz_to_str(s->inactive_since), +replication_slot_inactive_timeout))); nit - IMO the info should be split into errdetail + errhint. Like this: errdetail("The slot became invalid because it was inactive since %s, which is more than %d seconds ago."...) errhint("You might need to increase \"%s\".", "replication_slot_inactive_timeout") ~~~ 9. 
ReportSlotInvalidation + appendStringInfo(&err_detail, + _("The slot has been inactive since %s for more than %d seconds specified by \"replication_slot_inactive_timeout\"."), + timestamptz_to_str(inactive_since), + replication_slot_inactive_timeout); + break; IMO this error in ReportSlotInvalidation() should be the same as the other one from ReplicationSlotAcquire(), which I suggested above (comment #8) should include a hint. Also, including a hint here will make this new message consistent with the other errhint (for "max_slot_wal_keep_size") that is already in this function. ~~~ 10. InvalidatePossiblyObsoleteSlot + if (cause == RS_INVAL_INACTIVE_TIMEOUT && + (replication_slot_inactive_timeout > 0 && + s->inactive_since > 0 && + !(RecoveryInProgress() && s->data.synced))) 10a. Everything here is && so this has some redundant parentheses. 10b. Actually, IMO this complicated condition is overkill. Won't it be better to just unconditionally assign now = GetCurrentTimestamp(); here? ~ 11. + * Note that we don't invalidate synced slots because, + * they are typically considered not active as they don't + * perform logical decoding to produce the changes. nit - tweaked punctuation ~ 12. + * If the
JIT: Remove some unnecessary instructions.
Hi hackers, I find there are some unnecessary load/store instructions being emitted by the JIT compiler. E.g., ``` v_boolnull = l_load(b, TypeStorageBool, v_resnullp, ""); v_boolvalue = l_load(b, TypeSizeT, v_resvaluep, ""); /* set resnull to boolnull */ LLVMBuildStore(b, v_boolnull, v_resnullp); /* set revalue to boolvalue */ LLVMBuildStore(b, v_boolvalue, v_resvaluep); ``` v_boolnull is loaded from v_resnullp and stored to v_resnullp immediately. v_boolvalue is loaded from v_resvaluep and stored to v_resvaluep immediately. The attached patch is trying to fix them. Best Regards, Xing From 6c7503b9e92c10b3dd9e6d33391273b854d53437 Mon Sep 17 00:00:00 2001 From: Xing Guo Date: Fri, 30 Aug 2024 11:43:45 +0800 Subject: [PATCH] JIT: Remove some unnecessary instructions. There're some unnecessary instructions being emitted by the JIT compiler. This patch removes them. --- src/backend/jit/llvm/llvmjit_expr.c | 15 +-- 1 file changed, 1 insertion(+), 14 deletions(-) diff --git a/src/backend/jit/llvm/llvmjit_expr.c b/src/backend/jit/llvm/llvmjit_expr.c index 48ccdb942a..1edc476c59 100644 --- a/src/backend/jit/llvm/llvmjit_expr.c +++ b/src/backend/jit/llvm/llvmjit_expr.c @@ -720,11 +720,6 @@ llvm_compile_expr(ExprState *state) v_boolnull = l_load(b, TypeStorageBool, v_resnullp, ""); v_boolvalue = l_load(b, TypeSizeT, v_resvaluep, ""); - /* set resnull to boolnull */ - LLVMBuildStore(b, v_boolnull, v_resnullp); - /* set revalue to boolvalue */ - LLVMBuildStore(b, v_boolvalue, v_resvaluep); - /* check if current input is NULL */ LLVMBuildCondBr(b, LLVMBuildICmp(b, LLVMIntEQ, v_boolnull, @@ -816,11 +811,6 @@ llvm_compile_expr(ExprState *state) v_boolnull = l_load(b, TypeStorageBool, v_resnullp, ""); v_boolvalue = l_load(b, TypeSizeT, v_resvaluep, ""); - /* set resnull to boolnull */ - LLVMBuildStore(b, v_boolnull, v_resnullp); - /* set revalue to boolvalue */ - LLVMBuildStore(b, v_boolvalue, v_resvaluep); - LLVMBuildCondBr(b, LLVMBuildICmp(b, LLVMIntEQ, v_boolnull, 
l_sbool_const(1), ""), @@ -875,10 +865,8 @@ llvm_compile_expr(ExprState *state) case EEOP_BOOL_NOT_STEP: { LLVMValueRef v_boolvalue; - LLVMValueRef v_boolnull; LLVMValueRef v_negbool; - v_boolnull = l_load(b, TypeStorageBool, v_resnullp, ""); v_boolvalue = l_load(b, TypeSizeT, v_resvaluep, ""); v_negbool = LLVMBuildZExt(b, @@ -887,8 +875,7 @@ llvm_compile_expr(ExprState *state) l_sizet_const(0), ""), TypeSizeT, ""); - /* set resnull to boolnull */ - LLVMBuildStore(b, v_boolnull, v_resnullp); + /* set revalue to !boolvalue */ LLVMBuildStore(b, v_negbool, v_resvaluep); -- 2.46.0
Re: configure failures on chipmunk
Hello Heikki, 26.08.2024 07:00, Alexander Lakhin wrote: Could you also take a look at the kerberos setup on chipmunk? Now that chipmunk goes beyond configure (on HEAD and REL_17_STABLE), it fails on the kerberosCheck stage, ... I see that chipmunk turned green. Thank you for taking care of that! Best regards, Alexander
Re: Collect statistics about conflicts in logical replication
On Thu, Aug 29, 2024 at 11:06 AM Zhijie Hou (Fujitsu) wrote: > > > Agreed. Here is new version patch which change the names as suggested. I also > rebased the patch based on another renaming commit 640178c9. > Thanks for the patch. A few minor things: 1) conflict.h: * This enum is used in statistics collection (see * PgStat_StatSubEntry::conflict_count) as well, therefore, when adding new * values or reordering existing ones, ensure to review and potentially adjust * the corresponding statistics collection codes. --We shall mention PgStat_BackendSubEntry as well. 026_stats.pl: 2) # Now that the table is empty, the # update conflict (update_existing) ERRORs will stop happening. --Shall it be update_exists instead of update_existing here: 3) This is an existing comment above insert_exists conflict capture: # Wait for the apply error to be reported. --Shall we change to: # Wait for the subscriber to report apply error and insert_exists conflict. thanks Shveta
Re: [PATCH] Add CANONICAL option to xmlserialize
On Thu, Aug 29, 2024 at 23:54, Jim Jones wrote: > > > On 29.08.24 20:50, Pavel Stehule wrote: > > > > I know, but theoretically, there can be some benefit for CANONICAL if > > pg supports bytea there. Lot of databases still use non utf8 encoding. > > > > It is a more theoretical question - if pg supports different types > > there in future (because SQL/XML or Oracle), then CANONICAL can be > > used without limit, > I like the idea of extending the feature to support bytea. I can > definitely take a look at it, but perhaps in another patch? This change > would most likely involve transformXmlSerialize in parse_expr.c, and I'm > not sure of the impact in other usages of XMLSERIALIZE. > > or CANONICAL can be used just for text? And you are sure, so you can > > compare text X text, instead xml X xml? > Yes, currently it only supports varchar or text - and their cousins. The > idea is to format the xml and serialize it as text in a way that they > can be compared based on their content, independently of how they were > written, e.g. '' is equal to ''. > > > > > +SELECT xmlserialize(CONTENT doc AS text CANONICAL) = > > xmlserialize(CONTENT doc AS text CANONICAL WITH COMMENTS) FROM > > xmltest_serialize; > > + ?column? > > +-- > > + t > > + t > > +(2 rows) > > > > Maybe I am a little bit confused by these regress tests, because at > > the end it is not too useful - you compare two identical XML, and WITH > > COMMENTS and WITHOUT COMMENTS is tested elsewhere. I tried to search > > for a sense of this test. Better to use really different documents > > (columns) instead. > > Yeah, I can see that it's confusing. In this example I actually just > wanted to test that the default option of CANONICAL is CANONICAL WITH > COMMENTS, even if you don't mention it. In the docs I mentioned it like > this: > > "The optional parameters WITH COMMENTS (which is the default) or WITH NO > COMMENTS, respectively, keep or remove XML comments from the given > document." 
> > Perhaps I should rephrase it? Or maybe a comment in the regression tests > would suffice? > comment will be enough > > Thanks a lot for the input! > > -- > Jim > >
Re: Make printtup a bit faster
David Rowley writes: > On Fri, 30 Aug 2024 at 03:33, Tom Lane wrote: >> >> David Rowley writes: >> > [ redesign I/O function APIs ] >> > I had planned to work on this for PG18, but I'd be happy for some >> > assistance if you're willing. >> >> I'm skeptical that such a thing will ever be practical. To avoid >> breaking un-converted data types, all the call sites would have to >> support both old and new APIs. To avoid breaking non-core callers, >> all the I/O functions would have to support both old and new APIs. >> That probably adds enough overhead to negate whatever benefit you'd >> get. > > So, we currently return cstrings in our output functions. Let's take > jsonb_out() as an example, to build that cstring, we make a *new* > StringInfoData on *every call* inside JsonbToCStringWorker(). That > gives you 1024 bytes before you need to enlarge it. However, it's > maybe not all bad as we have some size estimations there to call > enlargeStringInfo(), only that's a bit wasteful as it does a > repalloc() which memcpys the freshly allocated 1024 bytes allocated in > initStringInfo() when it doesn't yet contain any data. After > jsonb_out() has returned and we have the cstring, only we forgot the > length of the string, so most places will immediately call strlen() or > do that indirectly via appendStringInfoString(). For larger JSON > documents, that'll likely require pulling cachelines back into L1 > again. I don't know how modern CPU cacheline eviction works, but if it > was as simple as FIFO then the strlen() would flush all those > cachelines only for memcpy() to have to read them back again for > output strings larger than L1. The attached is a PoC of this idea; no matter which method is adopted (rewriting all the out-functions or adding an optional print function), I think the benefit will be similar. In the test case below, it shows a 10%+ improvement. 
(0.134ms vs 0.110ms) create table demo as select oid as oid1, relname::text as text1, relam, relname::text as text2 from pg_class; pgbench: select * from demo; -- Best Regards Andy Fan >From 5ace763a5126478da1c3cb68d5221d83e45d2f34 Mon Sep 17 00:00:00 2001 From: Andy Fan Date: Fri, 30 Aug 2024 12:50:54 +0800 Subject: [PATCH v20240830 1/1] Avoiding some memcpy, strlen, palloc in printtup. https://www.postgresql.org/message-id/87wmjzfz0h.fsf%40163.com --- src/backend/access/common/printtup.c | 40 ++-- src/backend/utils/adt/oid.c | 20 ++ src/backend/utils/adt/varlena.c | 17 src/include/catalog/pg_proc.dat | 9 ++- 4 files changed, 77 insertions(+), 9 deletions(-) diff --git a/src/backend/access/common/printtup.c b/src/backend/access/common/printtup.c index c78cc39308..ecba4a7113 100644 --- a/src/backend/access/common/printtup.c +++ b/src/backend/access/common/printtup.c @@ -19,6 +19,7 @@ #include "libpq/pqformat.h" #include "libpq/protocol.h" #include "tcop/pquery.h" +#include "utils/fmgroids.h" #include "utils/lsyscache.h" #include "utils/memdebug.h" #include "utils/memutils.h" @@ -49,6 +50,7 @@ typedef struct bool typisvarlena; /* is it varlena (ie possibly toastable)? */ int16 format; /* format code for this column */ FmgrInfo finfo; /* Precomputed call info for output fn */ + FmgrInfo p_finfo;/* Precomputed call info for print fn if any */ } PrinttupAttrInfo; typedef struct @@ -274,10 +276,25 @@ printtup_prepare_info(DR_printtup *myState, TupleDesc typeinfo, int numAttrs) thisState->format = format; if (format == 0) { - getTypeOutputInfo(attr->atttypid, - &thisState->typoutput, - &thisState->typisvarlena); - fmgr_info(thisState->typoutput, &thisState->finfo); + /* + * If the type defines a print function, then use it + * rather than outfunction. + * + * XXX: need a generic function to improve the if-elseif. 
+ */ + if (attr->atttypid == OIDOID) +fmgr_info(F_OIDPRINT, &thisState->p_finfo); + else if (attr->atttypid == TEXTOID) +fmgr_info(F_TEXTPRINT, &thisState->p_finfo); + else + { +getTypeOutputInfo(attr->atttypid, + &thisState->typoutput, + &thisState->typisvarlena); +fmgr_info(thisState->typoutput, &thisState->finfo); +/* mark print function is invalid */ +thisState->p_finfo.fn_oid = InvalidOid; + } } else if (format == 1) { @@ -355,10 +372,17 @@ printtup(TupleTableSlot *slot, DestReceiver *self) if (thisState->format == 0) { /* Text output */ - char *outputstr; - - outputstr = OutputFunctionCall(&thisState->finfo, attr); - pq_sendcountedtext(buf, outputstr, strlen(outputstr)); + if (thisState->p_finfo.fn_oid) + { +FunctionCall2(&thisState->p_finfo, attr, PointerGetDatum(buf)); + } + else + { +char *outputstr; + +outputstr = OutputFunctionCall(&thisState->finfo, attr); +pq_sendcountedtext(buf, outputstr, strlen(outputstr)); + } } else { diff --git a/src/backend/utils/adt/oid
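The PoC's two code paths can be sketched outside the server. The names below are invented for illustration (the real patch adds a per-type print function alongside the out function in printtup.c); the point is that the "print" shape writes straight into the destination buffer and knows the length as a side effect, avoiding both the intermediate palloc'd cstring and the later strlen():

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Tiny growable buffer standing in for StringInfo / the wire buffer. */
typedef struct Buf { char *data; size_t len, cap; } Buf;

static void
buf_append(Buf *b, const char *s, size_t n)
{
    if (b->len + n + 1 > b->cap)
    {
        b->cap = (b->len + n + 1) * 2;
        b->data = realloc(b->data, b->cap);
    }
    memcpy(b->data + b->len, s, n);
    b->len += n;
    b->data[b->len] = '\0';
}

/* Old shape: allocate and return a fresh string; the caller has
 * forgotten its length and must strlen() it before sending. */
static char *
int_out(int value)
{
    char *s = malloc(16);
    snprintf(s, 16, "%d", value);
    return s;
}

/* New shape: append straight into the output buffer; no intermediate
 * allocation, no strlen(), length tracked by the buffer itself. */
static void
int_print(int value, Buf *out)
{
    char tmp[16];
    int  n = snprintf(tmp, sizeof(tmp), "%d", value);

    buf_append(out, tmp, (size_t) n);
}
```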
Re: [PATCH] Enable using llvm jitlink as an alternative llvm jit linker of old Rtdyld.
Hi Alex, JITLink came back onto the radar screen: we see that LLVM 20 will deprecate the RuntimeDyld link layer that we're using. JITLink would in fact fix the open bug report we have about LLVM crashing all over the place on ARM[1], and I can see that would be quite easy to do what you showed, but we can't use that solution for that problem because it only works on LLVM 15+ (and maybe doesn't even support all the architectures until later releases, I haven't followed the details). So we'll probably finish up backporting that fix for RuntimeDyld, which seems like it's going to work OK (thanks Anthonin, CCd, for diagnosing that). That's all fine and good, but if my crystal ball is operating correctly, fairly soon we'll have a situation where RHEL is shipping versions that *only* support JITLink, while other distros are still shipping versions that need RuntimeDyld because they don't yet have JITLink or it is not mature enough yet for all architectures. So we'll need to support both for a while. That's all fine, and I can see that it's going to be pretty easy to do, it's mostly just LLVMOrcCreateThis() or LLVMOrcCreateThat() with some #ifdef around it, job done. The question I have is: is someone looking into getting the C API we need for that into the LLVM main branch (LLVM 20-to-be)? I guess I would prefer to be able to use that, rather than adding more of our own C++ wrapper code into our tree, if we can, and it seems like now would be a good time to get ahead of that. [1] https://www.postgresql.org/message-id/flat/CAO6_Xqr63qj%3DSx7HY6ZiiQ6R_JbX%2B-p6sTPwDYwTWZjUmjsYBg%40mail.gmail.com
Re: Collect statistics about conflicts in logical replication
Hi Hou-San. Here are my review comments for v4-0001. == 1. Add links in the docs IMO it would be good for all these confl_* descriptions (in doc/src/sgml/monitoring.sgml) to include links back to where each of those conflict types was defined [1]. Indeed, when links are included to the original conflict type information then I think you should remove mentioning "track_commit_timestamp": + counted only when the + track_commit_timestamp + option is enabled on the subscriber. It should be obvious that you cannot count a conflict if the conflict does not happen, but I don't think we should scatter/duplicate those rules in different places saying when certain conflicts can/can't happen -- we should just link everywhere back to the original description for those rules. ~~~ 2. Arrange all the counts into an intuitive/natural order There is an intuitive/natural ordering for these counts. For example, the 'confl_*' count fields are in the order insert -> update -> delete, which LGTM. Meanwhile, the 'apply_error_count' and the 'sync_error_count' are not in a good order. IMO it makes more sense if everything is ordered as: 'sync_error_count', then 'apply_error_count', then all the 'confl_*' counts. This comment applies to lots of places, e.g.: - docs (doc/src/sgml/monitoring.sgml) - function pg_stat_get_subscription_stats (pg_proc.dat) - view pg_stat_subscription_stats (src/backend/catalog/system_views.sql) - TAP test SELECTs (test/subscription/t/026_stats.pl) As all those places are already impacted by this patch, I think it would be good if (in passing) we (if possible) also swapped the sync/apply counts so everything is ordered intuitively top-to-bottom or left-to-right. == [1] https://www.postgresql.org/docs/devel/logical-replication-conflicts.html#LOGICAL-REPLICATION-CONFLICTS Kind Regards, Peter Smith. Fujitsu Australia
Supporting pg freeze in pg_dump, restore.
Hi, hackers! Tell me, please: do I understand correctly that pg_dump/pg_restore has not been taught to do COPY FREEZE when filling data? The last mention of plans for this was from Bruce in 2013: https://postgrespro.com/list/thread-id/1815657 https://paquier.xyz/postgresql-2/postgres-9-3-feature-highlight-copy-freeze/ Best regards, Stepan Neretin.
Re: [PATCH] Add CANONICAL option to xmlserialize
On 30.08.24 06:46, Pavel Stehule wrote: > > > čt 29. 8. 2024 v 23:54 odesílatel Jim Jones > napsal: > > > > +SELECT xmlserialize(CONTENT doc AS text CANONICAL) = > > xmlserialize(CONTENT doc AS text CANONICAL WITH COMMENTS) FROM > > xmltest_serialize; > > + ?column? > > +-- > > + t > > + t > > +(2 rows) > > > > Maybe I am a little bit confused by these regress tests, because at > > the end it is not too useful - you compare two identical XML, > and WITH > > COMMENTS and WITHOUT COMMENTS is tested elsewhere. I tried to search > > for a sense of this test. Better to use really different documents > > (columns) instead. > > Yeah, I can see that it's confusing. In this example I actually just > wanted to test that the default option of CANONICAL is CANONICAL WITH > COMMENTS, even if you don't mention it. In the docs I mentioned it > like > this: > > "The optional parameters WITH COMMENTS (which is the default) or > WITH NO > COMMENTS, respectively, keep or remove XML comments from the given > document." > > Perhaps I should rephrase it? Or maybe a comment in the regression > tests > would suffice? > > > comment will be enough > v12 attached adds a comment to this test. Thanks -- Jim From 8b17a05a3386aaffcc6c5dafa9c57f49278dc4b6 Mon Sep 17 00:00:00 2001 From: Jim Jones Date: Fri, 30 Aug 2024 07:55:08 +0200 Subject: [PATCH v12] Add CANONICAL output format to xmlserialize This patch introduces the CANONICAL option to xmlserialize, which serializes xml documents in their canonical form - as described in the W3C Canonical XML Version 1.1 specification. This option can be used with the additional parameter WITH [NO] COMMENTS to keep or remove xml comments from the canonical xml output. In case no parameter is provided, WITH COMMENTS will be used as default. This feature is based on the function xmlC14NDocDumpMemory from the C14N module of libxml2. This patch also includes regression tests and documentation. 
--- doc/src/sgml/datatype.sgml| 41 +++- src/backend/executor/execExprInterp.c | 2 +- src/backend/parser/gram.y | 21 +- src/backend/parser/parse_expr.c | 2 +- src/backend/utils/adt/xml.c | 271 -- src/include/nodes/parsenodes.h| 1 + src/include/nodes/primnodes.h | 10 + src/include/parser/kwlist.h | 1 + src/include/utils/xml.h | 2 +- src/test/regress/expected/xml.out | 115 +++ src/test/regress/expected/xml_1.out | 109 +++ src/test/regress/expected/xml_2.out | 115 +++ src/test/regress/sql/xml.sql | 66 +++ src/tools/pgindent/typedefs.list | 1 + 14 files changed, 647 insertions(+), 110 deletions(-) diff --git a/doc/src/sgml/datatype.sgml b/doc/src/sgml/datatype.sgml index e0d33f12e1..19ba5bbf7f 100644 --- a/doc/src/sgml/datatype.sgml +++ b/doc/src/sgml/datatype.sgml @@ -4496,7 +4496,7 @@ xml 'bar' xml, uses the function xmlserialize:xmlserialize -XMLSERIALIZE ( { DOCUMENT | CONTENT } value AS type [ [ NO ] INDENT ] ) +XMLSERIALIZE ( { DOCUMENT | CONTENT } value AS type [ { [ NO ] INDENT ] | CANONICAL [ WITH [NO] COMMENTS ]}) type can be character, character varying, or @@ -4513,6 +4513,45 @@ XMLSERIALIZE ( { DOCUMENT | CONTENT } value AS + +The option CANONICAL converts a given +XML document to its https://www.w3.org/TR/xml-c14n11/#Terminology";>canonical form +based on the https://www.w3.org/TR/xml-c14n11/";>W3C Canonical XML 1.1 Specification. +It is basically designed to provide applications the ability to compare xml documents or test if they +have been changed. The optional parameters WITH COMMENTS (which is the default) or +WITH NO COMMENTS, respectively, keep or remove XML comments from the given document. 
+ + + + Example: + + + When a character string value is cast to or from type xml without going through XMLPARSE or diff --git a/src/backend/executor/execExprInterp.c b/src/backend/executor/execExprInterp.c index 77394e76c3..9222eed1e2 100644 --- a/src/backend/executor/execExprInterp.c +++ b/src/backend/executor/execExprInterp.c @@ -4189,7 +4189,7 @@ ExecEvalXmlExpr(ExprState *state, ExprEvalStep *op) *op->resvalue = PointerGetDatum(xmltotext_with_options(DatumGetXmlP(value), xexpr->xmloption, - xexpr->indent)); + xexpr->format)); *op->resnull = false; } break; diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y index 84cef57a70..bb05d2cb3b 100644 --- a/src/backend/parser/gram.y +++ b/src/backend/parser/gram.y @@ -618,12 +618,13 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query); %type xml_root_version opt_xml_root_standalone %type xmlexists_argument %type document_or_content -%type xml_indent_o
Re: Collect statistics about conflicts in logical replication
On Fri, Aug 30, 2024 at 9:40 AM shveta malik wrote: > > On Thu, Aug 29, 2024 at 11:06 AM Zhijie Hou (Fujitsu) > wrote: > > > > > > Agreed. Here is new version patch which change the names as suggested. I > > also > > rebased the patch based on another renaming commit 640178c9. > > > > Thanks for the patch. Few minor things: > > 1) > conflict.h: > * This enum is used in statistics collection (see > * PgStat_StatSubEntry::conflict_count) as well, therefore, when adding new > * values or reordering existing ones, ensure to review and potentially adjust > * the corresponding statistics collection codes. > > --We shall mention PgStat_BackendSubEntry as well. > > 026_stats.pl: > 2) > # Now that the table is empty, the > # update conflict (update_existing) ERRORs will stop happening. > > --Shall it be update_exists instead of update_existing here: > > 3) > This is an existing comment above insert_exists conflict capture: > # Wait for the apply error to be reported. > > --Shall we change to: > # Wait for the subscriber to report apply error and insert_exists conflict. > 1) I have tested the patch, works well. 2) Verified headers inclusions, all good 3) All my comments (very old ones when the patch was initially posted) are now addressed. So apart from the comments I posted in [1], I have no more comments. [1]: https://www.postgresql.org/message-id/CAJpy0uAZpzustNOMBhxBctHHWbBA%3DsnTAYsLpoWZg%2BcqegmD-A%40mail.gmail.com thanks Shveta
Re: Collect statistics about conflicts in logical replication
On Fri, Aug 30, 2024 at 10:53 AM Peter Smith wrote: > > Hi Hou-San. Here are my review comments for v4-0001. > > == > > 1. Add links in the docs > > IMO it would be good for all these confl_* descriptions (in > doc/src/sgml/monitoring.sgml) to include links back to where each of > those conflict types was defined [1]. > > Indeed, when links are included to the original conflict type > information then I think you should remove mentioning > "track_commit_timestamp": > + counted only when the > +linkend="guc-track-commit-timestamp">track_commit_timestamp > + option is enabled on the subscriber. > > It should be obvious that you cannot count a conflict if the conflict > does not happen, but I don't think we should scatter/duplicate those > rules in different places saying when certain conflicts can/can't > happen -- we should just link everywhere back to the original > description for those rules. +1, will make the doc better. > ~~~ > > 2. Arrange all the counts into an intuitive/natural order > > There is an intuitive/natural ordering for these counts. For example, > the 'confl_*' count fields are in the order insert -> update -> > delete, which LGTM. > > Meanwhile, the 'apply_error_count' and the 'sync_error_count' are not > in a good order. > > IMO it makes more sense if everything is ordered as: > 'sync_error_count', then 'apply_error_count', then all the 'confl_*' > counts. > > This comment applies to lots of places, e.g.: > - docs (doc/src/sgml/monitoring.sgml) > - function pg_stat_get_subscription_stats (pg_proc.dat) > - view pg_stat_subscription_stats (src/backend/catalog/system_views.sql) > - TAP test SELECTs (test/subscription/t/026_stats.pl) > > As all those places are already impacted by this patch, I think it > would be good if (in passing) we (if possible) also swapped the > sync/apply counts so everything is ordered intuitively top-to-bottom > or left-to-right. Not sure about this though. It does not seem to belong to the current patch. thanks Shveta
Re: Conflict Detection and Resolution
On Fri, Aug 23, 2024 at 10:39 AM shveta malik wrote: > > On Thu, Aug 22, 2024 at 3:44 PM shveta malik wrote: > > > > > > For clock-skew and timestamp based resolution, if needed, I will post > > another email for the design items where suggestions are needed. > > > > Please find issues which need some thoughts and approval for > time-based resolution and clock-skew. > > 1) > Time based conflict resolution and two phase transactions: > > Time based conflict resolution (last_update_wins) is the one > resolution which will not result in data-divergence considering > clock-skew is taken care of. But when it comes to two-phase > transactions, it might not be the case. For two-phase transaction, we > do not have commit timestamp when the changes are being applied. Thus > for time-based comparison, initially it was decided to use the prepare > timestamp, but it may result in data-divergence. Please see the > example at [1]. > > Example at [1] is a tricky situation, and thus in the initial draft, > we decided to restrict usage of 2pc and CDR together. The plan is: > > a) During Create subscription, if the user has given last_update_wins > resolver for any conflict_type and 'two_phase' is also enabled, we > ERROR out. > b) During Alter subscription, if the user tries to update resolver to > 'last_update_wins' but 'two_phase' is enabled, we error out. > > Another solution could be to save both prepare_ts and commit_ts. And > when any txn comes for conflict resolution, we first check if > prepare_ts is available, use that else use commit_ts. Availability of > prepare_ts would indicate it was a prepared txn and thus even if it is > committed, we should use prepare_ts for comparison for consistency. > This will have some overhead of storing prepare_ts along with > commit_ts. But if the number of prepared txns is reasonably small, > this overhead should be less. > > We currently plan to go with restricting 2pc and last_update_wins > together, unless others have different opinions. 
> Done. v11-004 implements the idea of restricting 2pc and last_update_wins together. > ~~ > > 2) > parallel apply worker and conflict-resolution: > As discussed in [2] (see last paragraph in [2]), for streaming of > in-progress transactions by parallel worker, we do not have > commit-timestamp with each change and thus it makes sense to disable > parallel apply worker with CDR. The plan is to not start parallel > apply worker if 'last_update_wins' is configured for any > conflict_type. > Done. > ~~ > > 3) > parallel apply worker and clock skew management: > Regarding clock-skew management as discussed in [3], we will wait for > the local clock to come within tolerable range during 'begin' rather > than before 'commit'. And this wait needs commit-timestamp in the > beginning, thus we plan to restrict starting pa-worker even when > clock-skew related GUCs are configured. > Done. v11 implements it. > Earlier we had restricted both 2pc and parallel worker worker start > when detect_conflict was enabled, but now since detect_conflict > parameter is removed, we will change the implementation to restrict > all 3 above cases when last_update_wins is configured. When the > changes are done, we will post the patch. > > ~~ > -- Thanks, Nisha
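The "save both timestamps" alternative discussed above (not the restriction that was actually implemented in v11) amounts to a small piece of bookkeeping. A hedged sketch with invented type and function names:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef int64_t TimestampTz;    /* stand-in for PostgreSQL's TimestampTz */

/* Store both prepare_ts and commit_ts; for last_update_wins comparisons,
 * prefer prepare_ts whenever the transaction was prepared, so the same
 * timestamp is used before and after COMMIT PREPARED. */
typedef struct TxnTimes
{
    bool        was_prepared;
    TimestampTz prepare_ts;     /* valid only if was_prepared */
    TimestampTz commit_ts;
} TxnTimes;

static TimestampTz
resolution_ts(const TxnTimes *t)
{
    return t->was_prepared ? t->prepare_ts : t->commit_ts;
}

/* last_update_wins: the change with the later timestamp is kept. */
static bool
remote_wins(const TxnTimes *local, const TxnTimes *remote)
{
    return resolution_ts(remote) > resolution_ts(local);
}
```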
Re: Conflict Detection and Resolution
On Mon, Aug 26, 2024 at 2:23 PM Amit Kapila wrote: > > On Thu, Aug 22, 2024 at 3:45 PM shveta malik wrote: > > > > On Wed, Aug 21, 2024 at 4:08 PM Nisha Moond > > wrote: > > > > > > The patches have been rebased on the latest pgHead following the merge > > > of the conflict detection patch [1]. > > > > Thanks for working on patches. > > > > Summarizing the issues which need some suggestions/thoughts. > > > > 1) > > For subscription based resolvers, currently the syntax implemented is: > > > > 1a) > > CREATE SUBSCRIPTION > > CONNECTION PUBLICATION > > CONFLICT RESOLVER > > (conflict_type1 = resolver1, conflict_type2 = resolver2, > > conflict_type3 = resolver3,...); > > > > 1b) > > ALTER SUBSCRIPTION CONFLICT RESOLVER > > (conflict_type1 = resolver1, conflict_type2 = resolver2, > > conflict_type3 = resolver3,...); > > > > Earlier the syntax suggested in [1] was: > > CREATE SUBSCRIPTION CONNECTION PUBLICATION > > CONFLICT RESOLVER 'conflict_resolver1' FOR 'conflict_type1', > > CONFLICT RESOLVER 'conflict_resolver2' FOR 'conflict_type2'; > > > > I think the currently implemented syntax is good as it has less > > repetition, unless others think otherwise. > > > > ~~ > > > > 2) > > For subscription based resolvers, do we need a RESET command to reset > > resolvers to default? Any one of below or both? > > > > 2a) reset all at once: > > ALTER SUBSCRIPTION RESET CONFLICT RESOLVERS > > > > 2b) reset one at a time: > > ALTER SUBSCRIPTION RESET CONFLICT RESOLVER for 'conflict_type'; > > > > The issue I see here is, to implement 1a and 1b, we have introduced > > the 'RESOLVER' keyword. If we want to implement 2a, we will have to > > introduce the 'RESOLVERS' keyword as well. But we can come up with > > some alternative syntax if we plan to implement these. Thoughts? > > > > It makes sense to have a RESET on the lines of (a) and (b). At this > stage, we should do minimal in extending the syntax. How about RESET > CONFLICT RESOLVER ALL for (a)? 
> Done, v11 implements the suggested RESET command. > > ~~ > > -- Thanks, Nisha
Re: define PG_REPLSLOT_DIR
On Tue, Aug 20, 2024 at 04:23:06PM +, Bertrand Drouvot wrote: > Please find attached v3 that: > > - takes care of your comments (and also removed the use of PG_TBLSPC_DIR in > RELATIVE_PG_TBLSPC_DIR). > - removes the new macros from the comments (see Michael's and Yugo-San's > comments in [0] resp. [1]). > - adds a missing sizeof() (see [1]). > - implements Ashutosh's idea of adding a new SLOT_DIRNAME_ARGS (see [2]). It's > done in 0002 (I used REPLSLOT_DIR_ARGS though). > - fixed a macro usage in ReorderBufferCleanupSerializedTXNs() (was not at the > right location, discovered while implementing 0002). I was looking at that, and applied 0001 for pg_replslot and merged 0003~0005 together for pg_logical. For the first one, at the end I have updated the comments in genfile.c. For the second one, it looked a bit strange to ignore "pg_logical/", which is the base for all the others, so I have added a variable for it. Locating them in reorderbuffer.h with the GUCs was OK, but perhaps there was an argument for logical.h. The paths in origin.c refer to files, not directories. Not sure that 0002 is an improvement overall, so I have left this part out. In 0006, I am not sure that RELATIVE_PG_TBLSPC_DIR is really something we should have. Sure, that's a special case for base backups when sending a directory, but we have also pg_wal/ and its XLOGDIR so that's inconsistent. Perhaps this part should be left as-is. -- Michael
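For readers following along, the shape of this cleanup is simply to name each well-known data-directory subdirectory once, in a macro, and build paths from it instead of repeating the string literal at every call site. A minimal sketch (PG_REPLSLOT_DIR matches the thread's subject; the pg_logical macro name here is a guess, not necessarily what was committed):

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* One definition per well-known subdirectory; call sites stop
 * hardcoding "pg_replslot" and friends. */
#define PG_REPLSLOT_DIR "pg_replslot"
#define PG_LOGICAL_DIR  "pg_logical"      /* name assumed for illustration */

/* Build the on-disk path of a replication slot's directory. */
static void
slot_path(char *buf, size_t bufsize, const char *slotname)
{
    snprintf(buf, bufsize, "%s/%s", PG_REPLSLOT_DIR, slotname);
}
```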
Re: list of acknowledgments for PG17
On 27.08.24 11:26, Etsuro Fujita wrote: On Sat, Aug 24, 2024 at 11:27 PM Peter Eisentraut wrote: As usual, please check for problems such as wrong sorting, duplicate names in different variants, or names in the wrong order etc. I think Japanese names are in the right order except “Sutou Kouhei”. I am 100% sure his given name is Kouhei. Fixed. Thank you for checking.
Re: Conflict Detection and Resolution
On Wed, Aug 28, 2024 at 10:58 AM shveta malik wrote: > > On Wed, Aug 28, 2024 at 10:30 AM Ajin Cherian wrote: > > > >> 2) > >> Currently pg_dump is dumping even the default resolvers configuration. > >> As an example if I have not changed default configuration for say > >> sub1, it still dumps all: > >> > >> CREATE SUBSCRIPTION sub1 CONNECTION '..' PUBLICATION pub1 WITH () > >> CONFLICT RESOLVER (insert_exists = 'error', update_differ = > >> 'apply_remote', update_exists = 'error', update_missing = 'skip', > >> delete_differ = 'apply_remote', delete_missing = 'skip'); > >> > >> I am not sure if we need to dump default resolvers. Would like to know > >> what others think on this. > >> Normally, we don't add defaults in the dumped command. For example, dumpSubscription won't dump the options where the default is unchanged. We shouldn't do it unless we have a reason for dumping defaults. > >> 3) > >> Why in 002_pg_dump.pl we have default resolvers set explicitly? > >> > > In 003_pg_dump.pl, default resolvers are not set explicitly, that is the > > regexp to check the pg_dump generated command for creating subscriptions. > > This is again connected to your 2nd question. > > Okay so we may not need this change if we plan to *not *dump defaults > in pg_dump. > > Another point about 'defaults' is regarding insertion into the > pg_subscription_conflict table. We currently do insert default > resolvers into 'pg_subscription_conflict' even if the user has not > explicitly configured them. > I don't see any problem with it. BTW, if we don't do it, I think wherever we are referring the resolvers for a conflict, we need some special handling for default and non-default. Am I missing something? -- With Regards, Amit Kapila.
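The convention Amit describes (don't dump defaults) is easy to model: emit an option only when it differs from the built-in default, so a restore on a newer server picks up that server's defaults. The defaults below are taken from the example CREATE SUBSCRIPTION command quoted earlier in this thread; the function names are invented for illustration:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Default resolver per conflict type, matching the quoted example:
 * *_exists -> error, *_missing -> skip, *_differ -> apply_remote. */
static const char *
default_resolver(const char *conflict_type)
{
    const char *suffix = strrchr(conflict_type, '_');

    if (suffix && strcmp(suffix, "_exists") == 0)
        return "error";
    if (suffix && strcmp(suffix, "_missing") == 0)
        return "skip";
    return "apply_remote";
}

/* Write "type = 'resolver'" into buf only for non-default settings;
 * returns 1 if something was emitted, 0 if the default was skipped. */
static int
dump_resolver_if_changed(char *buf, size_t bufsize,
                         const char *conflict_type, const char *resolver)
{
    if (strcmp(resolver, default_resolver(conflict_type)) == 0)
        return 0;               /* default: leave it out of the dump */
    snprintf(buf, bufsize, "%s = '%s'", conflict_type, resolver);
    return 1;
}
```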
RE: Collect statistics about conflicts in logical replication
On Friday, August 30, 2024 2:24 PM shveta malik wrote: > > On Fri, Aug 30, 2024 at 10:53 AM Peter Smith > wrote: > > > > Hi Hou-San. Here are my review comments for v4-0001. Thanks Shveta and Peter for giving comments ! > > > > == > > > > 1. Add links in the docs > > > > IMO it would be good for all these confl_* descriptions (in > > doc/src/sgml/monitoring.sgml) to include links back to where each of > > those conflict types was defined [1]. > > > > Indeed, when links are included to the original conflict type > > information then I think you should remove mentioning > > "track_commit_timestamp": > > + counted only when the > > +linkend="guc-track-commit-timestamp">track_commit_timesta > mp > > + option is enabled on the subscriber. > > > > It should be obvious that you cannot count a conflict if the conflict > > does not happen, but I don't think we should scatter/duplicate those > > rules in different places saying when certain conflicts can/can't > > happen -- we should just link everywhere back to the original > > description for those rules. > > +1, will make the doc better. Changed. To add link to each conflict type, I added " > > ~~~ > > > > 2. Arrange all the counts into an intuitive/natural order > > > > There is an intuitive/natural ordering for these counts. For example, > > the 'confl_*' count fields are in the order insert -> update -> > > delete, which LGTM. > > > > Meanwhile, the 'apply_error_count' and the 'sync_error_count' are not > > in a good order. > > > > IMO it makes more sense if everything is ordered as: > > 'sync_error_count', then 'apply_error_count', then all the 'confl_*' > > counts. 
> > > > This comment applies to lots of places, e.g.: > > - docs (doc/src/sgml/monitoring.sgml) > > - function pg_stat_get_subscription_stats (pg_proc.dat) > > - view pg_stat_subscription_stats > > (src/backend/catalog/system_views.sql) > > - TAP test SELECTs (test/subscription/t/026_stats.pl) > > > > As all those places are already impacted by this patch, I think it > > would be good if (in passing) we (if possible) also swapped the > > sync/apply counts so everything is ordered intuitively top-to-bottom > > or left-to-right. > > Not sure about this though. It does not seem to belong to the current patch. I also don't think we should handle that in this patch. Here is the V5 patch, which addresses the above and Shveta's [1] comments. [1] https://www.postgresql.org/message-id/CAJpy0uAZpzustNOMBhxBctHHWbBA%3DsnTAYsLpoWZg%2BcqegmD-A%40mail.gmail.com Best Regards, Hou zj v5-0001-Collect-statistics-about-conflicts-in-logical-rep.patch
Re: ANALYZE ONLY
Hi Michael, On 2024-08-23 19:01, Michael Harris wrote: V2 of the patch is attached. Thanks for the proposal and the patch. You said this patch was a first draft version, so these may be too minor comments, but I will share them: -- https://www.postgresql.org/docs/devel/progress-reporting.html Note that when ANALYZE is run on a partitioned table, all of its partitions are also recursively analyzed. Should we also note this is the default, i.e. not using ONLY option behavior here? -- https://www.postgresql.org/docs/devel/ddl-partitioning.html If you are using manual VACUUM or ANALYZE commands, don't forget that you need to run them on each child table individually. A command like: ANALYZE measurement; will only process the root table. This part also should be modified, shouldn't it? When running ANALYZE VERBOSE ONLY on a partition table, the INFO message is like this: =# ANALYZE VERBOSE ONLY only_parted; INFO: analyzing "public.only_parted" inheritance tree I may be wrong, but 'inheritance tree' makes me feel it includes child tables. Removing 'inheritance tree' and just describing the table name as below might be better: INFO: analyzing "public.only_parted" -- Regards, -- Atsushi Torikoshi NTT DATA Group Corporation
Re: Conflict Detection and Resolution
On Fri, Aug 30, 2024 at 12:13 PM Amit Kapila wrote: > > On Wed, Aug 28, 2024 at 10:58 AM shveta malik wrote: > > > > On Wed, Aug 28, 2024 at 10:30 AM Ajin Cherian wrote: > > > > > >> 2) > > >> Currently pg_dump is dumping even the default resolvers configuration. > > >> As an example if I have not changed default configuration for say > > >> sub1, it still dumps all: > > >> > > >> CREATE SUBSCRIPTION sub1 CONNECTION '..' PUBLICATION pub1 WITH () > > >> CONFLICT RESOLVER (insert_exists = 'error', update_differ = > > >> 'apply_remote', update_exists = 'error', update_missing = 'skip', > > >> delete_differ = 'apply_remote', delete_missing = 'skip'); > > >> > > >> I am not sure if we need to dump default resolvers. Would like to know > > >> what others think on this. > > >> > > Normally, we don't add defaults in the dumped command. For example, > dumpSubscription won't dump the options where the default is > unchanged. We shouldn't do it unless we have a reason for dumping > defaults. Agreed, we should not dump defaults. I had the same opinion. > > > >> 3) > > >> Why in 002_pg_dump.pl we have default resolvers set explicitly? > > >> > > > In 003_pg_dump.pl, default resolvers are not set explicitly, that is the > > > regexp to check the pg_dump generated command for creating subscriptions. > > > This is again connected to your 2nd question. > > > > Okay so we may not need this change if we plan to *not *dump defaults > > in pg_dump. > > > > Another point about 'defaults' is regarding insertion into the > > pg_subscription_conflict table. We currently do insert default > > resolvers into 'pg_subscription_conflict' even if the user has not > > explicitly configured them. > > > > I don't see any problem with it. Yes, no problem > BTW, if we don't do it, I think > wherever we are referring the resolvers for a conflict, we need some > special handling for default and non-default. Yes, we will need special handling in such a case. Thus we shall go with inserting defaults. 
> Am I missing something? No, I just wanted to know others' opinions, so I asked. Thanks, Shveta
Re: Conflict Detection and Resolution
On Wed, Aug 28, 2024 at 4:07 PM shveta malik wrote: > > > On Wed, Aug 28, 2024 at 10:30 AM Ajin Cherian wrote: > > > > > The review is WIP. Please find a few comments on patch001. > > 1) > logical-replication.sgml > > + Additional logging is triggered for specific conflict_resolvers. > Users can also configure conflict_types while creating the > subscription. Refer to section CONFLICT RESOLVERS for details on > conflict_types and conflict_resolvers. > > Can we please change it to: > > Additional logging is triggered in various conflict scenarios, each > identified as a conflict type. Users have the option to configure a > conflict resolver for each conflict type when creating a subscription. > For more information on the conflict types detected and the supported > conflict resolvers, refer to the section > > 2) > SetSubConflictResolver > > + for (type = 0; type < resolvers_cnt; type++) > > 'type' does not look like the correct name here. The variable does not > denote a conflict_type; it is instead a resolver-array index, so please > rename accordingly. Maybe idx or res_idx? > > 3) > CreateSubscription(): > > +if (stmt->resolvers) > +check_conflict_detection(); > > 3a) We can have a comment saying we warn users if prerequisites are not met. > > 3b) Also, I do not find the name 'check_conflict_detection' > appropriate. One suggestion could be > 'conf_detection_check_prerequisites' (similar to > replorigin_check_prerequisites). > > 3c) We can move the below comment after check_conflict_detection() as > it makes more sense there. > /* > * Parse and check conflict resolvers. Initialize with default values > */ > > 4) > Should we allow repetition/duplicates of 'conflict_type=..' in CREATE > and ALTER SUB? As an example: > ALTER SUBSCRIPTION sub1 CONFLICT RESOLVER (insert_exists = > 'apply_remote', insert_exists = 'error'); > > Such a repetition works for CREATE SUBSCRIPTION but gives an internal error > for ALTER SUBSCRIPTION (ERROR: tuple already updated by self).
> Behaviour should be the same for both. And if we give an error, it should be some user-understandable one. But I would like to know the opinions of others. Shall it give an error, or should the last one be accepted as the valid configuration in case of repetition? I have tried the below statement to check the existing behavior: create subscription sub1 connection 'dbname=postgres' publication pub1 with (streaming = on, streaming=off); ERROR: conflicting or redundant options LINE 1: ...=postgres' publication pub1 with (streaming = on, streaming=... So duplicate options are not allowed. If we see any challenges in following the same rule for resolvers, we can discuss them, but it seems better to follow the existing behavior of other subscription options. Also, the behavior for CREATE/ALTER should be the same. -- With Regards, Amit Kapila.