Re: datfrozenxid > relfrozenxid w/ crash before XLOG_HEAP_INPLACE
> On 20 Jun 2024, at 06:29, Noah Misch wrote:
> > This resolves the last inplace update defect known to me.

That’s a huge amount of work, thank you! Do I get it right that inplace updates are catalog-specific, and that the other OOM corruptions [0] and standby corruptions [1] are not related to this fix? Both cases we observed on regular tables. Or might that be an effect of vacuum deepening the corruption after observing a wrong datfrozenxid?

Best regards, Andrey Borodin.

[0] https://www.postgresql.org/message-id/flat/67EADE8F-AEA6-4B73-8E38-A69E5D48BAFE%40yandex-team.ru#1266dd8b898ba02686c2911e0a50ab47
[1] https://www.postgresql.org/message-id/flat/CAFj8pRBEFMxxFSCVOSi-4n0jHzSaxh6Ze_cZid5eG%3Dtsnn49-A%40mail.gmail.com
Re: Pgoutput not capturing the generated columns
On Wed, 19 Jun 2024 at 21:43, Peter Eisentraut wrote:
>
> On 19.06.24 13:22, Shubham Khanna wrote:
> > All the comments are handled.
> >
> > The attached Patch contains all the suggested changes.
>
> Please also take a look at the proposed patch for virtual generated
> columns [0] and consider how that would affect your patch. I think your
> feature can only replicate *stored* generated columns. So perhaps the
> documentation and terminology in your patch should reflect that.

This patch cannot handle virtual generated columns because it stores NULL values for them. Besides the documentation, the generated initial sync command should also be changed to sync data only for stored generated columns, omitting virtual ones. I suggest treating these changes as a separate patch (0003) for future merging or as a separate commit, depending on the order of patch acceptance.

Regards, Vignesh
Re: Remove distprep
On 16.06.24 21:34, Noah Misch wrote:
> On Thu, Oct 05, 2023 at 05:46:46PM +0200, Peter Eisentraut wrote:
> > --- a/src/backend/Makefile
> > +++ b/src/backend/Makefile
> > $(top_builddir)/src/include/storage/lwlocknames.h: storage/lmgr/lwlocknames.h
> > -	prereqdir=`cd '$(dir $<)' >/dev/null && pwd` && \
> > -	  cd '$(dir $@)' && rm -f $(notdir $@) && \
> > -	  $(LN_S) "$$prereqdir/$(notdir $<)" .
> > +	rm -f '$@'
> > +	$(LN_S) ../../backend/$< '$@'
> >
> > $(top_builddir)/src/include/utils/wait_event_types.h: utils/activity/wait_event_types.h
> > -	prereqdir=`cd '$(dir $<)' >/dev/null && pwd` && \
> > -	  cd '$(dir $@)' && rm -f $(notdir $@) && \
> > -	  $(LN_S) "$$prereqdir/$(notdir $<)" .
> > +	rm -f '$@'
> > +	$(LN_S) ../../backend/$< '$@'
>
> These broke the https://www.postgresql.org/docs/17/installation-platform-notes.html#INSTALLATION-NOTES-MINGW build, where LN_S='cp -pR'. On other platforms, "make LN_S='cp -pR'" reproduces this. Reverting the above lines fixes things. The buildfarm has no coverage for that build scenario (fairywren uses Meson).

Is it just these two instances? Commit 721856ff24b contains a few more hunks that change something about LN_S. Are those ok?
Re: ON ERROR in json_query and the like
Hi,

On Mon, Jun 17, 2024 at 9:47 PM Chapman Flack wrote:
> On 06/17/24 02:20, Amit Langote wrote:
> >>> Apparently, the functions expect JSONB so that a cast is implied
> >>> when providing TEXT. However, the errors during that cast are
> >>> not subject to the ON ERROR clause.
> >>>
> >>> 17beta1=# SELECT JSON_QUERY('invalid', '$' NULL ON ERROR);
> >>> ERROR: invalid input syntax for type json
> >>> DETAIL: Token "invalid" is invalid.
> >>> CONTEXT: JSON data, line 1: invalid
> >>>
> >>> Oracle DB and Db2 (LUW) both return NULL in that case.
>
> I wonder, could prosupport rewriting be used to detect that the first
> argument is supplied by a cast, and rewrite the expression to apply the
> cast 'softly'? Or would that behavior be too magical?

I don't think prosupport rewriting can be used, because JSON_QUERY() is not an ordinary function call that a support function could be attached to. We could possibly use "runtime coercion" for context_item so that the coercion errors can be "caught", which is how we coerce the jsonpath result to the RETURNING type. For now, I'm inclined to simply document the limitation that errors when coercing string arguments to json are always thrown.

--
Thanks, Amit Langote
Re: suspicious valgrind reports about radixtree/tidstore on arm64
On Thu, Jun 20, 2024 at 8:12 AM Masahiko Sawada wrote:
> On Thu, Jun 20, 2024 at 7:54 AM Tom Lane wrote:
>
> I agree with radixtree-fix-proposed.patch. Even if we zero more fields
> in the node it would not add noticeable overheads.

+1 in general, although I'm slightly concerned about this part:

> > (The RT_NODE_48 case could be optimized a bit if we cared
> > to swap the order of its slot_idxs[] and isset[] arrays;
> > then the initial zeroing could just go up to slot_idxs[].

-		memset(n48->isset, 0, sizeof(n48->isset));
+		memset(n48, 0, offsetof(RT_NODE_48, children));
 		memset(n48->slot_idxs, RT_INVALID_SLOT_IDX, sizeof(n48->slot_idxs));

I was a bit surprised that neither gcc 14 nor clang 18 can figure out that they can skip zeroing the slot index array since it's later filled in with "invalid index", so they actually zero out 272 bytes before re-initializing 256 of those bytes. It may not matter in practice, but it's also not free, and trivial to avoid.

> > I don't know if there's any reason why the current order
> > is preferable.)
>
> IIUC there is no particular reason for the current order in RT_NODE_48.

Yeah. I found that simply swapping them enables clang to avoid double-initialization, but gcc still can't figure it out and must be told to stop at slot_idxs[]. I'd prefer to do it that way and document that slot_idxs is purposefully the last member of the fixed part of the struct. If that's agreeable I'll commit it that way tomorrow unless someone beats me to it.
Re: suspicious valgrind reports about radixtree/tidstore on arm64
John Naylor writes: > On Thu, Jun 20, 2024 at 8:12 AM Masahiko Sawada wrote: >> IIUC there is no particular reason for the current order in RT_NODE_48. > Yeah. I found that simply swapping them enables clang to avoid > double-initialization, but gcc still can't figure it out and must be > told to stop at slot_idxs[]. I'd prefer to do it that way and document > that slot_idxs is purposefully the last member of the fixed part of > the struct. WFM. > If that's agreeable I'll commit it that way tomorrow > unless someone beats me to it. I was going to push it, but feel free. regards, tom lane
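To make the proposed layout concrete, here is a minimal, self-contained sketch (the names below are simplified stand-ins, not the actual RT_NODE_48 definition from the radixtree.h template): with slot_idxs[] kept as the last member of the fixed part, the initial zeroing stops at offsetof(..., slot_idxs), and the 256 slot-index bytes are written only once, with the invalid marker.

#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SLOT_COUNT   256		/* stand-in for the node's slot limit */
#define INVALID_SLOT 0xFF		/* stand-in for RT_INVALID_SLOT_IDX */

typedef struct node48_sketch
{
	uint64_t	isset[SLOT_COUNT / 64];	/* which child slots are in use */
	uint8_t		slot_idxs[SLOT_COUNT];	/* deliberately last in the fixed part */
	/* the variable-size children[] array would follow here */
} node48_sketch;

static inline void
node48_init(node48_sketch *n48)
{
	/* zero only the members that precede slot_idxs ... */
	memset(n48, 0, offsetof(node48_sketch, slot_idxs));
	/* ... then mark every slot index invalid, with no redundant zeroing */
	memset(n48->slot_idxs, INVALID_SLOT, sizeof(n48->slot_idxs));
}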
Re: [HACKERS] make async slave to wait for lsn to be replayed
Hi,

I looked through the patch and have some comments.

== L68:
+Recovery Procedures

It looks somewhat confusing and appears as if the section is intended to explain how to perform recovery. Since this is the first built-in procedure, I'm not sure how this section should be written. However, the section immediately above is named "Recovery Control Functions", so "Recovery Synchronization Functions" would align better with the naming of the upper section. (I don't believe we need to be so strict about the distinction between functions and procedures here.)

It looks strange that the procedure signature includes the return type.

== L93:
+If timeout is not specified or zero, this
+procedure returns once WAL is replayed upto
+target_lsn.
+If the timeout is specified (in
+milliseconds) and greater than zero, the procedure waits until the
+server actually replays the WAL upto target_lsn or
+until the given time has passed. On timeout, an error is emitted.

The first sentence should mention the main functionality. Following precedents, it might be better to use something like "Waits until recovery surpasses the specified LSN. If no timeout is specified or it is set to zero, this procedure waits indefinitely for the LSN. If the timeout is specified (in milliseconds) and is greater than zero, the procedure waits until the LSN is reached or the specified time has elapsed. On timeout, or if the server is promoted before the LSN is reached, an error is emitted."

The detailed explanation that follows the above seems somewhat too verbose to me, as other functions don't have such detailed examples.

== L484
/*
+ * Set latches for processes, whose waited LSNs are already replayed. This
+ * involves spinlocks. So, we shouldn't do this under a spinlock.
+ */

Here, I'm not quite sure what specifically spinlock (or mutex?) is referring to. However, more importantly, shouldn't we explain that it is okay not to use any locks at all, rather than mentioning that spinlocks should not be used here? I found a similar case around freelist.c:238, which is written as follows.

> * Not acquiring ProcArrayLock here which is slightly icky. It's
> * actually fine because procLatch isn't ever freed, so we just can
> * potentially set the wrong process' (or no process') latch.
> */
> SetLatch(&ProcGlobal->allProcs[bgwprocno].procLatch);

= L518
+void
+WaitForLSN(XLogRecPtr targetLSN, int64 timeout)

This function is only called within the same module. I'm not sure if we need to expose it. If we do, the name should probably be more specific.

I'm not quite sure if the division of functionality between this function and its only caller is appropriate. As a possible refactoring, we could have WaitForLSN() just return the result as [reached, timedout, promoted] and delegate prerequisite checks and error reporting to the SQL function.

= L524
+ /* Shouldn't be called when shmem isn't initialized */
+ Assert(waitLSN);

Seeing this assertion, I feel that the name "waitLSN" is a bit obscure. How about renaming it to "waitLSNStates"?

= L527
+ /* Should be only called by a backend */
+ Assert(MyBackendType == B_BACKEND && MyProcNumber <= MaxBackends);

This is somewhat excessive; it causes a server crash where ending with an error would suffice. By the way, if I execute "CALL pg_wal_replay_wait('0/0')" on a logical walsender, the server crashes. The condition doesn't seem appropriate.

= L561
+errdetail("Recovery ended before replaying the target LSN %X/%X; last replay LSN %X/%X.",

I don't think we need "the" before "target" in the above message.

= L565
+ if (timeout > 0)
+ {
+ 	delay_ms = (endtime - GetCurrentTimestamp()) / 1000;
+ 	latch_events |= WL_TIMEOUT;
+ 	if (delay_ms <= 0)
+ 		break;

"timeout" is immutable in the function. Therefore, we can calculate "latch_events" before entering the loop. By the way, the name 'latch_events' seems a bit off. Latch is a kind of event the function can wait for. Therefore, something like wait_events might be more appropriate.

= L567
+ delay_ms = (endtime - GetCurrentTimestamp()) / 1000;

We can use TimestampDifferenceMilliseconds() here.

L578
+ if (rc & WL_LATCH_SET)
+ 	ResetLatch(MyLatch);

I think we usually reset latches unconditionally after returning from WaitLatch(), even when waiting for timeouts.

= L756
+{ oid => '16387', descr => 'wait for LSN with timeout',

The description seems a bit off. While timeout is mentioned, the more important characteristic that the LSN is the replay LSN is not included.

= L791
+ * WaitLSNProcInfo
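For context, a minimal usage sketch of the procedure under review, based on the behaviour described in the quoted documentation (the LSN value here is just an example, e.g. one obtained from pg_current_wal_insert_lsn() on the primary):

-- On a standby, wait up to one second for recovery to replay past the given LSN.
CALL pg_wal_replay_wait('0/306EE20', 1000);

-- With no timeout (or a timeout of zero) the call waits indefinitely.
CALL pg_wal_replay_wait('0/306EE20');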
Re: AIX support
On 19/06/2024 17:55, Srirama Kucherlapati wrote:

+/* Commenting for XLC
+ * "IBM XL C/C++ for AIX, V12.1" miscompiles, for 32-bit, some inline
+ * expansions of ginCompareItemPointers() "long long" arithmetic. To take
+ * advantage of inlining, build a 64-bit PostgreSQL.
+#if defined(__ILP32__) && defined(__IBMC__)
+#define PG_FORCE_DISABLE_INLINE
+#endif
+ */

This seems irrelevant.

+ * Ordinarily, we'd code the branches here using GNU-style local symbols, that
+ * is "1f" referencing "1:" and so on. But some people run gcc on AIX with
+ * IBM's assembler as backend, and IBM's assembler doesn't do local symbols.
+ * So hand-code the branch offsets; fortunately, all PPC instructions are
+ * exactly 4 bytes each, so it's not too hard to count.

Could you use GCC assembler to avoid this?

@@ -662,6 +666,21 @@ tas(volatile slock_t *lock)
 #if !defined(HAS_TEST_AND_SET)	/* We didn't trigger above, let's try here */

+#if defined(_AIX)	/* AIX */
+/*
+ * AIX (POWER)
+ */
+#define HAS_TEST_AND_SET
+
+#include <sys/atomic_op.h>
+
+typedef int slock_t;
+
+#define TAS(lock)	_check_lock((slock_t *) (lock), 0, 1)
+#define S_UNLOCK(lock)	_clear_lock((slock_t *) (lock), 0)
+#endif	/* _AIX */
+
+
 /* These are in sunstudio_(sparc|x86).s */
 #if defined(__SUNPRO_C) && (defined(__i386) || defined(__x86_64__) || defined(__sparc__) || defined(__sparc))

What CPU/compiler/OS configuration is this for, exactly? Could we rely on the GCC-provided __sync_lock_test_and_set() builtin function instead?

+# Allow platforms with buggy compilers to force restrict to not be
+# used by setting $FORCE_DISABLE_RESTRICT=yes in the relevant
+# template.

Surely we don't need that anymore? Or is the compiler still buggy?

Do you still care about 32-bit binaries on AIX? If not, let's make that the default in configure or a check for it, and remove the instructions on building 32-bit binaries from the docs.

Please try hard to remove any changes from the diff that are not absolutely necessary.

- Heikki
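For illustration, a hedged sketch of the alternative Heikki raises: a test-and-set spinlock built on GCC's __sync atomic builtins rather than AIX's _check_lock()/_clear_lock(). This is only a sketch of the idea, not proposed s_lock.h wording:

#define HAS_TEST_AND_SET

typedef int slock_t;

/* __sync_lock_test_and_set() returns the previous value, so a nonzero
 * result means the lock was already held; that matches the TAS() convention. */
#define TAS(lock)		__sync_lock_test_and_set((lock), 1)

/* The matching release builtin, which has release-barrier semantics. */
#define S_UNLOCK(lock)	__sync_lock_release(lock)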
Re: ON ERROR in json_query and the like
Hi, On Mon, Jun 17, 2024 at 10:07 PM Markus Winand wrote: > > On 17.06.2024, at 08:20, Amit Langote wrote: > > Agree that the documentation needs to be clear about this. I'll update > > my patch at [1] to add a note next to table 9.16.3. SQL/JSON Query > > Functions. > > Considering another branch of this thread [1] I think the > "Supported Features” appendix of the docs should mention that as well. > > The way I see it is that the standards defines two overloaded > JSON_QUERY functions, of which PostgreSQL will support only one. > In case of valid JSON, the implied CAST makes it look as though > the second variant of these function was supported as well but that > illusion totally falls apart once the JSON is not valid anymore. > > I think it affects the following feature IDs: > > - T821, Basic SQL/JSON query operators > For JSON_VALUE, JSON_TABLE and JSON_EXISTS > - T828, JSON_QUERY > > Also, how hard would it be to add the functions that accept > character strings? Is there, besides the effort, any thing else > against it? I’m asking because I believe once released it might > never be changed — for backward compatibility. Hmm, I'm starting to think that adding the implied cast to json wasn't such a great idea after all, because it might mislead the users to think that JSON parsing is transparent (respects ON ERROR), which is what you are saying, IIUC. I'm inclined to push the attached patch which puts back the restriction to allow only jsonb arguments, asking users to add an explicit cast if necessary. -- Thanks, Amit Langote v1-0001-SQL-JSON-Only-allow-arguments-of-jsonb-type.patch Description: Binary data
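To illustrate what the proposed restriction would mean in practice (a sketch, not text from the patch): callers pass jsonb explicitly, and any parse failure comes from the cast itself rather than being softened by ON ERROR.

-- Works: the argument is already jsonb.
SELECT JSON_QUERY('{"a": [1, 2, 3]}'::jsonb, '$.a' NULL ON ERROR);

-- The cast itself fails before JSON_QUERY runs, so ON ERROR cannot intercept it.
SELECT JSON_QUERY('invalid'::jsonb, '$' NULL ON ERROR);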
Re: pg_combinebackup --clone doesn't work
On 6/20/24 07:55, Peter Eisentraut wrote:
> The pg_combinebackup --clone option currently doesn't work at all. Even
> on systems where it should it be supported, you'll always get a "file
> cloning not supported on this platform" error.
>
> The reason is this checking code in pg_combinebackup.c:
>
> #if (defined(HAVE_COPYFILE) && defined(COPYFILE_CLONE_FORCE)) || \
>     (defined(__linux__) && defined(FICLONE))
>
>         if (opt.dry_run)
>             pg_log_debug("would use cloning to copy files");
>         else
>             pg_log_debug("will use cloning to copy files");
>
> #else
>         pg_fatal("file cloning not supported on this platform");
> #endif
>
> The problem is that this file does not include the appropriate OS header
> files that would define COPYFILE_CLONE_FORCE or FICLONE, respectively.
>
> The same problem also exists in copy_file.c. (That one has the right
> header file for macOS but still not for Linux.)
>

Seems like my bug, I guess :-( Chances are the original patches had the include, but it got lost during refactoring or something. Anyway, will fix shortly.

> This should be pretty easy to fix up, and we should think about some
> ways to refactor this to avoid repeating all these OS-specific things a
> few times. (The code was copied from pg_upgrade originally.)
>

Yeah. The ifdef forest got rather hard to navigate.

> But in the short term, how about some test coverage? You can exercise
> the different pg_combinebackup copy modes like this:
>
> diff --git a/src/test/perl/PostgreSQL/Test/Cluster.pm b/src/test/perl/PostgreSQL/Test/Cluster.pm
> index 83f385a4870..7e8dd024c82 100644
> --- a/src/test/perl/PostgreSQL/Test/Cluster.pm
> +++ b/src/test/perl/PostgreSQL/Test/Cluster.pm
> @@ -848,7 +848,7 @@ sub init_from_backup
>     }
>
>     local %ENV = $self->_get_env();
> -   my @combineargs = ('pg_combinebackup', '-d');
> +   my @combineargs = ('pg_combinebackup', '-d', '--clone');
>     if (exists $params{tablespace_map})
>     {
>         while (my ($olddir, $newdir) = each %{ $params{tablespace_map} })
>

For ad hoc testing? Sure, but that won't work on platforms without the clone support, right?

> We could do something like what we have for pg_upgrade, where we can use
> the environment variable PG_TEST_PG_UPGRADE_MODE to test the different
> copy modes. We could turn this into a global option. (This might also
> be useful for future work to use file cloning elsewhere, like in CREATE
> DATABASE?)
>

Yeah, this sounds like a good way to do this. Is there a good reason to have multiple different variables, one for each tool, or should we have a single PG_TEST_COPY_MODE affecting all the places?

> Also, I think it would be useful for consistency if pg_combinebackup had
> a --copy option to select the default mode, like pg_upgrade does.
>

I vaguely recall this might have been discussed in the thread about adding cloning to pg_combinebackup, but I don't recall the details why we didn't adopt the pg_upgrade way. But we can revisit that, IMO.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
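A hedged sketch of the kind of includes the report implies are missing (the exact headers and guards in the eventual fix may differ): FICLONE comes from the Linux kernel headers and COPYFILE_CLONE_FORCE from macOS's copyfile.h, so the #if test above can only succeed if those headers are pulled in first.

#ifdef HAVE_COPYFILE
#include <copyfile.h>			/* defines COPYFILE_CLONE_FORCE on macOS */
#endif
#ifdef __linux__
#include <sys/ioctl.h>
#include <linux/fs.h>			/* defines the FICLONE ioctl */
#endif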
Re: SQL/JSON query functions context_item doc entry and type requirement
On Thu, Jun 20, 2024 at 1:03 AM David G. Johnston wrote: > On Wed, Jun 19, 2024 at 8:29 AM jian he wrote: >> >> On Mon, Jun 17, 2024 at 9:05 PM Chapman Flack wrote: >> > >> > Hi, >> > >> > On 06/17/24 02:43, Amit Langote wrote: >> > > context_item expression can be a value of >> > > any type that can be cast to jsonb. This includes types >> > > such as char, text, bpchar, >> > > character varying, and bytea (with >> > > ENCODING UTF8), as well as any domains over these types. >> > >> > Reading this message in conjunction with [0] makes me think that we are >> > really talking about a function that takes a first parameter of type jsonb, >> > and behaves exactly that way (so any cast required is applied by the system >> > ahead of the call). Under those conditions, this seems like an unusual >> > sentence to add in the docs, at least until we have also documented that >> > tan's argument can be of any type that can be cast to double precision. >> > >> >> I guess it would be fine to add an unusual sentence to the docs. >> >> imagine a function: array_avg(anyarray) returns anyelement. >> array_avg calculate an array's elements's avg. like >> array('{1,2,3}'::int[]) returns 2. >> but array_avg won't make sense if the input argument is a date array. >> so mentioning in the doc: array_avg can accept anyarray, but anyarray >> cannot date array. >> seems ok. > > > There is existing wording for this: > > "The expression can be of any JSON type, any character string type, or bytea > in UTF8 encoding." > > If you add this sentence to the paragraph the link that already exists, which > simply points the reader to this sentence, becomes redundant and should be > removed. I've just posted a patch in the other thread [1] to restrict context_item to be of jsonb type, which users would need to ensure by adding an explicit cast if needed. I think that makes this clarification unnecessary. > As for table 9.16.3 - it is unwieldy already. Lets try and make the core > syntax shorter, not longer. We already have precedence in the subsequent > json_table section - give each major clause item a name then below the table > define the syntax and meaning for those names. Unlike in that section - > which probably should be modified too - context_item should have its own > description line. I had posted a patch a little while ago at [1] to render the syntax a bit differently with each function getting its own syntax synopsis. Resending it here; have addressed Jian He's comments. -- Thanks, Amit Langote [1] https://www.postgresql.org/message-id/CA%2BHiwqF2Z6FATWQV6bG9NeKYf%3D%2B%2BfOgmdbYc9gWSNJ81jfqCuA%40mail.gmail.com v1-0001-SQL-JSON-Various-fixes-to-SQL-JSON-query-function.patch Description: Binary data
Re: Conflict Detection and Resolution
On Wed, Jun 19, 2024 at 2:51 PM Ashutosh Bapat wrote: > > On Wed, Jun 19, 2024 at 12:03 PM Amit Kapila wrote: >> >> > I doubt that it would be that simple. The application will have to >> > intervene and tell one of the employees that their reservation has failed. >> > It looks natural that the first one to reserve the room should get the >> > reservation, but implementing that is more complex than resolving a >> > conflict in the database. In fact, mostly it will be handled outside >> > database. >> > >> >> Sure, the application needs some handling but I have tried to explain >> with a simple way that comes to my mind and how it can be realized >> with db involved. This is a known conflict detection method but note >> that I am not insisting to have "earliest_timestamp_wins". Even, if we >> want this we can have a separate discussion on this and add it later. >> > > It will be good to add a minimal set of conflict resolution strategies to > begin with, while designing the feature for extensibility. I imagine the > first version might just detect the conflict and throw error or do nothing. > That's already two simple conflict resolution strategies with minimal > efforts. We can add more complicated ones incrementally. > Agreed, splitting the work into multiple patches would help us to finish the easier ones first. I have thought to divide it such that in the first patch, we detect conflicts like 'insert_exists', 'update_differ', 'update_missing', and 'delete_missing' (the definition of each could be found in the initial email [1]) and throw an ERROR or write them in LOG. Various people agreed to have this as a separate committable work [2]. This can help users to detect and monitor the conflicts in a better way. I have intentionally skipped update_deleted as it would require more infrastructure and it would be helpful even without that. In the second patch, we can implement simple built-in resolution strategies like apply and skip (which can be named as remote_apply and keep_local, see [3][4] for details on these strategies) with ERROR or LOG being the default strategy. We can allow these strategies to be configured at the global and table level. In the third patch, we can add monitoring capability for conflicts and resolutions as mentioned by Jonathan [5]. Here, we can have stats like how many conflicts of a particular type have happened. In the meantime, we can keep discussing and try to reach a consensus on the timing-related resolution strategy like 'last_update_wins' and the conflict strategy 'update_deleted'. If we agree on the above, some of the work, especially the first one, could even be discussed in a separate thread. Thoughts? [1] - https://www.postgresql.org/message-id/CAJpy0uD0-DpYVMtsxK5R%3DzszXauZBayQMAYET9sWr_w0CNWXxQ%40mail.gmail.com [2] - https://www.postgresql.org/message-id/CAD21AoAa6JzqhXY02uNUPb-aTozu2RY9nMdD1%3DTUh%2BFpskkYtw%40mail.gmail.com [3] - https://www.postgresql.org/message-id/CAJpy0uD0-DpYVMtsxK5R%3DzszXauZBayQMAYET9sWr_w0CNWXxQ%40mail.gmail.com [4] - https://github.com/2ndquadrant/pglogical?tab=readme-ov-file#conflicts [5] - https://www.postgresql.org/message-id/1eb9242f-dcb6-45c3-871c-98ec324e03ef%40postgresql.org -- With Regards, Amit Kapila.
Re: 回复: An implementation of multi-key sort
On Fri, Jun 14, 2024 at 6:20 PM Yao Wang wrote:
>
> hi Tomas,
>
> So many thanks for your kind response and detailed report. I am working
> on locating issues based on your report/script and optimizing code, and
> will update later.

Hi,

This is an interesting proof-of-concept! Given the above, I've set this CF entry to "waiting on author".

Also, I see you've added Heikki as a reviewer. I'm not sure how others see it, but I consider a "reviewer" in the CF app to be someone who has volunteered to be responsible for helping move the patch forward. If there is a name in the reviewer column, it may discourage others from doing review. It can also happen that people ping reviewers to ask "There's been no review for X months -- are you planning on looking at this?", and it's not great if that message is a surprise.

Note that we prefer not to top-post in emails since it makes our web archive more difficult to read.

Thanks, John
How about add counted_by attribute for flexible-array?
Hi, hackers

While reading [1], I thought the "counted_by" attribute might also be valuable for PostgreSQL. The 'counted_by' attribute is used on flexible array members. The argument for the attribute is the name of the field member in the same structure that holds the count of elements in the flexible array. This information can be used to improve the results of the array bound sanitizer and the '__builtin_dynamic_object_size' builtin [2]. It was introduced in Clang-18 [3] and will soon be available in GCC-15.

[1] https://embeddedor.com/blog/2024/06/18/how-to-use-the-new-counted_by-attribute-in-c-and-linux/
[2] https://reviews.llvm.org/D148381
[3] https://godbolt.org/z/5qKsEhG8o

--
Regards, Japin Li
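A small, self-contained example of the attribute as described in the linked article (the pg_counted_by wrapper macro is illustrative, not an existing PostgreSQL macro):

#include <stdlib.h>

#ifndef __has_attribute
#define __has_attribute(x) 0
#endif

#if __has_attribute(counted_by)
#define pg_counted_by(member) __attribute__((counted_by(member)))
#else
#define pg_counted_by(member)
#endif

struct int_vector
{
	int		count;							/* number of valid elements */
	int		values[] pg_counted_by(count);	/* flexible array bounded by count */
};

static struct int_vector *
make_vector(int n)
{
	struct int_vector *v = malloc(sizeof(struct int_vector) + n * sizeof(int));

	if (v != NULL)
		v->count = n;	/* the sanitizer and __builtin_dynamic_object_size can now see the bound */
	return v;
}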
Re: Speed up collation cache
On Sat, Jun 15, 2024 at 6:46 AM Jeff Davis wrote: > Attached is a patch to use simplehash.h instead, which speeds things up > enough to make them fairly close (from around 15% slower to around 8%). +#define SH_HASH_KEY(tb, key) hash_uint32((uint32) key) For a static inline hash for speed reasons, we can use murmurhash32 here, which is also inline.
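For illustration, a sketch of the suggested change (assuming the surrounding simplehash instantiation from the patch stays the same; only the hash-function define differs):

#include "common/hashfn.h"		/* murmurhash32() is a static inline here */

-#define SH_HASH_KEY(tb, key)	hash_uint32((uint32) key)
+#define SH_HASH_KEY(tb, key)	murmurhash32((uint32) key)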
Re: DROP OWNED BY fails to clean out pg_init_privs grants
Hi Tom, Is there anything that could be back-patched with reasonable effort ? -- Hannu On Mon, Jun 17, 2024 at 6:37 PM Daniel Gustafsson wrote: > > > On 17 Jun 2024, at 16:56, Tom Lane wrote: > > Daniel Gustafsson writes: > > >> I wonder if this will break any tools/scripts in prod which relies on the > >> previous (faulty) behaviour. It will be interesting to see if anything > >> shows > >> up on -bugs. Off the cuff it seems like a good idea judging by where we > >> are > >> and what we can fix with it. > > > > Considering that SHARED_DEPENDENCY_INITACL has existed for less than > > two months, it's hard to believe that any outside code has grown any > > dependencies on it, much less that it couldn't be adjusted readily. > > Doh, I was thinking about it backwards, clearly not a worry =) > > >> I wonder if it's worth reverting passing the owner ID for v17 and > >> revisiting > >> that in 18 if we work on recording the ID. Shaving a few catalog lookups > >> is > >> generally worthwhile, doing them without needing the result for the next > >> five > >> years might bite us. > > > > Yeah, that was the direction I was leaning in, too. I'll commit the > > revert of that separately, so that un-reverting it shouldn't be too > > painful if we eventually decide to do so. > > Sounds good. > > -- > Daniel Gustafsson >
confusing valgrind report about tuplestore+wrapper_handler (?) on 32-bit arm
Hi,

While running valgrind on 32-bit ARM (rpi5 with debian), I got this really strange report:

==25520== Use of uninitialised value of size 4
==25520==    at 0x94A550: wrapper_handler (pqsignal.c:108)
==25520==    by 0x4D7826F: ??? (sigrestorer.S:64)
==25520==  Uninitialised value was created by a heap allocation
==25520==    at 0x8FB780: palloc (mcxt.c:1340)
==25520==    by 0x913067: tuplestore_begin_common (tuplestore.c:289)
==25520==    by 0x91310B: tuplestore_begin_heap (tuplestore.c:331)
==25520==    by 0x3EA717: ExecMaterial (nodeMaterial.c:64)
==25520==    by 0x3B2FF7: ExecProcNodeFirst (execProcnode.c:464)
==25520==    by 0x3EF73F: ExecProcNode (executor.h:274)
==25520==    by 0x3F0637: ExecMergeJoin (nodeMergejoin.c:703)
==25520==    by 0x3B2FF7: ExecProcNodeFirst (execProcnode.c:464)
==25520==    by 0x3C47DB: ExecProcNode (executor.h:274)
==25520==    by 0x3C4D4F: fetch_input_tuple (nodeAgg.c:561)
==25520==    by 0x3C8233: agg_retrieve_direct (nodeAgg.c:2364)
==25520==    by 0x3C7E07: ExecAgg (nodeAgg.c:2179)
==25520==    by 0x3B2FF7: ExecProcNodeFirst (execProcnode.c:464)
==25520==    by 0x3A5EC3: ExecProcNode (executor.h:274)
==25520==    by 0x3A8FBF: ExecutePlan (execMain.c:1646)
==25520==    by 0x3A6677: standard_ExecutorRun (execMain.c:363)
==25520==    by 0x3A644B: ExecutorRun (execMain.c:304)
==25520==    by 0x6976D3: PortalRunSelect (pquery.c:924)
==25520==    by 0x6972F7: PortalRun (pquery.c:768)
==25520==    by 0x68FA1F: exec_simple_query (postgres.c:1274)
==25520==
{
   Memcheck:Value4
   fun:wrapper_handler
   obj:/usr/lib/arm-linux-gnueabihf/libc.so.6
}
**25520** Valgrind detected 1 error(s) during execution of "select count(*) from
**25520**   (select * from tenk1 x order by x.thousand, x.twothousand, x.fivethous) x
**25520**   left join
**25520**   (select * from tenk1 y order by y.unique2) y
**25520**   on x.thousand = y.unique2 and x.twothousand = y.hundred and x.fivethous = y.unique2;"

I'm mostly used to weird valgrind stuff on this platform, but it's usually about libarmmem and (possibly) thinking it might access undefined stuff when calculating checksums etc. This seems somewhat different, so I wonder if it's something real?

But also, at the same time, it's rather weird, because the report says it's this bit in pqsignal.c

(*pqsignal_handlers[postgres_signal_arg]) (postgres_signal_arg);

but it also says the memory was allocated in tuplestore, and that's obviously very unlikely, because it does not do anything with signals. I've only seen this once, but if it's related to signals, that's not surprising - the window may be pretty narrow.

Has anyone seen or investigated a report like this?

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: Conflict Detection and Resolution
On Tue, Jun 18, 2024 at 11:34 AM Dilip Kumar wrote: > > On Tue, Jun 18, 2024 at 10:17 AM Dilip Kumar wrote: > > > > > > > > I think the unbounded size of the vector could be a problem to store > > > for each event. However, while researching previous discussions, it > > > came to our notice that we have discussed this topic in the past as > > > well in the context of standbys. For recovery_min_apply_delay, we > > > decided the clock skew is not a problem as the settings of this > > > parameter are much larger than typical time deviations between servers > > > as mentioned in docs. Similarly for casual reads [1], there was a > > > proposal to introduce max_clock_skew parameter and suggesting the user > > > to make sure to have NTP set up correctly. We have tried to check > > > other databases (like Ora and BDR) where CDR is implemented but didn't > > > find anything specific to clock skew. So, I propose to go with a GUC > > > like max_clock_skew such that if the difference of time between the > > > incoming transaction's commit time and the local time is more than > > > max_clock_skew then we raise an ERROR. It is not clear to me that > > > putting bigger effort into clock skew is worth especially when other > > > systems providing CDR feature (like Ora or BDR) for decades have not > > > done anything like vector clocks. It is possible that this is less of > > > a problem w.r.t CDR and just detecting the anomaly in clock skew is > > > good enough. > > > > I believe that if we've accepted this solution elsewhere, then we can > > also consider the same. Basically, we're allowing the application to > > set its tolerance for clock skew. And, if the skew exceeds that > > tolerance, it's the application's responsibility to synchronize; > > otherwise, an error will occur. This approach seems reasonable. > > This model can be further extended by making the apply worker wait if > the remote transaction's commit_ts is greater than the local > timestamp. This ensures that no local transactions occurring after the > remote transaction appear to have happened earlier due to clock skew > instead we make them happen before the remote transaction by delaying > the remote transaction apply. Essentially, by having the remote > application wait until the local timestamp matches the remote > transaction's timestamp, we ensure that the remote transaction, which > seems to occur after concurrent local transactions due to clock skew, > is actually applied after those transactions. > > With this model, there should be no ordering errors from the > application's perspective as well if synchronous commit is enabled. > The transaction initiated by the publisher cannot be completed until > it is applied to the synchronous subscriber. This ensures that if the > subscriber's clock is lagging behind the publisher's clock, the > transaction will not be applied until the subscriber's local clock is > in sync, preventing the transaction from being completed out of order. > As per the discussion, this idea will help us to resolve transaction ordering issues due to clock skew. I was thinking of having two variables max_clock_skew (indicates how much clock skew is acceptable), max_clock_skew_options: ERROR, LOG, WAIT (indicates the action we need to take once the clock skew is detected). There could be multiple ways to provide these parameters, one is providing them as GUCs, and another at the subscription or the table level. 
I am wondering whether users would only care about a table or a set of tables, or whether they would like to set such variables at the system level. We already have a SKIP option (which allows us to skip transactions up to a particular LSN) at the subscription level, so I am wondering whether it makes sense to provide these new conflict-resolution parameters at the same level as well.

--
With Regards, Amit Kapila.
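A hedged sketch to make the two proposed knobs concrete (the function and parameter names here, including max_clock_skew_ms, are illustrative only; nothing like this exists in the tree):

/*
 * If the remote commit timestamp is ahead of the local clock, either wait it
 * out or raise an error, depending on how far ahead it is.
 */
static void
handle_commit_ts_skew(TimestampTz remote_commit_ts, int max_clock_skew_ms)
{
	long		ahead_ms = TimestampDifferenceMilliseconds(GetCurrentTimestamp(),
														   remote_commit_ts);

	if (ahead_ms <= 0)
		return;					/* remote commit is not in our future */

	if (ahead_ms > max_clock_skew_ms)
		ereport(ERROR,
				(errmsg("remote commit timestamp is %ld ms ahead of the local clock",
						ahead_ms)));

	/* wait until the local clock passes the remote commit timestamp */
	pg_usleep(ahead_ms * 1000L);
}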
Re: What is a typical precision of gettimeofday()?
(resending to list and other CC:s )

Hi Tom

This is my current patch which also adds running % and optionally uses faster way to count leading zeros, though I did not see a change from that.

It also bucketizes first 128 ns to get better overview of exact behaviour.

We may want to put reporting this behind a flag

---
Hannu

On Wed, Jun 19, 2024 at 6:36 PM Tom Lane wrote:
>
> Peter Eisentraut writes:
> > AFAICT, pg_test_timing doesn't use gettimeofday(), so this doesn't
> > really address the original question.
>
> It's not exactly hard to make it do so (see attached).
>
> I tried this on several different machines, and my conclusion is that
> gettimeofday() reports full microsecond precision on any platform
> anybody is likely to be running PG on today. Even my one surviving
> pet dinosaur, NetBSD 10 on PowerPC Mac (mamba), shows results like
> this:
>
> $ ./pg_test_timing
> Testing timing overhead for 3 seconds.
> Per loop time including overhead: 901.41 ns
> Histogram of timing durations:
>    < us   % of total      count
>       1      10.46074     348148
>       2      89.51495    2979181
>       4       0.00574        191
>       8       0.00430        143
>      16       0.00691        230
>      32       0.00376        125
>      64       0.00012          4
>     128       0.00303        101
>     256       0.00027          9
>     512       0.00009          3
>    1024       0.00009          3
>
> I also modified pg_test_timing to measure nanoseconds not
> microseconds (second patch attached), and got this:
>
> $ ./pg_test_timing
> Testing timing overhead for 3 seconds.
> Per loop time including overhead: 805.50 ns
> Histogram of timing durations:
>    < ns   % of total      count
>       1      19.84234     739008
>       2       0.00000          0
>       4       0.00000          0
>       8       0.00000          0
>      16       0.00000          0
>      32       0.00000          0
>      64       0.00000          0
>     128       0.00000          0
>     256       0.00000          0
>     512       0.00000          0
>    1024      80.14013    2984739
>    2048       0.00078         29
>    4096       0.00658        245
>    8192       0.00290        108
>   16384       0.00252         94
>   32768       0.00250         93
>   65536       0.00016          6
>  131072       0.00185         69
>  262144       0.00008          3
>  524288       0.00008          3
> 1048576       0.00008          3
>
> confirming that when the result changes it generally does so by 1usec.
>
> Applying just the second patch, I find that clock_gettime on this
> old hardware seems to be limited to 1us resolution, but on my more
> modern machines (mac M1, x86_64) it can tick at 40ns or less.
> Even a raspberry pi 4 shows
>
> $ ./pg_test_timing
> Testing timing overhead for 3 seconds.
> Per loop time including overhead: 69.12 ns
> Histogram of timing durations:
>    < ns   % of total      count
>       1       0.00000          0
>       2       0.00000          0
>       4       0.00000          0
>       8       0.00000          0
>      16       0.00000          0
>      32       0.00000          0
>      64      37.59583   16317040
>     128      62.38568   27076131
>     256       0.01674       7265
>     512       0.00002          8
>    1024       0.00000          0
>    2048       0.00000          0
>    4096       0.00153        662
>    8192       0.00019         83
>   16384       0.00001          3
>   32768       0.00001          5
>
> suggesting that the clock_gettime resolution is better than 64 ns.
>
> So I concur with Hannu that it's time to adjust pg_test_timing to
> resolve nanoseconds not microseconds. I gather he's created a
> patch that does more than mine below, so I'll wait for that.
>
> regards, tom lane

diff --git a/src/bin/pg_test_timing/pg_test_timing.c b/src/bin/pg_test_timing/pg_test_timing.c
index c29d6f8762..20b2785f50 100644
--- a/src/bin/pg_test_timing/pg_test_timing.c
+++ b/src/bin/pg_test_timing/pg_test_timing.c
@@ -19,9 +19,12 @@ static void handle_args(int argc, char *argv[]);
 static uint64 test_timing(unsigned int duration);
 static void output(uint64 loop_count);

-/* record duration in powers of 2 microseconds */
+/* record duration in powers of 2 nanoseconds */
 long long int histogram[32];

+/* record duration of first 128 ns directly */
+long long int direct_histogram[128];
+
 int
 main(int argc, char *argv[])
 {
@@ -130,10 +133,10 @@ test_timing(unsigned int duration)
 				end_time,
 				temp;

-	total_time = duration > 0 ? duration * INT64CONST(1000000) : 0;
+	total_time = duration > 0 ? duration * INT64CONST(1000000000) : 0;

 	INSTR_TIME_SET_CURRENT(start_time);
-	cur = INSTR_TIME_GET_MICROSEC(start_time);
+	cur = INSTR_TIME_GET_NANOSEC(start_time);

 	while (time_elapsed < total_time)
 	{
@@ -142,7 +145,7 @@ test_timing(unsigned int duration)
 		prev = cur;
 		INSTR_TIME_SET_CURRENT(temp);
-		cur = INSTR_TIME_GET_MICROSEC(temp);
+		cur = INSTR_TIME_GET_NANOSEC(temp);
 		diff = cur - prev;

 		/* Did time go backwards? */
@@
Re: What is a typical precision of gettimeofday()?
Another thing I changed in reporting was to report <= ns instead of < ns This was inspired by not wanting to report "zero ns" as "< 1 ns" and easiest was to change them all to <= On Thu, Jun 20, 2024 at 12:41 PM Hannu Krosing wrote: > > (resending to list and other CC:s ) > > Hi Tom > > This is my current patch which also adds running % and optionally uses > faster way to count leading zeros, though I did not see a change from > that. > > It also bucketizes first 128 ns to get better overview of exact behaviour. > > We may want to put reporting this behind a flag > > --- > Hannu > > On Wed, Jun 19, 2024 at 6:36 PM Tom Lane wrote: > > > > Peter Eisentraut writes: > > > AFAICT, pg_test_timing doesn't use gettimeofday(), so this doesn't > > > really address the original question. > > > > It's not exactly hard to make it do so (see attached). > > > > I tried this on several different machines, and my conclusion is that > > gettimeofday() reports full microsecond precision on any platform > > anybody is likely to be running PG on today. Even my one surviving > > pet dinosaur, NetBSD 10 on PowerPC Mac (mamba), shows results like > > this: > > > > $ ./pg_test_timing > > Testing timing overhead for 3 seconds. > > Per loop time including overhead: 901.41 ns > > Histogram of timing durations: > > < us % of total count > > 1 10.46074 348148 > > 2 89.514952979181 > > 4 0.00574191 > > 8 0.00430143 > > 16 0.00691230 > > 32 0.00376125 > > 64 0.00012 4 > >128 0.00303101 > >256 0.00027 9 > >512 0.9 3 > > 1024 0.9 3 > > > > I also modified pg_test_timing to measure nanoseconds not > > microseconds (second patch attached), and got this: > > > > $ ./pg_test_timing > > Testing timing overhead for 3 seconds. > > Per loop time including overhead: 805.50 ns > > Histogram of timing durations: > > < ns % of total count > > 1 19.84234 739008 > > 2 0.0 0 > > 4 0.0 0 > > 8 0.0 0 > > 16 0.0 0 > > 32 0.0 0 > > 64 0.0 0 > >128 0.0 0 > >256 0.0 0 > >512 0.0 0 > > 1024 80.140132984739 > > 2048 0.00078 29 > > 4096 0.00658245 > > 8192 0.00290108 > > 16384 0.00252 94 > > 32768 0.00250 93 > > 65536 0.00016 6 > > 131072 0.00185 69 > > 262144 0.8 3 > > 524288 0.8 3 > > 1048576 0.8 3 > > > > confirming that when the result changes it generally does so by 1usec. > > > > Applying just the second patch, I find that clock_gettime on this > > old hardware seems to be limited to 1us resolution, but on my more > > modern machines (mac M1, x86_64) it can tick at 40ns or less. > > Even a raspberry pi 4 shows > > > > $ ./pg_test_timing > > Testing timing overhead for 3 seconds. > > Per loop time including overhead: 69.12 ns > > Histogram of timing durations: > > < ns % of total count > > 1 0.0 0 > > 2 0.0 0 > > 4 0.0 0 > > 8 0.0 0 > > 16 0.0 0 > > 32 0.0 0 > > 64 37.59583 16317040 > >128 62.38568 27076131 > >256 0.01674 7265 > >512 0.2 8 > > 1024 0.0 0 > > 2048 0.0 0 > > 4096 0.00153662 > > 8192 0.00019 83 > > 16384 0.1 3 > > 32768 0.1 5 > > > > suggesting that the clock_gettime resolution is better than 64 ns. > > > > So I concur with Hannu that it's time to adjust pg_test_timing to > > resolve nanoseconds not microseconds. I gather he's created a > > patch that does more than mine below, so I'll wait for that. > > > > regards, tom lane > >
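As an illustration of the bucketing described above (a sketch only; the function and array names are made up, and the real patch may differ): the power-of-2 histogram bucket can be found by counting leading zeros with pg_leftmost_one_pos64() from port/pg_bitutils.h instead of shifting in a loop, and durations below 128 ns can additionally be counted exactly.

static inline void
record_duration_ns(int64 diff, long long histogram[64], long long direct_histogram[128])
{
	/* bucket i counts durations that need i bits, i.e. durations < 2^i ns */
	int			bucket = (diff > 0) ? pg_leftmost_one_pos64(diff) + 1 : 0;

	histogram[bucket]++;

	if (diff < 128)
		direct_histogram[diff]++;	/* exact counts for the first 128 ns */
}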
Re: Extension security improvement: Add support for extensions with an owned schema
On Wed, 19 Jun 2024 at 17:22, David G. Johnston wrote: > > On Wed, Jun 19, 2024 at 8:19 AM Jelte Fennema-Nio wrote: > >> >> Because part of it would >> only be relevant once we support upgrading from PG18. So for now the >> upgrade_code I haven't actually run. > > > Does it apply against v16? If so, branch off there, apply it, then upgrade > from the v16 branch to master. I realized it's possible to do an "upgrade" with pg_upgrade from v17 to v17. So I was able to test both the pre and post PG18 upgrade logic manually by changing the version in this line: if (fout->remoteVersion >= 18) As expected the new pg_upgrade code was severely broken. Attached is a new patch where the pg_upgrade code now actually works. v3-0001-Add-support-for-extensions-with-an-owned-schema.patch Description: Binary data
Re: confusing valgrind report about tuplestore+wrapper_handler (?) on 32-bit arm
Em qui., 20 de jun. de 2024 às 07:28, Tomas Vondra < tomas.von...@enterprisedb.com> escreveu: > Hi, > > While running valgrind on 32-bit ARM (rpi5 with debian), I got this > really strange report: > > > ==25520== Use of uninitialised value of size 4 > ==25520==at 0x94A550: wrapper_handler (pqsignal.c:108) > ==25520==by 0x4D7826F: ??? (sigrestorer.S:64) > ==25520== Uninitialised value was created by a heap allocation > ==25520==at 0x8FB780: palloc (mcxt.c:1340) > ==25520==by 0x913067: tuplestore_begin_common (tuplestore.c:289) > ==25520==by 0x91310B: tuplestore_begin_heap (tuplestore.c:331) > ==25520==by 0x3EA717: ExecMaterial (nodeMaterial.c:64) > ==25520==by 0x3B2FF7: ExecProcNodeFirst (execProcnode.c:464) > ==25520==by 0x3EF73F: ExecProcNode (executor.h:274) > ==25520==by 0x3F0637: ExecMergeJoin (nodeMergejoin.c:703) > ==25520==by 0x3B2FF7: ExecProcNodeFirst (execProcnode.c:464) > ==25520==by 0x3C47DB: ExecProcNode (executor.h:274) > ==25520==by 0x3C4D4F: fetch_input_tuple (nodeAgg.c:561) > ==25520==by 0x3C8233: agg_retrieve_direct (nodeAgg.c:2364) > ==25520==by 0x3C7E07: ExecAgg (nodeAgg.c:2179) > ==25520==by 0x3B2FF7: ExecProcNodeFirst (execProcnode.c:464) > ==25520==by 0x3A5EC3: ExecProcNode (executor.h:274) > ==25520==by 0x3A8FBF: ExecutePlan (execMain.c:1646) > ==25520==by 0x3A6677: standard_ExecutorRun (execMain.c:363) > ==25520==by 0x3A644B: ExecutorRun (execMain.c:304) > ==25520==by 0x6976D3: PortalRunSelect (pquery.c:924) > ==25520==by 0x6972F7: PortalRun (pquery.c:768) > ==25520==by 0x68FA1F: exec_simple_query (postgres.c:1274) > ==25520== > { > >Memcheck:Value4 >fun:wrapper_handler >obj:/usr/lib/arm-linux-gnueabihf/libc.so.6 > } > **25520** Valgrind detected 1 error(s) during execution of "select > count(*) from > **25520** (select * from tenk1 x order by x.thousand, x.twothousand, > x.fivethous) x > **25520** left join > **25520** (select * from tenk1 y order by y.unique2) y > **25520** on x.thousand = y.unique2 and x.twothousand = y.hundred and > x.fivethous = y.unique2;" > > > I'm mostly used to weird valgrind stuff on this platform, but it's > usually about libarmmmem and (possibly) thinking it might access > undefined stuff when calculating checksums etc. > > This seems somewhat different, so I wonder if it's something real? It seems like a false positive to me. According to valgrind's documentation: https://valgrind.org/docs/manual/mc-manual.html#mc-manual.value " This can lead to false positive errors, as the shared memory can be initialised via a first mapping, and accessed via another mapping. The access via this other mapping will have its own V bits, which have not been changed when the memory was initialised via the first mapping. The bypass for these false positives is to use Memcheck's client requests VALGRIND_MAKE_MEM_DEFINED and VALGRIND_MAKE_MEM_UNDEFINED to inform Memcheck about what your program does (or what another process does) to these shared memory mappings. " best regards, Ranier Vilela
Re: Conflict Detection and Resolution
On Thu, Jun 20, 2024 at 3:21 PM Amit Kapila wrote: > On Wed, Jun 19, 2024 at 2:51 PM Ashutosh Bapat > wrote: > > > > On Wed, Jun 19, 2024 at 12:03 PM Amit Kapila > wrote: > >> > >> > I doubt that it would be that simple. The application will have to > intervene and tell one of the employees that their reservation has failed. > It looks natural that the first one to reserve the room should get the > reservation, but implementing that is more complex than resolving a > conflict in the database. In fact, mostly it will be handled outside > database. > >> > > >> > >> Sure, the application needs some handling but I have tried to explain > >> with a simple way that comes to my mind and how it can be realized > >> with db involved. This is a known conflict detection method but note > >> that I am not insisting to have "earliest_timestamp_wins". Even, if we > >> want this we can have a separate discussion on this and add it later. > >> > > > > It will be good to add a minimal set of conflict resolution strategies > to begin with, while designing the feature for extensibility. I imagine the > first version might just detect the conflict and throw error or do nothing. > That's already two simple conflict resolution strategies with minimal > efforts. We can add more complicated ones incrementally. > > > > Agreed, splitting the work into multiple patches would help us to > finish the easier ones first. > > I have thought to divide it such that in the first patch, we detect > conflicts like 'insert_exists', 'update_differ', 'update_missing', and > 'delete_missing' (the definition of each could be found in the initial > email [1]) and throw an ERROR or write them in LOG. Various people > agreed to have this as a separate committable work [2]. This can help > users to detect and monitor the conflicts in a better way. I have > intentionally skipped update_deleted as it would require more > infrastructure and it would be helpful even without that. > Since we are in the initial months of release, it will be good to take a stock of whether the receiver receives all the information needed for most (if not all) of the conflict detection and resolution strategies. If there are any missing pieces, we may want to add those in PG18 so that improved conflict detection and resolution on a higher version receiver can still work. > > In the second patch, we can implement simple built-in resolution > strategies like apply and skip (which can be named as remote_apply and > keep_local, see [3][4] for details on these strategies) with ERROR or > LOG being the default strategy. We can allow these strategies to be > configured at the global and table level. > > In the third patch, we can add monitoring capability for conflicts and > resolutions as mentioned by Jonathan [5]. Here, we can have stats like > how many conflicts of a particular type have happened. > That looks like a plan. Thanks for chalking it out. -- Best Wishes, Ashutosh Bapat
Re: suspicious valgrind reports about radixtree/tidstore on arm64
Em qua., 19 de jun. de 2024 às 20:52, Tom Lane escreveu:
> I wrote:
> > I hypothesize that the reason we're not seeing equivalent failures
> > on x86_64 is one of
>
> > 1. x86_64 valgrind is stupider than aarch64, and fails to track that
> > the contents of the SIMD registers are only partially defined.
>
> > 2. x86_64 valgrind is smarter than aarch64, and is able to see
> > that the "mask off invalid entries" step removes all the
> > potentially-uninitialized bits.
>
> Hah: it's the second case. If I patch radixtree.h as attached,
> then x86_64 valgrind complains about
>
> ==00:00:00:32.759 247596== Conditional jump or move depends on
> uninitialised value(s)
> ==00:00:00:32.759 247596==    at 0x52F668: local_ts_node_16_search_eq
> (radixtree.h:1018)
>
> showing that it knows that the result of vector8_highbit_mask is
> only partly defined.

I wouldn't be surprised if RT_NODE_16_GET_INSERTPOS (src/include/lib/radixtree.h) suffers from the same problem, even with the Assert trying to protect it. Does the fix not apply here too?

best regards,
Ranier Vilela
Re: confusing valgrind report about tuplestore+wrapper_handler (?) on 32-bit arm
On 6/20/24 13:32, Ranier Vilela wrote: > Em qui., 20 de jun. de 2024 às 07:28, Tomas Vondra < > tomas.von...@enterprisedb.com> escreveu: > >> Hi, >> >> While running valgrind on 32-bit ARM (rpi5 with debian), I got this >> really strange report: >> >> >> ==25520== Use of uninitialised value of size 4 >> ==25520==at 0x94A550: wrapper_handler (pqsignal.c:108) >> ==25520==by 0x4D7826F: ??? (sigrestorer.S:64) >> ==25520== Uninitialised value was created by a heap allocation >> ==25520==at 0x8FB780: palloc (mcxt.c:1340) >> ==25520==by 0x913067: tuplestore_begin_common (tuplestore.c:289) >> ==25520==by 0x91310B: tuplestore_begin_heap (tuplestore.c:331) >> ==25520==by 0x3EA717: ExecMaterial (nodeMaterial.c:64) >> ==25520==by 0x3B2FF7: ExecProcNodeFirst (execProcnode.c:464) >> ==25520==by 0x3EF73F: ExecProcNode (executor.h:274) >> ==25520==by 0x3F0637: ExecMergeJoin (nodeMergejoin.c:703) >> ==25520==by 0x3B2FF7: ExecProcNodeFirst (execProcnode.c:464) >> ==25520==by 0x3C47DB: ExecProcNode (executor.h:274) >> ==25520==by 0x3C4D4F: fetch_input_tuple (nodeAgg.c:561) >> ==25520==by 0x3C8233: agg_retrieve_direct (nodeAgg.c:2364) >> ==25520==by 0x3C7E07: ExecAgg (nodeAgg.c:2179) >> ==25520==by 0x3B2FF7: ExecProcNodeFirst (execProcnode.c:464) >> ==25520==by 0x3A5EC3: ExecProcNode (executor.h:274) >> ==25520==by 0x3A8FBF: ExecutePlan (execMain.c:1646) >> ==25520==by 0x3A6677: standard_ExecutorRun (execMain.c:363) >> ==25520==by 0x3A644B: ExecutorRun (execMain.c:304) >> ==25520==by 0x6976D3: PortalRunSelect (pquery.c:924) >> ==25520==by 0x6972F7: PortalRun (pquery.c:768) >> ==25520==by 0x68FA1F: exec_simple_query (postgres.c:1274) >> ==25520== >> { >> >>Memcheck:Value4 >>fun:wrapper_handler >>obj:/usr/lib/arm-linux-gnueabihf/libc.so.6 >> } >> **25520** Valgrind detected 1 error(s) during execution of "select >> count(*) from >> **25520** (select * from tenk1 x order by x.thousand, x.twothousand, >> x.fivethous) x >> **25520** left join >> **25520** (select * from tenk1 y order by y.unique2) y >> **25520** on x.thousand = y.unique2 and x.twothousand = y.hundred and >> x.fivethous = y.unique2;" >> >> >> I'm mostly used to weird valgrind stuff on this platform, but it's >> usually about libarmmmem and (possibly) thinking it might access >> undefined stuff when calculating checksums etc. >> >> This seems somewhat different, so I wonder if it's something real? > > It seems like a false positive to me. > > According to valgrind's documentation: > https://valgrind.org/docs/manual/mc-manual.html#mc-manual.value > > " This can lead to false positive errors, as the shared memory can be > initialised via a first mapping, and accessed via another mapping. The > access via this other mapping will have its own V bits, which have not been > changed when the memory was initialised via the first mapping. The bypass > for these false positives is to use Memcheck's client requests > VALGRIND_MAKE_MEM_DEFINED and VALGRIND_MAKE_MEM_UNDEFINED to inform > Memcheck about what your program does (or what another process does) to > these shared memory mappings. " > But that's about shared memory, and the report has nothing to do with shared memory AFAICS. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: jsonapi type fixups
On 2024-06-18 Tu 7:48 AM, Peter Eisentraut wrote: I have this patch series that fixes up the types of the new incremental JSON API a bit. Specifically, it uses "const" throughout so that the top-level entry points such as pg_parse_json_incremental() can declare their arguments as const char * instead of just char *. This just works, it doesn't require any new casting tricks. In fact, it removes a few unconstify() calls. Also, a few arguments and variables that relate to object sizes should be size_t rather than int. At this point, this mainly makes the API better self-documenting. I don't think it actually works to parse larger than 2 GB chunks (not tested). I think this is mostly OK. The change at line 1857 of jsonapi.c looks dubious, though. The pointer variable p looks anything but constant. Perhaps I'm misunderstanding. It would also be nice to reword the comment at line 3142 of jsonfuncs.c, so it can still fit on one line. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
Re: confusing valgrind report about tuplestore+wrapper_handler (?) on 32-bit arm
Em qui., 20 de jun. de 2024 às 08:54, Tomas Vondra < tomas.von...@enterprisedb.com> escreveu: > > > On 6/20/24 13:32, Ranier Vilela wrote: > > Em qui., 20 de jun. de 2024 às 07:28, Tomas Vondra < > > tomas.von...@enterprisedb.com> escreveu: > > > >> Hi, > >> > >> While running valgrind on 32-bit ARM (rpi5 with debian), I got this > >> really strange report: > >> > >> > >> ==25520== Use of uninitialised value of size 4 > >> ==25520==at 0x94A550: wrapper_handler (pqsignal.c:108) > >> ==25520==by 0x4D7826F: ??? (sigrestorer.S:64) > >> ==25520== Uninitialised value was created by a heap allocation > >> ==25520==at 0x8FB780: palloc (mcxt.c:1340) > >> ==25520==by 0x913067: tuplestore_begin_common (tuplestore.c:289) > >> ==25520==by 0x91310B: tuplestore_begin_heap (tuplestore.c:331) > >> ==25520==by 0x3EA717: ExecMaterial (nodeMaterial.c:64) > >> ==25520==by 0x3B2FF7: ExecProcNodeFirst (execProcnode.c:464) > >> ==25520==by 0x3EF73F: ExecProcNode (executor.h:274) > >> ==25520==by 0x3F0637: ExecMergeJoin (nodeMergejoin.c:703) > >> ==25520==by 0x3B2FF7: ExecProcNodeFirst (execProcnode.c:464) > >> ==25520==by 0x3C47DB: ExecProcNode (executor.h:274) > >> ==25520==by 0x3C4D4F: fetch_input_tuple (nodeAgg.c:561) > >> ==25520==by 0x3C8233: agg_retrieve_direct (nodeAgg.c:2364) > >> ==25520==by 0x3C7E07: ExecAgg (nodeAgg.c:2179) > >> ==25520==by 0x3B2FF7: ExecProcNodeFirst (execProcnode.c:464) > >> ==25520==by 0x3A5EC3: ExecProcNode (executor.h:274) > >> ==25520==by 0x3A8FBF: ExecutePlan (execMain.c:1646) > >> ==25520==by 0x3A6677: standard_ExecutorRun (execMain.c:363) > >> ==25520==by 0x3A644B: ExecutorRun (execMain.c:304) > >> ==25520==by 0x6976D3: PortalRunSelect (pquery.c:924) > >> ==25520==by 0x6972F7: PortalRun (pquery.c:768) > >> ==25520==by 0x68FA1F: exec_simple_query (postgres.c:1274) > >> ==25520== > >> { > >> > >>Memcheck:Value4 > >>fun:wrapper_handler > >>obj:/usr/lib/arm-linux-gnueabihf/libc.so.6 > >> } > >> **25520** Valgrind detected 1 error(s) during execution of "select > >> count(*) from > >> **25520** (select * from tenk1 x order by x.thousand, x.twothousand, > >> x.fivethous) x > >> **25520** left join > >> **25520** (select * from tenk1 y order by y.unique2) y > >> **25520** on x.thousand = y.unique2 and x.twothousand = y.hundred and > >> x.fivethous = y.unique2;" > >> > >> > >> I'm mostly used to weird valgrind stuff on this platform, but it's > >> usually about libarmmmem and (possibly) thinking it might access > >> undefined stuff when calculating checksums etc. > >> > >> This seems somewhat different, so I wonder if it's something real? > > > > It seems like a false positive to me. > > > > According to valgrind's documentation: > > https://valgrind.org/docs/manual/mc-manual.html#mc-manual.value > > > > " This can lead to false positive errors, as the shared memory can be > > initialised via a first mapping, and accessed via another mapping. The > > access via this other mapping will have its own V bits, which have not > been > > changed when the memory was initialised via the first mapping. The bypass > > for these false positives is to use Memcheck's client requests > > VALGRIND_MAKE_MEM_DEFINED and VALGRIND_MAKE_MEM_UNDEFINED to inform > > Memcheck about what your program does (or what another process does) to > > these shared memory mappings. " > > > > But that's about shared memory, and the report has nothing to do with > shared memory AFAICS. > You can try once: Selecting --expensive-definedness-checks=yes causes Memcheck to use the most accurate analysis possible. 
This minimises false error rates but can cause up to 30% performance degradation. I did a search through my reports and none refer to this particular source. best regards, Ranier Vilela
Custom TupleTableSlotOps while Initializing Custom Scan
Hello Team, Good Day,

I have been working on adding a CustomScanState object to the executor state in my project. As part of CustomScanState, I execute queries and store their results in a Tuplestorestate object. After storing all tuples in the Tuplestorestate, I retrieve each tuple and place it in the TupleTableSlot using the tuplestore_gettupleslot() function. However, I encounter an error: "trying to store a minimal tuple into the wrong type of slot."

Upon debugging, I discovered that the TupleTableSlot only holds virtual tuples (tupleTableSlot->tts_ops is set to TTSOpsVirtual). In contrast, tuplestore_gettupleslot() calls ExecStoreMinimalTuple(), which expects TupleTableSlotOps of type TTSOpsMinimalTuple.

Further investigation revealed that in the ExecInitCustomScan() function within the nodeCustom.c source file, where the ScanTupleSlot and ResultTupleSlot are initialized, users can choose custom slots by setting slotOps in CustomScanState. We initialize the ScanTupleSlot based on the user-specified slotOps, but for the ResultTupleSlot we proceed with TTSOpsVirtual instead of the custom slotOps, which is causing the issue.

Is this behavior expected? Is there a way to store tuples in slots according to the TupleTableSlot type? I found a function ExecForceStoreMinimalTuple() which could be used in my case. We need to pass the MinimalTuple to this function, but I was unable to find a way to fetch the tuple from the tuplestorestate. We do have the tuplestore_gettuple() function to get the minimal tuple, but it is a static function; is there any other function like that?

Below is the code snippet of ExecInitCustomScan(); for simplicity I removed some code in the function. I took it from the nodeCustom.c file in the PG source.

CustomScanState *
ExecInitCustomScan(CustomScan *cscan, EState *estate, int eflags)
{
    CustomScanState *css;
    const TupleTableSlotOps *slotOps;

    css = castNode(CustomScanState, cscan->methods->CreateCustomScanState(cscan));

    // --- CODE STARTED ---

    /*
     * Use a custom slot if specified in CustomScanState or use virtual slot
     * otherwise.
     */
    slotOps = css->slotOps;
    if (!slotOps)
        slotOps = &TTSOpsVirtual;

    if (cscan->custom_scan_tlist != NIL || scan_rel == NULL)
    {
        /* Here we are using the slotOps provided by the user */
        ExecInitScanTupleSlot(estate, &css->ss, scan_tupdesc, slotOps);
    }
    else
    {
        /* Here we are using the slotOps provided by the user */
        ExecInitScanTupleSlot(estate, &css->ss, RelationGetDescr(scan_rel), slotOps);
    }

    /* Here TTSOpsVirtual is hard-coded */
    ExecInitResultTupleSlotTL(&css->ss.ps, &TTSOpsVirtual);

    // --- CODE ENDED ---
}
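A possible workaround, assuming it is acceptable to copy each tuple once: fetch from the tuplestore into a private TTSOpsMinimalTuple slot (which is what tuplestore_gettupleslot() can store into), then convert it into whatever slot type the node actually uses via ExecCopySlot(). This is only a sketch; the function name and the css/tstore parameters are hypothetical stand-ins for the CustomScanState and Tuplestorestate described above, and in a real node the temporary slot would be created once at initialization time and reused rather than rebuilt on every call.

#include "postgres.h"

#include "executor/tuptable.h"
#include "nodes/execnodes.h"
#include "utils/tuplestore.h"

/* Sketch only: return the next tuple from "tstore" in the node's result slot. */
static TupleTableSlot *
fetch_next_from_tuplestore(CustomScanState *css, Tuplestorestate *tstore)
{
	TupleTableSlot *resultslot = css->ss.ps.ps_ResultTupleSlot;
	TupleTableSlot *tmpslot;

	/* a slot whose ops match what tuplestore_gettupleslot() stores */
	tmpslot = MakeSingleTupleTableSlot(resultslot->tts_tupleDescriptor,
									   &TTSOpsMinimalTuple);

	if (!tuplestore_gettupleslot(tstore, true, false, tmpslot))
	{
		ExecDropSingleTupleTableSlot(tmpslot);
		return ExecClearTuple(resultslot);	/* no more tuples */
	}

	/* ExecCopySlot() converts into the destination slot's own format */
	ExecCopySlot(resultslot, tmpslot);
	ExecDropSingleTupleTableSlot(tmpslot);

	return resultslot;
}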
Re: suspicious valgrind reports about radixtree/tidstore on arm64
On Thu, Jun 20, 2024 at 4:58 PM John Naylor wrote: > > On Thu, Jun 20, 2024 at 8:12 AM Masahiko Sawada wrote: > > > On Thu, Jun 20, 2024 at 7:54 AM Tom Lane wrote: > > > > > > I don't know if there's any reason why the current order > > > is preferable.) > > > > IIUC there is no particular reason for the current order in RT_NODE_48. > > Yeah. I found that simply swapping them enables clang to avoid > double-initialization, but gcc still can't figure it out and must be > told to stop at slot_idxs[]. I'd prefer to do it that way and document > that slot_idxs is purposefully the last member of the fixed part of > the struct. +1 Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
Re: jsonapi type fixups
On 2024-06-20 Th 8:05 AM, Andrew Dunstan wrote: On 2024-06-18 Tu 7:48 AM, Peter Eisentraut wrote: I have this patch series that fixes up the types of the new incremental JSON API a bit. Specifically, it uses "const" throughout so that the top-level entry points such as pg_parse_json_incremental() can declare their arguments as const char * instead of just char *. This just works, it doesn't require any new casting tricks. In fact, it removes a few unconstify() calls. Also, a few arguments and variables that relate to object sizes should be size_t rather than int. At this point, this mainly makes the API better self-documenting. I don't think it actually works to parse larger than 2 GB chunks (not tested). I think this is mostly OK. The change at line 1857 of jsonapi.c looks dubious, though. The pointer variable p looks anything but constant. Perhaps I'm misunderstanding. Ignore this comment, moment of brain fade. Of course it's the string that's constant, not the pointer. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
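In case it helps anyone skimming the thread, the distinction behind that last remark is just the usual C one; a trivial, self-contained illustration, unrelated to the jsonapi code itself:

static void
const_example(const char *s)
{
	const char *p = s;		/* the characters are const, the pointer is not */

	p++;					/* fine: the pointer itself may change */
	/* *p = 'x'; */			/* would not compile: the pointed-to chars are const */
	(void) p;
}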
Re: Pgoutput not capturing the generated columns
On Thu, 20 Jun 2024 at 12:52, vignesh C wrote: > > On Wed, 19 Jun 2024 at 21:43, Peter Eisentraut wrote: > > > > On 19.06.24 13:22, Shubham Khanna wrote: > > > All the comments are handled. > > > > > > The attached Patch contains all the suggested changes. > > > > Please also take a look at the proposed patch for virtual generated > > columns [0] and consider how that would affect your patch. I think your > > feature can only replicate *stored* generated columns. So perhaps the > > documentation and terminology in your patch should reflect that. > > This patch is unable to manage virtual generated columns because it > stores NULL values for them. Along with documentation the initial sync > command being generated also should be changed to sync data > exclusively for stored generated columns, omitting virtual ones. I > suggest treating these changes as a separate patch(0003) for future > merging or a separate commit, depending on the order of patch > acceptance. > I have addressed the issue and updated the documentation accordingly. And created a new 0003 patch. Please refer to v9-0003 patch for the same in [1]. [1]: https://www.postgresql.org/message-id/CANhcyEXmjLEPNgOSAtjS4YGb9JvS8w-SO9S%2BjRzzzXo2RavNWw%40mail.gmail.com Thanks and Regards, Shlok Kyal
Re: Pluggable cumulative statistics
Hi, On Thu, Jun 13, 2024 at 04:59:50PM +0900, Michael Paquier wrote: > Hi all, > > 2) Make the shmem pgstats pluggable so as it is possible for extensions > to register their own stats kinds. Thanks for the patch! I like the idea of having custom stats (it has also been somehow mentioned in [1]). > 2) is actually something that can be used for more things than > just pg_stat_statements, because people love extensions and > statistics (spoiler: I do). +1 > * Making custom stats data persistent is an interesting problem, and > there are a couple of approaches I've considered: > ** Allow custom kinds to define callbacks to read and write data from > a source they'd want, like their own file through a fd. This has the > disadvantage to remove the benefit of c) above. > ** Store everything in the existing stats file, adding one type of > entry like 'S' and 'N' with a "custom" type, where the *name* of the > custom stats kind is stored instead of its ID computed from shared > memory. What about having 2 files? - One is the existing stats file - One "predefined" for all the custom stats (so what you've done minus the in-core stats). This one would not be configurable and the extensions will not need to know about it. Would that remove the benefit from c) that you mentioned up-thread? [1]: https://www.postgresql.org/message-id/20220818195124.c7ipzf6c5v7vxymc%40awork3.anarazel.de Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Conflict Detection and Resolution
On Thu, Jun 20, 2024 at 5:06 PM Ashutosh Bapat wrote: > > On Thu, Jun 20, 2024 at 3:21 PM Amit Kapila wrote: >> >> On Wed, Jun 19, 2024 at 2:51 PM Ashutosh Bapat >> wrote: >> > >> > On Wed, Jun 19, 2024 at 12:03 PM Amit Kapila >> > wrote: >> >> >> >> > I doubt that it would be that simple. The application will have to >> >> > intervene and tell one of the employees that their reservation has >> >> > failed. It looks natural that the first one to reserve the room should >> >> > get the reservation, but implementing that is more complex than >> >> > resolving a conflict in the database. In fact, mostly it will be >> >> > handled outside database. >> >> > >> >> >> >> Sure, the application needs some handling but I have tried to explain >> >> with a simple way that comes to my mind and how it can be realized >> >> with db involved. This is a known conflict detection method but note >> >> that I am not insisting to have "earliest_timestamp_wins". Even, if we >> >> want this we can have a separate discussion on this and add it later. >> >> >> > >> > It will be good to add a minimal set of conflict resolution strategies to >> > begin with, while designing the feature for extensibility. I imagine the >> > first version might just detect the conflict and throw error or do >> > nothing. That's already two simple conflict resolution strategies with >> > minimal efforts. We can add more complicated ones incrementally. >> > >> >> Agreed, splitting the work into multiple patches would help us to >> finish the easier ones first. >> >> I have thought to divide it such that in the first patch, we detect >> conflicts like 'insert_exists', 'update_differ', 'update_missing', and >> 'delete_missing' (the definition of each could be found in the initial >> email [1]) and throw an ERROR or write them in LOG. Various people >> agreed to have this as a separate committable work [2]. This can help >> users to detect and monitor the conflicts in a better way. I have >> intentionally skipped update_deleted as it would require more >> infrastructure and it would be helpful even without that. > > > Since we are in the initial months of release, it will be good to take a > stock of whether the receiver receives all the information needed for most > (if not all) of the conflict detection and resolution strategies. If there > are any missing pieces, we may want to add those in PG18 so that improved > conflict detection and resolution on a higher version receiver can still work. > Good point. This can help us to detect conflicts if required even when we move to a higher version. As we continue to discuss/develop the features, I hope we will be able to see any missing pieces. >> >> >> In the second patch, we can implement simple built-in resolution >> strategies like apply and skip (which can be named as remote_apply and >> keep_local, see [3][4] for details on these strategies) with ERROR or >> LOG being the default strategy. We can allow these strategies to be >> configured at the global and table level. >> >> In the third patch, we can add monitoring capability for conflicts and >> resolutions as mentioned by Jonathan [5]. Here, we can have stats like >> how many conflicts of a particular type have happened. > > > That looks like a plan. Thanks for chalking it out. > Thanks! -- With Regards, Amit Kapila.
Re: Logical Replication of sequences
On Wed, Jun 19, 2024 at 8:33 PM vignesh C wrote: > > On Tue, 18 Jun 2024 at 16:10, Amit Kapila wrote: > > > > > > Agreed and I am not sure which is better because there is a value in > > keeping the state name the same for both sequences and tables. We > > probably need more comments in code and doc updates to make the > > behavior clear. We can start with the sequence state as 'init' for > > 'needs-to-be-sycned' and 'ready' for 'synced' and can change if others > > feel so during the review. > > Here is a patch which does the sequence synchronization in the > following lines from the above discussion: > Thanks for summarizing the points discussed. I would like to confirm whether the patch replicates new sequences that are created implicitly/explicitly for a publication defined as ALL SEQUENCES. -- With Regards, Amit Kapila.
Visibility bug with prepared transaction with subtransactions on standby
Hi, Konstantin and I found an MVCC bug with: - a prepared transaction, - which has a subtransaction, - on a hot standby, - after starting the standby from a shutdown checkpoint. See the test case in the attached patch to demonstrate this. The last query in the new test returns incorrect result on master, causing the test to fail. The problem --- When you shut down a primary with a prepared transaction, and start a hot standby server from the shutdown checkpoint, the hot standby server goes through this code at startup: if (wasShutdown) oldestActiveXID = PrescanPreparedTransactions(&xids, &nxids); else oldestActiveXID = checkPoint.oldestActiveXid; Assert(TransactionIdIsValid(oldestActiveXID)); /* Tell procarray about the range of xids it has to deal with */ ProcArrayInitRecovery(XidFromFullTransactionId(TransamVariables->nextXid)); /* * Startup subtrans only. CLOG, MultiXact and commit timestamp * have already been started up and other SLRUs are not maintained * during recovery and need not be started yet. */ StartupSUBTRANS(oldestActiveXID); /* * If we're beginning at a shutdown checkpoint, we know that * nothing was running on the primary at this point. So fake-up an * empty running-xacts record and use that here and now. Recover * additional standby state for prepared transactions. */ if (wasShutdown) { RunningTransactionsData running; TransactionId latestCompletedXid; /* * Construct a RunningTransactions snapshot representing a * shut down server, with only prepared transactions still * alive. We're never overflowed at this point because all * subxids are listed with their parent prepared transactions. */ running.xcnt = nxids; running.subxcnt = 0; running.subxid_overflow = false; running.nextXid = XidFromFullTransactionId(checkPoint.nextXid); running.oldestRunningXid = oldestActiveXID; latestCompletedXid = XidFromFullTransactionId(checkPoint.nextXid); TransactionIdRetreat(latestCompletedXid); Assert(TransactionIdIsNormal(latestCompletedXid)); running.latestCompletedXid = latestCompletedXid; running.xids = xids; ProcArrayApplyRecoveryInfo(&running); StandbyRecoverPreparedTransactions(); } The problem is that the RunningTransactions snapshot constructed here does not include subtransaction XIDs of the prepared transactions, only the main XIDs. Because of that, snapshots taken in the standby will consider the sub-XIDs as aborted rather than in-progress. That leads to two problems if the prepared transaction is later committed: - We will incorrectly set hint bits on tuples inserted/deleted by the subtransactions, which leads to incorrect query results later if the prepared transaction is committed. - If you acquire an MVCC snapshot and hold to it while the prepared transaction commits, the subtransactions will suddenly become visible to the old snapshot. History --- StandbyRecoverPreparedTransactions has this comment: * The lack of calls to SubTransSetParent() calls here is by design; * those calls are made by RecoverPreparedTransactions() at the end of recovery * for those xacts that need this. I think that's wrong; it really should update pg_subtrans. It used to, a long time ago, but commit 49e92815497 changed it. Reading the discussions that led to that change, seems that we somehow didn't realize that it's important to distinguish between in-progress and aborted transactions in a standby. 
On that thread, Nikhil posted [1] a test case that is almost exactly the same test case that I used to find this, but apparently the visibility in standby in that scenario was not tested thoroughly back then. [1] https://www.postgresql.org/message-id/CAMGcDxde4XjDyTjGvZCPVQROpXw1opfpC0vjpCkzc1pcQBqvrg%40mail.gmail.com Fix --- Attached is a patch to fix this, with a test case.
Re: Visibility bug with prepared transaction with subtransactions on standby
On 20/06/2024 16:41, Heikki Linnakangas wrote: Attached is a patch to fix this, with a test case. The patch did not compile, thanks to a last-minute change in a field name. Here's a fixed version. -- Heikki Linnakangas Neon (https://neon.tech) From ccb0bd36619f03055382205eddd7a0e9b64bba95 Mon Sep 17 00:00:00 2001 From: Heikki Linnakangas Date: Thu, 20 Jun 2024 15:49:05 +0300 Subject: [PATCH v2 1/2] tests: Trim newline from result returned by BackgroundPsql->query This went unnoticed, because only a few existing callers of BackgroundPsql->query used the result, and the ones that did were not bothered by an extra newline. I noticed because I was about to add a new test that checks the result. --- src/test/perl/PostgreSQL/Test/BackgroundPsql.pm | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/src/test/perl/PostgreSQL/Test/BackgroundPsql.pm b/src/test/perl/PostgreSQL/Test/BackgroundPsql.pm index 4091c311b8..041b3dfa7d 100644 --- a/src/test/perl/PostgreSQL/Test/BackgroundPsql.pm +++ b/src/test/perl/PostgreSQL/Test/BackgroundPsql.pm @@ -217,13 +217,13 @@ sub query my $banner = "background_psql: QUERY_SEPARATOR"; $self->{stdin} .= "$query\n;\n\\echo $banner\n"; - pump_until($self->{run}, $self->{timeout}, \$self->{stdout}, qr/$banner/); + pump_until($self->{run}, $self->{timeout}, \$self->{stdout}, qr/$banner\n/); die "psql query timed out" if $self->{timeout}->is_expired; $output = $self->{stdout}; # remove banner again, our caller doesn't care - $output =~ s/\n$banner$//s; + $output =~ s/\n$banner\n$//s; # clear out output for the next query $self->{stdout} = ''; -- 2.39.2 From f44c5ec2d3383240c495c36e05da812791e3f5d4 Mon Sep 17 00:00:00 2001 From: Heikki Linnakangas Date: Thu, 20 Jun 2024 17:09:41 +0300 Subject: [PATCH v2 2/2] Fix MVCC bug with prepared xact with subxacts on standby We did not recover the subtransaction IDs of prepared transactions when starting a hot standby from a shutdown checkpoint. As a result, such subtransactions were considered as aborted, rather than in-progress. That would lead to hint bits being set incorrectly, and the subtransactions suddenly becoming visible to old snapshots when the prepared transaction was committed. To fix, update pg_subtrans with prepared transactions's subxids when starting hot standby from a shutdown checkpoint. The snapshots taken from that state need to be marked as "suboverflowed", so that we also check the pg_subtrans. Discussion: XXX --- src/backend/access/transam/twophase.c | 7 ++--- src/backend/access/transam/xlog.c | 14 + src/backend/storage/ipc/procarray.c | 18 +-- src/backend/storage/ipc/standby.c | 6 ++-- src/include/storage/standby.h | 10 +- src/test/recovery/t/009_twophase.pl | 45 +++ src/tools/pgindent/typedefs.list | 1 + 7 files changed, 84 insertions(+), 17 deletions(-) diff --git a/src/backend/access/transam/twophase.c b/src/backend/access/transam/twophase.c index bf451d42ff..9a8257fcaf 100644 --- a/src/backend/access/transam/twophase.c +++ b/src/backend/access/transam/twophase.c @@ -2035,9 +2035,8 @@ PrescanPreparedTransactions(TransactionId **xids_p, int *nxids_p) * This is never called at the end of recovery - we use * RecoverPreparedTransactions() at that point. * - * The lack of calls to SubTransSetParent() calls here is by design; - * those calls are made by RecoverPreparedTransactions() at the end of recovery - * for those xacts that need this. + * This updates pg_subtrans, so that any subtransactions will be correctly + * seen as in-progress in snapshots taken during recovery. 
*/ void StandbyRecoverPreparedTransactions(void) @@ -2057,7 +2056,7 @@ StandbyRecoverPreparedTransactions(void) buf = ProcessTwoPhaseBuffer(xid, gxact->prepare_start_lsn, - gxact->ondisk, false, false); + gxact->ondisk, true, false); if (buf != NULL) pfree(buf); } diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index 330e058c5f..3d084d1c38 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -5777,6 +5777,9 @@ StartupXLOG(void) RunningTransactionsData running; TransactionId latestCompletedXid; +/* Update pg_subtrans entries for any prepared transactions */ +StandbyRecoverPreparedTransactions(); + /* * Construct a RunningTransactions snapshot representing a * shut down server, with only prepared transactions still @@ -5785,7 +5788,7 @@ StartupXLOG(void) */ running.xcnt = nxids; running.subxcnt = 0; -running.subxid_overflow = false; +running.subxid_status = SUBXIDS_IN_SUBTRANS; running.nextXid = XidFromFullTransactionId(checkPoint.nextXid); running.oldestRunningXid = oldestActiveXID; latestCompletedXid = XidFromFullTransactionId(checkPoint.nextXid); @@ -5795,8 +5798,6 @@ StartupXLOG(void) running.xids = xids; ProcA
Re: Pluggable cumulative statistics
Hi, On Thu, Jun 20, 2024 at 09:46:30AM +0900, Michael Paquier wrote: > On Thu, Jun 13, 2024 at 04:59:50PM +0900, Michael Paquier wrote: > > - How should the persistence of the custom stats be achieved? > > Callbacks to give custom stats kinds a way to write/read their data, > > push everything into a single file, or support both? > > - Should this do like custom RMGRs and assign to each stats kinds ID > > that are set in stone rather than dynamic ones? > These two questions have been itching me in terms of how it would work > for extension developers, after noticing that custom RMGRs are used > more than I thought: > https://wiki.postgresql.org/wiki/CustomWALResourceManagers > > The result is proving to be nicer, shorter by 300 lines in total and > much simpler when it comes to think about the way stats are flushed > because it is possible to achieve the same result as the first patch > set without manipulating any of the code paths doing the read and > write of the pgstats file. I think it makes sense to follow the same "behavior" as the custom wal resource managers. That, indeed, looks much more simpler than v1. > In terms of implementation, pgstat.c's KindInfo data is divided into > two parts, for efficiency: > - The exiting in-core stats with designated initializers, renamed as > built-in stats kinds. > - The custom stats kinds are saved in TopMemoryContext, Agree that a backend lifetime memory area is fine for that purpose. > and can only > be registered with shared_preload_libraries. The patch reserves a set > of 128 harcoded slots for all the custom kinds making the lookups for > the KindInfos quite cheap. + MemoryContextAllocZero(TopMemoryContext, + sizeof(PgStat_KindInfo *) * PGSTAT_KIND_CUSTOM_SIZE); and that's only 8 * PGSTAT_KIND_CUSTOM_SIZE bytes in total. I had a quick look at the patches (have in mind to do more): > With that in mind, the patch set is more pleasant to the eye, and the > attached v2 consists of: > - 0001 and 0002 are some cleanups, same as previously to prepare for > the backend-side APIs. 0001 and 0002 look pretty straightforward at a quick look. > - 0003 adds the backend support to plug-in custom stats. 1 === It looks to me that there is a mix of "in core" and "built-in" to name the non custom stats. Maybe it's worth to just use one? As I can see (and as you said above) this is mainly inspired by the custom resource manager and 2 === and 3 === are probably copy/paste consequences. 2 === + if (pgstat_kind_custom_infos[idx] != NULL && + pgstat_kind_custom_infos[idx]->name != NULL) + ereport(ERROR, + (errmsg("failed to register custom cumulative statistics \"%s\" with ID %u", kind_info->name, kind), +errdetail("Custom resource manager \"%s\" already registered with the same ID.", + pgstat_kind_custom_infos[idx]->name))); s/Custom resource manager/Custom cumulative statistics/ 3 === + ereport(LOG, + (errmsg("registered custom resource manager \"%s\" with ID %u", + kind_info->name, kind))); s/custom resource manager/custom cumulative statistics/ > - 0004 includes documentation. Did not look yet. > - 0005 is an example of how to use them, with a TAP test providing > coverage. Did not look yet. As I said, I've in mind to do a more in depth review. I've noted the above while doing a quick read of the patches so thought it makes sense to share them now while at it. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Remove distprep
On Thu, Jun 20, 2024 at 09:29:45AM +0200, Peter Eisentraut wrote: > On 16.06.24 21:34, Noah Misch wrote: > > On Thu, Oct 05, 2023 at 05:46:46PM +0200, Peter Eisentraut wrote: > > > --- a/src/backend/Makefile > > > +++ b/src/backend/Makefile > > > > > $(top_builddir)/src/include/storage/lwlocknames.h: > > > storage/lmgr/lwlocknames.h > > > - prereqdir=`cd '$(dir $<)' >/dev/null && pwd` && \ > > > - cd '$(dir $@)' && rm -f $(notdir $@) && \ > > > - $(LN_S) "$$prereqdir/$(notdir $<)" . > > > + rm -f '$@' > > > + $(LN_S) ../../backend/$< '$@' > > > $(top_builddir)/src/include/utils/wait_event_types.h: > > > utils/activity/wait_event_types.h > > > - prereqdir=`cd '$(dir $<)' >/dev/null && pwd` && \ > > > - cd '$(dir $@)' && rm -f $(notdir $@) && \ > > > - $(LN_S) "$$prereqdir/$(notdir $<)" . > > > + rm -f '$@' > > > + $(LN_S) ../../backend/$< '$@' > > > > These broke the > > https://www.postgresql.org/docs/17/installation-platform-notes.html#INSTALLATION-NOTES-MINGW > > build, where LN_S='cp -pR'. On other platforms, "make LN_S='cp -pR'" > > reproduces this. Reverting the above lines fixes things. The buildfarm has > > no coverage for that build scenario (fairywren uses Meson). > > Is it just these two instances? Yes. > Commit 721856ff24b contains a few more hunks that change something about > LN_S. Are those ok? I'm guessing "make LN_S='cp -pR'" didn't have a problem with those because they have "." as the second argument. "cp -pR ../foo ." is compatible with "ln -s ../foo ." in that both are interpreting "../bar" relative to the same directory. Not so for "cp -pR ../foo bar/baz" vs. "ln -s ../foo bar/baz".
Re: jsonpath Time and Timestamp Special Cases
On Apr 29, 2024, at 20:45, David E. Wheeler wrote: > I noticed that the jsonpath date/time functions (.time() and timestamp(), et > al.) don’t support some valid but special-case PostgreSQL values, notably > `infinity`, `-infinity`, and, for times, '24:00:00`: Looking at ECMA-404[1], “The JSON data interchange syntax”, it says, of numbers: > Numeric values that cannot be represented as sequences of digits (such as > Infinity and NaN) are not permitted. So it makes sense that the JSON path standard would be the same, since such JSON explicitly cannot represent these values as numbers. Still not sure about `24:00:00` as a time, though. I presume the jsonpath standard disallows it. Best, David [1]: https://ecma-international.org/wp-content/uploads/ECMA-404_2nd_edition_december_2017.pdf
Re: datfrozenxid > relfrozenxid w/ crash before XLOG_HEAP_INPLACE
On Thu, Jun 20, 2024 at 12:17:44PM +0500, Andrey M. Borodin wrote: > On 20 Jun 2024, at 06:29, Noah Misch wrote: > > This resolves the last inplace update defect known to me. > > That’s a huge amount of work, thank you! > > Do I get it right, that inplace updates are catalog-specific and some other > OOM corruptions [0] and Standby corruptions [1] are not related to this fix. > Bot cases we observed on regular tables. In core code, inplace updates are specific to pg_class and pg_database. Adding PGXN modules, only the citus extension uses them on some other table. [0] definitely looks unrelated. > Or that might be effect of vacuum deepening corruption after observing wrong > datfrozenxid? Wrong datfrozenxid can cause premature clog truncation, which can cause "could not access status of transaction". While $SUBJECT could cause that, I think it would happen on both primary and standby. [1] seems to be about a standby lacking clog present on the primary, which is unrelated. > [0] > https://www.postgresql.org/message-id/flat/67EADE8F-AEA6-4B73-8E38-A69E5D48BAFE%40yandex-team.ru#1266dd8b898ba02686c2911e0a50ab47 > [1] > https://www.postgresql.org/message-id/flat/CAFj8pRBEFMxxFSCVOSi-4n0jHzSaxh6Ze_cZid5eG%3Dtsnn49-A%40mail.gmail.com
Re: Special-case executor expression steps for common combinations
On 10/12/23 11:48 AM, Daniel Gustafsson wrote: Thoughts? I have looked at the patch and it still applies, builds and passes the test cases and I personally think these optimizations are pretty much no-brainers that we should do and it is a pity nobody has had the time to review this patch. 1) The no-return case should help with the JIT, making jitted code faster. 2) The specialized strict steps helps with many common queries in the interpreted mode. The code itself looks really good (great work!) but I have two comments on it. 1) I think the patch should be split into two. The two different optimizations are not related at all other than that they create specialized versions of expressions steps. Having them as separate makes the commit history easier to read for future developers. 2) We could generate functions which return void rather than NULL and therefore not have to do a return at all but I am not sure that small optimization and extra clarity would be worth the hassle. The current approach with adding Assert() is ok with me. Daniel, what do you think? Andreas
Re: Special-case executor expression steps for common combinations
On 6/20/24 5:22 PM, Andreas Karlsson wrote: On 10/12/23 11:48 AM, Daniel Gustafsson wrote: Thoughts? I have looked at the patch and it still applies, builds and passes the test cases and I personally think these optimizations are pretty much no-brainers that we should do and it is a pity nobody has had the time to review this patch. Forgot to write that I am planning to also try to do some benchmarks to see if I can reproduce the speedups. :) Andreas
Re: suspicious valgrind reports about radixtree/tidstore on arm64
Ranier Vilela writes: > Em qua., 19 de jun. de 2024 às 20:52, Tom Lane escreveu: >> Hah: it's the second case. If I patch radixtree.h as attached, >> then x86_64 valgrind complains about >> ==00:00:00:32.759 247596== Conditional jump or move depends on >> uninitialised value(s) >> ==00:00:00:32.759 247596==at 0x52F668: local_ts_node_16_search_eq >> (radixtree.h:1018) >> showing that it knows that the result of vector8_highbit_mask is >> only partly defined. > I wouldn't be surprised if *RT_NODE_16_GET_INSERTPOS* > (src/include/lib/radixtree.h), > does not suffer from the same problem? Dunno, I only saw valgrind complaints in local_ts_node_16_search_eq, and Tomas reported the same. It seems moderately likely to me that this is a bug in aarch64 valgrind. Still, if it is that then people will have to deal with it for awhile yet. It won't cost us anything meaningful to work around it (he says, without having done actual measurements...) regards, tom lane
Re: DROP OWNED BY fails to clean out pg_init_privs grants
Hannu Krosing writes: > Is there anything that could be back-patched with reasonable effort ? Afraid not. The whole thing is dependent on pg_shdepend entries that won't exist in older branches. regards, tom lane
Re: jsonpath Time and Timestamp Special Cases
On 06/20/24 10:54, David E. Wheeler wrote: > Still not sure about `24:00:00` as a time, though. I presume the jsonpath > standard disallows it. In 9075-2 9.46 "SQL/JSON path language: syntax and semantics", the behavior of the .time() and .time_tz() and similar item methods defers to the behavior of SQL's CAST. For example, .time(PS) (where PS is the optional precision spec) expects to be applied to a character string X from the JSON source, and its success/failure and result are the same as for CAST(X AS TIME PS). The fact that our CAST(X AS TIME) will succeed for '24:00:00' might be its own extension (or violation) of the spec (I haven't checked that), but given that it does, we could debate whether it violates the jsonpath spec for our jsonpath .time() to behave the same way. The same argument may also apply for ±infinity. Regards, -Chap
Re: Extension security improvement: Add support for extensions with an owned schema
On Wed, Jun 19, 2024 at 1:50 PM Jelte Fennema-Nio wrote: > I do think, even if we have this, there would be other good reasons to > use "owned schemas" for extension authors. At least the following two: > 1. To have a safe search_path that can be used in SET search_path on a > function (see also [1]). > 2. To make it easy for extension authors to avoid conflicts with other > extensions/UDFs. (1) is a very good point. (2) I don't know about one way or the other. -- Robert Haas EDB: http://www.enterprisedb.com
Re: SQL/JSON query functions context_item doc entry and type requirement
On Thu, Jun 20, 2024 at 5:46 PM Amit Langote wrote: > > On Thu, Jun 20, 2024 at 1:03 AM David G. Johnston > wrote: > > On Wed, Jun 19, 2024 at 8:29 AM jian he wrote: > >> > >> On Mon, Jun 17, 2024 at 9:05 PM Chapman Flack wrote: > >> > > >> > Hi, > >> > > >> > On 06/17/24 02:43, Amit Langote wrote: > >> > > context_item expression can be a value of > >> > > any type that can be cast to jsonb. This includes types > >> > > such as char, text, bpchar, > >> > > character varying, and bytea (with > >> > > ENCODING UTF8), as well as any domains over these types. > >> > > >> > Reading this message in conjunction with [0] makes me think that we are > >> > really talking about a function that takes a first parameter of type > >> > jsonb, > >> > and behaves exactly that way (so any cast required is applied by the > >> > system > >> > ahead of the call). Under those conditions, this seems like an unusual > >> > sentence to add in the docs, at least until we have also documented that > >> > tan's argument can be of any type that can be cast to double precision. > >> > > >> > >> I guess it would be fine to add an unusual sentence to the docs. > >> > >> imagine a function: array_avg(anyarray) returns anyelement. > >> array_avg calculate an array's elements's avg. like > >> array('{1,2,3}'::int[]) returns 2. > >> but array_avg won't make sense if the input argument is a date array. > >> so mentioning in the doc: array_avg can accept anyarray, but anyarray > >> cannot date array. > >> seems ok. > > > > > > There is existing wording for this: > > > > "The expression can be of any JSON type, any character string type, or > > bytea in UTF8 encoding." > > > > If you add this sentence to the paragraph the link that already exists, > > which simply points the reader to this sentence, becomes redundant and > > should be removed. > > I've just posted a patch in the other thread [1] to restrict > context_item to be of jsonb type, which users would need to ensure by > adding an explicit cast if needed. I think that makes this > clarification unnecessary. > > > As for table 9.16.3 - it is unwieldy already. Lets try and make the core > > syntax shorter, not longer. We already have precedence in the subsequent > > json_table section - give each major clause item a name then below the > > table define the syntax and meaning for those names. Unlike in that > > section - which probably should be modified too - context_item should have > > its own description line. > > I had posted a patch a little while ago at [1] to render the syntax a > bit differently with each function getting its own syntax synopsis. > Resending it here; have addressed Jian He's comments. > > -- @@ -18746,6 +18752,7 @@ ERROR: jsonpath array subscript is out of bounds PASSING values. +Returns the result of applying the SQL/JSON If the path expression returns multiple SQL/JSON items, it might be necessary to wrap the result using the WITH WRAPPER clause to make it a valid JSON string. If the wrapper is +Returns the result of applying the SQL/JSON is redundant? playing around with it. found some minor issues: json_exists allow: DEFAULT expression ON ERROR, which is not mentioned in the doc. for example: select JSON_EXISTS(jsonb '{"a": [1,2,3]}', 'strict $.a[5]' default true ON ERROR); select JSON_EXISTS(jsonb '{"a": [1,2,3]}', 'strict $.a[5]' default 0 ON ERROR); select JSON_EXISTS(jsonb '{"a": [1,2,3]}', 'strict $.a[5]' default 11 ON ERROR); JSON_VALUE on error, on empty semantics should be the same as json_query. 
like: [ { ERROR | NULL | EMPTY { [ ARRAY ] | OBJECT } | DEFAULT expression } ON EMPTY ] [ { ERROR | NULL | EMPTY { [ ARRAY ] | OBJECT } | DEFAULT expression } ON ERROR ]) examples: select JSON_value(jsonb '[]' , '$' empty array on error); select JSON_value(jsonb '[]' , '$' empty object on error);
Re: ON ERROR in json_query and the like
On Thu, Jun 20, 2024 at 2:19 AM Amit Langote wrote: > Hi, > > On Mon, Jun 17, 2024 at 10:07 PM Markus Winand > wrote: > > > On 17.06.2024, at 08:20, Amit Langote wrote: > > > Agree that the documentation needs to be clear about this. I'll update > > > my patch at [1] to add a note next to table 9.16.3. SQL/JSON Query > > > Functions. > > > > Considering another branch of this thread [1] I think the > > "Supported Features” appendix of the docs should mention that as well. > > > > The way I see it is that the standards defines two overloaded > > JSON_QUERY functions, of which PostgreSQL will support only one. > > In case of valid JSON, the implied CAST makes it look as though > > the second variant of these function was supported as well but that > > illusion totally falls apart once the JSON is not valid anymore. > > > > I think it affects the following feature IDs: > > > > - T821, Basic SQL/JSON query operators > > For JSON_VALUE, JSON_TABLE and JSON_EXISTS > > - T828, JSON_QUERY > > > > Also, how hard would it be to add the functions that accept > > character strings? Is there, besides the effort, any thing else > > against it? I’m asking because I believe once released it might > > never be changed — for backward compatibility. > > Hmm, I'm starting to think that adding the implied cast to json wasn't > such a great idea after all, because it might mislead the users to > think that JSON parsing is transparent (respects ON ERROR), which is > what you are saying, IIUC. > > Actually, the implied cast is exactly the correct thing to do here - the issue is that we aren't using the newly added soft errors infrastructure [1] to catch the result of that cast and instead produce whatever output on error tells us to produce. This seems to be in the realm of doability so we should try in the interest of being standard conforming. I'd even argue to make this an open item unless and until the attempt is agreed upon to have failed (or it succeeds of course). David J. [1] d9f7f5d32f201bec61fef8104aafcb77cecb4dcb
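For context, here is a minimal sketch of what "catching" that cast with the soft-error machinery referenced above could look like: call the jsonb input function through InputFunctionCallSafe() with an ErrorSaveContext, so that invalid input reports failure instead of throwing, and the caller can then do whatever the ON ERROR clause asks for. This only illustrates the mechanism from commit [1]; it is not the actual change being proposed for the SQL/JSON code, and the helper name is made up.

#include "postgres.h"

#include "catalog/pg_type.h"
#include "fmgr.h"
#include "nodes/miscnodes.h"
#include "utils/lsyscache.h"

/*
 * Try to coerce a C string to jsonb without throwing on invalid JSON.
 * Returns true and sets *result on success, false on a soft error.
 */
static bool
coerce_string_to_jsonb_safe(const char *str, Datum *result)
{
	ErrorSaveContext escontext = {T_ErrorSaveContext};
	Oid			typinput;
	Oid			typioparam;
	FmgrInfo	flinfo;

	getTypeInputInfo(JSONBOID, &typinput, &typioparam);
	fmgr_info(typinput, &flinfo);

	/* jsonb_in() reports parse errors softly into escontext */
	if (!InputFunctionCallSafe(&flinfo, unconstify(char *, str),
							   typioparam, -1,
							   (Node *) &escontext, result))
		return false;			/* caller decides: NULL, DEFAULT, or ERROR */

	return true;
}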
Re: Extension security improvement: Add support for extensions with an owned schema
On Jun 19, 2024, at 11:28, Robert Haas wrote: > But I wonder if there might also be another possible approach: could > we, somehow, prevent object references in extension scripts from > resolving to anything other than the system catalogs and the contents > of that extension? Perhaps with a control file setting to specify a > list of trusted extensions which we're also allowed to reference? It would also have to allow access to other extensions it depends upon. D
Re: Direct SSL connection and ALPN loose ends
On Mon, Jun 17, 2024 at 9:23 AM Jacob Champion wrote: > > I think the behavior with v2 and v3 errors should be the same. And I > > think an immediate failure is appropriate on any v2/v3 error during > > negotiation, assuming we don't use those errors for things like "TLS not > > supported", which would warrant a fallback. > > For GSS encryption, it was my vague understanding that older servers > respond with an error rather than the "not supported" indication. For > TLS, though, the decision in a49fbaaf (immediate failure) seemed > reasonable. Would an open item for this be appropriate? Thanks, --Jacob
Re: Extension security improvement: Add support for extensions with an owned schema
On Jun 19, 2024, at 13:50, Jelte Fennema-Nio wrote: > This indeed does sound like the behaviour that pretty much every > existing extension wants to have. One small addition/clarification > that I would make to your definition: fully qualified references to > other objects should still be allowed. That would be tricky for referring to objects from other extensions that have no defined schema or that are relocatable. > 1. To have a safe search_path that can be used in SET search_path on a > function (see also [1]). > 2. To make it easy for extension authors to avoid conflicts with other > extensions/UDFs. These would indeed be nice improvements IMO. Best, David
call for applications: mentoring program for code contributors
Hi, I'm working to start a mentoring program where code contributors can be mentored by current committers. Applications are now open: https://forms.gle/dgjmdxtHYXCSg6aB7 Nine committers have volunteered to mentor one person each; hence, the anticipated number of acceptances is less than or equal to nine. In the future, we may have more mentors, or some mentors may be willing to take more than one mentee, or some mentoring relationships may end, opening up spots for new people, but right now I have nine slots maximum. Even if less than nine people apply initially, that doesn't guarantee that your application will be accepted, because the way this works is you can only be matched to a committer if you want to be matched with them and they want to be matched with you. If you don't already have a significant track record on pgsql-hackers, it is probably unlikely that you will find a mentor in this program at this time. Even if you do, you may not match with a mentor for any number of reasons: not enough slots, time zone, language issues, your particular interests as contrasted with those of the mentors, etc. The basic expectation around mentorship is that your mentor will have a voice call with you at least once per month for at least one hour. Before that call, you should give them some idea what you'd like to talk about and they should do some non-zero amount of preparation. During that call, they'll try to give you some useful advice. Maybe they'll be willing to do other things, too, like review and commit your patches, or email back and forth with you off-list, or chat using an instant messaging service, but if they do any of that stuff, that's extra. Either the mentor or the mentee is free to end the mentoring relationship at any time for any reason, or for no reason. If that happens, please let me know, whether it's because of an explicit decision on someone's part, or because somehow the monthly voice calls have ceased to occur. Periodically, someone -- most likely not me, since a few people have been kind enough to offer help -- will contact mentors and mentees to get feedback on how things are going. We'll use this feedback to improve the program, which might involve adjusting mentoring assignments, or might involve taking such other actions as the situation may suggest. In the future, I would like to expand this program to include non-committer mentors. The idea would be that committers would most likely want to mentor more senior contributors and senior non-committers could mentor more junior contributors, so that we pay it all forward. If this is something you'd be interested in participating in, whether as a co-organizer, mentor, or mentee, please let me know. It might also be advantageous to expand this program, or have a separate program, to mentor people making non-code contributions e.g. mentoring for conference organizers. I've chosen to focus on mentorship for code contribution because I know enough about it to function as an organizer for such an effort. If you apply for this program, you can expect to receive an email from me in the next couple of weeks letting you know the result of your application. If for some reason that does not occur, please feel free to email me privately, but note that I'll want to give a bit of time for people to see this email and fill out the form before doing anything, and then I'll need to talk over possibilities with the mentors before finalizing anything, so it will take a bit of time. 
Finally, I would like to extend a special thanks to the mentors for volunteering to mentor, and a more general thanks to everyone who contributes to PostgreSQL in any way or is interested in doing so for their interest in and hard work on the project. Thanks, -- Robert Haas EDB: http://www.enterprisedb.com
Re: call for applications: mentoring program for code contributors
On Jun 20, 2024, at 13:12, Robert Haas wrote: > I'm working to start a mentoring program where code contributors can > be mentored by current committers. Applications are now open: > https://forms.gle/dgjmdxtHYXCSg6aB7 This is amazing! Thank you for putting it together, Robert. Best, David
Re: DROP OWNED BY fails to clean out pg_init_privs grants
Then maybe we should put a query / function in the release notes to clean up the existing mess. Thinking of it we should do it anyway, as the patch only prevents new messiness from happening and does not fix existing issues. I could share a query to update the pg_init_privs with non-existent role to replace it with the owner of the object if we figure out a correct place to publish it. --- Hannu On Thu, Jun 20, 2024 at 5:35 PM Tom Lane wrote: > > Hannu Krosing writes: > > Is there anything that could be back-patched with reasonable effort ? > > Afraid not. The whole thing is dependent on pg_shdepend entries > that won't exist in older branches. > > regards, tom lane
Re: DROP OWNED BY fails to clean out pg_init_privs grants
Or perhaps we should still also patch pg_dump to ignore the aclentries which refer to roles that do not exist in the database ? On Thu, Jun 20, 2024 at 7:41 PM Hannu Krosing wrote: > > Then maybe we should put a query / function in the release notes to > clean up the existing mess. > > Thinking of it we should do it anyway, as the patch only prevents new > messiness from happening and does not fix existing issues. > > I could share a query to update the pg_init_privs with non-existent > role to replace it with the owner of the object if we figure out a > correct place to publish it. > > --- > Hannu > > > > > On Thu, Jun 20, 2024 at 5:35 PM Tom Lane wrote: > > > > Hannu Krosing writes: > > > Is there anything that could be back-patched with reasonable effort ? > > > > Afraid not. The whole thing is dependent on pg_shdepend entries > > that won't exist in older branches. > > > > regards, tom lane
Re: DROP OWNED BY fails to clean out pg_init_privs grants
On Thu, Jun 20, 2024 at 2:25 PM Tom Lane wrote: > Hannu Krosing writes: > > Or perhaps we should still also patch pg_dump to ignore the aclentries > > which refer to roles that do not exist in the database ? > > I didn't want to do that before, and I still don't. Given that this > issue has existed since pg_init_privs was invented (9.6) without > prior reports, I don't think it's a big enough problem in practice > to be worth taking extraordinary actions for. If we don't fix it in the code and we don't document it anywhere, the next person who hits it is going to have to try to discover the fact that there's a problem from the pgsql-hackers archives. That doesn't seem good. I don't have an educated opinion about what we should do here specifically, and I realize that we don't have any official place to document known issues, but it can be pretty inconvenient for users not to know about them. -- Robert Haas EDB: http://www.enterprisedb.com
Re: DROP OWNED BY fails to clean out pg_init_privs grants
Hannu Krosing writes: > Or perhaps we should still also patch pg_dump to ignore the aclentries > which refer to roles that do not exist in the database ? I didn't want to do that before, and I still don't. Given that this issue has existed since pg_init_privs was invented (9.6) without prior reports, I don't think it's a big enough problem in practice to be worth taking extraordinary actions for. regards, tom lane
Re: CREATE INDEX CONCURRENTLY on partitioned index
On 15.06.2024 20:40, Justin Pryzby wrote: On Thu, May 23, 2024 at 10:14:57PM +0100, Ilya Gladyshev wrote: Hi, I think it's well worth the effort to revive the patch, so I rebased it on master, updated it and will return it back to the commitfest. Alexander, Justin feel free to add yourselves as authors Thanks -- I was intending to write about this. I realized that the patch will need some isolation tests to exercise its concurrent behavior. Thanks for the suggestion, added an isolation test that verifies behaviour of partitioned CIC with simultaneous partition drop/detach going on. Also fixed some issues in the new patch that I found while writing the test. From 45f2ec9ee57a5337b77b66db3c8c5092f305a176 Mon Sep 17 00:00:00 2001 From: Ilya Gladyshev Date: Tue, 11 Jun 2024 17:48:08 +0100 Subject: [PATCH v5 2/2] Fix progress report for partitioned REINDEX --- src/backend/catalog/index.c | 11 -- src/backend/commands/indexcmds.c | 63 +--- src/include/catalog/index.h | 1 + 3 files changed, 66 insertions(+), 9 deletions(-) diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c index 55fdde4b24..c5bc72b350 100644 --- a/src/backend/catalog/index.c +++ b/src/backend/catalog/index.c @@ -3559,6 +3559,7 @@ reindex_index(const ReindexStmt *stmt, Oid indexId, volatile bool skipped_constraint = false; PGRUsage ru0; bool progress = ((params->options & REINDEXOPT_REPORT_PROGRESS) != 0); + bool partition = ((params->options & REINDEXOPT_PARTITION) != 0); bool set_tablespace = false; pg_rusage_init(&ru0); @@ -3604,8 +3605,9 @@ reindex_index(const ReindexStmt *stmt, Oid indexId, indexId }; - pgstat_progress_start_command(PROGRESS_COMMAND_CREATE_INDEX, - heapId); + if (!partition) + pgstat_progress_start_command(PROGRESS_COMMAND_CREATE_INDEX, + heapId); pgstat_progress_update_multi_param(2, progress_cols, progress_vals); } @@ -3845,8 +3847,11 @@ reindex_index(const ReindexStmt *stmt, Oid indexId, index_close(iRel, NoLock); table_close(heapRelation, NoLock); - if (progress) + if (progress && !partition) + { + /* progress for partitions is tracked in the caller */ pgstat_progress_end_command(); + } } /* diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c index dcb4ea89e9..6abe1f017c 100644 --- a/src/backend/commands/indexcmds.c +++ b/src/backend/commands/indexcmds.c @@ -118,6 +118,7 @@ static bool ReindexRelationConcurrently(const ReindexStmt *stmt, const ReindexParams *params); static void update_relispartition(Oid relationId, bool newval); static inline void set_indexsafe_procflags(void); +static inline void progress_index_partition_done(void); /* * callback argument type for RangeVarCallbackForReindexIndex() @@ -1550,7 +1551,7 @@ DefineIndex(Oid tableId, * Update progress for an intermediate partitioned index * itself */ -pgstat_progress_incr_param(PROGRESS_CREATEIDX_PARTITIONS_DONE, 1); +progress_index_partition_done(); } return address; @@ -1577,7 +1578,7 @@ DefineIndex(Oid tableId, if (!OidIsValid(parentIndexId)) pgstat_progress_end_command(); else if (!concurrent) - pgstat_progress_incr_param(PROGRESS_CREATEIDX_PARTITIONS_DONE, 1); + progress_index_partition_done(); return address; } @@ -1686,7 +1687,7 @@ DefineIndex(Oid tableId, heaplocktag, heaprelid); PushActiveSnapshot(GetTransactionSnapshot()); - pgstat_progress_incr_param(PROGRESS_CREATEIDX_PARTITIONS_DONE, 1); + progress_index_partition_done(); } /* Set as valid all partitioned indexes, including the parent */ @@ -3331,6 +3332,14 @@ ReindexPartitions(const ReindexStmt *stmt, Oid relid, const ReindexParams 
*param ListCell *lc; ErrorContextCallback errcallback; ReindexErrorInfo errinfo; + ReindexParams newparams; + int progress_params[3] = { + PROGRESS_CREATEIDX_COMMAND, + PROGRESS_CREATEIDX_PHASE, + PROGRESS_CREATEIDX_PARTITIONS_TOTAL + }; + int64 progress_values[3]; + Oid heapId = relid; Assert(RELKIND_HAS_PARTITIONS(relkind)); @@ -3392,11 +3401,28 @@ ReindexPartitions(const ReindexStmt *stmt, Oid relid, const ReindexParams *param MemoryContextSwitchTo(old_context); } + if (relkind == RELKIND_PARTITIONED_INDEX) + { + heapId = IndexGetRelation(relid, true); + } + + progress_values[0] = (params->options & REINDEXOPT_CONCURRENTLY) != 0 ? + PROGRESS_CREATEIDX_COMMAND_REINDEX_CONCURRENTLY : + PROGRESS_CREATEIDX_COMMAND_REINDEX; + progress_values[1] = 0; + progress_values[2] = list_length(partitions); + pgstat_progress_start_command(PROGRESS_COMMAND_CREATE_INDEX, heapId); + pgstat_progress_update_multi_param(3, progress_params, progress_values); + /* * Process each partition listed in a separate transaction. Note that * this commits and then starts a new transaction immediately. */ - ReindexMultipleInternal(stmt, partitions, params); + newparams = *
Re: DROP OWNED BY fails to clean out pg_init_privs grants
It does happen with some regularity. At least one large cloud database provider I know of saw this more than once a month until the mitigations were integrated in the major version upgrade process. It is possible that making database upgrades easier via better automation is what made this turn up more, as now less experienced / non-DBA types are more comfortable doing the version upgrades, whereas before it would be something done by a person who can also diagnose it and manually fix pg_init_privs. Still it would be nice to have some public support for users of non-managed PostgreSQL databases as well On Thu, Jun 20, 2024 at 8:25 PM Tom Lane wrote: > > Hannu Krosing writes: > > Or perhaps we should still also patch pg_dump to ignore the aclentries > > which refer to roles that do not exist in the database ? > > I didn't want to do that before, and I still don't. Given that this > issue has existed since pg_init_privs was invented (9.6) without > prior reports, I don't think it's a big enough problem in practice > to be worth taking extraordinary actions for. > > regards, tom lane
Re: DROP OWNED BY fails to clean out pg_init_privs grants
On Thu, Jun 20, 2024 at 3:43 PM Hannu Krosing wrote: > Still it would be nice to have some public support for users of > non-managed PostgreSQL databases as well +1. -- Robert Haas EDB: http://www.enterprisedb.com
Re: Meson far from ready on Windows
Hi, On 2024-06-19 14:47:50 +0100, Dave Page wrote: > > I'm *NOT* sure that vcpkg is the way to go, fwiw. It does seem > > advantageous to > > use one of the toolkits thats commonly built for building dependencies on > > windows, which seems to mean vcpkg or conan. > > > > I don't think requiring or expecting vcpkg or conan is reasonable at all, > for a number of reasons: > > - Neither supports all the dependencies at present. > - There are real supply chain verification concerns for vendors. > - That would be a huge change from what we've required in the last 19 > years, with no deprecation notices or warnings for packagers etc. I don't think we should hard-require one specifically. I do think it'd be good if we provided an easy recipe for dependencies to be installed though. I think such flexibility acually means it becomes *more* important to abstract away some of the concrete ways of using the various dependencies. It doesn't make sense for postgres to understand the internals of each dependency on all platforms to a detail that it can cope with all the different ways of linking against them. E.g. libxml can be built with icu, lzma, zlib support. If libxml is built statically, postgres needs to link to all those libraries as well. How can we know which of those dependencies are enabled? Even if we can make that somehow work, it's not reasonable for postgres developers adding a dependency to have to figure out how to probe all of this, when literally no other platform works that way anymore. If you look around at recipes for building postgres on windows, they all have to patch src/tools/msvc (see links at the bottom), because the builtin paths and flags just don't work outside of a single way of acquiring the dependency. The fact that this thread started only now is actually a good example for how broken the approach to internalize all knowledge about building against libraries into postgres is. This could all have been figured out 1+ years ago - but wasn't. Unless you want to require postgres devs to get constantly in the muck on windows, we'll never get that right until just before the release. I don't particularly care how we abstract away the low level linking details on windows. We can use pkgconf, we can use cmake, we can invent our own thing. But it has to be something other than hardcoding windows library paths and compiler flags into our buildsystem. And yes, that might make it a bit harder for a packager on windows, but windows is already a *massive* drag on PG developers, it has to be somewhat manageable. I do think we can make the effort of windows dependency management a lot more reasonable than it is now though, by providing a recipe for acquiring the dependency in some form. It's a lot easier to for packagers and developers to customize ontop of something like that. Frankly, the fact that there's pretty much no automated testing of the various dependencies that's accessible to non-windows devs is not a sustainable situation. > > Btw, I've been working with Bilal to add a few of the dependencies to the > > CI > > images so we can test those automatically. Using vcpkg. We got that nearly > > working, but he's on vacation this week... That does ensure both cmake and > > .pc files are generated, fwiw. > > > > Currently builds gettext, icu, libxml2, libxslt, lz4, openssl, pkgconf, > > python3, tcl, zlib, zstd. 
> > > That appears to be using Mingw/Msys, which is quite different from a VC++ > build, in part because it's a full environment with its own package manager > and packages that people have put a lot of effort into making work as they > do on unix. Err, that was a copy-paste mistake on my end and doesn't even use the vcpkg generated stuff. Here's an example build with most dependencies enabled (see below for more details): https://cirrus-ci.com/task/6497321108635648?logs=configure#L323 I started hacking a bit further on testing all dependencies, which led me down a few rabbitholes: - kerberos: When built linking against a debug runtime, it spews *ginormous* amounts of information onto stderr. Unfortunately its buildsystem doesn't separate out debugging output and linking against a debug runtime. Argh. The tests fail even with a non-debug runtime though, due to debugging output in some cases, not sure why: https://cirrus-ci.com/task/5872684519653376?logs=check_world#L502 Separately, the kerberos tests don't seem to be prepared to work on windows :(. So I disabled using it in CI for now. - Linking the backend dynamically against lz4, icu, ssl, xml, xslt, zstd, zlib slows down the tests noticeably (~20%). So I ended up building those statically. I ran into some issues with using a static libintl. I made it work, but for now reverted to a dynamic one. - Enabling nls slows down the tests by about 15%, somewhat painful. This is when statically linking; it's a bit worse when linked dynamically :(. - readline: Instead of the old
Issue with "start another primitive scan" logic during nbtree array advancement
While working on skip scan, I stumbled upon a bug on HEAD. This is an issue in my commit 5bf748b8, "Enhance nbtree ScalarArrayOp execution". The attached test case (repro_wrong_prim.sql) causes an assertion failure on HEAD. Here's the stack trace: TRAP: failed Assert("so->keyData[opsktrig].sk_strategy != BTEqualStrategyNumber"), File: "../source/src/backend/access/nbtree/nbtutils.c", Line: 2475, PID: 1765589 [0x55942a24db8f] _bt_advance_array_keys: /mnt/nvme/postgresql/patch/build_meson_dc/../source/src/backend/access/nbtree/nbtutils.c:2475 [0x55942a24bf22] _bt_checkkeys: /mnt/nvme/postgresql/patch/build_meson_dc/../source/src/backend/access/nbtree/nbtutils.c:3797 [0x55942a244160] _bt_readpage: /mnt/nvme/postgresql/patch/build_meson_dc/../source/src/backend/access/nbtree/nbtsearch.c:2221 [0x55942a2434ca] _bt_first: /mnt/nvme/postgresql/patch/build_meson_dc/../source/src/backend/access/nbtree/nbtsearch.c:1888 [0x55942a23ef88] btgettuple: /mnt/nvme/postgresql/patch/build_meson_dc/../source/src/backend/access/nbtree/nbtree.c:259 The problem is that _bt_advance_array_keys() doesn't take sufficient care at the point where it decides whether its call to _bt_check_compare against finaltup (with the scan direction flipped around) indicates that another primitive index scan is required. The final decision is conditioned on rules about how the scan key offset sktrig that triggered the call to _bt_advance_array_keys() relates to the scan key offset that was set by the _bt_check_compare finaltup comparison. This was fragile. It breaks with this test case because of fairly subtle conditions around when and how the arrays advance, the layout of the relevant leaf page, and the placement of inequality scan keys. When assertions are disabled, we do multiple primitive index scans that land on the same leaf page, which isn't supposed to be possible anymore. The query gives correct answers, but this behavior is definitely wrong (it is simply supposed to be impossible now, per 5bf748b8's commit message). Attached is a draft bug fix patch. It nails down the test by simply testing "so->keyData[opsktrig].sk_strategy != BTEqualStrategyNumber" directly, rather than comparing scan key offsets. This is a far simpler and far more direct approach. You might wonder why I didn't do it like this in the first place. It just worked out that way. The code in question was written before I changed the design of _bt_check_compare (in the draft patch that became commit 5bf748b8). Up until not that long before the patch was committed, _bt_check_compare would set "continuescan=false" for non-required arrays. That factor made detecting whether or not the relevant _bt_check_compare call had in fact encountered a required inequality of the kind we need to detect (to decide on whether to start another primitive index scan) difficult and messy. However, the final committed patch simplified _bt_check_compare, making the approach I've taken in the bug fix patch possible. I just never made the connection before now. -- Peter Geoghegan v1-0001-Fix-nbtree-array-unsatisfied-inequality-check.patch Description: Binary data repro_wrong_prim.sql Description: Binary data
Re: pg_combinebackup --clone doesn't work
Here's a fix adding the missing headers to pg_combinebackup, and fixing some compile-time issues in the ifdef-ed block. I've done some basic manual testing today - I plan to test this a bit more tomorrow, and I'll also look at integrating this into the existing tests. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL CompanyFrom ae712aa6316c8f7035edee9fb49e1bfe1ea30b94 Mon Sep 17 00:00:00 2001 From: Tomas Vondra Date: Thu, 20 Jun 2024 23:50:22 +0200 Subject: [PATCH] fix clone headers --- src/bin/pg_combinebackup/copy_file.c| 12 +++- src/bin/pg_combinebackup/pg_combinebackup.c | 8 2 files changed, 19 insertions(+), 1 deletion(-) diff --git a/src/bin/pg_combinebackup/copy_file.c b/src/bin/pg_combinebackup/copy_file.c index 08c3b875420..8b50e937346 100644 --- a/src/bin/pg_combinebackup/copy_file.c +++ b/src/bin/pg_combinebackup/copy_file.c @@ -13,6 +13,10 @@ #ifdef HAVE_COPYFILE_H #include #endif +#ifdef __linux__ +#include +#include +#endif #include #include #include @@ -214,6 +218,9 @@ copy_file_clone(const char *src, const char *dest, pg_fatal("error while cloning file \"%s\" to \"%s\": %m", src, dest); #elif defined(__linux__) && defined(FICLONE) { + int src_fd; + int dest_fd; + if ((src_fd = open(src, O_RDONLY | PG_BINARY, 0)) < 0) pg_fatal("could not open file \"%s\": %m", src); @@ -228,8 +235,11 @@ copy_file_clone(const char *src, const char *dest, unlink(dest); pg_fatal("error while cloning file \"%s\" to \"%s\": %s", - src, dest); + src, dest, strerror(save_errno)); } + + close(src_fd); + close(dest_fd); } #else pg_fatal("file cloning not supported on this platform"); diff --git a/src/bin/pg_combinebackup/pg_combinebackup.c b/src/bin/pg_combinebackup/pg_combinebackup.c index 01d7fb98e79..f67ccf76ea2 100644 --- a/src/bin/pg_combinebackup/pg_combinebackup.c +++ b/src/bin/pg_combinebackup/pg_combinebackup.c @@ -16,6 +16,14 @@ #include #include +#ifdef HAVE_COPYFILE_H +#include +#endif +#ifdef __linux__ +#include +#include +#endif + #include "backup_label.h" #include "common/blkreftable.h" #include "common/checksum_helper.h" -- 2.45.2
Re: Failures in constraints regression test, "read only 0 of 8192 bytes"
On 19/06/2024 23:00, Alexander Lakhin wrote: Please look at a new anomaly, that I've discovered in master. ... triggers a segfault: 2024-06-19 19:22:49.009 UTC [1607210:6] LOG: server process (PID 1607671) was terminated by signal 11: Segmentation fault ... server.log might also contain: 2024-06-19 19:25:38.060 UTC [1618682:5] psql ERROR: could not read blocks 3..3 in file "base/28531/28840": read only 0 of 8192 bytes Thanks for the report! I was not able to reproduce the segfault, but I do see the "could not read blocks" error very quickly with the script. In commit af0e7deb4a, I removed the call to RelationCloseSmgr() from RelationCacheInvalidate(). I thought it was no longer needed, because we no longer free the underlying SmgrRelation. However, it meant that if the relfilenode of the relation was changed, the relation keeps pointing to the SMgrRelation of the old relfilenode. So we still need the RelationCloseSmgr() call, in case the relfilenode has changed. Per attached patch. -- Heikki Linnakangas Neon (https://neon.tech) From 17a1e68a636edd87a34acf34dace2a696af81a2d Mon Sep 17 00:00:00 2001 From: Heikki Linnakangas Date: Fri, 21 Jun 2024 01:45:53 +0300 Subject: [PATCH 1/1] Fix relcache invalidation when relfilelocator is updated In commit af0e7deb4a, I removed this call to RelationCloseSmgr(), because the dangling SMgrRelation was no longer an issue. However, we still need to call in case the relation's relfilelocator has changed, so that we open the new relfile on the next access. Reported-by: Alexander Lakhin Discussion: https://www.postgresql.org/message-id/987b1c8c-8c91-4847-ca0e-879f421680ff%40gmail.com --- src/backend/utils/cache/relcache.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c index 35dbb87ae3..f45880d96d 100644 --- a/src/backend/utils/cache/relcache.c +++ b/src/backend/utils/cache/relcache.c @@ -3032,6 +3032,9 @@ RelationCacheInvalidate(bool debug_discard) { relation = idhentry->reldesc; + /* Close all smgr references, in case the relfilelocator has changed */ + RelationCloseSmgr(relation); + /* * Ignore new relations; no other backend will manipulate them before * we commit. Likewise, before replacing a relation's relfilelocator, -- 2.39.2
Re: Pluggable cumulative statistics
On Thu, Jun 20, 2024 at 01:05:42PM +, Bertrand Drouvot wrote: > On Thu, Jun 13, 2024 at 04:59:50PM +0900, Michael Paquier wrote: >> * Making custom stats data persistent is an interesting problem, and >> there are a couple of approaches I've considered: >> ** Allow custom kinds to define callbacks to read and write data from >> a source they'd want, like their own file through a fd. This has the >> disadvantage to remove the benefit of c) above. >> ** Store everything in the existing stats file, adding one type of >> entry like 'S' and 'N' with a "custom" type, where the *name* of the >> custom stats kind is stored instead of its ID computed from shared >> memory. > > What about having 2 files? > > - One is the existing stats file > - One "predefined" for all the custom stats (so what you've done minus the > in-core stats). This one would not be configurable and the extensions will > not need to know about it. Another thing that can be done here is to add a few callbacks to control how an entry should be written out when the dshash is scanned or read when the dshash is populated depending on the KindInfo. That's not really complicated to do as the populate part could have a cleanup phase if an error is found. I just did not do it yet because this patch set is already covering a lot, just to get the basics in. > Would that remove the benefit from c) that you mentioned up-thread? Yes, that can be slightly annoying. Splitting the stats across multiple files would mean that each stats file would have to store the redo LSN. That's not really complicated to implement, but really easy to miss. Perhaps folks implementing their own stats kinds would be aware anyway because we are going to need a callback to initialize the file to write if we do that, and the redo LSN should be provided in input of it. Giving more control to extension developers here would be OK for me, especially since they could use their own format for their output file(s). -- Michael signature.asc Description: PGP signature
Re: Failures in constraints regression test, "read only 0 of 8192 bytes"
Heikki Linnakangas writes: > In commit af0e7deb4a, I removed the call to RelationCloseSmgr() from > RelationCacheInvalidate(). I thought it was no longer needed, because we > no longer free the underlying SmgrRelation. > However, it meant that if the relfilenode of the relation was changed, > the relation keeps pointing to the SMgrRelation of the old relfilenode. > So we still need the RelationCloseSmgr() call, in case the relfilenode > has changed. Ouch. How come we did not see this immediately in testing? I'd have thought surely such a bug would be exposed by any command that rewrites a heap. regards, tom lane
Re: Direct SSL connection and ALPN loose ends
On 20/06/2024 20:02, Jacob Champion wrote: On Mon, Jun 17, 2024 at 9:23 AM Jacob Champion wrote: I think the behavior with v2 and v3 errors should be the same. And I think an immediate failure is appropriate on any v2/v3 error during negotiation, assuming we don't use those errors for things like "TLS not supported", which would warrant a fallback. For GSS encryption, it was my vague understanding that older servers respond with an error rather than the "not supported" indication. For TLS, though, the decision in a49fbaaf (immediate failure) seemed reasonable. Would an open item for this be appropriate? Added. By "negotiation" I mean the server's response to the startup packet. I.e. "supported"/"not supported"/"error". Ok, I'm still a little confused, probably a terminology issue. The server doesn't respond with "supported" or "not supported" to the startup packet, that happens earlier. I think you mean the SSLRequst / GSSRequest packet, which is sent *before* the startup packet? I think the behavior with v2 and v3 errors should be the same. And I think an immediate failure is appropriate on any v2/v3 error during negotiation, assuming we don't use those errors for things like "TLS not supported", which would warrant a fallback. For GSS encryption, it was my vague understanding that older servers respond with an error rather than the "not supported" indication. For TLS, though, the decision in a49fbaaf (immediate failure) seemed reasonable. Hmm, right, GSS encryption was introduced in v12, and older versions respond with an error to a GSSRequest. We probably could make the same assumption for GSS as we did for TLS in a49fbaaf, i.e. that an error means that something's wrong with the server, rather than that it's just very old and doesn't support GSS. But the case for that is a lot weaker case than with TLS. There are still pre-v12 servers out there in the wild. -- Heikki Linnakangas Neon (https://neon.tech)
Re: Pluggable cumulative statistics
On Thu, Jun 20, 2024 at 02:27:14PM +, Bertrand Drouvot wrote: > On Thu, Jun 20, 2024 at 09:46:30AM +0900, Michael Paquier wrote: > I think it makes sense to follow the same "behavior" as the custom > wal resource managers. That, indeed, looks much more simpler than v1. Thanks for the feedback. >> and can only >> be registered with shared_preload_libraries. The patch reserves a set >> of 128 harcoded slots for all the custom kinds making the lookups for >> the KindInfos quite cheap. > > + MemoryContextAllocZero(TopMemoryContext, > + > sizeof(PgStat_KindInfo *) * PGSTAT_KIND_CUSTOM_SIZE); > > and that's only 8 * PGSTAT_KIND_CUSTOM_SIZE bytes in total. Enlarging that does not worry me much. Just not too much. >> With that in mind, the patch set is more pleasant to the eye, and the >> attached v2 consists of: >> - 0001 and 0002 are some cleanups, same as previously to prepare for >> the backend-side APIs. > > 0001 and 0002 look pretty straightforward at a quick look. 0002 is quite independentn. Still, 0001 depends a bit on the rest. Anyway, the Kind is already 4 bytes and it cleans up some APIs that used int for the Kind, so enforcing signedness is just cleaner IMO. >> - 0003 adds the backend support to plug-in custom stats. > > 1 === > > It looks to me that there is a mix of "in core" and "built-in" to name the > non custom stats. Maybe it's worth to just use one? Right. Perhaps better to remove "in core" and stick to "builtin", as I've used the latter for the variables and such. > As I can see (and as you said above) this is mainly inspired by the custom > resource manager and 2 === and 3 === are probably copy/paste consequences. > > 2 === > > + if (pgstat_kind_custom_infos[idx] != NULL && > + pgstat_kind_custom_infos[idx]->name != NULL) > + ereport(ERROR, > + (errmsg("failed to register custom cumulative > statistics \"%s\" with ID %u", kind_info->name, kind), > +errdetail("Custom resource manager \"%s\" > already registered with the same ID.", > + > pgstat_kind_custom_infos[idx]->name))); > > s/Custom resource manager/Custom cumulative statistics/ > > 3 === > > + ereport(LOG, > + (errmsg("registered custom resource manager \"%s\" > with ID %u", > + kind_info->name, kind))); > > s/custom resource manager/custom cumulative statistics/ Oops. Will fix. -- Michael signature.asc Description: PGP signature
Re: Failures in constraints regression test, "read only 0 of 8192 bytes"
On 21/06/2024 02:12, Tom Lane wrote: Heikki Linnakangas writes: In commit af0e7deb4a, I removed the call to RelationCloseSmgr() from RelationCacheInvalidate(). I thought it was no longer needed, because we no longer free the underlying SmgrRelation. However, it meant that if the relfilenode of the relation was changed, the relation keeps pointing to the SMgrRelation of the old relfilenode. So we still need the RelationCloseSmgr() call, in case the relfilenode has changed. Ouch. How come we did not see this immediately in testing? I'd have thought surely such a bug would be exposed by any command that rewrites a heap. There is a RelationCloseSmgr() call in RelationClearRelation(), which covers the common cases. This only occurs during RelationCacheInvalidate(), when pg_class's relfilenumber was changed. Hmm, looking closer, I think this might be a more appropriate place for the RelationCloseSmgr() call: /* * If it's a mapped relation, immediately update its rd_locator in * case its relfilenumber changed. We must do this during phase 1 * in case the relation is consulted during rebuild of other * relcache entries in phase 2. It's safe since consulting the * map doesn't involve any access to relcache entries. */ if (RelationIsMapped(relation)) RelationInitPhysicalAddr(relation); That's where we change the relfilenumber, before the RelationClearRelation() call. -- Heikki Linnakangas Neon (https://neon.tech)
Re: Direct SSL connection and ALPN loose ends
On Thu, Jun 20, 2024 at 4:13 PM Heikki Linnakangas wrote: > > By "negotiation" I mean the server's response to the startup packet. > > I.e. "supported"/"not supported"/"error". > > Ok, I'm still a little confused, probably a terminology issue. The > server doesn't respond with "supported" or "not supported" to the > startup packet, that happens earlier. I think you mean the SSLRequst / > GSSRequest packet, which is sent *before* the startup packet? Yes, sorry. (I'm used to referring to those as startup packets too, ha.) > Hmm, right, GSS encryption was introduced in v12, and older versions > respond with an error to a GSSRequest. > > We probably could make the same assumption for GSS as we did for TLS in > a49fbaaf, i.e. that an error means that something's wrong with the > server, rather than that it's just very old and doesn't support GSS. But > the case for that is a lot weaker case than with TLS. There are still > pre-v12 servers out there in the wild. Right. Since we default to gssencmode=prefer, if you have Kerberos creds in your environment, I think this could potentially break existing software that connects to v11 servers once you upgrade libpq. Thanks, --Jacob
Vacuum ERRORs out considering freezing dead tuples from before OldestXmin
Hi, If vacuum fails to remove a tuple with xmax older than VacuumCutoffs->OldestXmin and younger than GlobalVisState->maybe_needed, it will ERROR out when determining whether or not to freeze the tuple with "cannot freeze committed xmax". In back branches starting with 14, failing to remove tuples older than OldestXmin during pruning caused vacuum to infinitely loop in lazy_scan_prune(), as investigated on this [1] thread. On master, after 1ccc1e05ae removed the retry loop in lazy_scan_prune() and stopped comparing tuples to OldestXmin, the hang could no longer happen, but we can still attempt to freeze dead tuples visibly killed before OldestXmin -- resulting in an ERROR. Pruning may fail to remove dead tuples with xmax before OldestXmin if the tuple is not considered removable by GlobalVisState. For vacuum, the GlobalVisState is initially calculated at the beginning of vacuuming the relation -- at the same time and with the same value as VacuumCutoffs->OldestXmin. A backend's GlobalVisState may be updated again when it is accessed if a new snapshot is taken or if something caused ComputeXidHorizons() to be called. This can happen, for example, at the end of a round of index vacuuming when GetOldestNonRemovableTransactionId() is called. Normally this may result in GlobalVisState's horizon moving forward -- potentially allowing more dead tuples to be removed. However, if a disconnected standby with a running transaction older than VacuumCutoffs->OldestXmin reconnects to the primary after vacuum initially calculates GlobalVisState and OldestXmin but before GlobalVisState is updated, the value of GlobalVisState->maybe_needed could go backwards. If this happens in the middle of vacuum's first pruning and freezing pass, it is possible that pruning/freezing could then encounter a tuple whose xmax is younger than GlobalVisState->maybe_needed and older than VacuumCutoffs->OldestXmin. heap_prune_satisfies_vacuum() would deem the tuple HEAPTUPLE_RECENTLY_DEAD and would not remove it. But the heap_pre_freeze_checks() would ERROR out with "cannot freeze committed xmax". This check is to avoid freezing dead tuples. We can fix this by always removing tuples considered dead before VacuumCutoffs->OldestXmin. This is okay even if a reconnected standby has a transaction that sees that tuple as alive, because it will simply wait to replay the removal until it would be correct to do so or recovery conflict handling will cancel the transaction that sees the tuple as alive and allow replay to continue. Attached is the suggested fix for master plus a repro. I wrote it as a recovery suite TAP test, but I am _not_ proposing we add it to the ongoing test suite. It is, amongst other things, definitely prone to flaking. I also had to use loads of data to force two index vacuuming passes now that we have TIDStore, so it is a slow test. If you want to run the repro with meson, you'll have to add 't/099_vacuum_hang.pl' to src/test/recovery/meson.build and then run it with: meson test postgresql:recovery / recovery/099_vacuum_hang If you use autotools, you can run it with: make check PROVE_TESTS="t/099_vacuum_hang.pl" The repro forces a round of index vacuuming after the standby reconnects and before pruning a dead tuple whose xmax is older than OldestXmin. At the end of the round of index vacuuming, _bt_pendingfsm_finalize() calls GetOldestNonRemovableTransactionId(), thereby updating the backend's GlobalVisState and moving maybe_needed backwards. 
Then vacuum's first pass will continue with pruning and find our later inserted and updated tuple HEAPTUPLE_RECENTLY_DEAD when compared to maybe_needed but HEAPTUPLE_DEAD when compared to OldestXmin. I make sure that the standby reconnects between vacuum_get_cutoffs() and pruning because I have a cursor on the page keeping VACUUM FREEZE from getting a cleanup lock. See the repro for step-by-step explanations of how it works. I have a modified version of this that repros the infinite loop on 14-16 with substantially less data. See it here [2]. Also, the repro attached to this mail won't work on 14 and 15 because of changes to background_psql. - Melanie [1] https://postgr.es/m/20240415173913.4zyyrwaftujxthf2%40awork3.anarazel.de#1b216b7768b5bd577a3d3d51bd5aadee [2] https://www.postgresql.org/message-id/CAAKRu_Y_NJzF4-8gzTTeaOuUL3CcGoXPjXcAHbTTygT8AyVqag%40mail.gmail.com From fb74a07d98d19147556fca1dc911a22d50a496e5 Mon Sep 17 00:00:00 2001 From: Melanie Plageman Date: Thu, 20 Jun 2024 16:38:30 -0400 Subject: [PATCH v1 1/2] Repro for vacuum ERROR out trying to freeze old dead tuple This repro is not stable enough to be added as a test to the regression suite. It is for demonstration purposes only. --- src/test/recovery/t/099_vacuum_hang.pl | 267 + 1 file changed, 267 insertions(+) create mode 100644 src/test/recovery/t/099_vacuum_hang.pl diff --git a/src/test/recovery/t/099_vacuum_hang.pl b/src/test/recovery/t/099_vacuum_hang.pl new file mo
PG 17 and GUC variables
FYI, looking at the release notes, I see 15 GUC variables added in this release, and two removed. That 15 number seemed unusually high so I thought I would report it. -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com Only you can decide what is important to you.
Re: Vacuum ERRORs out considering freezing dead tuples from before OldestXmin
On Thu, Jun 20, 2024 at 7:42 PM Melanie Plageman wrote: > If vacuum fails to remove a tuple with xmax older than > VacuumCutoffs->OldestXmin and younger than > GlobalVisState->maybe_needed, it will ERROR out when determining > whether or not to freeze the tuple with "cannot freeze committed > xmax". > > In back branches starting with 14, failing to remove tuples older than > OldestXmin during pruning caused vacuum to infinitely loop in > lazy_scan_prune(), as investigated on this [1] thread. This is a great summary. > We can fix this by always removing tuples considered dead before > VacuumCutoffs->OldestXmin. This is okay even if a reconnected standby > has a transaction that sees that tuple as alive, because it will > simply wait to replay the removal until it would be correct to do so > or recovery conflict handling will cancel the transaction that sees > the tuple as alive and allow replay to continue. I think that this is the right general approach. > The repro forces a round of index vacuuming after the standby > reconnects and before pruning a dead tuple whose xmax is older than > OldestXmin. > > At the end of the round of index vacuuming, _bt_pendingfsm_finalize() > calls GetOldestNonRemovableTransactionId(), thereby updating the > backend's GlobalVisState and moving maybe_needed backwards. Right. I saw details exactly consistent with this when I used GDB against a production instance. I'm glad that you were able to come up with a repro that involves exactly the same basic elements, including index page deletion. -- Peter Geoghegan
Re: ON ERROR in json_query and the like
On Fri, Jun 21, 2024 at 1:11 AM David G. Johnston wrote: > On Thu, Jun 20, 2024 at 2:19 AM Amit Langote wrote: >> On Mon, Jun 17, 2024 at 10:07 PM Markus Winand >> wrote: >> > > On 17.06.2024, at 08:20, Amit Langote wrote: >> > > Agree that the documentation needs to be clear about this. I'll update >> > > my patch at [1] to add a note next to table 9.16.3. SQL/JSON Query >> > > Functions. >> > >> > Considering another branch of this thread [1] I think the >> > "Supported Features” appendix of the docs should mention that as well. >> > >> > The way I see it is that the standards defines two overloaded >> > JSON_QUERY functions, of which PostgreSQL will support only one. >> > In case of valid JSON, the implied CAST makes it look as though >> > the second variant of these function was supported as well but that >> > illusion totally falls apart once the JSON is not valid anymore. >> > >> > I think it affects the following feature IDs: >> > >> > - T821, Basic SQL/JSON query operators >> > For JSON_VALUE, JSON_TABLE and JSON_EXISTS >> > - T828, JSON_QUERY >> > >> > Also, how hard would it be to add the functions that accept >> > character strings? Is there, besides the effort, any thing else >> > against it? I’m asking because I believe once released it might >> > never be changed — for backward compatibility. >> >> Hmm, I'm starting to think that adding the implied cast to json wasn't >> such a great idea after all, because it might mislead the users to >> think that JSON parsing is transparent (respects ON ERROR), which is >> what you are saying, IIUC. >> > > Actually, the implied cast is exactly the correct thing to do here - the > issue is that we aren't using the newly added soft errors infrastructure [1] > to catch the result of that cast and instead produce whatever output on error > tells us to produce. This seems to be in the realm of doability so we should > try in the interest of being standard conforming. Soft error handling *was* used for catching cast errors in the very early versions of this patch (long before I got involved and the infrastructure you mention got added). It was taken out after Pavel said [1] that he didn't like producing NULL instead of throwing an error. Not sure if Pavel's around but it would be good to know why he didn't like it at the time. I can look into making that work again, but that is not going to make beta2. > I'd even argue to make this an open item unless and until the attempt is > agreed upon to have failed (or it succeeds of course). OK, adding an open item. -- Thanks, Amit Langote [1] https://www.postgresql.org/message-id/CAFj8pRCnzO2cnHi5ebXciV%3DtuGVvAQOW9uPU%2BDQV1GkL31R%3D-g%40mail.gmail.com
Re: SQL/JSON query functions context_item doc entry and type requirement
On Thu, Jun 20, 2024 at 9:01 AM jian he wrote: > On Thu, Jun 20, 2024 at 5:46 PM Amit Langote > wrote: > > > > On Thu, Jun 20, 2024 at 1:03 AM David G. Johnston > > wrote: > > > On Wed, Jun 19, 2024 at 8:29 AM jian he > wrote: > > >> > > >> On Mon, Jun 17, 2024 at 9:05 PM Chapman Flack > wrote: > > >> > > > >> > Hi, > > >> > > > >> > On 06/17/24 02:43, Amit Langote wrote: > > >> > > context_item expression can be a value > of > > >> > > any type that can be cast to jsonb. This includes > types > > >> > > such as char, text, > bpchar, > > >> > > character varying, and bytea (with > > >> > > ENCODING UTF8), as well as any domains over these > types. > > >> > > > >> > Reading this message in conjunction with [0] makes me think that we > are > > >> > really talking about a function that takes a first parameter of > type jsonb, > > >> > and behaves exactly that way (so any cast required is applied by > the system > > >> > ahead of the call). Under those conditions, this seems like an > unusual > > >> > sentence to add in the docs, at least until we have also documented > that > > >> > tan's argument can be of any type that can be cast to double > precision. > > >> > > > >> > > >> I guess it would be fine to add an unusual sentence to the docs. > > >> > > >> imagine a function: array_avg(anyarray) returns anyelement. > > >> array_avg calculate an array's elements's avg. like > > >> array('{1,2,3}'::int[]) returns 2. > > >> but array_avg won't make sense if the input argument is a date array. > > >> so mentioning in the doc: array_avg can accept anyarray, but anyarray > > >> cannot date array. > > >> seems ok. > > > > > > > > > There is existing wording for this: > > > > > > "The expression can be of any JSON type, any character string type, or > bytea in UTF8 encoding." > > > > > > If you add this sentence to the paragraph the link that already > exists, which simply points the reader to this sentence, becomes redundant > and should be removed. > > > > I've just posted a patch in the other thread [1] to restrict > > context_item to be of jsonb type, which users would need to ensure by > > adding an explicit cast if needed. I think that makes this > > clarification unnecessary. > > > > > As for table 9.16.3 - it is unwieldy already. Lets try and make the > core syntax shorter, not longer. We already have precedence in the > subsequent json_table section - give each major clause item a name then > below the table define the syntax and meaning for those names. Unlike in > that section - which probably should be modified too - context_item should > have its own description line. > > > > I had posted a patch a little while ago at [1] to render the syntax a > > bit differently with each function getting its own syntax synopsis. > > Resending it here; have addressed Jian He's comments. > > > > -- > I was thinking more like: diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index c324906b22..b9d157663a 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -18692,8 +18692,10 @@ $.* ? (@ like_regex "^\\d+$") json_exists json_exists ( -context_item, path_expression PASSING { value AS varname } , ... - { TRUE | FALSE | UNKNOWN | ERROR } ON ERROR ) +context_item, +path_expression +variable_definitions +on_error_boolean) Returns true if the SQL/JSON path_expression @@ -18732,12 +18734,14 @@ ERROR: jsonpath array subscript is out of bounds json_query json_query ( -context_item, path_expression PASSING { value AS varname } , ... 
- RETURNING data_type FORMAT JSON ENCODING UTF8 - { WITHOUT | WITH { CONDITIONAL | UNCONDITIONAL } } ARRAY WRAPPER - { KEEP | OMIT } QUOTES ON SCALAR STRING - { ERROR | NULL | EMPTY { ARRAY | OBJECT } | DEFAULT expression } ON EMPTY - { ERROR | NULL | EMPTY { ARRAY | OBJECT } | DEFAULT expression } ON ERROR ) +context_item, +path_expression +variable_definitions +return_clause +wrapping_clause +quoting_clause +on_empty_set +on_error_set) Returns the result of applying the SQL/JSON @@ -18809,11 +18813,12 @@ DETAIL: Missing "]" after array dimensions. json_value json_value ( -context_item, path_expression - PASSING { value AS varname } , ... - RETURNING data_type - { ERROR | NULL | DEFAULT expression } ON EMPTY - { ERROR | NULL | DEFAULT expression } ON ERROR ) +context_item, +path_expression +variable_definitions +return_type +on_empty_value +on_error_value) Returns the result of applying the SQL/JSON Then defining each of those below the table - keeping the on_error variants together. > playing around with it. > found some minor issues: > >
Re: ON ERROR in json_query and the like
On Thu, Jun 20, 2024 at 5:22 PM Amit Langote wrote: > > Soft error handling *was* used for catching cast errors in the very > early versions of this patch (long before I got involved and the > infrastructure you mention got added). It was taken out after Pavel > said [1] that he didn't like producing NULL instead of throwing an > error. Not sure if Pavel's around but it would be good to know why he > didn't like it at the time. > > I'm personally in the "make it error" camp but "make it conform to the standard" is a stronger membership (in general). I see this note in your linked thread: > By the standard, it is implementation-defined whether JSON parsing errors > should be caught by ON ERROR clause. Absent someone contradicting that claim I retract my position here and am fine with failing if these "functions" are supplied with something that cannot be cast to json. I'd document them like functions that accept json with the implications that any casting to json happens before the function is called and thus its arguments do not apply to that step. Given that we have "expression IS JSON" the ability to hack together a case expression to get non-erroring behavior exists. David J.
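For illustration, a sketch of that kind of guard (table and column names here are invented, not from the thread):

SELECT CASE WHEN t.payload IS JSON
            THEN JSON_QUERY(t.payload::jsonb, '$.a' NULL ON ERROR)
       END AS a
  FROM (VALUES ('{"a": 1}'), ('not json')) AS t(payload);

Since CASE evaluates only the branch it takes, the cast to jsonb is never attempted for rows that fail the IS JSON test, so no parse error is thrown for them.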
Re: 001_rep_changes.pl fails due to publisher stuck on shutdown
On Wed, Jun 19, 2024 at 05:14:50AM +, Hayato Kuroda (Fujitsu) wrote: > I have an unclear point. According to the comment atop GetInsertRecPtr(), it > just > returns the approximated value - the position of the last full WAL page [1]. > If there is a continuation WAL record which across a page, will it return the > halfway point of the WAL record (end of the first WAL page)? If so, the > proposed > fix seems not sufficient. We have to point out the exact the end of the > record. Yeah, that's the part of the patch I am confused by. How are we sure that this is the correct LSN to rely on? If that is the case, the patch does not offer an explanation about why it is better. WalSndWaitForWal() is called only in the context of the page callback for a logical WAL sender. Shouldn't we make the flush conditional on what's stored in XLogReaderState.missingContrecPtr? Aka, if we know that we're in the middle of the decoding of a continuation record, we should wait until we've dealt with it, no? In short, I would imagine that WalSndWaitForWal() should know more about what XLogReaderState is doing. But perhaps I'm missing something. -- Michael signature.asc Description: PGP signature
Re: PG 17 and GUC variables
On Thu, Jun 20, 2024 at 08:01:19PM -0400, Bruce Momjian wrote: > FYI, looking at the release notes, I see 15 GUC variables added in this > release, and two removed. That 15 number seemed unusually high so I > thought I would report it. Scanning pg_settings across the two versions, I'm seeing: - Removed GUCs between 16 and 17: db_user_namespace old_snapshot_threshold trace_recovery_messages - Added GUCs between 16 and 17: allow_alter_system commit_timestamp_buffers enable_group_by_reordering event_triggers huge_pages_status io_combine_limit max_notify_queue_pages multixact_member_buffers multixact_offset_buffers notify_buffers serializable_buffers standby_slot_names subtransaction_buffers summarize_wal sync_replication_slots trace_connection_negotiation transaction_buffers transaction_timeout wal_summary_keep_time So that makes for 3 removed, 19 additions and a +16. -- Michael signature.asc Description: PGP signature
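For reference, a scan like the following, run against each major version and with the two outputs diffed, reproduces the lists above:

SELECT name FROM pg_settings ORDER BY name;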
configure error when CFLAGS='-Wall -Werror
Hi, I rely on some of the compiler's checks to reduce simple coding issues; I use clang 18.1.6 for now. However, "CFLAGS='-Wall -Werror ' ./configure" fails, while running './configure' directly is OK. I'm not sure why that happens. More details are below: (master)> echo $CC clang (master)> clang --version clang version 18.1.6 (https://gitee.com/mirrors/llvm-project.git 1118c2e05e67a36ed8ca250524525cdb66a55256) Target: x86_64-unknown-linux-gnu Thread model: posix InstalledDir: /usr/local/bin (master)> CFLAGS='-Wall -Werror ' ./configure checking for clang option to accept ISO C89... unsupported checking for clang option to accept ISO C99... unsupported configure: error: C compiler "clang" does not support C99 In config.log, we can see: configure:4433: clang -qlanglvl=extc89 -c -Wall -Werror conftest.c >&5 clang: error: unknown argument: '-qlanglvl=extc89' and clang doesn't support -qlanglvl. In 'configure', we can see that the related code is: """ for ac_arg in '' -std=gnu99 -std=c99 -c99 -AC99 -D_STDC_C99= -qlanglvl=extc99 do CC="$ac_save_CC $ac_arg" if ac_fn_c_try_compile "$LINENO"; then : ac_cv_prog_cc_c99=$ac_arg fi rm -f core conftest.err conftest.$ac_objext test "x$ac_cv_prog_cc_c99" != "xno" && break done rm -f conftest.$ac_ext CC=$ac_save_CC # Error out if the compiler does not support C99, as the codebase # relies on that. if test "$ac_cv_prog_cc_c99" = no; then as_fn_error $? "C compiler \"$CC\" does not support C99" "$LINENO" 5 fi """ So my questions are: 1. Given that clang never supports '-qlanglvl', why does removing the CFLAGS matter? 2. If you are using clang as well, what CFLAGS do you use that work? For example: IIRC, clang doesn't report an error for a variable that is set but not used by default, so we have to add some extra flags for that. -- Best Regards Andy Fan
minor doc issue in 9.16.2.1.1. Boolean Predicate Check Expressions
hi. - 9.16.2.1.1. Boolean Predicate Check Expressions As an extension to the SQL standard, a PostgreSQL path expression can be a Boolean predicate, whereas the SQL standard allows predicates only within filters. While SQL-standard path expressions return the relevant element(s) of the queried JSON value, predicate check expressions return the single three-valued result of the predicate: true, false, or unknown. For example, we could write this SQL-standard filter expression: - slight inconsistency, "SQL-standard" versus "SQL standard" "path expression can be a Boolean predicate", why capital "Boolean"? "predicate check expressions return the single three-valued result of the predicate: true, false, or unknown." "unknown" is wrong, because `select 'unknown'::jsonb;` will fail. here "unknown" should be "null"? see jsonb_path_query doc entry also.
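To make the point concrete, this is roughly how the three outcomes surface in jsonb_path_query (expected outputs shown as comments; worth verifying on a live server):

SELECT jsonb_path_query('{"a": 1}', '$.a > 0');    -- true
SELECT jsonb_path_query('{"a": 1}', '$.a > 5');    -- false
SELECT jsonb_path_query('{"a": "x"}', '$.a > 5');  -- null, i.e. the "unknown" case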
Re: An inefficient query caused by unnecessary PlaceHolderVar
On Mon, Jan 15, 2024 at 1:50 PM Richard Guo wrote: > Updated this patch over 29f114b6ff, which indicates that we should apply > the same rules for PHVs. Here is a new rebase of this patch, with some tweaks to comments. I've also updated the commit message to better explain the context. To recap, this patch tries to avoid wrapping Vars and PHVs from subquery output that are lateral references to something outside the subquery. Typically this kind of wrapping is necessary when the Var/PHV references the non-nullable side of the outer join from the nullable side, because we need to ensure that it is evaluated at the right place and hence is forced to null when the outer join should do so. But if the referenced rel is under the same lowest nulling outer join, we can actually omit the wrapping. The PHVs generated from such kind of wrapping imply lateral dependencies and force us to resort to nestloop joins, so we'd better get rid of them. This patch performs this check by remembering the relids of the nullable side of the lowest outer join the subquery is within. So I think it improves the overall plan in the related cases with very little extra cost. Thanks Richard v4-0001-Avoid-unnecessary-wrapping-for-Vars-and-PHVs.patch Description: Binary data
Re: Pgoutput not capturing the generated columns
Hi Shubham, here are some more patch v8-0001 comments that I missed yesterday. == src/test/subscription/t/011_generated.pl 1. Are the PRIMARY KEY qualifiers needed for the new tab2, tab3 tables? I don't think so. ~~~ 2. +$result = $node_subscriber->safe_psql('postgres', "SELECT * FROM tab2"); +is( $result, qq(4|8 +5|10), 'confirm generated columns ARE replicated when the subscriber-side column is not generated'); + +$node_publisher->safe_psql('postgres', "INSERT INTO tab3 VALUES (4), (5)"); + +$node_publisher->wait_for_catchup('sub3'); + +$result = $node_subscriber->safe_psql('postgres', "SELECT * FROM tab3"); +is( $result, qq(4|24 +5|25), 'confirm generated columns are NOT replicated when the subscriber-side column is also generated'); + It would be prudent to do explicit "SELECT a,b" instead of "SELECT *", for readability and to avoid any surprises. == Kind Regards, Peter Smith. Fujitsu Australia.
Re: improve predefined roles documentation
On Tue, Jun 18, 2024 at 9:52 AM Nathan Bossart wrote: > On Mon, Jun 17, 2024 at 02:10:22PM -0400, Robert Haas wrote: > > On Thu, Jun 13, 2024 at 3:48 PM Nathan Bossart > wrote: > >> I think we could improve matters by abandoning the table and instead > >> documenting these roles more like we document GUCs, i.e., each one has a > >> section below it where we can document it in as much detail as we want. > >> Some of these roles should probably be documented together (e.g., > >> pg_read_all_data and pg_write_all_data), so the ordering is unlikely to > be > >> perfect, but I'm hoping it would still be a net improvement. > > > > +1. I'm not sure about all of the details, but I like the general idea. > > Here is a first try. I did pretty much exactly what I proposed in the > quoted text, so I don't have much else to say about it. I didn't see an > easy way to specify multiple ids and xreflabels for a given entry, so the > entries that describe multiple roles just use the name of the first role > listed. In practice, I think this just means you need to do a little extra > work when linking to one of the other roles from elsewhere in the docs, > which doesn't seem too terrible. > > I like this. Losing the table turned out to be ok. Thank you. I would probably put pg_monitor first in the list. + A user granted this role cannot however send signals to a backend owned by a superuser. Remove "however", or put commas around it. I prefer the first option. Do we really need to repeat "even without having it explicitly" everywhere? + This role does not have the role attribute BYPASSRLS set. Even if it did, that attribute isn't inherited anyway... "This role is still governed by any row level security policies that may be in force. Consider setting the BYPASSRLS attribute on member roles." (assuming they intend it to be ALL data then doing the bypassrls even if they are not today using it doesn't hurt) pg_stat_scan_tables - This explanation leaves me wanting more. Maybe give an example of such a function? I think the bar is set a bit too high just talking about a specific lock level. "As these roles are able to access any file on the server file system," We forbid running under root so this isn't really true. They do have operating system level access logged in as the database process owner. They are able to access all PostgreSQL files on the server file system and usually can run a wide-variety of commands on the server. "access, therefore great care should be taken" I would go with: "access. Great care should be taken" Seems more impactful as its own sentence then at the end of a long multi-part sentence. "server with COPY any other file-access functions." - s/with/using/ David J.
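As a concrete sketch of the BYPASSRLS suggestion (role name invented), since membership in pg_read_all_data does not bypass row security by itself, the attribute has to be set on the member role:

CREATE ROLE reporting LOGIN;
GRANT pg_read_all_data TO reporting;
ALTER ROLE reporting BYPASSRLS;  -- role attributes are not inherited through membership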
Re: minor doc issue in 9.16.2.1.1. Boolean Predicate Check Expressions
On Thu, Jun 20, 2024 at 7:30 PM jian he wrote: > "predicate check expressions return the single three-valued result of > the predicate: true, false, or unknown." > "unknown" is wrong, because `select 'unknown'::jsonb;` will fail. > here "unknown" should be "null"? see jsonb_path_query doc entry also. > > The syntax for json_exists belies this claim (assuming our docs are accurate there). Its "on error" options are true/false/unknown. Additionally, the predicate test operator is named "is unknown" not "is null". The result of the predicate test, which is never produced as a value, only a concept, is indeed "unknown" - which then devolves to false when it is practically applied to determining whether to output the path item being tested. As it does also when used in a parth expression. postgres=# select json_value('[null]','$[0] < 1'); json_value f postgres=# select json_value('[null]','$[0] == null'); json_value t Not sure how to peek inside the jsonpath system here though... postgres=# select json_value('[null]','($[0] < 1) == null'); ERROR: syntax error at or near "==" of jsonpath input LINE 1: select json_value('[null]','($[0] < 1) == null'); I am curious if that produces true (the unknown is left as null) or false (the unknown becomes false immediately). David J.
Re: configure error when CFLAGS='-Wall -Werror
Andy Fan writes: > I relies on some compiler's check to reduce some simple coding issues, I > use clang 18.1.6 for now. however "CFLAGS='-Wall -Werror ' ./configure" > would fail, Nope, you cannot do that: -Werror breaks many of configure's tests. See https://www.postgresql.org/docs/current/install-make.html#CONFIGURE-ENVVARS for the standard workaround. regards, tom lane
Re: Add pg_get_acl() function get the ACL for a database object
On Thu, Jun 20, 2024 at 08:32:57AM +0200, Joel Jacobson wrote: > I've added overloaded versions for regclass and regproc so far: > > \df pg_get_acl > List of functions >Schema |Name| Result data type | Argument data types | Type > ++--++-- > pg_catalog | pg_get_acl | aclitem[]| classid oid, objid oid | func > pg_catalog | pg_get_acl | aclitem[]| objid regclass | func > pg_catalog | pg_get_acl | aclitem[]| objid regproc | func > (3 rows) Interesting idea. I am not really convinced that the regproc and regclass overloads are necessary, considering the fact that one of the goals mentioned, as far as I understand, is to be able to get an idea of the ACLs associated with an object and its dependencies in pg_depend and/or pg_shdepend. Another one is to reduce the JOIN burden when querying a set of them, like attribute ACLs. Perhaps the documentation should add one or two examples to show this point? +tup = get_catalog_object_by_oid(rel, Anum_oid, objectId); +if (!HeapTupleIsValid(tup)) +elog(ERROR, "cache lookup failed for object %u of catalog \"%s\"", +objectId, RelationGetRelationName(rel)); get_catalog_object_by_oid() is handled differently here than in functions like pg_identify_object(). Shouldn't we return NULL for this case? That would be more useful when using this function with one or more large scans. -- Michael signature.asc Description: PGP signature
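For example, something like this could go into the documentation (a sketch against the two-argument signature above; the role name is invented):

-- ACLs of all objects a given role has been granted privileges on
SELECT d.classid::regclass AS catalog,
       d.objid,
       pg_get_acl(d.classid, d.objid)
  FROM pg_shdepend AS d
 WHERE d.refclassid = 'pg_authid'::regclass
   AND d.refobjid = 'some_role'::regrole
   AND d.deptype = 'a';  -- ACL dependencies only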
Re: Pgoutput not capturing the generated columns
Hi, here are some review comments for patch v9-0002. == src/backend/replication/logical/relation.c 1. logicalrep_rel_open - if (attr->attisdropped) + if (attr->attisdropped || + (!MySubscription->includegencols && attr->attgenerated)) You replied to my question from the previous review [1, #2] as follows: In case 'include_generated_columns' is 'true'. column list in remoterel will have an entry for generated columns. So, in this case if we skip every attr->attgenerated, we will get a missing column error. ~ TBH, the reason seems very subtle to me. Perhaps that "(!MySubscription->includegencols && attr->attgenerated))" condition should be coded as a separate "if", so then you can include a comment similar to your answer, to explain it. == src/backend/replication/logical/tablesync.c make_copy_attnamelist: NITPICK - punctuation in function comment NITPICK - add/reword some more comments NITPICK - rearrange comments to be closer to the code they are commenting ~ 2. make_copy_attnamelist. + /* + * Construct column list for COPY. + */ + for (int i = 0; i < rel->remoterel.natts; i++) + { + if(!gencollist[i]) + attnamelist = lappend(attnamelist, + makeString(rel->remoterel.attnames[i])); + } IIUC isn't this assuming that the attribute number (aka column order) is the same on the subscriber side (e.g. gencollist idx) and on the publisher side (e.g. remoterel.attnames[i]). AFAIK logical replication does not require this ordering must be match like that, therefore I am suspicious this new logic is accidentally imposing new unwanted assumptions/restrictions. I had asked the same question before [1-#4] about this code, but there was no reply. Ideally, there would be more test cases for when the columns (including the generated ones) are all in different orders on the pub/sub tables. ~~~ 3. General - varnames. It would help with understanding if the 'attgenlist' variables in all these functions are re-named to make it very clear that this is referring to the *remote* (publisher-side) table genlist, not the subscriber table genlist. ~~~ 4. + int i = 0; + appendStringInfoString(&cmd, "COPY (SELECT "); - for (int i = 0; i < lrel.natts; i++) + foreach_ptr(ListCell, l, attnamelist) { - appendStringInfoString(&cmd, quote_identifier(lrel.attnames[i])); - if (i < lrel.natts - 1) + appendStringInfoString(&cmd, quote_identifier(strVal(l))); + if (i < attnamelist->length - 1) appendStringInfoString(&cmd, ", "); + i++; } 4a. I think the purpose of the new macros is to avoid using ListCell, and also 'l' is an unhelpful variable name. Shouldn't this code be more like: foreach_node(String, att_name, attnamelist) ~ 4b. The code can be far simpler if you just put the comma (", ") always up-front except the *first* iteration, so you can avoid checking the list length every time. For example: if (i++) appendStringInfoString(&cmd, ", "); == src/test/subscription/t/011_generated.pl 5. General. Hmm. This patch 0002 included many formatting changes to tables tab2 and tab3 and subscriptions sub2 and sub3 but they do not belong here. The proper formatting for those needs to be done back in patch 0001 where they were introduced. Patch 0002 should just concentrate only on the new stuff for patch 0002. ~ 6. CREATE TABLES would be better in pairs IMO it will be better if the matching CREATE TABLE for pub and sub are kept together, instead of separating them by doing all pub then all sub. I previously made the same comment for patch 0001, so maybe it will be addressed next time... ~ 7. 
SELECT * +$result = + $node_subscriber->safe_psql('postgres', "SELECT * FROM tab4 ORDER BY a"); It will be prudent to do explicit "SELECT a,b,c" instead of "SELECT *", for readability and so there are no surprises. == 99. Please also refer to my attached nitpicks diff for numerous cosmetic changes, and apply if you agree. == [1] https://www.postgresql.org/message-id/CAHut%2BPtAsEc3PEB1KUk1kFF5tcCrDCCTcbboougO29vP1B4E2Q%40mail.gmail.com Kind Regards, Peter Smith. Fujitsu Australia diff --git a/src/backend/replication/logical/tablesync.c b/src/backend/replication/logical/tablesync.c index 0fbf661..ddeb6a8 100644 --- a/src/backend/replication/logical/tablesync.c +++ b/src/backend/replication/logical/tablesync.c @@ -694,7 +694,7 @@ process_syncing_tables(XLogRecPtr current_lsn) /* * Create list of columns for COPY based on logical relation mapping. Do not - * include generated columns, of the subscription table, in the column list. + * include generated columns of the subscription table in the column list. */ static List * make_copy_attnamelist(LogicalRepRelMapEntry *rel, bool *attgenlist) @@ -706,6 +706,7 @@ make_copy_attnamelist(LogicalRepRelMapEntry *rel, bool *attgenlist) desc = RelationGetDescr(rel->localrel); gencollist = palloc0(MaxTupleAttributeNumber * sizeof(bool)); + /* Loop to handle subscription table generated columns. */ for (int i = 0; i < des
Re: Add pg_get_acl() function get the ACL for a database object
Michael Paquier writes: > On Thu, Jun 20, 2024 at 08:32:57AM +0200, Joel Jacobson wrote: >> I've added overloaded versions for regclass and regproc so far: >> >> \df pg_get_acl >> List of functions >> Schema |Name| Result data type | Argument data types | Type >> ++--++-- >> pg_catalog | pg_get_acl | aclitem[]| classid oid, objid oid | func >> pg_catalog | pg_get_acl | aclitem[]| objid regclass | func >> pg_catalog | pg_get_acl | aclitem[]| objid regproc | func >> (3 rows) > Interesting idea. Doesn't that result in "cannot resolve ambiguous function call" failures? regards, tom lane
Re: Add pg_get_acl() function get the ACL for a database object
On Thu, 20 Jun 2024 at 02:33, Joel Jacobson wrote: > On Wed, Jun 19, 2024, at 16:23, Isaac Morland wrote: > > I have no idea how often this would be useful, but I wonder if it could > > work to have overloaded single-parameter versions for each of > > regprocedure (pg_proc.proacl), regclass (pg_class.relacl), …. To call, > > just cast the OID to the appropriate reg* type. > > > > For example: To get the ACL for table 'example_table', call pg_get_acl > > ('example_table'::regclass) > > +1 > > New patch attached. > > I've added overloaded versions for regclass and regproc so far: > > \df pg_get_acl > List of functions >Schema |Name| Result data type | Argument data types | Type > > ++--++-- > pg_catalog | pg_get_acl | aclitem[]| classid oid, objid oid | func > pg_catalog | pg_get_acl | aclitem[]| objid regclass | func > pg_catalog | pg_get_acl | aclitem[]| objid regproc | func > (3 rows) Those were just examples. I think for completeness there should be 5 overloads: [input type] → [relation.aclattribute] regproc/regprocedure → pg_proc.proacl regtype → pg_type.typacl regclass → pg_class.relacl regnamespace → pg_namespace.nspacl I believe the remaining reg* types don't correspond to objects with ACLs, and the remaining ACL fields are for objects which don't have a corresponding reg* type. In general I believe the reg* types are underutilized. All over the place I see examples where people write code to generate SQL statements and they take schema and object name and then format with %I.%I when all that is needed is a reg* value and then format it with a simple %s (of course, need to make sure the SQL will execute with the same search_path as when the SQL was generated, or generate with an empty search_path).
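A small sketch of that last point (object and role names invented); with an empty search_path the regclass output comes back schema-qualified and quoted as needed:

SET search_path = '';
SELECT format('GRANT SELECT ON %s TO reader;',
              'public.example_table'::regclass);
-- produces: GRANT SELECT ON public.example_table TO reader;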
Re: Add pg_get_acl() function get the ACL for a database object
On Thu, 20 Jun 2024 at 23:44, Tom Lane wrote: > Michael Paquier writes: > > On Thu, Jun 20, 2024 at 08:32:57AM +0200, Joel Jacobson wrote: > >> I've added overloaded versions for regclass and regproc so far: > >> > >> \df pg_get_acl > >> List of functions > >> Schema |Name| Result data type | Argument data types | Type > >> > ++--++-- > >> pg_catalog | pg_get_acl | aclitem[]| classid oid, objid oid | > func > >> pg_catalog | pg_get_acl | aclitem[]| objid regclass | > func > >> pg_catalog | pg_get_acl | aclitem[]| objid regproc | > func > >> (3 rows) > > > Interesting idea. > > Doesn't that result in "cannot resolve ambiguous function call" > failures? If you try to pass an oid directly, as a value of type oid, you should get "function is not unique". But if you cast a string or numeric value to the appropriate reg* type for the object you are using, it should work fine. I have functions which reset object permissions on all objects in a specified schema back to the default state as if they had been freshly created which rely on this. They work very well, and allow me to have a privilege-granting script for each project which always fully resets all the privileges back to a known state.
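Concretely, assuming the overloads proposed up-thread (object name invented):

SELECT pg_get_acl('example_table'::regclass);       -- resolves to the regclass variant
SELECT pg_get_acl('example_table'::regclass::oid);  -- bare oid: fails with "function is not unique"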
Re: ON ERROR in json_query and the like
On Fri, Jun 21, 2024 at 10:01 AM David G. Johnston wrote: > On Thu, Jun 20, 2024 at 5:22 PM Amit Langote wrote: >> >> >> Soft error handling *was* used for catching cast errors in the very >> early versions of this patch (long before I got involved and the >> infrastructure you mention got added). It was taken out after Pavel >> said [1] that he didn't like producing NULL instead of throwing an >> error. Not sure if Pavel's around but it would be good to know why he >> didn't like it at the time. >> > > I'm personally in the "make it error" camp but "make it conform to the > standard" is a stronger membership (in general). > > I see this note in your linked thread: > > > By the standard, it is implementation-defined whether JSON parsing errors > > should be caught by ON ERROR clause. > > Absent someone contradicting that claim I retract my position here and am > fine with failing if these "functions" are supplied with something that > cannot be cast to json. I'd document them like functions that accept json > with the implications that any casting to json happens before the function is > called and thus its arguments do not apply to that step. Thanks for that clarification. So, there are the following options: 1. Disallow anything but jsonb for context_item (the patch I posted yesterday) 2. Continue allowing context_item to be non-json character or utf-8 encoded bytea strings, but document that any parsing errors do not respect the ON ERROR clause. 3. Go ahead and fix implicit casts to jsonb so that any parsing errors respect ON ERROR (no patch written yet). David's vote seems to be 2, which is my inclination too. Markus' vote seems to be either 1 or 3. Anyone else? -- Thanks, Amit Langote
Re: Allow non-superuser to cancel superuser tasks.
On Fri, Jun 14, 2024 at 12:06:36PM +0500, Andrey M. Borodin wrote: > I’ve tried to dig into the test. > The problem is CV is allocated in > > inj_state = GetNamedDSMSegment("injection_points”, > > which seems to be destroyed in > > shmem_exit() calling dsm_backend_shutdown() > > This happens before we broadcast that sleep is over. > I think this might happen with any wait on injection point if it is > pg_terminate_backend()ed. Except if I am missing something, this is not a problem for a normal backend, for example with one using a `SELECT injection_points_run()`. > Is there way to wake up from CV sleep before processing actual termination? I am honestly not sure if this is worth complicating the sigjmp path of the autovacuum worker just for the sake of this test. It seems to me that it would be simple enough to move the injection point autovacuum-worker-start within the transaction block a few lines down in do_autovacuum(), no? -- Michael signature.asc Description: PGP signature
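For context, the normal-backend case contrasted above looks roughly like this (SQL functions as provided by the injection_points test module; the point name is made up):

  SELECT injection_points_attach('some-point', 'wait');
  SELECT injection_points_run('some-point');     -- this backend sleeps on the CV
  -- from another session:
  SELECT injection_points_wakeup('some-point');
  SELECT injection_points_detach('some-point');

Terminating the waiting backend in this scenario is unproblematic, which is the contrast being drawn with the autovacuum worker case.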
Re: Pluggable cumulative statistics
At Thu, 13 Jun 2024 16:59:50 +0900, Michael Paquier wrote in > * The kind IDs may change across restarts, meaning that any stats data > associated to a custom kind is stored with the *name* of the custom > stats kind. Depending on the discussion happening here, I'd be open > to use the same concept as custom RMGRs, where custom kind IDs are > "reserved", fixed in time, and tracked in the Postgres wiki. It is > cheaper to store the stats this way, as well, while managing conflicts > across extensions available in the community ecosystem. I prefer to avoid having a central database if possible. If we don't intend to move stats data alone out of a cluster for use in another one, can't we store the relationship between stats names and numeric IDs (or index numbers) in a separate file, which is loaded just before and synced just after extension preloading finishes? regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: ON ERROR in json_query and the like
pá 21. 6. 2024 v 2:22 odesílatel Amit Langote napsal: > On Fri, Jun 21, 2024 at 1:11 AM David G. Johnston > wrote: > > On Thu, Jun 20, 2024 at 2:19 AM Amit Langote > wrote: > >> On Mon, Jun 17, 2024 at 10:07 PM Markus Winand > wrote: > >> > > On 17.06.2024, at 08:20, Amit Langote > wrote: > >> > > Agree that the documentation needs to be clear about this. I'll > update > >> > > my patch at [1] to add a note next to table 9.16.3. SQL/JSON Query > >> > > Functions. > >> > > >> > Considering another branch of this thread [1] I think the > >> > "Supported Features” appendix of the docs should mention that as well. > >> > > >> > The way I see it is that the standards defines two overloaded > >> > JSON_QUERY functions, of which PostgreSQL will support only one. > >> > In case of valid JSON, the implied CAST makes it look as though > >> > the second variant of these function was supported as well but that > >> > illusion totally falls apart once the JSON is not valid anymore. > >> > > >> > I think it affects the following feature IDs: > >> > > >> > - T821, Basic SQL/JSON query operators > >> > For JSON_VALUE, JSON_TABLE and JSON_EXISTS > >> > - T828, JSON_QUERY > >> > > >> > Also, how hard would it be to add the functions that accept > >> > character strings? Is there, besides the effort, any thing else > >> > against it? I’m asking because I believe once released it might > >> > never be changed — for backward compatibility. > >> > >> Hmm, I'm starting to think that adding the implied cast to json wasn't > >> such a great idea after all, because it might mislead the users to > >> think that JSON parsing is transparent (respects ON ERROR), which is > >> what you are saying, IIUC. > >> > > > > Actually, the implied cast is exactly the correct thing to do here - the > issue is that we aren't using the newly added soft errors infrastructure > [1] to catch the result of that cast and instead produce whatever output on > error tells us to produce. This seems to be in the realm of doability so > we should try in the interest of being standard conforming. > > Soft error handling *was* used for catching cast errors in the very > early versions of this patch (long before I got involved and the > infrastructure you mention got added). It was taken out after Pavel > said [1] that he didn't like producing NULL instead of throwing an > error. Not sure if Pavel's around but it would be good to know why he > didn't like it at the time. > > I can look into making that work again, but that is not going to make > beta2. > > > I'd even argue to make this an open item unless and until the attempt > is agreed upon to have failed (or it succeeds of course). > > OK, adding an open item. > At this time, when I wrote this mail, I didn't exactly notice the standard, so broken format should be handled there too. In this time, there was no support for soft errors ever in Postgres, so handling broken formats was inconsistent. Standard describes format errors, but exactly doesn't describe if this is error like missing key or broken json format. Maybe wrongly, but intuitively for me, these errors are of different kinds and broken input data is a different case than missing key (but fully valid json). I didn't find the exact sentence in standard when I searched it (but it was four years ago). My position in this case is not extra strong. The original patch was written and tested to be compatible with Oracle (what is a strong argument and feature). 
On the other hand, some things at the time required subtransactions, which was wrong (soft errors were introduced only later). Compatibility with Oracle is a strong argument, but Oracle itself is not fully compatible with the standard, and some cases are special in Oracle because an empty string there is NULL and is therefore handled differently. At that time my motivation was to reduce the patch to a "safe" minimum that committers could accept. The patch was written in 2017 (I think). Handling a broken input format was one issue that I thought could be solved later. The main point of my mail back then was the fact that Postgres and Oracle have DIFFERENT ideas of what correct JSON is! '{a:10}' is valid in Oracle, but not in Postgres. With the default NULL ON ERROR, Oracle then returns 10 and Postgres NULL. I thought this could be very messy and that it is better to just raise an exception. Regards Pavel > -- > Thanks, Amit Langote > [1] > https://www.postgresql.org/message-id/CAFj8pRCnzO2cnHi5ebXciV%3DtuGVvAQOW9uPU%2BDQV1GkL31R%3D-g%40mail.gmail.com >
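To make the divergence concrete on the Postgres side (a sketch; the Oracle result is as reported above and not reproduced here):

  SELECT JSON_QUERY('{a:10}', '$.a' NULL ON ERROR);
  -- Oracle accepts the unquoted key and returns 10; Postgres rejects the input
  -- as invalid JSON, and per this thread that parse error is currently raised
  -- regardless of ON ERROR (catching it would instead yield NULL).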
Re: Pluggable cumulative statistics
On Fri, Jun 21, 2024 at 01:09:10PM +0900, Kyotaro Horiguchi wrote: > At Thu, 13 Jun 2024 16:59:50 +0900, Michael Paquier > wrote in >> * The kind IDs may change across restarts, meaning that any stats data >> associated to a custom kind is stored with the *name* of the custom >> stats kind. Depending on the discussion happening here, I'd be open >> to use the same concept as custom RMGRs, where custom kind IDs are >> "reserved", fixed in time, and tracked in the Postgres wiki. It is >> cheaper to store the stats this way, as well, while managing conflicts >> across extensions available in the community ecosystem. > > I prefer to avoid having a central database if possible. I was thinking the same originally, but the experience with custom RMGRs has made me change my mind. There are more of these than I thought originally: https://wiki.postgresql.org/wiki/CustomWALResourceManagers > If we don't intend to move stats data alone out of a cluster for use > in another one, can't we store the relationship between stats names > and numeric IDs (or index numbers) in a separate file, which is loaded > just before and synced just after extension preloading finishes? Yeah, I've implemented a prototype that does exactly something like that with a restriction on the stats name to NAMEDATALEN, except that I've added the kind ID <-> kind name mapping at the beginning of the main stats file. At the end, it still felt weird and over-engineered to me, like the v1 prototype of upthread, because we finish with a strange mix when reloading the dshash where the builtin ID are handled with fixed values, with more code paths required when doing the serialize callback dance for stats kinds like replication slots, because the custom kinds need to update their hash keys to the new values based on the ID/name mapping stored at the beginning of the file itself. The equation complicates itself a bit more once you'd try to add more ways to write some stats kinds to other places, depending on what a custom kind wants to achieve. I can see the benefits of both approaches, still fixing the IDs in time leads to a lot of simplicity in this infra, which is very appealing on its own before tackling the next issues where I would rely on the proposed APIs. -- Michael signature.asc Description: PGP signature
Re: ON ERROR in json_query and the like
On Fri, Jun 21, 2024 at 6:01, Amit Langote wrote: > On Fri, Jun 21, 2024 at 10:01 AM David G. Johnston > wrote: > > On Thu, Jun 20, 2024 at 5:22 PM Amit Langote > wrote: > >> > >> > >> Soft error handling *was* used for catching cast errors in the very > >> early versions of this patch (long before I got involved and the > >> infrastructure you mention got added). It was taken out after Pavel > >> said [1] that he didn't like producing NULL instead of throwing an > >> error. Not sure if Pavel's around but it would be good to know why he > >> didn't like it at the time. > >> > > > > I'm personally in the "make it error" camp but "make it conform to the > standard" is a stronger membership (in general). > > > > I see this note in your linked thread: > > > > > By the standard, it is implementation-defined whether JSON parsing > errors > > > should be caught by ON ERROR clause. > > > > Absent someone contradicting that claim I retract my position here and > am fine with failing if these "functions" are supplied with something that > cannot be cast to json. I'd document them like functions that accept json > with the implications that any casting to json happens before the function > is called and thus its arguments do not apply to that step. > > Thanks for that clarification. > > So, there are the following options: > > 1. Disallow anything but jsonb for context_item (the patch I posted > yesterday) > > 2. Continue allowing context_item to be non-json character or utf-8 > encoded bytea strings, but document that any parsing errors do not > respect the ON ERROR clause. > > 3. Go ahead and fix implicit casts to jsonb so that any parsing errors > respect ON ERROR (no patch written yet). > > David's vote seems to be 2, which is my inclination too. Markus' vote > seems to be either 1 or 3. Anyone else? > @3 could get messy (although it would be close to Oracle and the standard). I don't think it is safe: for example, '{a:10}' is valid in Oracle but not in Postgres, so @3 would give different results on the two systems (better to raise an exception). The effect of @1 and @2 is similar. @1 is cleaner in that the user has to cast explicitly, so the cast does not need to be handled specially; @2 is more user friendly, because it accepts an unknown string literal. From a developer perspective I prefer @1, from a user perspective I prefer @2. Maybe @2 is a good compromise. Regards Pavel > > -- > Thanks, Amit Langote >
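The user-facing difference between @1 and @2, roughly (sample document made up):

  -- @1: only jsonb is accepted, so the caller must cast explicitly
  SELECT JSON_QUERY('{"a": 1}'::jsonb, '$.a');
  -- @2: a plain string literal is also accepted and cast behind the scenes,
  -- with any parse failure raised before ON ERROR is consulted
  SELECT JSON_QUERY('{"a": 1}', '$.a');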
Re: ON ERROR in json_query and the like
On Thursday, June 20, 2024, Pavel Stehule wrote: > > > pá 21. 6. 2024 v 6:01 odesílatel Amit Langote > napsal: > >> On Fri, Jun 21, 2024 at 10:01 AM David G. Johnston >> wrote: >> >> > > By the standard, it is implementation-defined whether JSON parsing >> errors >> > > should be caught by ON ERROR clause. >> > >> > Absent someone contradicting that claim I retract my position here and >> am fine with failing if these "functions" are supplied with something that >> cannot be cast to json. I'd document them like functions that accept json >> with the implications that any casting to json happens before the function >> is called and thus its arguments do not apply to that step. >> >> Thanks for that clarification. >> >> So, there are the following options: >> >> 1. Disallow anything but jsonb for context_item (the patch I posted >> yesterday) >> >> 2. Continue allowing context_item to be non-json character or utf-8 >> encoded bytea strings, but document that any parsing errors do not >> respect the ON ERROR clause. >> >> 3. Go ahead and fix implicit casts to jsonb so that any parsing errors >> respect ON ERROR (no patch written yet). >> >> David's vote seems to be 2, which is my inclination too. Markus' vote >> seems to be either 1 or 3. Anyone else? >> > > @3 can be possibly messy (although be near Oracle or standard). I don't > think it is safe - one example '{a:10}' is valid for Oracle, but not for > Postgres, and using @3 impacts different results (better to raise an > exception). > > The effect of @1 and @2 is similar - @1 is better so the user needs to > explicitly cast, so maybe it is cleaner, so the cast should not be handled, > @2 is more user friendly, because it accepts unknown string literal. From a > developer perspective I prefer @1, from a user perspective I prefer @2. > Maybe @2 is a good compromise. > 2 also has the benefit of being standard conforming while 1 does not. 3 is also conforming and I wouldn’t object to it had we already done it that way. But since 2 is conforming too and implemented, and we are in beta, I'm thinking we need to go with this option. David J.
Re: Pgoutput not capturing the generated columns
Hi Shubham/Shlok. FYI, my patch describing the current PG17 behaviour of logical replication of generated columns was recently pushed [1]. Note that this will have some impact on your patch set; e.g., you are changing the current replication behaviour, so the "Generated Columns" section note will now need to be modified by your patches. == [1] https://github.com/postgres/postgres/commit/7a089f6e6a14ca3a5dc8822c393c6620279968b9 Kind Regards, Peter Smith. Fujitsu Australia
Re: ON ERROR in json_query and the like
> On 21.06.2024, at 03:00, David G. Johnston wrote: > > On Thu, Jun 20, 2024 at 5:22 PM Amit Langote wrote: > > Soft error handling *was* used for catching cast errors in the very > early versions of this patch (long before I got involved and the > infrastructure you mention got added). It was taken out after Pavel > said [1] that he didn't like producing NULL instead of throwing an > error. Not sure if Pavel's around but it would be good to know why he > didn't like it at the time. > > > I'm personally in the "make it error" camp but "make it conform to the > standard" is a stronger membership (in general). > > I see this note in your linked thread: > > > By the standard, it is implementation-defined whether JSON parsing errors > > should be caught by ON ERROR clause. > > Absent someone contradicting that claim I retract my position here and am > fine with failing if these "functions" are supplied with something that > cannot be cast to json. I'd document them like functions that accept json > with the implications that any casting to json happens before the function is > called and thus its arguments do not apply to that step. That claim was also made in 2020, before the current (2023) SQL standard was released — yet it might have been the same. My understanding of the 2023 standard is that ON ERROR covers invalid JSON because the conversion from a character string to JSON is done deeply nested inside the JSON_QUERY & Co functions. 9.47 Processing Function GR 3 triggers 9.46, “SQL/JSON path language: syntax and semantics” Where GR 11 says: GR 11) The result of evaluating a is a completion condition, and, if that completion condition is successful completion (0), then an SQL/JSON sequence. For conciseness, the result will be stated either as an exception condition or as an SQL/JSON sequence (in the latter case, the completion condition successful completion (0) is implicit). Unsuccessful completion conditions are not automatically raised and do not terminate application of the General Rules in this Subclause. a) If JPCV is specified, then Case: • i) If PARSED is True, then the result of evaluating JPCV is JT. • ii) If the declared type of JT is JSON, then the result of evaluating JPCV is JT. • iii) Otherwise: • 1) The General Rules of Subclause 9.42, “Parsing JSON text”, are applied with JT as JSON TEXT, an implementation-defined (IV185) as UNIQUENESS CONSTRAINT, and FO as FORMAT OPTION; let ST be the STATUS and let CISJI be the SQL/JSON ITEM returned from the application of those General Rules. • 2) Case: • A) If ST is not successful completion (0), then the result of evaluating JPCV is ST. • B) Otherwise, the result of evaluating JPCV is CISJI. In case of an exception, it is passed along to clause 9.44 Converting an SQL/JSON sequence to an SQL/JSON item where GR 5b ultimately says (the exception is in TEMPST in the meanwhile): —— • b) If TEMPST is an exception condition, then Case: i) If ONERROR is ERROR, then let OUTST be TEMPST. ii) Otherwise, let OUTST be successful completion (0). Case: • 1) If ONERROR is NULL, then let JV be the null value. • 2) If ONERROR is EMPTY ARRAY, then let JV be an SQL/JSON array that has no SQL/JSON elements. • 3) If ONERROR is EMPTY OBJECT, then let JV be an SQL/JSON object that has no SQL/JSON members. —— Let me know if I’m missing something here. 
The whole idea that a cast is implied outside of JSON_QUERY & Co might be covered by a clause that generally allows implementations to cast as they like (I don't have the ref at hand, but I think such a clause is somewhere). On the other hand, the 2023 standard doesn't even cover an **explicit** cast from character strings to JSON as per 6.13 SR 7 (that's where the matrix of source and destination types for CAST is given). So my bet is this: * I'm pretty sure that JSON parsing errors being subject to ON ERROR is conforming. That's also "backed" by the Oracle and Db2 (LUW) implementations. * Implying a CAST might be OK, but I have doubts. * I don't see how failing without being subject to ON ERROR (as it is now in 17beta1) could possibly be covered by the standard. But as we all know: the standard is confusing. If somebody thinks differently, references would be greatly appreciated. -markus
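For reference, the GR 5b outcomes enumerated in the quoted rules map onto the SQL-level clause like this (a sketch, using a strict-mode path error on valid JSON so that ON ERROR is actually reached):

  SELECT JSON_QUERY(jsonb '{"a": 1}', 'strict $.b' EMPTY ARRAY ON ERROR);   -- []
  SELECT JSON_QUERY(jsonb '{"a": 1}', 'strict $.b' EMPTY OBJECT ON ERROR);  -- {}
  SELECT JSON_QUERY(jsonb '{"a": 1}', 'strict $.b' ERROR ON ERROR);         -- error is raised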
Re: ON ERROR in json_query and the like
> On 21.06.2024, at 06:46, David G. Johnston wrote: >> >> On Thursday, June 20, 2024, Pavel Stehule wrote: >> >> >> pá 21. 6. 2024 v 6:01 odesílatel Amit Langote >> napsal: >> On Fri, Jun 21, 2024 at 10:01 AM David G. Johnston >> wrote: >> >> > > By the standard, it is implementation-defined whether JSON parsing errors >> > > should be caught by ON ERROR clause. >> > >> > Absent someone contradicting that claim I retract my position here and am >> > fine with failing if these "functions" are supplied with something that >> > cannot be cast to json. I'd document them like functions that accept json >> > with the implications that any casting to json happens before the function >> > is called and thus its arguments do not apply to that step. >> >> Thanks for that clarification. >> >> So, there are the following options: >> >> 1. Disallow anything but jsonb for context_item (the patch I posted >> yesterday) >> >> 2. Continue allowing context_item to be non-json character or utf-8 >> encoded bytea strings, but document that any parsing errors do not >> respect the ON ERROR clause. >> >> 3. Go ahead and fix implicit casts to jsonb so that any parsing errors >> respect ON ERROR (no patch written yet). >> >> David's vote seems to be 2, which is my inclination too. Markus' vote >> seems to be either 1 or 3. Anyone else? With a very strong preference of 3. >> >> @3 can be possibly messy (although be near Oracle or standard). I don't >> think it is safe - one example '{a:10}' is valid for Oracle, but not for >> Postgres, and using @3 impacts different results (better to raise an >> exception). The question of what is valid JSON is a different question, I guess. My original report is about something that is invalid everywhere. Having that in line would be a start. Also I believe Oracle’s habit to accept unquoted object keys is not covered by the standard (unless defined as a JSON format and also explicitly using the corresponding FORMAT clause). >> The effect of @1 and @2 is similar - @1 is better so the user needs to >> explicitly cast, so maybe it is cleaner, so the cast should not be handled, >> @2 is more user friendly, because it accepts unknown string literal. From a >> developer perspective I prefer @1, from a user perspective I prefer @2. >> Maybe @2 is a good compromise. > > 2 also has the benefit of being standard conforming while 1 does not. Why do you think so? Do you have any references or is this just based on previous statements in this discussion? -markus