Re: New committer: Jacob Champion
On Sat, Apr 12, 2025 at 5:26 AM Jonathan S. Katz wrote:
> The Core Team would like to extend our congratulations to Jacob
> Champion, who has accepted an invitation to become our newest PostgreSQL
> committer.
>
> Please join us in wishing Jacob much success and few reverts!

Congratulations Jacob! Well deserved!

Thanks
Richard
Re: pg_dump --if-exists --clean when drop index that is partition of a partitioned index
Hi

On Mon, Apr 14, 2025 at 7:54 AM, jian he wrote:
> hi.
>
> CREATE TABLE tp(c int, a int, b int) PARTITION BY RANGE (b);
> CREATE TABLE tp_1(c int, a int, b int);
> ALTER TABLE tp ATTACH PARTITION tp_1 FOR VALUES FROM (0) TO (1);
> CREATE INDEX t_a_idx ON tp_1(a);
> CREATE INDEX tp_a_idx ON tp(a);
>
> pg_dump --schema=public --if-exists --clean --no-statistics
> --no-owner --no-data --table-and-children=tp > 1.sql
>
> pg_dump output file 1.sql excerpts:
>
> DROP INDEX IF EXISTS public.t_a_idx;
> DROP INDEX IF EXISTS public.tp_a_idx;
> DROP TABLE IF EXISTS public.tp_1;
> DROP TABLE IF EXISTS public.tp;
>
> if you execute the
> DROP INDEX IF EXISTS public.t_a_idx;
>
> ERROR: cannot drop index t_a_idx because index tp_a_idx requires it
> HINT: You can drop index tp_a_idx instead.
>
> Is this pg_dump output what we expected?

It is a bug, I think. The implementation of these parts of the code is
older than partitioning support, and doesn't do the necessary detach.

regards

Pavel

> SELECT * FROM pg_partition_tree('tp_a_idx');
>   relid   | parentrelid | isleaf | level
> ----------+-------------+--------+-------
>  tp_a_idx |             | f      |     0
>  t_a_idx  | tp_a_idx    | t      |     1
> (2 rows)
Re: An incorrect check in get_memoize_path
On Mon, Apr 7, 2025 at 4:50 PM Richard Guo wrote:
> Hence, I propose the attached patch for the fix.
>
> BTW, there is a XXX comment there saying that maybe we can make the
> remaining join quals part of the inner scan's filter instead of the
> join filter. I don't think this is possible in all cases. In the
> above query, 'coalesce(t2.b) = t3.b' cannot be made part of t3's scan
> filter, according to join_clause_is_movable_into. So I removed that
> comment in the patch while we're here.
>
> Any thoughts?

Here is an updated patch with a commit message.

Regarding backporting, I'm inclined not to, given the lack of field
reports. Any objections to pushing it?

Thanks
Richard

v2-0001-Fix-an-incorrect-check-in-get_memoize_path.patch
Description: Binary data
Re: Adding error messages to a few slash commands
Thanks for the feedback, attached an updated patch that changes most of
those "Did not find" messages to empty tables. I did not change two sets:
listDbRoleSettings and listTables both have comments describing that these
are deliberately this way. I wanted to post this update to close this loop
for now. I will follow up once the merge window opens again.

Thanks

On Sun, Apr 13, 2025 at 1:47 AM Pavel Luzanov wrote:
>
> On 13.04.2025 08:29, Tom Lane wrote:
> > Abhishek Chanda writes:
> > > Currently, some slash commands in psql return an error saying "Did not
> > > find any named " while some return an empty table. This patch changes
> > > a few of the slash commands to return a similar error message.
>
> +1 for this patch.
>
> > Personally, if I were trying to make these things consistent, I'd have
> > gone in the other direction (ie return empty tables).
>
> +1
> Returning empty tables is a more appropriate behavior.
>
> --
> Pavel Luzanov
> Postgres Professional: https://postgrespro.com

--
Thanks and regards
Abhishek Chanda

v2-0001-Print-empty-table-when-a-given-object-is-not-foun.patch
Description: Binary data
Re: Buildfarm: Enabling injection points on basilisk/dogfish (Alpine / musl)
Andrew Dunstan:
> On 2025-04-12 Sa 10:10 PM, Noah Misch wrote:
> > On Sat, Apr 12, 2025 at 07:51:06PM +0200, Wolfgang Walther wrote:
> > > With injection points enabled, I get the following errors in test_aio:
> > >
> > > [15:14:45.408](0.000s) not ok 187 - worker: first hard IO error is reported: expected stderr
> > > [15:14:45.409](0.000s)
> > > [15:14:45.409](0.000s) # Failed test 'worker: first hard IO error is reported: expected stderr'
> > > # at t/001_aio.pl line 810.
> > > [15:14:45.409](0.000s) # 'psql::88: ERROR: could not read blocks 2..2 in file "base/5/16408": I/O error'
> > > # doesn't match '(?^:ERROR:.*could not read blocks 2\.\.2 in file \"base/.*\": Input/output error)'
> > >
> > > It seems like it's just the error message that is different and has
> > > "I/O" instead of "Input/output"?
> >
> > Looks like it.
> >
> > > On a more general note, does enabling injection points make any sense here?
> >
> > Yes, it does.
> >
> > > I see that coverage in the build farm is not very big. IIUC, those are
> > > a development tool, so might not be relevant, because nobody is
> > > developing on Alpine / musl?
> >
> > No, whether anyone develops on the platform is not a factor. One hasn't fully
> > tested PostgreSQL until one builds with injection points.
>
> Here's a simple fix ... also removes some unnecessary escaping and leaning
> toothpick syndrome.

Confirmed - this works!

Thanks,
Wolfgang
Re: Automatically sizing the IO worker pool
On 12/4/25 18:59, Thomas Munro wrote:
> It's hard to know how to set io_workers=3.

Hmmm... enable the below behaviour if "io_workers=auto" (default)?
Sometimes being able to set this kind of parameter manually helps
tremendously with specific workloads... :S

[snip]

> Here's a patch to replace that GUC with:
>
> io_min_workers=1
> io_max_workers=8
> io_worker_idle_timeout=60s
> io_worker_launch_interval=500ms

Great as defaults / backwards compat with io_workers=auto.
Sounds more user-friendly to me, at least.

[snip]

> Ideas, testing, flames etc welcome.

Logic seems sound, if a bit daunting for inexperienced users --- well,
maybe just a bit more than it is now, but ISTM evolution should try and
flatten novices' learning curve, right?

Just .02€, though.

Thanks,

--
Parkinson's Law: Work expands to fill the time allotted to it.
Re: Buildfarm: Enabling injection points on basilisk/dogfish (Alpine / musl)
Hi, On April 13, 2025 7:27:33 PM GMT+02:00, Wolfgang Walther wrote: >Andrew Dunstan: >> >> On 2025-04-12 Sa 10:10 PM, Noah Misch wrote: >>> On Sat, Apr 12, 2025 at 07:51:06PM +0200, Wolfgang Walther wrote: With injection points enabled, I get the following errors in test_aio: [15:14:45.408](0.000s) not ok 187 - worker: first hard IO error is reported: expected stderr [15:14:45.409](0.000s) [15:14:45.409](0.000s) # Failed test 'worker: first hard IO error is reported: expected stderr' # at t/001_aio.pl line 810. [15:14:45.409](0.000s) # 'psql::88: ERROR: could not read blocks 2..2 in file "base/5/16408": I/O error' # doesn't match '(?^:ERROR:.*could not read blocks 2\.\.2 in file \"base/.*\": Input/output error)' It seems like it's just the error message that is different and has "I/O" instead of "Input/output"? >>> Looks like it. >>> On a more general note, does enabling injection points make any sense here? >>> Yes, it does. >>> I see that coverage in the build farm is not very big. IIUC, those are a development tool, so might not be relevant, because nobody is developing on Alpine / musl? >>> No, whether anyone develops on the platform is not a factor. One hasn't >>> fully >>> tested PostgreSQL until one builds with injection points. >>> >>> >> >> Here's a simple fix ... also removes some unnecessary escaping and leaning >> toothpick syndrome. > >Confirmed - this works! Thanks for testing and writing up a fix. Andrew, would you be ok applying it? I've been traveling the last 24h and should probably not handing sharp commit bits tonight. I'm not too surprised about failures like this, when writing the tests up I was worried about different formulations. But after seeing freebsd, glibc Linux, netbsd, openbsd windows all working the same I thought we were in the clear. Greetings, Andres -- Sent from my Android device with K-9 Mail. Please excuse my brevity.
Get rid of integer divide in FAST_PATH_REL_GROUP() macro
I noticed a while ago that the new fast-path locking code uses integer
division to figure out the fast-path locking group slot. To me, this seems
a bit unnecessary as FastPathLockGroupsPerBackend is always a power-of-two
value, so we can use bitwise-AND instead.

I don't think FAST_PATH_REL_GROUP() is in any particularly hot code paths,
but still, having the divide in there isn't sitting well with me. Can we
get rid of it? I've attached a patch for that. I also adjusted the method
used to calculate FastPathLockGroupsPerBackend.

Also, the Assert that was going on at the end of the loop in
InitializeFastPathLocks() looked a little odd as it seems to be verifying
something that the loop condition was checking already. I thought it was
better to check that we end up with a power-of-two.

Please see the attached patch.

David

v1-0001-Eliminate-integer-divide-in-fastpath-locking-code.patch
Description: Binary data
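
For illustration, the transformation relies on the standard power-of-two
identity: for any power of two n, x % n == x & (n - 1), and likewise
x / n == x >> log2(n). The following tiny standalone program is only a
sketch of that identity; it is not taken from the attached patch, and
"groups" merely stands in for FastPathLockGroupsPerBackend:

#include <assert.h>
#include <stdint.h>
#include <stdio.h>

int
main(void)
{
	uint32_t	groups = 8;		/* stand-in for FastPathLockGroupsPerBackend,
								 * which is always a power of two */

	for (uint32_t x = 0; x < 100000; x++)
		assert((x % groups) == (x & (groups - 1)));

	printf("modulo and bitwise-AND agree for all tested values\n");
	return 0;
}

Since FastPathLockGroupsPerBackend is computed at startup rather than being
a compile-time constant, the compiler presumably cannot apply this strength
reduction on its own, which is what makes spelling the mask out by hand
worthwhile.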
Re: Accessing an invalid pointer in BufferManagerRelation structure
Hi, On Wed, Mar 26, 2025 at 2:14 PM Stepan Neretin wrote: > > The first thing we both noticed is that the macro calls a function that won't > be available without an additional header. This seems a bit inconvenient. Well, I rebased the patch onto the latest `master` (b51f86e49a7f119004c0ce5d0be89cdf98309141) and noticed that we don't need to include `rel.h` in `localbuf.c` directly anymore, because `#include lmgr.h` was added in memutils.h I guess it solves this issue. Please, see v3 patch. > I also have a question: is the logic correct that if the relation is valid, > we should fetch it rather than the other way around? Additionally, is > checking only the `rd_isvalid` flag sufficient, or should we also consider > the flag below? > > ``` > bool rd_isvalid; /* relcache entry is valid */ > I don't think that we should check any Relation's flags here. We are checking `RelationIsValid((bmr).rel) ?` to decide whether BufferManagerRelation was created via BMR_REL or BMR_SMGR. If the `rel` field is not NULL, we can definitely say that BMR_REL was used, so we should call RelationGetSmgr in order to access smgr. -- Best regards, Daniil Davydov From a1c572652e69f13b954ec3a915ed55c1ec11c772 Mon Sep 17 00:00:00 2001 From: Daniil Davidov Date: Mon, 14 Apr 2025 13:07:38 +0700 Subject: [PATCH v3] Add macros for safety access to smgr --- src/backend/storage/buffer/bufmgr.c | 55 --- src/backend/storage/buffer/localbuf.c | 9 +++-- src/include/storage/bufmgr.h | 13 +++ 3 files changed, 43 insertions(+), 34 deletions(-) diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c index 0b0e056eea2..f2b117088a8 100644 --- a/src/backend/storage/buffer/bufmgr.c +++ b/src/backend/storage/buffer/bufmgr.c @@ -892,11 +892,8 @@ ExtendBufferedRelBy(BufferManagerRelation bmr, Assert(bmr.smgr == NULL || bmr.relpersistence != 0); Assert(extend_by > 0); - if (bmr.smgr == NULL) - { - bmr.smgr = RelationGetSmgr(bmr.rel); + if (bmr.relpersistence == 0) bmr.relpersistence = bmr.rel->rd_rel->relpersistence; - } return ExtendBufferedRelCommon(bmr, fork, strategy, flags, extend_by, InvalidBlockNumber, @@ -928,11 +925,8 @@ ExtendBufferedRelTo(BufferManagerRelation bmr, Assert(bmr.smgr == NULL || bmr.relpersistence != 0); Assert(extend_to != InvalidBlockNumber && extend_to > 0); - if (bmr.smgr == NULL) - { - bmr.smgr = RelationGetSmgr(bmr.rel); + if (bmr.relpersistence == 0) bmr.relpersistence = bmr.rel->rd_rel->relpersistence; - } /* * If desired, create the file if it doesn't exist. If @@ -940,15 +934,15 @@ ExtendBufferedRelTo(BufferManagerRelation bmr, * an smgrexists call. */ if ((flags & EB_CREATE_FORK_IF_NEEDED) && - (bmr.smgr->smgr_cached_nblocks[fork] == 0 || - bmr.smgr->smgr_cached_nblocks[fork] == InvalidBlockNumber) && - !smgrexists(bmr.smgr, fork)) + (BMR_GET_SMGR(bmr)->smgr_cached_nblocks[fork] == 0 || + BMR_GET_SMGR(bmr)->smgr_cached_nblocks[fork] == InvalidBlockNumber) && + !smgrexists(BMR_GET_SMGR(bmr), fork)) { LockRelationForExtension(bmr.rel, ExclusiveLock); /* recheck, fork might have been created concurrently */ - if (!smgrexists(bmr.smgr, fork)) - smgrcreate(bmr.smgr, fork, flags & EB_PERFORMING_RECOVERY); + if (!smgrexists(BMR_GET_SMGR(bmr), fork)) + smgrcreate(BMR_GET_SMGR(bmr), fork, flags & EB_PERFORMING_RECOVERY); UnlockRelationForExtension(bmr.rel, ExclusiveLock); } @@ -958,13 +952,13 @@ ExtendBufferedRelTo(BufferManagerRelation bmr, * kernel. 
*/ if (flags & EB_CLEAR_SIZE_CACHE) - bmr.smgr->smgr_cached_nblocks[fork] = InvalidBlockNumber; + BMR_GET_SMGR(bmr)->smgr_cached_nblocks[fork] = InvalidBlockNumber; /* * Estimate how many pages we'll need to extend by. This avoids acquiring * unnecessarily many victim buffers. */ - current_size = smgrnblocks(bmr.smgr, fork); + current_size = smgrnblocks(BMR_GET_SMGR(bmr), fork); /* * Since no-one else can be looking at the page contents yet, there is no @@ -1008,7 +1002,7 @@ ExtendBufferedRelTo(BufferManagerRelation bmr, if (buffer == InvalidBuffer) { Assert(extended_by == 0); - buffer = ReadBuffer_common(bmr.rel, bmr.smgr, bmr.relpersistence, + buffer = ReadBuffer_common(bmr.rel, BMR_GET_SMGR(bmr), bmr.relpersistence, fork, extend_to - 1, mode, strategy); } @@ -2568,10 +2562,10 @@ ExtendBufferedRelCommon(BufferManagerRelation bmr, BlockNumber first_block; TRACE_POSTGRESQL_BUFFER_EXTEND_START(fork, - bmr.smgr->smgr_rlocator.locator.spcOid, - bmr.smgr->smgr_rlocator.locator.dbOid, - bmr.smgr->smgr_rlocator.locator.relNumber, - bmr.smgr->smgr_rlocator.backend, + BMR_GET_SMGR(bmr)->smgr_rlocator.locator.spcOid, + BMR_GET_SMGR(bmr)->smgr_rlocator.locator.dbOid, + BMR_GET_SMGR(bmr)->smgr_rlocator.locator.relNumber, + BMR_GET_SMGR(bmr)->smgr_rlocator.backend, extend_by);
RE: Add pg_get_injection_points() for information of injection points
Dear Michael, Thanks for creating the patch! Let me confirm two points: Apart from functions related with injection_points, this can be called even when postgres has been built with -Dinjection_points=false. This is intentional because this function does not have any side-effect and just refers the status. Is my understanding correct? I'm not sure it is directly related, but ISTM there are no direct ways to check whether the injection_points is enabled or not. How do you think adding the function? Regarding the internal of the patch, it could be crashed when two points are attached and then first one is detached [1]. I think we should not use "idx" for the result array - PSA the fix. [1]: ``` SELECT injection_points_attach('TestInjectionLog', 'notice'); injection_points_attach - (1 row) SELECT injection_points_attach('TestInjectionError', 'error'); injection_points_attach - (1 row) SELECT injection_points_detach('TestInjectionLog'); injection_points_detach - (1 row) SELECT name, library, function FROM pg_get_injection_points() ORDER BY name COLLATE "C"; server closed the connection unexpectedly This probably means the server terminated abnormally before or while processing the request. connection to server was lost ``` Best regards, Hayato Kuroda FUJITSU LIMITED diff --git a/src/backend/utils/misc/injection_point.c b/src/backend/utils/misc/injection_point.c index bf1a49f7472..ea7e6dba521 100644 --- a/src/backend/utils/misc/injection_point.c +++ b/src/backend/utils/misc/injection_point.c @@ -596,6 +596,7 @@ InjectionPointList(uint32 *num_points) #ifdef USE_INJECTION_POINTS InjectionPointData *result; uint32 max_inuse; + int cur_pos = 0; LWLockAcquire(InjectionPointLock, LW_SHARED); @@ -619,14 +620,16 @@ InjectionPointList(uint32 *num_points) if (generation % 2 == 0) continue; - result[idx].name = pstrdup(entry->name); - result[idx].library = pstrdup(entry->library); - result[idx].function = pstrdup(entry->function); - (*num_points)++; + result[cur_pos].name = pstrdup(entry->name); + result[cur_pos].library = pstrdup(entry->library); + result[cur_pos].function = pstrdup(entry->function); + cur_pos++; } LWLockRelease(InjectionPointLock); + *num_points = cur_pos; + return result; #else
Re: Back-patch of: avoid multiple hard links to same WAL file after a crash
On Sun, Apr 13, 2025 at 11:51:57AM -0400, Tom Lane wrote:
> Noah Misch writes:
> > Tom and Michael, do you still object to the test addition, or not? If there
> > are no new or renewed objections by 2025-04-20, I'll proceed to add the
> > test.
>
> While I still don't love it, I don't have a better proposal.

No objections from here to use your proposal for the time being on all the
branches. I am planning to spend some cycles thinking about a better
alternative, but I doubt that this will be portable down to v13.
--
Michael

signature.asc
Description: PGP signature
Re: Automatically sizing the IO worker pool
On Mon, Apr 14, 2025 at 5:45 AM Jose Luis Tallon wrote:
> On 12/4/25 18:59, Thomas Munro wrote:
> > It's hard to know how to set io_workers=3.
>
> Hmmm... enable the below behaviour if "io_workers=auto" (default)?

Why not just delete io_workers? If you really want a fixed number, you can
set io_min_workers==io_max_workers.

What should io_max_workers default to? I guess it could be pretty large
without much danger, but I'm not sure. If it's a small value, an overloaded
storage system goes through two stages: first it fills the queue up with a
backlog of requests until it overflows because the configured maximum of
workers isn't keeping up, and then new submissions start falling back to
synchronous IO, sort of jumping ahead of the queued backlog, but also
stalling if the real reason is that the storage itself isn't keeping up.
Whether it'd be better for the IO worker pool to balloon all the way up to
32 processes (an internal limit) if required to try to avoid that with
default settings, I'm not entirely sure. Maybe? Why not at least try to get
all the concurrency possible, before falling back to synchronous? Queued
but not running IOs seem to be strictly worse than queued but not even
trying to run.

I'd be interested to hear people's thoughts and experiences actually trying
different kinds of workloads on different kinds of storage. Whether adding
more concurrency actually helps or just generates a lot of useless new
processes before the backpressure kicks in depends on why it's not keeping
up, eg hitting IOPS, throughput or concurrency limits in the storage. In
later work I hope we can make higher levels smarter about understanding
whether requesting more concurrency helps or hurts with feedback (that's
quite a hard problem that some of my colleagues have been looking into),
but the simpler question here seems to be: should this fairly low-level
system-wide setting ship with a default that includes any preconceived
assumptions about that?

It's superficially like max_parallel_workers, which ships with a default of
8, and that's basically where I plucked that 8 from in the current patch,
for lack of a serious idea to propose yet. But it's also more complex than
CPU: you know how many cores you have and you know things about your
workload, but even really small "on the metal" systems probably have a lot
more concurrent I/O capacity -- perhaps depending on the type of operation!
(and so far we only have reads) -- than CPU cores. Especially once you
completely abandon the idea that anyone runs databases on spinning rust in
modern times, even on low end systems, which I think we've more or less
agreed to assume these days with related changes such as the recent
*_io_concurrency default change (1->16).

It's actually pretty hard to drive a laptop up to needing more than half a
dozen or a dozen or so workers with this patch, especially without
debug_io_direct=data, ie with fast double-buffered I/O. But cloud
environments may also be where most databases run these days, and low end
cloud configurations have arbitrary made-up limits that may be pretty low,
so it all depends. I really don't know, but one idea is that we could leave
it as open as possible, and let users worry about that with higher-level
settings and the query concurrency they choose to generate...
io_method=io_uring is effectively open, so why should io_method=worker be
any different by default?

Just some thoughts. I'm not sure.
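
To make the launch/retire behaviour described above easier to picture, here
is a toy, self-contained model of such a policy. It is illustrative only:
none of the names or code below come from the patch, and the thresholds
simply mirror the proposed GUC defaults (min 1, max 8, 500ms launch
interval, 60s idle timeout):

#include <stdio.h>

typedef struct
{
	int			workers;
	double		last_launch;
} Pool;

/* One control-loop step: grow under backlog, shrink when idle. */
static void
tick(Pool *p, double now, int queue_depth, double idle_secs,
	 int min_workers, int max_workers,
	 double launch_interval, double idle_timeout)
{
	if (queue_depth > 0 && p->workers < max_workers &&
		now - p->last_launch >= launch_interval)
	{
		p->workers++;			/* backlog exists and the rate limit allows it */
		p->last_launch = now;
	}
	else if (idle_secs >= idle_timeout && p->workers > min_workers)
		p->workers--;			/* a worker has been idle long enough */
}

int
main(void)
{
	Pool		p = {.workers = 1, .last_launch = 0.0};

	/* a few seconds of queued IO: the pool ramps up towards the maximum */
	for (double t = 0.5; t <= 5.0; t += 0.5)
		tick(&p, t, 4, 0.0, 1, 8, 0.5, 60.0);
	printf("workers after burst: %d\n", p.workers);

	/* a long idle stretch: the pool drains back down to the minimum */
	for (double t = 65.0; t <= 600.0; t += 65.0)
		tick(&p, t, 0, 61.0, 1, 8, 0.5, 60.0);
	printf("workers after idle: %d\n", p.workers);

	return 0;
}

Under those inputs the pool climbs to 8 workers during the burst and decays
back to 1 during the idle stretch; setting io_min_workers equal to
io_max_workers would pin it, matching the old fixed io_workers behaviour.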
Add pg_get_injection_points() for information of injection points
Hi all, One thing that's been lacking in injection points is the possibility to look at the state of the injection points in shared memory through some SQL. This was on my tablets for some time, but I have not taken the time to do the actual legwork. The attached patch adds a SRF that returns a set of tuples made of the name, library and function for all the injection points attached to the system. This implementation relies on a new function added in injection_point.c, called InjectionPointList(), that retrieves a palloc()'d array of the injection points, hiding from the callers the internals of what this stuff does with the shmem array lookup. This is useful for monitoring or in tests, to make sure for example that nothing is left around at the end of a script. I have a different proposal planned for this area of the code, where this function would be good to have, but I am sending an independent patch as this stuff is useful on its own. The patch includes a couple of tests and some documentation. Thanks, -- Michael From 8865e2ca3b1b6291e94d2a5476926095245a4e2f Mon Sep 17 00:00:00 2001 From: Michael Paquier Date: Mon, 14 Apr 2025 09:26:52 +0900 Subject: [PATCH] Add pg_get_injection_points() This is a system function that displays the information about the injection points currently attached to the system, feeding from the states of things in shared memory. --- src/include/catalog/pg_proc.dat | 8 +++ src/include/utils/injection_point.h | 14 + src/backend/utils/misc/Makefile | 1 + src/backend/utils/misc/injection_point.c | 50 .../utils/misc/injection_point_funcs.c| 59 +++ src/backend/utils/misc/meson.build| 1 + .../expected/injection_points.out | 16 + .../injection_points/sql/injection_points.sql | 7 +++ src/test/regress/expected/misc_functions.out | 7 +++ src/test/regress/sql/misc_functions.sql | 3 + doc/src/sgml/func.sgml| 46 +++ src/tools/pgindent/typedefs.list | 1 + 12 files changed, 213 insertions(+) create mode 100644 src/backend/utils/misc/injection_point_funcs.c diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat index 62beb71da288..d087d9fda1e8 100644 --- a/src/include/catalog/pg_proc.dat +++ b/src/include/catalog/pg_proc.dat @@ -12566,4 +12566,12 @@ proargnames => '{pid,io_id,io_generation,state,operation,off,length,target,handle_data_len,raw_result,result,target_desc,f_sync,f_localmem,f_buffered}', prosrc => 'pg_get_aios' }, +# Injection point functions +{ oid => '8490', descr => 'information about injection points attached', + proname => 'pg_get_injection_points', prorows => '3', proretset => 't', + provolatile => 'v', proparallel => 'r', prorettype => 'record', + proargtypes => '', proallargtypes => '{text,text,text}', + proargmodes => '{o,o,o}', proargnames => '{name,library,function}', + prosrc => 'pg_get_injection_points' }, + ] diff --git a/src/include/utils/injection_point.h b/src/include/utils/injection_point.h index 6ba64cd1ebc6..c5c78914b848 100644 --- a/src/include/utils/injection_point.h +++ b/src/include/utils/injection_point.h @@ -11,6 +11,17 @@ #ifndef INJECTION_POINT_H #define INJECTION_POINT_H +/* + * Injection point data, used when retrieving a list of all the attached + * injection points. + */ +typedef struct InjectionPointData +{ + const char *name; + const char *library; + const char *function; +} InjectionPointData; + /* * Injection points require --enable-injection-points. 
*/ @@ -46,6 +57,9 @@ extern void InjectionPointCached(const char *name); extern bool IsInjectionPointAttached(const char *name); extern bool InjectionPointDetach(const char *name); +/* Get the current set of injection points attached */ +extern InjectionPointData *InjectionPointList(uint32 *num_points); + #ifdef EXEC_BACKEND extern PGDLLIMPORT struct InjectionPointsCtl *ActiveInjectionPoints; #endif diff --git a/src/backend/utils/misc/Makefile b/src/backend/utils/misc/Makefile index b362ae437710..93703633f69a 100644 --- a/src/backend/utils/misc/Makefile +++ b/src/backend/utils/misc/Makefile @@ -22,6 +22,7 @@ OBJS = \ guc_tables.o \ help_config.o \ injection_point.o \ + injection_point_funcs.o \ pg_config.o \ pg_controldata.o \ pg_rusage.o \ diff --git a/src/backend/utils/misc/injection_point.c b/src/backend/utils/misc/injection_point.c index 97ab851f0a63..bf1a49f7472a 100644 --- a/src/backend/utils/misc/injection_point.c +++ b/src/backend/utils/misc/injection_point.c @@ -584,3 +584,53 @@ IsInjectionPointAttached(const char *name) return false;/* silence compiler */ #endif } + +/* + * Retrieve a list of all the injection points currently attached. + * + * This list is palloc'd in the current memory context. + */ +InjectionPointData * +InjectionPointList(uint32 *num_points) +{ +#ifdef USE_INJECTION_POINTS + InjectionPointData *re
Re: Changing shared_buffers without restart
On Fri, Apr 11, 2025 at 8:31 PM Dmitry Dolgov <9erthali...@gmail.com> wrote: > > > > I think a relatively elegant solution is to extend ProcSignalBarrier > > > mechanism to track not only pss_barrierGeneration, as a sign that > > > everything was processed, but also something like > > > pss_barrierReceivedGeneration, indicating that the message was received > > > everywhere but not processed yet. That would be enough to allow > > > processes to wait until the resize message was received everywhere, then > > > use a global Barrier to wait until all processes are finished. It's > > > somehow similar to your proposal to use two signals, but has less > > > implementation overhead. > > > > The way it's implemented in v4 still has the disjoint group problem. > > Assume backends p1, p2, p3. All three of them are executing > > ProcessProcSignalBarrier(). All three of them updated > > pss_barrierReceivedGeneration > > > > /* The message is observed, record that */ > > pg_atomic_write_u64(&MyProcSignalSlot->pss_barrierReceivedGeneration, > > shared_gen); > > > > p1, p2 moved faster and reached following code from > > ProcessBarrierShmemResize() > > if (BarrierAttach(barrier) == SHMEM_RESIZE_REQUESTED) > > > > WaitForProcSignalBarrierReceived(pg_atomic_read_u64(&ShmemCtrl->Generation)); > > > > Since all the processes have received the barrier message, p1, p2 move > > ahead and go through all the next phases and finish resizing even > > before p3 gets a chance to call ProcessBarrierShmemResize() and attach > > itself to Barrier. This could happen because it processed some other > > ProcSignalBarrier message. p1 and p2 won't wait for p3 since it has > > not attached itself to the barrier. Once p1, p2 finish, p3 will attach > > itself to the barrier and resize buffers again - reinitializing the > > shared memory, which might has been already modified by p1 or p2. Boom > > - there's memory corruption. > > It won't reinitialize anything, since this logic is controlled by the > ShmemCtrl->NSharedBuffers, if it's already updated nothing will be > changed. Ah, I see it now if(pg_atomic_read_u32(&ShmemCtrl->NSharedBuffers) != NBuffers) { Thanks for the clarification. However, when we put back the patches to shrink buffers, we will evict the extra buffers, and shrink - if all the processes haven't participated in the barrier by then, some of them may try to access those buffers - re-installing them and then bad things can happen. > > About the race condition you mention, there is indeed a window between > receiving the ProcSignalBarrier and attaching to the global Barrier in > resize, but I don't think any process will be able to touch buffer pool > while inside this window. Even if it happens that the remapping itself > was blazing fast that this window was enough to make one process late > (e.g. if it was busy handling some other signal as you mention), as I've > showed above it shouldn't be a problem. > > I can experiment with this case though, maybe there is a way to > completely close this window to not thing about even potential > scenarios. The window may be small today but we have to make this future proof. Multiple ProcSignalBarrier messages may be processed in a single call to ProcessProcSignalBarrier() and if each of those takes as long as buffer resizing, the window will get bigger and bigger. So we have to close this window. > > > > * Shared memory address space is now reserved for future usage, making > > > shared memory segments clash (e.g. due to memory allocation) > > > impossible. 
There is a new GUC to control how much space to reserve, > > > which is called max_available_memory -- on the assumption that most of > > > the time it would make sense to set its value to the total amount of > > > memory on the machine. I'm open for suggestions regarding the name. > > > > With 0006 applied > > + /* Clean up some reserved space to resize into */ > > + if (munmap(m->shmem + m->shmem_size, new_size - m->shmem_size) == -1) > > ze, m->shmem))); > > ... snip ... > > + ptr = mremap(m->shmem, m->shmem_size, new_size, 0); > > > > We unmap the portion of reserved address space where the existing > > segment would expand into. As long as we are just expanding this will > > work. I am wondering how would this work for shrinking buffers? What > > scheme do you have in mind? > > I didn't like this part originally, and after changes to support hugetlb > I think it's worth it just to replace mremap with munmap/mmap. That way > there will be no such question, e.g. if a segment is getting shrinked > the unmapped area will again become a part of the reserved space. > I might have not noticed it, but are we putting two mappings one reserved and one allocated in the same address space, so that when the allocated mapping shrinks or expands, the reserved mapping continues to prohibit any other mapping from appearing there? I looked at some of the previous emails, but didn't find anything that describes how the res
Re: Logical Replication of sequences
Hi Vignesh,

FYI, the patch v20250325-0004 failed to apply (atop 0001,0002,0002) due to
recent master changes.

Checking patch src/backend/commands/sequence.c...
error: while searching for:
				 (long long) minv, (long long) maxv)));

	/* Set the currval() state only if iscalled = true */
	if (iscalled)
	{
		elm->last = next;		/* last returned number */
		elm->last_valid = true;

error: patch failed: src/backend/commands/sequence.c:994
error: src/backend/commands/sequence.c: patch does not apply

==
Kind Regards,
Peter Smith.
Fujitsu Australia
Re: New committer: Jacob Champion
Congrats Jacob.

On Mon, Apr 14, 2025 at 8:32 AM Ashutosh Bapat wrote:
> Hearty congratulations Jacob.
>
> On Mon, Apr 14, 2025 at 6:55 AM Richard Guo wrote:
> >
> > On Sat, Apr 12, 2025 at 5:26 AM Jonathan S. Katz wrote:
> > > The Core Team would like to extend our congratulations to Jacob
> > > Champion, who has accepted an invitation to become our newest PostgreSQL
> > > committer.
> > >
> > > Please join us in wishing Jacob much success and few reverts!
> >
> > Congratulations Jacob! Well deserved!
> >
> > Thanks
> > Richard
>
> --
> Best Wishes,
> Ashutosh Bapat
Re: New committer: Jacob Champion
Many Congratulations, Jacob.

On Mon, Apr 14, 2025 at 11:00 AM Kashif Zeeshan wrote:
>
> Congrats Jacob.
>
> On Mon, Apr 14, 2025 at 8:32 AM Ashutosh Bapat wrote:
>>
>> Hearty congratulations Jacob.
>>
>> On Mon, Apr 14, 2025 at 6:55 AM Richard Guo wrote:
>> >
>> > On Sat, Apr 12, 2025 at 5:26 AM Jonathan S. Katz wrote:
>> > > The Core Team would like to extend our congratulations to Jacob
>> > > Champion, who has accepted an invitation to become our newest PostgreSQL
>> > > committer.
>> > >
>> > > Please join us in wishing Jacob much success and few reverts!
>> >
>> > Congratulations Jacob! Well deserved!
>> >
>> > Thanks
>> > Richard
>>
>> --
>> Best Wishes,
>> Ashutosh Bapat
Fix bug with accessing to temporary tables of other sessions
Hi, During previous commitfest this topic already has been discussed within the "Forbid to DROP temp tables of other sessions" thread [1]. Unfortunately its name doesn't reflect the real problem, so I decided to start a new thread, as David G. Johnston advised. Here are the summary results of the discussion [1] : The superuser is only allowed to DROP temporary relations of other sessions. Other commands (like SELECT, INSERT, UPDATE, DELETE ...) must be forbidden to him. Error message for this case will look like this : `could not access temporary relations of other sessions`. For now, superuser still can specify such operations because of a bug in the code that mistakenly recognizes other session's temp table as permanent (I've covered this topic in more detail in [2]). Attached patch fixes this bug (targeted on b51f86e49a7f119004c0ce5d0be89cdf98309141). Opened issue: Not everyone liked the idea that table's persistence can be assigned to table during makeRangeVarXXX calls (inside gram.y). My opinion - `As far as "pg_temp_" prefix is reserved by the postgres kernel, we can definitely say that we have encountered a temporary table when we see this prefix.` I will be glad to hear your opinion. -- Best regards, Daniil Davydov [1] https://www.postgresql.org/message-id/CAJDiXgj72Axj0d4ojKdRWG_rnkfs4uWY414NL%3D15sCvh7-9rwg%40mail.gmail.com [2] https://www.postgresql.org/message-id/CAJDiXgj%2B5UKLWSUT5605rJhuw438NmEKecvhFAF2nnrMsgGK3w%40mail.gmail.com From c1415e457edd8e1cf38fd78ac55c93dbd8617d55 Mon Sep 17 00:00:00 2001 From: Daniil Davidov Date: Mon, 14 Apr 2025 11:01:56 +0700 Subject: [PATCH v5] Fix accessing other sessions temp tables --- src/backend/catalog/namespace.c | 56 src/backend/commands/tablecmds.c | 3 +- src/backend/nodes/makefuncs.c| 6 +++- src/backend/parser/gram.y| 11 ++- src/include/catalog/namespace.h | 2 ++ 5 files changed, 55 insertions(+), 23 deletions(-) diff --git a/src/backend/catalog/namespace.c b/src/backend/catalog/namespace.c index d97d632a7ef..f407efd9447 100644 --- a/src/backend/catalog/namespace.c +++ b/src/backend/catalog/namespace.c @@ -499,28 +499,44 @@ RangeVarGetRelidExtended(const RangeVar *relation, LOCKMODE lockmode, */ if (relation->relpersistence == RELPERSISTENCE_TEMP) { - if (!OidIsValid(myTempNamespace)) -relId = InvalidOid; /* this probably can't happen? */ - else - { -if (relation->schemaname) -{ - Oid namespaceId; + Oid namespaceId; - namespaceId = LookupExplicitNamespace(relation->schemaname, missing_ok); + if (relation->schemaname) + { +namespaceId = LookupExplicitNamespace(relation->schemaname, missing_ok); +/* + * If the user has specified an existing temporary schema + * owned by another user. + */ +if (OidIsValid(namespaceId) && namespaceId != myTempNamespace) +{ /* - * For missing_ok, allow a non-existent schema name to - * return InvalidOid. + * We don't allow users to access temp tables of other + * sessions except for the case of dropping tables. */ - if (namespaceId != myTempNamespace) + if (!(flags & RVR_OTHER_TEMP_OK)) ereport(ERROR, -(errcode(ERRCODE_INVALID_TABLE_DEFINITION), - errmsg("temporary tables cannot specify a schema name"))); +(errcode(ERRCODE_FEATURE_NOT_SUPPORTED), + errmsg("could not access temporary relations of other sessions"))); } + } + else + { +namespaceId = myTempNamespace; -relId = get_relname_relid(relation->relname, myTempNamespace); +/* + * If this table was recognized as temporary, it means that we + * found it because backend's temporary namespace was specified + * in search_path. 
Thus, MyTempNamespace must contain valid oid. + */ +Assert(OidIsValid(namespaceId)); } + + if (missing_ok && !OidIsValid(namespaceId)) +relId = InvalidOid; + else +relId = get_relname_relid(relation->relname, namespaceId); } else if (relation->schemaname) { @@ -3553,21 +3569,19 @@ get_namespace_oid(const char *nspname, bool missing_ok) RangeVar * makeRangeVarFromNameList(const List *names) { - RangeVar *rel = makeRangeVar(NULL, NULL, -1); + RangeVar *rel; switch (list_length(names)) { case 1: - rel->relname = strVal(linitial(names)); + rel = makeRangeVar(NULL, strVal(linitial(names)), -1); break; case 2: - rel->schemaname = strVal(linitial(names)); - rel->relname = strVal(lsecond(names)); + rel = makeRangeVar(strVal(linitial(names)), strVal(lsecond(names)), -1); break; case 3: + rel = makeRangeVar(strVal(lsecond(names)), strVal(lthird(names)), -1); rel->catalogname = strVal(linitial(names)); - rel->schemaname = strVal(lsecond(names)); - rel->relname = strVal(lthird(names)); break; default: ereport(ERROR, @@ -3774,6 +3788,8 @@ GetTempNamespaceProcNumber(Oid namespaceId) return INVALID_PROC_NUMBER
Re: New committer: Jacob Champion
On Sat, 12 Apr 2025 at 01:56, Jonathan S. Katz wrote:
>
> The Core Team would like to extend our congratulations to Jacob
> Champion, who has accepted an invitation to become our newest PostgreSQL
> committer.
>
> Please join us in wishing Jacob much success and few reverts!

Many Congratulations, Jacob

Regards,
Vignesh
Re: Silence resource leaks alerts
On Fri, Apr 11, 2025 at 8:27 AM, Ranier Vilela wrote:
> Thanks Michael, for looking at this.
>
> On Fri, Apr 11, 2025 at 2:09 AM, Michael Paquier wrote:
>> On Thu, Apr 10, 2025 at 03:10:02PM -0300, Ranier Vilela wrote:
>> > While it is arguable that this is a false warning, there is a benefit in
>> > moving the initialization of the string buffer, silencing the warnings
>> > that are presented in this case.
>> >
>> > 1. pg_overexplain.c
>> > 2. ruleutils.c
>>
>> These code paths are far from being critical and the two ones in
>> ruleutils.c are older, even if it is a practice that had better be
>> discouraged, particularly as initStringInfo() can allocate some memory
>> for nothing. So it could bloat the current memory context if these
>> code paths are repeatedly taken.
>
> Yeah, it's a bit annoying to do unnecessary work.
> Plus a small gain, by delaying memory allocation until it is actually
> needed.

Attached a new example of moving stringinfo creation, per Coverity.

best regards,
Ranier Vilela

postpone_string_buffer_creation_dblink.patch
Description: Binary data
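
The pattern being applied is simply to postpone creating the buffer until
the code path that actually needs it. As a rough sketch only (this is not
the attached dblink patch; the function and its arguments are hypothetical),
using the usual backend StringInfo API:

#include "postgres.h"
#include "lib/stringinfo.h"

/*
 * Sketch: build a message only when it is needed. The early-return path
 * never calls initStringInfo(), so nothing is palloc'd in the current
 * memory context for callers that never use the buffer.
 */
static char *
build_message_if_needed(bool needed, const char *payload)
{
	StringInfoData buf;

	if (!needed)
		return NULL;			/* cheap path: the buffer was never allocated */

	initStringInfo(&buf);		/* allocate only once we know we need it */
	appendStringInfo(&buf, "payload: %s", payload);

	return buf.data;
}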
Re: Logical Replication of sequences
Hi Vignesh,

Here are some review comments for patch v20250325-0001.

==
src/test/regress/expected/sequence.out

1.
+SELECT last_value, log_cnt, is_called FROM pg_sequence_state('sequence_test');
+ last_value | log_cnt | is_called
+------------+---------+-----------
+         99 |      32 | t
+(1 row)
+

I think 32 may seem like a surprising value to anybody reading these
results. Perhaps it will help if there can be a comment for this .sql test
to explain why this is the expected value.

==
Kind Regards,
Peter Smith.
Fujitsu Australia
Re: Horribly slow pg_upgrade performance with many Large Objects
And in case there *is* ACL present then each user mentioned in the ACL adds more overhead Also the separate GRANT calls cause bloat as the pg_largeoject_metadata row gets updated for each ALTER USER or GRANT The following is for 10 million LOs with 1 and 3 users being GRANTed SELECT on each object (with no grants the pg_restore run was 10 minutes) Nr of GRANTS | pg_dump time | pg_restore time --+--+ 0 |0m 10s| 10m 5s 1 |0m 17s| 15m 3s 3 |0m 21s| 27m 15s NB! - I left out the --verbose flag from pg_dump as used by pg_upgrade, as it will emit one line per LO dumped ## 1 GRANT / LO hannuk@db01-c1a:~/work/lo-testing$ time pg_dump --schema-only --quote-all-identifiers --binary-upgrade --format=custom --file=lodb10m.dump -p 5433 lodb10m real0m17.022s user0m2.956s sys 0m1.453s hannuk@db01-c1a:~/work/lo-testing$ time pg_restore -p 5434 --exit-on-error --transaction-size=1000 --dbname lodb10m lodb10m.dump real15m3.136s user0m28.991s sys 2m54.164s ## 3 GRANTs / LO make sample LO with 3 grants ALTER LARGE OBJECT 1 OWNER TO "hannuk"; GRANT SELECT ON LARGE OBJECT 1 TO "bob"; GRANT SELECT ON LARGE OBJECT 1 TO "joe"; GRANT SELECT ON LARGE OBJECT 1 TO "tom"; lodb10m=# select * from pg_shdepend where objid = 1; ┌───┬─┬───┬──┬┬──┬─┐ │ dbid │ classid │ objid │ objsubid │ refclassid │ refobjid │ deptype │ ├───┼─┼───┼──┼┼──┼─┤ │ 16406 │2613 │ 1 │0 │ 1260 │16384 │ o │ │ 16406 │2613 │ 1 │0 │ 1260 │16393 │ a │ │ 16406 │2613 │ 1 │0 │ 1260 │16394 │ a │ │ 16406 │2613 │ 1 │0 │ 1260 │16395 │ a │ └───┴─┴───┴──┴┴──┴─┘ lodb10m=# select * from pg_largeobject_metadata ; ┌─┬──┬───┐ │ oid │ lomowner │ lomacl │ ├─┼──┼───┤ │ 1 │16384 │ {hannuk=rw/hannuk,bob=r/hannuk,joe=r/hannuk,tom=r/hannuk} │ └─┴──┴───┘ Make the remaining 10M-1 LOs lodb10m=# insert into pg_largeobject_metadata(oid, lomowner, lomacl) SELECT i, 16384, '{hannuk=rw/hannuk,bob=r/hannuk,joe=r/hannuk,tom=r/hannuk}' FROM generate_series(2, 10_000_000) g(i); INSERT 0 999 Time: 18859.341 ms (00:18.859) And add their sharedeps lodb10m=# WITH refdeps (robj, rdeptype) AS ( VALUES (16384, 'o'), (16393, 'a'), (16394, 'a'), (16395, 'a') ) INSERT INTO pg_shdepend SELECT 16396, 2613, i, 0, 1260, robj, rdeptype FROM generate_series(2, 10_000_000) g(i) , refdeps ; INSERT 0 3996 Time: 116697.342 ms (01:56.697) Time pg_upgrade's pg_dump and pg_reload hannuk@db01-c1a:~/work/lo-testing$ time pg_dump --schema-only --quote-all-identifiers --binary-upgrade --format=custom --file=lodb10m-3grants.dump -p 5433 lodb10m real0m21.519s user0m2.951s sys 0m1.723s hannuk@db01-c1a:~/work/lo-testing$ time pg_restore -p 5434 --exit-on-error --transaction-size=1000 --dbname lodb10m lodb10m-3grants.dump real27m15.372s user0m45.157s sys 4m57.513s On Fri, Apr 11, 2025 at 10:11 PM Nathan Bossart wrote: > > On Tue, Apr 08, 2025 at 12:22:00PM -0500, Nathan Bossart wrote: > > On Tue, Apr 08, 2025 at 01:07:09PM -0400, Tom Lane wrote: > >> Nathan Bossart writes: > >>> I do think it's worth considering going back to copying > >>> pg_largobject_metadata's files for upgrades from v16 and newer. > >> > >> (If we do this) I don't see why we'd need to stop at v16. I'm > >> envisioning that we'd use COPY, which will be dealing in the > >> text representation of aclitems, and I don't think that's changed > >> in a long time. The sort of thing that would break it is changes > >> in the set of available/default privilege bits for large objects. 
> > > > I was thinking of actually reverting commit 12a53c7 for upgrades from v16, > > which AFAICT is the last release where any relevant storage formats changed > > (aclitem changed in v16). But if COPY gets us pretty close to that and is > > less likely to be disrupted by future changes, it could be a better > > long-term approach. > > > >> That is, where the dump currently contains something like > >> > >> SELECT pg_catalog.lo_create('2121'); > >> ALTER LARGE OBJECT 2121 OWNER TO postgres; > >> GRANT ALL ON LARGE OBJECT 2121 TO joe; > >> > >> we'd have > >> > >> COPY pg_largeobject_metadata FROM STDIN; > >> ... > >> 2121 10 {postgres=rw/postgres,joe=rw/postgres} > >> ... > >> > >> and some appropriate COPY data for pg_shdepend too. > > I did some more research here. For many large objects without ACLs to > dump, I noticed that the vast majority o
Performance issues with v18 SQL-language-function changes
In the thread that led up to commit 0dca5d68d [1], we'd convinced ourselves that the new implementation was faster than the old. So I was sad to discover that there are common cases where it's a good bit slower. We'd focused too much on test methods like do $$ begin for i in 1..1000 loop perform fx((random()*100)::int); end loop; end; $$; The thing about this test case is that the SQL function under test is executed only once per calling query (i.e., per PERFORM). That doesn't allow the old implementation to amortize anything. If you instead test cases like create function fx(p_summa bigint) returns text immutable strict return ltrim(to_char(p_summa, '999 999 999 999 999 999 999 999')); explain analyze select fx(i) from generate_series(1,100) as i(i); you arrive at the rude discovery that 0dca5d68d is about 50% slower than 0dca5d68d^, because the old implementation builds a plan for fx() only once and then re-uses it throughout the query. So does the new implementation, but it has added GetCachedPlan overhead. Moreover, I made the unforced error of deciding that we could tear down and rebuild the SQLFunctionCache and subsidiary data structures for each call. That overhead is relatively minor in comparison to the cost of parsing and planning the function; but when comparing cases where there's no repeated parsing and planning, it becomes significant. Hence, the attached patch series reverts that decision and goes back to the old method of having the SQLFunctionCache and some associated objects live as long as the FmgrInfo does (without, however, the poorly-controlled memory leaks of the old code). In my testing this gets us from a 50% penalty down to about 5%, which I think is acceptable given the other benefits 0dca5d68d brought us. I'm inclined to argue that, seeing that 0dca5d68d was mainly intended as a performance feature, this performance loss is a bug that we need to do something about even though we're post-feature-freeze. We could either revert 0dca5d68d or apply the attached. I'd prefer the latter. (There are other things we could do to try to reduce the overhead further, such as trying to not build a Tuplestore or JunkFilter in simple cases. But that seems like new work not a fix for a bad decision in existing work, so I think it's out of scope for v18.) Comments? regards, tom lane [1] https://www.postgresql.org/message-id/flat/8216639.NyiUUSuA9g%40aivenlaptop From c87a37d00b847e86e5286c9c46668410019b0043 Mon Sep 17 00:00:00 2001 From: Tom Lane Date: Sun, 13 Apr 2025 13:37:46 -0400 Subject: [PATCH v1 1/5] Minor performance improvement for SQL-language functions. Late in the development of commit 0dca5d68d, I added a step to copy the result tlist we extract from the cached final query, because I was afraid that that might not last as long as the JunkFilter that we're passing it off to. However, that turns out to cost a noticeable number of cycles, and it's really quite unnecessary because the JunkFilter will not examine that tlist after it's been created. (ExecFindJunkAttribute would use it, but we don't use that function on this JunkFilter.) Hence, remove the copy step. For safety, reset the might-become-dangling jf_targetList pointer to NIL. In passing, remove DR_sqlfunction.cxt, which we don't use anymore; it's confusing because it's not entirely clear which context it ought to point at. 
--- src/backend/executor/functions.c | 19 +++ 1 file changed, 11 insertions(+), 8 deletions(-) diff --git a/src/backend/executor/functions.c b/src/backend/executor/functions.c index 53ff614d87b..d3f05c7d2c7 100644 --- a/src/backend/executor/functions.c +++ b/src/backend/executor/functions.c @@ -45,7 +45,6 @@ typedef struct { DestReceiver pub; /* publicly-known function pointers */ Tuplestorestate *tstore; /* where to put result tuples */ - MemoryContext cxt; /* context containing tstore */ JunkFilter *filter; /* filter to convert tuple type */ } DR_sqlfunction; @@ -787,12 +786,6 @@ init_execution_state(SQLFunctionCachePtr fcache) */ resulttlist = get_sql_fn_result_tlist(plansource->query_list); - /* - * We need to make a copy to ensure that it doesn't disappear - * underneath us due to plancache invalidation. - */ - resulttlist = copyObject(resulttlist); - /* * If the result is composite, *and* we are returning the whole tuple * result, we need to insert nulls for any dropped columns. In the @@ -807,6 +800,17 @@ init_execution_state(SQLFunctionCachePtr fcache) slot); else fcache->junkFilter = ExecInitJunkFilter(resulttlist, slot); + + /* + * The resulttlist tree belongs to the plancache and might disappear + * underneath us due to plancache invalidation. While we could + * forestall that by copying it, that'd just be a waste of cycles, + * because the junkfilter doesn't need it anymore. (It'd only be used + * by ExecFindJunkAttribute(), which we
Re: Buildfarm: Enabling injection points on basilisk/dogfish (Alpine / musl)
On 2025-04-13 Su 1:51 PM, Andres Freund wrote: Hi, On April 13, 2025 7:27:33 PM GMT+02:00, Wolfgang Walther wrote: Andrew Dunstan: On 2025-04-12 Sa 10:10 PM, Noah Misch wrote: On Sat, Apr 12, 2025 at 07:51:06PM +0200, Wolfgang Walther wrote: With injection points enabled, I get the following errors in test_aio: [15:14:45.408](0.000s) not ok 187 - worker: first hard IO error is reported: expected stderr [15:14:45.409](0.000s) [15:14:45.409](0.000s) # Failed test 'worker: first hard IO error is reported: expected stderr' # at t/001_aio.pl line 810. [15:14:45.409](0.000s) # 'psql::88: ERROR: could not read blocks 2..2 in file "base/5/16408": I/O error' # doesn't match '(?^:ERROR:.*could not read blocks 2\.\.2 in file \"base/.*\": Input/output error)' It seems like it's just the error message that is different and has "I/O" instead of "Input/output"? Looks like it. On a more general note, does enabling injection points make any sense here? Yes, it does. I see that coverage in the build farm is not very big. IIUC, those are a development tool, so might not be relevant, because nobody is developing on Alpine / musl? No, whether anyone develops on the platform is not a factor. One hasn't fully tested PostgreSQL until one builds with injection points. Here's a simple fix ... also removes some unnecessary escaping and leaning toothpick syndrome. Confirmed - this works! Thanks for testing and writing up a fix. Andrew, would you be ok applying it? I've been traveling the last 24h and should probably not handing sharp commit bits tonight. I'm not too surprised about failures like this, when writing the tests up I was worried about different formulations. But after seeing freebsd, glibc Linux, netbsd, openbsd windows all working the same I thought we were in the clear. pushed. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
pg_dump --if-exists --clean when drop index that is partition of a partitioned index
hi.

CREATE TABLE tp(c int, a int, b int) PARTITION BY RANGE (b);
CREATE TABLE tp_1(c int, a int, b int);
ALTER TABLE tp ATTACH PARTITION tp_1 FOR VALUES FROM (0) TO (1);
CREATE INDEX t_a_idx ON tp_1(a);
CREATE INDEX tp_a_idx ON tp(a);

pg_dump --schema=public --if-exists --clean --no-statistics
--no-owner --no-data --table-and-children=tp > 1.sql

pg_dump output file 1.sql excerpts:

DROP INDEX IF EXISTS public.t_a_idx;
DROP INDEX IF EXISTS public.tp_a_idx;
DROP TABLE IF EXISTS public.tp_1;
DROP TABLE IF EXISTS public.tp;

if you execute the
DROP INDEX IF EXISTS public.t_a_idx;

ERROR: cannot drop index t_a_idx because index tp_a_idx requires it
HINT: You can drop index tp_a_idx instead.

Is this pg_dump output what we expected?

SELECT * FROM pg_partition_tree('tp_a_idx');
  relid   | parentrelid | isleaf | level
----------+-------------+--------+-------
 tp_a_idx |             | f      |     0
 t_a_idx  | tp_a_idx    | t      |     1
(2 rows)
Re: datfrozenxid > relfrozenxid w/ crash before XLOG_HEAP_INPLACE
On Sun, Apr 06, 2025 at 11:00:54AM -0700, Noah Misch wrote: > I pushed that as commit 8e7e672 (2024-10-25). I now think DELAY_CHKPT_START > is superfluous here, per this proc.h comment: > > * (In the > * extremely common case where the data being modified is in shared buffers > * and we acquire an exclusive content lock on the relevant buffers before > * writing WAL, this mechanism is not needed, because phase 2 will block > * until we release the content lock and then flush the modified data to > * disk.) > > heap_inplace_update_and_unlock() meets those conditions. Its closest > precedent is XLogSaveBufferForHint(), which does need DELAY_CHKPT_START due to > having only BUFFER_LOCK_SHARED. True so far, but ... > heap_inplace_update_and_unlock() has > BUFFER_LOCK_EXCLUSIVE, so it doesn't need DELAY_CHKPT_START. ... not so, because we've not yet done MarkBufferDirty(). transam/README cites SyncOneBuffer(), which says: * Check whether buffer needs writing. * * We can make this check without taking the buffer content lock so long * as we mark pages dirty in access methods *before* logging changes with * XLogInsert(): if someone marks the buffer dirty just after our check we * don't worry because our checkpoint.redo points before log record for * upcoming changes and so we are not required to write such dirty buffer. The attached patch updates the aforementioned proc.h comment and the heap_inplace_update_and_unlock() comment that my last message proposed. From: Noah Misch Comment on need to MarkBufferDirty() if omitting DELAY_CHKPT_START. Blocking checkpoint phase 2 requires MarkBufferDirty() and BUFFER_LOCK_EXCLUSIVE; neither suffices by itself. transam/README documents this, citing SyncOneBuffer(). Update the DELAY_CHKPT_START documentation to say this. Expand the heap_inplace_update_and_unlock() comment that cites XLogSaveBufferForHint() as precedent, since heap_inplace_update_and_unlock() could have opted not to use DELAY_CHKPT_START. Commit 8e7e672cdaa6bfec85d4d5dd9be84159df23bb41 added DELAY_CHKPT_START to heap_inplace_update_and_unlock(). Since commit bc6bad88572501aecaa2ac5d4bc900ac0fd457d5 reverted it in non-master branches, no back-patch. Discussion: https://postgr.es/m/20250406180054.26.nmi...@google.com diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c index ed2e302..c1a4de1 100644 --- a/src/backend/access/heap/heapam.c +++ b/src/backend/access/heap/heapam.c @@ -6507,9 +6507,17 @@ heap_inplace_update_and_unlock(Relation relation, * [crash] * [recovery restores datfrozenxid w/o relfrozenxid] * -* Like in MarkBufferDirtyHint() subroutine XLogSaveBufferForHint(), copy -* the buffer to the stack before logging. Here, that facilitates a FPI -* of the post-mutation block before we accept other sessions seeing it. +* Mimic MarkBufferDirtyHint() subroutine XLogSaveBufferForHint(). +* Specifically, use DELAY_CHKPT_START, and copy the buffer to the stack. +* The stack copy facilitates a FPI of the post-mutation block before we +* accept other sessions seeing it. DELAY_CHKPT_START allows us to +* XLogInsert() before MarkBufferDirty(). Since XLogSaveBufferForHint() +* can operate under BUFFER_LOCK_SHARED, it can't avoid DELAY_CHKPT_START. +* This function, however, likely could avoid it with the following order +* of operations: MarkBufferDirty(), XLogInsert(), memcpy(). Opt to use +* DELAY_CHKPT_START here, too, as a way to have fewer distinct code +* patterns to analyze. 
Inplace update isn't so frequent that it should +* pursue the small optimization of skipping DELAY_CHKPT_START. */ Assert((MyProc->delayChkptFlags & DELAY_CHKPT_START) == 0); START_CRIT_SECTION(); diff --git a/src/include/storage/proc.h b/src/include/storage/proc.h index f51b03d..86c5f99 100644 --- a/src/include/storage/proc.h +++ b/src/include/storage/proc.h @@ -110,10 +110,10 @@ extern PGDLLIMPORT int FastPathLockGroupsPerBackend; * is inserted prior to the new redo point, the corresponding data changes will * also be flushed to disk before the checkpoint can complete. (In the * extremely common case where the data being modified is in shared buffers - * and we acquire an exclusive content lock on the relevant buffers before - * writing WAL, this mechanism is not needed, because phase 2 will block - * until we release the content lock and then flush the modified data to - * disk.) + * and we acquire an exclusive content lock and MarkBufferDirty() on the + * relevant buffers before writing WAL, this mechanism is not needed, because + * phase 2 will block until we release the content lock and then flush the + * modified data to disk. See transam/README and SyncOneBuffer().) * * Setting DELAY_CHKPT_COMPLETE prevents the system from moving fr
Re: Fix replica identity checks for MERGE command on published table.
On Fri, Apr 11, 2025 at 10:54 AM Zhijie Hou (Fujitsu) wrote:
>
> Hi,
>
> While testing publication DDLs, I noticed an unexpected behavior where the
> MERGE command can be executed on tables lacking replica identity keys,
> regardless of whether they are part of a publication that publishes updates
> and deletes.
>
> Replication and application of the updates and deletes generated by MERGE
> command require replica identity keys in the WAL record, which are essential
> for the apply worker on the subscriber to find local tuples for updating or
> deletion. Furthermore, publications require specific columns to be part of the
> replica identity key if the table specifies a publication row filter or column
> list.
>
> We already have restrictions on executing UPDATE and DELETE commands for
> tables without replica identity keys under similar conditions. So, I think
> the same restriction should apply to the MERGE command as well.

+-- fail - missing REPLICA IDENTITY
+MERGE INTO testpub_merge_no_ri USING testpub_merge_pk s ON s.a >= 1
+  WHEN MATCHED THEN UPDATE SET b = s.b;
+ERROR: cannot update table "testpub_merge_no_ri" because it does not have a replica identity and publishes updates
+HINT: To enable updating the table, set REPLICA IDENTITY using ALTER TABLE.
+-- fail - missing REPLICA IDENTITY
+MERGE INTO testpub_merge_no_ri USING testpub_merge_pk s ON s.a >= 1
+  WHEN MATCHED THEN DELETE;
+ERROR: cannot delete from table "testpub_merge_no_ri" because it does not have a replica identity and publishes deletes
+HINT: To enable deleting from the table, set REPLICA IDENTITY using ALTER TABLE.

I was wondering whether we should mention MERGE somewhere in the error
message, like "cannot merge into table ...". But the error message is
reported depending upon the actual operation being performed and whether
it's being published by the publication, so mentioning specific operations
is better than mentioning just MERGE. So I think the current error message
is ok, and it will help point out the operations that caused it.

But that opens up another question: some merge operations (on the same
table) will go through and some won't if the publication publishes only
some of the operations. I am wondering, albeit quite late after the feature
has sailed, whether MERGE should be considered a separate operation as far
as publication is concerned. This topic may have been discussed either when
MERGE was implemented or when operation filters were implemented. Sorry for
the noise in that case.

This isn't a detailed review.

--
Best Wishes,
Ashutosh Bapat
Re: Adding facility for injection points (or probe points?) for more advanced tests
On Wed, Feb 05, 2025 at 09:19:22AM +0900, Michael Paquier wrote: > On Mon, Feb 03, 2025 at 09:30:33PM -0800, Jeff Davis wrote: >> One extra benefit of supporting arguments is that it would be a more >> flexible way to change the local state around the injection point. >> Right now the only way is by using IS_INJECTION_POINT_ATTACHED(), which >> doesn't permit callback-defined conditions, etc. > > Agreed. The line I'm drawing here (mentioned upthread as well) is > that any changes done in the core backend for injection_point.c should > have one or more use cases in the tree. As a matter of fact, this argument is now void on HEAD as of 93bc3d75d8e1 (addition of test_aio) and more specifically da7226993fd4 (core AIO infra). See pgaio_inj_io_get(), which is used in an injection point callback to retrieve a specific value that had better be given as an argument of the callback at runtime. There is also pgaio_inj_cur_handle at the top of aio.c with a comment on the matter. So the AIO testing is using a set of hacks to work around the current limitations. IMO, we ought to clean up the way the AIO code does its tests with injection points, using something like the patch of this thread. And perhaps even do it in v18 to have the code in a cleaner state at release. I'll start a new thread after hacking my way through that. The core injection point patch still needs a bit of work compared to what was sent previously. -- Michael signature.asc Description: PGP signature
Rename injection point names in test_aio
Hi all, (well, Andres) 93bc3d75d8e1 has introduced a couple of new injection points, but these don't follow the convention in-place where points are named more-or-less-like-that. Please find attached a patch to make all these more consistent. Thoughts or comments? -- Michael diff --git a/src/backend/storage/aio/aio.c b/src/backend/storage/aio/aio.c index 86f7250b7a5f..64e8e81e1849 100644 --- a/src/backend/storage/aio/aio.c +++ b/src/backend/storage/aio/aio.c @@ -507,7 +507,7 @@ pgaio_io_process_completion(PgAioHandle *ioh, int result) pgaio_io_update_state(ioh, PGAIO_HS_COMPLETED_IO); - pgaio_io_call_inj(ioh, "AIO_PROCESS_COMPLETION_BEFORE_SHARED"); + pgaio_io_call_inj(ioh, "aio-process-completion-before-shared"); pgaio_io_call_complete_shared(ioh); diff --git a/src/backend/storage/aio/method_worker.c b/src/backend/storage/aio/method_worker.c index 8ad17ec1ef73..0fde2a5b30da 100644 --- a/src/backend/storage/aio/method_worker.c +++ b/src/backend/storage/aio/method_worker.c @@ -525,7 +525,7 @@ IoWorkerMain(const void *startup_data, size_t startup_data_len) * To be able to exercise the reopen-fails path, allow injection * points to trigger a failure at this point. */ - pgaio_io_call_inj(ioh, "AIO_WORKER_AFTER_REOPEN"); + pgaio_io_call_inj(ioh, "aio-worker-after-reopen"); error_errno = 0; error_ioh = NULL; diff --git a/src/test/modules/test_aio/test_aio.c b/src/test/modules/test_aio/test_aio.c index 1d776010ef41..7745244b0e24 100644 --- a/src/test/modules/test_aio/test_aio.c +++ b/src/test/modules/test_aio/test_aio.c @@ -86,19 +86,19 @@ test_aio_shmem_startup(void) inj_io_error_state->enabled_reopen = false; #ifdef USE_INJECTION_POINTS - InjectionPointAttach("AIO_PROCESS_COMPLETION_BEFORE_SHARED", + InjectionPointAttach("aio-process-completion-before-shared", "test_aio", "inj_io_short_read", NULL, 0); - InjectionPointLoad("AIO_PROCESS_COMPLETION_BEFORE_SHARED"); + InjectionPointLoad("aio-process-completion-before-shared"); - InjectionPointAttach("AIO_WORKER_AFTER_REOPEN", + InjectionPointAttach("aio-worker-after-reopen", "test_aio", "inj_io_reopen", NULL, 0); - InjectionPointLoad("AIO_WORKER_AFTER_REOPEN"); + InjectionPointLoad("aio-worker-after-reopen"); #endif } @@ -109,8 +109,8 @@ test_aio_shmem_startup(void) * critical section. */ #ifdef USE_INJECTION_POINTS - InjectionPointLoad("AIO_PROCESS_COMPLETION_BEFORE_SHARED"); - InjectionPointLoad("AIO_WORKER_AFTER_REOPEN"); + InjectionPointLoad("aio-process-completion-before-shared"); + InjectionPointLoad("aio-worker-after-reopen"); elog(LOG, "injection point loaded"); #endif } signature.asc Description: PGP signature
Re: Bump soft open file limit (RLIMIT_NOFILE) to hard limit on startup
On Fri Apr 4, 2025 at 7:34 PM CEST, Heikki Linnakangas wrote: Let's move that 'in_restore_command' business to the caller. It's weird modularity for the function to implicitly behave differently for some callers. I definitely agree with the sentiment, and it was what I originally planned to do. But then I went for this approach anyway because commit 8fb13dd6ab5b explicitely moved all code except for the actual call to system() out of the PreRestoreCommand()/PostRestoreCommand() section (which is also described in the code comment). So I didn't move the the in_restore_command stuff to the caller, and improved the function comment to call out this unfortunate coupling. And 'wait_event_info' should only affect pgstat reporting, not actual behavior. Given that we need to keep the restore command stuff in this function, I think the only other option is to add a dedicated argument for the restore command stuff, like "bool is_restore_command". But that would require every caller, except for the restore command, to pass an additional "false" as an argument. To me the additionaly noise that that adds seems like a worse issue than the non-purity we get by piggy-backing on the wait_event_info argument. I don't feel good about the function name. How about pg_system() or something? Done postmaster/startup.c also seems like a weird place for it; not sure where it belongs but probably not there. Maybe next to OpenPipeStream() in fd.c, or move both to a new file. Moved it to fd.c Looks a bit funny that both functions are called Restore(). Not sure what to suggest instead though. Maybe RaiseOpenFileLimit() and LowerOpenFileLimit(). Changed them to UseOriginalOpenFileLimit() and UseOriginalOpenFileLimit() What would it take to re-implement popen() with fork+exec? Genuine question; I have a feeling it might be complicated to do correctly on all platforms, but I don't know. I initially attempted to re-implement it, but after looking at the fairly complex FreeBSD implementation of popen[1] that suddenly seemed more trouble than it's worth. [1]: https://github.com/freebsd/freebsd-src/blob/c98367641991019bac0e8cd55b70682171820534/lib/libc/gen/popen.c#L63-L181 From 5dbab2ccf0d74313dbc2cbaeb418346de8cc2f48 Mon Sep 17 00:00:00 2001 From: Jelte Fennema-Nio Date: Sun, 23 Feb 2025 16:52:29 +0100 Subject: [PATCH v8 1/3] Adds a helper for places that shell out to system() We need to call system() in a few places, and to do so safely we need some pre/post-fork logic. This encapsulates that logic into a single System helper function. The main reason to do this is that in a follow up commit we want to add even more logic pre and post-fork. --- src/backend/access/transam/xlogarchive.c | 29 ++-- src/backend/archive/shell_archive.c | 5 +-- src/backend/postmaster/startup.c | 1 + src/backend/storage/file/fd.c| 42 src/include/storage/fd.h | 1 + 5 files changed, 48 insertions(+), 30 deletions(-) diff --git a/src/backend/access/transam/xlogarchive.c b/src/backend/access/transam/xlogarchive.c index 1ef1713c91a..c7640ec5025 100644 --- a/src/backend/access/transam/xlogarchive.c +++ b/src/backend/access/transam/xlogarchive.c @@ -158,27 +158,8 @@ RestoreArchivedFile(char *path, const char *xlogfname, (errmsg_internal("executing restore command \"%s\"", xlogRestoreCmd))); - fflush(NULL); - pgstat_report_wait_start(WAIT_EVENT_RESTORE_COMMAND); - - /* - * PreRestoreCommand() informs the SIGTERM handler for the startup process - * that it should proc_exit() right away. 
This is done for the duration - * of the system() call because there isn't a good way to break out while - * it is executing. Since we might call proc_exit() in a signal handler, - * it is best to put any additional logic before or after the - * PreRestoreCommand()/PostRestoreCommand() section. - */ - PreRestoreCommand(); - - /* - * Copy xlog from archival storage to XLOGDIR - */ - rc = system(xlogRestoreCmd); - - PostRestoreCommand(); - - pgstat_report_wait_end(); + /* Copy xlog from archival storage to XLOGDIR */ + rc = pg_system(xlogRestoreCmd, WAIT_EVENT_RESTORE_COMMAND); pfree(xlogRestoreCmd); if (rc == 0) @@ -325,11 +306,7 @@ ExecuteRecoveryCommand(const char *command, const char *commandName, /* * execute the constructed command */ - fflush(NULL); - pgstat_report_wait_start(wait_event_info); - rc = system(xlogRecoveryCmd); - pgstat_report_wait_end(); - + rc = pg_system(xlogRecoveryCmd, wait_event_info); pfree(xlogRecoveryCmd); if (rc != 0) diff --git a/src/backend/archive/shell_archive.c b/src/backend/archive/shell_archive.c index 828723afe47..64c2c6aa760 100644 --- a/src/backend/archive/shell_archive.c +++ b/src/backend/archive/shell_archive.c @@ -75,10 +75,7 @@ shell_archive_file(ArchiveModuleState *state, const char *file, (errmsg_internal("executing archive command \"%s\"", xlogarchcmd))); - fflu
Fix a resource leak (src/backend/utils/adt/rowtypes.c)
Hi. Per Coverity. CID 1608916: (#1 of 1): Resource leak (RESOURCE_LEAK) 52. leaked_storage: Variable buf going out of scope leaks the storage buf.data points to. The function *record_in* has a new report about a resource leak. I think Coverity is right. The normal path of the function frees the memory of the several variables it uses, so the failure path should free them as well. A quick search on the web shows several occurrences of "malformed record literal", so failure is common in this function. Although Coverity reports only the leak of buf.data, the variables *values* and *nulls* should also be released. While there, I moved the creation of stringdata to ensure that, in case of failure, buf.data is released correctly. Attached a patch. best regards, Ranier Vilela fix-resource-leak-rowtypes.patch Description: Binary data
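For anyone who wants to reproduce the failure path being discussed, something like the following reaches the "malformed record literal" report in record_in (the type name is just for illustration):

CREATE TYPE rt AS (a int, b text);
SELECT '(1,2'::rt;  -- ERROR:  malformed record literal: "(1,2"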
Re: New committer: Jacob Champion
Hearty congratulations Jacob. On Mon, Apr 14, 2025 at 6:55 AM Richard Guo wrote: > > On Sat, Apr 12, 2025 at 5:26 AM Jonathan S. Katz wrote: > > The Core Team would like to extend our congratulations to Jacob > > Champion, who has accepted an invitation to become our newest PostgreSQL > > committer. > > > > Please join us in wishing Jacob much success and few reverts! > > Congratulations Jacob! Well deserved! > > Thanks > Richard > > -- Best Wishes, Ashutosh Bapat
Re: Logical Replication of sequences
Hi Vignesh, Some review comments for patch v20250325-0002 == Commit message 1. Furthermore, enhancements to psql commands (\d and \dRp) now allow for better display of publications containing specific sequences or sequences included in a publication. ~ That doesn't seem as clear as it might be. Also, IIUC the "sequences included in a publication" is not actually implemented yet -- there is only the "all sequences" flag. SUGGESTION Furthermore, enhancements to psql commands now display which publications contain the specified sequence (\d command), and if a specified publication includes all sequences (\dRp command) == doc/src/sgml/ref/create_publication.sgml 2. To add a table to a publication, the invoking user must have ownership - rights on the table. The FOR ALL TABLES and - FOR TABLES IN SCHEMA clauses require the invoking + rights on the table. The FOR TABLES IN SCHEMA, + FOR ALL TABLES and + FOR ALL SEQUENCES clauses require the invoking user to be a superuser. IMO these should all be using SGML markup same as elsewhere on this page, not markup. == src/backend/commands/publicationcmds.c 3. if (!superuser_arg(newOwnerId)) { if (form->puballtables) ereport(ERROR, errcode(ERRCODE_INSUFFICIENT_PRIVILEGE), errmsg("permission denied to change owner of publication \"%s\"", NameStr(form->pubname)), errhint("The owner of a FOR ALL TABLES publication must be a superuser.")); if (form->puballsequences) ereport(ERROR, errcode(ERRCODE_INSUFFICIENT_PRIVILEGE), errmsg("permission denied to change owner of publication \"%s\"", NameStr(form->pubname)), errhint("The owner of a FOR ALL SEQUENCES publication must be a superuser.")); if (is_schema_publication(form->oid)) ereport(ERROR, errcode(ERRCODE_INSUFFICIENT_PRIVILEGE), errmsg("permission denied to change owner of publication \"%s\"", NameStr(form->pubname)), errhint("The owner of a FOR TABLES IN SCHEMA publication must be a superuser.")); } I wondered if there's too much duplicated code here. Maybe it's better to share a common ereport? SUGGESTION if (!superuser_arg(newOwnerId)) { char *hint_msg = NULL; if (form->puballtables) hint_msg = _("The owner of a FOR ALL TABLES publication must be a superuser."); else if (form->puballsequences) hint_msg = _("The owner of a FOR ALL SEQUENCES publication must be a superuser."); else if (is_schema_publication(form->oid)) hint_msg = _("The owner of a FOR TABLES IN SCHEMA publication must be a superuser."); if (hint_msg) ereport(ERROR, errcode(ERRCODE_INSUFFICIENT_PRIVILEGE), errmsg("permission denied to change owner of publication \"%s\"", NameStr(form->pubname)), errhint(hint_msg)); } == src/bin/psql/describe.c describeOneTableDetails: 4. + res = PSQLexec(buf.data); + if (!res) + goto error_return; + + numrows = PQntuples(res); + Isn't this same code already done a few lines above in the same function? Maybe I misread something. == src/test/regress/sql/publication.sql 5. +-- check that describe sequence lists all publications the sequence belongs to Might be clearer to say: "lists both" instead of "lists all" == Kind Regards, Peter Smith. Fujitsu Australia
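For illustration only, the user-visible behavior under review boils down to roughly the following, assuming the patch's FOR ALL SEQUENCES syntax and psql changes (object names are made up, and the exact psql output depends on the patch as posted):

CREATE SEQUENCE regress_pub_seq1;
CREATE PUBLICATION testpub_seq1 FOR ALL SEQUENCES;
CREATE PUBLICATION testpub_seq2 FOR ALL SEQUENCES;
\d regress_pub_seq1   -- should list both publications for the sequence
\dRp testpub_seq1     -- should show that the publication covers all sequences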
Re: Proposal: Adding compression of temporary files
> On Fri, Mar 28, 2025 at 09:23:13AM GMT, Filip Janus wrote: > > +else > > +if (nbytesOriginal - file->pos != 0) > > +/* curOffset must be corrected also if compression is > > + * enabled, nbytes was changed by compression but we > > + * have to use the original value of nbytes > > + */ > > +file->curOffset-=bytestowrite; > > > > It's not something introduced by the compression patch - the first part > > is what we used to do before. But I find it a bit confusing - isn't it > > mixing the correction of "logical file position" adjustment we did > > before, and also the adjustment possibly needed due to compression? > > > > In fact, isn't it going to fail if the code gets multiple loops in > > > > while (wpos < file->nbytes) > > { > >... > > } > > > > because bytestowrite will be the value from the last loop? I haven't > > tried, but I guess writing wide tuples (more than 8k) might fail. > > > > I will definitely test it with larger tuples than 8K. > > Maybe I don't understand it correctly, > the adjustment is performed in the case that file->nbytes and file->pos > differ. > So it must persist also if we are working with the compressed data, but the > problem is that data stored and compressed on disk has different sizes than > data incoming uncompressed ones, so what should be the correction value. > By debugging, I realized that the correction should correspond to the size > of > bytestowrite from the last iteration of the loop. I agree, this looks strange. If the idea is to set curOffset to its original value + pos, and the original value was advanced multiple times by bytestowrite, it seems incorrect to adjust it only once, using the bytestowrite value from the last iteration. From what I see, the current tests do not exercise a case where the while loop runs more than one iteration, so it looks fine for now. At the same time, maybe I'm missing something, but what exactly would such a test for 8k tuples and multiple iterations of the while block look like? E.g. when I force a hash join on a table with a single wide text column, the minimal tuple that gets written to the temporary file still has a rather small length, I assume due to toasting. Is there some other way to achieve that?
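One possible way to get wide tuples into the spill files (a completely untested sketch, names chosen for illustration): generate the wide value in a subquery below the join, so it never goes through TOAST at all, e.g.

SET work_mem = '64kB';
SET enable_mergejoin = off;
SET enable_nestloop = off;
EXPLAIN (ANALYZE)
SELECT sum(length(filler))
FROM generate_series(1, 5000) g(i)
JOIN (SELECT i, repeat(md5(i::text), 300) AS filler  -- ~9600 bytes per row
      FROM generate_series(1, 5000) i
      OFFSET 0) wide USING (i);  -- OFFSET 0 keeps the subquery from being pulled up

With a small work_mem the hash join should go multi-batch, and the ~9.6kB rows from the wide side would then be written to the batch files, which should exercise more than one iteration of the while loop.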
Re: Things I don't like about \du's "Attributes" column
On 11.04.2025 20:09, David G. Johnston wrote: However, I do think we are at something committable, though I'd make one, maybe two, changes to v8. Thank you for supporting this patch. Valid until -> Password valid until: the timestamp value already forces a wide column, adding the word Password to the header to clarify what is valid simply provides the same context that the create role page provides when it shows the valid until attribute immediately below the password attribute. Leaving "valid until" alone retains the attribute name tieback. Connection limit -> Con. limit: maybe this gets rejected on translation grounds but the abbreviation here seems obvious and shaves 7 characters off the mandatory width for a column that occupies 12 characters more than the values require. Yes, it can be done. I still have some doubts about these columns. These columns probably change only rarely. So, maybe a better solution is to show them only in \du+. In such a case the width of the columns is not a problem. But such a change requires changing the "Attributes" column back to one line. Even without those changes I as reviewer would concur with the proposal and try to move this on from bike-shedding to a go/no-go decision (by marking it Ready to Commit) as to whether this is an improvement over the status quo. Since the patch requires a rebase and implementation of your suggestions, I decided to move the entry to the next (July) commitfest and set the status to "Waiting on Author". I hope that I will return to this work in June. Robert's idea (detailed role permissions report with \du rolename) can be implemented as separate work. -- Pavel Luzanov Postgres Professional: https://postgrespro.com
Call for Posters: PGConf.dev 2025
Dear pgsql-hackers, As we prepare for the PostgreSQL Development Conference 2025 (PGConf.dev 2025), scheduled for May 13–16 in Montreal, Canada, we are excited to host a poster session showcasing the contributions and projects within our community. If you’re planning to attend PGConf.dev 2025, your participation would be an invaluable addition, and we’d love to showcase your work. I've contacted several authors of patches that have been rescheduled multiple times in the CommitFests, but then decided to make a broader call for posters. The goal of the poster session is to visually present your patch or project on an A2-sized poster (ANSI C, 17" × 22"). We encourage you to include: /* * A visual representation of your patch or project. * The conference logo and year. * Information on what attention or collaboration your project requires. * Contact details for potential collaborators to reach you or your team. * (E.g. a QR code to your patch or project, email, or messenger) */ If you're unsure where to start, I'll be happy to send you my own poster soon (created in PowerPoint) as a stub. Additionally, it would be greatly appreciated if your poster could be used as an example for others in their work. We hope to print and bring all the posters to the conference, but if you prefer to handle your poster's printing and transportation, it would certainly be appreciated. Posters will be displayed in dedicated conference spaces and removed after the event, and you are welcome to take them home if you wish. If you want to read about typical poster sessions, you can refer to this wiki page [0]. However, there are some differences. Please note that our poster session will be somewhat experimental. The main focus is on improving collaboration: attracting coauthors, reviewers, maybe committers, improving the overall discussion, etc. Let me know if you would like to present your poster or have a question. Thank you for your ongoing contributions to PostgreSQL, and I look forward to hearing from you. Best regards, Andrey Borodin [0] https://en.wikipedia.org/wiki/Poster_session
Re: [PATCH] PGSERVICEFILE as part of a normal connection string
On Mon, Apr 7, 2025 at 1:10 PM Michael Paquier wrote: > > On Thu, Apr 03, 2025 at 12:36:59AM +0900, Ryo Kanbayashi wrote: > > I'll reflect your notice and suggestion to the patch current I'm > > working on :) > > Thanks for that. > > And I have forgotten to add you as a reviewer of what has been > committed as 2c7bd2ba507e. Sorry for that :/ No problem :) I rebased our patch according to 2c7bd2ba507e. https://commitfest.postgresql.org/patch/5387/ --- Great regards, Ryo Kanbayashi v7-0001-add-servicefile-connection-option-feature.patch Description: Binary data
Re: remove unnecessary explicit type conversion (to char) for appendStringInfoChar function calls
On 2025-04-13 Su 8:12 AM, Álvaro Herrera wrote: On 2025-Apr-12, Andrew Dunstan wrote: Seems odd that this one is necessary at all. Doesn't a StringInfo always have a trailing null byte? AFAICT what this is doing that's significant, is increment StringInfo->len. Before doing it, the NUL byte is not part of the message; afterwards it is. That makes sense, but then it would be nice to make the accompanying comment a bit clearer. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
Re: Buildfarm: Enabling injection points on basilisk/dogfish (Alpine / musl)
On 2025-04-12 Sa 10:10 PM, Noah Misch wrote: On Sat, Apr 12, 2025 at 07:51:06PM +0200, Wolfgang Walther wrote: With injection points enabled, I get the following errors in test_aio: [15:14:45.408](0.000s) not ok 187 - worker: first hard IO error is reported: expected stderr [15:14:45.409](0.000s) [15:14:45.409](0.000s) # Failed test 'worker: first hard IO error is reported: expected stderr' # at t/001_aio.pl line 810. [15:14:45.409](0.000s) # 'psql::88: ERROR: could not read blocks 2..2 in file "base/5/16408": I/O error' # doesn't match '(?^:ERROR:.*could not read blocks 2\.\.2 in file \"base/.*\": Input/output error)' It seems like it's just the error message that is different and has "I/O" instead of "Input/output"? Looks like it. On a more general note, does enabling injection points make any sense here? Yes, it does. I see that coverage in the build farm is not very big. IIUC, those are a development tool, so might not be relevant, because nobody is developing on Alpine / musl? No, whether anyone develops on the platform is not a factor. One hasn't fully tested PostgreSQL until one builds with injection points. Here's a simple fix ... also removes some unnecessary escaping and leaning toothpick syndrome. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com diff --git a/src/test/modules/test_aio/t/001_aio.pl b/src/test/modules/test_aio/t/001_aio.pl index c136d8ee8f5..ef4e5247e5b 100644 --- a/src/test/modules/test_aio/t/001_aio.pl +++ b/src/test/modules/test_aio/t/001_aio.pl @@ -813,7 +813,7 @@ SELECT invalidate_rel_block('tbl_ok', 2); "first hard IO error is reported", qq(SELECT count(*) FROM tbl_ok), qr/^$/, - qr/ERROR:.*could not read blocks 2\.\.2 in file \"base\/.*\": Input\/output error/ + qr!ERROR:.*could not read blocks 2\.\.2 in file "base/.*": (?:I/O|Input/output) error! ); psql_like( @@ -822,7 +822,7 @@ SELECT invalidate_rel_block('tbl_ok', 2); "second hard IO error is reported", qq(SELECT count(*) FROM tbl_ok), qr/^$/, - qr/ERROR:.*could not read blocks 2\.\.2 in file \"base\/.*\": Input\/output error/ + qr!ERROR:.*could not read blocks 2\.\.2 in file "base/.*": (?:I/O|Input/output) error! ); $psql->query_safe(qq(SELECT inj_io_short_read_detach()));
Re: Back-patch of: avoid multiple hard links to same WAL file after a crash
On Sat, Apr 05, 2025 at 07:09:58PM -0700, Noah Misch wrote: > On Sun, Apr 06, 2025 at 07:42:02AM +0900, Michael Paquier wrote: > > On Sat, Apr 05, 2025 at 12:13:39PM -0700, Noah Misch wrote: > > > Since the 2025-02 releases made non-toy-size archive recoveries fail > > > easily, > > > that's not enough. If the proposed 3-second test is the wrong thing, what > > > instead? > > > > I don't have a good idea about that in ~16, TBH, but I am sure to not > > be a fan of the low reproducibility rate of this test as proposed. > > It's not perfect, but as the design to fix the original race condition > > has been introduced in v15, why not begin with a test in 17~ using > > some injection points? > > Two reasons: > > a) The fix ended calls to the whole range of relevant code. Hence, the >injection point placement that would have been relevant before the fix >isn't reached. In other words, there's no right place for the injection >point. (The place for the injection point would be in durable_rename(), in >the checkpointer. After the fix, the checkpointer just doesn't call >durable_rename().) > > b) Stochastic tests catch defects beyond the specific one the test author >targeted. An injection point test is less likely to do that. (That said, >with reason (a), there's no known injection point test design to compete >with the stochastic design.) Tom and Michael, do you still object to the test addition, or not? If there are no new or renewed objections by 2025-04-20, I'll proceed to add the test. As another data point, raising the runtime from 3s to 17s makes it reproduce the problem 25% of the time. You can imagine a plot with axes of runtime and percent detection. One can pick any point on that plot's curve. Given how little wall time it takes for the buildfarm and CI to reach a few hundred runs, I like the trade-off of 3s runtime and 1% detection. In particular, I like it better than 17s runtime for 25% detection. How do you see it?
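For a rough sense of the trade-off, assuming runs are independent: at a 1% per-run detection rate, about 300 buildfarm/CI runs already give 1 - 0.99^300 ≈ 95% cumulative odds of catching a regression, whereas 25% per run at 17s reaches certainty sooner but at several times the wall-clock cost of every run.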
Re: Back-patch of: avoid multiple hard links to same WAL file after a crash
Noah Misch writes: > Tom and Michael, do you still object to the test addition, or not? If there > are no new or renewed objections by 2025-04-20, I'll proceed to add the test. While I still don't love it, I don't have a better proposal. regards, tom lane
Re: TOAST versus toast
Hi, I think this point is of no significance at all. Besides, this is a document that has been around for over ten years, and everyone has become accustomed to this kind of expression. This just seems like effort spent on something that does not matter. On Sat, 12 Apr 2025 at 10:31, David G. Johnston wrote: > On Sun, Mar 16, 2025 at 8:33 PM Peter Smith wrote: > >> Thanks for your suggestions. At this point option (1) is looking most >> attractive. Probably, I will just withdraw the CF entry soon unless >> there is some new interest. Just chipping away fixing a few places >> isn't going to achieve the consistency this thread was aiming for. >> >> > I've moved this back to waiting on author pending a final decision. > Interested parties might still chime in but it doesn't seem like it is > actively looking for reviewers at this point. > > David J. > >
Re: CSN snapshots in hot standby
Hi all, Apologies — the patch I sent earlier did not appear as expected on the mailing list archives because it used the wrong attachment style. I'll resend it properly as an inline patch shortly. Thanks for your understanding! Best regards, Mingwei Jia v7-0013-use-clog-in-XidInMVCCSnapshot.patch Description: Binary data
Re: remove unnecessary explicit type conversion (to char) for appendStringInfoChar function calls
On 2025-Apr-12, Andrew Dunstan wrote: > Seems odd that this one is necessary at all. Doesn't a StringInfo always > have a trailing null byte? AFAICT what this is doing that's significant, is increment StringInfo->len. Before doing it, the NUL byte is not part of the message; afterwards it is. -- Álvaro HerreraBreisgau, Deutschland — https://www.EnterpriseDB.com/ "The important things in the world are problems with society that we don't understand at all. The machines will become more complicated but they won't be more complicated than the societies that run them."(Freeman Dyson)
pg_restore --format= option(without any value) should report an error as pg_dump is reporting an error
Hi, With "pg_restore --format=", we are not giving any error because in code, we are checking length of arg but pg_dump is reporting an error for the same option. For the consistency purpose, pg_dump and pg_restore both should report an error for the test case below. *Ex: (output after this patch)but before this patch, below command is passing.* /pg_restore x1 -d postgres -j 10 -C --verbose --format= pg_restore: error: unrecognized archive format ""; please specify "c", "d", or "t" Here, I am attaching a patch which is fixing the same. I added 2 TAP tests also for invalid options. *Note:* We have 2 more options in pg_restore code which validate the option if arg has non zero length. I will prepare patches for both(--host and --port). We need to add some validate function also for both these options. -- Thanks and Regards Mahendra Singh Thalor EnterpriseDB: http://www.enterprisedb.com v01-pg_restore-format-option-should-validate-all-values.patch Description: Binary data