Re: per backend I/O statistics
Hi,

On Tue, Nov 19, 2024 at 10:47:53AM +0900, Michael Paquier wrote:
> On Thu, Nov 14, 2024 at 01:30:11PM +, Bertrand Drouvot wrote:
> > - change the arguments to false in the pgstat_drop_entry_internal() call in
> >   pgstat_drop_all_entries()
> > - start the engine
> > - kill -9 postgres
> > - restart the engine
> >
> > You'll see the assert failing due to the startup process. I don't think it
> > is doing something wrong though, just populating its backend stats. Do you
> > have another view on it?
>
> It feels sad that we have to plug in the internals at this level for
> this particular case. Perhaps there is something to do with more
> callbacks. Or perhaps there is just no point in tracking the stats of
> auxiliary processes because we already have this data in the existing
> pg_stat_io already?

We cannot discard the "per backend" stats collection for auxiliary processes
because those are the source for pg_stat_io too (once the 2 flush callbacks
are merged).

I have another idea: after a bit more investigation, it turns out that only
the startup process is generating the failed assertion. So, for the startup
process only, what about:

- not calling pgstat_create_backend_stat() in pgstat_beinit()...
- but calling it in StartupXLOG() instead (after the stats are discarded or
  restored).

That way we could get rid of the changes linked to the assertion and still
handle the startup process's particular case.

Thoughts?

> > How would that work for someone doing "select * from
> > pg_stat_get_backend_io()"?
> >
> > Do you mean copy/paste part of the code from pg_stat_get_activity() into
> > pg_stat_get_backend_io() to get the backend type? That sounds like an
> > idea, I'll have a look at it.
>
> I was under the impression that we should keep PgStat_Backend for the
> reset_timestamp part, but drop BackendType as it can be guessed from
> pg_stat_activity itself.

Removing BackendType sounds doable, I'll look at it.
> >> Structurally, it may be cleaner to keep all the callbacks and the
> >> backend-I/O specific logic into a separate file, perhaps
> >> pgstat_io_backend.c or pgstat_backend_io?
> >
> > Yeah, I was wondering the same. What about pgstat_backend.c (that would
> > contain only one "section" dedicated to IO currently)?
>
> pgstat_backend.c looks good to me. Could there be other stats than
> just IO, actually? Perhaps not, just asking..

Yeah, that's the reason why I suggested "pgstat_backend.c". I would not be
surprised if we added more "per backend" stats (that are not I/O related) in
the future, as the main machinery would be there.

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
Re: Slot's restart_lsn may point to removed WAL segment after hard restart unexpectedly
On 10/31/24 11:18, Vitaly Davydov wrote:
> Dear Hackers,
>
> I'd like to discuss a problem with replication slots' restart LSN.
> Physical slots are saved to disk at the beginning of a checkpoint. At the
> end of the checkpoint, old WAL segments are recycled or removed from disk
> if they are not kept by slots' restart_lsn values.

I agree that if we can lose WAL still needed for a replication slot,
that is a bug. Retaining the WAL is the primary purpose of slots, and
we just fixed a similar issue for logical replication.

> If an existing physical slot is advanced in the middle of checkpoint
> execution, WAL segments related to the saved-on-disk restart LSN may be
> removed. This is because the calculation of the minimal replication slot
> LSN occurs at the end of the checkpoint, prior to old WAL segment
> removal. If you hard-stop (pg_ctl -m immediate) the postgres instance
> right after the checkpoint and restart it, the slot's restart_lsn may
> point to the removed WAL segment. I believe such behaviour is not good.

Not sure I 100% follow, but let me rephrase, just so that we're on the
same page. CreateCheckPoint() does this:

    ... something ...
    CheckPointGuts(checkPoint.redo, flags);
    ... something ...
    RemoveOldXlogFiles(_logSegNo, RedoRecPtr, recptr,
                       checkPoint.ThisTimeLineID);

The slots get synced in CheckPointGuts(), so IIUC you're saying if the
slot gets advanced shortly after the sync, the RemoveOldXlogFiles() may
remove still-needed WAL, because we happen to consider a fresh
restart_lsn when calculating the logSegNo. Is that right?

> The doc [0] describes that restart_lsn may be set to some past value
> after reload. There is a discussion [1] on pghackers where such
> behaviour is discussed. The main reason for not flushing physical slots
> on advancing is performance. I'm OK with such behaviour, except that
> the corresponding WAL segments should not be removed.
I don't know which part of [0] you refer to, but I guess you're
referring to this:

    The current position of each slot is persisted only at checkpoint, so
    in the case of a crash the slot may return to an earlier LSN, which
    will then cause recent changes to be sent again when the server
    restarts. Logical decoding clients are responsible for avoiding ill
    effects from handling the same message more than once.

Yes, it's fine if we discard the new in-memory restart_lsn value, and we
do this for performance reasons - flushing the slot on every advance
would be very expensive. I haven't read [1] as it's quite long, but I
guess that's what it says.

But we must not take any "permanent" actions based on the unflushed
value, I think. Like, we should not remove WAL segments, for example.

> I propose to keep WAL segments by the saved-on-disk (flushed) restart_lsn
> of slots. Add a new field restart_lsn_flushed to the ReplicationSlot
> structure. Copy restart_lsn to restart_lsn_flushed in SaveSlotToPath. It
> doesn't change the format of storing the slot contents on disk. I
> attached a patch. It is not yet complete, but it demonstrates a way to
> solve the problem.

That seems like a possible fix, yes. And maybe it's the right one.

> I reproduced the problem in the following way:
>
> * Add some delay in CheckPointBuffers (pg_usleep) to emulate long
>   checkpoint execution.
> * Execute checkpoint and pg_replication_slot_advance right after the
>   start of the checkpoint from another connection.
> * Hard-restart the server right after checkpoint completion.
> * After restart, the slot's restart_lsn may point to a removed WAL
>   segment.
>
> The proposed patch fixes it.

I tried to reproduce the issue using a stress test (checkpoint+restart
in a loop), but so far without success :-( Can you clarify where exactly
you added the pg_usleep(), and how long are the waits you added? I
wonder if the sleep is really needed, considering the checkpoints are
spread anyway.
Also, what do you mean by "hard reset"?

What confuses me a bit is that we update the restart_lsn (and call
ReplicationSlotsComputeRequiredLSN() to recalculate the global value)
all the time. Walsender does that in PhysicalConfirmReceivedLocation,
for example. So we actually see the required LSN move during checkpoint
very often. So how come we don't see the issue much more often? Surely
I'm missing something important.

Another option might be that pg_replication_slot_advance() doesn't do
something it should be doing. For example, shouldn't it be marking the
slot as dirty?

regards

-- 
Tomas Vondra
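To make the suspected race concrete, here is a toy model of the sequence discussed in this thread (checkpoint flushes the slot state, the slot then advances in memory, old WAL is removed, and a hard crash restores the flushed value). The segment numbers and function names are illustrative only, not PostgreSQL's actual API:

```python
def crash_recovery_ok(flushed_lsn: int, in_memory_lsn: int,
                      trim_with_flushed: bool) -> bool:
    """Simulate end-of-checkpoint WAL removal followed by a hard restart."""
    segments = set(range(1, 11))              # WAL segments on disk
    cutoff = flushed_lsn if trim_with_flushed else in_memory_lsn
    segments -= {s for s in segments if s < cutoff}   # RemoveOldXlogFiles()
    # After a crash, the slot is restored from disk, i.e. to flushed_lsn,
    # and it still needs that segment to be present.
    return flushed_lsn in segments

# Slot flushed at segment 3, then advanced in memory to 8 mid-checkpoint:
assert not crash_recovery_ok(3, 8, trim_with_flushed=False)  # reported bug
assert crash_recovery_ok(3, 8, trim_with_flushed=True)       # proposed fix
```

The `trim_with_flushed=True` branch corresponds to the proposed restart_lsn_flushed approach: WAL removal keys off the on-disk value, so a crash can never restore the slot to a position whose WAL is gone.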
Re: wrong comment in libpq.h
On Thu, Oct 17, 2024 at 02:24:33PM +0200, Daniel Gustafsson wrote:
> > On 17 Oct 2024, at 04:45, Bruce Momjian wrote:
> >
> > I looked at this and decided the GUC section was illogical, so I just
> > moved the variables up into the be-secure.c section. Patch attached.
>
> No objections.

Patch applied to master.

-- 
Bruce Momjian  https://momjian.us
EDB            https://enterprisedb.com

When a patient asks the doctor, "Am I going to die?", he means
"Am I going to die soon?"
Re: On non-Windows, hard depend on uselocale(3)
On Wed, Nov 20, 2024 at 10:00 PM Thomas Munro wrote:
> OK, do you think these three patches tell the _configthreadlocale()
> story properly? (Then after that we can get back to getting rid of
> it...)

Just by the way, here's another interesting thing I have learned about
the msvcrt->ucrt evolution: ucrt introduced UTF-8 support (ie locales
can use UTF-8 encoding, and all the standard library char functions work
with it just fine like on Unix), but PostgreSQL still believes that
Windows can't do that and has a lot of special code paths to work with
wchar_t and perform extra conversions. I started nibbling at that[1],
but at the time I was still a bit fuzzy on whether we could really rip
all that old stuff out yet and harmonise with Unix, as I didn't
understand the MinGW/runtime/history situation. It seems the coast is
now clear, and that would be a satisfying cleanup. (There's still
non-trivial work to do for that though: we allowed weird mismatched
encoding scenarios just on that OS, and would need to stop that, which
might create some upgrade-path problems; needs some thought, see that
thread.)

[1] https://www.postgresql.org/message-id/CA%2BhUKGJ%3Dca39Cg%3Dy%3DS89EaCYvvCF8NrZRO%3Duog-cnz0VzC6Kfg%40mail.gmail.com
Re: Proposals for EXPLAIN: rename ANALYZE to EXECUTE and extend VERBOSE
On Thu, 21 Nov 2024 at 08:30, Guillaume Lelarge wrote:
> OK, I'm fine with this. v4 patch attached with one plan showing read,
> written, and dirtied buffers.

I think this might be a good time for anyone out there who is against
turning on BUFFERS when ANALYZE is on to speak up.

Votes for changing this so far seem to be: me, Michael Christofides,
Guillaume Lelarge, Robert, Greg Sabino Mullane, and David Fetter (from
2020) [1]. Votes against: none.

David

[1] https://www.postgresql.org/message-id/20200601010025.GS13741%40fetter.org
Re: Make COPY format extendable: Extract COPY TO format implementations
On Mon, Nov 18, 2024 at 8:44 PM Masahiko Sawada wrote:
>
> On Mon, Nov 18, 2024 at 5:31 PM Sutou Kouhei wrote:
> >
> > Hi,
> >
> > In
> > "Re: Make COPY format extendable: Extract COPY TO format implementations"
> > on Mon, 18 Nov 2024 17:02:41 -0800, Masahiko Sawada wrote:
> >
> > > I have a question about v22. We use pg_attribute_always_inline for
> > > some functions to avoid function call overheads. Applying it to
> > > CopyToTextLikeOneRow() and CopyFromTextLikeOneRow() is legitimate, as
> > > we've discussed. But there are more functions where the patch applied
> > > it:
> > >
> > > -bool
> > > -NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
> > > +static pg_attribute_always_inline bool
> > > +NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields, bool is_csv)
> > >
> > > -static bool
> > > -CopyReadLineText(CopyFromState cstate)
> > > +static pg_attribute_always_inline bool
> > > +CopyReadLineText(CopyFromState cstate, bool is_csv)
> > >
> > > +static pg_attribute_always_inline void
> > > +CopyToTextLikeSendEndOfRow(CopyToState cstate)
> > >
> > > I think it's out of scope of this patch even if these changes are
> > > legitimate. Is there any reason for these changes?
> >
> > Yes for NextCopyFromRawFields() and CopyReadLineText().
> > No for CopyToTextLikeSendEndOfRow().
> >
> > NextCopyFromRawFields() and CopyReadLineText() have "bool is_csv". So
> > I think that we should use pg_attribute_always_inline (or inline) like
> > CopyToTextLikeOneRow() and CopyFromTextLikeOneRow(). I think that it's
> > not out of scope of this patch because it's a part of the
> > CopyToTextLikeOneRow() and CopyFromTextLikeOneRow() optimization.
> >
> > Note: The optimization is based on the "bool is_csv" parameter and a
> > constant "true"/"false" argument in the function call. If we can
> > inline this function call, all "if (is_csv)" checks in the function
> > are removed.
>
> Understood, thank you for pointing this out.
> > pg_attribute_always_inline (or inline) for
> > CopyToTextLikeSendEndOfRow() is out of scope of this patch. You're
> > right.
> >
> > I think that inlining CopyToTextLikeSendEndOfRow() is better because
> > it's called per row. But it's not related to the optimization.
> >
> > Should I create a new patch set without
> > pg_attribute_always_inline/inline for CopyToTextLikeSendEndOfRow()?
> > Or could you remove it when you push?
>
> Since I'm reviewing the patch and the patch organization, I'll include it.

I've extracted the changes to refactor COPY TO/FROM to use the format
callback routines from the v23 patch set, which seems to be a better
patch split to me. Also, I've reviewed these changes and made some
changes on top of them. The attached patches are:

0001: make COPY TO use CopyToRoutine.
0002: minor changes to 0001 patch. will be fixed up.
0003: make COPY FROM use CopyFromRoutine.
0004: minor changes to 0003 patch. will be fixed up.

I've confirmed that v24 has a similar performance improvement to v23.
Please check these extractions and minor change suggestions.

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

From 257a284447e64753277f7bc08b387e901bcab8bb Mon Sep 17 00:00:00 2001
From: Masahiko Sawada
Date: Tue, 19 Nov 2024 11:52:33 -0800
Subject: [PATCH v24 2/4] fixup: fixup: minor updates for COPY TO refactoring.

includes:
- reorder function definitions.
- cleanup comments.
---
 src/backend/commands/copyto.c  | 242 +++--
 src/include/commands/copyapi.h |  23 ++--
 2 files changed, 121 insertions(+), 144 deletions(-)

diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index 46f3507a8b..73b9ca4457 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -65,7 +65,7 @@ typedef enum CopyDest
  */
 typedef struct CopyToStateData
 {
-	/* format routine */
+	/* format-specific routines */
 	const CopyToRoutine *routine;
 
 	/* low-level state data */
@@ -118,6 +118,19 @@ static void CopyAttributeOutText(CopyToState cstate, const char *string);
 static void CopyAttributeOutCSV(CopyToState cstate, const char *string,
 								bool use_quote);
 
+/* built-in format-specific routines */
+static void CopyToTextLikeStart(CopyToState cstate, TupleDesc tupDesc);
+static void CopyToTextLikeOutFunc(CopyToState cstate, Oid atttypid, FmgrInfo *finfo);
+static void CopyToTextOneRow(CopyToState cstate, TupleTableSlot *slot);
+static void CopyToCSVOneRow(CopyToState cstate, TupleTableSlot *slot);
+static void CopyToTextLikeOneRow(CopyToState cstate, TupleTableSlot *slot,
+								 bool is_csv);
+static void CopyToTextLikeEnd(CopyToState cstate);
+static void CopyToBinaryStart(CopyToState cstate, TupleDesc tupDesc);
+static void CopyToBinaryOutFunc(CopyToState cstate, Oid atttypid, FmgrInfo *finfo);
+static void CopyToBinaryOneRow(CopyToState cstate, TupleTableSlot *slot);
+static void CopyToBinaryEnd(CopyToState cstate);
+
 /* Low-leve
Re: UUID v7
On Tue, Nov 19, 2024 at 7:54 PM Andrey M. Borodin wrote:
>
> > On 20 Nov 2024, at 00:06, Masahiko Sawada wrote:
> >
> > On Tue, Nov 19, 2024 at 9:45 AM Andrey M. Borodin wrote:
> >>
> >>> On 19 Nov 2024, at 14:31, Andrey M. Borodin wrote:
> >>>
> >>> Done.
> >>
> >> Here's v33 intact + one more patch to add 2 bits of entropy on MacOS
> >> (to compensate for the lack of nanoseconds).
> >> What do you think?
> >
> > Thank you for updating the patch!
> >
> > I've reviewed the v33 patch and made some changes, mostly for cosmetic
> > things. Please review it to see if we accept these changes.
>
> Your changes look good to me. I particularly like the sortability test.
> I see that you removed the implementation of clock_gettime() for
> Windows. Well, this makes sense.
>
> > I have one question about the additional patch:
> >
> > +#if defined(__darwin__)
> > +	/*
> > +	 * On MacOS real time is truncated to microseconds. Thus, 2 least
> > +	 * significant bits of increased_clock_precision are neither random
> > +	 * (CSPRNG), nor time-dependent (in a sense - truly random). These 2 bits
> > +	 * are dependent on other time-specific bits, thus they do not contribute
> > +	 * to uniqueness. To make these bits random we mix in two bits from CSPRNG.
> > +	 */
> > +	uuid->data[7] = uuid->data[7] ^ (uuid->data[8] >> 6);
> > +#endif
> >
> > I thought that the whole 12 bits in "rand_a" are actually
> > time-dependent since we store a 1/4096 fraction of sub-milliseconds.
> > Am I missing something?
>
> We have 12 bits in increased_clock_precision but only 1000 possible
> values of these bits. The 2 least significant bits are defined by the
> other 10 bits. These bits are not equal to 0, they are changing.
> True, these bits are time-dependent in the sense that they can be
> computed from a full timestamp. I wanted to express the fact that the
> timestamp cannot be altered in a way such that only these 2 bits are
> changed.

Understood the idea.
But does replacing the least significant 2 bits with 2 random bits
really not affect monotonicity? The ensured minimal timestamp step is
245 ns, ((NS_PER_MS / (1 << 12)) + 1), meaning that if two UUIDs are
generated within a microsecond on macOS, the two timestamps differ by
245 ns. After calculating the increased clock precision with these two
timestamps, they differ only by 1, which seems to be problematic to me.

Suppose the two timestamps are:

ns1: 1732142033754429000 (Nov 20, 2024 10:33:53.754429000)
ns2: 1732142033754429245 (Nov 20, 2024 10:33:53.754429245)

Their sub-milliseconds are calculated (by multiplying by 4096) to:

subms1: 1757 (0b011011011101)
subms2: 1758 (0b011011011110)

If we replace the least significant bits '01' of subms1 with random bits
'11' and replace '10' of subms2 with '00', we cannot guarantee the
monotonicity.

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
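For what it's worth, the sub-millisecond arithmetic in the example above can be checked with a few lines of Python (NS_PER_MS and the 12-bit scaling are taken from the discussion; everything else is illustrative):

```python
NS_PER_MS = 1_000_000  # nanoseconds per millisecond

def subms_12bit(ns: int) -> int:
    """Scale the sub-millisecond part of a nanosecond timestamp to 12 bits."""
    return (ns % NS_PER_MS) * 4096 // NS_PER_MS

# Minimal ensured timestamp step from the discussion: 245 ns.
step = NS_PER_MS // (1 << 12) + 1
assert step == 245

ns1 = 1732142033754429000
ns2 = ns1 + step

# Adjacent timestamps map to 12-bit values that differ by exactly 1,
# so randomizing the low 2 bits can reorder them.
assert subms_12bit(ns1) == 1757
assert subms_12bit(ns2) == 1758
```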
Re: per backend I/O statistics
On Wed, Nov 20, 2024 at 02:10:23PM +, Bertrand Drouvot wrote:
> Yeah, also this could be useful for custom statistics. So I created a
> dedicated thread and a patch proposal (see [1]).
>
> [1]: https://www.postgresql.org/message-id/Zz3skBqzBncSFIuY%40ip-10-97-1-34.eu-west-3.compute.internal

Thanks for putting that on its own thread. Will look at it.

-- 
Michael
Re: ci: Macos failures due to MacPorts behaviour change
Oh, and yeah, we should include the branch name in the cache key.
Something like the attached. For some reason CI is not allowing me to
see the output from macOS right now (?!) so I couldn't see what
"Populate macports cache" printed out[1], but I think this should be
right... will try again tomorrow.

I guess the alternative would be to set the package list the same across
all branches, even though they need different stuff, so they could share
the same cache without fighting over it?

[1] https://cirrus-ci.com/task/6707217489985536

From 191236b828e737de2b1a9b62140d0f9b2c9dbb70 Mon Sep 17 00:00:00 2001
From: Thomas Munro
Date: Mon, 18 Nov 2024 21:46:38 +1300
Subject: [PATCH v2] ci: Fix cached MacPorts installation management.

1. The error reporting of "port setrequested list-of-packages..."
changed, hiding errors we were relying on to know if all packages in
our list were already installed.  Use a loop instead.

2. The cached MacPorts installation was shared between PostgreSQL major
branches, though each branch wanted different packages.  Add the branch
name to the cache key, so that different branches, when tested in one
github account/repo such as postgres/postgres, stop fighting with each
other, adding and removing packages.

Back-patch to 15 where CI began.

Diagnosed-by: Andres Freund
Discussion: https://postgr.es/m/au2uqfuy2nf43nwy2txmc5t2emhwij7kzupygto3d2ffgtrdgr%40ckvrlwyflnh2
---
 .cirrus.tasks.yml                    |  2 ++
 src/tools/ci/ci_macports_packages.sh | 10 --
 2 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/.cirrus.tasks.yml b/.cirrus.tasks.yml
index fc413eb11ef..cd8e1b5eb0b 100644
--- a/.cirrus.tasks.yml
+++ b/.cirrus.tasks.yml
@@ -466,6 +466,8 @@ task:
       # Include the OS major version in the cache key.  If the OS image changes
       # to a different major version, we need to reinstall.
       sw_vers -productVersion | sed 's/\..*//'
+      # Include the branch name, because branches want different packages
+      echo ${CIRRUS_BRANCH}
       # Also start afresh if we change our MacPorts install script.
       md5 src/tools/ci/ci_macports_packages.sh
     reupload_on_changes: true
diff --git a/src/tools/ci/ci_macports_packages.sh b/src/tools/ci/ci_macports_packages.sh
index b3df6d36a4e..41cb83593f7 100755
--- a/src/tools/ci/ci_macports_packages.sh
+++ b/src/tools/ci/ci_macports_packages.sh
@@ -61,9 +61,15 @@ fi
 
 # if setting all the required packages as requested fails, we need
 # to install at least one of them
-if ! sudo port setrequested $packages > /dev/null 2>&1 ; then
-    echo not all required packages installed, doing so now
+echo "checking if all required packages are installed"
+for package in $packages ; do
+    if ! sudo port setrequested $package > /dev/null 2>&1 ; then
         update_cached_image=1
+    fi
+done
+echo "done"
+if [ "$update_cached_image" -eq 1 ]; then
+    echo not all required packages installed, doing so now
     # to keep the image small, we deleted the ports tree from the image...
     sudo port selfupdate
     # XXX likely we'll need some other way to force an upgrade at some
-- 
2.47.0
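The per-package loop in the patch follows a common shell pattern: check items one at a time and set a flag if any check fails. A standalone sketch of that pattern, with a toy `check_pkg` standing in for the real `sudo port setrequested` call (note the flag is explicitly initialized here; the real script is assumed to handle that elsewhere):

```shell
# Toy stand-in for "sudo port setrequested": succeeds only for known packages.
check_pkg() {
    case "$1" in
        meson|ccache) return 0 ;;
        *) return 1 ;;
    esac
}

packages="meson ccache libxml2"   # hypothetical package list
update_cached_image=0             # initialize so the test below never sees an empty value

for package in $packages ; do
    if ! check_pkg "$package" ; then
        update_cached_image=1
    fi
done

if [ "$update_cached_image" -eq 1 ]; then
    echo "not all required packages installed"
fi
```

Checking each package individually means one failure no longer masks the others, which is the point of the patch: the combined `setrequested list-of-packages` form stopped reporting errors usefully.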
Re: [EXTERNAL] Re: Add non-blocking version of PQcancel
Alexander Lakhin writes:
> 17.11.2024 05:33, Tom Lane wrote:
>> Yeah. This has been happening off-and-on in the buildfarm ever
>> since we added that test. I'm not sure if it's just "the test
>> is unstable" or if it's telling us there's a problem with the
>> cancel logic. Scraping the last 3 months worth of buildfarm
>> logs finds these instances:

> Yes, I counted those bf failures at [1] too and posted my explanation
> upthread [2].

Sorry, I'd forgotten about that. I added some more debug logging to the
modifications you made, and confirmed your theory that the remote
session is ignoring the cancel request because it receives it while
DoingCommandRead is true; which must mean that it hasn't started the
slow query yet. This implies that the 100ms delay in query_cancel.sql
does not reliably give the remote enough time to receive the command,
which surprises me, especially since the failing animals aren't
particularly slow ones. Maybe there is something else happening? But I
do reproduce the failure after adding your delays, and the patch I'm
about to propose does fix it.

Anyway, given that info, Jelte's unapplied 0002 patch earlier in the
thread is not the answer, because this is about dropping a query cancel,
not about losing a timeout interrupt. The equivalent thing to what he
suggested would be to not clear the cancel request flag during
DoingCommandRead, instead letting it kill the next query. But I didn't
like that idea for timeouts, and I like it even less for query cancel.

What I think we should do instead is to re-issue the cancel request if
we've waited a little and nothing came of it. This corresponds more or
less to what a human user would likely do (or at least this human
would). The attached patch is set up to re-cancel after 1 second, then
2 more seconds, then 4 more, etc, until we reach the 30-second "it's
dead, Jim" threshold. This seems to fix the problem here.

Thoughts?
BTW, while I didn't do it in the attached, I'm tempted to greatly
reduce the 100ms delay in query_cancel.sql. If this does make it more
robust, we shouldn't need that much time anymore.

			regards, tom lane

diff --git a/contrib/postgres_fdw/connection.c b/contrib/postgres_fdw/connection.c
index 2326f391d3..7a8cac83cb 100644
--- a/contrib/postgres_fdw/connection.c
+++ b/contrib/postgres_fdw/connection.c
@@ -95,6 +95,13 @@ static uint32 pgfdw_we_get_result = 0;
  */
 #define CONNECTION_CLEANUP_TIMEOUT	3
 
+/*
+ * Milliseconds to wait before issuing another cancel request.  This covers
+ * the race condition where the remote session ignored our cancel request
+ * because it arrived while idle.
+ */
+#define RE_CANCEL_TIMEOUT	1000
+
 /* Macro for constructing abort command to be sent */
 #define CONSTRUCT_ABORT_COMMAND(sql, entry, toplevel) \
 	do { \
@@ -145,6 +152,7 @@ static void pgfdw_reset_xact_state(ConnCacheEntry *entry, bool toplevel);
 static bool pgfdw_cancel_query(PGconn *conn);
 static bool pgfdw_cancel_query_begin(PGconn *conn, TimestampTz endtime);
 static bool pgfdw_cancel_query_end(PGconn *conn, TimestampTz endtime,
+								   TimestampTz recanceltime,
 								   bool consume_input);
 static bool pgfdw_exec_cleanup_query(PGconn *conn, const char *query,
 									 bool ignore_errors);
@@ -154,6 +162,7 @@ static bool pgfdw_exec_cleanup_query_end(PGconn *conn, const char *query,
 										 bool consume_input,
 										 bool ignore_errors);
 static bool pgfdw_get_cleanup_result(PGconn *conn, TimestampTz endtime,
+									 TimestampTz recanceltime,
 									 PGresult **result, bool *timed_out);
 static void pgfdw_abort_cleanup(ConnCacheEntry *entry, bool toplevel);
 static bool pgfdw_abort_cleanup_begin(ConnCacheEntry *entry, bool toplevel,
@@ -1322,18 +1331,25 @@ pgfdw_reset_xact_state(ConnCacheEntry *entry, bool toplevel)
 static bool
 pgfdw_cancel_query(PGconn *conn)
 {
+	TimestampTz now = GetCurrentTimestamp();
 	TimestampTz endtime;
+	TimestampTz recanceltime;
 
 	/*
	 * If it takes too long to cancel the query and discard the result, assume
	 * the
	 * connection is dead.
	 */
-	endtime = TimestampTzPlusMilliseconds(GetCurrentTimestamp(),
-										  CONNECTION_CLEANUP_TIMEOUT);
+	endtime = TimestampTzPlusMilliseconds(now, CONNECTION_CLEANUP_TIMEOUT);
+
+	/*
+	 * Also, lose patience and re-issue the cancel request after a little bit.
+	 * (This serves to close some race conditions.)
+	 */
+	recanceltime = TimestampTzPlusMilliseconds(now, RE_CANCEL_TIMEOUT);
 
 	if (!pgfdw_cancel_query_begin(conn, endtime))
 		return false;
-	return pgfdw_cancel_query_end(conn, endtime, false);
+	return pgfdw_cancel_query_end(conn, endtime, recanceltime, false);
 }
 
 /*
@@ -1359,9 +1375,10 @@ pgfdw_cancel_query_begin(PGconn *conn, TimestampTz endtime)
 }
 
 static bool
-pgfdw_cancel_query_end(PGconn *conn, TimestampTz endtime, bool consume_input)
+pgfdw_cancel_query_end(PGconn *conn, TimestampTz endtime,
+					   TimestampTz recanceltime, bool consume_input)
 {
-	PGresult   *result = NUL
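One way to read the re-cancel schedule described in this thread (1 second, then 2 more, then 4 more, doubling until the 30-second dead-connection threshold) as code — purely illustrative, not the patch's actual control flow:

```python
def recancel_delays(cap_s: int = 30) -> list[int]:
    """Delays (in seconds) between successive cancel requests: doubling each
    time, with the last delay clamped so the total never exceeds the cap."""
    delays, d, total = [], 1, 0
    while total < cap_s:
        step = min(d, cap_s - total)
        delays.append(step)
        total += step
        d *= 2
    return delays

print(recancel_delays())  # [1, 2, 4, 8, 15] -> cumulative 1, 3, 7, 15, 30
```

Exponential backoff here keeps the retry cheap when the first cancel was simply lost, while avoiding a flood of cancel requests against a remote that is genuinely stuck.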
Re: minor doc issue in 9.16.2.1.1. Boolean Predicate Check Expressions
On Tue, Nov 5, 2024 at 05:24:07PM +0800, jian he wrote:
> On Thu, Oct 31, 2024 at 11:51 PM Bruce Momjian wrote:
> >
> > On Fri, Oct 18, 2024 at 10:00:54AM +0800, jian he wrote:
> > > On Fri, Oct 18, 2024 at 2:05 AM Bruce Momjian wrote:
> > > > Yes, updated patch attached.
> > >
> > > looks good.
> > >
> > > in the meantime, do you think it's necessary to slightly rephrase
> > > the jsonb_path_match doc entry?
> > >
> > > currently the doc entry reads:
> > >
> > > jsonb_path_match ( target jsonb, path jsonpath [, vars jsonb [, silent boolean ]] ) → boolean
> > > Returns the result of a JSON path predicate check for the specified
> > > JSON value.
> > >
> > > "the result of a JSON path predicate check for the specified JSON
> > > value" is a jsonb boolean, but jsonb_path_match returns an SQL
> > > boolean. Maybe add something to describe a case like: "if the JSON
> > > path predicate check returns jsonb null, jsonb_path_match will
> > > return SQL null".
> >
> > Yes, I think that is a good point, updated patch attached.
>
> played with
> https://www.postgresql.org/docs/current/functions-json.html#FUNCTIONS-SQLJSON-FILTER-EX-TABLE
>
> The patch looks good to me.

Patch applied back to PG 17.

-- 
Bruce Momjian  https://momjian.us
EDB            https://enterprisedb.com

When a patient asks the doctor, "Am I going to die?", he means
"Am I going to die soon?"
Re: Accessing other session's temp table
Mikhail Gribkov writes:
> What do you think?

I think this will break cases we don't want to break. Accessing the
metadata of other temp tables is fine, and indeed necessary for
operations like dropping them. It's access to the table contents that
needs to be blocked. I'm surprised that we don't have sufficient tests
at that level.

[ experiments... ]

It looks like this did work as expected up through v15. So somebody
broke it fairly recently, perhaps as a side effect of the table-AM work.
Might be worth bisecting to see where it broke.

(Realistically though, this is all only a problem for superusers, who
are supposed to Know Better. For ordinary users the permissions set on
temp schemas ought to be enough to prevent such things.)

			regards, tom lane
Re: Return pg_control from pg_backup_stop().
On 10/3/24 05:11, David Steele wrote:
> On 10/3/24 07:45, Michael Paquier wrote:
>> 1) is something that has more value than 2), IMO, because there is no
>> need for a manual step when a backup is taken by the replication
>> protocol.
>
> Well, custom backup solutions that rely on the replication protocol to
> copy the data would need to make sure that they have a backup_label,
> but that's something they should do anyway and what this patch wants
> to protect users from.
>
>> The SQL part is optional IMO. It can be done, but it has less impact
>> overall and makes backups more complicated by requiring the manual
>> copy of the control file.
>
> I don't think having incremental backup in pg_basebackup means
> alternate backup solutions are going away or that we should deprecate
> the SQL interface. If nothing else, third-party solutions need a way
> to get an untorn copy of pg_control, and in general I think the new
> flag will be universally useful.

I updated this patch to fix an issue with -fsanitize=alignment. I'm not
entirely happy with copying twice but not sure of another way to do it.
As far as I can see VARDATA() will not return aligned data on 64-bit
architectures.

Regards,
-David

From 659a1d9b6b1528e139741dc69146848ca15a07b9 Mon Sep 17 00:00:00 2001
From: David Steele
Date: Wed, 20 Nov 2024 22:33:36 +
Subject: Return pg_control from pg_backup_stop().

Harden recovery by returning a copy of pg_control from pg_backup_stop()
that has a flag set to prevent recovery if the backup_label file is
missing. Instead of backup software copying pg_control from PGDATA, it
stores an updated version that is returned from pg_backup_stop(). This
is better for the following reasons:

* The user can no longer remove backup_label and get what looks like a
successful recovery (while almost certainly causing corruption). If
backup_label is removed the cluster will not start. The user may try
pg_resetwal, but that tool makes it pretty clear that corruption will
result from its use.
* We don't need to worry about backup software seeing a torn copy of
pg_control, since Postgres can safely read it out of memory and provide
a valid copy via pg_backup_stop(). This solves torn reads without
needing to write pg_control via a temp file, which may affect
performance on a standby.

* For backup from standby, we no longer need to instruct the backup
software to copy pg_control last. In fact the backup software should
not copy pg_control from PGDATA at all.

These changes have no impact on current backup software and they are
free to use the pg_control available from pg_stop_backup() or continue
to use pg_control from PGDATA. Of course they will miss the benefits of
getting a consistent copy of pg_control and the backup_label checking,
but will be no worse off than before.

Catalog version bump is required.
---
 doc/src/sgml/backup.sgml                    | 18 +-
 doc/src/sgml/func.sgml                      |  5 +-
 src/backend/access/transam/xlogfuncs.c      | 20 --
 src/backend/catalog/system_functions.sql    |  2 +-
 src/include/catalog/pg_proc.dat             |  4 +-
 src/test/recovery/t/042_low_level_backup.pl | 67 -
 6 files changed, 101 insertions(+), 15 deletions(-)

diff --git a/doc/src/sgml/backup.sgml b/doc/src/sgml/backup.sgml
index e4e4c56cf14..2fcf1811213 100644
--- a/doc/src/sgml/backup.sgml
+++ b/doc/src/sgml/backup.sgml
@@ -1021,9 +1021,12 @@ SELECT * FROM pg_backup_stop(wait_for_archive => true);
     values. The second of these fields should be written to a file named
     backup_label in the root directory of the backup. The
     third field should be written to a file named
-    tablespace_map unless the field is empty. These files are
+    tablespace_map unless the field is empty. The fourth
+    field should be written into a file named
+    global/pg_control (overwriting the existing file when
+    present). These files are
     vital to the backup working and must be written byte for byte without
-    modification, which may require opening the file in binary mode.
+    modification, which will require opening the file in binary mode.
@@ -1095,7 +1098,16 @@ SELECT * FROM pg_backup_stop(wait_for_archive => true); -You should, however, omit from the backup the files within the +You should exclude global/pg_control from your backup +and put the contents of the controlfile column +returned from pg_backup_stop in your backup at +global/pg_control. This version of pg_control contains +safeguards against recovery without backup_label present and is guaranteed +not to be torn. + + + +You should also omit from the backup the files within the cluster's pg_wal/ subdirectory. This slight adjustment is worthwhile because it reduces the risk of mistakes when restoring. This is easy to arrange if diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index fa37a0469ae..b3d8c73295b 100644 --- a/doc/src/sgml/func.
Re: cannot freeze committed xmax
> On Nov 20, 2024, at 6:39 AM, Andrey M. Borodin wrote:
>
>> On 20 Nov 2024, at 15:58, Andrey M. Borodin wrote:
>>
>> PFA the patch doing so.
>
> Ugh. The patch is simply dysfunctional, sorry. xmax_status is being checked uninitialized.
> But, well, it highlights the idea: make verify_heapam() aware of such corruptions.
> What do you think?

I like the idea of increasing the corruption checking coverage. The worry with these patches is that we'll overlook some legitimate use case of the status bits and call it corruption when it isn't. Indeed, that appears to be the case with this patch, assuming I initialized the xmax_status field in the way you had in mind, and that applying it to REL_17_STABLE is ok. (Maybe this would work differently on HEAD?)

+ get_xid_status(xmax, ctx, &xmax_status);
+ if (xmax_status == XID_COMMITTED && (tuphdr->t_infomask & HEAP_UPDATED))
+ {
+     report_corruption(ctx,
+                       psprintf("committed xmax %u while tuple has HEAP_XMAX_INVALID and HEAP_UPDATED flags",
+                                xmax));
+ }

That results in TAP test failures on an uncorrupted but frozen table:

# +++ tap check in contrib/amcheck +++
t/001_verify_heapam.pl . 74/?
# Failed test 'all-frozen not corrupted table'
# at t/001_verify_heapam.pl line 53.
# got: '30|117||committed xmax 2 while tuple has HEAP_XMAX_INVALID and HEAP_UPDATED flags'
# expected: ''
t/001_verify_heapam.pl . 257/?
# Looks like you failed 1 test of 272.
t/001_verify_heapam.pl . Dubious, test returned 1 (wstat 256, 0x100)
Failed 1/272 subtests

The first part of your patch, which checks the xmin_status, seems ok at first glance.

— Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: Slot's restart_lsn may point to removed WAL segment after hard restart unexpectedly
On 11/20/24 18:24, Tomas Vondra wrote:
>
> ...
>
> What confuses me a bit is that we update the restart_lsn (and call
> ReplicationSlotsComputeRequiredLSN() to recalculate the global value)
> all the time. Walsender does that in PhysicalConfirmReceivedLocation for
> example. So we actually see the required LSN to move during checkpoint
> very often. So how come we don't see the issues much more often? Surely
> I miss something important.
>

This question "How come we don't see this more often?" kept bugging me, and the answer is actually pretty simple. The restart_lsn can move backwards after a hard restart (for the reasons explained), but physical replication does not actually rely on that. The replica keeps track of the LSN it received (well, it uses the same LSN), and on reconnect it sends the startpoint to the primary. And the primary just proceeds to use that instead of the (stale) restart LSN for the slot. And the startpoint is guaranteed (I think) to be at least restart_lsn.

AFAICS this would work for pg_replication_slot_advance() too, that is, if you remember the last LSN the slot advanced to, it should be possible to advance to it just fine. Of course, it requires a way to remember that LSN, which for a replica is not an issue. But this just highlights that we can't rely on restart_lsn for this purpose.

(Apologies if this repeats something obvious, or something you already said, Vitaly.)

regards

--
Tomas Vondra
Re: per backend I/O statistics
On Wed, Nov 20, 2024 at 02:20:18PM +, Bertrand Drouvot wrote:
> Right. I did not have in mind to go that far here (for the per backend stats
> needs). My idea was "just" to move the new pgstat_create_backend_stat() (which
> is related to per backend stats only) call to the right place in StartupXLOG()
> for the startup process only. As that's the startup process that will reset
> or restore the stats, I think that makes sense.
>
> It looks like what you have in mind is much more generic, so what about:
>
> - Focus on this thread first and then move the call as proposed above
> - Think about a more generic idea later on (once the per-backend I/O stats are in)

Moving pgstat_create_backend_stat() may be OK in the long-term; at least that would document why we need to care about the startup process. Still, moving only the call feels incomplete, so how about adding a boolean field in PgStat_ShmemControl that defaults to false when the shmem area is initialized in StatsShmemInit(), then is switched to true once we know that the stats have been restored or reset by the startup process? Something like that should work to control the dshash inserts, I guess?
--
Michael

signature.asc
Description: PGP signature
Re: sunsetting md5 password support
On Wed, Nov 20, 2024 at 11:33 AM Nathan Bossart wrote: > After thinking about this some more, I'm actually finding myself leaning > towards leaving the hint and potentially adding more detail to the > documentation as a follow-up patch. Sounds good to me. I think my hesitation was more that the hint was overpromising help, so big +1 to more detail and keeping it. Cheers, Greg
Re: per backend I/O statistics
Hi,

On Wed, Nov 20, 2024 at 09:01:26AM +0900, Michael Paquier wrote:
> On Tue, Nov 19, 2024 at 04:28:55PM +, Bertrand Drouvot wrote:
> > So, for the startup process only, what about?
> >
> > - don't call pgstat_create_backend_stat() in pgstat_beinit()...
> > - but call it in StartupXLOG() instead (after the stats are discarded or restored).
> >
> > That way we could get rid of the changes linked to the assertion and still handle
> > the startup process particular case. Thoughts?
>
> Hmm. That may prove to be a good idea in the long-term. The startup
> process is a specific path kicked in at a very early stage, so it is
> also a bit weird that we'd try to insert statistics while we are going
> to reset them anyway a bit later.

Exactly.

> That may also be relevant for
> custom statistics, actually, especially if some calls happen in some
> hook paths taken before the reset is done. This could happen for
> injection point stats when loading it with shared_preload_libraries,
> actually, if you load, attach or run a callback at this early stage.

Right. I did not have in mind to go that far here (for the per backend stats needs). My idea was "just" to move the new pgstat_create_backend_stat() (which is related to per backend stats only) call to the right place in StartupXLOG() for the startup process only. As that's the startup process that will reset or restore the stats, I think that makes sense.

It looks like what you have in mind is much more generic, so what about:

- Focus on this thread first and then move the call as proposed above
- Think about a more generic idea later on (once the per-backend I/O stats are in)

Thoughts?

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
Re: proposal: schema variables
> On Tue, Nov 19, 2024 at 08:14:01PM +0100, Pavel Stehule wrote:
> Hi
>
> I wrote a POC of VARIABLE(varname) syntax support

Thanks, the results look good. I'm still opposed to the idea of having a warning, and think it has to be an error -- but that's my subjective opinion. Let's see if there are more votes on that topic.
Re: ALTER TABLE uses a bistate but not for toast tables
> On Wed, Nov 20, 2024 at 06:43:58AM -0600, Justin Pryzby wrote: > > > Thanks for rebasing. To help with review, could you also describe > > current status of the patch? I have to admit, currently the commit > > message doesn't tell much, and looks more like notes for the future you. > > The patch does what it aims to do and AFAIK in a reasonable way. I'm > not aware of any issue with it. It's, uh, waiting for review. > > I'm happy to expand on the message to describe something like design > choices, but the goal here is really simple: why should wide column > values escape the intention of the ring buffer? AFAICT it's fixing an > omission. If you have a question, please ask; that would help to > indicate what needs to be explained. Here is what I see in the commit message: DONE: ALTER, CLUSTER TODO: copyto, copyfrom? slot_getsomeattrs slot_deform_heap_tuple fetchatt heap_getnextslot => heapgettup => heapgetpage => ReadBufferExtended initscan table_beginscan table_scan_getnextslot RelationCopyStorageUsingBuffer ReadBufferWithoutRelcache (gdb) bt #0 table_open (relationId=relationId@entry=16390, lockmode=lockmode@entry=1) at table.c:40 #1 0x56444cb23d3c in toast_fetch_datum (attr=attr@entry=0x7f67933cc6cc) at detoast.c:372 #2 0x56444cb24217 in detoast_attr (attr=attr@entry=0x7f67933cc6cc) at detoast.c:123 #3 0x56444d07a4c8 in pg_detoast_datum_packed (datum=datum@entry=0x7f67933cc6cc) at fmgr.c:1743 #4 0x56444d042c8d in text_to_cstring (t=0x7f67933cc6cc) at varlena.c:224 #5 0x56444d0434f9 in textout (fcinfo=) at varlena.c:573 #6 0x56444d078f10 in FunctionCall1Coll (flinfo=flinfo@entry=0x56444e4706b0, collation=collation@entry=0, arg1=arg1@entry=140082828592844) at fmgr.c:1124 #7 0x56444d079d7f in OutputFunctionCall (flinfo=flinfo@entry=0x56444e4706b0, val=val@entry=140082828592844) at fmgr.c:1561 #8 0x56444ccb1665 in CopyOneRowTo (cstate=cstate@entry=0x56444e470898, slot=slot@entry=0x56444e396d20) at copyto.c:975 #9 0x56444ccb2c7d in DoCopyTo 
(cstate=cstate@entry=0x56444e470898) at copyto.c:891
#10 0x56444ccab4c2 in DoCopy (pstate=pstate@entry=0x56444e396bb0, stmt=stmt@entry=0x56444e3759b0, stmt_location=0, stmt_len=48, processed=processed@entry=0x7ffc212a6310) at copy.c:308

cluster:
heapam_relation_copy_for_cluster
reform_and_rewrite_tuple
rewrite_heap_tuple
raw_heap_insert

This gave me the impression that the patch is deeply WIP, and that it doesn't make any sense to review it. I can imagine chances are good that I'm not alone in getting that impression, and that you lose potential reviewers. Thus, shaping up a meaningful message might be helpful.

> > Since it's in the performance category, I'm also curious how much
> > overhead does this shave off? I mean, I get it that bulk insert strategy
> > helps with buffers usage, as you've implied in the thread -- but how
> > does it look like in benchmark numbers?
>
> The intent of using a bistate isn't to help the performance of the
> process using the bistate. Rather, the intent is to avoid harming the
> performance of other processes. If anything, I expect it could slow
> down the process using bistate -- the same as for non-toast data.
>
> https://www.postgresql.org/message-id/CA%2BTgmobC6RD2N8kbPPTvATpUY1kisY2wJLh2jsg%3DHGoCp2RiXw%40mail.gmail.com

Right, but the question is still there: how much does it bring? My point is that if you demonstrate "under this and that load, we get so and so many percents boost", this will hopefully attract more attention to the patch.
Re: Proposals for EXPLAIN: rename ANALYZE to EXECUTE and extend VERBOSE
On Thu, Nov 21, 2024 at 10:22:54AM +1300, David Rowley wrote: > On Thu, 21 Nov 2024 at 08:30, Guillaume Lelarge > wrote: > > OK, I'm fine with this. v4 patch attached with one plan showing read, > > written, and dirtied buffers. > > I think this might be a good time for anyone out there who is against > turning on BUFFERS when ANALYZE is on to speak up. > > Votes for changing this so far seem to be: Me, Michael Christofides, > Guillaume Lelarge, Robert, Greg Sabino Mullane, and David Fetter (from > 2020) [1]. Please add me to the above list. What we have now is too confusing. -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com When a patient asks the doctor, "Am I going to die?", he means "Am I going to die soon?"
Re: Fix an error while building test_radixtree.c with TEST_SHARED_RT
On 19/11/2024 01:20, Masahiko Sawada wrote: > I realized that building test_radixtree.c with TEST_SHARED_RT fails > because it eventually sets RT_SHMEM when #include'ing radixtree.h but > it's missing some header files to include. I've attached a patch to > include necessary header files in radixtree.h to make it > self-contained. If those includes are only needed when RT_SHMEM is defined, I suggest they should be guarded by #ifdef RT_SHMEM, per Peter E's IWYU efforts lately. -- Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/ "La rebeldía es la virtud original del hombre" (Arthur Schopenhauer)
Accessing other session's temp table
Hi hackers,

I was experimenting with temporary tables and noticed many odd things about them. Short story: having appropriate privileges, a user can access other sessions' temp tables, and it is a complete mess. Let's fix it.

Longer story. Let's open two sessions for one postgres user. In the first one we will create a temporary table:

postgres=# create temp table t(val int);
CREATE TABLE
postgres=# \dt
        List of relations
  Schema   | Name | Type  |  Owner
-----------+------+-------+----------
 pg_temp_0 | t    | table | postgres
(1 row)

Now, in the second session we will try to insert a couple of rows into this table:

postgres=# insert into pg_temp_0.t values (1);
INSERT 0 1
postgres=# insert into pg_temp_0.t values (2);
ERROR:  cannot access temporary tables of other sessions

Wow! The first row insertion succeeded, the second one failed. But this is just the beginning. Now we will switch back to the first session, insert a couple of rows from there, and check what we have in the table:

postgres=# insert into pg_temp_0.t values (3);
INSERT 0 1
postgres=# insert into pg_temp_0.t values (4);
INSERT 0 1
postgres=# select * from pg_temp_0.t;
 val
-----
   3
   4
(2 rows)

We only see the values added in this session. The row successfully inserted from the second session is absent. Now let's switch to the second session and check the table there:

postgres=# select * from pg_temp_0.t;
 val
-----
   1
(1 row)

Wow again! In both sessions we only see the rows that very session inserted. This looks like very bad behavior to me. The same situation occurs with DELETE and in fact with all commands handled in ExecModifyTable. By the way, the second session does not have to be opened by the same user: it can be some other user who was given privileges on the temp schema and table.

The problem is that temp tables are not intended to be accessed by other sessions at all, and most of the work with them is done in local buffers with no disk syncing. There are already some checks and restrictions, but these are definitely not enough.
Attached is a proposed patch fixing the above problems. Maybe relation_open is not the prettiest place for such checks, but it's quite obvious, it works and it covers all the remaining problems. And the check is very cheap so we should not be afraid of doing it often. What do you think? -- best regards, Mikhail A. Gribkov 001-patch-v01-Fix-other-sessions-temp-tables-access.patch Description: Binary data
Re: allow changing autovacuum_max_workers without restarting
The following review has been posted through the commitfest application:

make installcheck-world: tested, failed
Implements feature:      tested, failed
Spec compliant:          tested, failed
Documentation:           not tested

Hi,

- Tested patch with check-world.
- Verified CheckAutovacuumWorkerGUCs functionality and the correct WARNING was reported.
- For feature specific testing, I created multiple tables and generated bloat. Expected behavior was witnessed.

Lower autovacuum_worker_slots = 16 setting is better suited to start with.

Thanks
Yogesh
Re: Document NULL
Thank you for taking the time to look this over. On Wed, Nov 20, 2024 at 3:19 AM jian he wrote: > On Sat, Jun 29, 2024 at 4:40 AM David G. Johnston > wrote: > > > > The attached are complete and ready for review. I did some file > structure reformatting at the end and left that as the second patch. The > first contains all of the content. > > > > I'm adding this to the commitfest. > > > > Thanks! > > > > David J. > > in doc/src/sgml/nullvalues.sgml > can we mention > \pset null NULL > command, then NULL means this value is NULL. > > you can also see doc/src/sgml/func.sgml >(The above example can be copied-and-pasted >into psql to set things up for the following >examples. > Good idea. I'll see how it plays out. > - > in doc/src/sgml/nullvalues.sgml > see the attached for one example output > > in doc/src/sgml/nullvalues.sgml we have > one_whitespace > two_whitespace > three_whitespace > four_whitespace > > i think you need zero whitespace for tag . like > > > > https://tdg.docbook.org/tdg/4.5/programlisting > says whitespaces are significant. > Did you not apply patch 0002? The indentation in 0001 exists because it was much easier to deal with collapse-all related viewing in my editor. I removed it in 0002 since the final commit would indeed not be so indented. The tag itself doesn't actually care but its content does indeed get undesirably indented if the markup is nested in the typical manner. > <<>> >As noted in , JSON has a null > value >that does not get exposed at the SQL level. > <<>> > i feel like this sentence is weird. since these two are different things. > Yeah, the linked page and this summary/pointer need a bit of work. I don't like the unexplained "different concept" but probably "not exposed" isn't much better. As this gets closer to being committable I'll see about getting this topic cleaned up. Suggestions welcomed. > > I think some of the query and query output can be combined into one > . > no need one for the query, one for the output. 
> Trivial to change but having both seems more semantically correct and easier to read IMO. We don't have a policy document covering this that I'm aware of, and IIRC both variations presently exist in the documentation. David J.
Re: Sample rate added to pg_stat_statements
On Tue, Nov 19, 2024 at 5:07 PM Michael Paquier wrote: > One piece of it would be to see how much of such "bottlenecks" we > would be able to get rid of by integrating pg_stat_statements into > the central pgstats with the custom APIs, without pushing the module > into core. Any particular reason these days we cannot push this into core and allow disabling on startup? To say this extension is widely used would be an understatement. Cheers, Greg
Re: Slot's restart_lsn may point to removed WAL segment after hard restart unexpectedly
On 11/20/24 14:40, Vitaly Davydov wrote:
> Dear Hackers,
>
> To ping the topic, I'd like to clarify what may be wrong with the idea
> described here, because I do not see any interest from the community.
> The topic is related to physical replication. The primary idea is to
> define the horizon of WAL segment (file) removal based on the restart
> LSN values saved on disk. Now, the WAL segment removal horizon is
> calculated based on the current restart LSN values of slots, which may
> not have been saved on disk at the time of the horizon calculation. This
> case takes place when a slot is advancing during a checkpoint, as described
> earlier in the topic.
>

Yeah, a simple way to fix this might be to make sure we don't use the required LSN value set after CheckPointReplicationSlots() to remove WAL. AFAICS the problem is that KeepLogSeg() gets the new LSN value by:

keep = XLogGetReplicationSlotMinimumLSN();

Let's say we get the LSN before calling CheckPointGuts(), and then pass it to KeepLogSeg, so that it doesn't need to get the fresh value. Wouldn't that fix the issue?

> Such behaviour is not a problem when slots are used only for physical
> replication in a conventional way. But it may be a problem when a physical
> slot is used for some other goals. For example, I have an extension
> which keeps the WAL using physical replication slots. It creates a new
> physical slot and advances it as needed. After restart, it can use the
> restart lsn of the slot to read WAL from this LSN. In this case, there
> is no guarantee that the restart lsn will point to an existing WAL segment.
>

Yeah.

> The advantage of the current behaviour is that it requires a little bit
> less WAL to keep. The disadvantage is that physical slots do not
> guarantee WAL keeping starting from their restart lsns in general.
>

If it's wrong, it doesn't really matter that it has some advantages.

regards

--
Tomas Vondra
RE: Use __attribute__((target(sse4.2))) for SSE42 CRC32C
Any more feedback on this patch? Hoping this is a straightforward one.

Raghuveer

> -Original Message-
> From: Devulapalli, Raghuveer
> Sent: Friday, November 8, 2024 11:05 AM
> To: Devulapalli, Raghuveer ; Nathan Bossart
> Cc: pgsql-hackers@lists.postgresql.org; Shankaran, Akash
> Subject: RE: Use __attribute__((target(sse4.2))) for SSE42 CRC32C
>
> V3: With the suggested changes.
>
> Raghuveer
>
> > -Original Message-
> > From: Devulapalli, Raghuveer
> > Sent: Friday, November 8, 2024 10:43 AM
> > To: Nathan Bossart
> > Cc: pgsql-hackers@lists.postgresql.org; Shankaran, Akash
> > Subject: RE: Use __attribute__((target(sse4.2))) for SSE42 CRC32C
> >
> > > I believe we expect MSVC builds to use meson at this point, which is
> > > probably why there's this extra check:
> > >
> > > if cc.get_id() == 'msvc'
> > >   cdata.set('USE_SSE42_CRC32C', false)
> > >   cdata.set('USE_SSE42_CRC32C_WITH_RUNTIME_CHECK', 1)
> > >   have_optimized_crc = true
> > >
> > Ah, I missed this. This makes sense.
Re: EphemeralNamedRelation and materialized view
Yugo NAGATA writes:
>> You could even argue that case 2 isn't good enough either,
>> and we should be delivering a specific error message saying
>> that an ENR can't be used in a view/matview. To do that,
>> we'd likely need to pass down the QueryEnvironment in more
>> places not fewer.

> We can raise a similar error for (not materialized) views by passing
> QueryEnv to DefineView() (or in an earlier stage), but there are other
> objects that can contain an ENR in their definition, for example, functions,
> cursors, or RLS policies. Is it worth introducing this version of the error
> message for all these objects?

If it's worth checking for here, why not in other cases? I'm not sure I like using isQueryUsingTempRelation as a model, because its existing use in transformCreateTableAsStmt seems like mostly a hack. (And I definitely don't love introducing yet another scan of the query.)

It seems to me that we should think about this, for MVs as well as those other object types, as fundamentally a dependency problem. That is, the reason we can't allow a reference to an ENR in a long-lived object is that we have no catalog representation for the reference. So that leads to thinking that the issue ought to be handled in recordDependencyOnExpr and friends. If we see an ENR while scanning a rangetable to extract dependencies, then complain. This might be a bit messy to produce good error messages for, though.

Speaking of error messages, I'm not sure that it's okay to use the phrase "ephemeral named relation" in a user-facing error message. We don't use that term in our documentation AFAICS, except in some SPI documentation that most users will never have read. In the context of triggers, "transition relation" seems to be what the docs use.

regards, tom lane
Re: Proposals for EXPLAIN: rename ANALYZE to EXECUTE and extend VERBOSE
Le mer. 20 nov. 2024 à 16:51, Greg Sabino Mullane a écrit : > On Wed, Nov 20, 2024 at 1:26 AM Guillaume Lelarge > wrote: > >> OK, but I'm not sure which example I should pick to add dirtied and >> written shared buffers. It looks kinda artificial. Should I choose one >> randomly? >> > > It will be artificial, but I think that's ok: anyone running it on their > own will be getting different numbers anyway. I was looking at the "14.1.2 > EXPLAIN ANALYZE" section in perform.sgml. Here's some actual numbers I got > with some playing around with concurrent updates: > > Recheck Cond: (unique1 < 10) >> Heap Blocks: exact=5 >> Buffers: shared hit=2 read=5 written=4 > > ... > >> Planning: >>Buffers: shared hit=289 dirtied=9 > > > OK, I'm fine with this. v4 patch attached with one plan showing read, written, and dirtied buffers. Thanks for all your comments/reviews. -- Guillaume. From 484779613694f777e2ffb815a16aa11269404c20 Mon Sep 17 00:00:00 2001 From: Guillaume Lelarge Date: Tue, 12 Nov 2024 21:59:14 +0100 Subject: [PATCH v4] Enable BUFFERS by default with EXPLAIN ANALYZE This automatically enables the BUFFER option when ANALYZE is used with EXPLAIN. The BUFFER option stays disabled if ANALYZE isn't used. The auto_explain extension gets the same treatment. 
--- contrib/auto_explain/auto_explain.c | 4 +- doc/src/sgml/auto-explain.sgml| 2 +- doc/src/sgml/perform.sgml | 42 ++--- doc/src/sgml/ref/explain.sgml | 6 +- src/backend/commands/explain.c| 7 ++ src/test/regress/expected/brin_multi.out | 18 ++-- src/test/regress/expected/explain.out | 36 ++- .../regress/expected/incremental_sort.out | 4 +- src/test/regress/expected/memoize.out | 2 +- src/test/regress/expected/merge.out | 2 +- src/test/regress/expected/partition_prune.out | 94 +-- src/test/regress/expected/select.out | 2 +- src/test/regress/expected/select_into.out | 4 +- src/test/regress/expected/select_parallel.out | 6 +- src/test/regress/expected/subselect.out | 2 +- src/test/regress/expected/tidscan.out | 6 +- src/test/regress/sql/brin_multi.sql | 18 ++-- src/test/regress/sql/explain.sql | 7 +- src/test/regress/sql/incremental_sort.sql | 4 +- src/test/regress/sql/memoize.sql | 2 +- src/test/regress/sql/merge.sql| 2 +- src/test/regress/sql/partition_prune.sql | 94 +-- src/test/regress/sql/select.sql | 2 +- src/test/regress/sql/select_into.sql | 4 +- src/test/regress/sql/select_parallel.sql | 6 +- src/test/regress/sql/subselect.sql| 2 +- src/test/regress/sql/tidscan.sql | 6 +- 27 files changed, 220 insertions(+), 164 deletions(-) diff --git a/contrib/auto_explain/auto_explain.c b/contrib/auto_explain/auto_explain.c index 623a674f99..484e009577 100644 --- a/contrib/auto_explain/auto_explain.c +++ b/contrib/auto_explain/auto_explain.c @@ -27,7 +27,7 @@ static int auto_explain_log_min_duration = -1; /* msec or -1 */ static int auto_explain_log_parameter_max_length = -1; /* bytes or -1 */ static bool auto_explain_log_analyze = false; static bool auto_explain_log_verbose = false; -static bool auto_explain_log_buffers = false; +static bool auto_explain_log_buffers = true; static bool auto_explain_log_wal = false; static bool auto_explain_log_triggers = false; static bool auto_explain_log_timing = true; @@ -152,7 +152,7 @@ _PG_init(void) "Log buffers usage.", NULL, 
&auto_explain_log_buffers, - false, + true, PGC_SUSET, 0, NULL, diff --git a/doc/src/sgml/auto-explain.sgml b/doc/src/sgml/auto-explain.sgml index 0c4656ee30..6623d36c99 100644 --- a/doc/src/sgml/auto-explain.sgml +++ b/doc/src/sgml/auto-explain.sgml @@ -122,7 +122,7 @@ LOAD 'auto_explain'; equivalent to the BUFFERS option of EXPLAIN. This parameter has no effect unless auto_explain.log_analyze is enabled. - This parameter is off by default. + This parameter is on by default. Only superusers can change this setting. diff --git a/doc/src/sgml/perform.sgml b/doc/src/sgml/perform.sgml index cd12b9ce48..24916b82b5 100644 --- a/doc/src/sgml/perform.sgml +++ b/doc/src/sgml/perform.sgml @@ -722,13 +722,19 @@ WHERE t1.unique1 < 10 AND t1.unique2 = t2.unique2; QUERY PLAN ---&zwsp;-- Nested Loop (cost=4.65..118.50 rows=10 width=488) (actual time=0.017..0.051 rows=10 loops=1) + Buffers: shared hit=36 read=6 -> Bitmap Heap Scan on tenk1 t1 (cost=4.36..39.38 rows=10 width=244) (actual time=0.009..0.017 rows=10 loops=1)
Re: pg_dump --no-comments confusion
On Mon, Nov 18, 2024 at 05:14:53PM -0500, Tom Lane wrote: > Marcos Pegoraro writes: > > But it would be good to have this patch applied to all supported versions, > > as soon as nothing was changed on that pg_dump option, no ? > > Even more to the point, should we change pg_dump's help output? > > ... > --load-via-partition-rootload partitions via the root table > --no-commentsdo not dump comments > --no-publicationsdo not dump publications > ... > > Also, the identical text appears in pg_dumpall's man page and help > output, while pg_restore has a differently worded version: > > printf(_(" --no-commentsdo not restore comments\n")); > > pg_restore's man page seems OK though: > > Do not output commands to restore comments, even if the archive > contains them. Fixed in the attached applied patch. -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com When a patient asks the doctor, "Am I going to die?", he means "Am I going to die soon?" diff --git a/doc/src/sgml/ref/pg_dumpall.sgml b/doc/src/sgml/ref/pg_dumpall.sgml index 4d7c0464687..014f2792589 100644 --- a/doc/src/sgml/ref/pg_dumpall.sgml +++ b/doc/src/sgml/ref/pg_dumpall.sgml @@ -417,7 +417,7 @@ exclude database PATTERN --no-comments -Do not dump comments. +Do not dump COMMENT commands. 
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c index a8c141b689d..c30aafbe70d 100644 --- a/src/bin/pg_dump/pg_dump.c +++ b/src/bin/pg_dump/pg_dump.c @@ -1212,7 +1212,7 @@ help(const char *progname) " servers matching PATTERN\n")); printf(_(" --insertsdump data as INSERT commands, rather than COPY\n")); printf(_(" --load-via-partition-rootload partitions via the root table\n")); - printf(_(" --no-commentsdo not dump comments\n")); + printf(_(" --no-commentsdo not dump comment commands\n")); printf(_(" --no-publicationsdo not dump publications\n")); printf(_(" --no-security-labels do not dump security label assignments\n")); printf(_(" --no-subscriptions do not dump subscriptions\n")); diff --git a/src/bin/pg_dump/pg_dumpall.c b/src/bin/pg_dump/pg_dumpall.c index e3ad8fb2956..9a04e51c81a 100644 --- a/src/bin/pg_dump/pg_dumpall.c +++ b/src/bin/pg_dump/pg_dumpall.c @@ -662,7 +662,7 @@ help(void) printf(_(" --if-exists use IF EXISTS when dropping objects\n")); printf(_(" --insertsdump data as INSERT commands, rather than COPY\n")); printf(_(" --load-via-partition-rootload partitions via the root table\n")); - printf(_(" --no-commentsdo not dump comments\n")); + printf(_(" --no-commentsdo not dump comment commands\n")); printf(_(" --no-publicationsdo not dump publications\n")); printf(_(" --no-role-passwords do not dump passwords for roles\n")); printf(_(" --no-security-labels do not dump security label assignments\n")); diff --git a/src/bin/pg_dump/pg_restore.c b/src/bin/pg_dump/pg_restore.c index f2c1020d053..10da7d27da8 100644 --- a/src/bin/pg_dump/pg_restore.c +++ b/src/bin/pg_dump/pg_restore.c @@ -484,7 +484,7 @@ usage(const char *progname) printf(_(" --filter=FILENAMErestore or skip objects based on expressions\n" " in FILENAME\n")); printf(_(" --if-exists use IF EXISTS when dropping objects\n")); - printf(_(" --no-commentsdo not restore comments\n")); + printf(_(" --no-commentsdo not restore comment commands\n")); printf(_(" 
--no-data-for-failed-tables do not restore data of tables that could not be\n" " created\n")); printf(_(" --no-publicationsdo not restore publications\n"));
Re: Proposals for EXPLAIN: rename ANALYZE to EXECUTE and extend VERBOSE
On 20/11/2024 22:22, David Rowley wrote: On Thu, 21 Nov 2024 at 08:30, Guillaume Lelarge wrote: OK, I'm fine with this. v4 patch attached with one plan showing read, written, and dirtied buffers. I think this might be a good time for anyone out there who is against turning on BUFFERS when ANALYZE is on to speak up. Votes for changing this so far seem to be: Me, Michael Christofides, Guillaume Lelarge, Robert, Greg Sabino Mullane, and David Fetter (from 2020) [1]. Add me to the pro list, please. https://www.postgresql.org/message-id/b3197ba8-225f-f53c-326d-5b1756c77...@postgresfriends.org -- Vik Fearing
Re: Adding skip scan (including MDAM style range skip scan) to nbtree
Hi Peter, On 2024-11-20 04:06, Peter Geoghegan wrote: Hi Masahiro, On Tue, Nov 19, 2024 at 3:30 AM Masahiro Ikeda wrote: Apologies for the delayed response. I've confirmed that the costing is significantly improved for multicolumn indexes in the case I provided. Thanks! https://www.postgresql.org/message-id/TYWPR01MB10982A413E0EC4088E78C0E11B1A62%40TYWPR01MB10982.jpnprd01.prod.outlook.com Great! I made it one of my private/internal test cases for the costing. Your test case was quite helpful. Attached is v15. It works through your feedback. Importantly, v15 has a new patch which has a fix for your test.sql case -- which is the most important outstanding problem with the patch (and has been for a long time now). I've broken those changes out into a separate patch because they're still experimental, and have some known minor bugs. But it works well enough for you to assess how close I am to satisfactorily fixing the known regressions, so it seems worth posting quickly. Thanks for your quick response! IIUC, why not add it to the documentation? It would clearly help users understand how to tune their queries using the counter, and it would also show that the counter is not just for developers. The documentation definitely needs more work. I have a personal TODO item about that. Changes to the documentation can be surprisingly contentious, so I often work on it last, when we have the clearest picture of how to talk about the feature. For example, Matthias said something that's approximately the opposite of what you said about it (though I agree with you about it). OK, I understood. From the perspective of consistency, wouldn't it be better to align the naming between the EXPLAIN output and pg_stat_all_indexes.idx_scan, even though the documentation states they refer to the same concept? I personally prefer something like "search" instead of "scan", as "scan" is commonly associated with node names like Index Scan and similar terms. 
To maintain consistency, how about renaming pg_stat_all_indexes.idx_scan to pg_stat_all_indexes.idx_search? I suspect that other hackers will reject that proposal on compatibility grounds, even though it would make sense in a "green field" situation. Honestly, discussions about UI/UX details such as EXPLAIN ANALYZE always tend to result in unproductive bikeshedding. What I really want is something that will be acceptable to all parties. I don't have any strong opinions of my own about it -- I just think that it's important to show *something* like "Index Searches: N" to make skip scan user friendly. OK, I agree. Although it's not an optimal solution and would only reduce the degree of performance degradation, how about introducing a threshold per page to switch from skip scan to full index scan? The approach to fixing these regressions from the new experimental patch doesn't need to use any such threshold. It is effective both with simple "WHERE id2 = 100" cases (like the queries from your test.sql test case), as well as more complicated "WHERE id2 BETWEEN 99 AND 101" inequality cases. What do you think? The regressions are easily under 5% with the new patch applied, which is in the noise. I didn't come up with the idea. At first glance, your idea seems good for all cases. Actually, test.sql shows a performance improvement, and the performance is almost the same as the master's seqscan. To be precise, the master's performance is 10-20% better than the v15 patch because the seqscan is executed in parallel. However, the v15 patch is twice as fast as when seqscan is not executed in parallel. However, I found that there is still a problematic case when I read your patch. IIUC, beyondskip becomes true only if the tuple's id2 is greater than the scan key value. Therefore, the following query (see test_for_v15.sql) still degrades. 
-- build without '--enable-debug' '--enable-cassert' 'CFLAGS=-O0 '

-- SET skipscan_prefix_cols=32;
 Index Only Scan using t_idx on public.t  (cost=0.42..3535.75 rows=1 width=8) (actual time=65.767..65.770 rows=1 loops=1)
   Output: id1, id2
   Index Cond: (t.id2 = 100)
   Index Searches: 1
   Heap Fetches: 0
   Buffers: shared hit=2736
 Settings: effective_cache_size = '7932MB', work_mem = '15MB'
 Planning Time: 0.058 ms
 Execution Time: 65.782 ms
(9 rows)

-- SET skipscan_prefix_cols=0;
 Index Only Scan using t_idx on public.t  (cost=0.42..3535.75 rows=1 width=8) (actual time=17.276..17.278 rows=1 loops=1)
   Output: id1, id2
   Index Cond: (t.id2 = 100)
   Index Searches: 1
   Heap Fetches: 0
   Buffers: shared hit=2736
 Settings: effective_cache_size = '7932MB', work_mem = '15MB'
 Planning Time: 0.044 ms
 Execution Time: 17.290 ms
(9 rows)

I’m reporting the above result, though you might already be aware of the issue.

At the same time, we're just as capable of skipping whenever the scan encounters a large group of skipped-prefix-column duplicates. For example, if I take your test.sql test case and ad
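For context, the regressions under discussion come from query shapes like the one below, where no qualifier is given for the leading index column and the patched nbtree code must choose between skipping and a plain full index scan. This is only a sketch of the thread's t(id1, id2) setup and assumes the skip scan patch is applied; the data volume is arbitrary:

```sql
-- Illustrative schema mirroring the thread's test table:
CREATE TABLE t (id1 int, id2 int);
INSERT INTO t SELECT i, i % 1000 FROM generate_series(1, 1000000) i;
CREATE INDEX t_idx ON t (id1, id2);
VACUUM ANALYZE t;

-- No predicate on the leading column id1: with the patch, nbtree can
-- consider a skip scan here rather than falling back to a full scan.
EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM t WHERE id2 = 100;
```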
Re: allow changing autovacuum_max_workers without restarting
rebased -- nathan >From e877271830e076338f999ee72b9d8148e469d5d2 Mon Sep 17 00:00:00 2001 From: Nathan Bossart Date: Sat, 22 Jun 2024 15:05:44 -0500 Subject: [PATCH v10 1/1] allow changing autovacuum_max_workers without restarting --- doc/src/sgml/config.sgml | 28 ++- doc/src/sgml/runtime.sgml | 4 +- src/backend/access/transam/xlog.c | 2 +- src/backend/postmaster/autovacuum.c | 76 +++ src/backend/postmaster/pmchild.c | 4 +- src/backend/storage/lmgr/proc.c | 6 +- src/backend/utils/init/postinit.c | 6 +- src/backend/utils/misc/guc_tables.c | 11 ++- src/backend/utils/misc/postgresql.conf.sample | 3 +- src/include/postmaster/autovacuum.h | 1 + src/test/perl/PostgreSQL/Test/Cluster.pm | 1 + 11 files changed, 112 insertions(+), 30 deletions(-) diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml index a84e60c09b..7db171198a 100644 --- a/doc/src/sgml/config.sgml +++ b/doc/src/sgml/config.sgml @@ -8590,6 +8590,25 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv; + + autovacuum_worker_slots (integer) + + autovacuum_worker_slots configuration parameter + + + + +Specifies the number of backend slots to reserve for autovacuum worker +processes. The default is 16. This parameter can only be set at server +start. + + +When changing this value, consider also adjusting +. + + + + autovacuum_max_workers (integer) @@ -8600,7 +8619,14 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv; Specifies the maximum number of autovacuum processes (other than the autovacuum launcher) that may be running at any one time. The default -is three. This parameter can only be set at server start. +is three. This parameter can only be set in the +postgresql.conf file or on the server command line. + + +Note that a setting for this value which is higher than + will have no effect, +since autovacuum workers are taken from the pool of slots established +by that setting. 
diff --git a/doc/src/sgml/runtime.sgml b/doc/src/sgml/runtime.sgml index bcd81e2415..8b7ae27908 100644 --- a/doc/src/sgml/runtime.sgml +++ b/doc/src/sgml/runtime.sgml @@ -839,7 +839,7 @@ psql: error: connection to server on socket "/tmp/.s.PGSQL.5432" failed: No such When using System V semaphores, PostgreSQL uses one semaphore per allowed connection (), allowed autovacuum worker process -(), allowed WAL sender process +(), allowed WAL sender process (), allowed background process (), etc., in sets of 16. The runtime-computed parameter @@ -892,7 +892,7 @@ $ postgres -D $PGDATA -C num_os_semaphores When using POSIX semaphores, the number of semaphores needed is the same as for System V, that is one semaphore per allowed connection (), allowed autovacuum worker process -(), allowed WAL sender process +(), allowed WAL sender process (), allowed background process (), etc. On the platforms where this option is preferred, there is no specific diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index 6f58412bca..706f2127de 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -5403,7 +5403,7 @@ CheckRequiredParameterValues(void) */ if (ArchiveRecoveryRequested && EnableHotStandby) { - /* We ignore autovacuum_max_workers when we make this test. */ + /* We ignore autovacuum_worker_slots when we make this test. 
*/ RecoveryRequiresIntParameter("max_connections", MaxConnections, ControlFile->MaxConnections); diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c index dc3cf87aba..963924cbc7 100644 --- a/src/backend/postmaster/autovacuum.c +++ b/src/backend/postmaster/autovacuum.c @@ -115,6 +115,7 @@ * GUC parameters */ bool autovacuum_start_daemon = false; +intautovacuum_worker_slots; intautovacuum_max_workers; intautovacuum_work_mem = -1; intautovacuum_naptime; @@ -210,7 +211,7 @@ typedef struct autovac_table /*- * This struct holds information about a single worker's whereabouts. We keep * an array of these in shared memory, sized according to - * autovacuum_max_workers. + * autovacuum_worker_slots. * * wi_linksentry into free list or running list * wi_dboidOID of
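Assuming the semantics described in the patch above (autovacuum_worker_slots fixed at server start, autovacuum_max_workers reloadable but capped by the slot count), tuning would look roughly like this; a sketch of the proposed behavior, not of anything committed:

```sql
-- autovacuum_worker_slots (default 16) still requires a restart.
-- autovacuum_max_workers can be changed with a reload, but any value
-- above autovacuum_worker_slots has no effect:
ALTER SYSTEM SET autovacuum_max_workers = 8;
SELECT pg_reload_conf();
```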
Re: meson and check-tests
Hi,

On Tue, 12 Nov 2024 at 18:13, jian he wrote:
>
> also played around with v5-0002, works fine.
> overall, v5-0001 and v5-0002 works as i expected.

Thanks for checking it!

--
Regards,
Nazir Bilal Yavuz
Microsoft
Re: Sample rate added to pg_stat_statements
On Tue, Nov 19, 2024 at 7:12 AM Andrey M. Borodin wrote:
> +1 for the idea. I heard a lot of complaints about that pgss is costly.
> Most of them were using it wrong though.

I'm curious what "using it wrong" means exactly?

Oh, and a +1 in general to the patch, OP, although it would also be nice to start finding the bottlenecks that cause such performance issues.

Cheers,
Greg
Re: Windows 2016 server crashed after changes in Postgres 15.8 pgAdmin
> On 20 Nov 2024, at 13:36, Tomas Vondra wrote:
> On 11/19/24 17:44, Sanjay Khatri wrote:
>> And we dont have any data backup too.

This will be your first priority once the system is up and running again, maybe even to the point of planning what immediate action to take before the system boots, so you can act quickly in case it falls over again.

> It's very unlikely a Postgres issue would make the whole system crash,
> particularly in a way that prevents it from booting again.

Agreed, especially since this crash reportedly happened after uninstalling postgres. If the uninstaller could corrupt the OS like that, I have a feeling we'd have heard about it by now.

> I don't think
> anyone will be able to help you without more info - you need to make it
> boot again, inspect the Postgres logs, etc.

I'm not familiar with Dell blade enclosures, but if there is a system similar to the iLO Lights Out console that HP ships, I would start there (a quick Google search hints at ePSA).

--
Daniel Gustafsson
Re: cannot freeze committed xmax
> On 28 Oct 2020, at 21:21, Mark Dilger wrote:
>
> The other possibility is that this tuple is erroneously marked as
> HEAP_UPDATED. heap_update() sets that, which makes sense.
> rewrite_heap_tuple() copies the old tuple's bits to the new tuple and then
> does some work to resolve update chains. I guess you could look at whether
> that logic might leave things in an invalid state. I don't have any theory
> about that.

Hi Mark and Konstantin!

Recently (Oct 15, 2024) I observed this kind of problem on one of our production clusters: an old tuple version had come back to life.

# select ctid,* from skipped where ctid = '(16488,13)' or ctid = '(21597,16)';
-[ RECORD 1 ]+------------------------------
ctid         | (16488,13)
id           | 1121skipped
date_created | 2023-08-16 03:31:36.306466+03
date_updated | 2023-08-16 03:31:36.306481+03
-[ RECORD 2 ]+------------------------------
ctid         | (21597,16)
id           | 1121skipped
date_created | 2023-08-16 03:31:36.306466+03
date_updated | 2024-09-06 14:10:47.926007+03

Freezing was failing with "cannot freeze committed xmax"; the xmax of the old version == the xmin of the new one. I have no idea what the semantics of date_created and date_updated are. The server was running vanilla 14, regularly updated.

I suggested
    delete from skipped where ctid = '(16488,13)';
and it worked; the monitoring issue was resolved. I found no other problems in the logs, heapcheck, amcheck, etc., and found no violations of id uniqueness.

What was bothering me was that amcheck verify_heap() was quiet about this table. Let's add some checks to detect such conditions? PFA the patch doing so.

Thanks!

Best regards, Andrey Borodin.

0001-Detect-hintbit-contradictions-to-commit-log.patch
Description: Binary data
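For anyone wanting to run the same kind of check on a suspect table: the heap checker shipped in core amcheck since PostgreSQL 14 is verify_heapam(), presumably the function the report refers to; the point above is that it stayed silent on this corrupted table, which is the gap the attached patch tries to close. A sketch:

```sql
-- Check the reported table with amcheck's heap checker
-- (verify_heapam is available in core amcheck since PostgreSQL 14):
CREATE EXTENSION IF NOT EXISTS amcheck;
SELECT * FROM verify_heapam('skipped');
```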
Re: Introduce XID age and inactive timeout based replication slot invalidation
On Tue, 19 Nov 2024 at 12:43, Nisha Moond wrote:
>
> Attached is the v49 patch set:
> - Fixed the bug reported in [1].
> - Addressed comments in [2] and [3].
>
> I've split the patch into two, implementing the suggested idea in
> comment #5 of [2] separately in 001:
>
> Patch-001: Adds additional error reports (for all invalidation types)
> in ReplicationSlotAcquire() for invalid slots when error_if_invalid =
> true.
> Patch-002: The original patch with comments addressed.

This Assert can fail:

+		/*
+		 * Check if the slot needs to be invalidated due to
+		 * replication_slot_inactive_timeout GUC.
+		 */
+		if (now &&
+			TimestampDifferenceExceeds(s->inactive_since, now,
+									   replication_slot_inactive_timeout_sec * 1000))
+		{
+			invalidation_cause = cause;
+			inactive_since = s->inactive_since;
+
+			/*
+			 * Invalidation due to inactive timeout implies that
+			 * no one is using the slot.
+			 */
+			Assert(s->active_pid == 0);

With the following scenario:
Set replication_slot_inactive_timeout to 10 seconds

-- Create a slot
postgres=# select pg_create_logical_replication_slot ('test', 'pgoutput', true, true);
 pg_create_logical_replication_slot
------------------------------------
 (test,0/1748068)
(1 row)

-- Wait for 10 seconds and execute checkpoint
postgres=# checkpoint;
WARNING:  terminating connection because of crash of another server process
DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
HINT:  In a moment you should be able to reconnect to the database and repeat your command.
server closed the connection unexpectedly

The assert fails:
#5  0x5b074f0c922f in ExceptionalCondition (conditionName=0x5b074f2f0b4c "s->active_pid == 0", fileName=0x5b074f2f0010 "slot.c", lineNumber=1762) at assert.c:66
#6  0x5b074ee26ead in InvalidatePossiblyObsoleteSlot (cause=RS_INVAL_INACTIVE_TIMEOUT, s=0x740925361780, oldestLSN=0, dboid=0, snapshotConflictHorizon=0, invalidated=0x7fffaee87e63) at slot.c:1762
#7  0x5b074ee273b2 in InvalidateObsoleteReplicationSlots (cause=RS_INVAL_INACTIVE_TIMEOUT, oldestSegno=0, dboid=0, snapshotConflictHorizon=0) at slot.c:1952
#8  0x5b074ee27678 in CheckPointReplicationSlots (is_shutdown=false) at slot.c:2061
#9  0x5b074e9dfda7 in CheckPointGuts (checkPointRedo=24412528, flags=108) at xlog.c:7513
#10 0x5b074e9df4ad in CreateCheckPoint (flags=108) at xlog.c:7179
#11 0x5b074edc6bfc in CheckpointerMain (startup_data=0x0, startup_data_len=0) at checkpointer.c:463

Regards,
Vignesh
Windows 2016 server crashed after changes in Postgres 15.8 pgAdmin
Dear,

Last week I installed Postgres 15.8 on my Windows Server 2016, and I installed the pgAdmin that comes packed with the PostgreSQL installer. I already had Postgres 12.1 with pgAdmin installed on the server.

After installing Postgres 15.8, the corresponding pgAdmin was not working; it showed the error 'Postgres server could not be contacted'. The old pgAdmin (of Postgres 12) was working fine, but was not displaying the Postgres 15 server in its server list.

But I wanted to work with the new pgAdmin (the one installed with Postgres 15), so I searched for a solution on the internet and found one: delete the pgAdmin.bak file from 'C\Users\Username\AppData\Roaming\pgAdmin\'. I did that and it worked: the new pgAdmin started to work, and it also displayed the Postgres 15 connection in the server list.

But after 2 hours, my Windows Server 2016 crashed, and I was not able to boot the system. After a hard restart it came back up, and I quickly uninstalled Postgres 15 and its pgAdmin from the machine. But the system crashed again after uninstalling Postgres 15, and now it has not been able to boot for 2 days.

The system is on a Dell iDRAC PowerEdge M6300. Is there any solution for this? I can try whatever you suggest. We cannot format the storage, as our data is on the hard disks (RAID), and we don't have any data backup either.

Please help if possible
Re: RFC: Additional Directory for Extensions
On 18.11.24 20:19, David E. Wheeler wrote: - The biggest problem is that many extensions set in their control file module_pathname = '$libdir/foo' This disables the use of dynamic_library_path, so this whole idea of installing an extension elsewhere won't work that way. The obvious solution is that extensions change this to just 'foo'. But this will require a lot updating work for many extensions, or a lot of patching by packagers. Yeah, '$libdir/foo' has been the documented way to do it for quite some time, as I recall. Perhaps the behavior of the MODULE_PATHNAME replacement function could be changed to omit $libdir when writing the SQL files? Elsewhere you write: Nothing changes about shared library files. They are looked up in dynamic_library_path or any hardcoded file name. And also point out that the way to install them is: ``` make install datadir=/else/where/share pkglibdir=/else/where/lib ``` So as long as dynamic_library_path includes /else/where/lib it should work, just as before, no? The path is only consulted if the specified name does not contain a slash. So if you do LOAD 'foo', the path is consulted, but if you do LOAD '$libdir/foo', it is not. The problem I'm describing is that most extensions use the latter style, per current recommendation in the documentation.
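To make the LOAD behavior described above concrete, a small sketch using a hypothetical extension name 'foo':

```sql
LOAD 'foo';          -- no slash in the name: dynamic_library_path is consulted
LOAD '$libdir/foo';  -- expands to a path containing a slash: the path is ignored
-- Consequently, a control file with module_pathname = '$libdir/foo' pins
-- the library to the installation's libdir, while module_pathname = 'foo'
-- lets dynamic_library_path apply.
```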
Re: pg_prepared_xacts returns transactions that are foreign to the caller
> On 20 Nov 2024, at 12:48, Vladimir Sitnikov wrote:
>
> "select * from pg_prepared_xacts" might produce transactions created by a
> different user, so the caller won't be able to issue "commit prepared".
>
> I think there should be a view that returns only the transactions that the
> caller can commit or rollback.
> Is it something that can be implemented at the backend?

Like "select * from pg_prepared_xacts where pg_has_role(current_user, owner, 'member');"?

Best regards, Andrey Borodin.
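A minimal sketch of such a filtered view along the lines Andrey suggests; the view name is hypothetical, and whether MEMBER is the right privilege to test is part of the open question, since finishing a prepared transaction normally requires the preparing role or a superuser:

```sql
-- Hypothetical view: only the prepared transactions whose owner the
-- caller is a member of, i.e. ones it could plausibly COMMIT PREPARED
-- or ROLLBACK PREPARED.
CREATE VIEW pg_my_prepared_xacts AS
SELECT gid, prepared, owner, database
FROM pg_prepared_xacts
WHERE pg_has_role(current_user, owner, 'MEMBER');
```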
Re: memory leak in pgoutput
> You should be more careful with the amount of tests you are doing
> here. This fails while waiting for some changes to be streamed when
> creating a subscription:
> cd src/test/subscription/ && PROVE_TESTS=t/017_stream_ddl.pl make check

I apologize for the obvious error in the previous patch. I have corrected it in the new patch (v3), which passes regression testing.

v3-0001-Fix-memory-leak-in-pgoutput.patch
Description: v3-0001-Fix-memory-leak-in-pgoutput.patch
Re: Windows 2016 server crashed after changes in Postgres 15.8 pgAdmin
This seems like a combination of (at least) two independent issues - the pgadmin issue and the crash.

On 11/19/24 17:44, Sanjay Khatri wrote:
> Dear,
> Last week I installed Postgres 15.8 on my Windows server 2016, I
> installed the pgadmin thats comes packed with postgresql installer.
> I already have postgres 12.1 with pgAdmin installed on my server.
> After installing Postgres 15.8, the respective pgAdmin was not working,
> it showed error 'Postgres server could not be contacted'
> But the old pgAdmin(of Postgres12) was working fine, but was not
> displaying the postgres15 server on server list on pgAdmin.

I don't use pgadmin (or Windows) so I don't have any ideas why it might be having these connection problems, and I'm not sure I quite understand what you did. But it seems the new 15.8 server simply was not running, that would explain all the symptoms.

> But, I wanted to work on the new pgAdmin (which was installed with
> postgres15). So I tried to find the solution on internet, and I got the
> solution. It was just to delete the pgAdmin.bak file from
> 'C\Users\Username\AppData\Roaming\pgAdmin\'
> And I did that and It worked, the bew PgAdmin started to work, it also
> displayed the Postgres15 connection se server list.

That is weird, TBH. No idea why deleting a .bak file would matter.

> But after 2 hours, my Windows Server 2016 crashed, and I am not able to
> boot my system. So on hard restart the system and it restarted well. And
> I quickly uninstalled the Postgres15 with its pgadmin from the machine.
> But the system crashed again after uninstalling the postgres 15. Now I
> am not able to boot the system. Its been 2 days the system is not able
> to boot.
> The system is on dell iDrac PowerEdge M6300.
> Is there any solution for this, I can try the solution if you suggest.
> We cannot format the Storage as our data is within the hardisk (Raid)
> type. And we dont have any data backup too.
> Please help if possible

It's very unlikely a Postgres issue would make the whole system crash, particularly in a way that prevents it from booting again. I don't think anyone will be able to help you without more info - you need to make it boot again, inspect the Postgres logs, etc.

regards

--
Tomas Vondra
Re: ALTER TABLE uses a bistate but not for toast tables
On Tue, Nov 19, 2024 at 03:45:19PM +0100, Dmitry Dolgov wrote:
> > On Mon, Jul 15, 2024 at 03:43:24PM GMT, Justin Pryzby wrote:
> > @cfbot: rebased
>
> Thanks for rebasing. To help with review, could you also describe
> current status of the patch? I have to admit, currently the commit
> message doesn't tell much, and looks more like notes for the future you.

The patch does what it aims to do and AFAIK in a reasonable way. I'm not aware of any issue with it. It's, uh, waiting for review.

I'm happy to expand on the message to describe something like design choices, but the goal here is really simple: why should wide column values escape the intention of the ring buffer? AFAICT it's fixing an omission. If you have a question, please ask; that would help to indicate what needs to be explained.

> The patch numbering is somewhat confusing as well, should it be v5 now?

The filename was 0001-WIP-use-BulkInsertState-for-toast-tuples-too.patch. I guess you're referring to the previous filename: v4-*. That shouldn't be so confusing -- I just didn't specify a version, either by choice or by omission.

> From what I understand, the new patch does address the review feedback,
> but you want to do more, something with copy to / copy from?

If I were to do more, it'd be for a future patch, if the current patch were to ever progress.

> Since it's in the performance category, I'm also curious how much
> overhead does this shave off? I mean, I get it that bulk insert strategy
> helps with buffers usage, as you've implied in the thread -- but how
> does it look like in benchmark numbers?

The intent of using a bistate isn't to help the performance of the process using the bistate. Rather, the intent is to avoid harming the performance of other processes. If anything, I expect it could slow down the process using bistate -- the same as for non-toast data.

https://www.postgresql.org/message-id/CA%2BTgmobC6RD2N8kbPPTvATpUY1kisY2wJLh2jsg%3DHGoCp2RiXw%40mail.gmail.com

--
Justin
Re: nbtree VACUUM's REDO routine doesn't clear page's VACUUM cycle ID
> On 15 Nov 2024, at 21:33, Peter Geoghegan wrote:
>
> Attached patch teaches btree_xlog_vacuum, nbtree VACUUM's REDO
> routine, to reset the target page's opaque->btpo_cycleid to 0. This
> makes the REDO routine match original execution, which seems like a
> good idea on consistency grounds.
>
> I propose this for the master branch only.

The change seems correct to me: in any case, the cycle ID must be less than the cycle ID of any future VACUUM after promotion. I cannot say anything about the aesthetics of resetting or not resetting the field.

I'd suggest renaming the field to something like "btpo_split_vaccycleid".

I was aware of index vacuum backtracking, but it took me a while to build context again.

Best regards, Andrey Borodin.
RE: Disallow UPDATE/DELETE on table with unpublished generated column as REPLICA IDENTITY
On Tuesday, November 19, 2024 9:42 PM Shlok Kyal wrote: > I agree that we can remove the test. I debugged and found the test modified in > above patch does not hit the condition added in commit adedf54. > Also, according to me we cannot trigger the bug after the fix in this thread. > So, I > think we can remove the testcase. > > I have attached the latest patch with an updated commit message and also > removed the testcase. Thanks for updating the patch. Out of curiosity, I tried to merge the pub_collist_contains_invalid_column() and replident_has_unpublished_gen_col() functions into a single function to evaluate if this approach might be better. As shown in the attached diff, it reduces some code redundancy *but* also introduces additional IF conditions and parameters in the new combined function, which might complicate the logic a bit. Personally, I think the existing V10 patch looks good and clear. I am just sharing the diff for feedback in case others want to have a look. Best Regards, Hou zj From 8640a4605f32f8072d093902da0af23dfa6d119b Mon Sep 17 00:00:00 2001 From: Hou Zhijie Date: Wed, 20 Nov 2024 16:34:04 +0800 Subject: [PATCH v10] combine functions --- src/backend/commands/publicationcmds.c | 219 ++--- src/backend/utils/cache/relcache.c | 38 ++--- src/include/commands/publicationcmds.h | 7 +- 3 files changed, 105 insertions(+), 159 deletions(-) diff --git a/src/backend/commands/publicationcmds.c b/src/backend/commands/publicationcmds.c index 053877c524..0d5daf7626 100644 --- a/src/backend/commands/publicationcmds.c +++ b/src/backend/commands/publicationcmds.c @@ -336,21 +336,36 @@ pub_rf_contains_invalid_column(Oid pubid, Relation relation, List *ancestors, } /* - * Check if all columns referenced in the REPLICA IDENTITY are covered by - * the column list. + * Check for invalid columns in the publication table definition. * - * Returns true if any replica identity column is not covered by column list. + * This function evaluates two conditions: + * + * 1. 
Ensures that all columns referenced in the REPLICA IDENTITY are covered + *by the column list. If any column is missing, *invalid_column_list is set + *to true. + * + * 2. Ensures that the REPLICA IDENTITY does not contain unpublished generated + *columns. If an unpublished generated column is found, + **unpublished_gen_col is set to true. + * + * Returns true if any of the above conditions are not met. */ bool -pub_collist_contains_invalid_column(Oid pubid, Relation relation, List *ancestors, - bool pubviaroot) +pub_contains_invalid_column(Oid pubid, Relation relation, List *ancestors, + bool pubviaroot, bool pubgencols, + bool *invalid_column_list, + bool *unpublished_gen_col) { - HeapTuple tuple; Oid relid = RelationGetRelid(relation); Oid publish_as_relid = RelationGetRelid(relation); - boolresult = false; - Datum datum; - boolisnull; + Bitmapset *idattrs; + Bitmapset *columns = NULL; + TupleDesc desc = RelationGetDescr(relation); + Publication *pub; + int x; + + *invalid_column_list = false; + *unpublished_gen_col = false; /* * For a partition, if pubviaroot is true, find the topmost ancestor that @@ -368,158 +383,90 @@ pub_collist_contains_invalid_column(Oid pubid, Relation relation, List *ancestor publish_as_relid = relid; } - tuple = SearchSysCache2(PUBLICATIONRELMAP, - ObjectIdGetDatum(publish_as_relid), - ObjectIdGetDatum(pubid)); - - if (!HeapTupleIsValid(tuple)) - return false; - - datum = SysCacheGetAttr(PUBLICATIONRELMAP, tuple, - Anum_pg_publication_rel_prattrs, - &isnull); + /* Fetch the column list */ + pub = GetPublication(pubid); + check_and_fetch_column_list(pub, publish_as_relid, NULL, &columns); - if (!isnull) + if (relation->rd_rel->relreplident == REPLICA_IDENTITY_FULL) { - int x; - Bitmapset *idattrs; - Bitmapset *columns = NULL; - /* With REPLICA IDENTITY FULL, no column list is allowed. */ - if (relation->rd_rel->relreplident == REPLICA_IDENTITY_FULL) - result = true; - - /* Transform the column list datum to a bitmapset. 
*/ - columns = pub_collist_t
Re: Conflict detection for update_deleted in logical replication
On Thu, Nov 14, 2024 at 8:24 AM Zhijie Hou (Fujitsu) wrote:
>
> Attach the V9 patch set which addressed above comments.
>

Reviewed the v9 patch set; here are my comments on the changes below:

@@ -1175,10 +1189,29 @@ ApplyLauncherMain(Datum main_arg)
 			long		elapsed;

 			if (!sub->enabled)
+			{
+				can_advance_xmin = false;
+				xmin = InvalidFullTransactionId;
 				continue;
+			}

 			LWLockAcquire(LogicalRepWorkerLock, LW_SHARED);
 			w = logicalrep_worker_find(sub->oid, InvalidOid, false);
+
+			if (can_advance_xmin && w != NULL)
+			{
+				FullTransactionId nonremovable_xid;
+
+				SpinLockAcquire(&w->relmutex);
+				nonremovable_xid = w->oldest_nonremovable_xid;
+				SpinLockRelease(&w->relmutex);
+
+				if (!FullTransactionIdIsValid(xmin) ||
+					!FullTransactionIdIsValid(nonremovable_xid) ||
+					FullTransactionIdPrecedes(nonremovable_xid, xmin))
+					xmin = nonremovable_xid;
+			}
+

1) In Patch-0002, could you please add a comment above "+ if (can_advance_xmin && w != NULL)" to briefly explain the purpose of finding the minimum XID at this point?

2) In Patch-0004, with the addition of the 'detect_update_deleted' option, I see the following two issues in the above code:

2a) Currently, all enabled subscriptions are considered when comparing and finding the minimum XID, even if detect_update_deleted is disabled for some subscriptions. I suggest excluding the oldest_nonremovable_xid of subscriptions where detect_update_deleted=false by updating the check as follows:

    if (sub->detectupdatedeleted && can_advance_xmin && w != NULL)

2b) I understand why advancing xmin is not allowed when a subscription is disabled. However, the current check allows a disabled subscription with detect_update_deleted=false to block xmin advancement, which seems incorrect.
Should the check also account for detect_update_deleted? Like:

 			if (sub->detectupdatedeleted && !sub->enabled)
+			{
+				can_advance_xmin = false;
+				xmin = InvalidFullTransactionId;
 				continue;
+			}

However, I'm not sure if this is the right fix, as it could lead to inconsistencies if detect_update_deleted is set to false after disabling the sub. Thoughts?

--
Thanks,
Nisha
Re: On non-Windows, hard depend on uselocale(3)
On Fri, Nov 15, 2024 at 1:53 AM Peter Eisentraut wrote:
> On 14.11.24 08:48, Thomas Munro wrote:
> > The three MinGW environments we test today are using ucrt, and
> > configure detects the symbol on all. Namely: fairwren
> > (msys2/mingw64), the CI mingw64 task and the mingw cross-build that
> > runs on Linux in the CI CompilerWarnings task. As far as I know these
> > are the reasons for, and mechanism by which, we keep MinGW support
> > working. We have no policy requiring arbitrary old MinGW systems
> > work, and we wouldn't know anyway.
>
> Right. So I think we could unwind this in steps. First, remove the
> configure test for _configthreadlocale() and all the associated #ifdefs
> in the existing ecpg code. This seems totally safe, it would just leave
> behind MinGW older than 2016 and MSVC older than 2015, the latter of
> which is already the current threshold.
>
> Then the question whether we want to re-enable the error checking on
> _configthreadlocale() that was reverted by 2cf91ccb, or at least
> something similar. This should also be okay based on your description
> of the different Windows runtimes. I think it would also be good to do
> this to make sure this works before we employ _configthreadlocale() in
> higher-stakes situations.
>
> I suggest doing these two steps as separate patches, so this doesn't get
> confused between the various thread-related threads that want to
> variously add or remove uses of this function.

OK, do you think these three patches tell the _configthreadlocale() story properly? (Then after that we can get back to getting rid of it...)

0001-Remove-configure-check-for-_configthreadlocale.patch
Description: Binary data
0002-Formally-require-ucrt-on-Windows.patch
Description: Binary data
0003-Revert-Blind-attempt-to-fix-_configthreadlocale-fail.patch
Description: Binary data
Re: Enhancing Memory Context Statistics Reporting
Hi,

> To achieve both completeness and avoid writing to a file, I can consider
> displaying the numbers for the remaining contexts as a cumulative total
> at the end of the output.
>
> Something like follows:
> ```
> postgres=# select * from pg_get_process_memory_contexts('237244', false);
>              name             | ident |   type   | path  | total_bytes | total_nblocks | free_bytes | free_chunks | used_bytes |  pid
> ------------------------------+-------+----------+-------+-------------+---------------+------------+-------------+------------+--------
>  TopMemoryContext             |       | AllocSet | {0}   |       97696 |             5 |      14288 |          11 |      83408 | 237244
>  search_path processing cache |       | AllocSet | {0,1} |        8192 |             1 |       5328 |           7 |       2864 | 237244
>
>  Remaining contexts total: 23456 bytes (total_bytes), 12345 (used_bytes), 11111 (free_bytes)
> ```

Please find attached an updated patch with this change. The file previously used to store spilled statistics has been removed. Instead, a cumulative total of the remaining/spilled context statistics is now stored in the DSM segment, which is displayed as follows.

postgres=# select * from pg_get_process_memory_contexts('352966', false);
       name       | ident |   type   |  path  | total_bytes | total_nblocks | free_bytes | free_chunks | used_bytes |  pid
------------------+-------+----------+--------+-------------+---------------+------------+-------------+------------+--------
 TopMemoryContext |       | AllocSet | {0}    |       97696 |             5 |      14288 |          11 |      83408 | 352966
 .
 .
 .
 MdSmgr           |       | AllocSet | {0,18} |        8192 |             1 |       7424 |           0 |        768 | 352966
 Remaining Totals |       |          |        |     1756016 |           188 |     658584 |         132 |    1097432 | 352966
(7129 rows)

I believe this serves as a good compromise between completeness and avoiding the overhead of file handling. However, I am open to reintroducing file handling if displaying the complete statistics of the remaining contexts proves to be more important.

All the known bugs in the patch have been fixed.

In summary, one DSA per PostgreSQL process is used to share the statistics of that process. A DSA is created by the first client backend that requests memory context statistics, and it is pinned for all future requests to that process.
A handle to this DSA is shared between the client and the publishing process using fixed shared memory. The fixed shared memory consists of an array of size MaxBackends + auxiliary processes, indexed by procno. Each element in this array is less than 100 bytes in size.

A PostgreSQL process uses a condition variable to signal a waiting client backend once it has finished publishing the statistics. If, for some reason, the signal is not sent, the waiting client backend will time out.

When statistics of a local backend are requested, this function returns the following WARNING and exits, since that case can be handled by an existing function which doesn't require a DSA.

WARNING: cannot return statistics for local backend
HINT: Use pg_get_backend_memory_contexts instead

Looking forward to your review.

Thank you,
Rahila Syed

v4-Function-to-report-memory-context-stats-of-a-process.patch
Description: Binary data
Re: A way to build PSQL 17.1 source on AIX platform
> On 18 Nov 2024, at 08:03, Raghu Dev Ramaiah wrote: > Is there an alternate way I can build PSQL 17.1 source code on AIX platform? > Please let me know..thanks. As Tom has already answered in your other mail on this, PostgreSQL 17 does not support building on AIX. If PostgreSQL 18 is going to support AIX again there needs to be a significant effort from interested parties to make that happen. -- Daniel Gustafsson
Re: doc fail about ALTER TABLE ATTACH re. NO INHERIT
On Tue, Nov 19, 2024 at 6:41 PM Alvaro Herrera wrote: > On 2024-Nov-14, Amit Langote wrote: > > > Sorry, here's the full example. Note I'd changed both > > AddRelationNotNullConstraints() and AdjustNotNullInheritance() to not > > throw an error *if* the table is a leaf partition when the NO INHERIT > > of an existing constraint doesn't match that of the new constraint. > > > > create table p (a int not null) partition by list (a); > > create table p1 partition of p for values in (1); > > alter table p1 add constraint a_nn_ni not null a no inherit; > > Yeah, I think this behavior is bogus, because the user wants to have > something (a constraint that will not inherit) but they cannot have it, > because there is already a constraint that will inherit. The current > behavior of throwing an error seems correct to me. With your patch, > what this does is mark the constraint as "local" in addition to > inherited, but that'd be wrong, because the constraint the user wanted > is not of the same state. Yeah, just not throwing the error, as my patch did, is not enough. The patch didn't do anything to ensure that a separate constraint with the properties that the user entered will exist alongside the inherited one, but that's not possible given that the design only allows one NOT NULL constraint for a column as you've written below. 
> > On Wed, Nov 13, 2024 at 10:49 PM Alvaro Herrera wrote:
> > > I think:
> > > * if a leaf partition already has an inherited not-null constraint
> > >   from its parent and we want to add another one, we should:
> > >   - if the one being added is NO INHERIT, then throw an error, because
> > >     we cannot merge them
> >
> > Hmm, we'll be doing something else for CHECK constraints if we apply my patch:
> >
> > create table p (a int not null, constraint check_a check (a > 0)) partition by list (a);
> > create table p1 partition of p (constraint check_a check (a > 0) no inherit) for values in (1);
> > NOTICE:  merging constraint "check_a" with inherited definition
> >
> > \d p1
> >                 Table "public.p1"
> >  Column |  Type   | Collation | Nullable | Default
> > --------+---------+-----------+----------+---------
> >  a      | integer |           | not null |
> > Partition of: p FOR VALUES IN (1)
> > Check constraints:
> >     "check_a" CHECK (a > 0) NO INHERIT
> >
> > I thought we were fine with allowing merging of such child
> > constraints, because leaf partitions can't have children to pass them
> > onto, so the NO INHERIT clause is essentially pointless.
>
> I'm beginning to have second thoughts about this whole thing TBH, as it
> feels inconsistent -- and unnecessary, if we get the patch to flip the
> INHERIT/NO INHERIT flag of constraints.

Ah, ok, I haven't looked at that patch, but I am happy to leave this alone.

> > > - if the one being added is not NO INHERIT, then both constraints
> > >   would have the same state and so we silently do nothing.
> >
> > Maybe we should emit some kind of NOTICE message in such cases?
>
> Hmm, I'm not sure. It's not terribly useful, is it? I mean, if the
> user wants to have a constraint, then whether the constraint is there or
> not, the end result is the same and we needn't say anything about it.
>
> It's only if the constraint is not what they want (because of
> mismatching INHERIT flag) that we throw some message.
> (I wonder if we'd throw an error if the proposed constraint has a
> different _name_ from the existing constraint. If a DDL script is
> expecting that the constraint will be named a certain way, then by
> failing to throw an error we might be giving confusing expectations.)

Here's an example that I think matches the above description, which, ISTM, leads to a state similar to what one would encounter with my patch as described earlier:

create table parent (a int not null);
create table child (a int, constraint a_not_null_child not null a) inherits (parent);
NOTICE:  merging column "a" with inherited definition

\d+ child
                                           Table "public.child"
 Column |  Type   | Collation | Nullable | Default | Storage | Compression | Stats target | Description
--------+---------+-----------+----------+---------+---------+-------------+--------------+-------------
 a      | integer |           | not null |         | plain   |             |              |
Not-null constraints:
    "a_not_null_child" NOT NULL "a" (local, inherited)
Inherits: parent
Access method: heap

I think the inherited constraint should be named parent_a_not_null, but the constraint that gets stored is the user-specified constraint in `create table child`. Actually, even the automatically generated constraint name using the child table's name won't match the name of the inherited constraint:

create table child (a int not null) inherits (parent);
NOTICE:  merging column "a" with inherited definition

\d+ child
 Table "public.child"
 Column | Type | C
Re: CSN snapshots in hot standby
On Tue, Oct 29, 2024 at 11:34 PM Heikki Linnakangas wrote:
>                         master   patched
> few-xacts:              0.0041   0.0041 s / iteration
> many-xacts:             0.0042   0.0042 s / iteration
> many-xacts-wide-apart:  0.0043   0.0045 s / iteration

Hi Heikki,

I have some thoughts about behavior of the cache that might not be apparent in this test: The tree is only as tall as need be to store the highest non-zero byte. On a newly initialized cluster, the current txid is small. The first two test cases here will result in a tree with height of 2. The last one will have a height of 3, and its runtime looks a bit higher, although that could be just noise or touching more cache lines. It might be worth it to try a test run while forcing the upper byte of the keys to be non-zero (something like "key | (1<<30)"), so that the tree always has a height of 4. That would match real-world conditions more closely. If need be, there are a couple things we can do to optimize node dispatch and touch fewer cache lines.

> I added two tests to the test suite:
>                                 master    patched
> insert-all-different-xids:      0.00027   0.00019 s / iteration
> insert-all-different-subxids:   0.00023   0.00020 s / iteration
>
> The point of these new tests is to test the scenario where the cache
> doesn't help and just adds overhead, because each XID is looked up only
> once. Seems to be fine. Surprisingly good actually; I'll do some more
> profiling on that to understand why it's even faster than 'master'.

These tests use a sequential scan. For things like primary key lookups, I wonder if the overhead of creating and destroying the tree's memory contexts for the (not used again) cache would be noticeable. If so, it wouldn't be too difficult to teach radix tree to create the larger contexts lazily.

> Now the downside of this new cache: Since it has no size limit, if you
> keep looking up different XIDs, it will keep growing until it holds all
> the XIDs between the snapshot's xmin and xmax. That can take a lot of
> memory in the worst case.
> Radix tree is pretty memory efficient, but holding, say, 1 billion XIDs
> would probably take something like 500 MB of RAM (the radix tree stores
> 64-bit words with 2 bits per XID, plus the radix tree nodes). That's per
> snapshot, so if you have a lot of connections, maybe even with multiple
> snapshots each, that can add up.
>
> I'm inclined to accept that memory usage. If we wanted to limit the size
> of the cache, would need to choose a policy on how to truncate it
> (delete random nodes?), what the limit should be etc. But I think it'd
> be rare to hit those cases in practice. If you have a one billion XID
> old transaction running in the primary, you probably have bigger
> problems already.

I don't have a good sense of whether it needs a limit or not, but if we decide to add one as a precaution, maybe it's enough to just blow the cache away when reaching some limit? Being smarter than that would need some work.

--
John Naylor
Amazon Web Services
Re: Slot's restart_lsn may point to removed WAL segment after hard restart unexpectedly
Dear Hackers,

To ping the topic, I'd like to clarify what may be wrong with the idea described here, because I do not see any interest from the community. The topic is related to physical replication. The primary idea is to define the horizon for WAL segment (file) removal based on the restart LSN values saved on disk. Currently, the WAL segment removal horizon is calculated from the slots' current restart LSN values, which may not have been saved on disk at the time of the horizon calculation. This case arises when a slot is advanced during a checkpoint, as described earlier in the topic. Such behaviour is not a problem when slots are used only for physical replication in the conventional way. But it may be a problem when a physical slot is used for some other goal. For example, I have an extension which keeps WAL using physical replication slots. It creates a new physical slot and advances it as needed. After a restart, it can use the slot's restart_lsn to read WAL from that LSN. In this case, there is no guarantee that restart_lsn will point to an existing WAL segment. The advantage of the current behaviour is that it requires a little less WAL to be kept. The disadvantage is that physical slots do not, in general, guarantee that WAL is kept starting from their restart_lsn. I would be happy to get some advice on whether I am on the right or wrong way. Thank you in advance.

With best regards,
Vitaly

On Thursday, November 07, 2024 16:30 MSK, "Vitaly Davydov" wrote:

Dear Hackers, I'd like to introduce an improved version of my patch (see the attached file). My original idea was to take into account the saved-on-disk restart LSN (slot->restart_lsn_flushed) for persistent slots when removing WAL segment files. It helps tackle errors like:

ERROR: requested WAL segment 000...0AA has already been removed.

Improvements:
* flushed_restart_lsn is used only for RS_PERSISTENT slots.
* Save the physical slot on disk when advancing only once - if restart_lsn_flushed is invalid.
It is needed because slots with an invalid restart LSN are not used when calculating the oldest LSN for WAL truncation. Once restart_lsn becomes valid, it should be saved to disk immediately to update restart_lsn_flushed.

Regression tests seem to be ok, except:
* recovery/t/001_stream_rep.pl (a checkpoint is needed)
* recovery/t/019_replslot_limit.pl (it seems the slot was invalidated; some adjustments are needed)
* pg_basebackup/t/020_pg_receivewal.pl (not sure about it)

There are some problems:
* More WAL segments may be kept. It may lead to invalidation of slots in some tests (recovery/t/019_replslot_limit.pl). A couple of tests should be adjusted.

With best regards,
Vitaly Davydov

On Thursday, October 31, 2024 13:32 MSK, "Vitaly Davydov" wrote: Sorry, attached the missed patch.

On Thursday, October 31, 2024 13:18 MSK, "Vitaly Davydov" wrote:

Dear Hackers, I'd like to discuss a problem with replication slots' restart LSN. Physical slots are saved to disk at the beginning of a checkpoint. At the end of the checkpoint, old WAL segments are recycled or removed from disk if they are not kept by slots' restart_lsn values. If an existing physical slot is advanced in the middle of checkpoint execution, WAL segments related to the saved-on-disk restart LSN may be removed. This is because the calculation of the minimal replication slot LSN occurs at the end of the checkpoint, prior to old WAL segment removal. If you hard-stop (pg_ctl stop -m immediate) the postgres instance right after the checkpoint and restart it, the slot's restart_lsn may point to a removed WAL segment. I believe such behaviour is not good. The doc [0] describes that restart_lsn may be set to some past value after a reload. There is a discussion [1] on pgsql-hackers where such behaviour is discussed. The main reason for not flushing physical slots on advancing is performance. I'm ok with such behaviour, except that the corresponding WAL segments should not be removed.
I propose to keep WAL segments based on the saved-on-disk (flushed) restart_lsn of slots: add a new field restart_lsn_flushed to the ReplicationSlot structure, and copy restart_lsn to restart_lsn_flushed in SaveSlotToPath. This doesn't change the format in which the slot contents are stored on disk. I attached a patch. It is not yet complete, but it demonstrates a way to solve the problem. I reproduced the problem in the following way:

* Add some delay in CheckPointBuffers (pg_usleep) to emulate long checkpoint execution.
* Execute a checkpoint, and run pg_replication_slot_advance right after the checkpoint starts, from another connection.
* Hard-restart the server right after checkpoint completion.
* After restart, the slot's restart_lsn may point to a removed WAL segment.

The proposed patch fixes it.

[0] https://www.postgresql.org/docs/current/logicaldecoding-explanation.html
[1] https://www.postgresql.org/message-id/flat/059cc53a-8b14-653a-a24d-5f867503b0ee%40postgrespro.ru
Re: Windows 2016 server crashed after changes in Postgres 15.8 pgAdmin
> On 20 Nov 2024, at 14:34, Sanjay Khatri wrote: > > These are the errors from the logs of iDrac M630. > I uninstalled the Postgres 15.8 , once I got the connection. But sadly the > server disconnected again and crashed, even after uninstalling. > No I am not able to boot it. I'm going to go out on a limb and say that this isn't a postgres bug, and highly unlikely to be caused by one. This smells like broken hardware and/or corrupted BIOS/firmware, you should contact your hardware vendor for support. -- Daniel Gustafsson
Re: proposal: schema variables
Hi

On Wed, Nov 20, 2024 at 14:29, Marcos Pegoraro wrote:
> On Tue, Nov 19, 2024 at 16:15, Pavel Stehule <pavel.steh...@gmail.com> wrote:
>> I wrote POC of VARIABLE(varname) syntax support
>
> Not related to the POC of VARIABLE, but seeing your patches ...
>
> Wouldn't it be better to use just one syntax and message for what to do ON COMMIT?
>
> When creating a new variable you use
> CREATE VARIABLE ... ON COMMIT DROP | ON TRANSACTION END RESET
>
> In psql, \dV+ shows
> Transactional end action
>
> Maybe all of them could be just ON COMMIT
> CREATE VARIABLE ... [ON COMMIT {NO ACTION, DROP, RESET}] and \dV+ just "on commit" in the title column

ON COMMIT DROP is related to temporary objects. In that case, you don't need to think about ROLLBACK, because the temp objects are removed implicitly. ON TRANSACTION END RESET can be used for non-temporary objects too, so it is a slightly different feature. And the reset is executed when the transaction ends with ROLLBACK as well, so a syntax of just ON COMMIT could be a little bit messy. TRANSACTION END is more intuitive, I think. If I remember correctly, there was a proposal for ON COMMIT OR ROLLBACK, but I think TRANSACTION END is better and more intuitive, and better describes what is implemented. I can imagine supporting ON COMMIT RESET or ON ROLLBACK RESET clauses that can be used independently, but at this time I don't want to increase the complexity - the reset happens at transaction end regardless of whether the transaction was committed or rolled back.

Regards

Pavel

> regards
> Marcos
Re: proposal: schema variables
On Wed, Nov 20, 2024 at 10:52, Pavel Stehule wrote:
> COMMIT can be a little bit messy. TRANSACTION END is more intuitive, I
> think.

Exactly to avoid being messy, I would use just ON COMMIT for all, and the docs can explain that this option is ignored for temp objects and does the same thing at the end of the transaction, regardless of whether it is committed or rolled back
Re: proposal: schema variables
On Wed, Nov 20, 2024 at 15:15, Marcos Pegoraro wrote:
> On Wed, Nov 20, 2024 at 10:52, Pavel Stehule <pavel.steh...@gmail.com> wrote:
>> COMMIT can be a little bit messy. TRANSACTION END is more intuitive, I
>> think.
>
> Exactly to be not messy I would just ON COMMIT for all, and DOCs can
> explain that this option is ignored for temp objects and do the same at the
> end of transaction, independently if commited or rolled back

I feel it differently - when I read ON COMMIT, I really expect just a COMMIT event, not ROLLBACK. Attention - the metadata about variables is transactional, but the variables themselves are not transactional (by default). But this feeling may be subjective. The objective argument against using ON COMMIT to mean ON TRANSACTION END is that we would lose the possibility of a more precise setting. I can imagine scenarios with ON COMMIT RESET - the variable could hold some last value from the transaction - or ON ROLLBACK RESET - the variable could hold some value computed by a successfully ended transaction, like a last inserted id. In that case, I don't see a problem with reopening the discussion about syntax or postponing this feature. I think the possibility to reset variables at some specified time can be an interesting feature (it can increase safety, because the application doesn't need to handle the initial setup), but in the list of implemented features for session variables, it is not too far from the end. If TRANSACTION END is not intuitive - exactly the same functionality can be described as RESET AT TRANSACTION START - so ON TRANSACTION END could be transformed to ON BEGIN RESET, and this syntax may be better, because it signals that, for every transaction, the variable will be initialized to the default. What do you think? Would ON BEGIN RESET be acceptable to you?

Regards

Pavel
Re: Enhancing Memory Context Statistics Reporting
On Wed, Nov 20, 2024 at 2:39 PM Rahila Syed wrote: > > Hi, > >> To achieve both completeness and avoid writing to a file, I can consider >> displaying the numbers for the remaining contexts as a cumulative total >> at the end of the output. >> >> Something like follows: >> ``` >> postgres=# select * from pg_get_process_memory_contexts('237244', false); >> name | ident >>| type | path | total_bytes | tot >> al_nblocks | free_bytes | free_chunks | used_bytes | pid >> ---++--+--+-+ >> ---++-++ >> TopMemoryContext | >>| AllocSet | {0} | 97696 | >> 5 | 14288 | 11 | 83408 | 237244 >> search_path processing cache | >>| AllocSet | {0,1}|8192 | >> 1 | 5328 | 7 | 2864 | 237244 >> Remaining contexts total: 23456 bytes (total_bytes) , 12345(used_bytes), >> 11,111(free_bytes) >> >> ``` > > > Please find attached an updated patch with this change. The file previously > used to > store spilled statistics has been removed. Instead, a cumulative total of the > remaining/spilled context statistics is now stored in the DSM segment, which > is > displayed as follows. > > postgres=# select * from pg_get_process_memory_contexts('352966', false); > name | ident | type | path | total_bytes | > total_nblocks | free_bytes | free_chunks | used_bytes | pi > d > --+---+--++-+---++-++ > > TopMemoryContext | | AllocSet | {0}| 97696 | > 5 | 14288 | 11 | 83408 | 352 > 966 > . > . > . > MdSmgr | | AllocSet | {0,18} |8192 | > 1 | 7424 | 0 |768 | 352 > 966 > Remaining Totals | | || 1756016 | > 188 | 658584 | 132 |1097432 | 352 > 966 > (7129 rows) > - > > I believe this serves as a good compromise between completeness > and avoiding the overhead of file handling. However, I am open to > reintroducing file handling if displaying the complete statistics of the > remaining contexts prove to be more important. > > All the known bugs in the patch have been fixed. > > In summary, one DSA per PostgreSQL process is used to share > the statistics of that process. 
> A DSA is created by the first client backend that requests memory
> context statistics, and it is pinned for all future requests to that
> process.
> A handle to this DSA is shared between the client and the publishing
> process using fixed shared memory. The fixed shared memory consists
> of an array of size MaxBackends + auxiliary processes, indexed
> by procno. Each element in this array is less than 100 bytes in size.
>
> A PostgreSQL process uses a condition variable to signal a waiting client
> backend once it has finished publishing the statistics. If, for some reason,
> the signal is not sent, the waiting client backend will time out.

How does the process know that the client backend has finished reading the stats so they can be refreshed? What happens if the next request for memory context stats comes before the first requester has consumed the statistics it requested? Does the shared memory get deallocated when the backend which allocated it exits?

> When statistics of a local backend is requested, this function returns the
> following WARNING and exits, since this can be handled by an existing
> function which doesn't require a DSA.
>
> WARNING: cannot return statistics for local backend
> HINT: Use pg_get_backend_memory_contexts instead

How about using pg_get_backend_memory_contexts() for both - the local backend as well as other backends? Let the PID argument default to NULL, which would indicate the local backend, otherwise some other backend?

--
Best Wishes,
Ashutosh Bapat
Re: proposal: schema variables
On Tue, Nov 19, 2024 at 16:15, Pavel Stehule wrote:
> I wrote POC of VARIABLE(varname) syntax support

Not related to the POC of VARIABLE, but seeing your patches ...

Wouldn't it be better to use just one syntax and message for what to do ON COMMIT?

When creating a new variable you use
CREATE VARIABLE ... ON COMMIT DROP | ON TRANSACTION END RESET

In psql, \dV+ shows
Transactional end action

Maybe all of them could be just ON COMMIT
CREATE VARIABLE ... [ON COMMIT {NO ACTION, DROP, RESET}] and \dV+ just "on commit" in the title column

regards
Marcos
Re: sunsetting md5 password support
On Wed, Nov 20, 2024 at 10:56:11AM -0500, Greg Sabino Mullane wrote: > On Tue, Nov 19, 2024 at 8:55 PM Nathan Bossart > wrote: > >> * Expand the documentation. Perhaps we could add a step-by-step guide >> for migrating to SCRAM-SHA-256 since more users will need to do so when >> MD5 password support is removed. >> * Remove the hint. It's arguably doing little more than pointing out the >> obvious, and it doesn't actually tell users where in the documentation >> to look for this information, anyway. >> > > I think both ideally, but maybe just the hint removal for this patch? > > On the other hand, "change your password and update pg_hba.conf" is pretty > much all you need, so not sure how detailed we want to get. :) After thinking about this some more, I'm actually finding myself leaning towards leaving the hint and potentially adding more detail to the documentation as a follow-up patch. While the hint arguably points out the obvious, it should at least nudge users in the right direction instead of just telling them to stop using MD5 passwords. I've always found it incredibly frustrating when something is marked deprecated but there's zero information about what to do instead. I also see a few existing cases where we refer users to the documentation, so it's not without precedent. -- nathan
Re: cannot freeze committed xmax
> On 20 Nov 2024, at 15:58, Andrey M. Borodin wrote:
>
> PFA the patch doing so.

Ugh. The patch is simply dysfunctional, sorry. xmax_status is being checked uninitialized. But, well, it highlights the idea: make verify_heapam() aware of such corruptions. What do you think?

Best regards, Andrey Borodin.
Re: Add a write_to_file member to PgStat_KindInfo
Hi,

On Wed, Nov 20, 2024 at 05:45:55PM +0300, Nazir Bilal Yavuz wrote:
> I think this is a good idea. +1 for the $SUBJECT.

Thanks for looking at it!

> There are duplicated codes in the injection_stats_fixed.c file. Do you
> think that 'modifying existing functions to take an argument to
> differentiate whether the kind is default or no-write' would be
> better?

Some of the new functions are callbacks, so we don't control the parameter list that the core is expecting. That leaves:

pgstat_register_inj_fixed_no_write()
pgstat_report_inj_fixed_no_write()
injection_points_stats_fixed_no_write()

for which I think we could add an extra "write_to_file" argument to their original counterparts. Not sure how much code reduction we would get, so out of curiosity I did the exercise in v2 attached, and that gives us:

v1: 8 files changed, 211 insertions(+), 2 deletions(-)
v2: 8 files changed, 152 insertions(+), 22 deletions(-)

I don't have a strong opinion for this particular case (I think the code is harder to read, but yeah, there is some code reduction): so I'm fine with v2 too.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

>From 53f4afaed64d6975b022da22db7f5e7fe0134ca7 Mon Sep 17 00:00:00 2001
From: Bertrand Drouvot
Date: Wed, 20 Nov 2024 08:35:13 +
Subject: [PATCH v2] Add a write_to_file member to PgStat_KindInfo

This new member allows the statistics to configure whether or not they
want to be written to the file on disk. That gives more flexibility to
the custom statistics.
---
 src/backend/utils/activity/pgstat.c           | 21 +++-
 src/include/utils/pgstat_internal.h           |  3 +
 .../injection_points--1.0.sql                 |  3 +-
 .../injection_points/injection_points.c       | 18 ++--
 .../injection_points/injection_stats.c        |  1 +
 .../injection_points/injection_stats.h        |  6 +-
 .../injection_points/injection_stats_fixed.c  | 100 --
 .../modules/injection_points/t/001_stats.pl   | 22 +++-
 8 files changed, 152 insertions(+), 22 deletions(-)
 10.4% src/backend/utils/activity/
 18.1% src/test/modules/injection_points/t/
 70.5% src/test/modules/injection_points/

diff --git a/src/backend/utils/activity/pgstat.c b/src/backend/utils/activity/pgstat.c
index ea8c5691e8..4b8c57bb8e 100644
--- a/src/backend/utils/activity/pgstat.c
+++ b/src/backend/utils/activity/pgstat.c
@@ -12,7 +12,8 @@
  * Statistics are loaded from the filesystem during startup (by the startup
  * process), unless preceded by a crash, in which case all stats are
  * discarded. They are written out by the checkpointer process just before
- * shutting down, except when shutting down in immediate mode.
+ * shutting down (if the stat kind allows it), except when shutting down in
+ * immediate mode.
  *
  * Fixed-numbered stats are stored in plain (non-dynamic) shared memory.
 *
@@ -281,6 +282,7 @@ static const PgStat_KindInfo pgstat_kind_builtin_infos[PGSTAT_KIND_BUILTIN_SIZE]
 		.name = "database",
 		.fixed_amount = false,
+		.write_to_file = true,
 
 		/* so pg_stat_database entries can be seen in all databases */
 		.accessed_across_databases = true,
@@ -297,6 +299,7 @@ static const PgStat_KindInfo pgstat_kind_builtin_infos[PGSTAT_KIND_BUILTIN_SIZE]
 		.name = "relation",
 		.fixed_amount = false,
+		.write_to_file = true,
 
 		.shared_size = sizeof(PgStatShared_Relation),
 		.shared_data_off = offsetof(PgStatShared_Relation, stats),
@@ -311,6 +314,7 @@ static const PgStat_KindInfo pgstat_kind_builtin_infos[PGSTAT_KIND_BUILTIN_SIZE]
 		.name = "function",
 		.fixed_amount = false,
+		.write_to_file = true,
 
 		.shared_size = sizeof(PgStatShared_Function),
 		.shared_data_off = offsetof(PgStatShared_Function, stats),
@@ -324,6 +328,7 @@ static const PgStat_KindInfo pgstat_kind_builtin_infos[PGSTAT_KIND_BUILTIN_SIZE]
 		.name = "replslot",
 		.fixed_amount = false,
+		.write_to_file = true,
 
 		.accessed_across_databases = true,
@@ -340,6 +345,7 @@ static const PgStat_KindInfo pgstat_kind_builtin_infos[PGSTAT_KIND_BUILTIN_SIZE]
 		.name = "subscription",
 		.fixed_amount = false,
+		.write_to_file = true,
 
 		/* so pg_stat_subscription_stats entries can be seen in all databases */
 		.accessed_across_databases = true,
@@ -359,6 +365,7 @@ static const PgStat_KindInfo pgstat_kind_builtin_infos[PGSTAT_KIND_BUILTIN_SIZE]
 		.name = "archiver",
 		.fixed_amount = true,
+		.write_to_file = true,
 
 		.snapshot_ctl_off = offsetof(PgStat_Snapshot, archiver),
 		.shared_ctl_off = offsetof(PgStat_ShmemControl, archiver),
@@ -374,6 +381,7 @@ static const PgStat_KindInfo pgstat_kind_builtin_infos[PGSTAT_KIND_BUILTIN_SIZE]
 		.name = "bgwriter",
 		.fixed_amount = true,
+		.write_to_file = true,
 
 		.snapshot_ctl_off = offsetof(PgStat_Snapshot, bgwriter),
 		.shared_ctl_off = offsetof(PgStat_ShmemControl, bgwriter),
@@ -389,6 +397,7 @@ static const PgStat_KindInfo pgstat_kind_builti
Re: per backend I/O statistics
Hi,

On Thu, Nov 14, 2024 at 03:31:51PM +0900, Michael Paquier wrote:
> + /*
> +  * Do serialize or not this kind of stats.
> +  */
> + bool to_serialize:1;
>
> Not sure that "serialize" is the best term that applies here. For
> pgstats entries, serialization refers to the matter of writing their
> entries with a "serialized" name because they have an undefined number
> when stored locally after a reload. I'd suggest to split this concept
> into its own patch, rename the flag as "write_to_file"

Yeah, and this could also be useful for custom statistics. So I created a dedicated thread and a patch proposal (see [1]).

[1]: https://www.postgresql.org/message-id/Zz3skBqzBncSFIuY%40ip-10-97-1-34.eu-west-3.compute.internal

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
Re: fix deprecation mention for age() and mxid_age()
On Tue, Nov 19, 2024 at 06:52:40AM +, Bertrand Drouvot wrote: > Thanks for the feedback! Done. The section mistake in REL_16_STABLE was.. Interesting. -- Michael signature.asc Description: PGP signature
Re: Remove useless casts to (void *)
On Thu, Nov 14, 2024 at 09:59:07AM +0100, Peter Eisentraut wrote: > On 29.10.24 15:20, Tom Lane wrote: > > Peter Eisentraut writes: > > > There are a bunch of (void *) casts in the code that don't make sense to > > > me. I think some of these were once necessary because char * was used > > > in place of void * for some function arguments. And some of these were > > > probably just copied around without further thought. I went through and > > > cleaned up most of these. I didn't find any redeeming value in these. > > > They are just liable to hide actual problems such as incompatible types. > > >But maybe there are other opinions. > > > > I don't recall details, but I'm fairly sure some of these prevented > > compiler warnings on some (old?) compilers. Hard to be sure if said > > compilers are all gone. > > > > Looking at the sheer size of the patch, I'm kind of -0.1, just > > because I'm afraid it's going to create back-patching gotchas. > > I don't really find that it's improving readability, though > > clearly that's a matter of opinion. > > I did a bit of archeological research on these. None of these casts were > ever necessary, and in many cases even the original patch that introduced an > API used the coding style inconsistently. So I'm very confident that there > are no significant backward compatibility or backpatching gotchas here. > > I'm more concerned that many of these just keep getting copied around > indiscriminately, and this is liable to hide actual type mismatches or > silently discard qualifiers. So I'm arguing in favor of a more restrictive > style in this matter. I agree. I realize this will cause backpatch complexities, but I think removing these will be a net positive. -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com When a patient asks the doctor, "Am I going to die?", he means "Am I going to die soon?"
Add a write_to_file member to PgStat_KindInfo
Hi hackers,

Working on [1] produced the need to give the statistics the ability to decide whether or not they want to be written to the file on disk. Indeed, there is no need to write the per-backend I/O stats to disk (there is no point in keeping stats for backends that no longer exist after a restart), hence $SUBJECT.

This new member could also be useful for custom statistics, so I am creating this dedicated thread.

The attached patch also adds a test for the fixed injection points statistics. It does not add one for the variable injection points statistics, as I think adding one test is enough (the code changes are simple enough).

[1]: https://www.postgresql.org/message-id/Zy4bmvgHqGjcK1pI%40ip-10-97-1-34.eu-west-3.compute.internal

Looking forward to your feedback,

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

>From 229cbce525382c36d461246dccc0ab6a71b31c17 Mon Sep 17 00:00:00 2001
From: Bertrand Drouvot
Date: Wed, 20 Nov 2024 08:35:13 +
Subject: [PATCH v1] Add a write_to_file member to PgStat_KindInfo

This new member allows the statistics to configure whether or not they want
to be written to the file on disk. That gives more flexibility to the custom
statistics.
---
 src/backend/utils/activity/pgstat.c           | 21 ++-
 src/include/utils/pgstat_internal.h           |  3 +
 .../injection_points--1.0.sql                 | 13 ++
 .../injection_points/injection_points.c       |  6 +
 .../injection_points/injection_stats.c        |  1 +
 .../injection_points/injection_stats.h        |  6 +
 .../injection_points/injection_stats_fixed.c  | 149 ++
 .../modules/injection_points/t/001_stats.pl   | 14 ++
 8 files changed, 211 insertions(+), 2 deletions(-)
  9.7% src/backend/utils/activity/
 11.1% src/test/modules/injection_points/t/
 78.3% src/test/modules/injection_points/

diff --git a/src/backend/utils/activity/pgstat.c b/src/backend/utils/activity/pgstat.c
index ea8c5691e8..4b8c57bb8e 100644
--- a/src/backend/utils/activity/pgstat.c
+++ b/src/backend/utils/activity/pgstat.c
@@ -12,7 +12,8 @@
  * Statistics are loaded from the filesystem during startup (by the startup
  * process), unless preceded by a crash, in which case all stats are
  * discarded. They are written out by the checkpointer process just before
- * shutting down, except when shutting down in immediate mode.
+ * shutting down (if the stat kind allows it), except when shutting down in
+ * immediate mode.
  *
  * Fixed-numbered stats are stored in plain (non-dynamic) shared memory.
 *
@@ -281,6 +282,7 @@ static const PgStat_KindInfo pgstat_kind_builtin_infos[PGSTAT_KIND_BUILTIN_SIZE]
 		.name = "database",
 		.fixed_amount = false,
+		.write_to_file = true,
 		/* so pg_stat_database entries can be seen in all databases */
 		.accessed_across_databases = true,
@@ -297,6 +299,7 @@ static const PgStat_KindInfo pgstat_kind_builtin_infos[PGSTAT_KIND_BUILTIN_SIZE]
 		.name = "relation",
 		.fixed_amount = false,
+		.write_to_file = true,
 		.shared_size = sizeof(PgStatShared_Relation),
 		.shared_data_off = offsetof(PgStatShared_Relation, stats),
@@ -311,6 +314,7 @@ static const PgStat_KindInfo pgstat_kind_builtin_infos[PGSTAT_KIND_BUILTIN_SIZE]
 		.name = "function",
 		.fixed_amount = false,
+		.write_to_file = true,
 		.shared_size = sizeof(PgStatShared_Function),
 		.shared_data_off = offsetof(PgStatShared_Function, stats),
@@ -324,6 +328,7 @@ static const PgStat_KindInfo pgstat_kind_builtin_infos[PGSTAT_KIND_BUILTIN_SIZE]
 		.name = "replslot",
 		.fixed_amount = false,
+		.write_to_file = true,
 		.accessed_across_databases = true,
@@ -340,6 +345,7 @@ static const PgStat_KindInfo pgstat_kind_builtin_infos[PGSTAT_KIND_BUILTIN_SIZE]
 		.name = "subscription",
 		.fixed_amount = false,
+		.write_to_file = true,
 		/* so pg_stat_subscription_stats entries can be seen in all databases */
 		.accessed_across_databases = true,
@@ -359,6 +365,7 @@ static const PgStat_KindInfo pgstat_kind_builtin_infos[PGSTAT_KIND_BUILTIN_SIZE]
 		.name = "archiver",
 		.fixed_amount = true,
+		.write_to_file = true,
 		.snapshot_ctl_off = offsetof(PgStat_Snapshot, archiver),
 		.shared_ctl_off = offsetof(PgStat_ShmemControl, archiver),
@@ -374,6 +381,7 @@ static const PgStat_KindInfo pgstat_kind_builtin_infos[PGSTAT_KIND_BUILTIN_SIZE]
 		.name = "bgwriter",
 		.fixed_amount = true,
+		.write_to_file = true,
 		.snapshot_ctl_off = offsetof(PgStat_Snapshot, bgwriter),
 		.shared_ctl_off = offsetof(PgStat_ShmemControl, bgwriter),
@@ -389,6 +397,7 @@ static const PgStat_KindInfo pgstat_kind_builtin_infos[PGSTAT_KIND_BUILTIN_SIZE]
 		.name = "checkpointer",
 		.fixed_amount = true,
+		.write_to_file = true,
 		.snapshot_ctl_off = offsetof(PgStat_Snapshot, checkpointer),
 		.shared_ctl_off = offsetof(PgStat_ShmemControl, checkpointer),
@@ -404,6 +413,7 @@ static const PgStat_KindInfo pgstat_kind_builtin_infos[PGSTAT_KIND_BUILTIN_SIZ
Re: Add a write_to_file member to PgStat_KindInfo
Hi,

Thank you for working on this!

On Wed, 20 Nov 2024 at 17:05, Bertrand Drouvot wrote:
>
> Hi hackers,
>
> Working on [1] produced the need to give to the statistics the ability to
> decide whether or not they want to be written to the file on disk.
>
> Indeed, there is no need to write the per backend I/O stats to disk (no point to
> see stats for backends that do not exist anymore after a re-start), so
> $SUBJECT.

I think this is a good idea. +1 for the $SUBJECT.

> This new member could also be useful for custom statistics, so creating this
> dedicated thread.
>
> The attached patch also adds a test in the fixed injection points statistics.
> It does not add one for the variable injection points statistics as I think adding
> one test is enough (the code changes are simple enough).

There is duplicated code in the injection_stats_fixed.c file. Do you think that modifying the existing functions to take an argument to differentiate whether the kind is default or no-write would be better?

--
Regards,
Nazir Bilal Yavuz
Microsoft
Re: scalability bottlenecks with (many) partitions (and more)
On Wed, 4 Sept 2024 at 17:32, Tomas Vondra wrote:
>
> On 9/4/24 16:25, Matthias van de Meent wrote:
> > On Tue, 3 Sept 2024 at 18:20, Tomas Vondra wrote:
> >> FWIW the actual cost is somewhat higher, because we seem to need ~400B
> >> for every lock (not just the 150B for the LOCK struct).
> >
> > We do indeed allocate two PROCLOCKs for every LOCK, and allocate those
> > inside dynahash tables. That amounts to (152+2*64+3*16=) 328 bytes in
> > dynahash elements, and (3 * 8-16) = 24-48 bytes for the dynahash
> > buckets/segments, resulting in 352-376 bytes * NLOCKENTS() being
> > used[^1]. Does that align with your usage numbers, or are they
> > significantly larger?
>
> I see more like ~470B per lock. If I patch CalculateShmemSize to log the
> shmem allocated, I get this:
>
> max_connections=100 max_locks_per_transaction=1000 => 194264001
> max_connections=100 max_locks_per_transaction=2000 => 241756967
>
> and (((241756967-194264001)/100/1000)) = 474
>
> Could be alignment of structs or something, not sure.

NLOCKENTS is calculated based on MaxBackends, which is the sum of MaxConnections + autovacuum_max_workers + 1 + max_worker_processes + max_wal_senders; by default these add 22 more slots. After adjusting for that, we get 388 bytes per lock, which is approximately in line with the calculation.

> >> At least based on a quick experiment. (Seems a bit high, right?).
> >
> > Yeah, that does seem high, thanks for nerd-sniping me.
[...]
> > Altogether that'd save 40 bytes/lock entry on size, and ~35
> > bytes/lock on "safety margin", for a saving of (up to) 19% of our
> > current allocation. I'm not sure if these tricks would benefit
> > performance or even be a demerit, apart from smaller structs usually
> > fitting better in CPU caches.
>
> Not sure either, but it seems worth exploring. If you do an experimental
> patch for the LOCK size reduction, I can get some numbers.
It took me some time to get back to this, and a few hours to experiment, but here's that experimental patch. Attached are 4 patches, which together reduce the size of the shared lock tables by about 34% on my 64-bit system.

1/4 implements the MAX_LOCKMODES changes to LOCK I mentioned before, saving 16 bytes.
2/4 packs the LOCK struct more tightly, for another 8 bytes saved.
3/4 reduces the PROCLOCK struct size by 8 bytes with a PGPROC* -> ProcNumber substitution, allowing packing with fields previously reduced in size in patch 2/4.
4/4 reduces the size of the PROCLOCK table by limiting the average number of per-backend locks to max_locks_per_transaction (rather than the current 2*max_locks_per_transaction when getting locks that other backends also requested), and makes the shared lock tables fully pre-allocated.

1-3 together save 11% on the lock tables in 64-bit builds, and 4/4 saves another ~25%, for a total of ~34% on per-lock-entry shared memory usage; from ~360 bytes to ~240 bytes. Note that this doesn't include the ~4.5 bytes added per PGPROC entry per mlpxid for fastpath locking; I've ignored those for now.

Not implemented, but technically possible: the PROCLOCK table _could_ be further reduced in size by acknowledging that each of those structs is always stored after a dynahash HASHELEMENT, which has 4 bytes of padding on 64-bit systems. By changing PROCLOCKTAG's myProc to ProcNumber, one could pack that field into the padding of the hash element header, reducing the effective size of the hash table's entries by 8 bytes, and thus the total size of the tables by another few %. I don't think that trade-off is worth it though, given the complexity and trickery required to get that to work well.

> I'm not sure about the safety margins. 10% sure seems like quite a bit
> of memory (it might not have in the past, but as the instances are
> growing, that probably changed).

I have not yet touched this safety margin.
Kind regards,

Matthias van de Meent
Neon (https://neon.tech)

v0-0002-Reduce-size-of-LOCK-by-8-more-bytes.patch Description: Binary data
v0-0001-Reduce-size-of-LOCK-by-16-bytes.patch Description: Binary data
v0-0003-Reduce-size-of-PROCLOCK-by-8-bytes-on-64-bit-syst.patch Description: Binary data
v0-0004-Reduce-PROCLOCK-hash-table-size.patch Description: Binary data
Re: Statistics Import and Export
On Tue, Nov 19, 2024 at 05:40:20PM -0500, Bruce Momjian wrote: > On Tue, Nov 19, 2024 at 03:47:20PM -0500, Corey Huinker wrote: > > * create a pg_stats_health_check script that lists tables missing stats, > > with > > --fix/--fix-in-stages options, effectively replacing vacuumdb for those > > purposes, and then crank up the messaging about that change. The "new shiny" > > effect of a new utility that has "stats", "health", and "check" in the name > > may > > be the search/click-bait we need to get the word out effectively. That last > > sentence may sound facetious, but it isn't, it's just accepting how search > > engines and eyeballs currently function. With that in place, we can then > > change > > the vacuumdb documentation to be deter future use in post-upgrade > > situations. > > We used to create a script until the functionality was added to > vacuumdb. Since 99% of users will not need to do anything after > pg_upgrade, it would make sense to output the script only for the 1% of > users who need it and tell users to run it, rather than giving > instructions that are a no-op for 99% of users. One problem with the above approach is that it gives users upgrading or loading via pg_dump no way to know which tables need analyze statistics, right? I think that is why we ended up putting the pg_upgrade statistics functionality in vacuumdb --analyze-in-stages. -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com When a patient asks the doctor, "Am I going to die?", he means "Am I going to die soon?"
Re: Consider pipeline implicit transaction as a transaction block
On Tue, Nov 5, 2024 at 6:50 AM Michael Paquier wrote:
> It would be nice to document all these behaviors with regression
> tests in pgbench as it is the only place where we can control that
> with error pattern checks.

It's not the first time I've wanted to be able to do pipelining in psql, as relying on pgbench and TAP tests is not the most flexible, so I took a stab at adding it.

0001: This is a small bug I've stumbled upon. The query buffer is not cleared on a backslash error. For example, "SELECT 1 \parse" would fail due to a missing statement name but would leave "SELECT 1\n" in the query buffer, which would impact the next command.

0002: This adds pipeline support to psql by adding the same meta-commands as pgbench: \startpipeline, \endpipeline and \syncpipeline. Once a pipeline is started, all extended protocol queries are piped until \endpipeline is called. To keep things simple, meta-commands like \gx, \gdesc and \gexec are forbidden inside a pipeline. The tests cover the existing pipelining behaviour with the SET LOCAL, SAVEPOINT, REINDEX CONCURRENTLY and LOCK commands, depending on whether it's the first command or not. The \syncpipeline tests show that this "first command" behaviour also happens after a synchronisation point within a pipeline.

0003: The initial change to use an implicit transaction block when pipelining is detected. The tests reflect the change in behaviour, like the LOCK command working if it's the second command in the pipeline. I've updated the pipeline documentation to provide more details about the use of implicit transaction blocks with pipelines.

v03-0001-Reset-query-buffer-on-a-backslash-error.patch Description: Binary data
v03-0003-Consider-pipeline-implicit-transaction-as-a-tran.patch Description: Binary data
v03-0002-Add-pipeline-support-in-psql.patch Description: Binary data
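For readers unfamiliar with the proposed syntax, a session using the new meta-commands might look roughly like the sketch below. This is illustrative only (the table name is a placeholder, and exact semantics are as described for patches 0002/0003); \bind is used here because the patch pipes extended-protocol queries:

```
\startpipeline
SELECT 1 \bind \g
SELECT 2 \bind \g
\syncpipeline                -- starts a new "first command" segment
LOCK TABLE mytable \bind \g  -- second command after the sync point
\endpipeline
```

Nothing is sent to the server for execution results until the pipeline is ended (or synced), which is what makes the implicit-transaction-block question in 0003 interesting.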
Re: [PATCH] Fixed assertion issues in "pg_get_viewdef"
"=?gb18030?B?emVuZ21hbg==?=" writes: > Thanks for your guidance, you are right, I looked at your patch > and combined it with the example to generate a new patch, > which is really better. I pushed the code fix, but I can't really convince myself that the test case is worth the cycles it'd eat forevermore. If we had a way to reach the situation where there's setops but not any of the other clauses in a leaf query, perhaps that would be worth checking ... but we don't. It's just belt-and-suspenders-too programming. regards, tom lane
Re: Add reject_limit option to file_fdw
On 2024/11/19 21:40, torikoshia wrote: These messages may be unexpected for some users because the documentation of file_fdw does not explicitly describe that file_fdw uses COPY internally. (I can find several wordings like "as COPY", though.) However, since the current file_fdw already has such messages, I came to think making the messages on "reject_limit" consistent with COPY is reasonable. I mean, the previous version of the message seems fine.

Agreed. Thanks for your investigation. I've pushed the 0001 patch.

Thanks! As another comment, do we need to add validator tests for on_error and reject_limit, similar to other options? That might be better.

Added some queries which had wrong options to cause errors. I was unsure whether to add them to the "validator test" section or the "on_error, log_verbosity and reject_limit tests" section, but I chose "validator test" as I believe it makes things clearer. Added a 0002 patch since some of the tests are not related to reject_limit but just on_error.

How about adding validator tests for log_verbosity and reject_limit=0, similar to the tests in the copy2.sql regression test?

I've added those two tests, and the latest version of the patch is attached.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

>From b73cf7b8aec5d93490f0d303eda3207bd4acbf8c Mon Sep 17 00:00:00 2001
From: Atsushi Torikoshi
Date: Tue, 19 Nov 2024 21:12:52 +0900
Subject: [PATCH v6] file_fdw: Add regression tests for ON_ERROR and other options.

This commit introduces regression tests to validate incorrect settings for
the ON_ERROR, LOG_VERBOSITY, and REJECT_LIMIT options in file_fdw.
---
 contrib/file_fdw/expected/file_fdw.out | 8 ++++++++
 contrib/file_fdw/sql/file_fdw.sql      | 4 ++++
 2 files changed, 12 insertions(+)

diff --git a/contrib/file_fdw/expected/file_fdw.out b/contrib/file_fdw/expected/file_fdw.out
index 4f63c047ec..df8d43b374 100644
--- a/contrib/file_fdw/expected/file_fdw.out
+++ b/contrib/file_fdw/expected/file_fdw.out
@@ -90,8 +90,16 @@
 ERROR:  COPY delimiter cannot be newline or carriage return
 CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'csv', null '
 '); -- ERROR
 ERROR:  COPY null representation cannot use newline or carriage return
+CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (on_error 'unsupported'); -- ERROR
+ERROR:  COPY ON_ERROR "unsupported" not recognized
+CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'binary', on_error 'ignore'); -- ERROR
+ERROR:  only ON_ERROR STOP is allowed in BINARY mode
+CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (log_verbosity 'unsupported'); -- ERROR
+ERROR:  COPY LOG_VERBOSITY "unsupported" not recognized
 CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (reject_limit '1'); -- ERROR
 ERROR:  COPY REJECT_LIMIT requires ON_ERROR to be set to IGNORE
+CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (on_error 'ignore', reject_limit '0'); -- ERROR
+ERROR:  REJECT_LIMIT (0) must be greater than zero
 CREATE FOREIGN TABLE tbl () SERVER file_server; -- ERROR
 ERROR:  either filename or program is required for file_fdw foreign tables
 \set filename :abs_srcdir '/data/agg.data'
diff --git a/contrib/file_fdw/sql/file_fdw.sql b/contrib/file_fdw/sql/file_fdw.sql
index 4805ca8c04..2cdbe7a8a4 100644
--- a/contrib/file_fdw/sql/file_fdw.sql
+++ b/contrib/file_fdw/sql/file_fdw.sql
@@ -77,7 +77,11 @@ CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'csv', delimiter
 '); -- ERROR
 CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'csv', null '
 '); -- ERROR
+CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (on_error 'unsupported'); -- ERROR
+CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'binary', on_error 'ignore'); -- ERROR
+CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (log_verbosity 'unsupported'); -- ERROR
 CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (reject_limit '1'); -- ERROR
+CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (on_error 'ignore', reject_limit '0'); -- ERROR
 CREATE FOREIGN TABLE tbl () SERVER file_server; -- ERROR
 \set filename :abs_srcdir '/data/agg.data'
--
2.47.0
Re: Proposals for EXPLAIN: rename ANALYZE to EXECUTE and extend VERBOSE
On Wed, Nov 20, 2024 at 1:26 AM Guillaume Lelarge wrote:
> OK, but I'm not sure which example I should pick to add dirtied and
> written shared buffers. It looks kinda artificial. Should I choose one
> randomly?

It will be artificial, but I think that's ok: anyone running it on their own will be getting different numbers anyway. I was looking at the "14.1.2 EXPLAIN ANALYZE" section in perform.sgml. Here are some actual numbers I got from playing around with concurrent updates:

   Recheck Cond: (unique1 < 10)
   Heap Blocks: exact=5
   Buffers: shared hit=2 read=5 written=4
 ...
 Planning:
   Buffers: shared hit=289 dirtied=9

Cheers,
Greg
Add Option To Check All Addresses For Matching target_session_attr
Hi,

I was attempting to set up a high availability system using DNS and target_session_attrs. I was using a DNS setup similar to the one below, and was trying to use the connection string `psql postgresql://u...@pg.database.com/db_name?target_session_attrs=read-write` to have clients dynamically connect to the primary, or `psql postgresql://u...@pg.database.com/db_name?target_session_attrs=read-only` to have clients connect to a read replica.

The problem that I found with this setup is that if libpq is unable to get a matching target_session_attr on the first connection attempt, it does not consider any further addresses for the given host. This patch provides an option that allows libpq to look at additional addresses for a given host if the target_session_attr check fails for previous addresses.

Would appreciate any feedback on the applicability/relevancy of the goal here or the implementation.

Example DNS setup:

  Name            | Type | Record
  ----------------|------|-------------
  pg.database.com | A    | ip_address_1
  pg.database.com | A    | ip_address_2
  pg.database.com | A    | ip_address_3
  pg.database.com | A    | ip_address_4

diff --git a/src/interfaces/libpq/fe-connect.c b/src/interfaces/libpq/fe-connect.c
index 51083dcfd8..865eb74fd7 100644
--- a/src/interfaces/libpq/fe-connect.c
+++ b/src/interfaces/libpq/fe-connect.c
@@ -365,6 +365,11 @@ static const internalPQconninfoOption PQconninfoOptions[] = {
 	"Load-Balance-Hosts", "", 8,	/* sizeof("disable") = 8 */
 	offsetof(struct pg_conn, load_balance_hosts)},

+	{"check_all_addrs", "PGCHECKALLADDRS",
+	DefaultLoadBalanceHosts, NULL,
+	"Check-All-Addrs", "", 1,
+	offsetof(struct pg_conn, check_all_addrs)},
+
 	/* Terminating entry --- MUST BE LAST */
 	{NULL, NULL, NULL, NULL, NULL, NULL, 0}

@@ -4030,11 +4035,11 @@ keep_going:	/* We will come back to here until there is
 				conn->status = CONNECTION_OK;
 				sendTerminateConn(conn);

-				/*
-				 * Try next host if any, but we don't want to consider
-				 * additional addresses for this host.
-				 */
-				conn->try_next_host = true;
+				if (conn->check_all_addrs && conn->check_all_addrs[0] == '1')
+					conn->try_next_addr = true;
+				else
+					conn->try_next_host = true;
+
 				goto keep_going;
 			}
 		}
@@ -4085,11 +4090,11 @@ keep_going:	/* We will come back to here until there is
 				conn->status = CONNECTION_OK;
 				sendTerminateConn(conn);

-				/*
-				 * Try next host if any, but we don't want to consider
-				 * additional addresses for this host.
-				 */
-				conn->try_next_host = true;
+				if (conn->check_all_addrs && conn->check_all_addrs[0] == '1')
+					conn->try_next_addr = true;
+				else
+					conn->try_next_host = true;
+
 				goto keep_going;
 			}
 		}
@@ -4703,6 +4708,7 @@ freePGconn(PGconn *conn)
 	free(conn->rowBuf);
 	free(conn->target_session_attrs);
 	free(conn->load_balance_hosts);
+	free(conn->check_all_addrs);

 	termPQExpBuffer(&conn->errorMessage);
 	termPQExpBuffer(&conn->workBuffer);
diff --git a/src/interfaces/libpq/libpq-int.h b/src/interfaces/libpq/libpq-int.h
index 08cc391cbd..c51ec28234 100644
--- a/src/interfaces/libpq/libpq-int.h
+++ b/src/interfaces/libpq/libpq-int.h
@@ -427,6 +427,7 @@ struct pg_conn
 	char	   *target_session_attrs;	/* desired session properties */
 	char	   *require_auth;	/* name of the expected auth method */
 	char	   *load_balance_hosts; /* load balance over hosts */
+	char	   *check_all_addrs;	/* whether to check all ips within a host or terminate on failure */
 	bool		cancelRequest;	/* true if this connection is used to send a
 								 * cancel request, instead of being a normal
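For illustration, connecting with the proposed option enabled would presumably look like the sketch below. The option name, the environment variable, and the accepted value '1' follow the patch; the user and database names are placeholders:

```
# try remaining A-record addresses when the target_session_attrs check fails
psql "postgresql://user@pg.database.com/db_name?target_session_attrs=read-write&check_all_addrs=1"

# or via the environment variable registered by the patch
PGCHECKALLADDRS=1 psql "postgresql://user@pg.database.com/db_name?target_session_attrs=read-write"
```

With the option unset, the behavior stays as today: a failed target_session_attrs check moves on to the next host entry, skipping that host's remaining addresses.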
Re: sunsetting md5 password support
On Tue, Nov 19, 2024 at 8:55 PM Nathan Bossart wrote: > * Expand the documentation. Perhaps we could add a step-by-step guide > for migrating to SCRAM-SHA-256 since more users will need to do so when > MD5 password support is removed. > * Remove the hint. It's arguably doing little more than pointing out the > obvious, and it doesn't actually tell users where in the documentation > to look for this information, anyway. > I think both ideally, but maybe just the hint removal for this patch? On the other hand, "change your password and update pg_hba.conf" is pretty much all you need, so not sure how detailed we want to get. :) Cheers, Greg
Re: RFC: Additional Directory for Extensions
On Nov 20, 2024, at 04:05, Peter Eisentraut wrote: > The path is only consulted if the specified name does not contain a slash. > So if you do LOAD 'foo', the path is consulted, but if you do LOAD > '$libdir/foo', it is not. The problem I'm describing is that most extensions > use the latter style, per current recommendation in the documentation. I see; some details here: https://www.postgresql.org/docs/current/xfunc-c.html#XFUNC-C-DYNLOAD And I suppose the `directory` control file variable and `MODULEDIR` make variable make that necessary. Maybe $libdir should be stripped out when installing extensions to work with this patch? Best, David
Re: Replace current implementations in crypt() and gen_salt() to OpenSSL
> On 19 Nov 2024, at 18:30, Joe Conway wrote: > Any other opinions out there? Couldn't installations who would be satisfied with a GUC gate revoke privileges from the relevant functions already today and achieve almost the same result? -- Daniel Gustafsson
Re: IMPORTANT: Out-of-cycle release scheduled for November 21, 2024
On Wed, Nov 20, 2024 at 09:51:09PM -0500, Jonathan Katz wrote: > On 11/20/24 9:50 PM, Jonathan S. Katz wrote: > > On 11/20/24 9:48 PM, Tom Lane wrote: > > > "David G. Johnston" writes: > > > > On Wed, Nov 20, 2024 at 7:18 PM Bruce Momjian wrote: > > > > > so when we decided to remove the downloads > > > > > > > Can you elaborate on who "we" is here? > > > > > > More to the point, what downloads were removed? I still see the > > > source tarballs in the usual place [1]. If some packager(s) removed > > > or never posted derived packages, that's on them not the project. > > > > Downloads weren't removed, and I don't see why we'd want to do so in > > this case. > > Maybe here's the confusion - EDB doesn't have the downloads for the latest > released posted on the Windows installer: > > https://www.enterprisedb.com/downloads/postgres-postgresql-downloads Yes, or for MacOS. -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com When a patient asks the doctor, "Am I going to die?", he means "Am I going to die soon?"
Re: proposal: schema variables
On Wed, Nov 20, 2024 at 21:14, Dmitry Dolgov <9erthali...@gmail.com> wrote:
>
> On Tue, Nov 19, 2024 at 08:14:01PM +0100, Pavel Stehule wrote:
> > Hi
> >
> > I wrote POC of VARIABLE(varname) syntax support
>
> Thanks, the results look good. I'm still opposed to the idea of having a
> warning, and think it has to be an error -- but it's my subjective
> opinion. Let's see if there are more votes on that topic.

An error would break the possibility of easily using session variables like Oracle's package variables. It increases the effort of transformation or porting, because you would have to identify the variables inside queries and wrap them in the fence. On the other hand, extensions that can read a query after transformation can easily detect unwrapped variables and raise an error. It would be very easy to implement this check in plpgsql_check, for example, or as a plpgsql.extra_check.

In my ideal world, the shadowing warning would be enabled by default and the unfencing warning disabled by default. But I have no problem with activating both warnings by default. I think warnings are enough, because if there is some risk, the shadowing warning is activated. And my experience with warnings is rather positive; people do read them.

Regards,

Pavel
Re: IMPORTANT: Out-of-cycle release scheduled for November 21, 2024
On Wed, Nov 20, 2024 at 10:12:55PM -0500, Jonathan Katz wrote: > On 11/20/24 10:08 PM, Bruce Momjian wrote: > > Yes, or for MacOS. > > Well, why did EDB remove them? We didn't issue any guidance to remove > downloads. We only provided guidance to users on decision making about > whether to wait or not around the upgrade. All of the other packages hosted > on community infrastructure (and AFAICT other OS distros) are all available. I don't know. -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com When a patient asks the doctor, "Am I going to die?", he means "Am I going to die soon?"
Re: IMPORTANT: Out-of-cycle release scheduled for November 21, 2024
On Fri, Nov 15, 2024 at 06:28:42PM -0500, Jonathan Katz wrote:
> We're scheduling an out-of-cycle release on November 21, 2024 to address two
> regressions that were released as part of the November 14, 2024 update
> release[1]. As part of this release, we will issue fixes for all supported
> versions (17.2, 16.6, 15.10, 14.15, 13.20), and for 12.22, even though
> PostgreSQL 12 is now EOL.
>
> A high-level description of the regressions is as follows.
>
> 1. The fix for CVE-2024-10978 prevented `ALTER USER ... SET ROLE ...` from
> having any effect[2]. This will be fixed in the upcoming release.
>
> 2. Certain PostgreSQL extensions took a dependency on an Application Binary
> Interface (ABI) that was modified in this release and caused them to
> break[3]. Currently, this can be mitigated by rebuilding the extensions
> against the updated definition.
>
> Please follow all standard guidelines for commits ahead of the release.
> Thanks for your help in assisting with this release,

I want to point out a complexity of this out-of-cycle release. Our 17.1, etc. releases had four CVEs:

https://www.postgresql.org/message-id/173159332163.1547975.13346191756810493...@wrigleys.postgresql.org

so when we decided to remove the downloads and encourage people to wait for the 17.2, etc. releases, we had the known CVEs in Postgres releases with no recommended way to fix them. I am not sure what we could have done differently, but I am surprised we didn't get more complaints about the security situation we put users in.

--
Bruce Momjian https://momjian.us
EDB https://enterprisedb.com

When a patient asks the doctor, "Am I going to die?", he means "Am I going to die soon?"
Re: 039_end_of_wal: error in "xl_tot_len zero" test
On Fri, Nov 15, 2024 at 11:21 AM Thomas Munro wrote: > Not sure yet what is different in this environment or why you're > suddenly noticing on 13.17. The logic has been there since 13.13 (ie > it was backpatched then). Hi Christoph, Also why only this branch, when they all have it? Reproducible or one-off? Do you have any more clues?
Re: pg_rewind WAL segments deletion pitfall
Alvaro Herrera wrote: > Oh wow, thanks for noticing that. I had already rewritten the commit > message to some extent, but "master" had remained. Now I pushed the > patch to branches 14+, having replaced it as you suggested. When doing some unrelated work I noticed that in the new test 010_keep_recycled_wals.pl the server fails to reload the configuration file. The line it complains about is archive_command = '/usr/bin/perl -e 'exit(1)'' The test still succeeds for some reason. -- Antonin Houska Web: https://www.cybertec-postgresql.com
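The line quoted above fails to parse because, in postgresql.conf, a single quote inside a quoted value must be doubled (or backslash-escaped), so the inner quotes terminate the string early. A fixed version of the test's setting could presumably look like this (a sketch; using /bin/false is just an alternative way to get a command that always exits non-zero):

```
# broken: the embedded quotes end the string early
archive_command = '/usr/bin/perl -e 'exit(1)''

# valid: double the embedded single quotes
archive_command = '/usr/bin/perl -e ''exit(1)'''

# or avoid nested quoting entirely
archive_command = '/bin/false'
```

This would also explain why the test still succeeds: the reload fails, so the server simply keeps the previous (working) configuration.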
Re: IMPORTANT: Out-of-cycle release scheduled for November 21, 2024
On 11/20/24 9:50 PM, Jonathan S. Katz wrote: On 11/20/24 9:48 PM, Tom Lane wrote: "David G. Johnston" writes: On Wed, Nov 20, 2024 at 7:18 PM Bruce Momjian wrote: so when we decided to remove the downloads Can you elaborate on who "we" is here? More to the point, what downloads were removed? I still see the source tarballs in the usual place [1]. If some packager(s) removed or never posted derived packages, that's on them not the project. Downloads weren't removed, and I don't see why we'd want to do so in this case. Maybe here's the confusion - EDB doesn't have the downloads for the latest release posted on the Windows installer page: https://www.enterprisedb.com/downloads/postgres-postgresql-downloads Jonathan
Re: POC, WIP: OR-clause support for indexes
On Wed, Nov 20, 2024 at 8:20 AM Andrei Lepikhov wrote:
> On 18/11/2024 06:19, Alexander Korotkov wrote:
> > On Fri, Nov 15, 2024 at 3:27 PM Alexander Korotkov wrote:
> > Here is the next revision of this patch. No material changes,
> > adjustments for comments and commit message.
> I have passed through the code and found no issues. Maybe only the phrase:
> "eval_const_expressions() will be simplified if there is more than one."
> which is used in both patches: here, the 'will' may be removed, as for me.

Exactly the same wording is used in match_index_to_operand(). So, I think we can keep this.

> Also, I re-read the thread, and AFAICS, no other issues remain. So, I
> think it would be OK to move the status of this feature to 'ready for
> committer'.

Yes, I also re-read the thread. One thing that caught my eye is that Robert didn't answer my point that, as we generally don't care about lazy parameter evaluation while pushing quals as index conds, we don't have to do this in this patch either. I think there was ample time to express disagreement, if any. Even if this question comes up again, it's a well-isolated issue which could be nailed down later. I'm going to push this if there are no objections.

Links.
1. https://www.postgresql.org/message-id/CAPpHfdt8kowRDUkmOnO7_WJJQ1uk%2BO379JiZCk_9_Pt5AQ4%2B0w%40mail.gmail.com

-- Regards, Alexander Korotkov Supabase
Redundant initialization of table AM oid in relcache
Hi,

I just found that the 'relam' field is assigned twice during relcache initialization. We have a specific area for initializing AM-related stuff several lines later:

/*
 * initialize the table am handler
 */
relation->rd_rel->relam = HEAP_TABLE_AM_OID;
relation->rd_tableam = GetHeapamTableAmRoutine();

so the previous assignment seems unneeded and can be removed.

-- Regards, Jingtang

0001-Remove-redundant-initialization-of-table-AM-oid-in-r.patch Description: Binary data
Re: IMPORTANT: Out-of-cycle release scheduled for November 21, 2024
On Wed, Nov 20, 2024 at 7:18 PM Bruce Momjian wrote: > so when we decided to remove the downloads Can you elaborate on who "we" is here? I don't recall this event happening. I suppose "encouraging people to wait" is arguably a bad position to take compared to directing them to a page on our wiki where the risk factors are laid out so they can make an informed decision based upon their situation. But that seems like a person-to-person matter and not something the project can take responsibility for or control. So, "immediately create a wiki page when PR-level problems arise" could be added to the "could have done better" list, so people have a URL to send instead of off-the-cuff advice. Obviously "alter role set role" is quite a common usage in our community, yet we lack any regression or TAP tests exercising it. There we could have done better and caught the bug in the CVE fix. If the CVEs do have mitigations available, those should probably be noted even if we expect people to apply the minor updates that remove the vulnerability. If we didn't reason through and write out such mitigations for any of these 4, that would be something to consider going forward. David J.
Re: Add a write_to_file member to PgStat_KindInfo
On Wed, Nov 20, 2024 at 05:13:18PM +, Bertrand Drouvot wrote:
> I don't have a strong opinion for this particular case here (I think the code
> is harder to read but yeah there is some code reduction): so I'm fine with
> v2 too.

Well, I like the enthusiasm of having tests, but injection_points stats being persistent on disk is the only write property I want to keep in the module, because it can be useful across clean restarts. Your patch results in a weird mix where you now need a total of four stats IDs rather than two: one more for the fixed-numbered case and one more for the variable-numbered case, to be able to control the write/no-write property set in shared memory.

The correct design, if we were to control the write/no-write for individual entries, would be to store a state flag in each individual entry and decide if an entry should be flushed or not, which would require an additional callback to check the state of an individual entry at write time. Not sure if we really need that, TBH, until someone comes up with a case for it.

Your patch adding backend statistics would be a much better fit to add a test for this specific property, so let's keep injection_points out of it, even if it means that we would only have coverage for variable-numbered stats. So let's keep it simple here, and just set .write_to_file to true for both injection_points stats kinds per the scope of this thread. Another point is that injection_points acts as a template for custom pgstats; this change would make the code harder to use as a reference. Point taken that the tests added in v2 show that the patch works.

-- Michael

signature.asc Description: PGP signature
Re: Make COPY format extendable: Extract COPY TO format implementations
Hi, In "Re: Make COPY format extendable: Extract COPY TO format implementations" on Wed, 20 Nov 2024 14:14:27 -0800, Masahiko Sawada wrote: > I've extracted the changes to refactor COPY TO/FROM to use the format > callback routines from v23 patch set, which seems to be a better patch > split to me. Also, I've reviewed these changes and made some changes > on top of them. The attached patches are: > > 0001: make COPY TO use CopyToRoutine. > 0002: minor changes to 0001 patch. will be fixed up. > 0003: make COPY FROM use CopyFromRoutine. > 0004: minor changes to 0003 patch. will be fixed up. > > I've confirmed that v24 has a similar performance improvement to v23. > Please check these extractions and minor change suggestions. Thanks. Here are my comments: 0002: +/* TEXT format */ "text" may be better than "TEXT". We use "text" not "TEXT" in other places. +static const CopyToRoutine CopyToRoutineText = { + .CopyToStart = CopyToTextLikeStart, + .CopyToOutFunc = CopyToTextLikeOutFunc, + .CopyToOneRow = CopyToTextOneRow, + .CopyToEnd = CopyToTextLikeEnd, +}; +/* BINARY format */ "binary" may be better than "BINARY". We use "binary" not "BINARY" in other places. +static const CopyToRoutine CopyToRoutineBinary = { + .CopyToStart = CopyToBinaryStart, + .CopyToOutFunc = CopyToBinaryOutFunc, + .CopyToOneRow = CopyToBinaryOneRow, + .CopyToEnd = CopyToBinaryEnd, +}; +/* Return COPY TO routines for the given option */ option -> options +static const CopyToRoutine * +CopyToGetRoutine(CopyFormatOptions opts) 0003: diff --git a/src/include/commands/copyapi.h b/src/include/commands/copyapi.h index 99981b1579..224fda172e 100644 --- a/src/include/commands/copyapi.h +++ b/src/include/commands/copyapi.h @@ -56,4 +56,46 @@ typedef struct CopyToRoutine void(*CopyToEnd) (CopyToState cstate); } CopyToRoutine; +/* + * API structure for a COPY FROM format implementation. Note this must be Should we remove a tab character here? 
0004: diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c index e77986f9a9..7f1de8a42b 100644 --- a/src/backend/commands/copyfrom.c +++ b/src/backend/commands/copyfrom.c @@ -106,31 +106,65 @@ typedef struct CopyMultiInsertInfo /* non-export function prototypes */ static void ClosePipeFromProgram(CopyFromState cstate); - /* - * CopyFromRoutine implementations for text and CSV. + * built-in format-specific routines. One-row callbacks are defined in built-in -> Built-in /* - * CopyFromTextLikeInFunc - * - * Assign input function data for a relation's attribute in text/CSV format. + * COPY FROM routines for built-in formats. ++ "+" -> " *" +/* TEXT format */ TEXT -> text? +/* BINARY format */ BINARY -> binary? +/* Return COPY FROM routines for the given option */ option -> options +static const CopyFromRoutine * +CopyFromGetRoutine(CopyFormatOptions opts) diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c index 0447c4df7e..5416583e94 100644 --- a/src/backend/commands/copyfromparse.c +++ b/src/backend/commands/copyfromparse.c +static bool CopyFromTextLikeOneRow(CopyFromState cstate, ExprContext *econtext, + Datum *values, bool *nulls, bool is_csv); Oh, I didn't know that we don't need inline in a function declaration. @@ -1237,7 +1219,7 @@ CopyReadLine(CopyFromState cstate, bool is_csv) /* * CopyReadLineText - inner loop of CopyReadLine for text mode */ -static pg_attribute_always_inline bool +static bool CopyReadLineText(CopyFromState cstate, bool is_csv) Is this an intentional change? CopyReadLineText() has "bool in_csv". Thanks, -- kou
Re: IMPORTANT: Out-of-cycle release scheduled for November 21, 2024
"David G. Johnston" writes: > On Wed, Nov 20, 2024 at 7:18 PM Bruce Momjian wrote: >> so when we decided to remove the downloads > Can you elaborate on who "we" is here? More to the point, what downloads were removed? I still see the source tarballs in the usual place [1]. If some packager(s) removed or never posted derived packages, that's on them not the project. regards, tom lane [1] https://www.postgresql.org/ftp/source/
Re: IMPORTANT: Out-of-cycle release scheduled for November 21, 2024
On 11/20/24 9:18 PM, Bruce Momjian wrote: On Fri, Nov 15, 2024 at 06:28:42PM -0500, Jonathan Katz wrote: We're scheduling an out-of-cycle release on November 21, 2024 to address two regressions that were released as part of the November 14, 2024 update release[1]. As part of this release, we will issue fixes for all supported versions (17.2, 16.6, 15.10, 14.15, 13.20), and for 12.22, even though PostgreSQL 12 is now EOL. A high-level description of the regressions is as follows. 1. The fix for CVE-2024-10978 prevented `ALTER USER ... SET ROLE ...` from having any effect[2]. This will be fixed in the upcoming release. 2. Certain PostgreSQL extensions took a dependency on an Application Binary Interface (ABI) that was modified in this release and caused them to break[3]. Currently, this can be mitigated by rebuilding the extensions against the updated definition. Please follow all standard guidelines for commits ahead of the release. Thanks for your help in assisting with this release, I want to point out a complexity of this out-of-cycle release. Our 17.1, etc. releases had four CVEs: https://www.postgresql.org/message-id/173159332163.1547975.13346191756810493...@wrigleys.postgresql.org so when we decided to remove the downloads and encourage people to wait for the 17.2 etc. releases, we had the known CVEs in Postgres releases with no recommended way to fix them. I am not sure what we could have done differently, but I am surprised we didn't get more complaints about the security situation we put them in. The announcement[1] specified the issues and advised waiting if users were impacted by them directly (and tried to be as specific as possible), and gave guidance to help users avoid upgrading and then ending up in a situation where they're broken, regardless of whether they're impacted by the CVE or not (e.g. they don't have PL/Perl installed).
That said, while it's certainly advisable to upgrade based on having CVEs in a release, many upgrade patterns are determined by the CVE score[2]. For example, a HIGH score (7.0 - 8.9 - our highest for this release was 8.8; 3 of them were less than 5.0) often dictates upgrading within 14-30 days of announcing the CVE, and lower scores having more time. This could be why people didn't complain, particularly because we got the announcement out 36 hours after the release, and stated the updates would be available within the next week. Thanks, Jonathan [1] https://www.postgresql.org/about/news/out-of-cycle-release-scheduled-for-november-21-2024-2958/ [2] https://www.first.org/cvss/v3.1/specification-document
Re: IMPORTANT: Out-of-cycle release scheduled for November 21, 2024
On Wed, Nov 20, 2024 at 07:40:36PM -0700, David G. Johnston wrote: > On Wed, Nov 20, 2024 at 7:18 PM Bruce Momjian wrote: > > so when we decided to remove the downloads > > > Can you elaborate on who "we" is here? > > I don't recall this event happening. Uh, I only see 17.0 available for Windows, MacOS, and all EDB downloads, not 17.1: https://www.enterprisedb.com/downloads/postgres-postgresql-downloads I am not sure if other distributions removed 17.1. > I suppose "encouraging people to wait" is arguably a bad position to take > compared to directing them to a page on our wiki where the risk factors are > laid out so they can make an informed decision based upon their situation. > But > that seems like a person-to-person matter and not something the project can > take responsibility for or control. So, "immediately create a wiki page when > PR-level problems arise" could be added to the "could have done better" list, > so people have a URL to send instead of off-the-cuff advice. Interesting. > Obviously "alter role set role" is a quite common usage in our community yet > we > lack any regression or tap tests exercising it. That we could have done > better > and caught the bug in the CVE fix. Yes, I saw a lot of reports about this failure. -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com When a patient asks the doctor, "Am I going to die?", he means "Am I going to die soon?"
Re: IMPORTANT: Out-of-cycle release scheduled for November 21, 2024
On Wed, Nov 20, 2024 at 09:49:27PM -0500, Jonathan Katz wrote: > That said, while it's certainly advisable to upgrade based on having CVEs in > a release, many upgrade patterns are determined by the CVE score[2]. For > example, a HIGH score (7.0 - 8.9 - our highest for this release was 8.8; 3 > of them were less than 5.0) often dictates upgrading within 14-30 days of > announcing the CVE, and lower scores having more time. This could be why > people didn't complain, particularly because we got the announcement out 36 > hours after the release, and stated the updates would be available within > the next week. Makes sense. This is the discussion I wanted to have. Thanks. -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com When a patient asks the doctor, "Am I going to die?", he means "Am I going to die soon?"
Re: Introduce XID age and inactive timeout based replication slot invalidation
On Tue, 19 Nov 2024 at 12:51, Nisha Moond wrote: > > On Thu, Nov 14, 2024 at 9:14 AM vignesh C wrote: > > > > On Wed, 13 Nov 2024 at 15:00, Nisha Moond wrote: > > > > > > Please find the v48 patch attached. > > > > > 2) Currently it allows a minimum value of less than 1 second like in > > milliseconds, I feel we can have some minimum value at least something > > like checkpoint_timeout: > > diff --git a/src/backend/utils/misc/guc_tables.c > > b/src/backend/utils/misc/guc_tables.c > > index 8a67f01200..367f510118 100644 > > --- a/src/backend/utils/misc/guc_tables.c > > +++ b/src/backend/utils/misc/guc_tables.c > > @@ -3028,6 +3028,18 @@ struct config_int ConfigureNamesInt[] = > > NULL, NULL, NULL > > }, > > > > + { > > + {"replication_slot_inactive_timeout", PGC_SIGHUP, > > REPLICATION_SENDING, > > + gettext_noop("Sets the amount of time a > > replication slot can remain inactive before " > > +"it will be invalidated."), > > + NULL, > > + GUC_UNIT_S > > + }, > > + &replication_slot_inactive_timeout, > > + 0, 0, INT_MAX, > > + NULL, NULL, NULL > > + }, > > > > Currently, the feature is disabled by default when > replication_slot_inactive_timeout = 0. However, if we set a minimum > value, the default_val cannot be less than min_val, making it > impossible to use 0 to disable the feature. > Thoughts or any suggestions? We could implement this similarly to how the vacuum_buffer_usage_limit GUC is handled. Setting the value to 0 would allow the operation to use any amount of shared_buffers. Otherwise, valid sizes would range from 128 kB to 16 GB. Similarly, we can modify check_replication_slot_inactive_timeout to behave in the same way as check_vacuum_buffer_usage_limit function. Regards, Vignesh
Re: IMPORTANT: Out-of-cycle release scheduled for November 21, 2024
On 11/20/24 10:08 PM, Bruce Momjian wrote: On Wed, Nov 20, 2024 at 09:51:09PM -0500, Jonathan Katz wrote: On 11/20/24 9:50 PM, Jonathan S. Katz wrote: On 11/20/24 9:48 PM, Tom Lane wrote: "David G. Johnston" writes: On Wed, Nov 20, 2024 at 7:18 PM Bruce Momjian wrote: so when we decided to remove the downloads Can you elaborate on who "we" is here? More to the point, what downloads were removed? I still see the source tarballs in the usual place [1]. If some packager(s) removed or never posted derived packages, that's on them not the project. Downloads weren't removed, and I don't see why we'd want to do so in this case. Maybe here's the confusion - EDB doesn't have the downloads for the latest released posted on the Windows installer: https://www.enterprisedb.com/downloads/postgres-postgresql-downloads Yes, or for MacOS. Well, why did EDB remove them? We didn't issue any guidance to remove downloads. We only provided guidance to users on decision making about whether to wait or not around the upgrade. All of the other packages hosted on community infrastructure (and AFAICT other OS distros) are all available. Jonathan
Re: IMPORTANT: Out-of-cycle release scheduled for November 21, 2024
On 11/20/24 9:48 PM, Tom Lane wrote: "David G. Johnston" writes: On Wed, Nov 20, 2024 at 7:18 PM Bruce Momjian wrote: so when we decided to remove the downloads Can you elaborate on who "we" is here? More to the point, what downloads were removed? I still see the source tarballs in the usual place [1]. If some packager(s) removed or never posted derived packages, that's on them not the project. Downloads weren't removed, and I don't see why we'd want to do so in this case. Jonathan
Re: Proposals for EXPLAIN: rename ANALYZE to EXECUTE and extend VERBOSE
On Wed, 2024-11-20 at 22:54 +0100, Vik Fearing wrote: > On 20/11/2024 22:22, David Rowley wrote: > > On Thu, 21 Nov 2024 at 08:30, Guillaume Lelarge > > wrote: > > > OK, I'm fine with this. v4 patch attached with one plan showing read, > > > written, and dirtied buffers. > > I think this might be a good time for anyone out there who is against > > turning on BUFFERS when ANALYZE is on to speak up. > > > > Votes for changing this so far seem to be: Me, Michael Christofides, > > Guillaume Lelarge, Robert, Greg Sabino Mullane, and David Fetter (from > > 2020) [1]. > > Add me to the pro list, please. > > https://www.postgresql.org/message-id/b3197ba8-225f-f53c-326d-5b1756c77...@postgresfriends.org Me, too! https://postgr.es/m/790e175d16cca11244907d3366a6fa376c33e882.ca...@cybertec.at Yours, Laurenz Albe
Re: Logical replication timeout
On Wed, 6 Nov 2024 at 13:07, RECHTÉ Marc wrote:
>
> Hello,
>
> For some unknown reason (probably a very big transaction at the source), we
> experienced a logical decoding breakdown, due to a timeout from the
> subscriber side (either wal_receiver_timeout or a connection drop by network
> equipment due to inactivity).
>
> The problem is that, due to that failure, the wal_receiver process stops.
> When the wal_sender is ready to send some data, it finds the connection
> broken and exits. A new wal_sender process is created that restarts from the
> beginning (restart LSN). This is an endless loop.
>
> Checking the network connection between wal_sender and wal_receiver, we
> found that no traffic occurs for hours.
>
> We first increased wal_receiver_timeout up to 12h and still got a
> disconnection on the receiver side:
>
> 2024-10-17 16:31:58.645 GMT [1356203:2] user=,db=,app=,client= ERROR: terminating logical replication worker due to timeout
> 2024-10-17 16:31:58.648 GMT [849296:212] user=,db=,app=,client= LOG: background worker "logical replication worker" (PID 1356203) exited with exit code 1
>
> Then we set this parameter to 0, but then got disconnected by the network
> (note the slight difference in the message):
>
> 2024-10-21 11:45:42.867 GMT [1697787:2] user=,db=,app=,client= ERROR: could not receive data from WAL stream: could not receive data from server: Connection timed out
> 2024-10-21 11:45:42.869 GMT [849296:40860] user=,db=,app=,client= LOG: background worker "logical replication worker" (PID 1697787) exited with exit code 1
>
> The message is generated in the libpqrcv_receive function
> (replication/libpqwalreceiver/libpqwalreceiver.c), which calls
> pqsecure_raw_read (interfaces/libpq/fe-secure.c).
>
> The last message, "Connection timed out", is the errno translation from the
> recv system function:
>
> ETIMEDOUT Connection timed out (POSIX.1-2001)
>
> When those timeouts occurred, the sender was still busy deleting files from
> data/pg_replslot/bdcpb21_sene, accumulating more than 6 million small
> ".spill" files. It seems this very long pause happens at the cleanup stage,
> where PG is blindly trying to delete those files.
>
> strace on the wal sender shows tons of calls like:
>
> unlink("pg_replslot/bdcpb21_sene/xid-2721821917-lsn-439C-0.spill") = -1 ENOENT (No such file or directory)
> unlink("pg_replslot/bdcpb21_sene/xid-2721821917-lsn-439C-100.spill") = -1 ENOENT (No such file or directory)
> unlink("pg_replslot/bdcpb21_sene/xid-2721821917-lsn-439C-200.spill") = -1 ENOENT (No such file or directory)
> unlink("pg_replslot/bdcpb21_sene/xid-2721821917-lsn-439C-300.spill") = -1 ENOENT (No such file or directory)
> unlink("pg_replslot/bdcpb21_sene/xid-2721821917-lsn-439C-400.spill") = -1 ENOENT (No such file or directory)
> unlink("pg_replslot/bdcpb21_sene/xid-2721821917-lsn-439C-500.spill") = -1 ENOENT (No such file or directory)
>
> This occurs in ReorderBufferRestoreCleanup
> (backend/replication/logical/reorderbuffer.c). The call stack suggests this
> probably occurs in DecodeCommit or DecodeAbort
> (backend/replication/logical/decode.c):
>
> unlink("pg_replslot/bdcpb21_sene/xid-2730444214-lsn-43A6-8800.spill") = -1 ENOENT (No such file or directory)
>
> /usr/lib64/libc-2.17.so(unlink+0x7) [0xf12e7]
> /usr/pgsql-15/bin/postgres(ReorderBufferRestoreCleanup.isra.17+0x5d) [0x769e3d]
> /usr/pgsql-15/bin/postgres(ReorderBufferCleanupTXN+0x166) [0x76aec6] <=== replication/logical/reorderbuffer.c:1480 (but this (static) function is only used within this module ...)
> /usr/pgsql-15/bin/postgres(xact_decode+0x1e7) [0x75f217] <=== replication/logical/decode.c:175
> /usr/pgsql-15/bin/postgres(LogicalDecodingProcessRecord+0x73) [0x75eee3] <=== replication/logical/decode.c:90, calls the rmgr.rm_decode(ctx, &buf) function, one of the 6 resource manager methods
> /usr/pgsql-15/bin/postgres(XLogSendLogical+0x4e) [0x78294e]
> /usr/pgsql-15/bin/postgres(WalSndLoop+0x151) [0x785121]
> /usr/pgsql-15/bin/postgres(exec_replication_command+0xcba) [0x785f4a]
> /usr/pgsql-15/bin/postgres(PostgresMain+0xfa8) [0x7d0588]
> /usr/pgsql-15/bin/postgres(ServerLoop+0xa8a) [0x493b97]
> /usr/pgsql-15/bin/postgres(PostmasterMain+0xe6c) [0x74d66c]
> /usr/pgsql-15/bin/postgres(main+0x1c5) [0x494a05]
> /usr/lib64/libc-2.17.so(__libc_start_main+0xf4) [0x22554]
> /usr/pgsql-15/bin/postgres(_start+0x28) [0x494fb8]
>
> We did not find any other option than deleting the subscription to stop that
> loop and start a new one (thus losing transactions).
>
> The publisher is PostgreSQL 15.6
> The subscriber is PostgreSQL 14.5
>
> Thanks

Hi,

Do you have a reproducible test case for the above scenario? Please share the same. I am also trying to reproduce the above issue by generating a large number of spill files.

Thanks and Regards,
Meson rebuilds and reinstalls autoinc and refint libraries during regression tests.
Hello!

I accidentally found that the shared libraries autoinc.so and refint.so are overwritten in the "tmp_install" folder during the regression tests. This happens because these libraries are built again for regression test purposes and overwritten by the "install_test_files" step [0]. I tried to change this behavior, but there are a few problems:

- It seems Meson doesn't have a function to just copy a file from one build directory to another.
- There is no way to install the product of the "shared_module()" function into two different locations.
- The "configure_file()" function in "copy mode" runs at setup time and cannot be used.
- The "fs.copyfile()" function runs at build time, but I couldn't find a way to run it after the necessary modules have been built, so this function fails with a "file not found" error. Also, it needs Meson version >= 0.64.
- And there is no single cross-platform copy program. I tried to use "cp" on Linux and "cmd /c copy..." on Windows, but the code grew very much in size, especially after fixing the slash in the return value of the "meson.current_build_dir()" function on Windows. That slash arrives from the "subdir('src/test')" call [1].

To avoid double building of the autoinc and refint libraries, I suggest using the "custom_target()" Meson function and our own copy program. The "custom_target()" function can depend on contrib modules and use any script to just copy libraries to the regression tests build directory. I included a patch which adds a copy.py script to the regression tests directory and uses it to "build" the autoinc and refint modules.

What do you think? Thanks!

Best regards, Roman Zharkov.
[0] https://github.com/postgres/postgres/blob/f95da9f0e091ddbc5d50ec6c11be772da009b24e/src/test/regress/meson.build#L46
[1] https://github.com/postgres/postgres/blob/f95da9f0e091ddbc5d50ec6c11be772da009b24e/meson.build#L3095

diff --git a/src/test/regress/copy.py b/src/test/regress/copy.py
new file mode 100644
index ..9996e64474649855606a8968cf2f3129e792042f
--- /dev/null
+++ b/src/test/regress/copy.py
@@ -0,0 +1,9 @@
+#! /usr/bin/env python3
+
+import sys
+import shutil
+
+if len(sys.argv) == 3:
+    shutil.copy(sys.argv[1], sys.argv[2])
+else:
+    raise Exception("this homemade copy program accepts two arguments")
diff --git a/src/test/regress/meson.build b/src/test/regress/meson.build
index 5a9be73531e9c38595d7aabbd1b17d0d94c2a8fd..df010b7dc02bf46cfb2bc8c1a2df47fd83f1b598 100644
--- a/src/test/regress/meson.build
+++ b/src/test/regress/meson.build
@@ -43,23 +43,19 @@ regress_module = shared_module('regress',
 )
 test_install_libs += regress_module
 
-# Get some extra C modules from contrib/spi but mark them as not to be
-# installed.
-# FIXME: avoid the duplication.
-
-autoinc_regress = shared_module('autoinc',
-  ['../../../contrib/spi/autoinc.c'],
-  kwargs: pg_test_mod_args,
+autoinc_regress = custom_target('autoinc',
+  command: ['copy.py', autoinc.full_path(), meson.current_build_dir()],
+  output: fs.name(autoinc.full_path()),
+  depends: [autoinc],
+  build_by_default: true,
 )
-test_install_libs += autoinc_regress
 
-refint_regress = shared_module('refint',
-  ['../../../contrib/spi/refint.c'],
-  c_args: refint_cflags,
-  kwargs: pg_test_mod_args,
+refint_regress = custom_target('refint',
+  command: ['copy.py', refint.full_path(), meson.current_build_dir()],
+  output: fs.name(refint.full_path()),
+  depends: [refint],
+  build_by_default: true,
 )
-test_install_libs += refint_regress
-
 tests += {
   'name': 'regress',