Re: Add new error_action COPY ON_ERROR "log"
On Tue, Mar 12, 2024 at 12:22 PM Michael Paquier wrote: > > On Mon, Mar 11, 2024 at 06:00:00PM +0530, Bharath Rupireddy wrote: > > Please see the attached v6 patch set. > > I am tempted to tweak a few things in the docs, the comments and the > tests (particularly adding more patterns for tuples that fail on > conversion while it's clear that there are not enough attributes after > the incorrect values), but that looks roughly OK. +1. But, do you want to add them now or later as a separate patch/discussion altogether? > Wouldn't it be better to squash the patches together, by the way? I guess not. They are two different things IMV. -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Introduce XID age and inactive timeout based replication slot invalidation
Hi, On Fri, Mar 08, 2024 at 10:42:20PM +0530, Bharath Rupireddy wrote: > On Wed, Mar 6, 2024 at 4:49 PM Amit Kapila wrote: > > > > You might want to consider its interaction with sync slots on standby. > > Say, there is no activity on slots in terms of processing the changes > > for slots. Now, we won't perform sync of such slots on standby showing > > them inactive as per your new criteria, whereas the same slots could still > > be valid on the primary as the walsender is still active. This may be more > > of a theoretical point as in a running system there will probably be > > some activity, but I think this needs some thoughts. > > I believe the xmin and catalog_xmin of the sync slots on the standby > keep advancing depending on the slots on the primary, no? If yes, the > XID age based invalidation shouldn't be a problem. > > I believe there are no walsenders started for the sync slots on the > standbys, right? If yes, the inactive timeout based invalidation also > shouldn't be a problem. Because, the inactive timeouts for a slot are > tracked only for walsenders because they are the ones that typically > hold replication slots for longer durations and for real replication > use. We did a similar thing in a recent commit [1]. > > Is my understanding right? Do you still see any problems with it? Would it make sense to "simply" discard/prevent those kinds of invalidations for "synced" slots on standby? I mean, do they make sense given the fact that those slots are not usable until the standby is promoted? Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: DROP DATABASE is interruptible
Hi, 13.07.2023 23:52, Andres Freund wrote: Backpatching indeed was no fun. Not having BackgroundPsql.pm was the worst part. But also a lot of other conflicts in tests... Took me 5-6 hours or so. But I now finally pushed the fixes. Hope the buildfarm agrees with it... Thanks for the review! I've discovered that the test 037_invalid_database, introduced with c66a7d75e, hangs when a server built with -DCLOBBER_CACHE_ALWAYS or with debug_discard_caches = 1 set via TEMP_CONFIG: echo "debug_discard_caches = 1" >/tmp/extra.config TEMP_CONFIG=/tmp/extra.config make -s check -C src/test/recovery/ PROVE_TESTS="t/037*" # +++ tap check in src/test/recovery +++ [09:05:48] t/037_invalid_database.pl .. 6/? regress_log_037_invalid_database ends with: [09:05:51.622](0.021s) # issuing query via background psql: # CREATE DATABASE regression_invalid_interrupt; # BEGIN; # LOCK pg_tablespace; # PREPARE TRANSACTION 'lock_tblspc'; [09:05:51.684](0.062s) ok 8 - blocked DROP DATABASE completion I see two backends waiting: law 2420132 2420108 0 09:05 ? 00:00:00 postgres: node: law postgres [local] DROP DATABASE waiting law 2420135 2420108 0 09:05 ? 00:00:00 postgres: node: law postgres [local] startup waiting and the latter's stack trace: #0 0x7f65c8fd3f9a in epoll_wait (epfd=9, events=0x563c40e15478, maxevents=1, timeout=-1) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30 #1 0x563c3fa9a9fa in WaitEventSetWaitBlock (set=0x563c40e15410, cur_timeout=-1, occurred_events=0x7fff579dda80, nevents=1) at latch.c:1570 #2 0x563c3fa9a8e4 in WaitEventSetWait (set=0x563c40e15410, timeout=-1, occurred_events=0x7fff579dda80, nevents=1, wait_event_info=50331648) at latch.c:1516 #3 0x563c3fa99b14 in WaitLatch (latch=0x7f65c5e112e4, wakeEvents=33, timeout=0, wait_event_info=50331648) at latch.c:538 #4 0x563c3fac7dee in ProcSleep (locallock=0x563c40e41e80, lockMethodTable=0x563c4007cba0 ) at proc.c:1339 #5 0x563c3fab4160 in WaitOnLock (locallock=0x563c40e41e80, owner=0x563c40ea5af8) at lock.c:1816 #6 0x563c3fab2c80 in LockAcquireExtended (locktag=0x7fff579dde30, lockmode=1, sessionLock=false, dontWait=false, reportMemoryError=true, locallockp=0x7fff579dde28) at lock.c:1080 #7 0x563c3faaf86d in LockRelationOid (relid=1213, lockmode=1) at lmgr.c:116 #8 0x563c3f537aff in relation_open (relationId=1213, lockmode=1) at relation.c:55 #9 0x563c3f5efde9 in table_open (relationId=1213, lockmode=1) at table.c:44 #10 0x563c3fca2227 in CatalogCacheInitializeCache (cache=0x563c40e8fe80) at catcache.c:980 #11 0x563c3fca255e in InitCatCachePhase2 (cache=0x563c40e8fe80, touch_index=true) at catcache.c:1083 #12 0x563c3fcc0556 in InitCatalogCachePhase2 () at syscache.c:184 #13 0x563c3fcb7db3 in RelationCacheInitializePhase3 () at relcache.c:4317 #14 0x563c3fce2748 in InitPostgres (in_dbname=0x563c40e54000 "postgres", dboid=5, username=0x563c40e53fe8 "law", useroid=0, flags=1, out_dbname=0x0) at postinit.c:1177 #15 0x563c3fad90a7 in PostgresMain (dbname=0x563c40e54000 "postgres", username=0x563c40e53fe8 "law") at postgres.c:4229 #16 0x563c3f9f01e4 in BackendRun (port=0x563c40e45360) at postmaster.c:4475 It looks like no new backend can be started due to the pg_tablespace lock, when a new relcache file is needed during the backend initialization. Best regards, Alexander
Re: Built-in CTYPE provider
On 08.03.24 02:00, Jeff Davis wrote: And here's v22 (I didn't post v21). I committed Unicode property tables and functions, and the simple case mapping. I separated out the full case mapping changes (based on SpecialCasing.txt) into patch 0006. 0002: Basic builtin collation provider that only supports "C". Overall, this patch looks sound. In the documentation, let's make the list of locale providers an actual list instead of a sequence of <para>s. We had some discussion on the initdb option --builtin-locale and whether it should be something more general. I'm ok with leaving it like this for now and maybe considering it as an "open item" for PG17. In errmsg("parameter \"locale\" must be specified"), make "locale" a placeholder. (See commit 36a14afc076.) It seems the builtin provider accepts both "C" and "POSIX" as locale names, but the documentation says it must be "C". Maybe we don't need to accept "POSIX"? (Seeing that there are no plans for "POSIX.UTF-8", maybe we just ignore the "POSIX" spelling altogether?) Speaking of which, the code in postinit.c is inconsistent in that respect with builtin_validate_locale(). Shouldn't postinit.c use builtin_validate_locale(), to keep it consistent? Or, there could be a general function that accepts a locale provider and a locale string and validates everything together? In initdb.c, this message printf(_("The database cluster will be initialized with no locale.\n")); sounds a bit confusing. I think it's ok to show "C" as a locale. I'm not sure we need to change the logic here. Also in initdb.c, this message pg_fatal("locale must be specified unless provider is libc"); should be flipped around, like locale must be specified if provider is %s In pg_dump.c, dumpDatabase(), there are some new warning messages that are not specifically about the builtin provider. Are those existing deficiencies? It's not clear to me. What are the changes in the pg_upgrade test about? Maybe explain the scenario it is trying to test briefly? 0004: Inline some UTF-8 functions to improve performance Makes sense that inlining can be effective here. But why aren't you just inlining the existing function pg_utf_mblen()? Now we have two functions that do the same thing. And the comment at pg_utf_mblen() is removed completely, so it's not clear anymore why it exists. 0005: Add a unicode_strtitle() function and move the implementation for the builtin provider out of formatting.c. In the recent discussion you had expressed some uncertainty about the detailed semantics of this. INITCAP() was copied from Oracle, so we could check there for reference, too. Or we go with full Unicode semantics. I'm not clear on all the differences and tradeoffs, if there are any. In any case, it would be good if there were documentation or a comment that somehow wrote down the resolution of this.
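Regarding the inlining question for 0004: for illustration only, a header-level inline variant would presumably mirror the first-byte checks of the existing pg_utf_mblen(). This is a sketch to make the review comment concrete, not code taken from the patch, and the function name is made up:

/*
 * Sketch of an inlined UTF-8 sequence-length check, mirroring the logic of
 * the existing pg_utf_mblen().  Illustrative only; not the patch's code.
 */
static inline int
utf8_mblen_inline(const unsigned char *s)
{
	if ((*s & 0x80) == 0)
		return 1;
	else if ((*s & 0xe0) == 0xc0)
		return 2;
	else if ((*s & 0xf0) == 0xe0)
		return 3;
	else if ((*s & 0xf8) == 0xf0)
		return 4;
	return 1;					/* invalid first byte: treat as length 1 */
}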
Re: Transaction timeout
> On 11 Mar 2024, at 16:18, Alexander Korotkov wrote: > > I think if checking psql stderr is problematic, checking just logs is > fine. Could we wait for the relevant log messages one by one with > $node->wait_for_log() just like 040_standby_failover_slots_sync.pl do? PFA version with $node->wait_for_log() Best regards, Andrey Borodin. v4-0001-Add-timeouts-TAP-tests.patch Description: Binary data
Re: Disconnect if socket cannot be put into non-blocking mode
On 11/03/2024 16:44, Heikki Linnakangas wrote: While self-reviewing my "Refactoring backend fork+exec code" patches, I noticed this in pq_init(): /* * In backends (as soon as forked) we operate the underlying socket in * nonblocking mode and use latches to implement blocking semantics if * needed. That allows us to provide safely interruptible reads and * writes. * * Use COMMERROR on failure, because ERROR would try to send the error to * the client, which might require changing the mode again, leading to * infinite recursion. */ #ifndef WIN32 if (!pg_set_noblock(MyProcPort->sock)) ereport(COMMERROR, (errmsg("could not set socket to nonblocking mode: %m"))); #endif #ifndef WIN32 /* Don't give the socket to any subprograms we execute. */ if (fcntl(MyProcPort->sock, F_SETFD, FD_CLOEXEC) < 0) elog(FATAL, "fcntl(F_SETFD) failed on socket: %m"); #endif Using COMMERROR here seems bogus. Firstly, if there was a problem with recursion, surely the elog(FATAL) that follows would also be wrong. But more seriously, it's not cool to continue using the connection as if everything is OK, if we fail to put it into non-blocking mode. We should disconnect. (COMMERROR merely logs the error, it does not bail out like ERROR does) Barring objections, I'll commit and backpatch the attached to fix that. Committed. -- Heikki Linnakangas Neon (https://neon.tech)
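The committed diff is not reproduced above, but following the reasoning in the message, a minimal sketch of the fix is simply to fail hard when the socket cannot be switched to non-blocking mode, matching the fcntl() check just below it (a sketch only, not necessarily the exact committed change):

#ifndef WIN32
	/*
	 * Give up on the connection instead of limping along with a blocking
	 * socket; the recursion concern does not apply, as argued above.
	 */
	if (!pg_set_noblock(MyProcPort->sock))
		ereport(FATAL,
				(errmsg("could not set socket to nonblocking mode: %m")));
#endif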
Re: pg_column_toast_chunk_id: a function to get a chunk ID of a TOASTed value
On Sat, 9 Mar 2024 08:50:28 -0600 Nathan Bossart wrote: > On Sat, Mar 09, 2024 at 11:57:18AM +0900, Yugo NAGATA wrote: > > On Fri, 8 Mar 2024 16:17:58 -0600 > > Nathan Bossart wrote: > >> Is this guaranteed to be TOASTed for all possible page sizes? > > > > Should we use block_size? > > > > SHOW block_size \gset > > INSERT INTO test_chunk_id(v1,v2) > > VALUES (repeat('x', 1), repeat('x', (:block_size / 4))); > > > > I think this will work in various page sizes. > > WFM > > > +SHOW block_size; \gset > > + block_size > > + > > + 8192 > > +(1 row) > > I think we need to remove the ';' so that the output of the query is not > saved in the ".out" file. With that change, this test passes when Postgres > is built with --with-blocksize=32. However, many other unrelated tests > begin failing, so I guess this fix isn't tremendously important. I rewrote the patch to use current_setting('block_size') instead of SHOW and \gset as other tests do. Although some tests are failing with block_size=32, I wonder it is a bit better to use "block_size" instead of the constant to make the test more general to some extent. Regards, Yugo Nagata > > -- > Nathan Bossart > Amazon Web Services: https://aws.amazon.com -- Yugo NAGATA >From 5b3be1ca6f8d8bafc1d9ce7bea252f364c9c09a9 Mon Sep 17 00:00:00 2001 From: Yugo Nagata Date: Wed, 29 Mar 2023 09:59:25 +0900 Subject: [PATCH v9] Add pg_column_toast_chunk_id function This function returns the chunk_id of an on-disk TOASTed value, or NULL if the value is un-TOASTed or not on disk. This enables users to know which columns are actually TOASTed. This function is also useful to identify a problematic row when an error like "ERROR: unexpected chunk number ... (expected ...) for toast value" occurs. --- doc/src/sgml/func.sgml | 17 src/backend/utils/adt/varlena.c | 41 src/include/catalog/pg_proc.dat | 3 ++ src/test/regress/expected/misc_functions.out | 21 ++ src/test/regress/sql/misc_functions.sql | 23 +++ 5 files changed, 105 insertions(+) diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index 0bb7aeb40e..3169e2aeb7 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -28552,6 +28552,23 @@ postgres=# SELECT '0/0'::pg_lsn + pd.segment_number * ps.setting::int + :offset + + + + pg_column_toast_chunk_id + +pg_column_toast_chunk_id ( "any" ) +oid + + +Shows the chunk_id of an on-disk +TOASTed value. Returns NULL +if the value is un-TOASTed or not on-disk. +See for details about +TOAST. + + + diff --git a/src/backend/utils/adt/varlena.c b/src/backend/utils/adt/varlena.c index 543afb66e5..84d36781a4 100644 --- a/src/backend/utils/adt/varlena.c +++ b/src/backend/utils/adt/varlena.c @@ -5105,6 +5105,47 @@ pg_column_compression(PG_FUNCTION_ARGS) PG_RETURN_TEXT_P(cstring_to_text(result)); } +/* + * Return the chunk_id of the on-disk TOASTed value. + * Return NULL if the value is unTOASTed or not on disk. 
+ */ +Datum +pg_column_toast_chunk_id(PG_FUNCTION_ARGS) +{ + int typlen; + struct varlena *attr; + struct varatt_external toast_pointer; + + /* On first call, get the input type's typlen, and save at *fn_extra */ + if (fcinfo->flinfo->fn_extra == NULL) + { + /* Lookup the datatype of the supplied argument */ + Oid argtypeid = get_fn_expr_argtype(fcinfo->flinfo, 0); + + typlen = get_typlen(argtypeid); + if (typlen == 0) /* should not happen */ + elog(ERROR, "cache lookup failed for type %u", argtypeid); + + fcinfo->flinfo->fn_extra = MemoryContextAlloc(fcinfo->flinfo->fn_mcxt, + sizeof(int)); + *((int *) fcinfo->flinfo->fn_extra) = typlen; + } + else + typlen = *((int *) fcinfo->flinfo->fn_extra); + + if (typlen != -1) + PG_RETURN_NULL(); + + attr = (struct varlena *) DatumGetPointer(PG_GETARG_DATUM(0)); + + if (!VARATT_IS_EXTERNAL_ONDISK(attr)) + PG_RETURN_NULL(); + + VARATT_EXTERNAL_GET_POINTER(toast_pointer, attr); + + PG_RETURN_OID(toast_pointer.va_valueid); +} + /* * string_agg - Concatenates values and returns string. * diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat index 291ed876fc..443ca854a6 100644 --- a/src/include/catalog/pg_proc.dat +++ b/src/include/catalog/pg_proc.dat @@ -7447,6 +7447,9 @@ { oid => '2121', descr => 'compression method for the compressed datum', proname => 'pg_column_compression', provolatile => 's', prorettype => 'text', proargtypes => 'any', prosrc => 'pg_column_compression' }, +{ oid => '8393', descr => 'chunk ID of on-disk TOASTed value', + proname => 'pg_column_toast_chunk_id', provolatile => 's', prorettype => 'oid', + proargtypes => 'any', prosrc => 'pg_column_toast_chunk_id' }, { oid => '2322', descr => 'total disk space usage for the specified ta
Re: Statistics Import and Export
Greetings, * Corey Huinker (corey.huin...@gmail.com) wrote: > > > One persistent problem is that there is no _safe equivalent to ARRAY_IN, > > so > > > that can always fail on us, though it should only do so if the string > > > passed in wasn't a valid array input format, or the values in the array > > > can't coerce to the attribute's basetype. > > > > That would happen before we even get to being called and there's not > > much to do about it anyway. > > Not sure I follow you here. the ARRAY_IN function calls happen once for > every non-null stavaluesN parameter, and it's done inside the function > because the result type could be the base type for a domain/array type, or > could be the type itself. I suppose we could move that determination to the > caller, but then we'd need to call get_base_element_type() inside a client, > and that seems wrong if it's even possible. Ah, yeah, ok, I see what you're saying here and sure, there's a risk those might ERROR too, but that's outright invalid data then as opposed to a NULL getting passed in. > > > Or one compound command > > > > > > SELECT pg_set_relation_stats(t.oid, ...) > > > pg_set_attribute_stats(t.oid, 'id'::name, ...), > > > pg_set_attribute_stats(t.oid, 'last_name'::name, ...), > > > ... > > > FROM (VALUES('foo.bar'::regclass)) AS t(oid); > > > > > > The second one has the feature that if any one attribute fails, then the > > > whole update fails, except, of course, for the in-place update of > > pg_class. > > > This avoids having an explicit transaction block, but we could get that > > > back by having restore wrap the list of commands in a transaction block > > > (and adding the explicit lock commands) when it is safe to do so. > > > > Hm, I like this approach as it should essentially give us the > > transaction block we had been talking about wanting but without needing > > to explicitly do a begin/commit, which would add in some annoying > > complications. This would hopefully also reduce the locking concern > > mentioned previously, since we'd get the lock needed in the first > > function call and then the others would be able to just see that we've > > already got the lock pretty quickly. > > True, we'd get the lock needed in the first function call, but wouldn't we > also release that very lock before the subsequent call? Obviously we'd be > shrinking the window in which another process could get in line and take a > superior lock, and the universe of other processes that would even want a > lock that blocks us is nil in the case of an upgrade, identical to existing > behavior in the case of an FDW ANALYZE, and perfectly fine in the case of > someone tinkering with stats. No, we should be keeping the lock until the end of the transaction (which in this case would be just the one statement, but it would be the whole statement and all of the calls in it). See analyze.c:268 or so, where we call relation_close(onerel, NoLock); meaning we're closing the relation but we're *not* releasing the lock on it- it'll get released at the end of the transaction. Thanks! Stephen signature.asc Description: PGP signature
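A minimal sketch of the locking pattern described above, assuming the stats-setting functions take the same lock level ANALYZE uses (ShareUpdateExclusiveLock); the helper name is hypothetical and this is not the patch's actual code:

#include "postgres.h"
#include "access/relation.h"
#include "storage/lockdefs.h"

/* Hypothetical helper illustrating "keep the lock until transaction end". */
static void
set_stats_for_relation(Oid relid)
{
	Relation	rel;

	/* The first call in the statement acquires the lock ... */
	rel = relation_open(relid, ShareUpdateExclusiveLock);

	/* ... perform the relation/attribute stats updates here ... */

	/*
	 * Close the relcache entry but pass NoLock: the lock itself is only
	 * released at the end of the transaction, so later calls in the same
	 * statement find it already held (cf. analyze.c's
	 * relation_close(onerel, NoLock)).
	 */
	relation_close(rel, NoLock);
}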
Re: Spurious pgstat_drop_replslot() call
Hello Bertrand and Michael, 12.03.2024 09:17, Bertrand Drouvot wrote: May I suggest the attached additions with LWLockHeldByMeInMode to make sure that the stats are dropped and created while we hold ReplicationSlotAllocationLock? Yeah, that fully makes sense and looks good to me. Sorry for going a bit off-topic, but I remembered an anomaly with a similar assert: https://www.postgresql.org/message-id/17947-b9554521ad963c9c%40postgresql.org Maybe you would find it worth considering while working in this area... (I've just run that reproducer on b36fbd9f8 and confirmed that the assertion failure is still there.) Best regards, Alexander
Re: [EXTERNAL] Re: Add non-blocking version of PQcancel
Here's a last one for the cfbot. I have a question about this one int PQcancelStart(PGcancelConn *cancelConn) { [...] if (cancelConn->conn.status != CONNECTION_ALLOCATED) { libpq_append_conn_error(&cancelConn->conn, "cancel request is already being sent on this connection"); cancelConn->conn.status = CONNECTION_BAD; return 0; } If we do this and we see conn.status is not ALLOCATED, meaning a cancel is already ongoing, shouldn't we leave conn.status alone instead of changing to CONNECTION_BAD? I mean, we shouldn't be juggling the elbow of whoever's doing that, should we? Maybe just add the error message and return 0? -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/ "If it is not right, do not do it. If it is not true, do not say it." (Marcus Aurelius, Meditations) >From fc0cbf0a6184d374e12e051f88f9f8eef7cc30d9 Mon Sep 17 00:00:00 2001 From: Alvaro Herrera Date: Tue, 12 Mar 2024 10:09:25 +0100 Subject: [PATCH v34 1/2] libpq: Add encrypted and non-blocking query cancellation routines The existing PQcancel API uses blocking IO, which makes PQcancel impossible to use in an event loop based codebase without blocking the event loop until the call returns. It also doesn't encrypt the connection over which the cancel request is sent, even when the original connection required encryption. This commit adds a PQcancelConn struct and assorted functions, which provide a better mechanism of sending cancel requests; in particular all the encryption used in the original connection are also used in the cancel connection. The main entry points are: - PQcancelCreate creates the PQcancelConn based on the original connection (but does not establish an actual connection). - PQcancelStart can be used to initiate non-blocking cancel requests, using encryption if the original connection did so, which must be pumped using - PQcancelPoll. - PQcancelReset puts a PQcancelConn back in state so that it can be reused to send a new cancel request to the same connection. - PQcancelBlocking is a simpler-to-use blocking API that still uses encryption. Additional functions are - PQcancelStatus, mimicks PQstatus; - PQcancelSocket, mimicks PQcancelSocket; - PQcancelErrorMessage, mimicks PQerrorMessage; - PQcancelFinish, mimicks PQfinish. Author: Jelte Fennema-Nio Reviewed-by: Denis Laxalde Discussion: https://postgr.es/m/am5pr83mb0178d3b31ca1b6ec4a8ecc42f7...@am5pr83mb0178.eurprd83.prod.outlook.com --- doc/src/sgml/libpq.sgml | 465 -- src/interfaces/libpq/exports.txt | 9 + src/interfaces/libpq/fe-cancel.c | 297 ++- src/interfaces/libpq/fe-connect.c | 129 - src/interfaces/libpq/libpq-fe.h | 31 +- src/interfaces/libpq/libpq-int.h | 5 + .../modules/libpq_pipeline/libpq_pipeline.c | 125 + src/tools/pgindent/typedefs.list | 1 + 8 files changed, 1015 insertions(+), 47 deletions(-) diff --git a/doc/src/sgml/libpq.sgml b/doc/src/sgml/libpq.sgml index a2bbf33d02..373d0dc322 100644 --- a/doc/src/sgml/libpq.sgml +++ b/doc/src/sgml/libpq.sgml @@ -265,7 +265,7 @@ PGconn *PQsetdb(char *pghost, PQconnectStartParamsPQconnectStartParams PQconnectStartPQconnectStart - PQconnectPollPQconnectPoll + PQconnectPollPQconnectPoll nonblocking connection @@ -5287,7 +5287,7 @@ int PQisBusy(PGconn *conn); / can also attempt to cancel a command that is still being processed by the server; see . But regardless of - the return value of , the application + the return value of , the application must continue with the normal result-reading sequence using . 
A successful cancellation will simply cause the command to terminate sooner than it would have @@ -6034,10 +6034,402 @@ int PQsetSingleRowMode(PGconn *conn); SQL command - - A client application can request cancellation of a command that is - still being processed by the server, using the functions described in - this section. + + Functions for Sending Cancel Requests + + + PQcancelCreatePQcancelCreate + + + + Prepares a connection over which a cancel request can be sent. + +PGcancelConn *PQcancelCreate(PGconn *conn); + + + + +creates a + PGcancelConnPGcancelConn + object, but it won't instantly start sending a cancel request over this + connection. A cancel request can be sent over this connection in a + blocking manner using and in a + non-blocking manner using . + The return value can be passed to + to check if the PGcancelConn object was + created successfully. The PGcancelConn object + is an opaque structure that is not meant to be accessed directly by the + application. This PGcancelConn object can be + used to cancel the que
Re: remaining sql/json patches
Hi, I wanted to share the case below: postgres[146443]=# SELECT JSON_EXISTS(jsonb '{"customer_name": "test", "salary":1000, "department_id":1}', '$.* ? (@== $dept_id && @ == $sal)' PASSING 1000 AS sal, 1 as dept_id); json_exists - f (1 row) Isn't it supposed to return "true", since the input JSON matches both the dept_id and salary conditions? -- Regards, Himanshu Upadhyaya EnterpriseDB: http://www.enterprisedb.com
Re: [EXTERNAL] Re: Add non-blocking version of PQcancel
On Tue, 12 Mar 2024 at 10:19, Alvaro Herrera wrote: > If we do this and we see conn.status is not ALLOCATED, meaning a cancel > is already ongoing, shouldn't we leave conn.status alone instead of > changing to CONNECTION_BAD? I mean, we shouldn't be juggling the elbow > of whoever's doing that, should we? Maybe just add the error message > and return 0? I'd rather fail as hard as possible when someone is using the API wrongly. Not doing so is bound to cause confusion, IMHO. E.g., if the state is still CONNECTION_OK because the user forgot to call PQcancelReset, then keeping the connection status "as is" might make it seem as if the cancel request succeeded even though nothing happened. So if the user uses the API incorrectly, I'd rather use all the avenues possible to indicate that there was an error. Especially since, in all other cases where PQcancelStart returns false, CONNECTION_BAD is the status, and this in turn means that PQconnectPoll will return PGRES_POLLING_FAILED. So I doubt people will always check the actual return value of the function to check whether an error happened. They might check PQcancelStatus or PQconnectPoll instead, because that integrates more easily with the rest of their code.
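For illustration, a rough sketch of how a caller might drive the proposed non-blocking API from its own event loop, assuming the PQcancel* functions behave like their PQconnect* counterparts as described in the patch (this is not code from the patch, and error handling is simplified):

#include <stdio.h>
#include <sys/select.h>
#include "libpq-fe.h"

/* Returns 1 if the cancel request was delivered, 0 otherwise. */
static int
cancel_query_nonblocking(PGconn *conn)
{
	PGcancelConn *cancelConn = PQcancelCreate(conn);
	PostgresPollingStatusType st = PGRES_POLLING_WRITING;

	if (cancelConn == NULL)
	{
		fprintf(stderr, "cancel failed: out of memory\n");
		return 0;
	}
	if (!PQcancelStart(cancelConn))
	{
		fprintf(stderr, "cancel failed: %s\n", PQcancelErrorMessage(cancelConn));
		PQcancelFinish(cancelConn);
		return 0;
	}

	/* Pump the cancel connection, waiting for socket readiness in between. */
	while (st != PGRES_POLLING_OK && st != PGRES_POLLING_FAILED)
	{
		int			sock = PQcancelSocket(cancelConn);
		fd_set		rfds,
					wfds;

		FD_ZERO(&rfds);
		FD_ZERO(&wfds);
		if (st == PGRES_POLLING_READING)
			FD_SET(sock, &rfds);
		else
			FD_SET(sock, &wfds);
		if (select(sock + 1, &rfds, &wfds, NULL, NULL) < 0)
			break;				/* select() failure: give up */
		st = PQcancelPoll(cancelConn);
	}

	if (PQcancelStatus(cancelConn) != CONNECTION_OK)
	{
		fprintf(stderr, "cancel failed: %s\n", PQcancelErrorMessage(cancelConn));
		PQcancelFinish(cancelConn);
		return 0;
	}

	PQcancelFinish(cancelConn);
	return 1;
}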
Re: Add bump memory context type and use it for tuplesorts
On Tue, Mar 12, 2024 at 6:41 AM David Rowley wrote: > Thanks for trying this out. I didn't check if the performance was > susceptible to the memory size before the reset. It certainly would > be once the allocation crosses some critical threshold of CPU cache > size, but probably it will also be to some extent regarding the number > of actual mallocs that are required underneath. I neglected to mention it, but the numbers I chose did have the L2/L3 cache in mind, but the reset frequency didn't seem to make much difference. > I see there's some discussion of bump in [1]. Do you still have a > valid use case for bump for performance/memory usage reasons? Yeah, that was part of my motivation for helping test, although my interest is in saving memory in cases of lots of small allocations. It might help if I make this a little more concrete, so I wrote a quick-and-dirty function to measure the bytes used by the proposed TID store and the vacuum's current array. Using bitmaps really shines with a high number of offsets per block, e.g. with about a million sequential blocks, and 49 offsets per block (last parameter is a bound): select * from tidstore_memory(0,1*1001*1000, 1,50); array_mem | ts_mem ---+-- 294294000 | 42008576 The break-even point with this scenario is around 7 offsets per block: select * from tidstore_memory(0,1*1001*1000, 1,8); array_mem | ts_mem ---+-- 42042000 | 42008576 Below that, the array gets smaller, but the bitmap just has more empty space. Here, 8 million bytes are used by the chunk header in bitmap allocations, so the bump context would help there (I haven't actually tried). Of course, the best allocation is no allocation at all, and I have a draft patch to store up to 3 offsets in the last-level node's pointer array, so for 2 or 3 offsets per block we're smaller than the array again: select * from tidstore_memory(0,1*1001*1000, 1,4); array_mem | ts_mem ---+- 18018000 | 8462336 Sequential blocks are not the worst case scenario for memory use, but this gives an idea of what's involved. So, with aset, on average I still expect to use quite a bit less memory, with some corner cases that use more. The bump context would be some extra insurance to reduce those corner cases, where there are a large number of blocks in play.
Re: Reports on obsolete Postgres versions
> On 12 Mar 2024, at 02:37, Nathan Bossart wrote: > > On Mon, Mar 11, 2024 at 05:17:13PM -0400, Bruce Momjian wrote: >> On Mon, Mar 11, 2024 at 04:12:04PM -0500, Nathan Bossart wrote: >>> I've read that the use of the term "minor release" can be confusing. While >>> the versioning page clearly describes what is eligible for a minor release, >>> not everyone reads it, so I suspect that many folks think there are new >>> features, etc. in minor releases. I think a "minor release" of Postgres is >>> more similar to what other projects would call a "patch version." >> >> Well, we do say: >> >> While upgrading will always contain some level of risk, PostgreSQL >> minor releases fix only frequently-encountered bugs, security issues, >> and data corruption problems to reduce the risk associated with >> upgrading. For minor releases, the community considers not upgrading to >> be riskier than upgrading. >> >> but that is far down the page. Do we need to improve this? > > I think making that note more visible would certainly be an improvement. We have this almost at the top of the page, which IMHO isn't a very good description of what a minor version is: Each major version receives bug fixes and, if need be, security fixes that are released at least once every three months in what we call a "minor release." Maybe we can rewrite that sentence to properly document what a minor release is (bug fixes *and* security fixes), with a small blurb about the upgrade risk? (Adding Jonathan in CC, who is good at website copy.) -- Daniel Gustafsson
Re: Reports on obsolete Postgres versions
Hi, On Mon, Mar 11, 2024 at 05:17:13PM -0400, Bruce Momjian wrote: > On Mon, Mar 11, 2024 at 04:12:04PM -0500, Nathan Bossart wrote: > > On Mon, Mar 11, 2024 at 04:37:49PM -0400, Bruce Momjian wrote: > > > https://www.postgresql.org/support/versioning/ > > > > > > This web page should correct the idea that "upgrades are more risky than > > > staying with existing versions". Is there more we can do? Should we > > > have a more consistent response for such reporters? > > > > I've read that the use of the term "minor release" can be confusing. While > > the versioning page clearly describes what is eligible for a minor release, > > not everyone reads it, so I suspect that many folks think there are new > > features, etc. in minor releases. I think a "minor release" of Postgres is > > more similar to what other projects would call a "patch version." > > Well, we do say: > > While upgrading will always contain some level of risk, PostgreSQL > minor releases fix only frequently-encountered bugs, security issues, > and data corruption problems to reduce the risk associated with > upgrading. For minor releases, the community considers not upgrading to > be riskier than upgrading. > > but that is far down the page. Do we need to improve this? I liked the statement from Laurenz a while ago on his blog (paraphrased): "Upgrading to the latest patch release does not require application testing or recertification". I am not sure we want to put that into the official page (or maybe tone down/qualify it a bit), but I think a lot of users stay on older minor versions because they dread their internal testing policies. The other thing that could maybe be made a bit better is the fantastic patch release schedule, which however is buried in the "developer roadmap". I can see how this was useful years ago, but I think this page should be moved to the end-user part of the website, and maybe (also) integrated into the support/versioning page? Michael
Re: speed up a logical replica setup
On Thu, Mar 7, 2024 at 10:44 PM Tomas Vondra wrote: > > 4) Is FOR ALL TABLES a good idea? > > I'm not sure FOR ALL TABLES is a good idea. Or said differently, I'm > sure it won't work for a number of use cases. I know large databases > it's common to create "work tables" (not necessarily temporary) as part > of a batch job, but there's no need to replicate those tables. > > AFAIK that'd break this FOR ALL TABLES publication, because the tables > will qualify for replication, but won't be present on the subscriber. Or > did I miss something? > As the subscriber is created from standby, all the tables should be present at least initially during and after creating the subscriber. Users are later free to modify the publications/subscriptions. > I do understand that FOR ALL TABLES is the simplest approach, and for v1 > it may be an acceptable limitation, but maybe it'd be good to also > support restricting which tables should be replicated (e.g. blacklist or > whitelist based on table/schema name?). > This would be useful, but OTOH could also be enhanced in a later version unless we think it is a must for the first version. -- With Regards, Amit Kapila.
Re: Properly pathify the union planner
On Mon, 11 Mar 2024 at 19:56, Richard Guo wrote: > * There are cases where the setop_pathkeys of a subquery does not match > the union_pathkeys generated in generate_union_paths() for sorting by > the whole target list. In such cases, even if we have explicitly sorted > the paths of the subquery to meet the ordering required for its > setop_pathkeys, we cannot find appropriate ordered paths for > MergeAppend. For instance, > > explain (costs off) > (select a, b from t t1 where a = b) UNION (select a, b from t t2 where a = b); > QUERY PLAN > --- > Unique >-> Sort > Sort Key: t1.a, t1.b > -> Append >-> Index Only Scan using t_a_b_idx on t t1 > Filter: (a = b) >-> Index Only Scan using t_a_b_idx on t t2 > Filter: (a = b) > (8 rows) > > (Assume t_a_b_idx is a btree index on t(a, b)) > > In this query, the setop_pathkeys of the subqueries includes only one > PathKey because 'a' and 'b' are in the same EC inside the subqueries, > while the union_pathkeys of the whole query includes two PathKeys, one > for each target entry. After we convert the setop_pathkeys to outer > representation, we'd notice that it does not match union_pathkeys. > Consequently, we are unable to recognize that the index scan paths are > already appropriately sorted, leading us to miss the opportunity to > utilize MergeAppend. > > Not sure if this case is common enough to be worth paying attention to. I've spent almost all day looking into this, which is just enough work to satisfy myself this *is* future work rather than v17 work. The reason I feel this is future work rather than work for this patch is that this is already a limitation of subqueries in general and it's not unique to union child queries. For example: create table ab(a int, b int, primary key(a,b)); insert into ab select x,y from generate_series(1,100)x,generate_series(1,100)y; vacuum analyze ab; explain select * from (select a,b from ab where a = 1 order by a,b limit 10) order by a,b; The current plan for this is: QUERY PLAN - Sort Sort Key: ab.a, ab.b -> Limit -> Index Only Scan using ab_pkey on ab Index Cond: (a = 1) (5 rows) The additional sort isn't required but is added because the outer query requires the pathkeys {a,b} and the inner query only has the pathkey {b}. {a} is removed due to it being redundant because of the const member. The outer query does not know about the redundant pathkeys so think the subquery is only sorted by {b} therefore adds the sort on "a", "b". The attached 0001 patch (renamed as .txt so it's ignored by the CFbot) adjusts convert_subquery_pathkeys() to have it look a bit deeper and try harder to match the path to the outer query's query_pathkeys. After patching with that, the plan becomes: QUERY PLAN --- Limit -> Index Only Scan using ab_pkey on ab Index Cond: (a = 1) (3 rows) The patch is still incomplete as the matching is quite complex for the case you mentioned with a=b. It's a bit late here to start making that work, but I think the key to make that work is to give subquery_matches_pathkeys() an extra parameter or 2 for where to start working on each list and recursively call itself where I've left the TODO comment in the function and on the recursive call, try the next query_pathkeys and the same sub_pathkey. If the recursive call function returns false, continue on trying to match the normal way. If it returns true, return true. There'd be a bit more work elsewhere to do to make this work for the general case. 
For example: explain (costs off) select * from (select a,b from ab where a = 1 offset 0) order by a,b; still produces the following plan with the patched version: QUERY PLAN --- Sort Sort Key: ab.a, ab.b -> Index Only Scan using ab_pkey on ab Index Cond: (a = 1) (4 rows) the reason for this is that the subquery does not know the outer query would like the index path to have the pathkeys {a,b}. I've not looked at this issue in detail but I suspect we could do something somewhere like set_subquery_pathlist() to check if the outer query's query_pathkeys are all Vars from the subquery. I think we'd need to trawl through the eclasses to ensure that each query_pathkeys eclasses contains a member mentioning only a Var from the subquery and if so, get the subquery to set those pathkeys. Anyway, this is all at least PG18 work so I'll raise a thread about it around June. The patch is included in case you want to mess around with it. I'd be happy if you want to look into the subquery pathkey issue portion of the work. I won't be giving this much more focus until after the freeze. > * In build_se
Re: a wrong index choose when statistics is out of date
On 8/3/2024 18:53, Andy Fan wrote: I just reviewed the bad queries plan for the past half years internally, I found many queries used the Nested loop which is the direct cause. now I think I find out a new reason for this, because the missed optimizer statistics cause the rows in outer relation to be 1, which make the Nest loop is choosed. I'm not sure your idea could help on this or can help on this than mine at this aspect. Having had the same problem for a long time, I've made an attempt and invented a patch that probes an index to determine whether the estimated constant is within statistics' scope. I remember David's remark on the overhead problem, but I don't argue it here. This patch is on the table to have one more solution sketch for further discussion. Also, Andy, if you have a specific problem with index choosing, you can try a tiny option that makes the index-picking technique less dependent on the ordering of index lists [1]. [1] https://www.postgresql.org/message-id/9b3dbf6d-776a-4953-a5a4-609929393...@postgrespro.ru -- regards, Andrei Lepikhov Postgres Professional diff --git a/src/backend/utils/adt/like_support.c b/src/backend/utils/adt/like_support.c index 2635050861..bc8e59bbf3 100644 --- a/src/backend/utils/adt/like_support.c +++ b/src/backend/utils/adt/like_support.c @@ -643,7 +643,7 @@ patternsel_common(PlannerInfo *root, /* * Pattern specifies an exact match, so estimate as for '=' */ - result = var_eq_const(&vardata, eqopr, collation, prefix->constvalue, + result = var_eq_const(root, &vardata, eqopr, collation, prefix->constvalue, false, true, false); } else @@ -1294,7 +1294,7 @@ prefix_selectivity(PlannerInfo *root, VariableStatData *vardata, * probably off the end of the histogram, and thus we probably got a very * small estimate from the >= condition; so we still need to clamp. */ - eq_sel = var_eq_const(vardata, eqopr, collation, prefixcon->constvalue, + eq_sel = var_eq_const(root, vardata, eqopr, collation, prefixcon->constvalue, false, true, false); prefixsel = Max(prefixsel, eq_sel); diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c index cea777e9d4..be6213046e 100644 --- a/src/backend/utils/adt/selfuncs.c +++ b/src/backend/utils/adt/selfuncs.c @@ -273,7 +273,7 @@ eqsel_internal(PG_FUNCTION_ARGS, bool negate) * in the query.) */ if (IsA(other, Const)) - selec = var_eq_const(&vardata, operator, collation, + selec = var_eq_const(root, &vardata, operator, collation, ((Const *) other)->constvalue, ((Const *) other)->constisnull, varonleft, negate); @@ -286,13 +286,133 @@ eqsel_internal(PG_FUNCTION_ARGS, bool negate) return selec; } +/* + * Lookup the value in the index and try to estimate number of the tuples with + * the same value. 
+ */ +static Selectivity +estimate_ntuples_by_index(PlannerInfo *root, VariableStatData *vardata, + Datum constval, + Oid collation, Oid regprocoid, int32 stawidth) +{ + RelOptInfo *rel = vardata->rel; + RangeTblEntry *rte = root->simple_rte_array[rel->relid]; + ListCell *lc; + int ntuples = 0; + Selectivity selec = -1.; + + foreach(lc, rel->indexlist) + { + IndexOptInfo *index = (IndexOptInfo *) lfirst(lc); + ScanKeyData scankeys[1]; + SnapshotDataSnapshotNonVacuumable; + IndexScanDesc index_scan; + RelationheapRel; + RelationindexRel; + + if (index->relam != BTREE_AM_OID) + continue; + if (index->indpred != NIL) + continue; + if (index->hypothetical) + continue; + if (!match_index_to_operand(vardata->var, 0, index)) + continue; + + heapRel = table_open(rte->relid, NoLock); + indexRel = index_open(index->indexoid, NoLock); + + ScanKeyEntryInitialize(&scankeys[0], + 0, + 1, + BTEqualStrategyNumber, + vardata->atttype, +
Re: [PoC] Improve dead tuple storage for lazy vacuum
On Mon, Mar 11, 2024 at 3:13 PM Masahiko Sawada wrote: > > On Mon, Mar 11, 2024 at 12:20 PM John Naylor wrote: > > > > On Thu, Mar 7, 2024 at 10:35 PM Masahiko Sawada > > wrote: > > + ts->context = CurrentMemoryContext; > > > > As far as I can tell, this member is never accessed again -- am I > > missing something? > > You're right. It was used to re-create the tidstore in the same > context again while resetting it, but we no longer support the reset > API. Considering it again, would it be better to allocate the iterator > struct in the same context as we store the tidstore struct? That makes sense. > > + /* DSA for tidstore will be detached at the end of session */ > > > > No other test module pins the mapping, but that doesn't necessarily > > mean it's wrong. Is there some advantage over explicitly detaching? > > One small benefit of not explicitly detaching dsa_area in > tidstore_destroy() would be simplicity; IIUC if we want to do that, we > need to remember the dsa_area using (for example) a static variable, > and free it if it's non-NULL. I've implemented this idea in the > attached patch. Okay, I don't have a strong preference at this point. > > +-- Add tids in random order. > > > > I don't see any randomization here. I do remember adding row_number to > > remove whitespace in the output, but I don't remember a random order. > > On that subject, the row_number was an easy trick to avoid extra > > whitespace, but maybe we should just teach the setting function to > > return blocknumber rather than null? > > Good idea, fixed. + test_set_block_offsets + + 2147483647 + 0 + 4294967294 + 1 + 4294967295 Hmm, was the earlier comment about randomness referring to this? I'm not sure what other regression tests do in these cases, or how relibale this is. If this is a problem we could simply insert this result into a temp table so it's not output. > > +Datum > > +tidstore_create(PG_FUNCTION_ARGS) > > +{ > > ... > > + tidstore = TidStoreCreate(max_bytes, dsa); > > > > +Datum > > +tidstore_set_block_offsets(PG_FUNCTION_ARGS) > > +{ > > > > + TidStoreSetBlockOffsets(tidstore, blkno, offs, noffs); > > > > These names are too similar. Maybe the test module should do > > s/tidstore_/test_/ or similar. > > Agreed. Mostly okay, although a couple look a bit generic now. I'll leave it up to you if you want to tweak things. > > In general, the .sql file is still very hard-coded. Functions are > > created that contain a VALUES statement. Maybe it's okay for now, but > > wanted to mention it. Ideally, we'd have some randomized tests, > > without having to display it. That could be in addition to (not > > replacing) the small tests we have that display input. (see below) > > > > Agreed to add randomized tests in addition to the existing tests. I'll try something tomorrow. > Sounds a good idea. In fact, if there are some bugs in tidstore, it's > likely that even initdb would fail in practice. However, it's a very > good idea that we can test the tidstore anyway with such a check > without a debug-build array. > > Or as another idea, I wonder if we could keep the debug-build array in > some form. For example, we use the array with the particular build > flag and set a BF animal for that. That way, we can test the tidstore > in more real cases. I think the purpose of a debug flag is to help developers catch mistakes. I don't think it's quite useful enough for that. For one, it has the same 1GB limitation as vacuum's current array. 
For another, it'd be a terrible way to debug moving tidbitmap.c from its hash table to use TID store -- AND/OR operations and lossy pages are pretty much undoable with a copy of vacuum's array. Last year, when I insisted on trying a long term realistic load that compares the result with the array, the encoding scheme was much harder to understand in code. I think it's now easier, and there are better tests. > In the latest (v69) patch: > > - squashed v68-0005 and v68-0006 patches. > - removed most of the changes in v68-0007 patch. > - addressed above review comments in v69-0002 patch. > - v69-0003, 0004, and 0005 are miscellaneous updates. > > As for renaming TidStore to TIDStore, I dropped the patch for now > since it seems we're using "Tid" in some function names and variable > names. If we want to update it, we can do that later. I think we're not consistent across the codebase, and it's fine to drop that patch. v70-0008: @@ -489,7 +489,7 @@ parallel_vacuum_reset_dead_items(ParallelVacuumState *pvs) /* * Free the current tidstore and return allocated DSA segments to the * operating system. Then we recreate the tidstore with the same max_bytes - * limitation. + * limitation we just used. Nowadays, max_bytes is now more like a hint for tidstore, and not a limitation, right? Vacuum has the limitation. Maybe instead of "with", we should say "passing the same limitatio
Re: Streaming I/O, vectored I/O (WIP)
On Tue, Mar 12, 2024 at 12:10 PM Thomas Munro wrote: > > I think you'd be right if StartReadBuffers() were capable of > processing a sequence consisting of a hit followed by misses, but > currently it always gives up after the first hit. That is, it always > processes some number of misses (0-16) and then at most one hit. So > for now the variable would always turn out to be the same as blockNum. > Okay, then shouldn't this "if (found)" block immediately break the loop so that when we hit the block we just return that block? So it makes sense what you explained but with the current code if there are the first few hits followed by misses then we will issue the smgrprefetch() for the initial hit blocks as well. + if (found) + { + /* + * Terminate the read as soon as we get a hit. It could be a + * single buffer hit, or it could be a hit that follows a readable + * range. We don't want to create more than one readable range, + * so we stop here. + */ + actual_nblocks = operation->nblocks = *nblocks = i + 1;(Dilip: I think we should break after this?) + } + else + { + /* Extend the readable range to cover this block. */ + operation->io_buffers_len++; + } + } > The reason is that I wanted to allows "full sized" read system calls > to form. If you said "hey please read these 16 blocks" (I'm calling > that "full sized", AKA MAX_BUFFERS_PER_TRANSFER), and it found 2 hits, > then it could only form a read of 14 blocks, but there might be more > blocks that could be read after those. We would have some arbitrary > shorter read system calls, when we wanted to make them all as big as > possible. So in the current patch you say "hey please read these 16 > blocks" and it returns saying "only read 1", you call again with 15 > and it says "only read 1", and you call again and says "read 16!" > (assuming 2 more were readable after the original range we started > with). Then physical reads are maximised. Maybe there is some nice > way to solve that, but I thought this way was the simplest (and if > there is some instruction-cache-locality/tight-loop/perf reason why we > should work harder to find ranges of hits, it could be for later). > Does that make sense? Understood, I think this makes sense. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Re: speed up a logical replica setup
On Mon, Mar 11, 2024 at 9:42 AM vignesh C wrote: > > On Sat, 9 Mar 2024 at 00:56, Tomas Vondra > wrote: > > > > >> 5) slot / publication / subscription name > > >> > > >> I find it somewhat annoying it's not possible to specify names for > > >> objects created by the tool - replication slots, publication and > > >> subscriptions. If this is meant to be a replica running for a while, > > >> after a while I'll have no idea what pg_createsubscriber_569853 or > > >> pg_createsubscriber_459548_2348239 was meant for. > > >> > > >> This is particularly annoying because renaming these objects later is > > >> either not supported at all (e.g. for replication slots), or may be > > >> quite difficult (e.g. publications). > > >> > > >> I do realize there are challenges with custom names (say, if there are > > >> multiple databases to replicate), but can't we support some simple > > >> formatting with basic placeholders? So we could specify > > >> > > >> --slot-name "myslot_%d_%p" > > >> > > >> or something like that? > > > > > > Not sure we can do in the first version, but looks nice. One concern is > > > that I > > > cannot find applications which accepts escape strings like > > > log_line_prefix. > > > (It may just because we do not have use-case.) Do you know examples? > > > > > > > I can't think of a tool already doing that, but I think that's simply > > because it was not needed. Why should we be concerned about this? > > > > +1 to handle this. > Currently, a) Publication name = pg_createsubscriber_%u, where %u is > database oid, b) Replication slot name = pg_createsubscriber_%u_%d, > Where %u is database oid and %d is the pid and c) Subscription name = > pg_createsubscriber_%u_%d, Where %u is database oid and %d is the pid > How about we have a non mandatory option like > --prefix_object_name=mysetup1, which will create a) Publication name = > mysetup1_pg_createsubscriber_%u, and b) Replication slot name = > mysetup1_pg_createsubscriber_%u (here pid is also removed) c) > Subscription name = mysetup1_pg_createsubscriber_%u (here pid is also > removed). > > In the default case where the user does not specify > --prefix_object_name the object names will be created without any > prefix names. > Tomas's idea is better in terms of useability. So, we should instead have three switches --slot-name, --publication-name, and --subscriber-name with some provision of appending dbid's and some unique identifier for standby. The unique identifier can help in creating multiple subscribers from different standbys. -- With Regards, Amit Kapila.
Re: POC: Extension for adding distributed tracing - pg_tracing
Hi all! Thanks for the reviews and comments. > - pg_tracing.c should include postgres.h as the first thing Will do. > afaict none of the use of volatile is required, spinlocks have been barriers > for a long time now Got it, I will remove them. I've been mimicking what was done in pg_stat_statements and all spinlocks are done on volatile variables [1]. > - I don't think it's a good idea to do memory allocations in the middle of a > PG_CATCH. If the error was due to out-of-memory, you'll throw another error. Good point. I was wondering what were the risks of generating spans for errors. I will try to find a better way to handle this. To give a quick update: following Heikki's suggestion, I'm currently working on getting pg_tracing as a separate extension on github. I will send an update when it's ready. [1]: https://github.com/postgres/postgres/blob/master/contrib/pg_stat_statements/pg_stat_statements.c#L2000 On Fri, Feb 9, 2024 at 7:50 PM Andres Freund wrote: > > Hi, > > On 2024-02-09 19:46:58 +0300, Nikita Malakhov wrote: > > I agree with Heikki on most topics and especially the one he recommended > > to publish your extension on GitHub, this tool is very promising for highly > > loaded > > systems, you will get a lot of feedback very soon. > > > > I'm curious about SpinLock - it is recommended for very short operations, > > taking up to several instructions, and docs say that for longer ones it > > will be > > too expensive, and recommends using LWLock. Why have you chosen SpinLock? > > Does it have some benefits here? > > Indeed - e.g. end_tracing() looks to hold the spinlock for far too long for > spinlocks to be appropriate. Use an lwlock. > > Random stuff I noticed while skimming: > - pg_tracing.c should include postgres.h as the first thing > > - afaict none of the use of volatile is required, spinlocks have been barriers > for a long time now > > - you acquire the spinlock for single increments of n_writers, perhaps that > could become an atomic, to reduce contention? > > - I don't think it's a good idea to do memory allocations in the middle of a > PG_CATCH. If the error was due to out-of-memory, you'll throw another error. > > > Greetings, > > Andres Freund
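To make the two suggestions above concrete, here is a hypothetical sketch (not code from the pg_tracing patch; the struct and function names are made up, and initialization of the shared state is omitted) of replacing the spinlock-protected counter with an atomic and using an LWLock for the longer critical section:

#include "postgres.h"
#include "port/atomics.h"
#include "storage/lwlock.h"

typedef struct pgTracingSharedState
{
	LWLock	   *lock;			/* protects the shared span buffer */
	pg_atomic_uint32 n_writers; /* plain counter: no spinlock needed */
	/* ... span buffer, offsets, etc. ... */
} pgTracingSharedState;

static pgTracingSharedState *pg_tracing_state = NULL;

static void
begin_span_write(void)
{
	/* A single increment can just be an atomic operation. */
	pg_atomic_fetch_add_u32(&pg_tracing_state->n_writers, 1);
}

static void
end_tracing_sketch(void)
{
	/*
	 * The longer section (copying span data into shared memory) uses an
	 * LWLock rather than holding a spinlock for its whole duration.
	 */
	LWLockAcquire(pg_tracing_state->lock, LW_EXCLUSIVE);
	/* ... copy spans into the shared buffer here ... */
	LWLockRelease(pg_tracing_state->lock);

	pg_atomic_fetch_sub_u32(&pg_tracing_state->n_writers, 1);
}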
Re: Reports on obsolete Postgres versions
>> but that is far down the page. Do we need to improve this? > I liked the statement from Laurenz a while ago on his blog > (paraphrased): "Upgrading to the latest patch release does not require > application testing or recertification". I am not sure we want to put > that into the official page (or maybe tone down/qualify it a bit), but I > think a lot of users stay on older minor versions because they dread > their internal testing policies. I think we need more conservative language, since a minor release might fix a planner bug that someone's app relied on, leaving their plans worse after upgrading. While rare, it can certainly happen, so the official wording should probably avoid such bold claims. > The other thing that could maybe be made a bit better is the fantastic > patch release schedule, which however is buried in the "developer > roadmap". I can see how this was useful years ago, but I think this page > should be moved to the end-user part of the website, and maybe (also) > integrated into the support/versioning page? Fair point. -- Daniel Gustafsson
Re: Disconnect if socket cannot be put into non-blocking mode
> On 12 Mar 2024, at 09:49, Heikki Linnakangas wrote: >> Barring objections, I'll commit and backpatch the attached to fix that. > > Committed. Sorry for being slow to review, but +1 on this change. -- Daniel Gustafsson
Re: Regarding the order of the header file includes
On Fri, Mar 8, 2024 at 6:58 PM Bharath Rupireddy < bharath.rupireddyforpostg...@gmail.com> wrote: > On Thu, Mar 7, 2024 at 12:39 PM Richard Guo > wrote: > > > > While rebasing one of my patches I noticed that the header file includes > > in relnode.c are not sorted in order. So I wrote a naive script to see > > if any other C files have the same issue. The script is: > > > > #!/bin/bash > > > > find . -name "*.c" | while read -r file; do > > headers=$(grep -o '#include "[^>]*"' "$file" | > > grep -v "postgres.h" | grep -v "postgres_fe.h" | > > sed 's/\.h"//g') > > > > sorted_headers=$(echo "$headers" | sort) > > > > results=$(diff <(echo "$headers") <(echo "$sorted_headers")) > > > > if [[ $? != 0 ]]; then > > echo "Headers in '$file' are out of order" > > echo $results > > echo > > fi > > done > > Cool. Isn't it a better idea to improve this script to auto-order the > header files and land it under src/tools/pginclude/headerssort? It can > then be reusable and be another code beautification weapon one can use > before the code release. Yeah, perhaps. However the current script is quite unrefined and would require a lot of effort to make it a reusable tool. I will add it to my to-do list and hopefully one day I can get back to it. Feel free to mess around with it if someone is interested. > FWIW, I'm getting the syntax error when ran the above shell script: > > headerssort.sh: 10: Syntax error: "(" unexpected I think the error is due to line 10 containing bash-style syntax. Hmm, have you tried to use 'bash' instead of 'sh' to run this script? Thanks Richard
Re: [EXTERNAL] Re: Add non-blocking version of PQcancel
On Tue, 12 Mar 2024 at 10:53, Jelte Fennema-Nio wrote: > I'd rather fail as hard as possible when someone is using the API > wrongly. To be clear, this is my way of looking at it. If you feel strongly about that we should not change conn.status, I'm fine with making that change to the patchset.
Re: Have pg_basebackup write "dbname" in "primary_conninfo"?
On Mon, 11 Mar 2024 at 17:16, Amit Kapila wrote: > > On Tue, Feb 27, 2024 at 2:07 PM Hayato Kuroda (Fujitsu) > wrote: > > > > > PSA the patch for implementing it. It is basically same as Ian's one. > > > However, this patch still cannot satisfy the condition 3). > > > > > > `pg_basebackup -D data_N2 -d "user=postgres" -R` > > > -> dbname would not be appeared in primary_conninfo. > > > > > > This is because `if (connection_string)` case in GetConnection() explicy > > > override > > > a dbname to "replication". I've tried to add a dummy entry {"dbname", > > > NULL} pair > > > before the overriding, but it is no-op. Because The replacement of the > > > dbname in > > > pqConnectOptions2() would be done only for the valid (=lastly specified) > > > connection options. > > > > Oh, this patch missed the straightforward case: > > > > pg_basebackup -D data_N2 -d "user=postgres dbname=replication" -R > > -> dbname would not be appeared in primary_conninfo. > > > > So I think it cannot be applied as-is. Sorry for sharing the bad item. > > > > Can you please share the patch that can be considered for review? Here is a patch with few changes: a) Removed the version check based on discussion from [1] b) updated the documentation. I have tested various scenarios with the attached patch, the dbname that is written in postgresql.auto.conf for each of the cases with pg_basebackup is given below: 1) ./pg_basebackup -U vignesh -R -> primary_conninfo = "dbname=vignesh" (In this case primary_conninfo will have dbname as username specified) 2) ./pg_basebackup -D data -R -> primary_conninfo = "dbname=vignesh" (In this case primary_conninfo will have the dbname as username (which is the same as the OS user while setting defaults)) 3) ./pg_basebackup -d "user=vignesh" -D data -R -> primary_conninfo = "dbname=replication" (In this case primary_conninfo will have dbname as replication which is the default value from GetConnection as connection string is specified) 4) ./pg_basebackup -d "user=vignesh dbname=postgres" -D data -R -> primary_conninfo = "dbname=postgres" (In this case primary_conninfo will have the dbname as the dbname specified) 5) ./pg_basebackup -d "" -D data -R -> primary_conninfo = "dbname=replication" (In this case primary_conninfo will have dbname as replication which is the default value from GetConnection as connection string is specified) I have mentioned the reasons as to why dbname is being set to replication or username in each of the cases. How about replacing these values in postgresql.auto.conf manually. [1] - https://www.postgresql.org/message-id/CAGECzQTh9oB3nu98DsHMpRaVaqXPDRgTDEikY82OAKYF0%3DhVMA%40mail.gmail.com Regards, Vignesh diff --git a/doc/src/sgml/ref/pg_basebackup.sgml b/doc/src/sgml/ref/pg_basebackup.sgml index 88c689e725..1a6980d1b0 100644 --- a/doc/src/sgml/ref/pg_basebackup.sgml +++ b/doc/src/sgml/ref/pg_basebackup.sgml @@ -243,7 +243,10 @@ PostgreSQL documentation The postgresql.auto.conf file will record the connection settings and, if specified, the replication slot that pg_basebackup is using, so that -streaming replication will use the same settings later on. +a) synchronization of logical replication slots on the primary to the +hot_standby from +pg_sync_replication_slots or slot sync worker +and b) streaming replication will use the same settings later on. 
diff --git a/src/fe_utils/recovery_gen.c b/src/fe_utils/recovery_gen.c index 2585f11939..cb37703ab9 100644 --- a/src/fe_utils/recovery_gen.c +++ b/src/fe_utils/recovery_gen.c @@ -49,7 +49,6 @@ GenerateRecoveryConfig(PGconn *pgconn, char *replication_slot) { /* Omit empty settings and those libpqwalreceiver overrides. */ if (strcmp(opt->keyword, "replication") == 0 || - strcmp(opt->keyword, "dbname") == 0 || strcmp(opt->keyword, "fallback_application_name") == 0 || (opt->val == NULL) || (opt->val != NULL && opt->val[0] == '\0'))
Re: Regarding the order of the header file includes
On Wed, Mar 6, 2024 at 5:32 PM Richard Guo wrote: > Please note that this patch only addresses the order of header file > includes in backend modules (and might not be thorough). It is possible > that other modules may have a similar issue, but I have not evaluated > them yet. > Attached is v2, which also includes the 0002 patch that addresses the order of header file includes in non-backend modules. Thanks Richard v2-0001-Make-the-order-of-the-header-file-includes-consistent-in-backend-modules.patch Description: Binary data v2-0002-Make-the-order-of-the-header-file-includes-consistent-in-non-backend-modules.patch Description: Binary data
Re: Revisiting {CREATE INDEX, REINDEX} CONCURRENTLY improvements
On Thu, 7 Mar 2024 at 19:37, Michail Nikolaev wrote: > > Hello! > > > I'm not a fan of this approach. Changing visibility and cleanup > > semantics to only benefit R/CIC sounds like a pain to work with in > > essentially all visibility-related code. I'd much rather have to deal > > with another index AM, even if it takes more time: the changes in > > semantics will be limited to a new plug in the index AM system and a > > behaviour change in R/CIC, rather than behaviour that changes in all > > visibility-checking code. > > Technically, this does not affect the visibility logic, only the > clearing semantics. > All visibility related code remains untouched. Yeah, correct. But it still needs to update the table relations' information after finishing creating the indexes, which I'd rather not have to do. > But yes, still an inelegant and a little strange-looking option. > > At the same time, perhaps it can be dressed in luxury > somehow - for example, add as a first class citizen in > ComputeXidHorizonsResult > a list of blocks to clear some relations. Not sure what you mean here, but I don't think ComputeXidHorizonsResult should have anything to do with actual relations. > > But regardless of second scan snapshots, I think we can worry about > > that part at a later moment: The first scan phase is usually the most > > expensive and takes the most time of all phases that hold snapshots, > > and in the above discussion we agreed that we can already reduce the > > time that a snapshot is held during that phase significantly. Sure, it > > isn't great that we have to scan the table again with only a single > > snapshot, but generally phase 2 doesn't have that much to do (except > > when BRIN indexes are involved) so this is likely less of an issue. > > And even if it is, we would still have reduced the number of > > long-lived snapshots by half. > > Hmm, but it looks like we don't have the infrastructure to "update" xmin > propagating to the horizon after the first snapshot in a transaction is taken. We can just release the current snapshot, and get a new one, right? I mean, we don't actually use the transaction for much else than visibility during the first scan, and I don't think there is a need for an actual transaction ID until we're ready to mark the index entry with indisready. > One option I know of is to reuse the > d9d076222f5b94a85e0e318339cfc44b8f26022d (1) approach. > But if this is the case, then there is no point in re-taking the > snapshot again during the first > phase - just apply this "if" only for the first phase - and you're done. Not a fan of that, as it is too sensitive to abuse. Note that extensions will also have access to these tools, and I think we should build a system here that's not easy to break, rather than one that is. > Do you know any less-hacky way? Or is it a nice way to go? I suppose we could be resetting the snapshot every so often? Or use multiple successive TID range scans with a new snapshot each? Kind regards, Matthias van de Meent Neon (https://neon.tech)
Re: remaining sql/json patches
Hi Himanshu, On Tue, Mar 12, 2024 at 6:42 PM Himanshu Upadhyaya wrote: > > Hi, > > wanted to share the below case: > > ‘postgres[146443]=#’SELECT JSON_EXISTS(jsonb '{"customer_name": "test", > "salary":1000, "department_id":1}', '$.* ? (@== $dept_id && @ == $sal)' > PASSING 1000 AS sal, 1 as dept_id); > json_exists > - > f > (1 row) > > isn't it supposed to return "true" as json in input is matching with both the > condition dept_id and salary? I think you meant to use || in your condition, not &&, because 1000 != 1. See: SELECT JSON_EXISTS(jsonb '{"customer_name": "test", "salary":1000, "department_id":1}', '$.* ? (@ == $dept_id || @ == $sal)' PASSING 1000 AS sal, 1 as dept_id); json_exists - t (1 row) Or you could've written the query as: SELECT JSON_EXISTS(jsonb '{"customer_name": "test", "salary":1000, "department_id":1}', '$ ? (@.department_id == $dept_id && @.salary == $sal)' PASSING 1000 AS sal, 1 as dept_id); json_exists - t (1 row) Does that make sense? In any case, JSON_EXISTS() added by the patch here returns whatever the jsonpath executor returns. The latter is not touched by this patch. PASSING args, which this patch adds, seem to be working correctly too. -- Thanks, Amit Langote
Re: collect_corrupt_items_vacuum.patch
On Sun, Jan 14, 2024 at 4:35 AM Alexander Korotkov wrote: > On Sat, Jan 13, 2024 at 7:33 PM Dmitry Koval wrote: > > Thank you, there is one small point left (in the comment): can you > > replace "guarantteed to be to be newer" to "guaranteed to be newer", > > file src/backend/storage/ipc/procarray.c? > > Fixed. Thank you for catching this. I made the following improvements to the patch. 1. I found a way to implement this with fewer changes to the core code. The GetRunningTransactionData() function allows us to get the least running xid; all I need is to add database-aware values. 2. I added a TAP test reproducing the original issue. I'm going to push this if there are no objections. -- Regards, Alexander Korotkov v6-0001-Fix-false-reports-in-pg_visibility.patch Description: Binary data
Re: Introduce XID age and inactive timeout based replication slot invalidation
On Tue, Mar 12, 2024 at 1:24 PM Bertrand Drouvot wrote: > > On Fri, Mar 08, 2024 at 10:42:20PM +0530, Bharath Rupireddy wrote: > > On Wed, Mar 6, 2024 at 4:49 PM Amit Kapila wrote: > > > > > > You might want to consider its interaction with sync slots on standby. > > > Say, there is no activity on slots in terms of processing the changes > > > for slots. Now, we won't perform sync of such slots on standby showing > > > them inactive as per your new criteria where as same slots could still > > > be valid on primary as the walsender is still active. This may be more > > > of a theoretical point as in running system there will probably be > > > some activity but I think this needs some thougths. > > > > I believe the xmin and catalog_xmin of the sync slots on the standby > > keep advancing depending on the slots on the primary, no? If yes, the > > XID age based invalidation shouldn't be a problem. > > > > I believe there are no walsenders started for the sync slots on the > > standbys, right? If yes, the inactive timeout based invalidation also > > shouldn't be a problem. Because, the inactive timeouts for a slot are > > tracked only for walsenders because they are the ones that typically > > hold replication slots for longer durations and for real replication > > use. We did a similar thing in a recent commit [1]. > > > > Is my understanding right? Do you still see any problems with it? > > Would that make sense to "simply" discard/prevent those kind of invalidations > for "synced" slot on standby? I mean, do they make sense given the fact that > those slots are not usable until the standby is promoted? > AFAIR, we don't prevent similar invalidations due to 'max_slot_wal_keep_size' for sync slots, so why to prevent it for these new parameters? This will unnecessarily create inconsistency in the invalidation behavior. -- With Regards, Amit Kapila.
Re: Detoasting optionally to make Explain-Analyze less misleading
On Mon, 26 Feb 2024 at 21:54, stepan rutz wrote: > > Hi Matthias, thanks for picking it up. I still believe this is valuable > to a lot of people out there. Thanks for dealing with my proposal. > Matthias, Tom, Tomas everyone. > > Two (more or less) controversial remarks from side. > > 1. Actually serialization should be the default for "analyze" in > explain, as current analyze doesn't detoast and thus distorts the result > in extreme (but common) cases easily by many order of magnitude (see my > original video on that one). So current "explain analyze" only works for > some queries and since detoasting is really transparent, it is quite > something to leave detoasting out of explain analyze. This surprises > people all the time, since explain analyze suggests it "runs" the query > more realistically. I'm not sure about this, but it could easily be a mid-beta decision (if this is introduced before the feature freeze of 17, whenever that is). > 2. The bandwidth I computed in one of the previous versions of the patch > was certainly cluttering up the explain output and it is misleading yes, > but then again it performs a calculation people will now do in their > head. The "bandwidth" here is how much data your query gets out of > backend by means of the query and the deserialization. So of course if > you do id-lookups you get a single row and such querries do have a lower > data-retrieval bandwidth compared to bulk querries. I think that's a job for post-processing the EXPLAIN output by the user. If we don't give users the raw data, they won't be able to do their own cross-referenced processing: "5MB/sec" doesn't tell you how much time or data was actually spent. > However having some > measure of how fast data is delivered from the backend especially on > larger joins is still a good indicator of one aspect of a query. I'm not sure about that. Network speed is a big limiting factor that we can't measure here, and the size on disk is often going to be smaller than the data size when transfered across the network. > Sorry for the remarks. Both are not really important, just restating my > points here. I understand the objections and reasons that speak against > both points and believe the current scope is just right. No problem. Remarks from users (when built on solid arguments) are always welcome, even if we may not always agree on the specifics. -->8-- Attached is v9, which is rebased on master to handle 24eebc65's changed signature of pq_sendcountedtext. It now also includes autocompletion, and a second patch which adds documentation to give users insights into this new addition to EXPLAIN. Kind regards, Matthias van de Meent v9-0002-Add-EXPLAIN-SERIALIZE-docs.patch Description: Binary data v9-0001-Explain-Add-SERIALIZE-option.patch Description: Binary data
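For readers who want to try the option from the attached patches, a minimal sketch of the intended usage is below; the exact option spelling and the extra output lines are whatever the committed patch ends up with, and the table name is invented for illustration:

```sql
-- Hypothetical table; SERIALIZE asks EXPLAIN ANALYZE to also detoast and
-- convert the result rows to wire format, so the reported timing includes
-- that work (per the patch under discussion, not yet committed behavior).
EXPLAIN (ANALYZE, SERIALIZE) SELECT * FROM big_toasted_table;
```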
Re: A failure in t/038_save_logical_slots_shutdown.pl
On Thu, Jan 25, 2024 at 5:07 PM Amit Kapila wrote: > > On Thu, Jan 25, 2024 at 4:27 PM Bharath Rupireddy > wrote: > > > > Thanks. I'll wait a while and then add it to CF to not lose it in the wild. > > > > Feel free to add it to CF. However, I do plan to look at it in the > next few days. > The patch looks mostly good to me. I have changed the comments and commit message in the attached. I am planning to push this tomorrow unless you or others have any comments on it. -- With Regards, Amit Kapila. v2-0001-Fix-a-random-failure-in-038_save_logical_slots_sh.patch Description: Binary data
confusing `case when` column name
Hi hackers, Below is a `case when` demo, ```sql create table foo(a int, b int); insert into foo values (1, 2); select case 1 when 1 then a else b end from foo; ``` Currently, the psql output is, ```text b --- 1 (1 row) ``` At first glance at the output column title, I assumed the result of the SQL was wrong; it should be `a`. After some investigation, I discovered that the result's value is accurate. However, PostgreSQL uses b as the title for the output column. Need we change the title of the case-when output column? If you hackers think it's worth the effort, I'm willing to invest time in working on it. Best Regards, Winter Loo
Re: Make attstattarget nullable
On 06.03.24 22:34, Tomas Vondra wrote: > 0001 > 1) I think this bit in ALTER STATISTICS docs is wrong: > - new_target > + SET STATISTICS { integer | DEFAULT } > because it means we now have list entries for name, ..., new_name, new_schema, and then suddenly "SET STATISTICS { integer | DEFAULT }". That's a bit weird. Ok, how would you change it? List out the full clauses of the other variants under Parameters as well? We have similar inconsistencies on other ALTER reference pages, so I'm not sure what the preferred approach is. > 2) The newtarget handling in AlterStatistics seems rather confusing. Why does it get set to -1 just to ignore the value later? For a while I was 99% sure ALTER STATISTICS ... SET STATISTICS DEFAULT will set the field to -1. Maybe ditching the first if block and directly checking stmt->stxstattarget before setting repl_val/repl_null would be better? But we also need to continue accepting -1 for default on input. The current code achieves that, the proposed variant would not. Note that this patch matches the equivalent patch for attstattarget (4f622503d6d), which uses the same logic. We could change it if we have a better idea, but then we should change both. > 0002 > 1) I think InsertPgAttributeTuples comment probably needs to document what the new tupdesc_extra parameter does. Yes, I'll update that comment.
Re: [PATCHES] Post-special page storage TDE support
Hi David, > I have finished the reworking of this particular patch series, and have tried > to > organize this in such a way that it will be easily reviewable. It is > constructed progressively to be able to follow what is happening here. As > such, > each individual commit is not guaranteed to compile on its own, so the whole > series would need to be applied before it works. (It does pass CI tests.) > > Here is a brief roadmap of the patches; some of them have additional details > in > the commit message describing a little more about them. > > These two patches do some refactoring of existing code to make a common place > to > modify the definitions: > > v3-0001-refactor-Create-PageUsableSpace-to-represent-spac.patch > v3-0002-refactor-Make-PageGetUsablePageSize-routine.patch > > These two patches add the ReservedPageSize variable and teach PageInit to use > to > adjust sizing accordingly: > > v3-0003-feature-Add-ReservedPageSize-variable.patch > v3-0004-feature-Adjust-page-sizes-at-PageInit.patch > > This patch modifies the definitions of 4 symbols to be computed based on > PageUsableSpace: > > v3-0005-feature-Create-Calc-Limit-and-Dynamic-forms-for-f.patch > > These following 4 patches are mechanical replacements of all existing uses of > these symbols; this provides both visibility into where the existing symbol is > used as well as distinguishing between parts that care about static allocation > vs dynamic usage. The only non-mechanical change is to remove the definition > of > the old symbol so we can be guaranteed that all uses have been considered: > > v3-0006-chore-Split-MaxHeapTuplesPerPage-into-Limit-and-D.patch > v3-0007-chore-Split-MaxIndexTuplesPerPage-into-Limit-and-.patch > v3-0008-chore-Split-MaxHeapTupleSize-into-Limit-and-Dynam.patch > v3-0009-chore-Split-MaxTIDsPerBTreePage-into-Limit-and-Dy.patch > > The following patches are related to required changes to support dynamic toast > limits: > > v3-0010-feature-Add-hook-for-setting-reloptions-defaults-.patch > v3-0011-feature-Dynamically-calculate-toast_tuple_target.patch > v3-0012-feature-Add-Calc-options-for-toast-related-pieces.patch > v3-0013-chore-Replace-TOAST_MAX_CHUNK_SIZE-with-ClusterTo.patch > v3-0014-chore-Translation-updates-for-TOAST_MAX_CHUNK_SIZ.patch > > In order to calculate some of the sizes, we need to include nbtree.h > internals, > but we can't use in front-end apps, so we separate out the pieces we care > about > into a separate include and use that: > > v3-0015-chore-Split-nbtree.h-structure-defs-into-an-inter.patch > > This is the meat of the patch; provide a common location for these > block-size-related constants to be computed using the infra that has been set > up > so far. Also ensure that we are properly initializing this in front end and > back end code. 
A tricky piece here is we have two separate include files for > blocksize.h; one which exposes externs as consts for optimizations, and one > that > blocksize.c itself uses without consts, which it uses to create/initialized > the > vars: > > v3-0016-feature-Calculate-all-blocksize-constants-in-a-co.patch > > Add ControlFile and GUC support for reserved_page_size: > > v3-0017-feature-ControlFile-GUC-support-for-reserved_page.patch > > Add initdb support for reserving page space: > > v3-0018-feature-Add-reserved_page_size-to-initdb-bootstra.patch > > Fixes for pg_resetwal: > > v3-0019-feature-Updates-for-pg_resetwal.patch > > The following 4 patches mechanically replace the Dynamic form to use the new > Cluster variables: > > v3-0020-chore-Rename-MaxHeapTupleSizeDynamic-to-ClusterMa.patch > v3-0021-chore-Rename-MaxHeapTuplesPerPageDynamic-to-Clust.patch > v3-0022-chore-Rename-MaxIndexTuplesPerPageDynamic-to-Clus.patch > v3-0023-chore-Rename-MaxTIDsPerBTreePageDynamic-to-Cluste.patch > > Two pieces of optimization required for visibility map: > > v3-0024-optimization-Add-support-for-fast-non-division-ba.patch > v3-0025-optimization-Use-fastdiv-code-in-visibility-map.patch > > Update bufpage.h comments: > > v3-0026-doc-update-bufpage-docs-w-reserved-space-data.patch > > Fixes for bloom to use runtime size: > > v3-0027-feature-Teach-bloom-about-PageUsableSpace.patch > > Fixes for FSM to use runtime size: > > v3-0028-feature-teach-FSM-about-reserved-page-space.patch > > I hope this makes sense for reviewing, I know it's a big job, so breaking > things up a little more and organizing will hopefully help. Just wanted to let you know that the patchset seems to need a rebase, according to cfbot. Best regards, Aleksander Alekseev (wearing a co-CFM hat)
Re: confusing `case when` column name
On Tuesday, March 12, 2024, adjk...@126.com wrote: > > Need we change the title of the case-when output column? > > Choosing either a or b as the label seems wrong and is probably worth changing to something that has no meaning and encourages the application of a column alias. David J.
Re: Add the ability to limit the amount of memory that can be allocated to backends.
Hi, > I took a closer look at this patch over the last couple days, and I did > a bunch of benchmarks to see how expensive the accounting is. The good > news is that I haven't observed any overhead - I did two simple tests, > that I think would be particularly sensitive to this: > > [...] Just wanted to let you know that v20231226 doesn't apply. The patch needs love from somebody interested in it. Best regards, Aleksander Alekseev (wearing a co-CFM hat)
Re: Make attstattarget nullable
On 3/12/24 13:47, Peter Eisentraut wrote: > On 06.03.24 22:34, Tomas Vondra wrote: >> 0001 >> >> >> 1) I think this bit in ALTER STATISTICS docs is wrong: >> >> - new_target >> + SET STATISTICS { integer | DEFAULT } >> >> because it means we now have list entries for name, ..., new_name, >> new_schema, and then suddenly "SET STATISTICS { integer | DEFAULT }". >> That's a bit weird. > > Ok, how would you change it? List out the full clauses of the other > variants under Parameters as well? I'd go with a parameter, essentially exactly as it used to be, except for adding the DEFAULT option. So the list would define new_target, and mention DEFAULT as a special value. > We have similar inconsistencies on other ALTER reference pages, so I'm > not sure what the preferred approach is. > Yeah, the other reference pages may have some inconsistencies too, but let's not add more. >> 2) The newtarget handling in AlterStatistics seems rather confusing. Why >> does it get set to -1 just to ignore the value later? For a while I was >> 99% sure ALTER STATISTICS ... SET STATISTICS DEFAULT will set the field >> to -1. Maybe ditching the first if block and directly checking >> stmt->stxstattarget before setting repl_val/repl_null would be better? > > But we also need to continue accepting -1 for default on input. The > current code achieves that, the proposed variant would not. > OK, I did not realize that. But then maybe this should be explained in a comment before the new "if" block, because people won't realize why it needs to be this way. > Note that this patch matches the equivalent patch for attstattarget > (4f622503d6d), which uses the same logic. We could change it if we have > a better idea, but then we should change both. > >> 0002 >> >> >> 1) I think InsertPgAttributeTuples comment probably needs to document >> what the new tupdesc_extra parameter does. > > Yes, I'll update that comment. > OK. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
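To make the syntax under discussion concrete, here is a small sketch; the statistics object and table are invented, and the DEFAULT spelling is what the patch proposes rather than something already released:

```sql
CREATE TABLE t (a int, b int);
CREATE STATISTICS stts (dependencies) ON a, b FROM t;

ALTER STATISTICS stts SET STATISTICS 500;      -- explicit target
ALTER STATISTICS stts SET STATISTICS DEFAULT;  -- proposed: reset to the default
ALTER STATISTICS stts SET STATISTICS -1;       -- still accepted as "default" on input
```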
Re: CF entries for 17 to be reviewed
Hi, > > On 6 Mar 2024, at 18:49, Andrey M. Borodin wrote: > > > > Here are statuses for "Refactoring" section: > > [...] > Aleksander, I would greatly appreciate if you join me in managing CF. > Together we can move more stuff :) > Currently, I'm going through "SQL Commands". And so far I had not come to > "Performance" and "Server Features" at all... So if you can handle updating > statuses of that sections - that would be great. Server Features: * Fix partitionwise join with partially-redundant join clauses I see there is a good discussion in progress here. It doesn't seem to need more reviewers at the moment. * ALTER TABLE SET ACCESS METHOD on partitioned tables Ditto. * Patch to implement missing join selectivity estimation for range types The patch doesn't apply and has been "Waiting on Author" for a few months now. Could be a candidate for closing with RwF. * logical decoding and replication of sequences, take 2 According to Tomas Vondra the patch is not ready for PG17. The patch is marked as "Waiting on Author". Although it could be withdrawn for now, personally I see no problem with keeping it WoA until the PG18 cycle begins. * Add the ability to limit the amount of memory that can be allocated to backends v20231226 doesn't apply. The patch needs love from somebody interested in it. * Multi-version ICU From a quick look, the patch doesn't apply and has been moved between several commitfests, most recently in the "Waiting on Author" state. Could be a candidate for closing with RwF. * Post-special Page Storage TDE support A large patch divided into 28 (!) parts. It currently needs a rebase, which shouldn't necessarily stop a reviewer looking for a challenging task. * Built-in collation provider for "C" and "C.UTF-8" Peter E left some feedback today, so I changed the status to "Waiting on Author". * ltree hash functions Marked as RfC, and cfbot seems to be happy with the patch. Could use some attention from a committer? * UUID v7 The patch is in good shape. Michael argued that the patch should be merged only once the RFC is approved. No action seems to be needed until then. * (to be continued) -- Best regards, Aleksander Alekseev
Re: [PATCH] LockAcquireExtended improvement
On Mon, Mar 11, 2024 at 11:11 PM Jingxian Li wrote: > Your version changes less code than mine by pushing the nowait flag down > into ProcSleep(). This looks fine in general, except for a little advice, > which I don't think there is necessary to add 'waiting' suffix to the > process name in function WaitOnLock with dontwait being true, as follows: That could be done, but in my opinion it's not necessary. The waiting suffix will appear only very briefly, and probably only in relatively rare cases. It doesn't seem worth adding code to avoid it. -- Robert Haas EDB: http://www.enterprisedb.com
Disable LLVM bitcode generation with pgxs.mk framework.
Hi hackers, When the PostgreSQL server is configured with --with-llvm, the pgxs.mk framework will generate LLVM bitcode for extensions automatically. Sometimes, I don't want to generate bitcode for some extensions. I can turn off this feature by specifying with_llvm=0 in the make command. ``` make with_llvm=0 ``` Would it be possible to add a new switch in the pgxs.mk framework to allow users to disable this feature? E.g., the Makefile looks like: ``` WITH_LLVM=no PG_CONFIG = pg_config PGXS := $(shell $(PG_CONFIG) --pgxs) ``` Best Regards, Xing
Re: Commitfest Manager for March
Hi Andrey, > > If you need any help please let me know. > > Aleksander, I would greatly appreciate if you join me in managing CF. > Together we can move more stuff :) > Currently, I'm going through "SQL Commands". And so far I had not come to > "Performance" and "Server Features" at all... So if you can handle updating > statuses of that sections - that would be great. OK, I'll take care of the "Performance" and "Server Features" sections. I submitted my summaries of the entries triaged so far to the corresponding thread [1]. [1]: https://www.postgresql.org/message-id/CAJ7c6TN9SnYdq%3DkfP-txgo5AaT%2Bt9YU%2BvQHfLBZqOBiHwoipAg%40mail.gmail.com -- Best regards, Aleksander Alekseev
Re: CF entries for 17 to be reviewed
Hi, > > Aleksander, I would greatly appreciate if you join me in managing CF. > > Together we can move more stuff :) > > Currently, I'm going through "SQL Commands". And so far I had not come to > > "Performance" and "Server Features" at all... So if you can handle updating > > statuses of that sections - that would be great. > > Server Features: > > [...] I noticed that "Avoid mixing custom and OpenSSL BIO functions" had two entries [1][2]. I closed [1] and marked it as "Withdrawn" due to lack of a better status. Maybe we need an additional "Duplicate" status. [1]: https://commitfest.postgresql.org/47/4834/ [2]: https://commitfest.postgresql.org/47/4835/ -- Best regards, Aleksander Alekseev
Re: [EXTERNAL] Re: Add non-blocking version of PQcancel
On Tue, 12 Mar 2024 at 10:19, Alvaro Herrera wrote: > Here's a last one for the cfbot. Thanks for committing the first 3 patches btw. Attached a tiny change to 0001, which adds "(backing struct for PGcancelConn)" to the comment on pg_cancel_conn. From d340fde6883a249fd7c1a90033675a3b5edb603e Mon Sep 17 00:00:00 2001 From: Jelte Fennema-Nio Date: Thu, 14 Dec 2023 13:39:09 +0100 Subject: [PATCH v35 2/2] Start using new libpq cancel APIs A previous commit introduced new APIs to libpq for cancelling queries. This replaces the usage of the old APIs in most of the codebase with these newer ones. This specifically leaves out changes to psql and pgbench as those would need a much larger refactor to be able to call them, due to the new functions not being signal-safe. --- contrib/dblink/dblink.c | 30 +++-- contrib/postgres_fdw/connection.c | 105 +++--- .../postgres_fdw/expected/postgres_fdw.out| 15 +++ contrib/postgres_fdw/sql/postgres_fdw.sql | 7 ++ src/fe_utils/connect_utils.c | 11 +- src/test/isolation/isolationtester.c | 29 ++--- 6 files changed, 145 insertions(+), 52 deletions(-) diff --git a/contrib/dblink/dblink.c b/contrib/dblink/dblink.c index 19a362526d2..98dcca3e6fd 100644 --- a/contrib/dblink/dblink.c +++ b/contrib/dblink/dblink.c @@ -1346,22 +1346,32 @@ PG_FUNCTION_INFO_V1(dblink_cancel_query); Datum dblink_cancel_query(PG_FUNCTION_ARGS) { - int res; PGconn *conn; - PGcancel *cancel; - char errbuf[256]; + PGcancelConn *cancelConn; + char *msg; dblink_init(); conn = dblink_get_named_conn(text_to_cstring(PG_GETARG_TEXT_PP(0))); - cancel = PQgetCancel(conn); + cancelConn = PQcancelCreate(conn); - res = PQcancel(cancel, errbuf, 256); - PQfreeCancel(cancel); + PG_TRY(); + { + if (!PQcancelBlocking(cancelConn)) + { + msg = pchomp(PQcancelErrorMessage(cancelConn)); + } + else + { + msg = "OK"; + } + } + PG_FINALLY(); + { + PQcancelFinish(cancelConn); + } + PG_END_TRY(); - if (res == 1) - PG_RETURN_TEXT_P(cstring_to_text("OK")); - else - PG_RETURN_TEXT_P(cstring_to_text(errbuf)); + PG_RETURN_TEXT_P(cstring_to_text(msg)); } diff --git a/contrib/postgres_fdw/connection.c b/contrib/postgres_fdw/connection.c index 4931ebf5915..dcc13dc3b24 100644 --- a/contrib/postgres_fdw/connection.c +++ b/contrib/postgres_fdw/connection.c @@ -133,7 +133,7 @@ static void pgfdw_inval_callback(Datum arg, int cacheid, uint32 hashvalue); static void pgfdw_reject_incomplete_xact_state_change(ConnCacheEntry *entry); static void pgfdw_reset_xact_state(ConnCacheEntry *entry, bool toplevel); static bool pgfdw_cancel_query(PGconn *conn); -static bool pgfdw_cancel_query_begin(PGconn *conn); +static bool pgfdw_cancel_query_begin(PGconn *conn, TimestampTz endtime); static bool pgfdw_cancel_query_end(PGconn *conn, TimestampTz endtime, bool consume_input); static bool pgfdw_exec_cleanup_query(PGconn *conn, const char *query, @@ -1315,36 +1315,104 @@ pgfdw_cancel_query(PGconn *conn) endtime = TimestampTzPlusMilliseconds(GetCurrentTimestamp(), CONNECTION_CLEANUP_TIMEOUT); - if (!pgfdw_cancel_query_begin(conn)) + if (!pgfdw_cancel_query_begin(conn, endtime)) return false; return pgfdw_cancel_query_end(conn, endtime, false); } static bool -pgfdw_cancel_query_begin(PGconn *conn) +pgfdw_cancel_query_begin(PGconn *conn, TimestampTz endtime) { - PGcancel *cancel; - char errbuf[256]; + bool timed_out = false; + bool failed = false; + PGcancelConn *cancel_conn = PQcancelCreate(conn); - /* - * Issue cancel request. Unfortunately, there's no good way to limit the - * amount of time that we might block inside PQgetCancel(). 
- */ - if ((cancel = PQgetCancel(conn))) + + if (!PQcancelStart(cancel_conn)) { - if (!PQcancel(cancel, errbuf, sizeof(errbuf))) + PG_TRY(); { ereport(WARNING, (errcode(ERRCODE_CONNECTION_FAILURE), errmsg("could not send cancel request: %s", - errbuf))); - PQfreeCancel(cancel); - return false; + pchomp(PQcancelErrorMessage(cancel_conn); } - PQfreeCancel(cancel); + PG_FINALLY(); + { + PQcancelFinish(cancel_conn); + } + PG_END_TRY(); + return false; } - return true; + /* In what follows, do not leak any PGcancelConn on an error. */ + PG_TRY(); + { + while (true) + { + TimestampTz now = GetCurrentTimestamp(); + long cur_timeout; + PostgresPollingStatusType pollres = PQcancelPoll(cancel_conn); + int waitEvents = WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH; + + if (pollres == PGRES_POLLING_OK) + { +break; + } + + /* If timeout has expired, give up, else get sleep time. */ + cur_timeout = TimestampDifferenceMilliseconds(now, endtime); + if (cur_timeout <= 0) + { +timed_out = true; +failed = true; +goto exit; + } + + switch (pollres) + { +case PGRES_POLLING_READING: + waitEvents |= WL_SOCKET_READABLE; + break; +case PGRES_POL
Re: confusing `case when` column name
"David G. Johnston" writes: > On Tuesday, March 12, 2024, adjk...@126.com wrote: >> Nee we change the title of the case-when output column? > Choosing either a or b as the label seems wrong and probably worth changing > to something that has no meaning and encourages the application of a column > alias. Yeah, we won't get any kudos for changing a rule that's stood for 25 years, even if it's not very good. This is one of the places where it's just hard to make a great choice automatically, and users are probably going to end up applying an AS clause most of the time if they care about the column name at all. regards, tom lane
Re: On disable_cost
On Thu, Aug 3, 2023 at 5:22 AM Jian Guo wrote: > I have write an initial patch to retire the disable_cost GUC, which labeled a > flag on the Path struct instead of adding up a big cost which is hard to > estimate. Though it involved in tons of plan changes in regression tests, I > have tested on some simple test cases such as eagerly generate a two-stage > agg plans and it worked. Could someone help to review? I took a look at this patch today. I believe that overall this may well be an approach worth pursuing. However, more work is going to be needed. Here are some comments: 1. You stated that it changes lots of plans in the regression tests, but you haven't provided any sort of analysis of why those plans changed. I'm kind of surprised that there would be "tons" of plan changes. You (or someone) should look into why that's happening. 2. The change to compare_path_costs_fuzzily() seems incorrect to me. When path1->is_disabled && path2->is_disabled, costs should be compared just as they are when neither path is disabled. Instead, the patch treats any two such paths as having equal cost. That seems catastrophically bad. Maybe it accounts for some of those plan changes, although that would only be true if those plans were created while using some disabling GUC. 3. Instead of adding is_disabled at the end of the Path structure, I suggest adding it between param_info and parallel_aware. I think if you do that, the space used by the new field will use up padding bytes that are currently included in the struct, instead of making it longer. 4. A critical issue for any patch of this type is performance. This concern was raised earlier on this thread, but your path doesn't address it. There's no performance analysis or benchmarking included in your email. One idea that I have is to write the cost-comparison test like this: if (unlikely(path1->is_disabled || path2->is_disabled)) { if (!path1->is_disabled) return COSTS_BETTER1; if (!path2->is_disabled) return COSTS_BETTER2; /* if both disabled, fall through */ } I'm not sure that would be enough to prevent the patch from adding noticeably to the cost of path comparison, but maybe it would help. 5. The patch changes only compare_path_costs_fuzzily() but I wonder whether compare_path_costs() and compare_fractional_path_costs() need similar surgery. Whether they do or don't, there should likely be some comments explaining the situation. 6. In fact, the patch changes no comments at all, anywhere. I'm not sure how many comment changes a patch like this needs to make, but the answer definitely isn't "none". 7. The patch doesn't actually remove disable_cost. I guess it should. 8. When you submit a patch, it's a good idea to also add it on commitfest.postgresql.org. It doesn't look like that was done in this case. -- Robert Haas EDB: http://www.enterprisedb.com
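For context, the "disabling" being discussed is what the planner does when one of the enable_* GUCs is turned off; a quick way to observe the current disable_cost behaviour that the proposed is_disabled flag is meant to replace (the table name is invented for illustration):

```sql
SET enable_seqscan = off;
-- If no index is usable, a sequential scan is still chosen, but its
-- estimated cost is inflated by disable_cost (a very large constant),
-- which is what distorts cost comparisons today.
EXPLAIN SELECT * FROM some_table_without_indexes;
RESET enable_seqscan;
```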
Re: confusing `case when` column name
> On 12 Mar 2024, at 15:19, Tom Lane wrote: > users are probably going to end up applying an AS clause most of > the time if they care about the column name at all. If anything, we could perhaps add a short note in the CASE documentation about column naming; the way it's phrased now, a new user could very easily assume the column would be named "case". -- Daniel Gustafsson
Re: Disable LLVM bitcode generation with pgxs.mk framework.
> On 12 Mar 2024, at 14:38, Xing Guo wrote: > Would it be possible to add a new switch in the pgxs.mk framework to > allow users to disable this feature? Something like that doesn't seem unreasonable I think. -- Daniel Gustafsson
Re: A failure in t/038_save_logical_slots_shutdown.pl
On Tue, Mar 12, 2024 at 6:05 PM Amit Kapila wrote: > > The patch looks mostly good to me. I have changed the comments and > commit message in the attached. I am planning to push this tomorrow > unless you or others have any comments on it. LGTM. -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: POC, WIP: OR-clause support for indexes
On Mon, Mar 11, 2024 at 2:43 PM Andrei Lepikhov wrote: > On 11/3/2024 18:31, Alexander Korotkov wrote: > >> I'm not convinced about this limit. The initial reason was to combine > >> long lists of ORs into the array because such a transformation made at > >> an early stage increases efficiency. > >> I understand the necessity of this limit in the array decomposition > >> routine but not in the creation one. > > > > The comment near MAX_SAOP_ARRAY_SIZE says that this limit is because > > N^2 algorithms could be applied to arrays. Are you sure that's not > > true for our case? > When you operate an array, indeed. But when we transform ORs to an > array, not. Just check all the places in the optimizer and even the > executor where we would pass along the list of ORs. This is why I think > we should use this optimization even more intensively for huge numbers > of ORs in an attempt to speed up the overall query. Ok. > >>> 3) Better save the original order of clauses by putting hash entries and > >>> untransformable clauses to the same list. A lot of differences in > >>> regression tests output have gone. > >> I agree that reducing the number of changes in regression tests looks > >> better. But to achieve this, you introduced a hack that increases the > >> complexity of the code. Is it worth it? Maybe it would be better to make > >> one-time changes in tests instead of getting this burden on board. Or > >> have you meant something more introducing the node type? > > > > For me the reason is not just a regression test. The current code > > keeps the original order of quals as much as possible. The OR > > transformation code reorders quals even in cases when it doesn't > > eventually apply any optimization. I don't think that's acceptable. > > However, less hackery ways for this is welcome for sure. > Why is it unacceptable? Can the user implement some order-dependent > logic with clauses, and will it be correct? > Otherwise, it is a matter of taste, and generally, this decision is up > to you. I think this is an important property that the user sees the quals in the plan in the same order as they were in the query. And if some transformations are applied, then the order is saved as much as possible. I don't think we should sacrifice this property without strong reasons. A bit of code complexity is definitely not that reason for me. > >>> We don't make array values unique. That might make query execution > >>> performance somewhat worse, and also makes selectivity estimation > >>> worse. I suggest Andrei and/or Alena should implement making array > >>> values unique. > >> The fix Alena has made looks correct. But I urge you to think twice: > >> The optimizer doesn't care about duplicates, so why do we do it? > >> What's more, this optimization is intended to speed up queries with long > >> OR lists. Using the list_append_unique() comparator on such lists could > >> impact performance. I suggest sticking to the common rule and leaving > >> the responsibility on the user's shoulders. > > > > I don't see why the optimizer doesn't care about duplicates for OR > > lists. As I showed before in an example, it successfully removes the > > duplicate. So, currently OR transformation clearly introduces a > > regression in terms of selectivity estimation. I think we should > > evade that. > I think you are right. It is probably a better place than any other to > remove duplicates in an array. I just think we should sort and remove > duplicates from entry->consts in one pass. 
Thus, this optimisation > should be applied to sortable constants. Ok. > >> At least, we should do this optimization later, in one pass, with > >> sorting elements before building the array. But what if we don't have a > >> sort operator for the type? > > > > It was probably discussed before, but can we do our work later? There > > is a canonicalize_qual() which calls find_duplicate_ors(). This is > > the place where currently duplicate OR clauses are removed. Could our > > OR-to-ANY transformation be just another call from > > canonicalize_qual()? > Hmm, we already tried to do it at that point. I vaguely recall some > issues caused by this approach. Anyway, it should be done as quickly as > possible to increase the effect of the optimization. I think there were provided quite strong reasons why this shouldn't be implemented at the parse analysis stage [1], [2], [3]. The canonicalize_qual() looks quite appropriate place for that since it does similar transformations. Links. 1. https://www.postgresql.org/message-id/CA%2BTgmoZCgP6FrBQEusn4yaWm02XU8OPeoEMk91q7PRBgwaAkFw%40mail.gmail.com 2. https://www.postgresql.org/message-id/CAH2-Wzm2%3Dnf_JhiM3A2yetxRs8Nd2NuN3JqH%3Dfm_YWYd1oYoPg%40mail.gmail.com 3. https://www.postgresql.org/message-id/CA%2BTgmoaOiwMXBBTYknczepoZzKTp-Zgk5ss1%2BCuVQE-eFTqBmA%40mail.gmail.com -- Regards, Alexander Korotkov
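To make the transformation and the duplicate-constant point concrete, here is the hand-written equivalent of what the OR-to-ANY rewrite produces internally (table and column are invented):

```sql
-- Original qual: note the duplicated constant 2.
SELECT * FROM t WHERE x = 1 OR x = 2 OR x = 2 OR x = 3;

-- Array form of the same qual. If duplicates are not removed from the
-- array, selectivity estimation sees four values instead of three, which
-- is the estimation regression discussed above.
SELECT * FROM t WHERE x = ANY (ARRAY[1, 2, 2, 3]);
```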
Re: Introduce XID age and inactive timeout based replication slot invalidation
On Mon, Mar 11, 2024 at 11:26 AM Amit Kapila wrote: > > > Hm. I get the concern. Are you okay with having invalidation_reason > > separately for both logical and physical slots? In such a case, > > logical slots that got invalidated on the standby will have duplicate > > info in conflict_reason and invalidation_reason, is this fine? > > > > If we have duplicate information in two columns that could be > confusing for users. BTW, isn't the recovery conflict occur only > because of rows_removed and wal_level_insufficient reasons? The > wal_removed or the new reasons you are proposing can't happen because > of recovery conflict. Am, I missing something here? My understanding aligns with yours that the rows_removed and wal_level_insufficient invalidations can occur only upon recovery conflict. FWIW, a test named 'synchronized slot has been invalidated' in 040_standby_failover_slots_sync.pl inappropriately uses a logical slot with conflict_reason = 'wal_removed' on the standby. As per the above understanding, it's inappropriate to use conflict_reason here because wal_removed invalidation doesn't conflict with recovery. > > Another idea is to make 'conflict_reason text' as a 'conflicting > > boolean' again (revert 007693f2a3), and have 'invalidation_reason > > text' for both logical and physical slots. So, whenever 'conflicting' > > is true, one can look at invalidation_reason for the reason for > > conflict. How does this sound? > > > > So, does this mean that conflicting will only be true for some of the > reasons (say wal_level_insufficient, rows_removed, wal_removed) and > logical slots but not for others? I think that will also not eliminate > the duplicate information as user could have deduced that from single > column. So, how about we make conflict_reason report only the reasons that actually cause a conflict with recovery for logical slots, something like below, and then have invalidation_cause as a generic column for all sorts of invalidation reasons for both logical and physical slots? ReplicationSlotInvalidationCause cause = slot_contents.data.invalidated; if (slot_contents.data.database == InvalidOid || cause == RS_INVAL_NONE || (cause != RS_INVAL_HORIZON && cause != RS_INVAL_WAL_LEVEL)) { nulls[i++] = true; } else { Assert(cause == RS_INVAL_HORIZON || cause == RS_INVAL_WAL_LEVEL); values[i++] = CStringGetTextDatum(SlotInvalidationCauses[cause]); } -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
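Under the proposal above, the view would be read roughly as follows; the column names 'conflicting' and 'invalidation_reason' are the ones being debated in this thread, not committed ones, so this is only a sketch of the intended shape:

```sql
-- Purely illustrative of the proposed column split: 'conflicting' says
-- whether the invalidation conflicts with recovery, while the generic
-- reason column covers all invalidation causes for all slot types.
SELECT slot_name, conflicting, invalidation_reason
FROM pg_replication_slots
WHERE invalidation_reason IS NOT NULL;
```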
Re: UUID v7
On Mon, 11 Mar 2024 at 19:27, Andrey M. Borodin wrote: > Sorry for this long and vague explanation, if it still seems too uncertain we > can have a chat or something like that. I don't think this number picking > stuff deserve to be commented, because it still is quite close to random. RFC > gives us too much freedom of choice. I thought your explanation was quite clear and I agree that this approach makes the most sense. I sent an email to the RFC authors to ask for their feedback with you (Andrey) in the CC, because even though it makes the most sense it does not comply with the either of method 1 or 2 as described in the RFC.
Re: Introduce XID age and inactive timeout based replication slot invalidation
Hi, On Tue, Mar 12, 2024 at 05:51:43PM +0530, Amit Kapila wrote: > On Tue, Mar 12, 2024 at 1:24 PM Bertrand Drouvot > wrote: > > > > On Fri, Mar 08, 2024 at 10:42:20PM +0530, Bharath Rupireddy wrote: > > > On Wed, Mar 6, 2024 at 4:49 PM Amit Kapila > > > wrote: > > > > > > > > You might want to consider its interaction with sync slots on standby. > > > > Say, there is no activity on slots in terms of processing the changes > > > > for slots. Now, we won't perform sync of such slots on standby showing > > > > them inactive as per your new criteria where as same slots could still > > > > be valid on primary as the walsender is still active. This may be more > > > > of a theoretical point as in running system there will probably be > > > > some activity but I think this needs some thougths. > > > > > > I believe the xmin and catalog_xmin of the sync slots on the standby > > > keep advancing depending on the slots on the primary, no? If yes, the > > > XID age based invalidation shouldn't be a problem. > > > > > > I believe there are no walsenders started for the sync slots on the > > > standbys, right? If yes, the inactive timeout based invalidation also > > > shouldn't be a problem. Because, the inactive timeouts for a slot are > > > tracked only for walsenders because they are the ones that typically > > > hold replication slots for longer durations and for real replication > > > use. We did a similar thing in a recent commit [1]. > > > > > > Is my understanding right? Do you still see any problems with it? > > > > Would that make sense to "simply" discard/prevent those kind of > > invalidations > > for "synced" slot on standby? I mean, do they make sense given the fact that > > those slots are not usable until the standby is promoted? > > > > AFAIR, we don't prevent similar invalidations due to > 'max_slot_wal_keep_size' for sync slots, Right, we'd invalidate them on the standby should the standby sync slot restart_lsn exceeds the limit. > so why to prevent it for > these new parameters? This will unnecessarily create inconsistency in > the invalidation behavior. Yeah, but I think wal removal has a direct impact on the slot usuability which is probably not the case with the new XID and Timeout ones. That's why I thought about handling them differently (but I'm also fine if that's not the case). Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: UUID v7
On Tue, 12 Mar 2024 at 07:32, Michael Paquier wrote: > Sure, there is no problem in discussing a patch to implement a > behavior. But I disagree about taking a risk in merging something > that could become non-compliant with the approved RFC, if the draft is > approved at the end, of course. This just strikes me as a bad idea. I agree that we shouldn't release UUIDv7 support if the RFC describing that is not yet approved. But I do think it would be a shame if e.g. the RFC got approved 2 weeks after Postgres its feature freeze. Which would then mean we'd have to wait another 1.5 years before actually using uuidv7. Would it be a reasonable compromise to still merge the patch for PG17 (assuming the code is good to merge with regards to the current draft RFC), but revert the commit if the RFC is not approved before some deadline before the release date (e.g. before the first release candidate)?
Re: Introduce XID age and inactive timeout based replication slot invalidation
On Tue, Mar 12, 2024 at 5:51 PM Amit Kapila wrote: > > > Would that make sense to "simply" discard/prevent those kind of > > invalidations > > for "synced" slot on standby? I mean, do they make sense given the fact that > > those slots are not usable until the standby is promoted? > > AFAIR, we don't prevent similar invalidations due to > 'max_slot_wal_keep_size' for sync slots, so why to prevent it for > these new parameters? This will unnecessarily create inconsistency in > the invalidation behavior. Right. +1 to keep the behaviour consistent for all invalidations. However, an assertion that inactive_timeout isn't set for synced slots on the standby isn't a bad idea because we rely on the fact that walsenders aren't started for synced slots. Again, I think it misses the consistency in the invalidation behaviour. -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: [EXTERNAL] Re: Add non-blocking version of PQcancel
On Tue, 12 Mar 2024 at 15:04, Jelte Fennema-Nio wrote: > Attached a tiny change to 0001 One more tiny comment change, stating that pg_cancel is used by the deprecated PQcancel function. v36-0001-libpq-Add-encrypted-and-non-blocking-query-cance.patch Description: Binary data v36-0002-Start-using-new-libpq-cancel-APIs.patch Description: Binary data
Re: Introduce XID age and inactive timeout based replication slot invalidation
On Tue, Mar 12, 2024 at 9:11 PM Bertrand Drouvot wrote: > > > AFAIR, we don't prevent similar invalidations due to > > 'max_slot_wal_keep_size' for sync slots, > > Right, we'd invalidate them on the standby should the standby sync slot > restart_lsn > exceeds the limit. Right. Help me understand this a bit - is the wal_removed invalidation going to conflict with recovery on the standby? Per the discussion upthread, I'm trying to understand what invalidation reasons will exactly cause conflict with recovery? Is it just rows_removed and wal_level_insufficient invalidations? My understanding on the conflict with recovery and invalidation reason has been a bit off track. Perhaps, we need to clarify these two things in the docs for the end users as well? -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: CI speed improvements for FreeBSD
Hi! I looked at the changes and I liked them. Here are my thoughts: 0001: 1. I think using RAM is a good idea. Since it's still UFS, we lose nothing in terms of testing but win significantly in speed. 2. The change from "swapoff -a || true" to "swapoff -a" is legitimate in my view, since it's better to fail explicitly than to silence any possible problem. 3. The man page says that lowercase suffixes should be used for mdconfig, but in fact you can use either lowercase or uppercase. Yep, it's in mdconfig.c: "else if (*p == 'g' || *p == 'G')". 0002: 1. The resource usage will be a bit higher, for sure, but, if I'm not missing something, not drastically. Anyway, I do not know how to measure this increase to get concrete values. 2. And think of the potential benefits of increasing the number of test jobs: more concurrent processes, more interactions, better test coverage. Here are my runs: FreeBSD @master https://cirrus-ci.com/task/4934701194936320 Run test_world 05:56 FreeBSD @master + 0001 https://cirrus-ci.com/task/5921385306914816 Run test_world 05:06 FreeBSD @master + 0001, + 0002 https://cirrus-ci.com/task/5635288945393664 Run test_world 02:20 For comparison, Debian @master https://cirrus-ci.com/task/5143705577848832 Run test_world 02:38 Overall, I consider these changes useful. CI runs faster, with better test coverage, in exchange for a presumably slight increase in resource usage, which I don't think will be significant. -- Best regards, Maxim Orlov.
Re: type cache cleanup improvements
Got it. Here is an updated patch where I added a corresponding comment. Thank you! Playing around I found one more place which could easily modified with hash_seq_init_with_hash_value() call. -- Teodor Sigaev E-mail: teo...@sigaev.ru WWW: http://www.sigaev.ru/diff --git a/src/backend/utils/cache/attoptcache.c b/src/backend/utils/cache/attoptcache.c index af978ccd4b1..28980620662 100644 --- a/src/backend/utils/cache/attoptcache.c +++ b/src/backend/utils/cache/attoptcache.c @@ -44,12 +44,10 @@ typedef struct /* * InvalidateAttoptCacheCallback - * Flush all cache entries when pg_attribute is updated. + * Flush cache entry (or entries) when pg_attribute is updated. * * When pg_attribute is updated, we must flush the cache entry at least - * for that attribute. Currently, we just flush them all. Since attribute - * options are not currently used in performance-critical paths (such as - * query execution), this seems OK. + * for that attribute. */ static void InvalidateAttoptCacheCallback(Datum arg, int cacheid, uint32 hashvalue) @@ -57,7 +55,16 @@ InvalidateAttoptCacheCallback(Datum arg, int cacheid, uint32 hashvalue) HASH_SEQ_STATUS status; AttoptCacheEntry *attopt; - hash_seq_init(&status, AttoptCacheHash); + /* + * By convection, zero hash value is passed to the callback as a sign + * that it's time to invalidate the cache. See sinval.c, inval.c and + * InvalidateSystemCachesExtended(). + */ + if (hashvalue == 0) + hash_seq_init(&status, AttoptCacheHash); + else + hash_seq_init_with_hash_value(&status, AttoptCacheHash, hashvalue); + while ((attopt = (AttoptCacheEntry *) hash_seq_search(&status)) != NULL) { if (attopt->opts) @@ -70,6 +77,17 @@ InvalidateAttoptCacheCallback(Datum arg, int cacheid, uint32 hashvalue) } } +/* + * Hash function compatible with two-arg system cache hash function. + */ +static uint32 +relatt_cache_syshash(const void *key, Size keysize) +{ + const AttoptCacheKey* ckey = key; + + return GetSysCacheHashValue2(ATTNUM, ckey->attrelid, ckey->attnum); +} + /* * InitializeAttoptCache * Initialize the attribute options cache. @@ -82,9 +100,17 @@ InitializeAttoptCache(void) /* Initialize the hash table. */ ctl.keysize = sizeof(AttoptCacheKey); ctl.entrysize = sizeof(AttoptCacheEntry); + + /* + * AttoptCacheEntry takes hash value from the system cache. For + * AttoptCacheHash we use the same hash in order to speedup search by hash + * value. This is used by hash_seq_init_with_hash_value(). + */ + ctl.hash = relatt_cache_syshash; + AttoptCacheHash = hash_create("Attopt cache", 256, &ctl, - HASH_ELEM | HASH_BLOBS); + HASH_ELEM | HASH_FUNCTION); /* Make sure we've initialized CacheMemoryContext. */ if (!CacheMemoryContext)
Re: Statistics Import and Export
> > No, we should be keeping the lock until the end of the transaction > (which in this case would be just the one statement, but it would be the > whole statement and all of the calls in it). See analyze.c:268 or > so, where we call relation_close(onerel, NoLock); meaning we're closing > the relation but we're *not* releasing the lock on it- it'll get > released at the end of the transaction. > > If that's the case, then changing the two table_close() statements to NoLock should resolve any remaining concern.
Re: Introduce XID age and inactive timeout based replication slot invalidation
On Mon, Mar 11, 2024 at 3:44 PM Amit Kapila wrote: > > Yes, your understanding is correct. I wanted us to consider having new > parameters like 'inactive_replication_slot_timeout' to be at > slot-level instead of GUC. I think this new parameter doesn't seem to > be the similar as 'max_slot_wal_keep_size' which leads to truncation > of WAL at global and then invalidates the appropriate slots. OTOH, the > 'inactive_replication_slot_timeout' doesn't appear to have a similar > global effect. last_inactive_at is tracked for each slot, and a slot gets invalidated once the time elapsed since its last_inactive_at exceeds inactive_replication_slot_timeout. That is like max_slot_wal_keep_size invalidating slots based on restart_lsn. In a way, both are similar, right? > The other thing we should consider is what if the > checkpoint happens at a timeout greater than > 'inactive_replication_slot_timeout'? In such a case, the slots get invalidated at the next checkpoint, since by then the time elapsed since last_inactive_at will be greater than inactive_replication_slot_timeout. > Shall, we consider doing it via > some other background process or do we think checkpointer is the best > we can have? The same problem exists if we do it in some other background process. I think the checkpointer is best because it already invalidates slots for the wal_removed cause and flushes all replication slots to disk. Moving this new invalidation functionality into some other background process such as autovacuum would not only burden that process's work but also mix up the unique functionality of that background process. Having said that, I'm open to ideas from others, as I'm not so sure whether there's any issue with the checkpointer invalidating the slots for new reasons. -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
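A sketch of how the proposed setting might be used, assuming it stays a GUC taking a duration (the parameter name comes from this thread's proposal and could still change):

```sql
ALTER SYSTEM SET inactive_replication_slot_timeout = '1d';
SELECT pg_reload_conf();
-- Per the patch, slots whose last_inactive_at is older than this timeout
-- would then be invalidated by the checkpointer at the next checkpoint.
```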
Re: [EXTERNAL] Re: Add non-blocking version of PQcancel
On 2024-Mar-12, Jelte Fennema-Nio wrote: > On Tue, 12 Mar 2024 at 10:19, Alvaro Herrera wrote: > > Here's a last one for the cfbot. > > Thanks for committing the first 3 patches btw. Attached a tiny change > to 0001, which adds "(backing struct for PGcancelConn)" to the comment > on pg_cancel_conn. Thanks, I included it. I hope there were no other changes, because I didn't verify :-) but if there were, please let me know to incorporate them. I made a number of other small changes, mostly to the documentation, nothing fundamental. (Someday we should stop using to document the libpq functions and use refentry's instead ... it'd be useful to have manpages for these functions.) One thing I don't like very much is release_conn_addrinfo(), which is called conditionally in two places but unconditionally in other places. Maybe it'd make more sense to put this conditionality inside the function itself, possibly with a "bool force" flag to suppress that in the cases where it is not desired. In pqConnectDBComplete, we cast the PGconn * to PGcancelConn * in order to call PQcancelPoll, which is a bit abusive, but I don't know how to do better. Maybe we just accept this ... but if PQcancelStart is the only way to have pqConnectDBStart called from a cancel connection, maybe it'd be saner to duplicate pqConnectDBStart for cancel conns. Thanks! -- Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/
Re: UUID v7
Hi Jelte, I am one of the contributors to this RFC. Andrey's patch corresponds exactly to Fixed-Length Dedicated Counter Bits (Method 1). Andrey and you simply did not read the RFC a little further down in the text: __ The following sub-topics cover topics related solely with creating reliable fixed-length dedicated counters: - Fixed-Length Dedicated Counter Seeding: - Implementations utilizing the fixed-length counter method randomly initialize the counter with each new timestamp tick. However, when the timestamp has not increased, the counter is instead incremented by the desired increment logic. When utilizing a randomly seeded counter alongside Method 1, the random value MAY be regenerated with each counter increment without impacting sortability. The downside is that Method 1 is prone to overflows if a counter of adequate length is not selected or the random data generated leaves little room for the required number of increments. Implementations utilizing fixed-length counter method MAY also choose to randomly initialize a portion of the counter rather than the entire counter. For example, a 24 bit counter could have the 23 bits in least-significant, right-most, position randomly initialized. The remaining most significant, left-most counter bit is initialized as zero for the sole purpose of guarding against counter rollovers. - - Fixed-Length Dedicated Counter Length: - Select a counter bit-length that can properly handle the level of timestamp precision in use. For example, millisecond precision generally requires a larger counter than a timestamp with nanosecond precision. General guidance is that the counter SHOULD be at least 12 bits but no longer than 42 bits. Care must be taken to ensure that the counter length selected leaves room for sufficient entropy in the random portion of the UUID after the counter. This entropy helps improve the unguessability characteristics of UUIDs created within the batch. - The following sub-topics cover rollover handling with either type of counter method: - ... - - Counter Rollover Handling: - Counter rollovers MUST be handled by the application to avoid sorting issues. The general guidance is that applications that care about absolute monotonicity and sortability should freeze the counter and wait for the timestamp to advance which ensures monotonicity is not broken. Alternatively, implementations MAY increment the timestamp ahead of the actual time and reinitialize the counter. Sergey prokhorenkosergeyprokhore...@yahoo.com.au On Tuesday, 12 March 2024 at 06:36:13 pm GMT+3, Jelte Fennema-Nio wrote: On Mon, 11 Mar 2024 at 19:27, Andrey M. Borodin wrote: > Sorry for this long and vague explanation, if it still seems too uncertain we > can have a chat or something like that. I don't think this number picking > stuff deserve to be commented, because it still is quite close to random. RFC > gives us too much freedom of choice. I thought your explanation was quite clear and I agree that this approach makes the most sense. I sent an email to the RFC authors to ask for their feedback with you (Andrey) in the CC, because even though it makes the most sense it does not comply with the either of method 1 or 2 as described in the RFC.
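To make the Method 1 mechanics quoted above concrete, here is a small illustrative C sketch of a fixed-length dedicated counter with per-tick random seeding and a zeroed guard bit. The 18-bit width, the plain random() source, and the rollover handling (borrowing the next millisecond) are illustrative assumptions and not what the patch does; a real implementation would also need a proper entropy source and locking.

#include <stdint.h>
#include <stdlib.h>

#define COUNTER_BITS   18
#define COUNTER_MAX    ((1u << COUNTER_BITS) - 1)

static uint64_t last_ms = 0;
static uint32_t counter = 0;

/* Returns the counter value to embed after the 48-bit millisecond timestamp. */
static uint32_t
next_counter(uint64_t now_ms)
{
    if (now_ms > last_ms)
    {
        last_ms = now_ms;
        /* Reseed, keeping the most significant counter bit zero as a rollover guard. */
        counter = (uint32_t) random() & (COUNTER_MAX >> 1);
    }
    else if (++counter > COUNTER_MAX)
    {
        /* Rollover: increment the timestamp ahead of real time and reseed. */
        last_ms++;
        counter = (uint32_t) random() & (COUNTER_MAX >> 1);
    }
    return counter;
}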
Re: Introduce XID age and inactive timeout based replication slot invalidation
On Mon, Mar 11, 2024 at 4:09 PM Amit Kapila wrote: > > I don't see how it will be easier for the user to choose the default > value of 'max_slot_xid_age' compared to 'max_slot_wal_keep_size'. But, > I agree similar to 'max_slot_wal_keep_size', 'max_slot_xid_age' can be > another parameter to allow vacuum to proceed removing the rows which > otherwise it wouldn't have been as those would be required by some > slot. Now, if this understanding is correct, we should probably make > this invalidation happen by (auto)vacuum after computing the age based > on this new parameter. Currently, the patch computes the XID age in the checkpointer using the next XID (gets from ReadNextFullTransactionId()) and slot's xmin and catalog_xmin. I think the checkpointer is best because it already invalidates slots for wal_removed cause, and flushes all replication slots to disk. Moving this new invalidation functionality into some other background process such as autovacuum will not only burden that process' work but also mix up the unique functionality of that background process. Having said above, I'm open to ideas from others as I'm not so sure if there's any issue with checkpointer invalidating the slots for new reasons. -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
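For illustration, the age computation being described might look like the sketch below; max_slot_xid_age is the GUC proposed by the patch (not committed API), and the signed modulo-2^32 subtraction is the usual XID-age arithmetic:

#include "postgres.h"
#include "access/transam.h"

extern int  max_slot_xid_age;   /* proposed GUC, not committed */

static bool
slot_xid_age_exceeded(TransactionId slot_xmin)
{
    FullTransactionId next_full_xid = ReadNextFullTransactionId();
    TransactionId next_xid = XidFromFullTransactionId(next_full_xid);
    int32       age;

    if (max_slot_xid_age <= 0 || !TransactionIdIsNormal(slot_xmin))
        return false;

    /* Signed difference handles the circular XID space. */
    age = (int32) (next_xid - slot_xmin);
    return age > max_slot_xid_age;
}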
perl: unsafe empty pattern behavior
Moved from discussion on -committers: https://postgr.es/m/0ef325fa06e7a1605c4e119c4ecb637c67e5fb4e.ca...@j-davis.com Summary: Do not use perl empty patterns like // or qr// or s//.../, the behavior is too surprising for perl non-experts. There are a few such uses in our tests; patch attached. Unfortunately, there is no obvious way to automatically detect them so I am just relying on grep. I'm sure there are others here who know more about perl than I do, so suggestions/corrections are welcome. Long version: Some may know this already, but we just discovered the dangers of using empty patterns in perl: "If the PATTERN evaluates to the empty string, the last successfully matched regular expression is used instead... If no match has previously succeeded, this will (silently) act instead as a genuine empty pattern (which will always match)." https://perldoc.perl.org/perlop#The-empty-pattern-// In other words, if you have code like: if ('xyz' =~ //) { print "'xyz' matches //\n"; } The match will succeed and print, because there's no previous pattern, so // is a "genuine" empty pattern, which is treated like /.*/ (I think?). Then, if you add some other code before it: if ('abc' =~ /abc/) { print "'abc' matches /abc/\n"; } if ('xyz' =~ //) { print "'xyz' matches //\n"; } The first match will succeed, but the second match will fail, because // is treated like /abc/. On reflection, that does seem very perl-like. But it can cause surprising action-at-a-distance if not used carefully, especially for those who aren't experts in perl. It's much safer to just not use the empty pattern. If you use qr// instead: https://perldoc.perl.org/perlop#qr/STRING/msixpodualn like: if ('abc' =~ qr/abc/) { print "'abc' matches /abc/\n"; } if ('xyz' =~ qr//) { print "'xyz' matches //\n"; } Then the second match may succeed or may fail, and it's not clear from the documentation what precise circumstances matter. It seems to fail on older versions of perl (like 5.16.3) and succeed on newer versions (5.38.2). However, it may also depend on when the qr// is [re]compiled, or regex flags, or locale, or may just be undefined. Regards, Jeff Davis From be5aa677e37180a8c1b0faebcceab5506b1c8130 Mon Sep 17 00:00:00 2001 From: Jeff Davis Date: Mon, 11 Mar 2024 16:44:56 -0700 Subject: [PATCH v1] perl: avoid empty regex patterns Empty patterns have special behavior that uses the last successful pattern match. This behavior can be surprising, so remove empty patterns and instead match against exactly what is intended (e.g. /^$/ or /.*/). Unfortunately there's not an easy way to check for this in an automated way, so it's likely that some cases have been missed and will be missed in the future. This commit just cleans up known instances. 
Discussion: https://postgr.es/m/1548559.1710188...@sss.pgh.pa.us --- src/bin/pg_upgrade/t/003_logical_slots.pl | 4 ++-- src/bin/pg_upgrade/t/004_subscription.pl| 2 +- src/bin/psql/t/001_basic.pl | 8 src/bin/psql/t/010_tab_completion.pl| 2 +- src/test/recovery/t/037_invalid_database.pl | 2 +- 5 files changed, 9 insertions(+), 9 deletions(-) diff --git a/src/bin/pg_upgrade/t/003_logical_slots.pl b/src/bin/pg_upgrade/t/003_logical_slots.pl index 83d71c3084..256dfd53b1 100644 --- a/src/bin/pg_upgrade/t/003_logical_slots.pl +++ b/src/bin/pg_upgrade/t/003_logical_slots.pl @@ -78,7 +78,7 @@ command_checks_all( [ qr/max_replication_slots \(1\) must be greater than or equal to the number of logical replication slots \(2\) on the old cluster/ ], - [qr//], + [qr/^$/], 'run of pg_upgrade where the new cluster has insufficient max_replication_slots' ); ok( -d $newpub->data_dir . "/pg_upgrade_output.d", @@ -118,7 +118,7 @@ command_checks_all( [ qr/Your installation contains logical replication slots that can't be upgraded./ ], - [qr//], + [qr/^$/], 'run of pg_upgrade of old cluster with slots having unconsumed WAL records' ); diff --git a/src/bin/pg_upgrade/t/004_subscription.pl b/src/bin/pg_upgrade/t/004_subscription.pl index df5d6dffbc..c8ee2390d1 100644 --- a/src/bin/pg_upgrade/t/004_subscription.pl +++ b/src/bin/pg_upgrade/t/004_subscription.pl @@ -68,7 +68,7 @@ command_checks_all( [ qr/max_replication_slots \(0\) must be greater than or equal to the number of subscriptions \(1\) on the old cluster/ ], - [qr//], + [qr/^$/], 'run of pg_upgrade where the new cluster has insufficient max_replication_slots' ); diff --git a/src/bin/psql/t/001_basic.pl b/src/bin/psql/t/001_basic.pl index 9f0b6cf8ca..ce875ce316 100644 --- a/src/bin/psql/t/001_basic.pl +++ b/src/bin/psql/t/001_basic.pl @@ -412,23 +412,23 @@ my $perlbin = $^X; $perlbin =~ s!\\!/!g if $PostgreSQL::Test::Utils::windows_os; my $pipe_cmd = "$perlbin -pe '' >$g_file"; -psql_like($node, "SELECT 'one' \\g | $pipe_cmd", qr//, "one command \\g"); +psql_like($node, "SELECT 'one' \\g
Re: UUID v7
On Tue, 12 Mar 2024 at 18:18, Sergey Prokhorenko wrote: > Andrey and you simply did not read the RFC a little further down in the text: You're totally right, sorry about that. Maybe it would be good to move those subsections around a bit in the RFC though, so that anything related to only one method is included in the section for that method.
Re: On disable_cost
Robert Haas writes: > On Thu, Aug 3, 2023 at 5:22 AM Jian Guo wrote: >> I have write an initial patch to retire the disable_cost GUC, which labeled >> a flag on the Path struct instead of adding up a big cost which is hard to >> estimate. Though it involved in tons of plan changes in regression tests, I >> have tested on some simple test cases such as eagerly generate a two-stage >> agg plans and it worked. Could someone help to review? > I took a look at this patch today. I believe that overall this may > well be an approach worth pursuing. However, more work is going to be > needed. Here are some comments: > 1. You stated that it changes lots of plans in the regression tests, > but you haven't provided any sort of analysis of why those plans > changed. I'm kind of surprised that there would be "tons" of plan > changes. You (or someone) should look into why that's happening. I've not read the patch, but given this description I would expect there to be *zero* regression changes --- I don't think we have any test cases that depend on disable_cost being finite. If there's more than zero changes, that very likely indicates a bug in the patch. Even if there are places where the output legitimately changes, you need to justify each one and make sure that you didn't invalidate the intent of that test case. BTW, having written that paragraph, I wonder if we couldn't get the same end result with a nearly one-line change that consists of making disable_cost be IEEE infinity. Years ago we didn't want to rely on IEEE float semantics in this area, but nowadays I don't see why we shouldn't. regards, tom lane
Re: perl: unsafe empty pattern behavior
On 2024-Mar-12, Jeff Davis wrote: > Do not use perl empty patterns like // or qr// or s//.../, the behavior > is too surprising for perl non-experts. Yeah, nasty. > There are a few such uses in > our tests; patch attached. Unfortunately, there is no obvious way to > automatically detect them so I am just relying on grep. I'm sure there > are others here who know more about perl than I do, so > suggestions/corrections are welcome. I suggest that pg_dump/t/002_pg_dump.pl could use a verification that the ->{regexp} thing is not empty. I also tried grepping (for things like qr{}, qr[], qr||, qr!!) and didn't find anything beyond what you have ... but I only looked for the "qr" literal, not other ways to get regexes. -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/ Bob [Floyd] used to say that he was planning to get a Ph.D. by the "green stamp method," namely by saving envelopes addressed to him as 'Dr. Floyd'. After collecting 500 such letters, he mused, a university somewhere in Arizona would probably grant him a degree. (Don Knuth)
Re: [EXTERNAL] Re: Add non-blocking version of PQcancel
Hmm, buildfarm member kestrel (which uses -fsanitize=undefined,alignment) failed: # Running: libpq_pipeline -r 700 cancel port=49975 host=/tmp/dFh46H7YGc dbname='postgres' test cancellations... libpq_pipeline:260: query did not fail when it was expected https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=kestrel&dt=2024-03-12%2016%3A41%3A27 -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/ "The saddest aspect of life right now is that science gathers knowledge faster than society gathers wisdom." (Isaac Asimov)
Re: On disable_cost
On Tue, Mar 12, 2024 at 1:32 PM Tom Lane wrote: > BTW, having written that paragraph, I wonder if we couldn't get > the same end result with a nearly one-line change that consists of > making disable_cost be IEEE infinity. Years ago we didn't want > to rely on IEEE float semantics in this area, but nowadays I don't > see why we shouldn't. I don't think so, because I think that what will happen in that case is that we'll pick a completely random plan if we can't pick a plan that avoids incurring disable_cost. Every plan that contains one disabled node anywhere in the plan tree will look like it has exactly the same cost as any other such plan. IMHO, this is actually one of the problems with disable_cost as it works today. I think the semantics that we want are: if it's possible to pick a plan where nothing is disabled, then pick the cheapest such plan; if not, pick the cheapest plan overall. But treating disable_cost doesn't really do that. It does the first part -- picking the cheapest plan where nothing is disabled -- but it doesn't do the second part, because once you add disable_cost into the cost of some particular plan node, it screws up the rest of the planning, because the cost estimates for the disabled nodes have no bearing in reality. Fast-start plans, for example, will look insanely good compared to what would be the case in normal planning (and we lean too much toward fast-start plans even normally). (I don't think we should care how MANY disabled nodes appear in a plan, particularly. This is a more arguable point. Is a plan with 1 disabled node and 10% more cost better or worse than a plan with 2 disabled nodes and 10% less cost? I'd argue that counting the number of disabled nodes isn't particularly meaningful.) -- Robert Haas EDB: http://www.enterprisedb.com
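To spell out the semantics described here (prefer any plan with no disabled nodes; among equals, prefer the cheaper one), a comparison rule could be sketched as below. The PathStub struct, the is_disabled flag, and the compare function are all hypothetical; nothing like this exists in add_path() today.

#include <stdbool.h>

typedef struct PathStub
{
    bool        is_disabled;    /* does the path tree contain a disabled node? */
    double      total_cost;
} PathStub;

/* Returns < 0 if a is preferable, > 0 if b is preferable, 0 if tied. */
static int
compare_paths_sketch(const PathStub *a, const PathStub *b)
{
    if (a->is_disabled != b->is_disabled)
        return a->is_disabled ? 1 : -1;     /* a non-disabled path always wins */
    if (a->total_cost != b->total_cost)
        return (a->total_cost < b->total_cost) ? -1 : 1;
    return 0;
}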
Re: [EXTERNAL] Re: Add non-blocking version of PQcancel
On 2024-Mar-12, Alvaro Herrera wrote: > Hmm, buildfarm member kestrel (which uses > -fsanitize=undefined,alignment) failed: > > # Running: libpq_pipeline -r 700 cancel port=49975 host=/tmp/dFh46H7YGc > dbname='postgres' > test cancellations... > libpq_pipeline:260: query did not fail when it was expected Hm, I tried using the same compile flags, couldn't reproduce. -- Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/ "Pido que me den el Nobel por razones humanitarias" (Nicanor Parra)
Support json_errdetail in FRONTEND builds
Hello, Both the incremental JSON [1] and OAuth [2] patchsets would be improved by json_errdetail(), which was removed from FRONTEND builds in b44669b2ca: >The routine providing a detailed error message based on the error code >is made backend-only, the existing code being unsafe to use in the >frontend as the error message may finish by being palloc'd or point to a >static string, so there is no way to know if the memory of the message >should be pfree'd or not. Attached is a patch to undo this, by attaching any necessary allocations to the JsonLexContext so the caller doesn't have to keep track of them. This is based on the first patch of the OAuth patchset, which additionally needs json_errdetail() to be safe to use from libpq itself. Alvaro pointed out offlist that we don't have to go that far to re-enable this function for the utilities, so this patch is a sort of middle ground between what we have now and what OAuth implements. (There is some additional minimization that could be done to this patch, but I'm hoping to keep the code structure consistent between the two, if the result is acceptable.) Two notes that I wanted to point out explicitly: - On the other thread, Daniel contributed a destroyStringInfo() counterpart for makeStringInfo(), which is optional but seemed useful to include here. - After this patch, if a frontend client calls json_errdetail() without calling freeJsonLexContext(), it will leak memory. Thanks, --Jacob [1] https://www.postgresql.org/message-id/682c8fff-355c-a04f-57ac-81055c4ccda8%40dunslane.net [2] https://www.postgresql.org/message-id/CAOYmi%2BkKNZCL7uz-LHyBggM%2BfEcf4285pFWwm7spkUb8irY7mQ%40mail.gmail.com 0001-common-jsonapi-support-json_errdetail-in-FRONTEND.patch Description: Binary data
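For reference, frontend usage with this patch applied might look like the sketch below. The signatures are as in the v17 development tree at the time of writing and have moved around between branches, so treat this as a sketch of the call shape rather than exact API; PG_UTF8 and nullSemAction are just the usual defaults.

#include "postgres_fe.h"
#include "common/jsonapi.h"
#include "mb/pg_wchar.h"

static bool
parse_json_or_complain(const char *json)
{
    JsonLexContext *lex;
    JsonParseErrorType res;

    /* Passing NULL lets the lexer allocate the context for us. */
    lex = makeJsonLexContextCstringLen(NULL, json, strlen(json),
                                       PG_UTF8, true /* need_escapes */ );
    res = pg_parse_json(lex, &nullSemAction);

    if (res != JSON_SUCCESS)
        fprintf(stderr, "could not parse JSON: %s\n",
                json_errdetail(res, lex));

    /* Releases lexer state, including any error-detail allocations. */
    freeJsonLexContext(lex);
    return res == JSON_SUCCESS;
}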
Broken EXPLAIN output for SubPlan in MERGE
While playing around with EXPLAIN and SubPlans, I noticed that there's a bug in how this is handled for MERGE. For example: drop table if exists src, tgt, ref; create table src (a int, b text); create table tgt (a int, b text); create table ref (a int); explain (verbose, costs off) merge into tgt t using (select (select r.a from ref r where r.a = s.a) a, b from src s) s on t.a = s.a when not matched then insert values (s.a, s.b); QUERY PLAN --- Merge on public.tgt t -> Merge Left Join Output: t.ctid, s.a, s.b, s.ctid Merge Cond: (((SubPlan 1)) = t.a) -> Sort Output: s.a, s.b, s.ctid, ((SubPlan 1)) Sort Key: ((SubPlan 1)) -> Seq Scan on public.src s Output: s.a, s.b, s.ctid, (SubPlan 1) SubPlan 1 -> Seq Scan on public.ref r Output: r.a Filter: (r.a = s.a) -> Sort Output: t.ctid, t.a Sort Key: t.a -> Seq Scan on public.tgt t Output: t.ctid, t.a SubPlan 2 -> Seq Scan on public.ref r_1 Output: r_1.a Filter: (r_1.a = t.ctid) The final filter condition "(r_1.a = t.ctid)" is incorrect, and should be "(r_1.a = s.a)". What's happening is that the right hand side of that filter expression is an input Param node which get_parameter() tries to display by calling find_param_referent() and then drilling down through the ancestor node (the ModifyTable node) to try to find the real name of the variable (s.a). However, that isn't working properly for MERGE because the inner_plan and inner_tlist of the corresponding deparse_namespace aren't set correctly. Actually the inner_tlist is correct, but the inner_plan is set to the ModifyTable node, whereas it needs to be the outer child node -- in a MERGE, any references to the source relation will be INNER_VAR references to the targetlist of the join node immediately under the ModifyTable node. So I think we want to do something like the attached. Regards, Dean diff --git a/src/backend/utils/adt/ruleutils.c b/src/backend/utils/adt/ruleutils.c new file mode 100644 index 2a1ee69..2231752 --- a/src/backend/utils/adt/ruleutils.c +++ b/src/backend/utils/adt/ruleutils.c @@ -4988,8 +4988,11 @@ set_deparse_plan(deparse_namespace *dpns * For a WorkTableScan, locate the parent RecursiveUnion plan node and use * that as INNER referent. * - * For MERGE, make the inner tlist point to the merge source tlist, which - * is same as the targetlist that the ModifyTable's source plan provides. + * For MERGE, pretend the ModifyTable's source plan (its outer plan) is + * INNER referent. This is the join from the target relation to the data + * source, and all INNER_VAR Vars in other parts of the query refer to its + * targetlist. + * * For ON CONFLICT .. UPDATE we just need the inner tlist to point to the * excluded expression's tlist. 
(Similar to the SubqueryScan we don't want * to reuse OUTER, it's used for RETURNING in some modify table cases, @@ -5004,17 +5007,17 @@ set_deparse_plan(deparse_namespace *dpns dpns->inner_plan = find_recursive_union(dpns, (WorkTableScan *) plan); else if (IsA(plan, ModifyTable)) - dpns->inner_plan = plan; - else - dpns->inner_plan = innerPlan(plan); - - if (IsA(plan, ModifyTable)) { if (((ModifyTable *) plan)->operation == CMD_MERGE) - dpns->inner_tlist = dpns->outer_tlist; + dpns->inner_plan = outerPlan(plan); else - dpns->inner_tlist = ((ModifyTable *) plan)->exclRelTlist; + dpns->inner_plan = plan; } + else + dpns->inner_plan = innerPlan(plan); + + if (IsA(plan, ModifyTable) && ((ModifyTable *) plan)->operation == CMD_INSERT) + dpns->inner_tlist = ((ModifyTable *) plan)->exclRelTlist; else if (dpns->inner_plan) dpns->inner_tlist = dpns->inner_plan->targetlist; else diff --git a/src/test/regress/expected/merge.out b/src/test/regress/expected/merge.out new file mode 100644 index e3ebf46..1a6f6ad --- a/src/test/regress/expected/merge.out +++ b/src/test/regress/expected/merge.out @@ -1473,6 +1473,56 @@ WHEN MATCHED AND t.a < 10 THEN DROP TABLE ex_msource, ex_mtarget; DROP FUNCTION explain_merge(text); +-- EXPLAIN SubPlans and InitPlans +CREATE TABLE src (a int, b int, c int, d int); +CREATE TABLE tgt (a int, b int, c int, d int); +CREATE TABLE ref (ab int, cd int); +EXPLAIN (verbose, costs off) +MERGE INTO tgt t +USING (SELECT *, (SELECT count(*) FROM ref r + WHERE r.ab = s.a + s.b + AND r.cd = s.c - s.d) cnt + FROM src s) s +ON t.a = s.a AND t.b < s.cnt +WHEN MATCHED AND t.c > s.cnt THEN + UPDATE SET (b, c) = (SELECT s.b, s.cnt); + QUERY PLAN +-
typo in paths.h
Hello, I noticed that the comment for declaring create_tidscan_paths() in src/include/optimizer/paths.h has a typo. The function is implemented in tidpath.c, not in tidpath.h as the comment says; the latter file does not exist. I made a small patch to correct it. Thank you Cary Huang - HighGo Software Inc. (Canada) mailto:cary.hu...@highgo.ca http://www.highgo.ca correct_source_filename.patch Description: Binary data
Re: On disable_cost
Robert Haas writes: > On Tue, Mar 12, 2024 at 1:32 PM Tom Lane wrote: >> BTW, having written that paragraph, I wonder if we couldn't get >> the same end result with a nearly one-line change that consists of >> making disable_cost be IEEE infinity. > I don't think so, because I think that what will happen in that case > is that we'll pick a completely random plan if we can't pick a plan > that avoids incurring disable_cost. Every plan that contains one > disabled node anywhere in the plan tree will look like it has exactly > the same cost as any other such plan. Good point. > IMHO, this is actually one of the problems with disable_cost as it > works today. Yeah. I keep thinking that the right solution is to not generate disabled paths in the first place if there are any other ways to produce the same relation. That has obvious order-of-operations problems though, and I've not been able to make it work. regards, tom lane
Re: On disable_cost
On Tue, Mar 12, 2024 at 3:36 PM Tom Lane wrote: > Yeah. I keep thinking that the right solution is to not generate > disabled paths in the first place if there are any other ways to > produce the same relation. That has obvious order-of-operations > problems though, and I've not been able to make it work. I've expressed the same view in the past. It would be nice not to waste planner effort on paths that we're just going to throw away, but I'm not entirely sure what you mean by "obvious order-of-operations problems." To me, it seems like what we'd need is to be able to restart the whole planner process if we run out of steam before we get done. For example, suppose we're planning a 2-way join where index and index-only scans are disabled, sorts are disabled, and nested loops and hash joins are disabled. There's no problem generating just the non-disabled scan types at the baserel level, but when we reach the join, we're going to find that the only non-disabled join type is a merge join, and we're also going to find that we have no paths that provide pre-sorted input, so we need to sort, which we're also not allowed to do. If we could give up at that point and restart planning, disabling all of the plan-choice constraints and now creating all paths for each RelOptInfo, then everything would, I believe, be just fine. We'd end up needing neither disable_cost nor the mechanism proposed by this patch. But in the absence of that, we need some way to privilege the non-disabled paths over the disabled ones -- and I'd prefer to have something more principled than disable_cost, if we can work out the details. -- Robert Haas EDB: http://www.enterprisedb.com
Re: typo in paths.h
On Wed, 13 Mar 2024 at 07:58, Cary Huang wrote: > I noticed that the comment for declaring create_tidscan_paths() in > src/include/optimizer/paths.h has a typo. The function is implemented in > tidpath.c, not tidpath.h as stated, which does not exist. Thank you. Pushed. David
Re: remaining sql/json patches
About 0002: I think we should just drop it. Look at the changes it produces in the plans for aliases XMLTABLE: > @@ -1556,7 +1556,7 @@ SELECT f.* FROM xmldata, LATERAL > xmltable('/ROWS/ROW[COUNTRY_NAME="Japan" or COU > Output: f."COUNTRY_NAME", f."REGION_ID" > -> Seq Scan on public.xmldata > Output: xmldata.data > - -> Table Function Scan on "xmltable" f > + -> Table Function Scan on "XMLTABLE" f > Output: f."COUNTRY_NAME", f."REGION_ID" > Table Function Call: XMLTABLE(('/ROWS/ROW[COUNTRY_NAME="Japan" or > COUNTRY_NAME="India"]'::text) PASSING (xmldata.data) COLUMNS "COUNTRY_NAME" > text, "REGION_ID" integer) > Filter: (f."COUNTRY_NAME" = 'Japan'::text) Here in text-format EXPLAIN, we already have the alias next to the "xmltable" moniker, when an alias is present. This matches the query itself as well as the labels used in the "Output:" display. If an alias is not present, then this says just 'Table Function Scan on "xmltable"' and the rest of the plans refers to this as "xmltable", so it's also fine. > @@ -1591,7 +1591,7 @@ SELECT f.* FROM xmldata, LATERAL > xmltable('/ROWS/ROW[COUNTRY_NAME="Japan" or COU > "Parent Relationship": "Inner", > > + > "Parallel Aware": false, > > + > "Async Capable": false, > > + > - "Table Function Name": "xmltable", > > + > + "Table Function Name": "XMLTABLE", > > + > "Alias": "f", > > + > "Output": ["f.\"COUNTRY_NAME\"", "f.\"REGION_ID\""], > > + > "Table Function Call": > "XMLTABLE(('/ROWS/ROW[COUNTRY_NAME=\"Japan\" or > COUNTRY_NAME=\"India\"]'::text) PASSING (xmldata.data) COLUMNS > \"COUNTRY_NAME\" text, \"REGION_ID\" integer)",+ This is the JSON-format explain. Notice that the "Alias" member already shows the alias "f", so the only thing this change is doing is uppercasing the "xmltable" to "XMLTABLE". We're not really achieving anything here. I think the only salvageable piece from this, **if anything**, is making the "xmltable" literal string into uppercase. That might bring a little clarity to the fact that this is a keyword and not a user-introduced name. In your 0003 I think this would only have relevance in this query, +-- JSON_TABLE() with alias +EXPLAIN (COSTS OFF, VERBOSE) +SELECT * FROM + JSON_TABLE( + jsonb 'null', 'lax $[*]' PASSING 1 + 2 AS a, json '"foo"' AS "b c" + COLUMNS ( + id FOR ORDINALITY, + "int" int PATH '$', + "text" text PATH '$' + )) json_table_func; + QUERY PLAN +-- -- + Table Function Scan on "JSON_TABLE" json_table_func + Output: id, "int", text + Table Function Call: JSON_TABLE('null'::jsonb, '$[*]' AS json_table_path_0 PASSING 3 AS a, '"foo"'::jsonb AS "b c" COLUMNS (id FOR ORDINALITY, "int" integer PATH '$', text text PATH '$') PLAN (json_table_path_0)) +(3 rows) and I'm curious to see what this would output if this was to be run without the 0002 patch. If I understand things correctly, the alias would be displayed anyway, meaning 0002 doesn't get us anything. Please do add a test with EXPLAIN (FORMAT JSON) in 0003. Thanks -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/ "La vida es para el que se aventura"
Re: On disable_cost
On Wed, 13 Mar 2024 at 08:55, Robert Haas wrote: > But in the absence of that, we need some way to privilege the > non-disabled paths over the disabled ones -- and I'd prefer to have > something more principled than disable_cost, if we can work out the > details. The primary place I see issues with disable_cost is caused by STD_FUZZ_FACTOR. When you add 1.0e10 to a couple of modestly costly paths, it makes them appear fuzzily the same in cases where one could be significantly cheaper than the other. If we were to bump up disable_cost, it would make this problem worse. I think we do still need some way to pick the cheapest disabled path when there are no other options. One way would be to set fuzz_factor to 1.0 when either of the path costs in compare_path_costs_fuzzily() is >= disable_cost. clamp_row_est() does cap row estimates at MAXIMUM_ROWCOUNT (1e100), so I think there is some value of disable_cost that could almost certainly ensure we don't reach it because the path is truly expensive rather than disabled. So maybe the fix could be to set disable_cost to something like 1.0e110 and adjust compare_path_costs_fuzzily to not apply the fuzz_factor for paths >= disable_cost. However, I wonder if that risks the costs going infinite after a couple of cartesian joins. David
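A sketch of that tweak, factored as a helper next to compare_path_costs_fuzzily(): the helper itself is hypothetical, while disable_cost is the existing planner variable declared in optimizer/cost.h. The idea is only to stop applying the fuzz factor once a path carries the disable penalty, so disabled paths still compare by their real ordering.

#include "postgres.h"
#include "nodes/pathnodes.h"
#include "optimizer/cost.h"     /* declares disable_cost */

static double
effective_fuzz(Path *path1, Path *path2, double fuzz_factor)
{
    if (path1->total_cost >= disable_cost ||
        path2->total_cost >= disable_cost)
        return 1.0;             /* compare disabled paths exactly */
    return fuzz_factor;
}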
Re: un-revert the MAINTAIN privilege and the pg_maintain predefined role
On Thu, Mar 07, 2024 at 10:50:00AM -0600, Nathan Bossart wrote: > Given all of this code was previously reviewed and committed, I am planning > to forge ahead and commit this early next week, provided no objections or > additional feedback materialize. Jeff Davis and I spent some additional time looking at this patch. There are existing inconsistencies among the privilege checks for the various maintenance commands, and the MAINTAIN privilege just builds on the status quo, with one exception. In the v1 patch, I proposed skipping privilege checks when VACUUM recurses to TOAST tables, which means that a user may be able to process a TOAST table for which they've concurrently lost privileges on the main relation (since each table is vacuumed in a separate transaction). It's easy enough to resolve this inconsistency by sending down the parent OID when recursing to a TOAST table and using that for the privilege checks. AFAICT this avoids any kind of cache lookup hazards because we hold a session lock on the main relation in this case. I've done this in the attached v2. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com >From 1d58f64b1dcded53bd95c926c5476e129352c753 Mon Sep 17 00:00:00 2001 From: Nathan Bossart Date: Tue, 12 Mar 2024 10:55:33 -0500 Subject: [PATCH v2 1/1] Reintroduce MAINTAIN privilege and pg_maintain predefined role. Roles with MAINTAIN on a relation may run VACUUM, ANALYZE, REINDEX, REFRESH MATERIALIZED VIEW, CLUSTER, and LOCK TABLE on the relation. Roles with privileges of pg_maintain may run those same commands on all relations. This was previously committed for v16, but it was reverted in commit 151c22deee due to concerns about search_path tricks that could be used to escalate privileges to the table owner. Commits 2af07e2f74, 59825d1639, and c7ea3f4229 resolved these concerns by restricting search_path when running maintenance commands.
XXX: NEEDS CATVERSION BUMP Reviewed-by: Jeff Davis Discussion: https://postgr.es/m/20240305161235.GA3478007%40nathanxps13 --- doc/src/sgml/ddl.sgml | 35 -- doc/src/sgml/func.sgml| 2 +- .../sgml/ref/alter_default_privileges.sgml| 4 +- doc/src/sgml/ref/analyze.sgml | 6 +- doc/src/sgml/ref/cluster.sgml | 10 +- doc/src/sgml/ref/grant.sgml | 3 +- doc/src/sgml/ref/lock.sgml| 4 +- .../sgml/ref/refresh_materialized_view.sgml | 5 +- doc/src/sgml/ref/reindex.sgml | 23 ++-- doc/src/sgml/ref/revoke.sgml | 2 +- doc/src/sgml/ref/vacuum.sgml | 6 +- doc/src/sgml/user-manag.sgml | 12 ++ src/backend/catalog/aclchk.c | 15 +++ src/backend/commands/analyze.c| 13 +- src/backend/commands/cluster.c| 43 +-- src/backend/commands/indexcmds.c | 34 ++--- src/backend/commands/lockcmds.c | 2 +- src/backend/commands/matview.c| 3 +- src/backend/commands/tablecmds.c | 18 +-- src/backend/commands/vacuum.c | 76 +++- src/backend/postmaster/autovacuum.c | 1 + src/backend/utils/adt/acl.c | 8 ++ src/bin/pg_dump/dumputils.c | 1 + src/bin/pg_dump/t/002_pg_dump.pl | 2 +- src/bin/psql/tab-complete.c | 6 +- src/include/catalog/pg_authid.dat | 5 + src/include/commands/tablecmds.h | 5 +- src/include/commands/vacuum.h | 5 +- src/include/nodes/parsenodes.h| 3 +- src/include/utils/acl.h | 5 +- .../expected/cluster-conflict-partition.out | 8 +- .../specs/cluster-conflict-partition.spec | 2 +- .../perl/PostgreSQL/Test/AdjustUpgrade.pm | 11 ++ src/test/regress/expected/cluster.out | 7 ++ src/test/regress/expected/create_index.out| 4 +- src/test/regress/expected/dependency.out | 22 ++-- src/test/regress/expected/privileges.out | 116 ++ src/test/regress/expected/rowsecurity.out | 34 ++--- src/test/regress/sql/cluster.sql | 5 + src/test/regress/sql/dependency.sql | 2 +- src/test/regress/sql/privileges.sql | 67 ++ 41 files changed, 456 insertions(+), 179 deletions(-) diff --git a/doc/src/sgml/ddl.sgml b/doc/src/sgml/ddl.sgml index 9d7e2c756b..8616a8e9cc 100644 --- a/doc/src/sgml/ddl.sgml +++ b/doc/src/sgml/ddl.sgml @@ -1868,8 +1868,8 @@ ALTER TABLE products RENAME TO items; INSERT, UPDATE, DELETE, TRUNCATE, REFERENCES, TRIGGER, CREATE, CONNECT, TEMPORARY, - EXECUTE, USAGE, SET - and ALTER SYSTEM. + EXECUTE, USAGE, SET, + ALTER SYSTEM, and MAINTAIN. The privileges applicable to a particular object vary depending on the object's type (table, function, etc.). More detail about the meanings
Re: On disable_cost
David Rowley writes: > So maybe the fix could be to set disable_cost to something like > 1.0e110 and adjust compare_path_costs_fuzzily to not apply the > fuzz_factor for paths >= disable_cost. However, I wonder if that > risks the costs going infinite after a couple of cartesian joins. Perhaps. It still does nothing for Robert's point that once we're forced into using a "disabled" plan type, it'd be better if the disabled-ness didn't skew subsequent planning choices. On the whole I agree that getting rid of disable_cost entirely would be the way to go, if we can replace that with a separate boolean without driving up the cost of add_path too much. regards, tom lane
Re: Vectored I/O in bulk_write.c
One more observation while I'm thinking about bulk_write.c... hmm, it writes the data out and asks the checkpointer to fsync it, but doesn't call smgrwriteback(). I assume that means that on Linux the physical writeback sometimes won't happen until the checkpointer eventually calls fsync() sequentially, one segment file at a time. I see that it's difficult to decide how to do that though; unlike checkpoints, which have rate control/spreading, bulk writes could more easily flood the I/O subsystem in a burst. I expect most non-Linux systems' write-behind heuristics to fire up for bulk sequential writes, but on Linux where most users live, there is no write-behind heuristic AFAIK (on the most common file systems anyway), so you have to crank that handle if you want it to wake up and smell the coffee before it hits internal limits, but then you have to decide how fast to crank it. This problem will come into closer focus when we start talking about streaming writes. For the current non-streaming bulk_write.c coding, I don't have any particular idea of what to do about that, so I'm just noting the observation here. Sorry for the sudden wall of text/monologue; this is all a sort of reaction to bulk_write.c that I should perhaps have sent to the bulk_write.c thread, triggered by a couple of debugging sessions.
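Purely as an illustration of what "cranking that handle" could look like, bulk_write.c could issue writeback hints itself every so many blocks. The batching constant below is made up, and choosing it well (without flooding the I/O subsystem) is exactly the open question; the caller would advance first_unflushed after each hint.

#include "postgres.h"
#include "storage/smgr.h"

#define WRITEBACK_DISTANCE  128     /* blocks; arbitrary, for illustration only */

static void
maybe_issue_writeback(SMgrRelation smgr, ForkNumber forknum,
                      BlockNumber first_unflushed, BlockNumber next_blkno)
{
    /* Ask the kernel to start writing out the blocks buffered so far. */
    if (next_blkno - first_unflushed >= WRITEBACK_DISTANCE)
        smgrwriteback(smgr, forknum, first_unflushed,
                      next_blkno - first_unflushed);
}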
Recent 027_streaming_regress.pl hangs
Hi, Several animals are timing out while waiting for catchup, sporadically. I don't know why. The oldest example I have found so far by clicking around is: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&dt=2024-02-23%2015%3A44%3A35 So perhaps something was committed ~3 weeks ago triggered this. There are many examples since, showing as recoveryCheck failures. Apparently they are all on animals wrangled by Andres. Hmm. I think some/all share a physical host, they seem to have quite high run time variance, and they're using meson.
Re: Spurious pgstat_drop_replslot() call
On Tue, Mar 12, 2024 at 12:00:00PM +0300, Alexander Lakhin wrote: > Sorry for a bit off-topic, but I've remembered an anomaly with a similar > assert: > https://www.postgresql.org/message-id/17947-b9554521ad963c9c%40postgresql.org Thanks for the reminder. The invalidation path with the stats drop is only in 16~. > Maybe you would find it worth considering while working in this area... > (I've just run that reproducer on b36fbd9f8 and confirmed that the > assertion failure is still here.) Indeed, something needs to happen. I am not surprised that it still reproduces; nothing has changed with the locking of the stats entries. :/ -- Michael signature.asc Description: PGP signature
Re: [EXTERNAL] Re: Add non-blocking version of PQcancel
On Tue, 12 Mar 2024 at 19:28, Alvaro Herrera wrote: > > On 2024-Mar-12, Alvaro Herrera wrote: > > > Hmm, buildfarm member kestrel (which uses > > -fsanitize=undefined,alignment) failed: > > > > # Running: libpq_pipeline -r 700 cancel port=49975 host=/tmp/dFh46H7YGc > > dbname='postgres' > > test cancellations... > > libpq_pipeline:260: query did not fail when it was expected > > Hm, I tried using the same compile flags, couldn't reproduce. Okay, it passed now it seems so I guess this test is flaky somehow. The error message and the timing difference between failed and succeeded buildfarm run clearly indicates that the pg_sleep ran its 180 seconds to completion (so cancel was never processed for some reason). **failed case** 282/285 postgresql:libpq_pipeline / libpq_pipeline/001_libpq_pipeline ERROR 191.56s exit status 1 **succeeded case** 252/285 postgresql:libpq_pipeline / libpq_pipeline/001_libpq_pipeline OK 10.01s 21 subtests passed I don't see any obvious reason for how this test can be flaky, but I'll think a bit more about it tomorrow.
Re: perl: unsafe empty pattern behavior
On Tue, 2024-03-12 at 18:53 +0100, Alvaro Herrera wrote: > I suggest that pg_dump/t/002_pg_dump.pl could use a verification that > the ->{regexp} thing is not empty. I'm not sure how exactly to test for an empty pattern. The problem is, we don't really want to test for an empty pattern, because /(?^:)/ is fine. The problem is //, which gets turned into an actual pattern (perhaps empty or perhaps not), and by the time it's in the %tests hash, I think it's too late to distinguish. Again, I'm not a perl expert, so suggestions welcome. > I also tried grepping (for things > like qr{}, qr[], qr||, qr!!) and didn't find anything beyond what you > have ... but I only looked for the "qr" literal, not other ways to > get > regexes. I think that's fine. qr// seems the most dangerous, because it seems to behave differently in different versions of perl. Grepping for regexes in perl code is an "interesting" exercise. Regards, Jeff Davis
Re: perl: unsafe empty pattern behavior
Jeff Davis writes: > On Tue, 2024-03-12 at 18:53 +0100, Alvaro Herrera wrote: >> I also tried grepping (for things >> like qr{}, qr[], qr||, qr!!) and didn't find anything beyond what you >> have ... but I only looked for the "qr" literal, not other ways to >> get regexes. > I think that's fine. qr// seems the most dangerous, because it seems to > behave differently in different versions of perl. I wonder whether perlcritic has sufficiently deep understanding of Perl code that it could find these hazards. I already checked, and found that there's no built-in filter for this (at least not in the perlcritic version I have), but maybe we could write one? The rules seem to be plug-in modules, so you can make your own in principle. regards, tom lane
Re: Add new error_action COPY ON_ERROR "log"
On Tue, Mar 12, 2024 at 12:54:29PM +0530, Bharath Rupireddy wrote: > +1. But, do you want to add them now or later as a separate > patch/discussion altogether? The attached 0003 is what I had in mind: - Simplification of the LOG generated with quotes applied around the column name, don't see much need to add the relation name, either, for consistency and because the knowledge is known in the query. - A few more tests. - Some doc changes. >> Wouldn't it be better to squash the patches together, by the way? > > I guess not. They are two different things IMV. Well, 0001 is sitting doing nothing because the COPY code does not make use of it internally. -- Michael From c45474726e084faf876a319485995ce84eef8293 Mon Sep 17 00:00:00 2001 From: Bharath Rupireddy Date: Mon, 11 Mar 2024 11:37:49 + Subject: [PATCH v6 1/3] Add LOG_VERBOSITY option to COPY command This commit adds a new option LOG_VERBOSITY to set the verbosity of logged messages by COPY command. A value of 'verbose' can be used to emit more informative messages by the command, while the value of 'default (which is the default) can be used to not log any additional messages. More values such as 'terse', 'row_details' etc. can be added based on the need to the LOG_VERBOSITY option. An upcoming commit for emitting more info on soft errors by COPY FROM command with ON_ERROR 'ignore' uses this. Author: Bharath Rupireddy Reviewed-by: Michael Paquier, Masahiko Sawada Reviewed-by: Atsushi Torikoshi Discussion: https://www.postgresql.org/message-id/CALj2ACXNA0focNeriYRvQQaCGc4CsTuOnFbzF9LqTKNWxuJdhA%40mail.gmail.com --- src/include/commands/copy.h | 10 src/backend/commands/copy.c | 38 + src/bin/psql/tab-complete.c | 6 - src/test/regress/expected/copy2.out | 8 ++ src/test/regress/sql/copy2.sql | 2 ++ doc/src/sgml/ref/copy.sgml | 14 +++ src/tools/pgindent/typedefs.list| 1 + 7 files changed, 78 insertions(+), 1 deletion(-) diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h index b3da3cb0be..99d183fa4d 100644 --- a/src/include/commands/copy.h +++ b/src/include/commands/copy.h @@ -40,6 +40,15 @@ typedef enum CopyOnErrorChoice COPY_ON_ERROR_IGNORE, /* ignore errors */ } CopyOnErrorChoice; +/* + * Represents verbosity of logged messages by COPY command. + */ +typedef enum CopyLogVerbosityChoice +{ + COPY_LOG_VERBOSITY_DEFAULT = 0, /* logs no additional messages, default */ + COPY_LOG_VERBOSITY_VERBOSE, /* logs additional messages */ +} CopyLogVerbosityChoice; + /* * A struct to hold COPY options, in a parsed form. All of these are related * to formatting, except for 'freeze', which doesn't really belong here, but @@ -73,6 +82,7 @@ typedef struct CopyFormatOptions bool *force_null_flags; /* per-column CSV FN flags */ bool convert_selectively; /* do selective binary conversion? */ CopyOnErrorChoice on_error; /* what to do when error happened */ + CopyLogVerbosityChoice log_verbosity; /* verbosity of logged messages */ List *convert_select; /* list of column names (can be NIL) */ } CopyFormatOptions; diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c index 056b6733c8..23eb8c9c79 100644 --- a/src/backend/commands/copy.c +++ b/src/backend/commands/copy.c @@ -428,6 +428,36 @@ defGetCopyOnErrorChoice(DefElem *def, ParseState *pstate, bool is_from) return COPY_ON_ERROR_STOP; /* keep compiler quiet */ } +/* + * Extract a CopyLogVerbosityChoice value from a DefElem. 
+ */ +static CopyLogVerbosityChoice +defGetCopyLogVerbosityChoice(DefElem *def, ParseState *pstate) +{ + char *sval; + + /* + * If no parameter value given, assume the default value. + */ + if (def->arg == NULL) + return COPY_LOG_VERBOSITY_DEFAULT; + + /* + * Allow "default", or "verbose" values. + */ + sval = defGetString(def); + if (pg_strcasecmp(sval, "default") == 0) + return COPY_LOG_VERBOSITY_DEFAULT; + if (pg_strcasecmp(sval, "verbose") == 0) + return COPY_LOG_VERBOSITY_VERBOSE; + + ereport(ERROR, + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("COPY LOG_VERBOSITY \"%s\" not recognized", sval), + parser_errposition(pstate, def->location))); + return COPY_LOG_VERBOSITY_DEFAULT; /* keep compiler quiet */ +} + /* * Process the statement option list for COPY. * @@ -454,6 +484,7 @@ ProcessCopyOptions(ParseState *pstate, bool freeze_specified = false; bool header_specified = false; bool on_error_specified = false; + bool log_verbosity_specified = false; ListCell *option; /* Support external use for option sanity checking */ @@ -613,6 +644,13 @@ ProcessCopyOptions(ParseState *pstate, on_error_specified = true; opts_out->on_error = defGetCopyOnErrorChoice(defel, pstate, is_from); } + else if (strcmp(defel->defname, "log_verbosity") == 0) + { + if (log_verbosity_specified) +errorConflictingDefElem(defel, pstate); + log_verbosity_specified = true; + opts_out->
Re: meson: Specify -Wformat as a common warning flag for extensions
Hi, In "Re: meson: Specify -Wformat as a common warning flag for extensions" on Fri, 08 Mar 2024 10:05:27 -0600, "Tristan Partin" wrote: > Ok, I figured this out. -Wall implies -Wformat=1. We set warning_level > to 1 in the Meson project() call, which implies -Wall, and set -Wall > in CFLAGS for autoconf. That's the reason we don't get issues building > Postgres. A user making use of the pg_config --cflags option, as Sutou > is, *will* run into the aforementioned issues, since we don't > propogate -Wall into pg_config. > > $ gcc $(pg_config --cflags) -E - < /dev/null > /dev/null > cc1: warning: ‘-Wformat-security’ ignored without ‘-Wformat’ > [-Wformat-security] > $ gcc -Wall $(pg_config --cflags) -E - < /dev/null > /dev/null > (nothing printed) Thanks for explaining this. You're right. This is the reason why we don't need this for PostgreSQL itself but we need this for PostgreSQL extensions. Sorry. I should have explained this in the first e-mail... What should we do to proceed this patch? Thanks, -- kou
Re: Disable LLVM bitcode generation with pgxs.mk framework.
> On Tue, Mar 12, 2024 at 10:40 PM Daniel Gustafsson wrote: > > > On 12 Mar 2024, at 14:38, Xing Guo wrote: > > > Would it be possible to add a new switch in the pgxs.mk framework to > > allow users to disable this feature? > > Something like that doesn't seem unreasonable I think. Thanks. I added a new option NO_LLVM_BITCODE to pgxs. I'm not sure if the name is appropriate. > -- > Daniel Gustafsson > From e19a724fad4949bef9bc4d0f8e58719607d979be Mon Sep 17 00:00:00 2001 From: Xing Guo Date: Wed, 13 Mar 2024 07:56:46 +0800 Subject: [PATCH v1] Add NO_LLVM_BITCODE option to pgxs. This patch adds a new option NO_LLVM_BITCODE to pgxs to allow user to disable LLVM bitcode generation completely even if the PostgreSQL installation is configured with --with-llvm. --- doc/src/sgml/extend.sgml | 9 + src/makefiles/pgxs.mk| 6 ++ 2 files changed, 15 insertions(+) diff --git a/doc/src/sgml/extend.sgml b/doc/src/sgml/extend.sgml index 218940ee5c..6fe69746c2 100644 --- a/doc/src/sgml/extend.sgml +++ b/doc/src/sgml/extend.sgml @@ -1719,6 +1719,15 @@ include $(PGXS) + + NO_LLVM_BITCODE + + +don't generate LLVM bitcode even if the current PostgreSQL installation is configured with --with-llvm + + + + EXTRA_CLEAN diff --git a/src/makefiles/pgxs.mk b/src/makefiles/pgxs.mk index 0de3737e78..ec6a3c1f09 100644 --- a/src/makefiles/pgxs.mk +++ b/src/makefiles/pgxs.mk @@ -53,6 +53,8 @@ # that don't need their build products to be installed # NO_INSTALLCHECK -- don't define an installcheck target, useful e.g. if # tests require special configuration, or don't use pg_regress +# NO_LLVM_BITCODE -- don't generate LLVM bitcode even if the current +# PostgreSQL installation is configured with --with-llvm # EXTRA_CLEAN -- extra files to remove in 'make clean' # PG_CPPFLAGS -- will be prepended to CPPFLAGS # PG_CFLAGS -- will be appended to CFLAGS @@ -218,6 +220,10 @@ endef all: $(PROGRAM) $(DATA_built) $(HEADER_allbuilt) $(SCRIPTS_built) $(addsuffix $(DLSUFFIX), $(MODULES)) $(addsuffix .control, $(EXTENSION)) +ifdef NO_LLVM_BITCODE +with_llvm := no +endif # NO_LLVM_BITCODE + ifeq ($(with_llvm), yes) all: $(addsuffix .bc, $(MODULES)) $(patsubst %.o,%.bc, $(OBJS)) endif -- 2.44.0
Re: Add basic tests for the low-level backup method.
On 2/29/24 16:55, Michael Paquier wrote: On Thu, Feb 29, 2024 at 10:30:52AM +1300, David Steele wrote: On 11/12/23 08:21, David Steele wrote: As promised in [1], attached are some basic tests for the low-level backup method. Added to the 2024-03 CF. There is already 040_standby_failover_slots_sync.pl in recovery/ that uses the number of your test script. You may want to bump it, that's a nit. Renamed to 042_low_level_backup.pl. +unlink("$backup_dir/postmaster.pid") + or BAIL_OUT("unable to unlink $backup_dir/postmaster.pid"); +unlink("$backup_dir/postmaster.opts") + or BAIL_OUT("unable to unlink $backup_dir/postmaster.opts"); +unlink("$backup_dir/global/pg_control") + or BAIL_OUT("unable to unlink $backup_dir/global/pg_control"); RELCACHE_INIT_FILENAME as well? I'm not trying to implement the full exclusion list here, just enough to get the test working. Since exclusions are optional according to the docs I don't think we need them for a valid tests. +# Rather than writing out backup_label, try to recover the backup without +# backup_label to demonstrate that recovery will not work correctly without it, +# i.e. the canary table will be missing and the cluster will be corrupt. Provide +# only the WAL segment that recovery will think it needs. Okay, why not. No objections to this addition. I am a bit surprised that this is not scanning some of the logs produced by the startup process for particular patterns. Not sure what to look for here. There are no distinct messages for crash recovery. Perhaps there should be? +# Save backup_label into the backup directory and recover using the primary's +# archive. This time recovery will succeed and the canary table will be +# present. Here are well, I think that we should add some log_contains() with pre-defined patterns to show that recovery has completed the way we want it with a backup_label up to the end-of-backup record. Sure, I added a check for the new log message when recovering with a backup_label. Regards, -DavidFrom 076429a578ece3c213525220a4961cb5b56c190a Mon Sep 17 00:00:00 2001 From: David Steele Date: Wed, 13 Mar 2024 00:01:55 + Subject: Add basic tests for the low-level backup method. There are currently no tests for the low-level backup method. pg_backup_start() and pg_backup_stop() are called but not exercised in any real fashion. There is a lot more that can be done, but this at least provides some basic tests and provides a place for future improvement. --- src/test/recovery/t/042_low_level_backup.pl | 108 1 file changed, 108 insertions(+) create mode 100644 src/test/recovery/t/042_low_level_backup.pl diff --git a/src/test/recovery/t/042_low_level_backup.pl b/src/test/recovery/t/042_low_level_backup.pl new file mode 100644 index 00..22668bdc0d --- /dev/null +++ b/src/test/recovery/t/042_low_level_backup.pl @@ -0,0 +1,108 @@ + +# Copyright (c) 2023, PostgreSQL Global Development Group + +# Test low-level backup method by using pg_backup_start() and pg_backup_stop() +# to create backups. 
+ +use strict; +use warnings; + +use File::Copy qw(copy); +use File::Path qw(remove_tree); +use PostgreSQL::Test::Cluster; +use PostgreSQL::Test::Utils; +use Test::More; + +# Start primary node with archiving +my $node_primary = PostgreSQL::Test::Cluster->new('primary'); +$node_primary->init(has_archiving => 1, allows_streaming => 1); +$node_primary->start; + +# Start backup +my $backup_name = 'backup1'; +my $psql = $node_primary->background_psql('postgres'); + +$psql->query_safe("SET client_min_messages TO WARNING"); +$psql->set_query_timer_restart(); +$psql->query_safe("select pg_backup_start('test label')"); + +# Copy files +my $backup_dir = $node_primary->backup_dir() . '/' . $backup_name; + +PostgreSQL::Test::RecursiveCopy::copypath( + $node_primary->data_dir(), $backup_dir); + +# Cleanup some files/paths that should not be in the backup. There is no +# attempt to all the exclusions done by pg_basebackup here, in part because +# they are not required, but also to keep the test simple. +# +# Also remove pg_control because it needs to be copied later and it will be +# obvious if the copy fails. +unlink("$backup_dir/postmaster.pid") + or BAIL_OUT("unable to unlink $backup_dir/postmaster.pid"); +unlink("$backup_dir/postmaster.opts") + or BAIL_OUT("unable to unlink $backup_dir/postmaster.opts"); +unlink("$backup_dir/global/pg_control") + or BAIL_OUT("unable to unlink $backup_dir/global/pg_control"); + +remove_tree("$backup_dir/pg_wal", {keep_root => 1}) + or BAIL_OUT("unable to unlink contents of $backup_dir/pg_wal"); + +# Create a table that will be used to verify that recovery started at the +# correct location, rather than the location recorded in pg_control +$psql->query_safe("create table canary (id int)"); + +# Advance the checkpoint location in pg_control past the location where the +# backup started. Switch the WAL to make it really ob
Re: Transaction timeout
On Tue, Mar 12, 2024 at 10:28 AM Andrey M. Borodin wrote: > > On 11 Mar 2024, at 16:18, Alexander Korotkov wrote: > > > > I think if checking psql stderr is problematic, checking just logs is > > fine. Could we wait for the relevant log messages one by one with > > $node->wait_for_log() just like 040_standby_failover_slots_sync.pl do? > > PFA version with $node->wait_for_log() I've slightly revised the patch. I'm going to push it if no objections. -- Regards, Alexander Korotkov v5-0001-Add-TAP-tests-for-timeouts.patch Description: Binary data
Re: [PROPOSAL] Skip test citext_utf8 on Windows
On 2024-03-11 Mo 22:50, Thomas Munro wrote: On Tue, Mar 12, 2024 at 2:56 PM Andrew Dunstan wrote: On 2024-03-11 Mo 04:21, Oleg Tselebrovskiy wrote: Greetings, everyone! While running "installchecks" on databases with UTF-8 encoding the test citext_utf8 fails because of Turkish dotted I like this: SELECT 'i'::citext = 'İ'::citext AS t; t --- - t + f (1 row) I tried to replicate the test's results by hand and with any collation that I tried (including --locale="Turkish") this test failed Also an interesing result of my tesing. If you initialize you DB with -E utf-8 --locale="Turkish" and then run select LOWER('İ'); the output will be this: lower --- İ (1 row) Which I find strange since lower() uses collation that was passed (default in this case but still) Wouldn't we be better off finding a Windows fix for this, instead of sweeping it under the rug? Given the sorry state of our Windows locale support, I've started wondering about deleting it and telling users to adopt our nascent built-in support or ICU[1]. This other thread [2] says the sorting is intransitive so I don't think it really meets our needs anyway. [1] https://www.postgresql.org/message-id/flat/CA%2BhUKGJhV__g_TJ0jVqPbnTuqT%2B%2BM6KFv2wj%2B9AV-cABNCXN6Q%40mail.gmail.com#bc35c0b88962ff8c24c27aecc1bca72e [2] https://www.postgresql.org/message-id/flat/1407a2c0-062b-4e4c-b728-438fdff5cb07%40manitou-mail.org Makes more sense than just hacking the tests to avoid running them on Windows. (I also didn't much like doing it by parsing the version string, although I know there's at least one precedent for doing that.) cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
Re: Support json_errdetail in FRONTEND builds
On 2024-03-12 Tu 14:43, Jacob Champion wrote: Hello, Both the incremental JSON [1] and OAuth [2] patchsets would be improved by json_errdetail(), which was removed from FRONTEND builds in b44669b2ca: The routine providing a detailed error message based on the error code is made backend-only, the existing code being unsafe to use in the frontend as the error message may finish by being palloc'd or point to a static string, so there is no way to know if the memory of the message should be pfree'd or not. Attached is a patch to undo this, by attaching any necessary allocations to the JsonLexContext so the caller doesn't have to keep track of them. This is based on the first patch of the OAuth patchset, which additionally needs json_errdetail() to be safe to use from libpq itself. Alvaro pointed out offlist that we don't have to go that far to re-enable this function for the utilities, so this patch is a sort of middle ground between what we have now and what OAuth implements. (There is some additional minimization that could be done to this patch, but I'm hoping to keep the code structure consistent between the two, if the result is acceptable.) Seems reasonable. Two notes that I wanted to point out explicitly: - On the other thread, Daniel contributed a destroyStringInfo() counterpart for makeStringInfo(), which is optional but seemed useful to include here. yeah, although maybe worth a different patch. - After this patch, if a frontend client calls json_errdetail() without calling freeJsonLexContext(), it will leak memory. Not too concerned about this: 1. we tend to be a bit more relaxed about that in frontend programs, especially those not expected to run for long times and especially on error paths, where in many cases the program will just exit anyway. 2. the fix is simple where it's needed. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
Re: Improve eviction algorithm in ReorderBuffer
On Tue, Mar 12, 2024 at 4:23 PM Masahiko Sawada wrote:
>
> On Fri, Mar 8, 2024 at 12:58 PM Peter Smith wrote:
> > ...
> > > > 5.
> > > > + *
> > > > + * If 'indexed' is true, we create a hash table to track of each node's
> > > > + * index in the heap, enabling to perform some operations such as removing
> > > > + * the node from the heap.
> > > > */
> > > > binaryheap *
> > > > -binaryheap_allocate(int capacity, binaryheap_comparator compare, void *arg)
> > > > +binaryheap_allocate(int capacity, binaryheap_comparator compare,
> > > > + bool indexed, void *arg)
> > > >
> > > > BEFORE
> > > > ... enabling to perform some operations such as removing the node from the heap.
> > > >
> > > > SUGGESTION
> > > > ... to help make operations such as removing nodes more efficient.
> > > >
> > >
> > > But these operations literally require the indexed binary heap as we have an assertion:
> > >
> > > void
> > > binaryheap_remove_node_ptr(binaryheap *heap, bh_node_type d)
> > > {
> > >     bh_nodeidx_entry *ent;
> > >
> > >     Assert(!binaryheap_empty(heap) && heap->bh_has_heap_property);
> > >     Assert(heap->bh_indexed);
> > >
> >
> > I didn’t quite understand -- the operations mentioned are "operations such as removing the node", but binaryheap_remove_node() also removes a node from the heap. So I still felt the comment wording of the patch is not quite correct.
>
> Now I understand your point. That's a valid point.
>
> > Now, if the removal of a node from an indexed heap can *only* be done using binaryheap_remove_node_ptr() then:
> > - the other removal functions (binaryheap_remove_*) probably need some comments to make sure nobody is tempted to call them directly for an indexed heap.
> > - maybe some refactoring and assertions are needed to ensure those *cannot* be called directly for an indexed heap.
> >
>
> If the 'index' is true, the caller can not only use the existing functions but also newly added functions such as binaryheap_remove_node_ptr() and binaryheap_update_up() etc. How about something like below?
>

You said: "can not only use the existing functions but also..."

Hmm. Is that right? IIUC those existing "remove" functions should NOT be called directly if the heap was "indexed" because they'll delete the node from the heap OK, but any corresponding index for that deleted node will be left lying around -- i.e. everything gets out of sync. This was the reason for my original concern.

> * If 'indexed' is true, we create a hash table to track each node's
> * index in the heap, enabling to perform some operations such as
> * binaryheap_remove_node_ptr() etc.
>

Yeah, something like that... I'll wait for the next patch version before commenting further.

--
Kind Regards,
Peter Smith.
Fujitsu Australia
Re: [PoC] Improve dead tuple storage for lazy vacuum
On Tue, Mar 12, 2024 at 7:34 PM John Naylor wrote:
>
> On Mon, Mar 11, 2024 at 3:13 PM Masahiko Sawada wrote:
> >
> > On Mon, Mar 11, 2024 at 12:20 PM John Naylor wrote:
> > >
> > > On Thu, Mar 7, 2024 at 10:35 PM Masahiko Sawada wrote:
> > > > + ts->context = CurrentMemoryContext;
> > >
> > > As far as I can tell, this member is never accessed again -- am I missing something?
> >
> > You're right. It was used to re-create the tidstore in the same context again while resetting it, but we no longer support the reset API. Considering it again, would it be better to allocate the iterator struct in the same context as we store the tidstore struct?
>
> That makes sense.
>
> > > > + /* DSA for tidstore will be detached at the end of session */
> > >
> > > No other test module pins the mapping, but that doesn't necessarily mean it's wrong. Is there some advantage over explicitly detaching?
> >
> > One small benefit of not explicitly detaching dsa_area in tidstore_destroy() would be simplicity; IIUC if we want to do that, we need to remember the dsa_area using (for example) a static variable, and free it if it's non-NULL. I've implemented this idea in the attached patch.
>
> Okay, I don't have a strong preference at this point.

I'd keep the update on that.

> > > > +-- Add tids in random order.
> > >
> > > I don't see any randomization here. I do remember adding row_number to remove whitespace in the output, but I don't remember a random order. On that subject, the row_number was an easy trick to avoid extra whitespace, but maybe we should just teach the setting function to return blocknumber rather than null?
> >
> > Good idea, fixed.
>
> + test_set_block_offsets
> +
> + 2147483647
> + 0
> + 4294967294
> + 1
> + 4294967295
>
> Hmm, was the earlier comment about randomness referring to this? I'm not sure what other regression tests do in these cases, or how reliable this is. If this is a problem we could simply insert this result into a temp table so it's not output.

I didn't address the comment about randomness. I think that we will have both random TIDs tests and fixed TIDs tests in test_tidstore as we discussed, and probably we can do both tests with similar steps; insert TIDs into both a temp table and tidstore and check if the tidstore returned the results as expected by comparing the results to the temp table. Probably we can have a common pl/pgsql function that checks that and raises a WARNING or an ERROR.

Given that this is very similar to what we did in test_radixtree, why do we really want to implement it using a pl/pgsql function? When we discussed it before, I found the current way makes sense. But given that we're adding more tests and will add more tests in the future, doing the tests in C will be more maintainable and faster. Also, I think we can do the debug-build array stuff in the test_tidstore code instead.

> > > +Datum
> > > +tidstore_create(PG_FUNCTION_ARGS)
> > > +{
> > > ...
> > > + tidstore = TidStoreCreate(max_bytes, dsa);
> > >
> > > +Datum
> > > +tidstore_set_block_offsets(PG_FUNCTION_ARGS)
> > > +{
> > >
> > > + TidStoreSetBlockOffsets(tidstore, blkno, offs, noffs);
> > >
> > > These names are too similar. Maybe the test module should do s/tidstore_/test_/ or similar.
> >
> > Agreed.
>
> Mostly okay, although a couple look a bit generic now. I'll leave it up to you if you want to tweak things.
>
> > > In general, the .sql file is still very hard-coded. Functions are created that contain a VALUES statement. Maybe it's okay for now, but wanted to mention it. Ideally, we'd have some randomized tests, without having to display it. That could be in addition to (not replacing) the small tests we have that display input. (see below)
> >
> > Agreed to add randomized tests in addition to the existing tests.
>
> I'll try something tomorrow.
>
> > Sounds a good idea. In fact, if there are some bugs in tidstore, it's likely that even initdb would fail in practice. However, it's a very good idea that we can test the tidstore anyway with such a check without a debug-build array.
> >
> > Or as another idea, I wonder if we could keep the debug-build array in some form. For example, we use the array with the particular build flag and set a BF animal for that. That way, we can test the tidstore in more real cases.
>
> I think the purpose of a debug flag is to help developers catch mistakes. I don't think it's quite useful enough for that. For one, it has the same 1GB limitation as vacuum's current array. For another, it'd be a terrible way to debug moving tidbitmap.c from its hash table to use TID store -- AND/OR operations and lossy pages are pretty much undoable with a copy of vacuum's array.

Valid points. As I mentioned above, if we im
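To make the temp-table comparison idea above concrete, a rough pl/pgsql sketch; the table name, the test_dump_tids() set-returning function, and the column names are hypothetical placeholders, not functions from the actual test module:

-- Hypothetical sketch: compare the TIDs recorded in a temp table against
-- what the tidstore reports back. test_dump_tids() is a placeholder name.
CREATE TEMP TABLE expected_tids (blk bigint, off int);

CREATE OR REPLACE FUNCTION check_tidstore_matches() RETURNS void AS $$
BEGIN
    IF EXISTS (
        (SELECT blk, off FROM expected_tids
         EXCEPT
         SELECT blk, off FROM test_dump_tids())
        UNION ALL
        (SELECT blk, off FROM test_dump_tids()
         EXCEPT
         SELECT blk, off FROM expected_tids))
    THEN
        RAISE WARNING 'tidstore contents do not match the expected TIDs';
    END IF;
END;
$$ LANGUAGE plpgsql;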
Re: Improve eviction algorithm in ReorderBuffer
On Wed, Mar 13, 2024 at 10:15 AM Peter Smith wrote:
>
> On Tue, Mar 12, 2024 at 4:23 PM Masahiko Sawada wrote:
> >
> > On Fri, Mar 8, 2024 at 12:58 PM Peter Smith wrote:
> > > ...
> > > > > 5.
> > > > > + *
> > > > > + * If 'indexed' is true, we create a hash table to track of each node's
> > > > > + * index in the heap, enabling to perform some operations such as removing
> > > > > + * the node from the heap.
> > > > > */
> > > > > binaryheap *
> > > > > -binaryheap_allocate(int capacity, binaryheap_comparator compare, void *arg)
> > > > > +binaryheap_allocate(int capacity, binaryheap_comparator compare,
> > > > > + bool indexed, void *arg)
> > > > >
> > > > > BEFORE
> > > > > ... enabling to perform some operations such as removing the node from the heap.
> > > > >
> > > > > SUGGESTION
> > > > > ... to help make operations such as removing nodes more efficient.
> > > > >
> > > >
> > > > But these operations literally require the indexed binary heap as we have an assertion:
> > > >
> > > > void
> > > > binaryheap_remove_node_ptr(binaryheap *heap, bh_node_type d)
> > > > {
> > > >     bh_nodeidx_entry *ent;
> > > >
> > > >     Assert(!binaryheap_empty(heap) && heap->bh_has_heap_property);
> > > >     Assert(heap->bh_indexed);
> > > >
> > >
> > > I didn’t quite understand -- the operations mentioned are "operations such as removing the node", but binaryheap_remove_node() also removes a node from the heap. So I still felt the comment wording of the patch is not quite correct.
> >
> > Now I understand your point. That's a valid point.
> >
> > > Now, if the removal of a node from an indexed heap can *only* be done using binaryheap_remove_node_ptr() then:
> > > - the other removal functions (binaryheap_remove_*) probably need some comments to make sure nobody is tempted to call them directly for an indexed heap.
> > > - maybe some refactoring and assertions are needed to ensure those *cannot* be called directly for an indexed heap.
> > >
> >
> > If the 'index' is true, the caller can not only use the existing functions but also newly added functions such as binaryheap_remove_node_ptr() and binaryheap_update_up() etc. How about something like below?
> >
>
> You said: "can not only use the existing functions but also..."
>
> Hmm. Is that right? IIUC those existing "remove" functions should NOT be called directly if the heap was "indexed" because they'll delete the node from the heap OK, but any corresponding index for that deleted node will be left lying around -- i.e. everything gets out of sync. This was the reason for my original concern.
>

All existing binaryheap functions should be available even if the binaryheap is 'indexed'. For instance, with the patch, binaryheap_remove_node() is:

void
binaryheap_remove_node(binaryheap *heap, int n)
{
    int         cmp;

    Assert(!binaryheap_empty(heap) && heap->bh_has_heap_property);
    Assert(n >= 0 && n < heap->bh_size);

    /* compare last node to the one that is being removed */
    cmp = heap->bh_compare(heap->bh_nodes[--heap->bh_size],
                           heap->bh_nodes[n],
                           heap->bh_arg);

    /* remove the last node, placing it in the vacated entry */
    replace_node(heap, n, heap->bh_nodes[heap->bh_size]);

    /* sift as needed to preserve the heap property */
    if (cmp > 0)
        sift_up(heap, n);
    else if (cmp < 0)
        sift_down(heap, n);
}

The replace_node(), sift_up() and sift_down() update the node's index as well if the binaryheap is indexed.
When a node is deleted from the binaryheap, its index is also deleted from the hash table.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
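As an aside, a minimal sketch of how the indexed API discussed here is meant to be used; the 'indexed' argument and binaryheap_remove_node_ptr() come from the patch under discussion rather than the committed binaryheap API, and txn_compare and MAX_HEAP_TXNS are made-up names:

/* Sketch only: signatures follow the patch, not committed code. */
binaryheap *txn_heap;

txn_heap = binaryheap_allocate(MAX_HEAP_TXNS, txn_compare,
                               true /* indexed */, NULL);

/* bh_node_type is Datum in the backend, so entries go in as pointers */
binaryheap_add(txn_heap, PointerGetDatum(txn));

/*
 * Because the heap maintains a node-index hash table, a specific entry
 * can later be removed by value without scanning the node array, and the
 * hash table entry is dropped along with the node.
 */
binaryheap_remove_node_ptr(txn_heap, PointerGetDatum(txn));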
Re: CI speed improvements for FreeBSD
On Wed, Mar 13, 2024 at 4:50 AM Maxim Orlov wrote:
> I looked at the changes and I liked them. Here are my thoughts:

Thanks for looking! Pushed.