Re: Reduce "Var IS [NOT] NULL" quals during constant folding
On Sat, Mar 22, 2025 at 2:21 AM Tom Lane wrote:
> Ugh, no, that is *completely* unworkable. Suppose that the user
> does CREATE VIEW, and the parse tree recorded for that claims that
> column X is not-nullable. Then the user drops the not-null
> constraint, and then asks to execute the view. We'll optimize on
> the basis of stale information.

Thanks for pointing this out.

> The way to make this work is what I said before: move the planner's
> collection of relation information to somewhere a bit earlier in
> the planner. But not to outside the planner.

I'm considering moving the collection of attnotnull information to before pull_up_sublinks, in hopes of leveraging this info to pull up NOT IN in the future, something like attached. Maybe we could also collect the attgenerated information in the same routine, making life easier for expand_virtual_generated_columns.

Another issue I found is that in convert_EXISTS_to_ANY, we pass the parent's root to eval_const_expressions, which can cause problems when reducing "Var IS [NOT] NULL" quals. To fix this, the v2 patch constructs a dummy PlannerInfo with its "glob" link set to the parent's and its "parse" link set to the subquery. I believe these are the only fields used.

Thanks
Richard

v2-0001-Reduce-Var-IS-NOT-NULL-quals-during-constant-folding.patch
Description: Binary data
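[For reference, a minimal sketch of the dummy-PlannerInfo approach described above, assuming, as the message does, that eval_const_expressions() touches nothing besides "glob" and "parse":]

    /*
     * Sketch only: a throwaway PlannerInfo for folding the subquery's
     * expressions in convert_EXISTS_to_ANY.
     */
    PlannerInfo subroot;

    MemSet(&subroot, 0, sizeof(subroot));
    subroot.type = T_PlannerInfo;
    subroot.glob = root->glob;      /* share the parent's global state */
    subroot.parse = subselect;      /* the subquery under conversion */

    whereClause = eval_const_expressions(&subroot, whereClause);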
Re: Add Postgres module info
On 3/22/25 23:49, Tom Lane wrote:
> I spent awhile reviewing the v5 patch, and here's a proposed v6. Some notes:
>
> * I didn't like depending on offsetof(Pg_magic_struct, module_extra) to
> determine which parts of the struct are checked for compatibility. It just
> seems way too easy to break that with careless insertion of new fields, and
> such breakage might not cause obvious failures. I think the right thing is
> to break out the ABI-checking fields as their own sub-struct, rather than
> breaking out the new fields as a sub-struct.

Agree. It is a clear approach. I like it.

> * I renamed the inquiry function to pg_get_loaded_modules, since it only
> works on loaded modules but that's hardly clear from the previous name.

+1

> * It is not clear to me what permission restrictions we should put on
> pg_get_loaded_modules, ...

I vote for the idea of stripping the full path to just a filename. My initial use cases were:
1. A user reports an issue and needs to provide me with all the modules loaded at the moment of query execution. Requiring higher privileges means administrative procedures, which take a long time and are not always possible.
2. A module needs to detect another loaded module. It is not a frequent case so far, but concurrency on queryId with pg_stat_statements is at least one of my examples that happens sometimes.

Also, permissions here should be in agreement with permissions on pg_available_extensions(), right?

> * I didn't like anything about the test setup. ...

Ok, thanks. I just played with alternatives.

> Still TBD:
>
> * I'm not happy with putting pg_get_loaded_modules into dfmgr.c. It feels
> like the wrong layer to have a SQL-callable function, and the large
> expansion in its #include list is evidence that we're adding functionality
> that doesn't belong there. But I'm not quite sure where to put it instead.
> Also, the naive way to do that would require exporting DynamicFileList
> which doesn't feel nice either. Maybe we could make dfmgr.c export some
> sort of iterator function?

I just attempted to reduce the number of exported objects here. If it is ok to introduce an iterator, pg_get_loaded_modules() may live in extension.c.

> * Should we convert our existing modules to use PG_MODULE_MAGIC_EXT? I'm
> mildly in favor of that, but I think we'd need some automated way to manage
> their version strings, and I don't know what that ought to look like.
> Maybe it'd be enough to make all the in-core modules use PG_VERSION as
> their version string, but I think that might put a dent in the idea of the
> version strings following semantic versioning rules.

Yes, the additional burden of bumping a version string was a stopper for me to propose such a brave idea.

--
regards, Andrei Lepikhov
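[A sketch of what adopting the extended magic block could look like; the field names follow the patch discussed in this thread, so treat them as illustrative rather than final:]

    #include "postgres.h"
    #include "fmgr.h"

    /* Replaces a plain PG_MODULE_MAGIC; name/version are this module's own. */
    PG_MODULE_MAGIC_EXT(
        .name = "my_module",
        .version = "1.2.3"
    );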
Fix infinite loop from setting scram_iterations to INT_MAX
Hi,

I stumbled upon a problem with the scram_iterations GUC where setting scram_iterations to INT_MAX and then creating a user causes the command to hang indefinitely.

postgres=# SET scram_iterations=2147483647;
SET
postgres=# CREATE ROLE maxscram WITH PASSWORD 'forever';

I looked into the relevant code and found the issue. Each SCRAM iteration after the first is done in a loop with the following condition:

    int i;
    ...
    for (i = 2; i <= iterations; i++)
    {
        ...
    }

For iterations = INT_MAX, the loop will never terminate since the condition is <= and adding 1 to INT_MAX will lead to i wrapping around to INT_MIN. I've fixed this by modifying the loop condition to be i < iterations. I've attached a patch with the fix.

I considered adding a test as well, but since generating a password with a high number of iterations is very time-consuming, I'm not sure if that would be practical. I also tried adding this to the current CommitFest, but my account hasn't passed the cooldown period yet.

Thanks,
Kevin

fix_max_scram_iterations.patch
Description: Binary data
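[A minimal sketch of an overflow-safe shape for that loop, assuming, as in the quoted code, that the first round is computed before the loop; counting from 1 keeps the total number of rounds equal to iterations while the counter never has to exceed INT_MAX - 1:]

    /*
     * Round 1 was computed above; perform the remaining iterations - 1
     * rounds. i never needs to be incremented past 'iterations', so no
     * wraparound is possible even for iterations == INT_MAX.
     */
    for (int i = 1; i < iterations; i++)
    {
        /* XOR-fold the next HMAC round into the result (details elided) */
    }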
Regression test postgres_fdw might fail due to autovacuum
Hello hackers,

A recent buildfarm failure [1] came with the following diagnostics:

 72/72 postgresql:postgres_fdw-running / postgres_fdw-running/regress   ERROR   19.04s   exit status 1

postgres_fdw-running/regress/results/postgres_fdw.out
--- /home/bf/bf-build/culicidae/HEAD/pgsql/contrib/postgres_fdw/expected/postgres_fdw.out 2025-03-11 15:21:27.681846597 +
+++ /home/bf/bf-build/culicidae/HEAD/pgsql.build/testrun/postgres_fdw-running/regress/results/postgres_fdw.out 2025-03-14 04:02:32.573999799 +
@@ -6392,6 +6392,7 @@
 UPDATE ft2 SET c3 = 'bar' WHERE postgres_fdw_abs(c1) > 2000 RETURNING *;
   c1  | c2 | c3  | c4 | c5 | c6 | c7  | c8
 ------+----+-----+----+----+----+-----+----
+ 2010 |  0 | bar |    |    |    | ft2 |
  2001 |  1 | bar |    |    |    | ft2 |
  2002 |  2 | bar |    |    |    | ft2 |
  2003 |  3 | bar |    |    |    | ft2 |
@@ -6401,7 +6402,6 @@
  2007 |  7 | bar |    |    |    | ft2 |
  2008 |  8 | bar |    |    |    | ft2 |
  2009 |  9 | bar |    |    |    | ft2 |
- 2010 |  0 | bar |    |    |    | ft2 |
 (10 rows)

EXPLAIN (verbose, costs off) shows that the UPDATE ... RETURNING query might be unstable due to lack of ORDER BY.

I managed to reproduce this failure locally with the attached patch applied to make the test repeatable and with:
sed 's/REGRESS = postgres_fdw.*/REGRESS = $(shell printf "postgres_fdw %.0s" `seq 50`)/' -i contrib/postgres_fdw/Makefile
(Running 10 instances of the test in parallel eases reproducing as well.)

I also added SELECT ctid, * FROM ft2; just after the query in question and found out that the results of the SELECT are more unstable, but for the UPDATE to produce the difference, the tuple with c1 == 2010 must reside on a new page. For example:

--- /home/vagrant/postgresql/contrib/postgres_fdw_2/expected/postgres_fdw.out 2025-03-22 05:21:39.414773284 +
+++ /home/vagrant/postgresql/contrib/postgres_fdw_2/results/postgres_fdw.out 2025-03-23 04:43:34.608281935 +
@@ -6399,6 +6399,7 @@
 UPDATE ft2 SET c3 = 'bar' WHERE postgres_fdw_abs(c1) > 2000 RETURNING *;
   c1  | c2 | c3  | c4 | c5 | c6 | c7  | c8
 ------+----+-----+----+----+----+-----+----
+ 2010 |  0 | bar |    |    |    | ft2 |
  2001 |  1 | bar |    |    |    | ft2 |
  2002 |  2 | bar |    |    |    | ft2 |
  2003 |  3 | bar |    |    |    | ft2 |
@@ -6408,7 +6409,6 @@
  2007 |  7 | bar |    |    |    | ft2 |
  2008 |  8 | bar |    |    |    | ft2 |
  2009 |  9 | bar |    |    |    | ft2 |
- 2010 |  0 | bar |    |    |    | ft2 |
 SELECT ctid, * FROM ft2;
@@ -6470,15 +6470,6 @@
  (0,103) | 57 | 407 | 00057_update7 | Fri Feb 27 00:00:00 1970 PST | Fri Feb 27 00:00:00 1970 | 7 | 7 | foo
  (0,104) | 67 | 407 | 00067_update7 | Mon Mar 09 00:00:00 1970 PST | Mon Mar 09 00:00:00 1970 | 7 | 7 | foo
  (0,105) | 77 | 407 | 00077_update7 | Thu Mar 19 00:00:00 1970 PST | Thu Mar 19 00:00:00 1970 | 7 | 7 | foo
- (0,106) |  9 | 509 | 9_update9     | Sat Jan 10 00:00:00 1970 PST | Sat Jan 10 00:00:00 1970 | 9 | ft2 | foo
- (0,107) | 19 | 509 | 00019_update9 | Tue Jan 20 00:00:00 1970 PST | Tue Jan 20 00:00:00 1970 | 9 | ft2 | foo
- (0,108) | 29 | 509 | 00029_update9 | Fri Jan 30 00:00:00 1970 PST | Fri Jan 30 00:00:00 1970 | 9 | ft2 | foo
- (0,109) | 39 | 509 | 00039_update9 | Mon Feb 09 00:00:00 1970 PST | Mon Feb 09 00:00:00 1970 | 9 | ft2 | foo
- (0,110) | 49 | 509 | 00049_update9 | Thu Feb 19 00:00:00 1970 PST | Thu Feb 19 00:00:00 1970 | 9 | ft2 | foo
- (0,111) | 59 | 509 | 00059_update9 | Sun Mar 01 00:00:00 1970 PST | Sun Mar 01 00:00:00 1970 | 9 | ft2 | foo
- (0,112) | 69 | 509 | 00069_update9 | Wed Mar 11 00:00:00 1970 PST | Wed Mar 11 00:00:00 1970 | 9 | ft2 | foo
- (0,113) | 79 | 509 | 00079_update9 | Sat Mar 21 00:00:00 1970 PST | Sat Mar 21 00:00:00 1970 | 9 | ft2 | foo
- (0,114) | 89 | 509 | 00089_update9 | Tue Mar 31 00:00:00 1970 PST | Tue Mar 31 00:00:00 1970 | 9 | ft2 | foo
  (1,1)   | 98 |   8 | 00098         | Thu Apr 09 00:00:00 1970 PST | Thu Apr 09 00:00:00 1970 | 8 | 8 | foo
  (1,3)   | 100 |  0 | 00100         | Thu Jan 01 00:00:00 1970 PST | Thu Jan 01 00:00:00 1970 | 0 | 0 | foo
  (1,4)   | 101 |  1 | 00101         | Fri Jan 02 00:00:00 1970 PST | Fri Jan 02 00:00:00 1970 | 1 | 1 | foo
...
- (12,137) | 2007 | 7 | bar |  |  |  | ft2 |
- (12,138) | 2008 | 8 | bar |  |  |  | ft2 |
- (12,139) | 2009 | 9 | bar |  |  |  | ft2 |
- (12,140) | 2010 | 0 | bar |
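[Since RETURNING makes no ordering promise, one conventional way to stabilize such a test, sketched here rather than taken from the attached patch, is to reorder the returned rows explicitly:]

    WITH updated AS (
        UPDATE ft2 SET c3 = 'bar'
        WHERE postgres_fdw_abs(c1) > 2000
        RETURNING *
    )
    SELECT * FROM updated ORDER BY c1;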
Re: Change log level for notifying hot standby is waiting non-overflowed snapshot
On 2025/03/21 21:29, torikoshia wrote:
> Hi,
>
> On 2025-03-21 02:15, Fujii Masao wrote:
>
> Thanks for your review!
>
>>> Personally, I feel the 1st patch may be sufficient, but I would appreciate any feedback.
>>
>> Agreed.
>
>>> -             errdetail("Consistent recovery state has not been yet reached.")));
>>> +             errdetail("Consistent recovery state has not been yet reached, or snapshot is pending because a subtransaction is overflowed."),
>>> +             errhint("In the latter case, find and close the transaction with more than %d subtransactions", PGPROC_MAX_CACHED_SUBXIDS)));
>>
>> This message might be too detailed. Instead, how about simplifying it to something like:
>> "Consistent recovery state has not been reached, or snapshot is not ready for hot standby."
>
> Agreed.
> Do you also think the errhint message is unnecessary? I agree with your idea to add a description of the overflowed subtransaction in the manual, but I'm not sure all users will be able to find it. Some people may not understand what needs to be done to make the snapshot ready for hot standby. I think adding an errhint may help those users.

I see your concern that users might overlook the documentation and struggle to find a solution. However, I still believe it's better to include this information in the documentation rather than logging it as a hint. Since the scenario where the hint would be useful is relatively rare, logging it every time might be more confusing than helpful.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
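[For concreteness, a sketch of how the simplified wording would read in the ereport call; the error code and errmsg are assumed from the existing connection-rejection path, not from the patch:]

    ereport(FATAL,
            (errcode(ERRCODE_CANNOT_CONNECT_NOW),
             errmsg("the database system is not yet accepting connections"),
             errdetail("Consistent recovery state has not been reached, or snapshot is not ready for hot standby.")));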
Re: Proposal - Allow extensions to set a Plan Identifier
>> but the opposite may be
>> not necessarily true: a query ID could be associated with multiple
>> plan patterns (aka multiple plan IDs). What this offers is that we
>> know about which plan one query is currently using in live, for a
>> given query ID.
>
> Okay, as I've said before, it seems practical. I just worry because I
> see that people don't understand that queryId is a more or less
> arbitrary value that may or may not group queries that differ only by
> constants to a single "Query Class". I think the same attitude will be
> paid to planId. It is okay for statistics. However, non-statistical
> extensions need more guarantees and want to put different logic into
> these values.
> For now, we have examples of only statistical extensions in open-source
> and may live with a proposed solution.

Statistical/monitoring reasons are an important use case: detecting plan changes, load attributed to a specific plan, etc. However, I also do see other extensions that could implement a plan_id that has meaning beyond statistical/monitoring purposes. Plan management/enforcement is another use case.

>>> So, it seems that extensions may conflict with the same field. Are we sure
>>> that won't happen if they are loaded in different orders in neighbouring
>>> backends?
>>
>> Depends on what kind of computation one wants to do, and surely
>> without some coordination across the extensions you are using these
>> cannot be shared, but it's no different from the concept of a query
>> ID.
>
> Hmm, queryId generation code lies in the core and we already came to
> terms that this field has only a statistical purpose. Do you want to
> commit planId generation code?

But extensions don't necessarily need to rely on the core queryId. They can generate their own queryId. We have it documented for compute_query_id as such [0]:

"Note that an external module can alternatively be used if the in-core query identifier computation method is not acceptable"

[0] https://www.postgresql.org/docs/current/runtime-config-statistics.html#GUC-COMPUTE-QUERY-ID

--
Sami Imseih
Amazon Web Services (AWS)
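[As a sketch of what "generate their own queryId" can look like in practice; the hook is the standard post-parse-analysis one, while my_hash_sql() is a hypothetical stand-in for whatever hash the module prefers:]

    #include "postgres.h"
    #include "fmgr.h"
    #include "parser/analyze.h"

    PG_MODULE_MAGIC;

    static post_parse_analyze_hook_type prev_post_parse_analyze_hook = NULL;

    /* hypothetical: any stable 64-bit hash of the query text would do */
    extern uint64 my_hash_sql(const char *sql);

    static void
    my_post_parse_analyze(ParseState *pstate, Query *query, JumbleState *jstate)
    {
        if (prev_post_parse_analyze_hook)
            prev_post_parse_analyze_hook(pstate, query, jstate);

        /* Overwrite the in-core query ID with this module's own computation. */
        query->queryId = my_hash_sql(pstate->p_sourcetext);
    }

    void
    _PG_init(void)
    {
        prev_post_parse_analyze_hook = post_parse_analyze_hook;
        post_parse_analyze_hook = my_post_parse_analyze;
    }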
Re: doc patch: wrong descriptions for dropping replication slots
On 2025/03/21 10:07, Hayato Kuroda (Fujitsu) wrote:
> Dear Fujii-san,
>
>> Unless there are any objections, I plan to push your patch with the following commit message and back-patch it to all supported versions. ...
>
> Thanks for updating the commit message. LGTM.

I've committed the patch with that message. Thanks!

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
Re: AIO v2.5
Hi,

On 2025-03-22 17:20:56 -0700, Noah Misch wrote:
> On Thu, Mar 20, 2025 at 09:58:37PM -0400, Andres Freund wrote:
>> Not sure yet how to best disable testing io_uring in this case. We can't
>> just query EXEC_BACKEND from pg_config.h unfortunately. I guess making the
>> initdb not fail and checking the error log would work, but that doesn't
>> work nicely with Cluster.pm.
>
> How about "postgres -c io_method=io_uring -C <guc>":
>
> --- a/src/test/modules/test_aio/t/001_aio.pl
> +++ b/src/test/modules/test_aio/t/001_aio.pl
> @@ -29,7 +29,13 @@ $node_worker->stop();
>  # Test io_method=io_uring
>  ###
>
> -if ($ENV{with_liburing} eq 'yes')
> +sub have_io_uring
> +{
> +	local %ENV = $node_worker->_get_env();    # any node works
> +	return run_log [qw(postgres -c io_method=io_uring -C io_method)];
> +}
> +
> +if (have_io_uring())
>  {
>  	my $node_uring = create_node('io_uring');
>  	$node_uring->start();

Yea, that's a good idea. One thing that doesn't seem great is that it requires a prior node - what if we do -c io_method=invalid? That would report the list of valid GUC options, so we could just grep for io_uring.

It's too bad that postgres --describe-config a) doesn't report the possible enum values b) doesn't apply/validate -c options.

>> Subject: [PATCH v2.11 01/27] aio, bufmgr: Comment fixes
>
> Ready to commit, though other comment fixes might come up in later reviews.

I'll reorder it to a bit later in the series, to accumulate a few more.

> One idea so far is to comment on valid states after some IoMethodOps
> callbacks:
>
> --- a/src/include/storage/aio_internal.h
> +++ b/src/include/storage/aio_internal.h
> @@ -310,6 +310,9 @@ typedef struct IoMethodOps
>  	/*
>  	 * Start executing passed in IOs.
>  	 *
> +	 * Shall advance state to PGAIO_HS_SUBMITTED. (By the time this returns,
> +	 * other backends might have advanced the state further.)
> +	 *
>  	 * Will not be called if ->needs_synchronous_execution() returned true.
>  	 *
>  	 * num_staged_ios is <= PGAIO_SUBMIT_BATCH_SIZE.
> @@ -321,6 +324,12 @@ typedef struct IoMethodOps
>  	/*
>  	 * Wait for the IO to complete. Optional.
>  	 *
> +	 * On return, state shall be PGAIO_HS_COMPLETED_IO,
> +	 * PGAIO_HS_COMPLETED_SHARED or PGAIO_HS_COMPLETED_LOCAL. (The callback
> +	 * need not change the state if it's already one of those.) If state is
> +	 * PGAIO_HS_COMPLETED_IO, state will reach PGAIO_HS_COMPLETED_SHARED
> +	 * without further intervention.
> +	 *
>  	 * If not provided, it needs to be guaranteed that the IO method calls
>  	 * pgaio_io_process_completion() without further interaction by the
>  	 * issuing backend.

I think these are a good idea. I added those to the copy-edit patch, with a few more tweaks:

@@ -315,6 +315,9 @@ typedef struct IoMethodOps
 	/*
 	 * Start executing passed in IOs.
 	 *
+	 * Shall advance state to at least PGAIO_HS_SUBMITTED. (By the time this
+	 * returns, other backends might have advanced the state further.)
+	 *
 	 * Will not be called if ->needs_synchronous_execution() returned true.
 	 *
 	 * num_staged_ios is <= PGAIO_SUBMIT_BATCH_SIZE.
@@ -323,12 +326,24 @@ typedef struct IoMethodOps
 	 */
 	int			(*submit) (uint16 num_staged_ios, PgAioHandle **staged_ios);

-	/*
+	/* ---
 	 * Wait for the IO to complete. Optional.
 	 *
+	 * On return, state shall be one of
+	 *  - PGAIO_HS_COMPLETED_IO
+	 *  - PGAIO_HS_COMPLETED_SHARED
+	 *  - PGAIO_HS_COMPLETED_LOCAL
+	 *
+	 * The callback must not block if the handle is already in one of those
+	 * states, or has been reused (see pgaio_io_was_recycled()). If, on
+	 * return, the state is PGAIO_HS_COMPLETED_IO, state will reach
+	 * PGAIO_HS_COMPLETED_SHARED without further intervention by the IO
+	 * method.
+	 *
 	 * If not provided, it needs to be guaranteed that the IO method calls
 	 * pgaio_io_process_completion() without further interaction by the
 	 * issuing backend.
+	 * ---
 	 */
 	void		(*wait_one) (PgAioHandle *ioh, uint64 ref_generation);

>> Subject: [PATCH v2.11 03/27] Redefine max_files_per_process to control
>> additionally opened files
>
> Ready to commit

Cool!

>> Subject: [PATCH v2.11 04/27] aio: Add liburing dependency
>
>> --- a/meson.build
>> +++ b/meson.build
>> @@ -944,6 +944,18 @@ endif
>>
>> +###
>> +# Library: liburing
>> +###
>> +
>> +liburingopt = get_option('liburing')
>> +liburing = dependency('liburing', required: liburingopt)
>> +if liburing.found()
>> +  cdata.set('USE_LIBURING', 1)
>> +endif
>
> This is a different style from other deps; is it equivalent to our standard style?

Yes - the only reason to
Re: Proposal - Allow extensions to set a Plan Identifier
On 3/23/25 15:56, Sami Imseih wrote:
>> Hmm, queryId generation code lies in the core and we already came to
>> terms that this field has only a statistical purpose. Do you want to
>> commit planId generation code?
>
> But extensions don't necessarily need to rely on the core queryId. They
> can generate their own queryId. We have it documented for
> compute_query_id as such [0]:
> "Note that an external module can alternatively be used if the in-core
> query identifier computation method is not acceptable"

In theory they don't, but in practice they must. This mess is especially annoying because one of the problematic extensions is at the same time the most popular one - pg_stat_statements. People forget to follow strict instructions about the load order of extensions and blame the extension for instability when one of their query plans isn't captured but should be. So, it may not be an issue in a cloud provider's predefined installations, but it is a headache for custom configurations.

--
regards, Andrei Lepikhov
Re: BitmapHeapScan streaming read user and prelim refactoring
On Sat, Mar 22, 2025 at 5:04 PM Andres Freund wrote:
>
> The problem is that sometimes recheck is performed for pending empty
> tuples. The comment about empty tuples says:
>
>     /*
>      * Bitmap is exhausted. Time to emit empty tuples if relevant. We emit
>      * all empty tuples at the end instead of emitting them per block we
>      * skip fetching. This is necessary because the streaming read API
>      * will only return TBMIterateResults for blocks actually fetched.
>      * When we skip fetching a block, we keep track of how many empty
>      * tuples to emit at the end of the BitmapHeapScan. We do not recheck
>      * all NULL tuples.
>      */
>     *recheck = false;
>     return bscan->rs_empty_tuples_pending > 0;
>
> But we actually emit empty tuples at each page boundary:
>
>     /*
>      * Out of range? If so, nothing more to look at on this page
>      */
>     while (hscan->rs_cindex >= hscan->rs_ntuples)
>     {
>         /*
>          * Emit empty tuples before advancing to the next block
>          */
>         if (bscan->rs_empty_tuples_pending > 0)
>         {
>             /*
>              * If we don't have to fetch the tuple, just return nulls.
>              */
>             ExecStoreAllNullTuple(slot);
>             bscan->rs_empty_tuples_pending--;
>             return true;
>         }
>
>         /*
>          * Returns false if the bitmap is exhausted and there are no further
>          * blocks we need to scan.
>          */
>         if (!BitmapHeapScanNextBlock(scan, recheck, lossy_pages, exact_pages))
>             return false;
>     }
>
> So we don't just process tuples at the end of the scan, but at each page
> boundary.

Whoops! I think an earlier version of the patch did emit all the empty tuples at the end, but I changed the control flow without moving the recheck = false and removing/modifying that comment. Quite embarrassing.

> This leads to wrong results whenever there is a rs_empty_tuples_pending > 0
> after a previous page that needed rechecks, because nothing will reset
> *recheck = false.
>
> And indeed, if I add a *recheck = false inside the
>
>     /*
>      * Emit empty tuples before advancing to the next block
>      */
>     if (bscan->rs_empty_tuples_pending > 0)
>     {
>
> the problem goes away.
>
> A fix should do slightly more, given that the "at the end" comment is wrong.
>
> But I'm wondering if it's worth fixing it, or if we should just rip the
> logic out altogether, due to the whole VM checking being unsound as we
> learned in the thread I referenced earlier, without even bothering to fix
> the bug here.

Perhaps it is better I just fix it since ripping out the skip fetch optimization has to be backported, and even though that will look very different on master than on backbranches, I wonder if people will look for a "clean" commit on master which removes the feature.

- Melanie
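[A minimal sketch of the fix described above, resetting *recheck where the all-NULL tuples are emitted; the code is as quoted in the message, with the new comment wording mine:]

    /*
     * Emit empty tuples before advancing to the next block
     */
    if (bscan->rs_empty_tuples_pending > 0)
    {
        /*
         * Skipped-fetch tuples are known all-NULL and were never fetched,
         * so they must not inherit a previous page's recheck flag.
         */
        *recheck = false;
        ExecStoreAllNullTuple(slot);
        bscan->rs_empty_tuples_pending--;
        return true;
    }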
Re: Fwd: [BUG]: the walsender does not update its IO statistics until it exits
On Wed, Mar 19, 2025 at 04:00:49PM +0800, Xuneng Zhou wrote:
> Hi,
> Moving the other two provides a more complete view of the settings. For
> newcomers (like me) to the codebase, seeing all three related values in one
> place helps avoid a narrow view of the settings.
>
> But I am not sure that I understand the cons of this well.

While I don't disagree with the use of a hardcoded interval of time to control the timing of the flush of the WAL sender stats, do we really need to rely on the timing defined by pgstat.c? Wouldn't it be simpler to assign one in walsender.c and pick a different, perhaps higher, value? In the end, the timestamp calculations are free because we can rely on the existing call of GetCurrentTimestamp() for the physical WAL senders to get an idea of the current time.

For the logical WAL senders, perhaps we'd better do the reports in WalSndWaitForWal(), actually. There is also a call to GetCurrentTimestamp() that we can rely on in this path.

--
Michael

signature.asc
Description: PGP signature
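[A sketch of the walsender-local approach suggested above; the interval name and value are placeholders, and pgstat_flush_io() is assumed to be the relevant flush entry point:]

    /* Hypothetical: flush WAL sender IO stats at most once per interval. */
    #define WALSENDER_STATS_FLUSH_INTERVAL_MS 1000

    static TimestampTz stats_last_flush = 0;

    /*
     * Sketch: call with the timestamp already obtained by the existing
     * GetCurrentTimestamp() call in the walsender loop, so no extra
     * clock reads are needed.
     */
    static void
    MaybeFlushWalSenderStats(TimestampTz now)
    {
        if (!TimestampDifferenceExceeds(stats_last_flush, now,
                                        WALSENDER_STATS_FLUSH_INTERVAL_MS))
            return;

        pgstat_flush_io(false);     /* assumed IO-stats flush entry point */
        stats_last_flush = now;
    }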
Re: Proposal - Allow extensions to set a Plan Identifier
On 3/23/25 01:01, Michael Paquier wrote:
> On Sat, Mar 22, 2025 at 11:50:06PM +0100, Andrei Lepikhov wrote:
>> planId actually looks less controversial than queryId or plan_node_id. At
>> the same time, it is not free from the different logic that extensions may
>> incorporate into this value: I can imagine, for example, the attempt of
>> uniquely numbering plans related to the same queryId, plain hashing of the
>> plan tree, hashing without constants, etc.
>
> One plan ID should have one single query ID,

I'm not sure I understand what you mean by that. QueryId reflects the syntactic structure of the query, but planId - the semantics. For example:

SELECT relname FROM pg_class JOIN pg_am USING (oid);
SELECT relname FROM pg_am JOIN pg_class USING (oid);

have different queryIds: -6493293269785547447 and 4243743071053231833, but the same plan:

Hash Join
  Hash Cond: (pg_class.oid = pg_am.oid)
  ->  Seq Scan on pg_class
  ->  Hash
        ->  Seq Scan on pg_am

You can invent multiple other cases here - remember the query tree transformations we made before optimisation.

> but the opposite may be
> not necessarily true: a query ID could be associated with multiple
> plan patterns (aka multiple plan IDs). What this offers is that we
> know about which plan one query is currently using in live, for a
> given query ID.

Okay, as I've said before, it seems practical. I just worry because I see that people don't understand that queryId is a more or less arbitrary value that may or may not group queries that differ only by constants to a single "Query Class". I think the same attitude will be paid to planId. It is okay for statistics. However, non-statistical extensions need more guarantees and want to put different logic into these values.
For now, we have examples of only statistical extensions in open-source and may live with a proposed solution.

>> So, it seems that extensions may conflict with the same field. Are we sure
>> that won't happen if they are loaded in different orders in neighbouring
>> backends?
>
> Depends on what kind of computation one wants to do, and surely
> without some coordination across the extensions you are using these
> cannot be shared, but it's no different from the concept of a query
> ID.

Hmm, queryId generation code lies in the core and we already came to terms that this field has only a statistical purpose. Do you want to commit planId generation code?

--
regards, Andrei Lepikhov
Re: Proposal - Allow extensions to set a Plan Identifier
On 3/23/25 04:22, Sami Imseih wrote:
> On Sat, Mar 22, 2025 at 11:50:06PM +0100, Andrei Lepikhov wrote:
>> planId actually looks less controversial than queryId or plan_node_id. At
>> the same time, it is not free from the different logic that extensions may
>> incorporate into this value: I can imagine, for example, the attempt of
>> uniquely numbering plans related to the same queryId, plain hashing of the
>> plan tree, hashing without constants, etc.
>
> I think plan_node_id is probably the least controversial because that
> value comes straight from core, and different extensions cannot have
> their own interpretation of what that value could be.

I meant the proposal to let extensions calculate this field. Someone may want node_id to be unique inside the plan; for someone else, it should reflect the nature of the node and may be the same even inside one plan, etc. In this case, it is highly controversial.

> Computing a plan_id is even more open for interpretation than it is for
> query_id perhaps, which is why giving this ability to extensions will be
> useful. Different extensions may choose to compute a plan_id different
> ways, but they all should be able to push this value into core, and this
> is what this patch will allow for.

That's why I worry about allowing extensions to compute it. Remember: for now, an extension can't be sure that a conflicting one is not already loaded in the backend. The code [1] may relieve this problem.

[1] https://www.postgresql.org/message-id/flat/441661.1742683...@sss.pgh.pa.us#7640020deb6497fe049ac4839eaeb33d

--
regards, Andrei Lepikhov
Re: AIO v2.5
Hi,

On 2025-03-19 18:11:18 -0700, Noah Misch wrote:
> On Wed, Mar 19, 2025 at 06:17:37PM -0400, Andres Freund wrote:
>> On 2025-03-19 14:25:30 -0700, Noah Misch wrote:
>> Hm, we retry more frequently than that if there are new connections... Maybe
>> just "try again next time"?
>
> Works for me.
>
>>> And these individual lines from "git grep BM_IO_IN_PROGRESS":
>>>
>>>      *	i.e at most one BM_IO_IN_PROGRESS bit is set per proc.
>>>
>>> The last especially.
>>
>> Huh - yea. This isn't a "new" issue, I think I missed this comment in 16's
>> 12f3867f5534. I think the comment can just be deleted?
>
> Hmm, yes, it's orthogonal to $SUBJECT and deletion works fine.
>
>>>      * I/O already in progress. We already hold BM_IO_IN_PROGRESS for the
>>>      * only one process at a time can set the BM_IO_IN_PROGRESS bit.
>>>      * only one process at a time can set the BM_IO_IN_PROGRESS bit.
>>>
>>> For the other three lines and the paragraph, the notion
>>> of a process "holding" BM_IO_IN_PROGRESS or being the process to "set" it or
>>> being the process "doing a read" becomes less significant when one process
>>> starts the IO and another completes it.
>>
>> Hm. I think they'd be ok as-is, but we can probably improve them. Maybe
>
> Looking again, I agree they're okay.
>
>>      * Now it's safe to write buffer to disk. Note that no one else should
>>      * have been able to write it while we were busy with log flushing because
>>      * we got the exclusive right to perform I/O by setting the
>>      * BM_IO_IN_PROGRESS bit.
>
> That's fine too. Maybe s/perform/stage/ or s/perform/start/.

I put these comment changes into their own patch, as it seemed confusing to change them as part of one of the already queued commits.

>>> I see this relies on md_readv_complete having converted "result" to blocks.
>>> Was there some win from doing that as opposed to doing the division here?
>>> Division here ("blocks_read = prior_result.result / BLCKSZ") would feel
>>> easier to follow, to me.
>>
>> It seemed like that would be wrong layering - what if we had an smgr that
>> could store data in a compressed format? The raw read would be of a smaller
>> size. The smgr API deals in BlockNumbers, only the md.c layer should know
>> about bytes.
>
> I hadn't thought of that. That's a good reason.

I thought that was better documented, but alas, it wasn't. How about updating the documentation of smgrstartreadv to the following:

/*
 * smgrstartreadv() -- asynchronous version of smgrreadv()
 *
 * This starts an asynchronous readv IO using the IO handle `ioh`. Other than
 * `ioh` all parameters are the same as smgrreadv().
 *
 * Completion callbacks above smgr will be passed the result as the number of
 * successfully read blocks if the read [partially] succeeds. This maintains
 * the abstraction that smgr operates on the level of blocks, rather than
 * bytes.
 */

I briefly had a bug in test_aio's injection point that led to *increasing* the number of bytes successfully read. That triggered an assertion failure in bufmgr.c, but not closer to the problem. Is it worth adding an assert against that to md_readv_complete? Can't quite decide.

Greetings,

Andres Freund
Re: wrong error message related to unsupported feature
čt 20. 3. 2025 v 22:30 odesílatel Pavel Stehule napsal:

> Hi
>
> út 18. 3. 2025 v 21:33 odesílatel Álvaro Herrera napsal:
>
>> On 2025-Mar-18, Pavel Stehule wrote:
>>
>>> Maybe I found a bug
>>>
>>> (2025-03-18 19:28:06) postgres=# create table foo(a int constraint gzero
>>> check(a > 10) NOT ENFORCED);
>>> CREATE TABLE
>>> (2025-03-18 19:29:37) postgres=# insert into foo values(0);
>>> INSERT 0 1
>>> (2025-03-18 19:29:49) postgres=# insert into foo values(6);
>>> INSERT 0 1
>>> (2025-03-18 19:29:55) postgres=# alter table foo alter constraint gzero
>>> enforced;
>>> ERROR:  FOREIGN KEY constraints cannot be marked ENFORCED
>>> LINE 1: alter table foo alter constraint gzero enforced;
>>>
>>> I know altering constraint enforcement is not supported yet, but the error
>>> message is surely wrong
>>
>> Yep, this is a bug all right -- I reported this and related problems a
>> few days ago [1]. There's a proposal in that thread for how to fix this
>> (see Amul's email [2] and my followup), but I haven't had time to fully
>> implement it. If you want to give it a couple of hours to complete it,
>> that'd be great. I have a couple of patches that I need to handle
>> before coming back to that.
>>
>> [1] https://postgr.es/m/202503041822.aobpqke3igvb@alvherre.pgsql
>> [2] https://postgr.es/m/caaj_b97hd-jmts7ajgu6tdbczdx_kyukxg+k-dtymoieg+g...@mail.gmail.com
>>
>> --
>> Álvaro Herrera    PostgreSQL Developer — https://www.EnterpriseDB.com/
>
> I am looking into this issue, and I am not sure the proposed way is the best.
>
> Cannot we change processCASbits just like this?
>
>     if (cas_bits & (CAS_DEFERRABLE | CAS_INITIALLY_DEFERRED))
>     {
>         if (deferrable)
>             *deferrable = true;
>         else if (constrType)
>             ereport(ERROR,
>                     (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
>                      /* translator: %s is CHECK, UNIQUE, or similar */
>                      constrType ?
>                      errmsg("%s constraints cannot be marked DEFERRABLE",
>                             constrType) :
>                      errmsg("constraint cannot be marked DEFERRABLE"),
>                      parser_errposition(location)));
>     }
>     ...
>
> Probably it is better to not try to read from the catalog at this moment, and
> then we can accept that we don't know the constraint type.
>
> Regards
>
> Pavel

Something like the attached patch.

Regards

Pavel

diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 271ae26cbaf..7b0e66a8940 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -2663,7 +2663,7 @@ alter_table_cmd:
 					n->def = (Node *) c;
 					c->conname = $3;
 					c->alterDeferrability = true;
-					processCASbits($4, @4, "FOREIGN KEY",
+					processCASbits($4, @4, NULL,
 								   &c->deferrable, &c->initdeferred,
 								   NULL, NULL, NULL, yyscanner);
@@ -19531,9 +19531,11 @@ processCASbits(int cas_bits, int location, const char *constrType,
 		else
 			ereport(ERROR,
 					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-			/* translator: %s is CHECK, UNIQUE, or similar */
+					 constrType ?
+			/* translator: %s is CHECK, UNIQUE, TRIGGER, or similar */
 					 errmsg("%s constraints cannot be marked DEFERRABLE",
-							constrType),
+							constrType) :
+					 errmsg("constraint cannot be marked DEFERRABLE"),
 					 parser_errposition(location)));
 	}
@@ -19544,9 +19546,11 @@ processCASbits(int cas_bits, int location, const char *constrType,
 		else
 			ereport(ERROR,
 					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-			/* translator: %s is CHECK, UNIQUE, or similar */
+					 constrType ?
+			/* translator: %s is CHECK, UNIQUE, TRIGGER, or similar */
 					 errmsg("%s constraints cannot be marked DEFERRABLE",
-							constrType),
+							constrType) :
+					 errmsg("constraint cannot be marked DEFERRABLE"),
 					 parser_errposition(location)));
 	}
@@ -19557,9 +19561,11 @@ processCASbits(int cas_bits, int location, const char *constrType,
 		else
 			ereport(ERROR,
 					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-			/* translator: %s is CHECK, UNIQUE, or similar */
+					 constrType ?
+			/* translator: %s is CHECK, UNIQUE, TRIGGER, or similar */
 					 errmsg("%s constraints cannot be marked NOT VALID",
-							constrType),
+							constrType) :
+					 errmsg("constraint cannot be marked NOT VALID"),
 					 parser_errposition(location)));
 	}
@@ -19570,9 +19576,11 @@ processCASbits(int cas_bits, int location, const char *constrType,
 		else
 			ereport(ERROR,
 					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-			/* translator: %s is CHECK, UNIQUE, or similar */
+					 constrType ?
+			/* translator: %s is CHECK, UNIQUE, TRIGGER, or similar */
 					 errmsg("%s constraints cannot be marked NO INHERIT",
-							constrType),
+							constrType) :
+					 errmsg("constraint cannot be marked NO INHERIT"),
 					 parser_errposition(location)));
 	}
@@ -19583,9 +19591,1
Re: Parallel heap vacuum
On Sat, Mar 22, 2025 at 7:16 AM Melanie Plageman wrote: > > On Thu, Mar 20, 2025 at 4:36 AM Masahiko Sawada wrote: > > > > When testing the multi passes of table vacuuming, I found an issue. > > With the current patch, both leader and parallel workers process stop > > the phase 1 as soon as the shared TidStore size reaches to the limit, > > and then the leader resumes the parallel heap scan after heap > > vacuuming and index vacuuming. Therefore, as I described in the patch, > > one tricky part of this patch is that it's possible that we launch > > fewer workers than the previous time when resuming phase 1 after phase > > 2 and 3. In this case, since the previous parallel workers might have > > already allocated some blocks in their chunk, newly launched workers > > need to take over their parallel scan state. That's why in the patch > > we store workers' ParallelBlockTableScanWorkerData in shared memory. > > However, I found my assumption is wrong; in order to take over the > > previous scan state properly we need to take over not only > > ParallelBlockTableScanWorkerData but also ReadStream data as parallel > > workers might have already queued some blocks for look-ahead in their > > ReadStream. Looking at ReadStream codes, I find that it's not > > realistic to store it into the shared memory. > > It seems like one way to solve this would be to add functionality to > the read stream to unpin the buffers it has in the buffers queue > without trying to continue calling the callback until the stream is > exhausted. > > We have read_stream_reset(), but that is to restart streams that have > already been exhausted. Exhausted streams are where the callback has > returned InvalidBlockNumber. In the read_stream_reset() cases, the > read stream user knows there are more blocks it would like to scan or > that it would like to restart the scan from the beginning. > > Your case is you want to stop trying to exhaust the read stream and > just unpin the remaining buffers. As long as the worker which paused > phase I knows exactly the last block it processed and can communicate > this to whatever worker resumes phase I later, it can initialize > vacrel->current_block to the last block processed. If we use ParallelBlockTableScanDesc with streaming read like the patch did, we would also need to somehow rewind the number of blocks allocated to workers. The problem I had with such usage was that a parallel vacuum worker allocated a new chunk of blocks when doing look-ahead reading and therefore advanced ParallelBlockTableScanDescData.phs_nallocated. In this case, even if we unpin the remaining buffers in the queue by a new functionality and a parallel worker resumes the phase 1 from the last processed block, we would lose some blocks in already allocated chunks unless we rewind ParallelBlockTableScanDescData and ParallelBlockTableScanWorkerData data. However, since a worker might have already allocated multiple chunks it would not be easy to rewind these scan state data. Another idea is that parallel workers don't exit phase 1 until it consumes all pinned buffers in the queue, even if the memory usage of TidStore exceeds the limit. It would need to add new functionality to the read stream to disable the look-ahead reading. Since we could use much memory while processing these buffers, exceeding the memory limit, we can trigger this mode when the memory usage of TidStore reaches 70% of the limit or so. 
On the other hand, it means that we would not use the streaming read for the blocks in this mode, which is not efficient. > > > One plausible solution would be that we don't use ReadStream in > > parallel heap vacuum cases but directly use > > table_block_parallelscan_xxx() instead. It works but we end up having > > two different scan methods for parallel and non-parallel lazy heap > > scan. I've implemented this idea in the attached v12 patches. > > One question is which scenarios will parallel vacuum phase I without > AIO be faster than read AIO-ified vacuum phase I. Without AIO writes, > I suppose it would be trivial for phase I parallel vacuum to be faster > without using read AIO. But it's worth thinking about the tradeoff. As Andres pointed out, there are major downsides. So we would need to invent a way to stop and resume the read stream in the middle during parallel scan. Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
Re: Have postgres.bki self-identify
On Thu, Mar 20, 2025 at 3:47 PM David G. Johnston wrote: > While trying to find postgres.bki in my build directory searching for the > file name didn't work because there is no comment in the file containing the > file name; like there is in every other file we write or generate, including > the related *_d.h files. Add it. I'm not a fan of making it a policy that everyone has to do this. I'd rather see us remove filenames from some places where they cause maintenance overhead for little benefit. If somebody wants to find postgres.bki, I guess you can just "find . -name postgres.bki -print" -- Robert Haas EDB: http://www.enterprisedb.com
Re: Proposal - Allow extensions to set a Plan Identifier
On Sun, Mar 23, 2025 at 04:30:04PM +0100, Andrei Lepikhov wrote:
> So, it may be not an issue in a cloud provider predefined installations, but
> a headache for custom configurations.

Sure, I mean, but it's not really related to the issue discussed on this thread, so.. It sounds like we could improve the documentation about the GUCs that hold a list of library names and load them in the order specified, so as we could point somewhere rather than just saying "don't do that". Another thing is extensions that have the idea of not tracking a hook previously registered, and failing to call it. Another one is that the previous hook can be called before a module's own hook logic, or after it. Depending on how an out-of-core extension is maintained, there are a lot of things that can be done.

--
Michael

signature.asc
Description: PGP signature
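[To illustrate the hook-tracking point above, a minimal sketch of the conventional chaining pattern, using planner_hook as an arbitrary example; my_planner is hypothetical:]

    #include "postgres.h"
    #include "fmgr.h"
    #include "optimizer/planner.h"

    PG_MODULE_MAGIC;

    static planner_hook_type prev_planner_hook = NULL;

    static PlannedStmt *
    my_planner(Query *parse, const char *query_string, int cursorOptions,
               ParamListInfo boundParams)
    {
        /* ... this module's work goes here ... */

        /* Always chain to the previously registered hook, if any. */
        if (prev_planner_hook)
            return prev_planner_hook(parse, query_string, cursorOptions,
                                     boundParams);
        return standard_planner(parse, query_string, cursorOptions, boundParams);
    }

    void
    _PG_init(void)
    {
        prev_planner_hook = planner_hook;   /* remember, don't clobber */
        planner_hook = my_planner;
    }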
Re: Improve monitoring of shared memory allocations
Hi Bilal,

> I have a couple of comments, I have only reviewed 0001 so far.

Thank you for reviewing!

> You may need to run pgindent, it makes some changes.

The attached v4 patch has been updated after running pgindent.

> +     * If table is shared, calculate the offset at which to find the
> +     * first partition of elements
> +     */
> +
> +    nsegs = compute_buckets_and_segs(nelem, &nbuckets, hctl->num_partitions, hctl->ssize);
>
> Blank line between the comment and the code.

Removed this.

> /*
>  * allocate some new elements and link them into the indicated free list
>  */
> -static bool
> -element_alloc(HTAB *hashp, int nelem, int freelist_idx)
> +static HASHELEMENT *
> +element_alloc(HTAB *hashp, int nelem)
>
> Comment needs an update. This function no longer links elements into
> the free list.

Updated this and a few other comments in the attached v4 patch.

> +static int
> +compute_buckets_and_segs(long nelem, int *nbuckets, long num_partitions, long ssize)
> +{
> ...
> +    /*
> +     * In a partitioned table, nbuckets must be at least equal to
> +     * num_partitions; were it less, keys with apparently different partition
> +     * numbers would map to the same bucket, breaking partition independence.
> +     * (Normally nbuckets will be much bigger; this is just a safety check.)
> +     */
> +    while ((*nbuckets) < num_partitions)
> +        (*nbuckets) <<= 1;
>
> I have some worries about this function, I am not sure what I said
> below has real life implications as you already said 'Normally
> nbuckets will be much bigger; this is just a safety check.'.
>
> 1- num_partitions is long and nbuckets is int, so could there be any
> case where num_partitions is bigger than MAX_INT and causes an infinite
> loop?
> 2- Although we assume both nbuckets and num_partitions are initialized as
> the same type, (*nbuckets) <<= 1 will cause an infinite loop if
> num_partitions is bigger than MAX_TYPE / 2.
>
> So I think that the solution is to confirm that num_partitions <
> MAX_NBUCKETS_TYPE / 2, what do you think?

Your concern is valid. This has been addressed in the existing code by calling next_pow2_int() on num_partitions before running the function. Additionally, I am not adding any new code to the compute_buckets_and_segs function. I am simply moving part of the init_htab() code into a separate function for reuse.

Please find attached the updated and rebased patches.

Thank you,
Rahila Syed

v4-0002-Replace-ShmemAlloc-calls-by-ShmemInitStruct.patch
Description: Binary data

v4-0001-Account-for-initial-shared-memory-allocated-by-hash_.patch
Description: Binary data
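[A standalone illustration of the doubling-loop hazard raised in points 1 and 2 above; the function name is hypothetical and this is not code from the patch:]

    #include <limits.h>

    /*
     * If num_partitions exceeds what an int can double up to, the loop can
     * never terminate: once nbuckets would have to pass INT_MAX / 2, the
     * left shift overflows (undefined behavior for signed int) instead of
     * reaching num_partitions.
     */
    static int
    round_up_buckets(int nbuckets, long num_partitions)
    {
        while (nbuckets < num_partitions)
            nbuckets <<= 1;     /* overflows once nbuckets > INT_MAX / 2 */
        return nbuckets;
    }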
Re: Reduce "Var IS [NOT] NULL" quals during constant folding
On Sat, Mar 22, 2025 at 1:12 AM Robert Haas wrote: > However, I'm a bit concerned about the overall premise of the patch > set. It feels like it is moving something that really ought to happen > at optimization time back to parse time. I have a feeling that's going > to break something, although I am not sure right now exactly what. > Wouldn't it be better to have this still happen in the planner, but > sooner than it does now? You're right. It's just flat wrong to collect catalog information in the parser and use it in the planner. As Tom pointed out, the catalog information could change in between, which would cause us to use stale data. Yeah, this should still happen in the planner, perhaps before pull_up_sublinks, if we plan to leverage that info to convert NOT IN to anti-join. Thanks Richard
Re: Make COPY format extendable: Extract COPY TO format implementations
On Wed, Mar 19, 2025 at 6:25 PM Sutou Kouhei wrote:
>
> Hi,
>
> In "Re: Make COPY format extendable: Extract COPY TO format implementations"
> on Wed, 19 Mar 2025 17:49:49 -0700, "David G. Johnston" wrote:
>
>>> And could someone help (take over if possible) writing a
>>> document for this feature? I'm not good at writing a
>>> document in English... 0009 in the attached v37 patch set
>>> has a draft of it. It's based on existing documents in
>>> doc/src/sgml/ and *.h.
>>
>> I haven't touched the innards of the structs aside from changing
>> programlisting to synopsis. And redoing the two section opening paragraphs
>> to better integrate with the content in the chapter opening.
>>
>> The rest I kinda went to town on...
>
> Thanks!!! It's very helpful!!!
>
> I've applied your patch. 0009 is only changed.

Thank you for updating the patches. I've reviewed the main part of supporting the custom COPY format. Here are some random comments:

---
+/*
+ * Process the "format" option.
+ *
+ * This function checks whether the option value is a built-in format such as
+ * "text" and "csv" or not. If the option value isn't a built-in format, this
+ * function finds a COPY format handler that returns a CopyToRoutine (for
+ * is_from == false) or CopyFromRoutine (for is_from == true). If no COPY
+ * format handler is found, this function reports an error.
+ */

I think this comment needs to be updated as the part "If the option value isn't ..." is no longer true. I think we don't necessarily need to create a separate function ProcessCopyOptionFormat for processing the format option.

We need more regression tests for handling the given format name. For example:
- more various input patterns.
- a function with the specified format name exists but it returns an unexpected Node.
- looking for a handler function in a different namespace.
etc.

---
I think that we should accept qualified names too as the format name, like tablesample does. That way, different extensions implementing the same format can be used.

---
+        if (routine == NULL || !IsA(routine, CopyFromRoutine))
+            ereport(
+                    ERROR,
+                    (errcode(
+                             ERRCODE_INVALID_PARAMETER_VALUE),
+                     errmsg("COPY handler function "
+                            "%u did not return "
+                            "CopyFromRoutine struct",
+                            opts->handler)));

It's not conventional to put a new line between 'ereport(' and 'ERROR' (similarly between 'errcode(' and 'ERRCODE_...'). Also, we don't need to split the error message into multiple lines as it's not long.

---
+        if (routine == NULL || !IsA(routine, CopyToRoutine))
+            ereport(
+                    ERROR,
+                    (errcode(
+                             ERRCODE_INVALID_PARAMETER_VALUE),
+                     errmsg("COPY handler function "
+                            "%u did not return "
+                            "CopyToRoutine struct",
+                            opts->handler)));

Same as the above comment.

---
+  descr => 'pseudo-type for the result of a copy to/from method function',

s/method function/format function/

---
+    Oid        handler;    /* handler function for custom format routine */

'handler' is used also for built-in formats.

---
+static void
+CopyFromInFunc(CopyFromState cstate, Oid atttypid,
+               FmgrInfo *finfo, Oid *typioparam)
+{
+    ereport(NOTICE, (errmsg("CopyFromInFunc: atttypid=%d", atttypid)));
+}

OIDs could be changed across major versions even for built-in types. I think it's better to avoid using them for tests.

---
+static void
+CopyToOneRow(CopyToState cstate, TupleTableSlot *slot)
+{
+    ereport(NOTICE, (errmsg("CopyToOneRow: tts_nvalid=%u", slot->tts_nvalid)));
+}

Similar to the above comment, the field name 'tts_nvalid' might also be changed in the future; let's use another name.

---
+static const CopyFromRoutine CopyFromRoutineTestCopyFormat = {
+    .type = T_CopyFromRoutine,
+    .CopyFromInFunc = CopyFromInFunc,
+    .CopyFromStart = CopyFromStart,
+    .CopyFromOneRow = CopyFromOneRow,
+    .CopyFromEnd = CopyFromEnd,
+};

I'd suggest not using the same function names as the fields.

---
+/*
+ * Export CopySendEndOfRow() for extensions. We want to keep
+ * CopySendEndOfRow() as a static function for
+ * optimization. CopySendEndOfRow() calls in this file may be optimized by a
+ * compiler.
+ */
+void
+CopyToStateFlush(CopyToState cstate)
+{
+    CopySendEndOfRow(cstate);
+}

Is there any reason to use a different name for public functions?

---
+/*
+ * Export CopyGetData() for extensions. We want to keep CopyGetData() as a
+ * static function for optimization. CopyGet
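[Based on the handler contract quoted at the top of this review, where a handler returns a CopyFromRoutine or CopyToRoutine depending on is_from, a minimal sketch of such a function might look as follows; the argument convention is my assumption from the quoted comment, not settled API:]

    /* Sketch of a custom COPY format handler; argument convention assumed. */
    Datum
    test_copy_format_handler(PG_FUNCTION_ARGS)
    {
        bool        is_from = PG_GETARG_BOOL(0);

        if (is_from)
            PG_RETURN_POINTER(&CopyFromRoutineTestCopyFormat);
        PG_RETURN_POINTER(&CopyToRoutineTestCopyFormat);
    }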
Re: Support NOT VALID / VALIDATE constraint options for named NOT NULL constraints
hi.
you may like the attached. it's based on your idea: attnotnullvalid.

I came across a case, not sure if it's a bug.

CREATE TABLE ttchk (a INTEGER);
ALTER TABLE ttchk ADD CONSTRAINT cc check (a is NOT NULL) NOT VALID;
CREATE TABLE ttchk_child(a INTEGER) INHERITS(ttchk);

ttchk_child's constraint cc will default to valid, but pg_dump && pg_restore will make ttchk_child's constraint invalid. since it's an existing behavior, the not-null constraint will align with it.

-the following text is copied from the commit message

NOT NULL NOT VALID

* TODO: In doc/src/sgml/ref/alter_table.sgml, under the Compatibility section,
  clarify how the "NOT NULL NOT VALID" syntax conforms with the standard.

* TODO: Should CREATE TABLE LIKE copy an existing invalid not-null constraint
  to the new table, and if so, the new table's not-null will be marked as valid.

description entry of pg_attribute.attnotnullvalid:

+      attnotnullvalid bool
+
+      The not-null constraint validity status of the column.
+      If true, it means this column has a valid not-null constraint,
+      false means this column doesn't have a not-null constraint or has an
+      unvalidated one. If attnotnull is false, this must be false.

* attnotnull means that a not-null constraint exists; it doesn't imply anything
  regarding the constraint being valid or not. attnotnullvalid will indicate
  whether the constraint is valid; this column can only be true if attnotnull
  is already true.
  attnotnullvalid is only added to FormData_pg_attribute, not to
  CompactAttribute, mainly because invalid not-null constraints are not
  commonly used. TupleDesc->TupleConstr->has_not_null now also represents
  invalid not-null constraints.

* For a table in the pg_catalog schema, if a column's attnotnull attribute is
  true, then its attnotnullvalid attribute is also true. Similarly, if
  attnotnull is false, then attnotnullvalid is false. I added an SQL check at
  the end of src/test/regress/sql/constraints.sql (not sure it's necessary).

* CREATE TABLE specifying a not valid not-null constraint will set it to
  valid; a warning is issued within function transformCreateStmt. That means
  InsertPgAttributeTuples cannot insert an attribute that is (attnotnull &&
  !attnotnullvalid). I added an Assert in InsertPgAttributeTuples. (also added
  to other places, to demo i didn't mess something, maybe it's necessary).

* table rewrite won't validate an invalid not-null constraint; that is aligned
  with check constraints.

* attnotnullvalid is mainly changed in these two places:
  1. ATAddCheckNNConstraint: if you specified "NOT NULL NOT VALID", it will
     change it from false to false, but will set attnotnull to true.
  2. QueueNNConstraintValidation, subroutine of ATExecValidateConstraint:
     when validating a not valid not-null constraint, toggle it from false to
     true, also setting attnotnull to true.

* A partitioned table can have an invalid NOT NULL constraint while its
  partitions have a valid one, but not the other way around. pg_dump/pg_restore
  may not preserve the constraint name properly, but that's fine for not-null
  constraints, i think.

* regular table invalid not-null constraint pg_dump also works fine.

From fc4bf954772d25dfbf60774429d875f78e4fd69e Mon Sep 17 00:00:00 2001
From: jian he
Date: Mon, 24 Mar 2025 09:21:10 +0800
Subject: [PATCH v5 1/1] NOT NULL NOT VALID
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

* TODO: In doc/src/sgml/ref/alter_table.sgml, under the Compatibility section,
  clarify how the "NOT NULL NOT VALID" syntax conforms with the standard.
* TODO: Should CREATE TABLE LIKE copy an existing invalid not-null constraint
  to the new table, and if so, the new table's not-null will be marked as valid.
* attnotnull means that a not-null constraint exists; it doesn't imply anything
  regarding the constraint being valid or not. attnotnullvalid will indicate
  whether the constraint is valid; this column can only be true if attnotnull
  is already true. attnotnullvalid is only added to FormData_pg_attribute, not
  to CompactAttribute, mainly because invalid not-null constraints are not
  commonly used. TupleDesc->TupleConstr->has_not_null now also represents
  invalid not-null constraints.
* For a table in the pg_catalog schema, if a column's attnotnull attribute is
  true, then its attnotnullvalid attribute is also true. Similarly, if
  attnotnull is false, then attnotnullvalid is false. I added an SQL check at
  the end of src/test/regress/sql/constraints.sql (not sure it's necessary)
* CREATE TABLE specifying a not valid not-null constraint will set it to
  valid; a warning is issued within function transformCreateStmt. That means
  InsertPgAttributeTuples cannot insert an attribute that is (attnotnull &&
  !attnotnullvalid). I added an Assert in InsertPgAttributeTuples.
* table rewrite won't
Re: Add missing tab completion for VACUUM and ANALYZE with ONLY option
vignesh C writes: > On Wed, 19 Mar 2025 at 18:12, David Rowley wrote: >> While VACUUM ONLY on a partitioned table has no effect, the same isn't >> true for inheritance parents. 62ddf7ee9 did change the behaviour of >> VACUUM for these so that vacuuming the inheritance parent now vacuums >> all of its children, unless ONLY is used. So, I'd say if the tab >> completion is only added to ANALYZE, then it'd be incomplete still. >> (I've not looked at the v3 patch to see which of those it handles.) > I also felt it is necessary for both ANALYZE and VACUUM, and the v3 > patch includes changes for both. Agreed. Pushed with some cosmetic adjustments. I made the order of the options match the syntax diagrams in the comments, which is just neatnik-ism (it changes no behavior) but seemed to read better. I also realized that we could simplify the match patterns for the various VACUUM cases by using MatchAnyN --- the existing code was getting rather contorted there. That wasn't something your patch introduced, but might as well fix it while we're here. regards, tom lane
Re: AIO v2.5
commit 247ce06b wrote: > + pgaio_io_reopen(ioh); > + > + /* > + * To be able to exercise the reopen-fails path, allow > injection > + * points to trigger a failure at this point. > + */ > + pgaio_io_call_inj(ioh, "AIO_WORKER_AFTER_REOPEN"); > + > + error_errno = 0; > + error_ioh = NULL; > + > + /* > + * We don't expect this to ever fail with ERROR or > FATAL, no need > + * to keep error_ioh set to the IO. > + * pgaio_io_perform_synchronously() contains a critical > section to > + * ensure we don't accidentally fail. > + */ > + pgaio_io_perform_synchronously(ioh); A CHECK_FOR_INTERRUPTS() could close() the FD that pgaio_io_reopen() callee smgr_aio_reopen() stores. Hence, I think smgrfd() should assert that interrupts are held instead of doing its own HOLD_INTERRUPTS(), and a HOLD_INTERRUPTS() should surround the above region of code. It's likely hard to reproduce a problem, because pgaio_io_call_inj() does nothing in many builds, and pgaio_io_perform_synchronously() starts by entering a critical section. On Thu, Mar 20, 2025 at 09:58:37PM -0400, Andres Freund wrote: > Attached v2.11 > Subject: [PATCH v2.11 06/27] aio: Implement support for reads in smgr/md/fd > +int > +FileStartReadV(PgAioHandle *ioh, File file, > +int iovcnt, off_t offset, > +uint32 wait_event_info) > +{ > + int returnCode; > + Vfd*vfdP; > + > + Assert(FileIsValid(file)); > + > + DO_DB(elog(LOG, "FileStartReadV: %d (%s) " INT64_FORMAT " %d", > +file, VfdCache[file].fileName, > +(int64) offset, > +iovcnt)); > + > + returnCode = FileAccess(file); > + if (returnCode < 0) > + return returnCode; > + > + vfdP = &VfdCache[file]; > + > + pgaio_io_prep_readv(ioh, vfdP->fd, iovcnt, offset); FileStartReadV() and pgaio_io_prep_readv() advance the IO to PGAIO_HS_STAGED w/ batch mode, PGAIO_HS_SUBMITTED w/o batch mode. I didn't expect that from functions so named. The "start" verb sounds to me like unconditional PGAIO_HS_SUBMITTED, and the "prep" verb sounds like PGAIO_HS_DEFINED. I like the "stage" verb, because it matches PGAIO_HS_STAGED, and the comment at PGAIO_HS_STAGED succinctly covers what to expect. Hence, I recommend names FileStageReadV, pgaio_io_stage_readv, mdstagereadv, and smgrstageread. How do you see it? > +/* > + * AIO error reporting callback for mdstartreadv(). > + * > + * Errors are encoded as follows: > + * - PgAioResult.error_data != 0 encodes IO that failed with errno != 0 I recommend replacing "errno != 0" with either "that errno" or "errno == error_data". > Subject: [PATCH v2.11 07/27] aio: Add README.md explaining higher level design Ready for commit apart from some trivia: > +if (ioret.result.status == PGAIO_RS_ERROR) > +pgaio_result_report(aio_ret.result, &aio_ret.target_data, ERROR); I think ioret and aio_ret are supposed to be the same object. If that's right, change one of the names. Likewise elsewhere in this file. > +The central API piece for postgres' AIO abstraction are AIO handles. To > +execute an IO one first has to acquire an IO handle (`pgaio_io_acquire()`) > and > +then "defined", i.e. associate an IO operation with the handle. s/"defined"/"define" it/ or similar > +The "solution" to this the ability to associate multiple completion callbacks s/this the/this is the/ > Subject: [PATCH v2.11 08/27] localbuf: Track pincount in BufferDesc as well > @@ -5350,6 +5350,18 @@ ConditionalLockBufferForCleanup(Buffer buffer) > Assert(refcount > 0); > if (refcount != 1) > return false; > + > + /* > + * Check that the AIO subsystem doesn't have a pin. 
Likely not > + * possible today, but better safe than sorry. > + */ > + bufHdr = GetLocalBufferDescriptor(-buffer - 1); > + buf_state = pg_atomic_read_u32(&bufHdr->state); > + refcount = BUF_STATE_GET_REFCOUNT(buf_state); > + Assert(refcount > 0); > + if (refcount != 1) > + return false; > + LockBufferForCleanup() should get code like this ConditionalLockBufferForCleanup() code, either now or when "not possible today" ends. Currently, it just assumes all local buffers are cleanup-lockable: /* Nobody else to wait for */ if (BufferIsLocal(buffer)) return; > @@ -570,7 +577,13 @@ InvalidateLocalBuffer(BufferDesc *bufHdr, bool > check_unreferenced) > > buf_state = pg_atomic_read_u32(&bufHdr->state); > > - if (check_unrefer
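To make the interrupt-holdoff suggestion above concrete, here is a minimal sketch of the proposed shape (the function names come from the quoted patch; the exact placement is an assumption, not committed code):

```c
/*
 * Hold interrupts so no CHECK_FOR_INTERRUPTS() can close() the FD
 * that smgr_aio_reopen() just stored.
 */
HOLD_INTERRUPTS();

pgaio_io_reopen(ioh);
pgaio_io_call_inj(ioh, "AIO_WORKER_AFTER_REOPEN");
pgaio_io_perform_synchronously(ioh);

RESUME_INTERRUPTS();
```

And in smgrfd(), instead of doing its own HOLD_INTERRUPTS(), assert that the caller already holds interrupts:

```c
Assert(InterruptHoldoffCount > 0);
```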
Re: AIO v2.5
On Mon, Mar 24, 2025 at 5:59 AM Andres Freund wrote: > On 2025-03-23 08:55:29 -0700, Noah Misch wrote: > > An IO in PGAIO_HS_STAGED clearly blocks closing the IO's FD, and an IO in > > PGAIO_HS_COMPLETED_IO clearly doesn't block that close. For > > io_method=worker, > > closing in PGAIO_HS_SUBMITTED is okay. For io_method=io_uring, is there a > > reference about it being okay to close during PGAIO_HS_SUBMITTED? I looked > > awhile for an authoritative view on that, but I didn't find one. If we can > > rely on io_uring_submit() returning only after the kernel has given the > > io_uring its own reference to all applicable file descriptors, I expect it's > > okay to close the process's FD. If the io_uring acquires its reference > > later > > than that, I expect we shouldn't close before that later time. > > I'm fairly sure io_uring has its own reference for the file descriptor by the > time io_uring_enter() returns [1]. What io_uring does *not* reliably tolerate > is the issuing process *exiting* before the IO completes, even if there are > other processes attached to the same io_uring instance. It is a bit strange that the documentation doesn't say that explicitly. You can sorta-maybe-kinda infer it from the fact that io_uring didn't originally support cancelling requests at all, maybe a small clue that it also didn't cancel them when you closed the fd :-) The only sane alternative would seem to be that they keep running and have their own reference to the *file* (not the fd), which is the actual case, and might also be inferrable at a stretch from the io_uring_register() documentation that says it reduces overheads with a "long term reference" reducing "per-I/O overhead". (The distant third option/non-option is a sort of late/async binding fd as seen in the Glibc user space POSIX AIO implementation, but that sort of madness doesn't seem to be the sort of thing anyone working in the kernel would entertain for a nanosecond...) Anyway, there are also public discussions involving Mr Axboe that discuss the fact that async operations continue to run when the associated fd is closed, eg from people who were surprised by that when porting stuff from other systems, which might help fill in the documentation gap a teensy bit if people want to see something outside the source code: https://github.com/axboe/liburing/issues/568 > AIO v1 had a posix_aio backend, which, on several platforms, did *not* > tolerate the FD being closed before the IO completes. Because of that > IoMethodOps had a closing_fd callback, which posix_aio used to wait for the > IO's completion [2]. Just for the record while remembering this stuff: Windows is another system that took the cancel-on-close approach, so the Windows IOCP proof-of-concept patches also used that AIO v1 callback and we'll have to think about that again if/when we want to get that stuff going on AIO v2. I recall also speculating that it might be better to teach the vfd system to pick another victim to close instead if an fd was currently tied up with an asynchronous I/O for the benefit of those cancel-on-close systems, hopefully without any happy-path book-keeping. But just submitting staged I/O is a nice and cheap solution for now, without them in the picture.
Re: Query ID Calculation Fix for DISTINCT / ORDER BY and LIMIT / OFFSET
On Mon, 24 Mar 2025 at 15:23, Michael Paquier wrote: > And here we get 18564.06 (head) vs 16667.92 (patch) so a 10.7% > difference in this run. Hence the automatic addition of NULL to the > jumbling is disappointing here, even if this is what I'd see this as a > worst case scenario, unlikely what one would see for real if they care > about monitoring. Can you share which patch you tested here? There are many patches on this thread and I don't want to make assumptions about which one these are the results for. It sounds like it wasn't your latest patch as you mentioned "automatic addition of NULL to the jumbling". That doesn't sound like the correct description for your latest patch. David
Re: Add estimated hit ratio to Memoize in EXPLAIN to explain cost adjustment
On Fri, 21 Mar 2025 at 22:02, Andrei Lepikhov wrote: > In cases > I have seen before, the main question is how effective was (or may be) a > Memoize node == how often the incoming key fits the cache. In that case, > the hit ratio fraction is more understandable for a broad audience. > That's why, according to my experience, in case of a good cache > reusability factor, users are usually okay with increasing the cache > size to the necessary numbers and avoiding evictions at all costs. So, > the predicted evict_ratio also tells us about incrementing work_mem to > enhance the chances of Memoisation. Can you explain why "Estimated Capacity" and "Estimated Distinct Lookup Keys" don't answer that? If there are more distinct lookup keys than there is capacity to store them, then some will be evicted. Once again, I'm not necessarily objecting to hit and evict ratios being shown, I just want to know they're actually useful enough to show and don't just bloat the EXPLAIN output needlessly. So far your arguments aren't convincing me that they are. > Having written the last sentence I came back to the point why work_mem > is so universal and is used at each node as a criterion of memory > allocation size? But it is a different story, I think. We have to set the limit somehow. We could have done this by having a GUC per node type that uses memory, but it looks like something more universal was decided, perhaps to save on GUCs. I don't know the exact history, but once upon a time, sort_mem existed. Perhaps that disappeared because we grew more node types that needed to allocate large, otherwise unbounded amounts of memory. We did more recently grow a hash_mem_multiplier GUC, so it's not true to say that work_mem solely controls the limits of each node's memory allocation sizes. David
Re: Query ID Calculation Fix for DISTINCT / ORDER BY and LIMIT / OFFSET
On Mon, Mar 24, 2025 at 07:42:21PM +1300, David Rowley wrote: > Can you share which patch you tested here? There are many patches on > this thread and I don't want to make assumptions about which one these > are the results for. The patch tested is the latest one that has been posted on this thread, for a comparison of HEAD at 8a3e4011f02d and this 0001 (no 0002 which was forcing a loop to make the jumbling heavier): https://www.postgresql.org/message-id/z9z85ui5lpkpn...@paquier.xyz The message holding the patch tested is the one I've replied to when posting the results shared here: https://www.postgresql.org/message-id/z-db83o5azmgn...@paquier.xyz -- Michael signature.asc Description: PGP signature
Re: Add mention in docs about locking all partitions for generic plans
David Rowley wrote on Mon, Mar 24, 2025 at 05:28: > Over in [1], there was some uncertainty about whether locking an > unrelated partition was expected behaviour or not for the particular > use-case which used a generic plan to scan a partitioned table and all > of the partitions. > > I noticed that we don't mention this in the docs and thought that > perhaps we should. I thought a good place might be in [2] at the end > of the following paragraph: > > "During initialization of the query plan. Partition pruning can be > performed here for parameter values which are known during the > initialization phase of execution. Partitions which are pruned during > this stage will not show up in the query's EXPLAIN or EXPLAIN ANALYZE. > It is possible to determine the number of partitions which were > removed during this phase by observing the “Subplans Removed” property > in the EXPLAIN output." > > Perhaps adding something like "Because all relations which are part of > the plan are locked at the beginning of execution, any partitions > pruned at this stage are already locked and will remain so until the > end of the transaction.". > > or: > > "It's important to note that any partitions removed by the partition > pruning done at this stage are still locked at the beginning of > execution". I prefer this. > This is no longer true in master, so if we do something here it's only > v17 and earlier. In the case of [1], we still have AccessShareLock on entity_2, even though it is pruned during initial partition pruning. This seems to contradict what you said: "This is no longer true in master". [1] https://postgr.es/m/01020195b987abd3-a008b77d-8c63-4931-80a4-be36a351c8b2-000...@eu-west-1.amazonses.com -- Thanks, Tender Wang
Re: Commit fest 2025-03
On Mon, 17 Mar 2025 at 09:26, vignesh C wrote:
> Here's a quick commitfest status report as of today:

status               | start | 10th | 17th | 24th
---------------------+-------+------+------+------
Needs review:        |   198 |  182 |  134 |  120
Waiting on Author:   |    37 |   35 |   69 |   62
Ready for Committer: |    33 |   33 |   34 |   34
Committed:           |    52 |   67 |   75 |   90
Moved to next CF:    |     7 |    7 |   11 |   14
Withdrawn:           |    12 |   15 |   15 |   18
Rejected:            |     1 |    1 |    3 |    3
RWF:                 |     2 |    2 |    2 |    2
Total:               |   343 |  343 |  343 |  343

There are currently 34 patches in the "Ready for Committer" status at [1]. If any committers have bandwidth, please consider reviewing and advancing them. If you have submitted a patch that is in the "Waiting on Author" state, please update it to "Needs review" as soon as possible. This is where reviewers are most likely to focus their efforts. There are 62 patches in the "Waiting on Author" state at [2]. Additionally, 13 "Needs review" patches require a rebase due to recent commits, see [3]. Patch owners are requested to rebase and submit updated versions. 15 patches have been committed in the last week. [1] - https://commitfest.postgresql.org/52/?status=3 [2] - https://commitfest.postgresql.org/52/?status=2 [3] - https://commitfest.postgresql.org/52/?status=1 Regards, Vignesh
Re: Orphaned users in PG16 and above can only be managed by Superusers
Thank you, Robert and Tom, for sharing your valuable insights, and apologies for the slight delay in my response. From the discussion, what I understand is that we aim to extend the current DROP ROLE syntax to include the CASCADE/RESTRICT option, which has been introduced in the latest SQL standard, as mentioned by Tom. However, as Robert pointed out, implementing the CASCADE option for handling dependent objects that span multiple databases is not feasible at the moment. The RESTRICT option, as I understand it, is already the default behavior. Therefore, we will proceed with implementing the CASCADE/RESTRICT syntax specifically for handling dependent roles, rather than the dependent database objects like tables, views, etc., which can span multiple databases. Please correct me if I’m mistaken or if there’s anything I’ve missed in my understanding. thanks. -- With Regards, Ashutosh Sharma.
Re: Patch: Cover POSITION(bytea,bytea) with tests
On Tue, 18 Mar 2025 at 03:14, Aleksander Alekseev wrote: > Here is the corrected patch. I had a look at this and it all seems good to me. Pushed. David
Re: Parallel safety for extern params
On Sat, Mar 22, 2025 at 1:25 AM Tom Lane wrote: > > Robert Haas writes: > > I'm happy to have you tidy up here in whatever way seems best to you. > > Cool, done. > Thanks for taking care of this, and sorry for not digging deeper to find the appropriate fix. -- With Regards, Amit Kapila.
RE: Enhance 'pg_createsubscriber' to retrieve databases automatically when no database is provided.
Dear Shubham, > I have reviewed and merged the proposed changes into the patch. The > attached patches contain the suggested changes. Thanks for updating the patch! A few comments:

01.
```
+	/*
+	 * Fetch all databases from the source (publisher) if --all is specified.
+	 */
+	if (opt.all_dbs)
+		fetch_source_databases(&opt);
+	if (opt.database_names.head == NULL)
```
I feel we can convert "if" -> "else if" for the opt.database_names.head case, because fetch_source_databases() ensures databases are listed.

02.
```
+# Set up node U as standby linking to node
```
To confirm: why can we use "U" here?

03.
```
+$node_u->append_conf(
+	'postgresql.conf', qq[
+primary_conninfo = '$pconnstr dbname=postgres'
+]);
```
I think this part is not required. init_from_backup() with has_streaming sets the primary_conninfo.

04.
```
+# Stop $node_s
+$node_s->stop;
```
The comment does not describe anything. I feel you can say "Stop node S because it won't be used anymore", or just remove it.

05.
```
+# Drop one of the databases (e.g., $db2)
+$node_p->safe_psql('postgres', "DROP DATABASE \"$db2\"");
```
"e.g." means "for example", so it is not suitable here. Please explain why the database must be removed.

06.
```
+# Get subscription names
+$result = $node_u->safe_psql(
+	'postgres', qq(
+	SELECT subname FROM pg_subscription WHERE subname ~ '^pg_createsubscriber_'
+));
+my @subnames1 = split("\n", $result);
+
+# Wait subscriber to catch up
+$node_u->wait_for_subscription_sync($node_p, $subnames1[0]);
+$node_u->wait_for_subscription_sync($node_p, $subnames1[1]);
```
wait_for_subscription_sync() is used to wait until the initial synchronization is done, so it is not suitable here. wait_for_catchup() is more appropriate.

07. Also, a similar part exists in the pre-existing code (wait_for_subscription_sync() was used there, but I feel that may not be correct). How about unifying them as attached?

08.
```
+# Verify logical replication works for all databases
+# Insert rows on node P
+$node_p->safe_psql('postgres', "INSERT INTO tbl1 VALUES('first row')");
```
This is confusing because 'fourth row' is inserted into $db1 just after it. How about 'row in database postgres' or something?

Best regards, Hayato Kuroda FUJITSU LIMITED kuroda.diffs Description: kuroda.diffs
Re: Doc: fix the rewrite condition when executing ALTER TABLE ADD COLUMN
hi. https://git.postgresql.org/cgit/postgresql.git/commit/?id=11bd8318602fc2282a6201f714c15461dc2009c6 + Adding a column with a volatile DEFAULT + (e.g., clock_timestamp()), a generated column + (e.g., GENERATED BY DEFAULT AS IDENTITY), or a domain + data type with constraints will require the entire table and its + indexes to be rewritten, as will changing the type of an existing + column. As an exception, when changing the type of an existing column, + if the USING clause does not change the column + contents and the old type is either binary coercible to the new type + or an unconstrained domain over the new type, a table rewrite is not + needed. In the current development branch, virtual generated columns still do not support domains, and you cannot change the generation expression if there is a check constraint on the column, so virtual generated columns never need a rewrite. IMHO, the committed doc description doesn't mention this exception. Do we need some text to cover it?
Re: [PoC] Reducing planning time when tables have many partitions
Thank you for addressing those comments. On Mon, 24 Mar 2025 at 12:24, Yuya Watari wrote: > It is true that currently, indexes for EquivalenceMembers do not store > information about base rels. However, the subsequent commit (v35-0004) > introduces indexes for base rels to enable faster RestrictInfo > lookups. Therefore, if we commit the later patch as well, the comment > will remain accurate. What do you think about this? I understand Ashutosh would like to handle the RestrictInfo speedup another way, so there's additional review work to do there to determine the merits of each method and figure out the best method. I'm worried that means we don't get to fix this part for v18, and if that happens and 0002 goes in alone, then we'd be left with a struct with a single field. Maybe you should adjust the patch series and only introduce the new struct in 0004 where it's required. > I agree it's challenging to redesign the union planner at this stage > of the release cycle. For now, I have proposed a workaround solution > (v35-0003, previously v34-0002). Would this workaround be acceptable > for the current release cycle? I think something like that is probably ok. You have a problem with your implementation as you're trying to add the AppendRelInfo once for each child_tlist element rather than once per union child. Can you fix this and incorporate it into the 0002 patch, please? Here are some more review comments for v35-0002: 1. I don't think the header comment for eclass_member_iterator_next() needs to mention setup_eclass_member_iterator_with_children(). The renaming you did in v35 is meant to make it so the eclass_member_iterator_next and dispose_eclass_member_iterator() functions don't care about what set up the iterator. We might end up with new ones in the future and this seems like a comment that might not get updated when that happens. 2. You should use list_free() in the following:

/*
 * XXX Should we use list_free()? I decided to use this style to take
 * advantage of speculative execution.
 */
if (unlikely(it->list_is_copy))
	pfree(it->ec_members);

The reason is that you're wrongly assuming that calling pfree on the List pointer is enough to get rid of all memory used by the list. The List may have a separately allocated elements[] array (this happens when there's > 5 elements) which you're leaking with the current code. I assume the speculative execution comment is there because you want to omit the "list == NULL" check in list_free_private. Is this measurable, performance-wise? 3. Maybe I'm missing something, but I'm confused about the need for the eclass_indexes_array field in PlannerInfo. This array is indexed by the relid, so why can't we get rid of the array and add a field to RelOptInfo to store the EquivalenceClassIndexes? 4. Could you also please run another set of benchmarks against current master with the v36 patches: master, master + v36-0001 + 0002, master + v36-0001 + 0002 + 0003 (0003 will be the v34-0004 patch), and then also with v36-0004 (which is the same as v35-0005). The main thing I'd like to understand here is if there's not enough time to get the entire patch set committed, is there much benefit to just having the EquivalenceMember index stuff in by itself without the RestrictInfo changes. David
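On point 2, a minimal sketch of the suggested fix (the field names come from the quoted iterator code):

```c
/*
 * A List that has grown past its initial allocation keeps its
 * elements[] array in a separate palloc chunk; pfree(list) releases
 * only the header and leaks that array.  list_free() releases both.
 */
if (unlikely(it->list_is_copy))
	list_free(it->ec_members);
```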
RE: Fix 035_standby_logical_decoding.pl race conditions
Dear Bertrand, > > SIGSTOP signal for pg_recvlogical may do the idea, > > Yeah, but would we be "really" testing an "active" slot? Yeah, this is also a debatable point. > At the end we want to produce an invalidation that may or not happen on a real > environment. The corner case is in the test, not an issue of the feature to > fix. I also think this is a test issue, not a codebase one. > So, I'm not sure I like the idea that much, but thinking out loud: I wonder if > we could bypass the "active" slot checks in 16 and 17 and use injection > points as > proposed as of 18 (as we need the injection points changes proposed in 0001 > up-thread). Thoughts? I do not have any other ideas either. I checked that your patch set solves the issue. Comments on the patch: I'm not sure whether a new API is really needed. Isn't it enough to use both injection_points_wakeup() and injection_points_detach()? This approach does not require bumping the version, and it can be backported to PG17. Also, another check is required for whether the extension can be installed on the node. Please see 041_checkpoint_at_promote.pl. Best regards, Hayato Kuroda FUJITSU LIMITED
Re: AIO v2.5
On Sun, Mar 23, 2025 at 11:11:53AM -0400, Andres Freund wrote: > On 2025-03-22 17:20:56 -0700, Noah Misch wrote: > > On Thu, Mar 20, 2025 at 09:58:37PM -0400, Andres Freund wrote: > > > Not sure yet how to best disable testing io_uring in this case. We can't > > > just query EXEC_BACKEND from pg_config.h unfortunately. I guess making > > > the > > > initdb not fail and checking the error log would work, but that doesn't > > > work > > > nicely with Cluster.pm. > > > > How about "postgres -c io_method=io_uring -C ": > > > > --- a/src/test/modules/test_aio/t/001_aio.pl > > +++ b/src/test/modules/test_aio/t/001_aio.pl > > @@ -29,7 +29,13 @@ $node_worker->stop(); > > # Test io_method=io_uring > > ### > > > > -if ($ENV{with_liburing} eq 'yes') > > +sub have_io_uring > > +{ > > + local %ENV = $node_worker->_get_env(); # any node works > > + return run_log [qw(postgres -c io_method=io_uring -C io_method)]; > > +} > > + > > +if (have_io_uring()) > > { > > my $node_uring = create_node('io_uring'); > > $node_uring->start(); > > Yea, that's a good idea. > > One thing that doesn't seem great is that it requires a prior node - what if > we do '-c io_method=invalid', which would report the list of valid GUC options, > so we could just grep for io_uring? Works for me. > > One idea so far is to comment on valid states after some IoMethodOps > > callbacks: > I think these are a good idea. I added those to the copy-edit patch, with a > few more tweaks: The tweaks made it better. > > > Subject: [PATCH v2.11 04/27] aio: Add liburing dependency > > > + [AC_DEFINE([USE_LIBURING], 1, [Define to build with > > io_uring support. (--with-liburing)])]) > > > +AC_MSG_RESULT([$with_liburing]) > > > +AC_SUBST(with_liburing) > > > > > > # > > > # UUID library > > > @@ -1463,6 +1471,9 @@ elif test "$with_uuid" = ossp ; then > > > fi > > > AC_SUBST(UUID_LIBS) > > > > > > +if test "$with_liburing" = yes; then > > > + PKG_CHECK_MODULES(LIBURING, liburing) > > > +fi > > We usually put this right after the AC_MSG_CHECKING ... AC_SUBST block. > We don't really seem to do that for "dependency checks" in general, e.g. > PGAC_CHECK_PERL_CONFIGS, PGAC_CHECK_PYTHON_EMBED_SETUP, PGAC_CHECK_READLINE, > dependency dependent AC_CHECK_LIB calls, .. later in configure.ac than the > definition of the option. AC_CHECK_LIB stays far away, yes. > But you're right that the PKG_CHECK_MODULES calls are closer-by. And I'm happy > to move towards having the code for each dep all in one place, so moved. > > A related thing: We seem to have no order of the $with_ checks that I can > discern. Should the liburing check be at a different place? No opinion on that one. It's fine. > > This currently has unrelated stuff separating them. Also, with the > > exception of icu, we follow PKG_CHECK_MODULES uses by absorbing flags from > > pkg-config and use AC_CHECK_LIB to add the actual "-l". > I think for liburing I was trying to follow ICU's example - injecting CFLAGS > and LIBS just in the parts of the build dir that needs them. > > For LIBS I think I did so: > > diff --git a/src/backend/Makefile b/src/backend/Makefile > ... > +# The backend conditionally needs libraries that most executables don't need. > +LIBS += $(LDAP_LIBS_BE) $(ICU_LIBS) $(LIBURING_LIBS) > > But ugh, for some reason I didn't do that for LIBURING_CFLAGS. 
In the v1.x > version of aio I had > aio:src/backend/storage/aio/Makefile:override CPPFLAGS += $(LIBURING_CFLAGS) > > but somehow lost that somewhere along the way to v2.x > > > I think I like targetting where ${LIB}_LIBS and ${LIB}_CFLAGS are applied more > narrowly better than just adding to the global CFLAGS, CPPFLAGS, LDFLAGS. Agreed. > somewhat inclined to add it LIBURING_CFLAGS in src/backend rather than > src/backend/storage/aio/ though. > > But I'm also willing to do it entirely differently. The CPPFLAGS addition, located wherever makes sense, resolves that point. > > > --- a/doc/src/sgml/installation.sgml > > > +++ b/doc/src/sgml/installation.sgml > > > > lz4 and other deps have a mention in , in > > addition to sections edited here. > > Good point. > > Although once more I feel defeated by the ordering used :) > > Hm, that list is rather incomplete. At least libxml, libxslt, selinux, curl, > uuid, systemd, selinux and bonjour aren't listed. > > Not sure if it makes sense to add liburing, given that? That's a lot of preexisting incompleteness. I withdraw the point about . Unrelated to the above, another question about io_uring: commit da722699 wrote: > +/* > + * Need to submit staged but not yet submitted IOs using the fd, otherwise > + * the IO would end up targeting something bogus. > + */ > +void > +pgaio_closing_fd(int fd) An IO in PGAIO_HS_STAGED clearly blocks closing the IO's FD, and an IO in PGAIO_HS_COMPLETED_IO clearly doesn't block that close. For io_method=worker, closing in PGAIO_HS_SUBMITTED is okay. For io_method=io_uring, is
Re: Fix infinite loop from setting scram_iterations to INT_MAX
On Mon, Mar 24, 2025 at 09:50:36AM +0900, Richard Guo wrote: > Nice catch. The fix looks good to me. It seems to me that it's fine > to go without a test case, since the fix is quite straightforward. One could argue for using an injection point to trick the iteration loop into being cheaper, but I could not really get excited about the coverage vs. the cycles spent on it.. -- Michael signature.asc Description: PGP signature
Re: Fix infinite loop from setting scram_iterations to INT_MAX
On Sun, Mar 23, 2025 at 10:41 PM Kevin K Biju wrote:
> int i;
> ...
> for (i = 2; i <= iterations; i++)
> {
>     ...
> }
>
> For iterations = INT_MAX, the loop will never terminate since the condition is <= and adding 1 to INT_MAX will lead to i wrapping around to INT_MIN.
>
> I've fixed this by modifying the loop condition to be i < iterations. I've attached a patch with the fix. I considered adding a test as well, but since generating a password with a high number of iterations is very time-consuming, I'm not sure if that would be practical.

Nice catch. The fix looks good to me. It seems to me that it's fine to go without a test case, since the fix is quite straightforward. Thanks Richard
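For reference, a minimal standalone sketch of the wraparound and of why a "<" comparison is safe (the variable and helper names here are illustrative, not the scram code itself):

```c
int	i;

/*
 * With bound == INT_MAX, "i <= bound" never becomes false: after the
 * body runs with i == INT_MAX, i++ overflows (undefined behavior, in
 * practice wrapping to INT_MIN) and the loop keeps going.
 */
for (i = 2; i <= bound; i++)	/* never terminates for bound == INT_MAX */
	do_round();

/*
 * With "<", i never needs to take a value above bound, so no overflow
 * can occur; the start index must be adjusted so the total number of
 * rounds stays the same.
 */
for (i = 1; i < bound; i++)		/* terminates */
	do_round();
```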
Re: Fix infinite loop from setting scram_iterations to INT_MAX
On Mon, Mar 24, 2025 at 9:54 AM Michael Paquier wrote: > On Mon, Mar 24, 2025 at 09:50:36AM +0900, Richard Guo wrote: > > Nice catch. The fix looks good to me. It seems to me that it's fine > > to go without a test case, since the fix is quite straightforward. > > One could argue about using an injection point to force trick the > iteration loop to be cheaper, but I could not really get excited about > the coverage vs the cycles spent for it.. Exactly. Thanks Richard
Re: Add semi-join pushdown to postgres_fdw
Hi, Alexander! On Tue, Mar 18, 2025 at 6:04 PM Alexander Pyhalov wrote: > This shouldn't. When semi-join is found below left/right join, it's > deparsed as subquery. > Interestingly enough, this mechanism (deparsing as subquery) is used > 1) for semi-joins under left/right join, > 2) for full outer joins when inner or outer part has some remote_conds. > > The issue here is that after the subquery is deparsed, we don't consider if > its target attributes are available to the upper-level > join. As for the semi-join itself, all conditions are still deparsed at the > left/right join boundary, they are just not propagated further. > This shouldn't be a problem, as they are evaluated in the subquery. As for > a left/right join without a semi-join beneath it - its behavior is not > affected > (as hidden_subquery_rels is empty). Thank you for the explanation. But I have another question. Aren't the checks you've proposed too strict? hidden_subquery_rels are propagated all the way up the join tree, so pulling conditions would be disabled all the way up the join tree too. Is it enough to just disable pulling conditions directly from semi-joins, so that their further pulls are disabled automatically? See the attached patch. It also contains other (mostly cosmetic) improvements. -- Regards, Alexander Korotkov Supabase v2-0001-Avoid-pulling-up-restrict-infos-from-subqueries.patch Description: Binary data
Re: [PATCH] SVE popcount support
Looks good, the code is more readable now. > For both Neon and SVE, I do see improvements with looping over 4 > registers at a time, so IMHO it's worth doing so even if it performs the > same as 2-register blocks on some hardware. There was no regression on Graviton 3 when using the 4-register version, so we can keep it. -Chiranmoy
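For anyone curious what "looping over 4 registers at a time" means here, a hedged sketch using the ACLE SVE intrinsics (an illustration of the unrolling idea, not the patch's actual code):

```c
#include <stddef.h>
#include <stdint.h>
#include <arm_sve.h>

static uint64_t
popcount_u64_sve_sketch(const uint64_t *p, size_t n)
{
	svbool_t	pred = svptrue_b64();
	svuint64_t	acc0 = svdup_u64(0);
	svuint64_t	acc1 = svdup_u64(0);
	svuint64_t	acc2 = svdup_u64(0);
	svuint64_t	acc3 = svdup_u64(0);
	size_t		lanes = svcntd();	/* 64-bit lanes per SVE register */
	size_t		i = 0;
	uint64_t	total;

	/* Main loop: four independent accumulators hide instruction latency. */
	for (; i + 4 * lanes <= n; i += 4 * lanes)
	{
		acc0 = svadd_u64_x(pred, acc0,
						   svcnt_u64_x(pred, svld1_u64(pred, p + i)));
		acc1 = svadd_u64_x(pred, acc1,
						   svcnt_u64_x(pred, svld1_u64(pred, p + i + lanes)));
		acc2 = svadd_u64_x(pred, acc2,
						   svcnt_u64_x(pred, svld1_u64(pred, p + i + 2 * lanes)));
		acc3 = svadd_u64_x(pred, acc3,
						   svcnt_u64_x(pred, svld1_u64(pred, p + i + 3 * lanes)));
	}

	/* Reduce the four per-lane accumulators to a scalar. */
	total = svaddv_u64(pred,
					   svadd_u64_x(pred,
								   svadd_u64_x(pred, acc0, acc1),
								   svadd_u64_x(pred, acc2, acc3)));

	/* Scalar tail for any remainder. */
	for (; i < n; i++)
		total += __builtin_popcountll(p[i]);

	return total;
}
```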
Re: Proposal - Allow extensions to set a Plan Identifier
On Sat, Mar 22, 2025 at 10:22:37PM -0500, Sami Imseih wrote: > I think plan_node_id is probably the least controversial because that value > comes straight from core, and different extensions cannot have their own > interpretation of what that value could be. Depends. An extension can plug in what they want. The point is that the key used to identify a single plan is up to what extensions think is relevant in a plan. That's heavily subject to interpretation. What's not really subject to interpretation is that an extension cannot know what it should set and/or use as the key identifier without something that some portion of the code structure knows about, or unless these extensions have an inter-dependency. Anyway, there are also the arguments about the set timing, reset timing, the extended protocol argument, etc. So I've applied the patch for now, to start with something. > FWIW, Lukas did start a Wiki [0] to open the discussion for what parts > of the plan should be used to compute a plan_id, and maybe we can > in the future compute a plan_id in core by default. Let's see where this leads.. I suspect that this is going to take some time, assuming that we're ever able to settle on a clear definition. Perhaps we will, or perhaps we will not. -- Michael signature.asc Description: PGP signature