Re: broken JIT support on Fedora 40
On Sat, Apr 6, 2024 at 5:01 AM Dmitry Dolgov <9erthali...@gmail.com> wrote: > > > Yep, I think this is it. I've spent some hours trying to understand why > > > suddenly deform function has noundef ret attribute, when it shouldn't -- > > > this explains it and the proposed change fixes the crash. One thing that > > > is still not clear to me though is why this copied attribute doesn't > > > show up in the bitcode dumped right before running inline pass (I've > > > added this to troubleshoot the issue). > > > > One thing to consider in this context is indeed adding "verify" pass as > > suggested in the PR, at least for the debugging configuration. Without the > > fix > > it immediately returns: > > > > Running analysis: VerifierAnalysis on deform_0_1 > > Attribute 'noundef' applied to incompatible type! > > > > llvm error: Broken function found, compilation aborted! > > Here is what I have in mind. Interestingly enough, it also shows few > more errors besides "noundef": > > Intrinsic name not mangled correctly for type arguments! Should be: > llvm.lifetime.end.p0 > ptr @llvm.lifetime.end.p0i8 > > It refers to the function from create_LifetimeEnd, not sure how > important is this. Would it be too slow to run the verify pass always, in assertion builds? Here's a patch for the original issue, and a patch to try that idea + a fix for that other complaint it spits out. The latter would only run for LLVM 17+, but that seems OK. From 57af42d9a1b47b7361c7200a17a210f2ca37bd5d Mon Sep 17 00:00:00 2001 From: Thomas Munro Date: Tue, 9 Apr 2024 18:10:30 +1200 Subject: [PATCH 1/2] Fix illegal attribute LLVM attribute propagation. Commit 72559438 started copying more attributes from AttributeTemplate to the functions we generate on the fly. In the case of deform functions, which return void, this meant that "noundef", from AttributeTemplate's return value (a Datum) was copied to a void type. Older LLVM releases were OK with that, but LLVM 18 crashes. "noundef" is rejected for void by the verify pass, which would have alerted us to this problem (something we'll consider checking automatically in a later commit). Update our llvm_copy_attributes() function to skip copying the attribute for the return value, if the target function returns void. Thanks to Dmitry Dolgov for help chasing this down. Back-patch to all supported releases, like 72559438. 
Reported-by: Pavel Stehule Reviewed-by: Dmitry Dolgov <9erthali...@gmail.com> Discussion: https://postgr.es/m/CAFj8pRACpVFr7LMdVYENUkScG5FCYMZDDdSGNU-tch%2Bw98OxYg%40mail.gmail.com --- src/backend/jit/llvm/llvmjit.c | 7 +-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/src/backend/jit/llvm/llvmjit.c b/src/backend/jit/llvm/llvmjit.c index ec0fdd49324..92b4993a98a 100644 --- a/src/backend/jit/llvm/llvmjit.c +++ b/src/backend/jit/llvm/llvmjit.c @@ -552,8 +552,11 @@ llvm_copy_attributes(LLVMValueRef v_from, LLVMValueRef v_to) /* copy function attributes */ llvm_copy_attributes_at_index(v_from, v_to, LLVMAttributeFunctionIndex); - /* and the return value attributes */ - llvm_copy_attributes_at_index(v_from, v_to, LLVMAttributeReturnIndex); + if (LLVMGetTypeKind(LLVMGetFunctionReturnType(v_to)) != LLVMVoidTypeKind) + { + /* and the return value attributes */ + llvm_copy_attributes_at_index(v_from, v_to, LLVMAttributeReturnIndex); + } /* and each function parameter's attribute */ param_count = LLVMCountParams(v_from); -- 2.44.0 From 91d8b747686aa07341337e1cd0b7c2244e955adb Mon Sep 17 00:00:00 2001 From: Thomas Munro Date: Tue, 9 Apr 2024 18:57:48 +1200 Subject: [PATCH 2/2] Run LLVM verify pass on all generated IR. We fixed a recent problem that crashed LLVM 18, but Dmitry pointed out that we'd have known about that all along if we'd automatically run the verify pass on the IR we generate. Turn that on in assertion builds. That reveals one other complaint about a name, which is also fixed here. Suggested-by: Dmitry Dolgov <9erthali...@gmail.com> Discussion: https://postgr.es/m/CAFj8pRACpVFr7LMdVYENUkScG5FCYMZDDdSGNU-tch%2Bw98OxYg%40mail.gmail.com --- src/backend/jit/llvm/llvmjit.c | 11 +-- src/backend/jit/llvm/llvmjit_expr.c | 2 +- 2 files changed, 10 insertions(+), 3 deletions(-) diff --git a/src/backend/jit/llvm/llvmjit.c b/src/backend/jit/llvm/llvmjit.c index 92b4993a98a..bc429e970f2 100644 --- a/src/backend/jit/llvm/llvmjit.c +++ b/src/backend/jit/llvm/llvmjit.c @@ -703,10 +703,17 @@ llvm_optimize_module(LLVMJitContext *context, LLVMModuleRef module) LLVMErrorRef err; const char *passes; + /* In assertion builds, run the LLVM verify pass. */ +#ifdef USE_ASSERT_CHECKING +#define PASSES_PREFIX "verify," +#else +#define PASSES_PREFIX "" +#endif + if (context->base.flags & PGJIT_OPT3) - passes = "default"; + passes = PASSES_PREFIX "default"; else - passes = "default,mem2reg"; + passes = PASSES_PREFIX "default,mem2reg"; options = LLVMCreatePassBuilderOptions(); diff --git a
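For anyone who wants to check the emitted IR independently of the pass pipeline, a minimal sketch using the stable LLVM-C analysis API could look like this; the helper name is made up, and the assumption is that it would be called on the module right after the functions are emitted, in assertion builds only:

#include "postgres.h"

#include <llvm-c/Analysis.h>
#include <llvm-c/Core.h>

/* Hypothetical helper: fail loudly if the emitted module is broken. */
static void
verify_module_or_abort(LLVMModuleRef mod)
{
#ifdef USE_ASSERT_CHECKING
	char	   *message = NULL;

	/* Returns nonzero if the module is broken and fills in *message. */
	if (LLVMVerifyModule(mod, LLVMReturnStatusAction, &message))
		elog(ERROR, "broken LLVM module generated: %s", message);

	LLVMDisposeMessage(message);
#endif
}

Without the first patch this should report the same "Attribute 'noundef' applied to incompatible type!" complaint as the verify pass; with it, nothing.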
Re: Commitfest Manager for March
> On 1 Apr 2024, at 09:05, Andrey M. Borodin wrote: > > As of April 1 there are 97 committed patches. Hello everyone! The March Commitfest is closed. ~40 CF entries have been committed since April 1st. Despite some drawbacks discussed in nearby threads, it's still a huge amount of work and significant progress for the project. Thanks to everyone involved! The number is approximate, because in some cases I could not clearly determine the status; I pinged authors to do so. 26 entries are RWF/Rejected/Withdrawn. Michael noted that I moved to the next CF some entries that had been waiting for their author for more than a couple of weeks. I'm going to revise the WoA items in 2024-07 and RwF some of them to reduce CF bloat. If authors are still interested in continuing work on returned items, they are free to provide an answer and change the status back to Needs Review. I've removed all "target version = 17" attributes and switched the statuses of a lot of entries. I'm afraid there might be some errors; if I determined the status of your patch erroneously, please accept my apologies and correct the error. Thanks! Best regards, Andrey Borodin.
Re: NLS doesn't work for pg_combinebackup
At Tue, 9 Apr 2024 15:00:27 +0900, Michael Paquier wrote in > I've checked the whole tree, and the two you are pointing at are the > only incorrect paths. So applied, thanks! Thank you for cross-checking and committing! regards. -- Kyotaro Horiguchi NTT Open Source Software Center
sql/json remaining issue
hi. ` | NESTED [ PATH ] json_path_specification [ AS json_path_name ] COLUMNS ( json_table_column [, ...] ) NESTED [ PATH ] json_path_specification [ AS json_path_name ] COLUMNS ( json_table_column [, ...] ) ` "json_path_specification" should be "path_expression"? -- drop table s1; create or replace function random_text_1000() returns text as $$select string_agg(md5(random()::text),'') from generate_Series(1,1000) s $$ LANGUAGE SQL; create unlogged table s1(a int GENERATED BY DEFAULT AS IDENTITY, js jsonb); insert into s1(js) select jsonb ('{"a":{"za":[{"z1": [11,]},{"z21": [22, 234,' || g || ']},{"z22": [32, 204,145]}]},"c": ' || g || ',"id": "' || random_text_1000() || '"}') from generate_series(1_000_000, 1_000_000) g; insert into s1(js) select jsonb ('{"a":{"za":[{"z1": [11,]},{"z21": [22, 234,' || g || ']},{"z22": [32, 204,145]}]},"c": ' || g || ',"id": "' || random_text_1000() || '"}') from generate_series(235, 235 + 20,1) g; select count(*), pg_size_pretty(pg_total_relation_size('s1')) from s1; count | pg_size_pretty + 22 | 6398 MB -- explain(analyze, costs off,buffers, timing) SELECT sub.*, s.a as s_a FROM s, (values(23)) x(x), generate_series(13, 13) y, JSON_TABLE(js, '$' AS c1 PASSING x AS x, y AS y COLUMNS ( xx1 int PATH '$.c', NESTED PATH '$.a.za[2]' COLUMNS (NESTED PATH '$.z22[*]' as z22 COLUMNS (c int PATH '$')), NESTED PATH '$.a.za[1]' COLUMNS (d json PATH '$ ? (@.z21[*] == ($"x" -1))'), NESTED PATH '$.a.za[0]' COLUMNS (NESTED PATH '$.z1[*] ? (@ >= ($"x" -2))' as z1 COLUMNS (a int PATH '$')), NESTED PATH '$.a.za[1]' COLUMNS (NESTED PATH '$.z21[*] ? (@ >= ($"y" +121))' as z21 COLUMNS (b int PATH '$ ? (@ <= ($"x" + 76))' default -1000 ON EMPTY)) )) sub; for one jsonb, it can expand to 7 rows, the above query will return around 1.4 million rows. i use the above query, and pg_log_backend_memory_contexts in another session to check the memory usage. didn't find memory over consumed issue. but I am not sure this is the right way to check the memory consumption. -- begin; SELECT sub.*, s.a as s_a FROM s, (values(23)) x(x), generate_series(13, 13) y, JSON_TABLE(js, '$' AS c1 PASSING x AS x, y AS y COLUMNS ( xx1 int PATH '$.c', NESTED PATH '$.a.za[2]' COLUMNS (NESTED PATH '$.z22[*]' as z22 COLUMNS (c int PATH '$')), NESTED PATH '$.a.za[1]' COLUMNS (d json PATH '$ ? (@.z21[*] == ($"x" -1))'), NESTED PATH '$.a.za[0]' COLUMNS (NESTED PATH '$.z1[*] ? (@ >= ($"x" -2))' as z1 COLUMNS (a int PATH '$')), NESTED PATH '$.a.za[1]' COLUMNS (NESTED PATH '$.z21[*] ? (@ >= ($"y" +121))' as z21 COLUMNS (b int PATH '$ ? (@ <= ($"x" + 76))' error ON EMPTY)) )) sub; rollback; only the last row will fail, because of "error ON EMPTY", (1_000_000 <= 23 + 76) is false. I remember the very previous patch, because of error cleanup, it took a lot of resources. does our current implementation, only the very last row fail, will it be easy to clean up the transaction? the last query error message is: ` ERROR: no SQL/JSON item ` we are in ExecEvalJsonExprPath, can we output it to be: ` ERROR: after applying json_path "5s", no SQL/JSON item found ` in a json_table query, we can have multiple path_expressions, like the above query. it's not easy to know applying which path_expression failed.
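As for checking memory consumption with pg_log_backend_memory_contexts, that can be done roughly like this (the pid and the LIKE filter are placeholders for this sketch):

-- Find the backend running the JSON_TABLE() query.
SELECT pid, query
FROM pg_stat_activity
WHERE pid <> pg_backend_pid()
  AND query LIKE '%JSON_TABLE%';

-- Ask that backend to dump its memory contexts to the server log; run it
-- a few times while the query executes and compare the reported totals.
SELECT pg_log_backend_memory_contexts(12345);  -- pid from the query above

If per-row allocations were piling up across rows, the totals reported for the executor's per-row context would keep growing between dumps.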
Add notes to pg_combinebackup docs
Hello, While doing some work/research on the new incremental backup feature some limitations were not listed in the docs. Mainly the fact that pg_combienbackup works with plain format and not tar. Around the same time, Tomas Vondra tested incremental backups with a cluster where he enabled checksums after taking the previous full backup. After combining the backups the synthetic backup had pages with checksums and other pages without checksums which ended in checksum errors. I've attached two patches, the first one is just neat-picking things I found when I first read the docs. The second has a note on the two limitations listed above. The limitation on incremental backups of a cluster that had checksums enabled after the previous backup, I was not sure if that should go in pg_basebackup or pg_combienbackup reference documentation. Or maybe somewhere else. Kind regards, Martín -- Martín Marqués It’s not that I have something to hide, it’s that I have nothing I want you to see From eda9f0c811ba115edf47b4f81200073a41d10cc3 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Mart=C3=ADn=20Marqu=C3=A9s?= Date: Sat, 6 Apr 2024 19:30:23 +0200 Subject: [PATCH 1/2] Remove unneeded wording in pg_combinebackup documentation --- doc/src/sgml/ref/pg_combinebackup.sgml | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/doc/src/sgml/ref/pg_combinebackup.sgml b/doc/src/sgml/ref/pg_combinebackup.sgml index 658e9a759c..19b6d159ce 100644 --- a/doc/src/sgml/ref/pg_combinebackup.sgml +++ b/doc/src/sgml/ref/pg_combinebackup.sgml @@ -37,10 +37,10 @@ PostgreSQL documentation - Specify all of the required backups on the command line from oldest to newest. + Specify all required backups on the command line from oldest to newest. That is, the first backup directory should be the path to the full backup, and the last should be the path to the final incremental backup - that you wish to restore. The reconstructed backup will be written to the + you wish to restore. The reconstructed backup will be written to the output directory specified by the -o option. @@ -48,7 +48,7 @@ PostgreSQL documentation Although pg_combinebackup will attempt to verify that the backups you specify form a legal backup chain from which a correct full backup can be reconstructed, it is not designed to help you keep track - of which backups depend on which other backups. If you remove the one or + of which backups depend on which other backups. If you remove one or more of the previous backups upon which your incremental backup relies, you will not be able to restore it. -- 2.39.3 From 0fc5ea63d7a2700ea841c56dc766a11d8f4182ff Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Mart=C3=ADn=20Marqu=C3=A9s?= Date: Tue, 9 Apr 2024 09:34:21 +0200 Subject: [PATCH 2/2] Add note of restrictions for combining incremental backups When taking incremental backups the user must be warned that the backup format has to be plain for pg_combinebackup to work properly. Another thing to consider is if a cluster had checksums enabled after the previous backup, an incremental backup will yield a possible valid cluster but with files from the previous backup that don't have checksums, giving checksum errors when replying subsequent changes to those blocks. This behavior was brought up by Tomas Vondra while testing. 
--- doc/src/sgml/ref/pg_combinebackup.sgml | 21 + 1 file changed, 21 insertions(+) diff --git a/doc/src/sgml/ref/pg_combinebackup.sgml b/doc/src/sgml/ref/pg_combinebackup.sgml index 19b6d159ce..1cafc0ab07 100644 --- a/doc/src/sgml/ref/pg_combinebackup.sgml +++ b/doc/src/sgml/ref/pg_combinebackup.sgml @@ -60,6 +60,27 @@ PostgreSQL documentation be specified on the command line in lieu of the chain of backups from which it was reconstructed. + + + Note that there are limitations in combining backups: + + + + pg_combinebackup works with plain format only. + In order to combine backups in tar format, they need to be untar first. + + + + + If an incremental backup is taken from a cluster where checksums were enabled + after the reference backup finished, the resulting data may be valid, but + the checksums wouldn't validate for files from the reference backup. + In case of enabling checksums on an existing cluster, the next backup must be + a full backup. + + + + -- 2.39.3
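To make the first limitation concrete, a minimal plain-format workflow could look like this (paths are made up; tar-format backups would have to be extracted before pg_combinebackup could use them):

# Full backup, then an incremental backup based on the full backup's manifest.
pg_basebackup -D /backups/full -Fp -c fast
pg_basebackup -D /backups/incr1 -Fp -c fast \
    --incremental=/backups/full/backup_manifest

# Reconstruct a synthetic full backup, oldest to newest.
pg_combinebackup /backups/full /backups/incr1 -o /restore/combined

And per the second note, if data checksums were enabled on the cluster after /backups/full was taken, the next backup should start a new chain with a full backup rather than extend this one.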
Re: [PoC] Improve dead tuple storage for lazy vacuum
On Mon, Apr 8, 2024 at 7:26 PM John Naylor wrote: > > I pushed both of these and see that mylodon complains that anonymous > unions are a C11 feature. I'm not actually sure that the union with > uintptr_t is actually needed, though, since that's not accessed as > such here. The simplest thing seems to be to get rid of the union and name > the inner struct "header", as in the attached. I pushed this with some comment adjustments.
Re: PostgreSQL 17 Release Management Team & Feature Freeze
On Mon, 8 Apr 2024 at 20:15, Tomas Vondra wrote: > > > On 4/8/24 17:48, Matthias van de Meent wrote: >> On Mon, 8 Apr 2024 at 17:21, Tomas Vondra >> wrote: >>> >>> Maybe it'd be better to start by expanding the existing rule about not >>> committing patches introduced for the first time in the last CF. >> >> I don't think adding more hurdles about what to include into the next >> release is a good solution. Why the March CF, and not earlier? Or >> later? How about unregistered patches? Changes to the docs? Bug fixes? >> The March CF already has a submission deadline of "before march", so >> that already puts a soft limit on the patches considered for the april >> deadline. >> >>> What if >>> we said that patches in the last CF must not go through significant >>> changes, and if they do it'd mean the patch is moved to the next CF? >> >> I also think there is already a big issue with a lack of interest in >> getting existing patches from non-committers committed, reducing the >> set of patches that could be considered is just cheating the numbers >> and discouraging people from contributing. For one, I know I have >> motivation issues keeping up with reviewing other people's patches >> when none (well, few, as of this CF) of my patches get reviewed >> materially and committed. I don't see how shrinking the window of >> opportunity for significant review from 9 to 7 months is going to help >> there. >> > > I 100% understand how frustrating the lack of progress can be, and I > agree we need to do better. I tried to move a number of stuck patches > this CF, and I hope (and plan) to do more of this in the future. Thanks in advance. > But I don't quite see how would this rule modification change the > situation for non-committers. AFAIK we already have the rule that > (complex) patches should not appear in the last CF for the first time, > and I'd argue that a substantial rework of a complex patch is not that > far from a completely new patch. Sure, the reworks may be based on a > thorough review, so there's a lot of nuance. But still, is it possible > to properly review if it gets reworked at the very end of the CF? Possible? Probably, if there is a good shared understanding of why the previous patch version's approach didn't work well, and if the next approach is well understood as well. But it's not likely, that I'll agree with. But the main issue I have with your suggestion is that the March commitfest effectively contains all new patches starting from January, and the leftovers not committed by February. If we start banning all new patches and those with significant reworks from the March commitfest, then combined with the lack of CF in May there wouldn't be any chance for new patches in the first half of the year, and an effective block on rewrites for 6 months- the next CF is only in July. Sure, there is a bit of leeway there, as some patches get committed before the commitfest they're registered in is marked active, but our development workflow is already quite hostile to incidental contributor's patches [^0], and increasing the periods in which authors shouldn't expect their patches to be reviewed due to a major release that's planned in >5 months is probably not going to help the case. > > So, I think that'd be counter-productive, as this would get the > > perverse incentive to band-aid over (architectural) issues to limit > > churn inside the patch, rather than fix those issues in a way that's > > appropriate for the project as a whole. 
> > > > Surely those architectural shortcomings should be identified in a review > - which however requires time to do properly, and thus is an argument > for ensuring there's enough time for such review (which is in direct > conflict with the last-minute crush, IMO). > > Once again, I 100% agree we need to do better in handling patches from > non-committers, absolutely no argument there. But does it require this > last-minute crush? I don't think so, it can't be at the expense of > proper review and getting it right. Agreed on this, 100%, but as mentioned above, the March commitfest contains more than just "last minute crushes" [^1]. I don't think we should throw out the baby with the bathwater here. > A complex patch needs to be > submitted early in the cycle, not in the last CF. If it's submitted > early, but does not receive enough interest/reviews, I think we need to > fix & improve that - not to rework/push it at the very end. Agree on all this, too. -Matthias [^0] (see e.g. the EXPLAIN SERIALIZE patch thread [0], where the original author did not have the time capacity to maintain the patchset over months: https://www.postgresql.org/message-id/ca0adb0e-fa4e-c37e-1cd7-91170b18c...@gmx.de [^1] Are there metrics on how many of the committed patches this CF were new, only registered this CF, and if they're more or less distributed to the feature freeze when compared to longer-running patchess? I can probably build these statistics by extracting the dat
Re: Add notes to pg_combinebackup docs
On 4/9/24 09:59, Martín Marqués wrote: > Hello, > > While doing some work/research on the new incremental backup feature > some limitations were not listed in the docs. Mainly the fact that > pg_combienbackup works with plain format and not tar. > Right. The docs mostly imply this by talking about output directory and backup directories, but making it more explicit would not hurt. FWIW it'd be great if we could make incremental backups work with tar format in the future too. People probably don't want to keep around the expanded data directory or extract everything before combining the backups is not very convenient. Reading and writing the tar would make this simpler. > Around the same time, Tomas Vondra tested incremental backups with a > cluster where he enabled checksums after taking the previous full > backup. After combining the backups the synthetic backup had pages > with checksums and other pages without checksums which ended in > checksum errors. > I'm not sure just documenting this limitation is sufficient. We can't make the incremental backups work in this case (it's as if someone messes with cluster without writing stuff into WAL), but I think we should do better than silently producing (seemingly) corrupted backups. I say seemingly, because the backup is actually fine, the only problem is it has checksums enabled in the controlfile, but the pages from the full backup (and the early incremental backups) have no checksums. What we could do is detect this in pg_combinebackup, and either just disable checksums with a warning and hint to maybe enable them again. Or maybe just print that the user needs to disable them. I was thinking maybe we could detect this while taking the backups, and force taking a full backup if checksums got enabled since the last backup. But we can't do that because we only have the manifest from the last backup, and the manifest does not include info about checksums. It's a bit unfortunate we don't have a way to enable checksums online. That'd not have this problem IIRC, because it writes proper WAL. Maybe it's time to revive that idea ... I recall there were some concerns about tracking progress to allow resuming stuff, but maybe not having anything because in some (rare?) cases it'd need to do more work does not seem like a great trade off. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
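Until pg_combinebackup can detect the mismatch itself, one manual way to cope with it is sketched below, under the assumption that the combined backup has already been restored, recovered, and then shut down cleanly (pg_checksums only runs on a cleanly shut down cluster):

# Report pages whose checksums don't validate (expected for relation files
# inherited from the pre-checksum full backup).
pg_checksums --check -D /restore/combined

# Then either drop checksums again ...
pg_checksums --disable -D /restore/combined

# ... or rewrite every page with a fresh checksum (offline, can be slow).
pg_checksums --enable -D /restore/combined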
Incorrect handling of IS [NOT] NULL quals on inheritance parents
In b262ad440e we introduced an optimization that drops IS NOT NULL quals on a NOT NULL column, and reduces IS NULL quals on a NOT NULL column to constant-FALSE. I happened to notice that this is not working correctly for traditional inheritance parents. Traditional inheritance parents might have NOT NULL constraints marked NO INHERIT, while their child tables do not have NOT NULL constraints. In such a case, we would have problems when we have removed redundant IS NOT NULL restriction clauses of the parent rel, as this could cause NULL values from child tables to not be filtered out, or when we have reduced IS NULL restriction clauses of the parent rel to constant-FALSE, as this could cause NULL values from child tables to not be selected out. As an example, consider create table p (a int); create table c () inherits (p); alter table only p alter a set not null; insert into c values (null); -- The IS NOT NULL qual is droped, causing the NULL value from 'c' to -- not be filtered out explain (costs off) select * from p where a is not null; QUERY PLAN - Append -> Seq Scan on p p_1 -> Seq Scan on c p_2 (3 rows) select * from p where a is not null; a --- (1 row) -- The IS NULL qual is reduced to constant-FALSE, causing the NULL value -- from 'c' to not be selected out explain (costs off) select * from p where a is null; QUERY PLAN -- Result One-Time Filter: false (2 rows) select * from p where a is null; a --- (0 rows) To fix this issue, I think we can avoid calculating notnullattnums for inheritance parents in get_relation_info(). Meanwhile, when we populate childrel's base restriction quals from parent rel's quals, we check if each qual can be proven always false/true, to apply the optimization we have in b262ad440e to each child. Something like attached. This can also be beneficial to partitioned tables in cases where the parent table does not have NOT NULL constraints, while some of its child tables do. Previously, the optimization introduced in b262ad440e was not applicable in this case. With this change, the optimization can now be applied to each child table that has the right NOT NULL constraints. Thoughts? Thanks Richard v1-0001-Fix-handling-of-IS-NOT-NULL-quals-on-inheritance-parents.patch Description: Binary data
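For reference, the asymmetry driving this example is visible directly in the catalog; with the DDL above, the parent's column is marked not-null while the child's is not:

SELECT attrelid::regclass AS rel, attname, attnotnull
FROM pg_attribute
WHERE attname = 'a'
  AND attrelid IN ('p'::regclass, 'c'::regclass);

 rel | attname | attnotnull
-----+---------+------------
 p   | a       | t
 c   | a       | f
(2 rows)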
Re: broken JIT support on Fedora 40
> On Tue, Apr 09, 2024 at 07:07:58PM +1200, Thomas Munro wrote: > On Sat, Apr 6, 2024 at 5:01 AM Dmitry Dolgov <9erthali...@gmail.com> wrote: > > > > Yep, I think this is it. I've spent some hours trying to understand why > > > > suddenly deform function has noundef ret attribute, when it shouldn't -- > > > > this explains it and the proposed change fixes the crash. One thing that > > > > is still not clear to me though is why this copied attribute doesn't > > > > show up in the bitcode dumped right before running inline pass (I've > > > > added this to troubleshoot the issue). > > > > > > One thing to consider in this context is indeed adding "verify" pass as > > > suggested in the PR, at least for the debugging configuration. Without > > > the fix > > > it immediately returns: > > > > > > Running analysis: VerifierAnalysis on deform_0_1 > > > Attribute 'noundef' applied to incompatible type! > > > > > > llvm error: Broken function found, compilation aborted! > > > > Here is what I have in mind. Interestingly enough, it also shows few > > more errors besides "noundef": > > > > Intrinsic name not mangled correctly for type arguments! Should be: > > llvm.lifetime.end.p0 > > ptr @llvm.lifetime.end.p0i8 > > > > It refers to the function from create_LifetimeEnd, not sure how > > important is this. > > Would it be too slow to run the verify pass always, in assertion > builds? Here's a patch for the original issue, and a patch to try > that idea + a fix for that other complaint it spits out. The latter > would only run for LLVM 17+, but that seems OK. Sounds like a good idea. About the overhead, I've done a quick test on the reproducer at hands, doing explain analyze in a tight loop and fetching "optimization" timinigs. It gives quite visible difference 96ms p99 with verify vs 46ms p99 without verify (and a rather low stddev, ~1.5ms). But I can imagine it's acceptable for a build with assertions. Btw, I've found there is a C-api for this exposed, which produces the same warnings for me. Maybe it would be even better this way: /** * Toggle adding the VerifierPass for the PassBuilder, ensuring all functions * inside the module is valid. */ void LLVMPassBuilderOptionsSetVerifyEach(LLVMPassBuilderOptionsRef Options, LLVMBool VerifyEach); + /* In assertion builds, run the LLVM verify pass. */ +#ifdef USE_ASSERT_CHECKING + LLVMPassBuilderOptionsSetVerifyEach(options, true); +#endif
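A sketch of what using that C API could look like in context, assuming the module goes through LLVMRunPasses() with an LLVMPassBuilderOptionsRef the way llvm_optimize_module() does for newer LLVM; the helper name is made up and the target machine is passed as NULL only for brevity:

#include "postgres.h"

#include <llvm-c/Core.h>
#include <llvm-c/Error.h>
#include <llvm-c/Transforms/PassBuilder.h>

/* Run the given pass string over "mod", verifying the IR after every
 * pass in assertion builds. */
static void
optimize_with_verification(LLVMModuleRef mod, const char *passes)
{
	LLVMPassBuilderOptionsRef options = LLVMCreatePassBuilderOptions();
	LLVMErrorRef err;

#ifdef USE_ASSERT_CHECKING
	LLVMPassBuilderOptionsSetVerifyEach(options, true);
#endif

	err = LLVMRunPasses(mod, passes, NULL, options);
	LLVMPassBuilderOptionsDispose(options);

	if (err)
		elog(ERROR, "running LLVM passes failed: %s",
			 LLVMGetErrorMessage(err));
}

Whether verifying after every pass costs more or less than a single "verify," prefix is probably worth measuring against the ~2x overhead already seen above.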
Re: PostgreSQL 17 Release Management Team & Feature Freeze
On 4/9/24 11:25, Matthias van de Meent wrote: > On Mon, 8 Apr 2024 at 20:15, Tomas Vondra > wrote: >> >> >> On 4/8/24 17:48, Matthias van de Meent wrote: >>> On Mon, 8 Apr 2024 at 17:21, Tomas Vondra >>> wrote: Maybe it'd be better to start by expanding the existing rule about not committing patches introduced for the first time in the last CF. >>> >>> I don't think adding more hurdles about what to include into the next >>> release is a good solution. Why the March CF, and not earlier? Or >>> later? How about unregistered patches? Changes to the docs? Bug fixes? >>> The March CF already has a submission deadline of "before march", so >>> that already puts a soft limit on the patches considered for the april >>> deadline. >>> What if we said that patches in the last CF must not go through significant changes, and if they do it'd mean the patch is moved to the next CF? >>> >>> I also think there is already a big issue with a lack of interest in >>> getting existing patches from non-committers committed, reducing the >>> set of patches that could be considered is just cheating the numbers >>> and discouraging people from contributing. For one, I know I have >>> motivation issues keeping up with reviewing other people's patches >>> when none (well, few, as of this CF) of my patches get reviewed >>> materially and committed. I don't see how shrinking the window of >>> opportunity for significant review from 9 to 7 months is going to help >>> there. >>> >> >> I 100% understand how frustrating the lack of progress can be, and I >> agree we need to do better. I tried to move a number of stuck patches >> this CF, and I hope (and plan) to do more of this in the future. > > Thanks in advance. > >> But I don't quite see how would this rule modification change the >> situation for non-committers. AFAIK we already have the rule that >> (complex) patches should not appear in the last CF for the first time, >> and I'd argue that a substantial rework of a complex patch is not that >> far from a completely new patch. Sure, the reworks may be based on a >> thorough review, so there's a lot of nuance. But still, is it possible >> to properly review if it gets reworked at the very end of the CF? > > Possible? Probably, if there is a good shared understanding of why the > previous patch version's approach didn't work well, and if the next > approach is well understood as well. But it's not likely, that I'll > agree with. > > But the main issue I have with your suggestion is that the March > commitfest effectively contains all new patches starting from January, > and the leftovers not committed by February. If we start banning all > new patches and those with significant reworks from the March > commitfest, then combined with the lack of CF in May there wouldn't be > any chance for new patches in the first half of the year, and an > effective block on rewrites for 6 months- the next CF is only in July. > Sure, there is a bit of leeway there, as some patches get committed > before the commitfest they're registered in is marked active, but our > development workflow is already quite hostile to incidental > contributor's patches [^0], and increasing the periods in which > authors shouldn't expect their patches to be reviewed due to a major > release that's planned in >5 months is probably not going to help the > case. > But I don't think I suggested to ban such patches from the March CF entirely. Surely those patches can still be submitted, reviewed, and even reworked in the last CF. 
All I said is it might be better to treat those patches as not committable by default. Sorry if that wasn't clear enough ... Would this be an additional restriction? I'm not quite sure. Surely if the intent is to only commit patches that we agree are in "sufficiently good" shape, and getting them into such shape requires time & reviews, this would not be a significant change. FWIW I'm not a huge fan of hard unbreakable rules, so there should be some leeway when justified, but the bar would be somewhat higher (clear consensus, RMT having a chance to say no, ...). >>> So, I think that'd be counter-productive, as this would get the >>> perverse incentive to band-aid over (architectural) issues to limit >>> churn inside the patch, rather than fix those issues in a way that's >>> appropriate for the project as a whole. >>> >> >> Surely those architectural shortcomings should be identified in a review >> - which however requires time to do properly, and thus is an argument >> for ensuring there's enough time for such review (which is in direct >> conflict with the last-minute crush, IMO). >> >> Once again, I 100% agree we need to do better in handling patches from >> non-committers, absolutely no argument there. But does it require this >> last-minute crush? I don't think so, it can't be at the expense of >> proper review and getting it right. > > Agreed on this, 100%, but as mentioned above, the March commitfest
RE: Is this a problem in GenericXLogFinish()?
Dear Michael, > On Fri, Apr 05, 2024 at 06:22:58AM +, Hayato Kuroda (Fujitsu) wrote: > > Thanks for pointing out. Yes, we fixed a behavior by the commit aa5edbe37, > > but we missed the redo case. I made a fix patch based on the suggestion [1]. > > + boolmod_buf = false; > > Perhaps you could name that mod_wbuf, to be consistent with the WAL > insert path. Right, fixed. > I'm slightly annoyed by the fact that there is no check on > is_prim_bucket_same_wrt for buffer 1 in the BLK_NEEDS_REDO case to > show the symmetry between the insert and replay paths. Shouldn't > there be at least an assert for that in the branch when there are no > tuples (imagine an else to cover xldata->ntups == 0)? I mean between > just before updating the hash page's opaque area when > is_prev_bucket_same_wrt. Indeed, added an Assert() in else part. Was it same as your expectation? > I've been thinking about ways to make such cases detectable in an > automated fashion. The best choice would be > verifyBackupPageConsistency(), just after RestoreBlockImage() on the > copy of the block image stored in WAL before applying the page masking > which would mask the LSN. A problem, unfortunately, is that we would > need to transfer the knowledge of REGBUF_NO_CHANGE as a new BKPIMAGE_* > flag so we would be able to track if the block rebuilt from the record > has the *same* LSN as the copy used for the consistency check. So > this edge consistency case would come at a cost, I am afraid, and the > 8 bits of bimg_info are precious :/ I could not decide that the change has more benefit than its cost, so I did nothing for it. Best Regards, Hayato Kuroda FUJITSU LIMITED https://www.fujitsu.com/ v2_add_modbuf_flag.diff Description: v2_add_modbuf_flag.diff
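For readers following along, the shape being discussed is roughly the following hypothetical sketch, based only on the field names mentioned in this thread (the actual v2 patch may differ):

/* In the BLK_NEEDS_REDO branch of the squeeze-page replay routine. */
if (xldata->ntups > 0)
{
	/* ... replay the moved tuples onto the write buffer ... */
	mod_wbuf = true;
}
else
{
	/*
	 * Mirror the insert path: with no tuples to move, the write buffer is
	 * only expected to be touched when it doubles as the primary or
	 * previous bucket page.
	 */
	Assert(xldata->is_prim_bucket_same_wrt || xldata->is_prev_bucket_same_wrt);
}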
Re: Add trim_trailing_whitespace to editorconfig file
On Thu, 4 Apr 2024 at 16:58, Jelte Fennema-Nio wrote: > > ISTM that with a small shell script, .editorconfig could be generated > > from .gitattributes? > > Honestly, I don't think building such automation is worth the effort. Okay, I spent the time to add a script to generate the editorconfig based on .gitattributes after all. So attached is a patch that adds that. v5-0001-Add-script-to-keep-.editorconfig-in-sync-with-.gi.patch Description: Binary data
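The "small shell script" idea mentioned earlier could look roughly like this; it is only an illustration of the mapping (and only of the trailing-whitespace attribute), not the attached patch:

#!/bin/sh
# Emit one .editorconfig section per .gitattributes pattern.
{
	printf 'root = true\n'
	grep -v -e '^#' -e '^[[:space:]]*$' .gitattributes |
	while read -r pattern attrs
	do
		printf '\n[%s]\n' "$pattern"
		case "$attrs" in
			*trailing-space*) printf 'trim_trailing_whitespace = true\n' ;;
			-whitespace)      printf 'trim_trailing_whitespace = false\n' ;;
		esac
	done
} > .editorconfig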
Parallel Recovery in PostgreSQL
Hi, Hackers I have observed that there has been a paucity of discussion concerning the parallel replay of WAL logs. The most recent discourse on this subject was during the PGCon event in 2023, where it was noted that PostgreSQL utilizes a single process for WAL replay. However, when configuring primary-secondary replication, there exists a tangible scenario where the primary accumulates an excessive backlog that the secondary cannot replay promptly. This situation prompts me to question whether it is pertinent to consider integrating a parallel replay feature. Such a functionality could potentially mitigate the risk of delayed WAL application on replicas and enhance overall system resilience and performance. I am keen to hear your thoughts on this issue and whether you share the view that parallel WAL replay is a necessity that we should collaboratively explore further. Thank you for your attention to this matter. Best regards, David Fan
Re: post-freeze damage control
On 4/9/24 01:33, Michael Paquier wrote: > On Tue, Apr 09, 2024 at 01:16:02AM +0200, Tomas Vondra wrote: >> I don't feel too particularly worried about this. Yes, backups are super >> important because it's often the only thing you have left when things go >> wrong, and the incremental aspect is all new. The code I've seen while >> doing the CoW-related patches seemed very precise and careful, and the >> one bug we found & fixed does not make it bad. >> >> Sure, I can't rule there being more bugs, but I've been doing some >> pretty extensive stress testing of this (doing incremental backups + >> combinebackup, and comparing the results against the source, and that >> sort of stuff). And so far only that single bug this way. I'm still >> doing this randomized stress testing, with more and more complex >> workloads etc. and I'll let keep doing that for a while. >> >> Maybe I'm a bit too happy-go-lucky, but IMO the risk here is limited. > > Even if there's a critical bug, there are still other ways to take > backups, so there is an exit route even if a problem is found and even > if this problem requires a complex solution to be able to work > correctly. > I think it's a bit more nuanced, because it's about backups/restore. The bug might be subtle, and you won't learn about it until the moment when you need to restore (or perhaps even long after that). At which point "You might have taken the backup in some other way." is not really a viable exit route. Anyway, I'm still not worried about this particular feature, and I'll keep doing the stress testing. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: sql/json remaining issue
Hi, On Tue, Apr 9, 2024 at 4:47 PM jian he wrote: > > hi. > ` > | NESTED [ PATH ] json_path_specification [ AS json_path_name ] > COLUMNS ( json_table_column [, ...] ) > NESTED [ PATH ] json_path_specification [ AS json_path_name ] COLUMNS > ( json_table_column [, ...] ) > ` > "json_path_specification" should be "path_expression"? > -- > drop table s1; > create or replace function random_text_1000() returns text > as $$select string_agg(md5(random()::text),'') from > generate_Series(1,1000) s $$ LANGUAGE SQL; > > create unlogged table s1(a int GENERATED BY DEFAULT AS IDENTITY, js jsonb); > insert into s1(js) > select jsonb ('{"a":{"za":[{"z1": [11,]},{"z21": [22, 234,' || g > || ']},{"z22": [32, 204,145]}]},"c": ' || g > || ',"id": "' || random_text_1000() || '"}') > from generate_series(1_000_000, 1_000_000) g; > insert into s1(js) > select jsonb ('{"a":{"za":[{"z1": [11,]},{"z21": [22, 234,' || g > || ']},{"z22": [32, 204,145]}]},"c": ' || g > || ',"id": "' || random_text_1000() || '"}') > from generate_series(235, 235 + 20,1) g; > > select count(*), pg_size_pretty(pg_total_relation_size('s1')) from s1; > count | pg_size_pretty > + > 22 | 6398 MB > -- > > explain(analyze, costs off,buffers, timing) > SELECT sub.*, s.a as s_a FROM s, > (values(23)) x(x), > generate_series(13, 13) y, > JSON_TABLE(js, '$' AS c1 PASSING x AS x, y AS y > COLUMNS ( > xx1 int PATH '$.c', > NESTED PATH '$.a.za[2]' COLUMNS (NESTED PATH '$.z22[*]' as z22 COLUMNS > (c int PATH '$')), > NESTED PATH '$.a.za[1]' COLUMNS (d json PATH '$ ? (@.z21[*] == ($"x" -1))'), > NESTED PATH '$.a.za[0]' COLUMNS (NESTED PATH '$.z1[*] ? (@ >= ($"x" > -2))' as z1 COLUMNS (a int PATH '$')), > NESTED PATH '$.a.za[1]' COLUMNS > (NESTED PATH '$.z21[*] ? (@ >= ($"y" +121))' as z21 COLUMNS (b int > PATH '$ ? (@ <= ($"x" + 76))' default -1000 ON EMPTY)) > )) sub; > > for one jsonb, it can expand to 7 rows, the above query will return > around 1.4 million rows. > i use the above query, and pg_log_backend_memory_contexts in another > session to check the memory usage. > didn't find memory over consumed issue. > > but I am not sure this is the right way to check the memory consumption. > -- > begin; > SELECT sub.*, s.a as s_a FROM s, > (values(23)) x(x), > generate_series(13, 13) y, > JSON_TABLE(js, '$' AS c1 PASSING x AS x, y AS y > COLUMNS ( > xx1 int PATH '$.c', > NESTED PATH '$.a.za[2]' COLUMNS (NESTED PATH '$.z22[*]' as z22 COLUMNS > (c int PATH '$')), > NESTED PATH '$.a.za[1]' COLUMNS (d json PATH '$ ? (@.z21[*] == ($"x" -1))'), > NESTED PATH '$.a.za[0]' COLUMNS (NESTED PATH '$.z1[*] ? (@ >= ($"x" > -2))' as z1 COLUMNS (a int PATH '$')), > NESTED PATH '$.a.za[1]' COLUMNS > (NESTED PATH '$.z21[*] ? (@ >= ($"y" +121))' as z21 COLUMNS (b int > PATH '$ ? (@ <= ($"x" + 76))' error ON EMPTY)) > )) sub; > rollback; > > only the last row will fail, because of "error ON EMPTY", (1_000_000 > <= 23 + 76) is false. > I remember the very previous patch, because of error cleanup, it took > a lot of resources. > does our current implementation, only the very last row fail, will it > be easy to clean up the transaction? I am not sure I understand your concern. Could you please rephrase it? Which previous patch are you referring to and what problem did it cause with respect to error cleanup? Per-row memory allocated for each successful output row JSON_TABLE() doesn't pile up, because it's allocated in a context that is reset after evaluating each row; see tfuncLoadRows(). But again I may be misunderstanding your concern. 
> the last query error message is: > ` > ERROR: no SQL/JSON item > ` > > we are in ExecEvalJsonExprPath, can we output it to be: > ` > ERROR: after applying json_path "5s", no SQL/JSON item found > ` > in a json_table query, we can have multiple path_expressions, like the > above query. > it's not easy to know applying which path_expression failed. Hmm, I'm not so sure about mentioning the details of the path because path names are optional and printing path expression itself is not a good idea. Perhaps, we could mention the column name which would always be there, but we'd then need to add a new field column_name that's optionally set to JsonFuncExpr and JsonExpr, that is, when they are being set up for JSON_TABLE() columns. As shown in the attached. With the patch you'll get: ERROR: no SQL/JSON item found for column "b" -- Thanks, Amit Langote v1-0001-JSON_TABLE-mention-column-name-in-the-ON-EMPTY-er.patch Description: Binary data
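The attached patch isn't reproduced here, but the reporting change presumably boils down to something along these lines (the field and errcode names are assumptions, not copied from the diff):

/* In ExecEvalJsonExprPath(), when ON EMPTY / ON ERROR resolves to ERROR. */
if (jsexpr->column_name != NULL)
	ereport(ERROR,
			errcode(ERRCODE_NO_SQL_JSON_ITEM),
			errmsg("no SQL/JSON item found for column \"%s\"",
				   jsexpr->column_name));
else
	ereport(ERROR,
			errcode(ERRCODE_NO_SQL_JSON_ITEM),
			errmsg("no SQL/JSON item"));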
Re: NumericShort vs NumericLong format
On Tue, Mar 07, 2023 at 12:15:27AM -0500, Tom Lane wrote: > "David G. Johnston" writes: > > As an aside, for anyone more fluent than I who reads this, is the use of > > the word "dynamic scale" in this code comment supposed to be "display > > scale"? > > https://github.com/postgres/postgres/blob/cf96907aadca454c4094819c2ecddee07eafe203/src/backend/utils/adt/numeric.c#L121 > > Yeah, I think you're right. Familiarizing myself with numeric.c today, I too was confused by this. AFAICT, it's meant to say display scale as used elsewhere in the file; for instance, the comment for NumericShort's n_header reads "Sign + display scale + weight". Would it be appropriate if I submitted a patch for this? It's admittedly trivial, but I figured I should say hi before submitting one. All the best, Ole -- Ole Peder Brandtzæg | En KLST/ITK-hybrid Please don't look at me with those eyes Please don't hint that you're capable of lies
Re: WIP Incremental JSON Parser
On 2024-04-09 Tu 01:23, Michael Paquier wrote: On Tue, Apr 09, 2024 at 09:48:18AM +0900, Michael Paquier wrote: There is no direct check on test_json_parser_perf.c, either, only a custom rule in the Makefile without specifying something for meson. So it looks like you could do short execution check in a TAP test, at least. Not adding a test for that was deliberate - any sane test takes a while, and I didn't want to spend that much time on it every time someone runs "make check-world" or equivalent. However, adding a test to run it with a trivial number of iterations seems reasonable, so I'll add that. I'll also add a meson target for the binary. While reading the code, I've noticed a few things, as well: + /* max delicious line length is less than this */ + charbuff[6001]; Delicious applies to the contents, nothing to do with this code even if I'd like to think that these lines of code are edible and good. Using a hardcoded limit for some JSON input sounds like a bad idea to me even if this is a test module. Comment that applies to both the perf and the incremental tools. You could use a #define'd buffer size for readability rather than assuming this value in many places. The comment is a remnant of when I hadn't yet added support for incomplete tokens, and so I had to parse the input line by line. I agree it can go, and we can use a manifest constant for the buffer size. +++ b/src/test/modules/test_json_parser/test_json_parser_incremental.c + * This progam tests incremental parsing of json. The input is fed into + * full range of incement handling, especially in the lexer, is exercised. +++ b/src/test/modules/test_json_parser/test_json_parser_perf.c + *Performancet est program for both flavors of the JSON parser + * This progam tests either the standard (recursive descent) JSON parser +++ b/src/test/modules/test_json_parser/README + reads in a file and pases it in very small chunks (60 bytes at a time) to Collection of typos across various files. Will fix. (The older I get the more typos I seem to make and the harder it is to notice them. It's very annoying.) + appendStringInfoString(&json, "1+23 trailing junk"); What's the purpose here? Perhaps the intention should be documented in a comment? The purpose is to ensure that if there is not a trailing '\0' on the json chunk the parser will still do the right thing. I'll add a comment to that effect. At the end, having a way to generate JSON blobs randomly to test this stuff would be more appealing than what you have currently, with perhaps a few factors like: - Array and simple object density. - Max Level of nesting. - Limitation to ASCII characters that can be escaped. - Perhaps more things I cannot think about? No, I disagree. Maybe we need those things as well, but we do need a static test where we can test the output against known results. I have no objection to changing the input and output files. It's worth noting that t/002_inline.pl does generate some input and test e.g., the maximum nesting levels among other errors. Perhaps you missed that. If you think we need more tests there adding them would be extremely simple. So the current state of things is kind of disappointing, and the size of the data set added to the tree is not that portable either if you want to test various scenarios depending on the data set. It seems to me that this has been committed too hastily and that this is not ready for integration, even if that's just a test module. Tom also has shared some concerns upthread, as far as I can see. 
I have posted a patch already that addresses the issue Tom raised. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
Re: Flushing large data immediately in pqcomm
On Mon, Apr 8, 2024 at 09:27, Jelte Fennema-Nio wrote: > On Sun, 7 Apr 2024 at 11:34, David Rowley wrote: > > That seems to require modifying the following function signatures: > > secure_write(), be_tls_write(), be_gssapi_write(). That's not an area > > I'm familiar with, however. > > Attached is a new patchset where 0003 does exactly that. The only > place where we need to cast to non-const is for GSS, but that seems > fine (commit message has more details). > +1. Looks ok to me. The GSS pointer *ptr is already cast to char * where it is needed, so the code is already correct. best regards, Ranier Vilela
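For anyone not opening the patchset, the signature change under discussion is, as far as I understand it, along these lines (paraphrased, not copied from 0003):

/* The write paths take read-only buffers ... */
extern ssize_t secure_write(Port *port, const void *ptr, size_t len);
extern ssize_t be_tls_write(Port *port, const void *ptr, size_t len, int *waitfor);
extern ssize_t be_gssapi_write(Port *port, const void *ptr, size_t len);

/* ... and only the GSSAPI path has to shed the const, because
 * gss_buffer_desc.value is a plain "void *": */
gss_buffer_desc input;

input.length = len;
input.value = unconstify(void *, ptr);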
Re: PostgreSQL 17 Release Management Team & Feature Freeze
On 2024-04-08 Mo 19:26, Tom Lane wrote: Andrew Dunstan writes: I quite like the triage idea. But I think there's also a case for being more a bit more flexible with those patches we don't throw out. A case close to my heart: I'd have been very sad if the NESTED piece of JSON_TABLE hadn't made the cut, which it did with a few hours to spare, and I would not have been alone, far from it. I'd have been happy to give Amit a few more days or a week if he needed it, for a significant headline feature. I know there will be those who say it's the thin end of the wedge and rulez is rulez, but this is my view. You've certainly been around the project long enough to remember the times in the past when we let the schedule slip for "one more big patch". It just about never worked out well, so I'm definitely in favor of a hard deadline. The trick is to control the tendency to push in patches that are only almost-ready in order to nominally meet the deadline. (I don't pretend to be immune from that temptation myself, but I think I resisted it better than some this year.) If we want to change how things are working I suspect we probably need something a lot more radical than any of the suggestions I've seen floating around. I don't know what that might be, but ISTM we're not thinking boldly enough. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
Re: Parallel Recovery in PostgreSQL
On Tue, 9 Apr 2024 at 13:12, 范润泽 wrote: > > Hi, Hackers > I have observed that there has been a paucity of discussion concerning the > parallel replay of WAL logs. > The most recent discourse on this subject was during the PGCon event in 2023, > where it was noted that PostgreSQL utilizes a single process for WAL replay. > However, when configuring primary-secondary replication, there exists a > tangible scenario where the primary accumulates an excessive backlog that the > secondary cannot replay promptly. > This situation prompts me to question whether it is pertinent to consider > integrating a parallel replay feature. > Such a functionality could potentially mitigate the risk of delayed WAL > application on replicas and enhance overall system resilience and performance. > I am keen to hear your thoughts on this issue and whether you share the view > that parallel WAL replay is a necessity that we should collaboratively > explore further. I think we should definitely explore this further, yes. Note that parallel WAL replay is not the only thing we can improve here: A good part of WAL redo time is spent in reading and validating the WAL records. If we can move that part to a parallel worker/thread, that could probably improve the throughput by a good margin. Then there is another issue with parallel recovery that I also called out at PGCon 2023: You can't just reorder WAL records in a simple page- or relation-based parallelization approach. Indexes often contain transient information whose replay isn't easily parallelizable with other relations' redo, because of dependencies on other relations' pages that must be respected to guarantee correct query results. E.g., the index insertion of page 2 in one backend's operations order { 1) Page 1: heap insertion, 2) Page 2: index insert, 3) commit } cannot be reordered to before the heap page insertion at 1, because that'd allow a concurrent index access after replay of 2), but before replay of 1), to see an "all-visible" page 1 in its index scan, while the heap tuple of the index entry wasn't even inserted yet. Index-only scans could thus return invalid results. See also the wiki page [0] on parallel recovery, and Koichi-san's repository [1] with his code for parallel replay based on PG 14.6. Kind regards, Matthias van de Meent [0] https://wiki.postgresql.org/wiki/Parallel_Recovery [1] https://github.com/koichi-szk/postgres/commits/parallel_replay_14_6/
Re: Security lessons from liblzma
On 4/8/24 22:57, Bruce Momjian wrote: On Sat, Mar 30, 2024 at 04:50:26PM -0400, Robert Haas wrote: An awful lot of what we do operates on the principle that we know the people who are involved and trust them, and I'm glad we do trust them, but the world is full of people who trusted somebody too much and regretted it afterwards. The fact that we have many committers rather than a single maintainer probably reduces risk at least as far as the source repository is concerned, because there are more people paying attention to potentially notice something that isn't as it should be. One unwritten requirement for committers is that we are able to communicate with them securely. If we cannot do that, they potentially could be forced by others, e.g., governments, to add code to our repositories. Unfortunately, there is on good way for them to communicate with us securely once they are unable to communicate with us securely. I suppose some special word could be used to communicate that status --- that is how it was done in non-electronic communication in the past. I don't know how that really helps. If one of our committers is under duress, they probably cannot risk outing themselves anyway. The best defense, IMHO, is the fact that our source code is open and can be reviewed freely. The trick is to get folks to do the review. I know, for example, at $past_employer we had a requirement to get someone on our staff to review every single commit in order to maintain certain certifications. Of course there is no guarantee that such reviews would catch everything, but maybe we could establish post commit reviews by contributors in a more rigorous way? Granted, getting more qualified volunteers is not a trivial problem... -- Joe Conway PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: post-freeze damage control
On Tue, Apr 9, 2024 at 7:24 AM Tomas Vondra wrote: > I think it's a bit more nuanced, because it's about backups/restore. The > bug might be subtle, and you won't learn about it until the moment when > you need to restore (or perhaps even long after that). At which point > "You might have taken the backup in some other way." is not really a > viable exit route. > > Anyway, I'm still not worried about this particular feature, and I'll > keep doing the stress testing. In all sincerity, I appreciate the endorsement. Basically what's been scaring me about this feature is the possibility that there's some incurable design flaw that I've managed to completely miss. If it has some more garden-variety bugs, that's still pretty bad: people will potentially lose data and be unable to get it back. But, as long as we're able to find the bugs and fix them, the situation should improve over time until, hopefully, everybody trusts it roughly as much as we trust, say, crash recovery. Perhaps even a bit more: I think this code is much better-written than our crash recovery code, which has grown into a giant snarl that nobody seems able to untangle, despite multiple refactoring attempts. However, if there's some reason why the approach is fundamentally unsound which I and others have failed to detect, then we're at risk of shipping a feature that is irretrievably broken. That would really suck. I'm fairly hopeful that there is no such design defect: I certainly can't think of one. But, it's much easier to imagine an incurable problem here than with, say, the recent pruning+freezing changes. Those changes might have bugs, and those bugs might be hard to find, but if they do exist and are found, they can be fixed. Here, it's a little less obvious that that's true. We're relying on our ability, at incremental backup time, to sort out from the manifest and the WAL summaries, what needs to be included in the backup in order for a subsequent pg_combinebackup operation to produce correct results. The basic idea is simple enough, but the details are complicated, and it feels like a subtle defect in the algorithm could potentially scuttle the whole thing. I'd certainly appreciate having more smart people try to think of things that I might have overlooked. -- Robert Haas EDB: http://www.enterprisedb.com
RE: Synchronizing slots from primary to standby
On Thursday, April 4, 2024 4:25 PM Masahiko Sawada wrote: Hi, > On Wed, Apr 3, 2024 at 7:06 PM Amit Kapila > wrote: > > > > On Wed, Apr 3, 2024 at 11:13 AM Amit Kapila > wrote: > > > > > > On Wed, Apr 3, 2024 at 9:36 AM Bharath Rupireddy > > > wrote: > > > > > > > I quickly looked at v8, and have a nit, rest all looks good. > > > > > > > > +if (DecodingContextReady(ctx) && > found_consistent_snapshot) > > > > +*found_consistent_snapshot = true; > > > > > > > > Can the found_consistent_snapshot be checked first to help avoid > > > > the function call DecodingContextReady() for > > > > pg_replication_slot_advance callers? > > > > > > > > > > Okay, changed. Additionally, I have updated the comments and commit > > > message. I'll push this patch after some more testing. > > > > > > > Pushed! > > While testing this change, I realized that it could happen that the server > logs > are flooded with the following logical decoding logs that are written every > 200 > ms: Thanks for reporting! > > 2024-04-04 16:15:19.270 JST [3838739] LOG: starting logical decoding for slot > "test_sub" > 2024-04-04 16:15:19.270 JST [3838739] DETAIL: Streaming transactions > committing after 0/50006F48, reading WAL from 0/50006F10. > 2024-04-04 16:15:19.270 JST [3838739] LOG: logical decoding found > consistent point at 0/50006F10 > 2024-04-04 16:15:19.270 JST [3838739] DETAIL: There are no running > transactions. > 2024-04-04 16:15:19.477 JST [3838739] LOG: starting logical decoding for slot > "test_sub" > 2024-04-04 16:15:19.477 JST [3838739] DETAIL: Streaming transactions > committing after 0/50006F48, reading WAL from 0/50006F10. > 2024-04-04 16:15:19.477 JST [3838739] LOG: logical decoding found > consistent point at 0/50006F10 > 2024-04-04 16:15:19.477 JST [3838739] DETAIL: There are no running > transactions. > > For example, I could reproduce it with the following steps: > > 1. create the primary and start. > 2. run "pgbench -i -s 100" on the primary. > 3. run pg_basebackup to create the standby. > 4. configure slotsync setup on the standby and start. > 5. create a publication for all tables on the primary. > 6. create the subscriber and start. > 7. run "pgbench -i -Idtpf" on the subscriber. > 8. create a subscription on the subscriber (initial data copy will start). > > The logical decoding logs were written every 200 ms during the initial data > synchronization. > > Looking at the new changes for update_local_synced_slot(): ... > We call LogicalSlotAdvanceAndCheckSnapState() if one of confirmed_lsn, > restart_lsn, and catalog_xmin is different between the remote slot and the > local > slot. In my test case, during the initial sync performing, only catalog_xmin > was > different and there was no serialized snapshot at restart_lsn, and the > slotsync > worker called LogicalSlotAdvanceAndCheckSnapState(). However no slot > properties were changed even after the function and it set slot_updated = > true. > So it starts the next slot synchronization after 200ms. I was trying to reproduce this and check why the catalog_xmin is different among synced slot and remote slot, but I was not able to reproduce the case where there are lots of logical decoding logs. The script I used is attached. Would it be possible for you to share the script you used to reproduce this issue? Alternatively, could you please share the log files from both the primary and standby servers after reproducing the problem (it would be greatly helpful if you could set the log level to DEBUG2). Best Regards, Hou zj test.sh Description: test.sh
Re: post-freeze damage control
On 9/4/2024 12:55, Tom Lane wrote: Andrei Lepikhov writes: * I really, really dislike jamming this logic into prepqual.c, where it has no business being. I note that it was shoved into process_duplicate_ors without even the courtesy of expanding the header comment: Yeah, I preferred to do it in parse_expr.c with the assumption of some 'minimal' or 'canonical' tree form. That seems quite the wrong direction to me. AFAICS, the argument for making this transformation depends on being able to convert to an indexscan condition, so I would try to apply it even later, when we have a set of restriction conditions to apply to a particular baserel. (This would weaken the argument that we need hashing rather than naive equal() tests even further, I think.) Applying the transform to join quals seems unlikely to be a win.

Our first prototype did this job right at the stage of index path creation. Unfortunately, that approach was too narrow and too expensive. The most problematic cases we encountered came from BitmapOr paths: if an incoming query has a significant number of OR clauses, the optimizer spends a lot of time generating these mostly senseless paths (and allocating memory for them). Imagine how much worse the situation becomes when we scale it with partitions.

Another issue we resolved with this transformation: a shorter list of clauses speeds up planning and sometimes makes cardinality estimation more accurate. Moreover, it helps even a SeqScan: looking a value up in the hashed array is much faster than evaluating a long OR expression against each incoming tuple.

One more idea that I have set aside here is that the planner could make use of quick clause hashing: from time to time on the mailing list I see disputes about different approaches to expression transformation/simplification/grouping, and most of the time it comes down to the problem of search complexity. A clause hash could be a way to solve this, couldn't it?

-- regards, Andrei Lepikhov Postgres Professional
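To make the SeqScan claim above a bit more concrete, here is a toy, self-contained C sketch. It is an editorial illustration rather than PostgreSQL code, and every name and size in it is invented; it just contrasts the two per-tuple strategies Andrei describes: testing a value against each constant in turn (roughly what evaluating a long chain of OR'ed equality quals amounts to) versus probing a small hash set (roughly the strategy available for a large "= ANY(array)" qual).

#include <stdio.h>
#include <stdbool.h>
#include <time.h>

#define NCONSTS  1000			/* number of OR'ed constants */
#define NTUPLES  1000000		/* "rows" scanned */
#define TABSIZE  4096			/* power of two, larger than NCONSTS */

static int	consts[NCONSTS];
static int	htab[TABSIZE];
static bool hused[TABSIZE];

static unsigned
hashint(int v)
{
	return ((unsigned) v * 2654435761u) & (TABSIZE - 1);
}

static void
hash_insert(int v)
{
	unsigned	i = hashint(v);

	while (hused[i] && htab[i] != v)
		i = (i + 1) & (TABSIZE - 1);	/* linear probing */
	htab[i] = v;
	hused[i] = true;
}

static bool
hash_member(int v)
{
	unsigned	i = hashint(v);

	while (hused[i])
	{
		if (htab[i] == v)
			return true;
		i = (i + 1) & (TABSIZE - 1);
	}
	return false;
}

static bool
or_chain_member(int v)
{
	/* one iteration per "x = const" arm of the OR */
	for (int i = 0; i < NCONSTS; i++)
		if (consts[i] == v)
			return true;
	return false;
}

int
main(void)
{
	long		hits_or = 0,
				hits_hash = 0;
	clock_t		t0,
				t1,
				t2;

	for (int i = 0; i < NCONSTS; i++)
	{
		consts[i] = i * 7;
		hash_insert(consts[i]);
	}

	t0 = clock();
	for (int i = 0; i < NTUPLES; i++)
		hits_or += or_chain_member(i & 0xffff);
	t1 = clock();
	for (int i = 0; i < NTUPLES; i++)
		hits_hash += hash_member(i & 0xffff);
	t2 = clock();

	printf("OR chain: %.3fs (%ld hits), hash probe: %.3fs (%ld hits)\n",
		   (double) (t1 - t0) / CLOCKS_PER_SEC, hits_or,
		   (double) (t2 - t1) / CLOCKS_PER_SEC, hits_hash);
	return 0;
}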
Re: post-freeze damage control
Hi, On Tuesday, April 9th, 2024 at 2:46 PM, Robert Haas wrote: > In all sincerity, I appreciate the endorsement. Basically what's been > scaring me about this feature is the possibility that there's some > incurable design flaw that I've managed to completely miss. If it has > some more garden-variety bugs, that's still pretty bad: people will > potentially lose data and be unable to get it back. But, as long as > we're able to find the bugs and fix them, the situation should improve > over time until, hopefully, everybody trusts it roughly as much as we > trust, say, crash recovery. Perhaps even a bit more: I think this code > is much better-written than our crash recovery code, which has grown > into a giant snarl that nobody seems able to untangle, despite > multiple refactoring attempts. However, if there's some reason why the > approach is fundamentally unsound which I and others have failed to > detect, then we're at risk of shipping a feature that is irretrievably > broken. That would really suck.

IMHO it is totally worth shipping such a long-awaited feature sooner rather than later. Yes, it is a complex one, but you have been advertising it since last January already, so people should already be able to play with it in Beta. And as you mentioned in your blog about the evergreen backup:

> But if you're anything like me, you'll already see that this arrangement > has two serious weaknesses. First, if there are any data-corrupting bugs > in pg_combinebackup or any of the server-side code that supports > incremental backup, this approach could get you into big trouble.

At some point, the only way to really validate a backup is to actually try to restore it. And if people get encouraged to do that faster thanks to incremental backups, they could detect potential issues sooner. Ultimately, users will still need their full backups and WAL archives. If pg_combinebackup fails for any reason, the fix will be to perform the recovery from the full backup directly. They should still be able to recover, just more slowly.

-- Stefan FERCOT Data Egret (https://dataegret.com)
Re: WIP Incremental JSON Parser
On Tue, Apr 9, 2024 at 4:54 AM Andrew Dunstan wrote: > On 2024-04-09 Tu 01:23, Michael Paquier wrote: > There is no direct check on test_json_parser_perf.c, either, only a > custom rule in the Makefile without specifying something for meson. > So it looks like you could do short execution check in a TAP test, at > least. > > Not adding a test for that was deliberate - any sane test takes a while, and > I didn't want to spend that much time on it every time someone runs "make > check-world" or equivalent. However, adding a test to run it with a trivial > number of iterations seems reasonable, so I'll add that. I'll also add a > meson target for the binary. Okay, but for what purpose? My understanding during review was that this was a convenience utility for people who were actively hacking on the code (and I used it for exactly that purpose a few months back, so I didn't question that any further). Why does the farm need to spend any time running it at all? --Jacob
Re: IPC::Run::time[r|out] vs our TAP tests
Michael Paquier writes: > By the way, are you planning to do something like [1]? I've not > looked in details at the callers of IPC::Run::timeout, still the extra > debug output would be nice. It needs more review I think - I didn't check every call site to see if anything would be broken. I believe Andrew has undertaken a survey of all the timeout/timer calls, but if he doesn't produce anything I might have a go at it after awhile. regards, tom lane
Re: IPC::Run::time[r|out] vs our TAP tests
On 2024-04-09 Tu 09:46, Tom Lane wrote: Michael Paquier writes: By the way, are you planning to do something like [1]? I've not looked in details at the callers of IPC::Run::timeout, still the extra debug output would be nice. It needs more review I think - I didn't check every call site to see if anything would be broken. I believe Andrew has undertaken a survey of all the timeout/timer calls, but if he doesn't produce anything I might have a go at it after awhile. What I looked at so far was the use of is_expired, but when you look into that you see that you need to delve further, to where timeout/timer objects are created and passed around. I'll take a closer look when I have done some incremental json housekeeping. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
Re: WIP Incremental JSON Parser
On 2024-04-09 Tu 09:45, Jacob Champion wrote: On Tue, Apr 9, 2024 at 4:54 AM Andrew Dunstan wrote: On 2024-04-09 Tu 01:23, Michael Paquier wrote: There is no direct check on test_json_parser_perf.c, either, only a custom rule in the Makefile without specifying something for meson. So it looks like you could do short execution check in a TAP test, at least. Not adding a test for that was deliberate - any sane test takes a while, and I didn't want to spend that much time on it every time someone runs "make check-world" or equivalent. However, adding a test to run it with a trivial number of iterations seems reasonable, so I'll add that. I'll also add a meson target for the binary. Okay, but for what purpose? My understanding during review was that this was a convenience utility for people who were actively hacking on the code (and I used it for exactly that purpose a few months back, so I didn't question that any further). Why does the farm need to spend any time running it at all? I think Michael's point was that if we carry the code we should test we can run it. The other possibility would be just to remove it. I can see arguments for both. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
Re: Streaming relation data out of order
On Tue, Apr 9, 2024 at 12:19 AM Thomas Munro wrote: > This idea is due to Robert Haas, who complained that he feared that > the streaming I/O API already worked like this. It doesn't, but it > could! Here is a concept patch to try it out. Oh, no, it's all my fault! My favorite part of this patch is the comment added to read_stream.h, which IMHO makes things a whole lot clearer. I have no opinion on the rest of it; I don't understand this code. -- Robert Haas EDB: http://www.enterprisedb.com
removal of '{' from WORD_BREAKS
On Thu, Apr 4, 2024 at 10:38 PM Michael Paquier wrote: > > It kind of looks like a libedit bug, but maybe we should dig more > > deeply. I felt itchy about 927332b95e77 removing '{' from the > > WORD_BREAKS set, and wondered exactly how that would change readline's > > behavior. But even if that somehow accounts for the extra backslash > > before '{', it's not clear how it could lead to '?' and '}' also > > getting backslashed. > > I don't have a clear idea, either. I also feel uneasy about > 927332b95e77 and its change of WORD_BREAKS, but this has the smell > of a bug from an outdated libedit version. I too felt uneasy about that commit, for the same reason. However, there is a justification for the change in the commit message which is not obviously wrong, namely that ":{?name} is the only psql syntax using the '{' sign". And in fact, SQL basically doesn't use '{' for anything, either. We do see { showing up inside of quoted strings, for arrays or JSON, but I would guess that the word-break characters aren't going to affect behavior within a quoted string. So it still seems like it should be OK? Another thing that makes me think that my unease may be unfounded is that the matching character '}' isn't in WORD_BREAKS either, and I would have thought that if we needed one we'd need both. But does anyone else have a more specific reason for thinking that this might be a problem? -- Robert Haas EDB: http://www.enterprisedb.com
Re: post-freeze damage control
On 2024-Apr-09, Stefan Fercot wrote: > At some point, the only way to really validate a backup is to actually try to > restore it. > And if people get encouraged to do that faster thanks to incremental backups, > they could detect potential issues sooner. > Ultimately, users will still need their full backups and WAL archives. > If pg_combinebackup fails for any reason, the fix will be to perform the > recovery from the full backup directly. > They still should be able to recover, just slower. I completely agree that people should be testing the feature so that we can fix any bugs as soon as possible. However, if my understanding is correct, restoring a full backup plus an incremental no longer needs the intervening WAL up to the incremental. Users wishing to save some disk space might be tempted to delete that WAL. If they do, and later it turns out that the full+incremental cannot be restored for whatever reason, they are in danger. But you're right that if they don't delete that WAL, then the full is restorable on its own. Maybe we should explicitly advise users to not delete that WAL from their archives, until pg_combinebackup is hammered a bit more. -- Álvaro HerreraBreisgau, Deutschland — https://www.EnterpriseDB.com/ "I must say, I am absolutely impressed with what pgsql's implementation of VALUES allows me to do. It's kind of ridiculous how much "work" goes away in my code. Too bad I can't do this at work (Oracle 8/9)." (Tom Allison) http://archives.postgresql.org/pgsql-general/2007-06/msg00016.php
Re: post-freeze damage control
> On 9 Apr 2024, at 18:45, Alvaro Herrera wrote: > > Maybe we should explicitly advise users to not delete that WAL from > their archives, until pg_combinebackup is hammered a bit more.

As a backup tool maintainer, I always refer to the out-of-the-box Postgres tools as a bulletproof alternative. I would really like to keep that reputation intact and not discredit these tools.

Best regards, Andrey Borodin.
Re: post-freeze damage control
On Mon, Apr 8, 2024 at 10:12 PM Tom Lane wrote: > I don't know that I'd call it scary exactly, but I do think it > was premature. A week ago there was no consensus that it was > ready to commit, but Alexander pushed it (or half of it, anyway) > despite that. Some of the most compelling cases for the transformation will involve path keys. If the transformation enables the optimizer to build a plain index scan (or index-only scan) with useful path keys, then that might well lead to a far superior plan compared to what's possible with BitmapOrs. I understand that it'll still be possible to use OR expression evaluation in such cases, without applying the transformation (via filter quals), so in principle you don't need the transformation to get an index scan that can (say) terminate the scan early due to the presence of an "ORDER BY ... LIMIT n". But I suspect that that won't work out much of the time, because the planner will believe (rightly or wrongly) that the filter quals will incur too many heap page accesses. Another problem (at least as I see it) with the new or_to_any_transform_limit GUC is that its design seems to have nothing to say about the importance of these sorts of cases. Most of these cases will only have 2 or 3 constants, just because that's what's most common in general. -- Peter Geoghegan
Re: WIP Incremental JSON Parser
On Tue, Apr 9, 2024 at 7:30 AM Andrew Dunstan wrote: > I think Michael's point was that if we carry the code we should test we > can run it. The other possibility would be just to remove it. I can see > arguments for both. Hm. If it's not acceptable to carry this (as a worse-is-better smoke test) without also running it during tests, then my personal vote would be to tear it out and just have people write/contribute targeted benchmarks when they end up working on performance. I don't think the cost/benefit makes sense at that point. --Jacob
Re: post-freeze damage control
Peter Geoghegan writes: > On Mon, Apr 8, 2024 at 10:12 PM Tom Lane wrote: >> I don't know that I'd call it scary exactly, but I do think it >> was premature. A week ago there was no consensus that it was >> ready to commit, but Alexander pushed it (or half of it, anyway) >> despite that. > Some of the most compelling cases for the transformation will involve > path keys. If the transformation enables the optimizer to build a > plain index scan (or index-only scan) with useful path keys, then that > might well lead to a far superior plan compared to what's possible > with BitmapOrs. I did not say it isn't a useful thing to have. I said the patch did not appear ready to go in. > I understand that it'll still be possible to use OR expression > evaluation in such cases, without applying the transformation (via > filter quals), so in principle you don't need the transformation to > get an index scan that can (say) terminate the scan early due to the > presence of an "ORDER BY ... LIMIT n". But I suspect that that won't > work out much of the time, because the planner will believe (rightly > or wrongly) that the filter quals will incur too many heap page > accesses. That's probably related to the fact that we don't have a mechanism for evaluating non-indexed quals against columns that are retrievable from the index. We really oughta work on getting that done. But I've been keeping a side eye on the work to unify plain and index-only scans, because that's going to touch a lot of the same code so it doesn't seem profitable to pursue those goals in parallel. > Another problem (at least as I see it) with the new > or_to_any_transform_limit GUC is that its design seems to have nothing > to say about the importance of these sorts of cases. Most of these > cases will only have 2 or 3 constants, just because that's what's most > common in general. Yeah, that's one of the reasons I'm dubious that the committed patch was ready. regards, tom lane
Re: post-freeze damage control
On Tue, Apr 9, 2024 at 12:27 PM Tom Lane wrote: > > Some of the most compelling cases for the transformation will involve > > path keys. If the transformation enables the optimizer to build a > > plain index scan (or index-only scan) with useful path keys, then that > > might well lead to a far superior plan compared to what's possible > > with BitmapOrs. > > I did not say it isn't a useful thing to have. I said the patch > did not appear ready to go in. Didn't mean to suggest otherwise. I just wanted to hear your thoughts on this aspect of these sorts of transformations. > > I understand that it'll still be possible to use OR expression > > evaluation in such cases, without applying the transformation (via > > filter quals), so in principle you don't need the transformation to > > get an index scan that can (say) terminate the scan early due to the > > presence of an "ORDER BY ... LIMIT n". But I suspect that that won't > > work out much of the time, because the planner will believe (rightly > > or wrongly) that the filter quals will incur too many heap page > > accesses. > > That's probably related to the fact that we don't have a mechanism > for evaluating non-indexed quals against columns that are retrievable > from the index. We really oughta work on getting that done. I agree that that is very important work, but I'm not sure that it makes all that much difference here. Even if we had that improved mechanism already, today, using index quals would still be strictly better than expression evaluation. Index quals can allow nbtree to skip over irrelevant parts of the index as the need arises, which is a significant advantage in its own right. ISTM that the planner should always prefer index quals over expression evaluation, on general principle, even when there's no reason to think it'll work out. At worst the executor has essentially the same physical access patterns as the expression evaluation case. On the other hand, providing nbtree with that context might end up being a great deal faster. -- Peter Geoghegan
Re: WIP Incremental JSON Parser
On Mon, Apr 8, 2024 at 10:24 PM Michael Paquier wrote: > At the end, having a way to generate JSON blobs randomly to test this > stuff would be more appealing For the record, I'm working on an LLVM fuzzer target for the JSON parser. I think that would be a lot more useful than anything we can hand-code. But I want it to cover both the recursive and incremental code paths, and we'd need to talk about where it would live. libfuzzer is seeded with a bunch of huge incomprehensible blobs, which is something we're now trying to avoid checking in. There's also the security aspect of "what do we do when it finds something", and at that point maybe we need to look into a service like oss-fuzz. Thanks, --Jacob
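For readers unfamiliar with libFuzzer, here is a minimal sketch of the general shape such a target takes. This is not the harness Jacob describes, only an editorial illustration: parse_json_blob() is a stand-in for whichever parser entry point (recursive or incremental) a real target would exercise, and the build line assumes clang's libFuzzer support.

/* build with: clang -g -O1 -fsanitize=fuzzer,address json_fuzz.c */
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Placeholder for the real parser call; returns 0 on "parse OK". */
static int
parse_json_blob(const char *buf, size_t len)
{
	/* a real harness would call the JSON lexer/parser here */
	return (len > 0 && buf[0] == '{') ? 0 : 1;
}

/* libFuzzer entry point: called repeatedly with mutated inputs */
int
LLVMFuzzerTestOneInput(const uint8_t *data, size_t size)
{
	/* NUL-terminate a private copy, since a parser may expect a C string */
	char	   *buf = malloc(size + 1);

	if (buf == NULL)
		return 0;
	memcpy(buf, data, size);
	buf[size] = '\0';

	(void) parse_json_blob(buf, size);
	free(buf);
	return 0;				/* non-crashing inputs are always "ok" */
}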
Re: IPC::Run::time[r|out] vs our TAP tests
> On 4 Apr 2024, at 23:46, Daniel Gustafsson wrote: > >> On 4 Apr 2024, at 23:24, Tom Lane wrote: > >> A minimum fix that seems to make this work better is as attached, >> but I feel like somebody ought to examine all the IPC::Run::timer >> and IPC::Run::timeout calls and see which ones are mistaken. >> It's a little scary to convert a timeout to a timer because of >> the hazard that someplace that would need to be checking for >> is_expired isn't. > > Skimming this and a few callsites it seems reasonable to use a timer instead > of > a timeout, but careful study is needed to make sure we're not introducing > anything subtly wrong in the other direction. Sharing a few preliminary results from looking at this, the attached passes check-world but need more study/testing. It seems wrong to me that we die() in query and query_until rather than giving the caller the power to decide how to proceed. We have even documented that we do just this: "Dies on failure to invoke psql, or if psql fails to connect. Errors occurring later are the caller's problem" Turning the timeout into a timer and returning undef along with logging a test failure in case of expiration seems a bit saner (maybe Andrew can suggest an API which has a better Perl feel to it). Most callsites don't need any changes to accommodate for this, the attached 0002 implements this timer change and modify the few sites that need it, converting one to plain query() where the added complexity of query_until isn't required. A few other comments on related things that stood out while reviewing: The tab completion test can use the API call for automatically restart the timer to reduce the complexity of check_completion a hair. Done in 0001 (but really not necessary). Commit Af279ddd1c2 added this sequence to 040_standby_failover_slots_sync.pl in the recovery tests: $back_q->query_until( qr/logical_slot_get_changes/, q( \echo logical_slot_get_changes SELECT pg_logical_slot_get_changes('test_slot', NULL, NULL); )); ... ... # Since there are no slots in standby_slot_names, the function # pg_logical_slot_get_changes should now return, and the session can be # stopped. $back_q->quit; There is no guarantee that pg_logical_slot_get_changes has returned when reaching this point. This might still work as intended, but the comment is slightly misleading IMO. recovery/t/043_wal_replay_wait.pl calls pg_wal_replay_wait() since 06c418e163e in a background session which it then skips terminating. Calling ->quit is mandated by the API, in turn required by IPC::Run. Calling ->quit on the process makes the test fail from the process having already exited, but we can call ->finish directly like we do in test_misc/t/005_timeouts.pl. 0003 fixes this. -- Daniel Gustafsson v1-0001-Configure-interactive-instance-to-restart-timer.patch Description: Binary data v1-0002-Report-test-failure-rather-than-aborting-in-case-.patch Description: Binary data v1-0003-Clean-up-BackgroundPsql-object-when-test-ends.patch Description: Binary data
Re: Can't find not null constraint, but \d+ shows that
On 2024-Mar-29, Tender Wang wrote: > I think aboved case can explain what's meaning about comments in > dropconstraint_internal. > But here, in RemoveConstraintById() , we only care about primary key case, > so NOT NULL is better to removed from comments. Actually, I think it's better if all the resets of attnotnull occur in RemoveConstraintById, for both types of constraints; we would keep that block in dropconstraint_internal only to raise errors in the cases where the constraint is protecting a replica identity or a generated column. Something like the attached, perhaps, may need more polish. I'm not really sure about the business of adding a new pass value -- it's clear that things don't work if we don't do *something* -- I'm just not sure if this is the something that we want to do. -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/ "Tiene valor aquel que admite que es un cobarde" (Fernandel) >From 397d424cd64020bcb4aff219deefca7fc7b6f88d Mon Sep 17 00:00:00 2001 From: Alvaro Herrera Date: Tue, 9 Apr 2024 19:18:57 +0200 Subject: [PATCH] Correctly reset attnotnull when constraints dropped indirectly --- src/backend/catalog/pg_constraint.c | 105 +- src/backend/commands/tablecmds.c | 67 ++ src/test/regress/expected/constraints.out | 11 +++ src/test/regress/sql/constraints.sql | 7 ++ 4 files changed, 150 insertions(+), 40 deletions(-) diff --git a/src/backend/catalog/pg_constraint.c b/src/backend/catalog/pg_constraint.c index 604280d322..17c7a2d0f3 100644 --- a/src/backend/catalog/pg_constraint.c +++ b/src/backend/catalog/pg_constraint.c @@ -19,6 +19,7 @@ #include "access/htup_details.h" #include "access/sysattr.h" #include "access/table.h" +#include "access/xact.h" #include "catalog/catalog.h" #include "catalog/dependency.h" #include "catalog/heap.h" @@ -916,6 +917,8 @@ RemoveConstraintById(Oid conId) Relation conDesc; HeapTuple tup; Form_pg_constraint con; + bool dropping_pk = false; + List *unconstrained_cols = NIL; conDesc = table_open(ConstraintRelationId, RowExclusiveLock); @@ -940,7 +943,9 @@ RemoveConstraintById(Oid conId) /* * We need to update the relchecks count if it is a check constraint * being dropped. This update will force backends to rebuild relcache - * entries when we commit. + * entries when we commit. For not-null and primary key constraints, + * obtain the list of columns affected, so that we can reset their + * attnotnull flags below. 
*/ if (con->contype == CONSTRAINT_CHECK) { @@ -967,6 +972,36 @@ RemoveConstraintById(Oid conId) table_close(pgrel, RowExclusiveLock); } + else if (con->contype == CONSTRAINT_NOTNULL) + { + unconstrained_cols = list_make1_int(extractNotNullColumn(tup)); + } + else if (con->contype == CONSTRAINT_PRIMARY) + { + Datum adatum; + ArrayType *arr; + int numkeys; + bool isNull; + int16 *attnums; + + dropping_pk = true; + + adatum = heap_getattr(tup, Anum_pg_constraint_conkey, + RelationGetDescr(conDesc), &isNull); + if (isNull) +elog(ERROR, "null conkey for constraint %u", con->oid); + arr = DatumGetArrayTypeP(adatum); /* ensure not toasted */ + numkeys = ARR_DIMS(arr)[0]; + if (ARR_NDIM(arr) != 1 || +numkeys < 0 || +ARR_HASNULL(arr) || +ARR_ELEMTYPE(arr) != INT2OID) +elog(ERROR, "conkey is not a 1-D smallint array"); + attnums = (int16 *) ARR_DATA_PTR(arr); + + for (int i = 0; i < numkeys; i++) +unconstrained_cols = lappend_int(unconstrained_cols, attnums[i]); + } /* Keep lock on constraint's rel until end of xact */ table_close(rel, NoLock); @@ -986,6 +1021,74 @@ RemoveConstraintById(Oid conId) /* Fry the constraint itself */ CatalogTupleDelete(conDesc, &tup->t_self); + /* + * If this was a NOT NULL or the primary key, the constrained columns must + * have had pg_attribute.attnotnull set. See if we need to reset it, and + * do so. + */ + if (unconstrained_cols) + { + Relation tablerel; + Relation attrel; + Bitmapset *pkcols; + ListCell *lc; + + /* Make the above deletion visible */ + CommandCounterIncrement(); + + tablerel = table_open(con->conrelid, NoLock); /* already have lock */ + attrel = table_open(AttributeRelationId, RowExclusiveLock); + + /* + * We want to test columns for their presence in the primary key, but + * only if we're not dropping it. + */ + pkcols = dropping_pk ? NULL : + RelationGetIndexAttrBitmap(tablerel, + INDEX_ATTR_BITMAP_PRIMARY_KEY); + + foreach(lc, unconstrained_cols) + { + AttrNumber attnum = lfirst_int(lc); + HeapTuple atttup; + HeapTuple contup; + Form_pg_attribute attForm; + + /* + * Obtain pg_attribute tuple and verify conditions on it. We use + * a copy we can scribble on. + */ + atttup = SearchSysCacheCopyAttNum(con->conrelid, attnum); + if (!HeapTupleIsValid(atttup)) +elog(ERROR, "cache lookup failed for attr
macOS Ventura won't generate core dumps
At least not for me. According to various things I found on the Internet, it's now required that you codesign your binaries and give them an entitlement in order to generate core dumps: https://nasa.github.io/trick/howto_guides/How-to-dump-core-file-on-MacOS.html But according to previous on-list discussion, code-signing a binary means that DYLD_* will be ignored: http://postgr.es/m/920451.1634265...@sss.pgh.pa.us Now, if DYLD_* is ignored, then our regression tests won't work properly. But if core dumps are not enabled, then how am I supposed to debug things that can only be debugged with a core dump? Unless I'm missing something, this is positively rage-inducing. It's reasonable, on Apple's part, to want to install more secure defaults, but having no practical way of overriding the defaults is not reasonable. And by "practical," I mean ideally (a) doesn't require a macOS-specific patch to the PostgreSQL source repository but at least (b) can be done somehow without breaking other important things that also need to work. Anyone have any ideas? -- Robert Haas EDB: http://www.enterprisedb.com
Re: macOS Ventura won't generate core dumps
Hi, On 2024-04-09 13:35:51 -0400, Robert Haas wrote: > Now, if DYLD_* is ignored, then our regression tests won't work > properly. But if core dumps are not enabled, then how am I supposed to > debug things that can only be debugged with a core dump? FWIW, I posted a patch a while back to make meson builds support relative rpaths on some platforms, including macos. I.e. each library/binary finds the location of the needed libraries relative to its own location. That gets rid of the need to use DYLD_ at all, because the temporary install for the tests can find the library location without a problem. Perhaps it's worth picking that up again? https://github.com/anarazel/postgres/tree/meson-rpath https://github.com/anarazel/postgres/commit/46f1963fee7525c3cc3837ef8423cbf6cb08d10a Greetings, Andres Freund
Re: Annoying build warnings from latest Apple toolchain
Hi, Looks like I forgot to update the thread to note that I finally committed the remaining warning fixes. I had already fixed a bunch of others in upstream meson. commit a3da95deee38ee067b0bead639c830eacbe894d5 Author: Andres Freund Date: 2024-03-13 01:40:53 -0700 meson: macos: Avoid warnings on Sonoma Greetings, Andres Freund
Re: Improve eviction algorithm in ReorderBuffer
On Fri, 2024-04-05 at 16:58 +0900, Masahiko Sawada wrote: > > I have some further comments and I believe changes are required for > > v17. I also noticed that the simplehash is declared in binaryheap.h with "SH_SCOPE extern", which seems wrong. Attached is a rough patch to bring the declarations into binaryheap.c. Note that the attached patch uses "SH_SCOPE static", which makes sense to me in this case, but causes a bunch of warnings in gcc. I will post separately about silencing that warning, but for now you can either use: SH_SCOPE static inline which is probably fine, but will encourage the compiler to inline more code, when not all callers even use the hash table. Alternatively, you can do: SH_SCOPE static pg_attribute_unused() which looks a bit wrong to me, but seems to silence the warnings, and lets the compiler decide whether or not to inline. Also probably needs comment updates, etc. Regards, Jeff Davis From 08dcf21646e4ded22b10fd0ed536d3bbf6fc1328 Mon Sep 17 00:00:00 2001 From: Jeff Davis Date: Tue, 9 Apr 2024 10:45:00 -0700 Subject: [PATCH v1] binaryheap: move hash table out of header into binaryheap.c. Commit b840508644 declared the internal hash table in the header with scope "extern", which was unnecessary. --- src/common/binaryheap.c | 15 ++- src/include/lib/binaryheap.h | 25 + 2 files changed, 15 insertions(+), 25 deletions(-) diff --git a/src/common/binaryheap.c b/src/common/binaryheap.c index c20ed50acc..2501dad6f2 100644 --- a/src/common/binaryheap.c +++ b/src/common/binaryheap.c @@ -25,6 +25,18 @@ #include "common/hashfn.h" #include "lib/binaryheap.h" +/* + * Struct for a hash table element to store the node's index in the bh_nodes + * array. + */ +typedef struct bh_nodeidx_entry +{ + bh_node_type key; + int index; /* entry's index within the node array */ + char status; /* hash status */ + uint32 hash; /* hash values (cached) */ +} bh_nodeidx_entry; + /* * Define parameters for hash table code generation. The interface is *also* * declared in binaryheap.h (to generate the types, which are externally @@ -37,12 +49,13 @@ #define SH_HASH_KEY(tb, key) \ hash_bytes((const unsigned char *) &key, sizeof(bh_node_type)) #define SH_EQUAL(tb, a, b) (memcmp(&a, &b, sizeof(bh_node_type)) == 0) -#define SH_SCOPE extern +#define SH_SCOPE static #ifdef FRONTEND #define SH_RAW_ALLOCATOR pg_malloc0 #endif #define SH_STORE_HASH #define SH_GET_HASH(tb, a) a->hash +#define SH_DECLARE #define SH_DEFINE #include "lib/simplehash.h" diff --git a/src/include/lib/binaryheap.h b/src/include/lib/binaryheap.h index 4c1a1bb274..8b47132fc3 100644 --- a/src/include/lib/binaryheap.h +++ b/src/include/lib/binaryheap.h @@ -29,29 +29,6 @@ typedef Datum bh_node_type; */ typedef int (*binaryheap_comparator) (bh_node_type a, bh_node_type b, void *arg); -/* - * Struct for a hash table element to store the node's index in the bh_nodes - * array. - */ -typedef struct bh_nodeidx_entry -{ - bh_node_type key; - int index; /* entry's index within the node array */ - char status; /* hash status */ - uint32 hash; /* hash values (cached) */ -} bh_nodeidx_entry; - -/* Define parameters necessary to generate the hash table interface. */ -#define SH_PREFIX bh_nodeidx -#define SH_ELEMENT_TYPE bh_nodeidx_entry -#define SH_KEY_TYPE bh_node_type -#define SH_SCOPE extern -#ifdef FRONTEND -#define SH_RAW_ALLOCATOR pg_malloc0 -#endif -#define SH_DECLARE -#include "lib/simplehash.h" - /* * binaryheap * @@ -76,7 +53,7 @@ typedef struct binaryheap * node's index in bh_nodes. 
This enables the caller to perform * binaryheap_remove_node_ptr(), binaryheap_update_up/down in O(log n). */ - bh_nodeidx_hash *bh_nodeidx; + struct bh_nodeidx_hash *bh_nodeidx; } binaryheap; extern binaryheap *binaryheap_allocate(int num_nodes, -- 2.34.1
simplehash.h: "SH_SCOPE static" causes warnings
If using "SH_SCOPE static" with simplehash.h, it causes a bunch of warnings about functions that are defined but not used. It's simple enough to fix by appending pg_attribute_unused() to the declarations (attached). There are currently no callers that use "SH_SCOPE static", but I'm suggesting its use in the thread below as a cleanup to a recently- committed feature: https://www.postgresql.org/message-id/791d98f474e518387d09eb390b8a12f265d130cc.ca...@j-davis.com The reason I'm suggesting it there is because the hash table is used only for the indexed binary heap, not an ordinary binary heap, so I'd like to leave it up to the compiler whether to do any inlining or not. If someone thinks the attached patch is a good change to commit now, please let me know. Otherwise, I'll recommend "static inline" in the above thread and leave the attached patch to be considered for v18. Regards, Jeff Davis From 198235448a009a46812eac4854d760af76c85139 Mon Sep 17 00:00:00 2001 From: Jeff Davis Date: Tue, 9 Apr 2024 10:42:05 -0700 Subject: [PATCH v1 1/2] simplehash.h: declare with pg_attribute_unused(). If SH_SCOPE is static, avoid warnings for unused functions. --- src/include/lib/simplehash.h | 30 +++--- 1 file changed, 15 insertions(+), 15 deletions(-) diff --git a/src/include/lib/simplehash.h b/src/include/lib/simplehash.h index 3e1b1f9461..7d93d68e93 100644 --- a/src/include/lib/simplehash.h +++ b/src/include/lib/simplehash.h @@ -188,62 +188,62 @@ typedef struct SH_ITERATOR /* externally visible function prototypes */ #ifdef SH_RAW_ALLOCATOR /* _hash _create(uint32 nelements, void *private_data) */ -SH_SCOPE SH_TYPE *SH_CREATE(uint32 nelements, void *private_data); +SH_SCOPE SH_TYPE *SH_CREATE(uint32 nelements, void *private_data) pg_attribute_unused(); #else /* * _hash _create(MemoryContext ctx, uint32 nelements, * void *private_data) */ SH_SCOPE SH_TYPE *SH_CREATE(MemoryContext ctx, uint32 nelements, - void *private_data); + void *private_data) pg_attribute_unused(); #endif /* void _destroy(_hash *tb) */ -SH_SCOPE void SH_DESTROY(SH_TYPE * tb); +SH_SCOPE void SH_DESTROY(SH_TYPE * tb) pg_attribute_unused(); /* void _reset(_hash *tb) */ -SH_SCOPE void SH_RESET(SH_TYPE * tb); +SH_SCOPE void SH_RESET(SH_TYPE * tb) pg_attribute_unused(); /* void _grow(_hash *tb, uint64 newsize) */ -SH_SCOPE void SH_GROW(SH_TYPE * tb, uint64 newsize); +SH_SCOPE void SH_GROW(SH_TYPE * tb, uint64 newsize) pg_attribute_unused(); /* *_insert(_hash *tb, key, bool *found) */ -SH_SCOPE SH_ELEMENT_TYPE *SH_INSERT(SH_TYPE * tb, SH_KEY_TYPE key, bool *found); +SH_SCOPE SH_ELEMENT_TYPE *SH_INSERT(SH_TYPE * tb, SH_KEY_TYPE key, bool *found) pg_attribute_unused(); /* * *_insert_hash(_hash *tb, key, uint32 hash, * bool *found) */ SH_SCOPE SH_ELEMENT_TYPE *SH_INSERT_HASH(SH_TYPE * tb, SH_KEY_TYPE key, - uint32 hash, bool *found); + uint32 hash, bool *found) pg_attribute_unused(); /* *_lookup(_hash *tb, key) */ -SH_SCOPE SH_ELEMENT_TYPE *SH_LOOKUP(SH_TYPE * tb, SH_KEY_TYPE key); +SH_SCOPE SH_ELEMENT_TYPE *SH_LOOKUP(SH_TYPE * tb, SH_KEY_TYPE key) pg_attribute_unused(); /* *_lookup_hash(_hash *tb, key, uint32 hash) */ SH_SCOPE SH_ELEMENT_TYPE *SH_LOOKUP_HASH(SH_TYPE * tb, SH_KEY_TYPE key, - uint32 hash); + uint32 hash) pg_attribute_unused(); /* void _delete_item(_hash *tb, *entry) */ -SH_SCOPE void SH_DELETE_ITEM(SH_TYPE * tb, SH_ELEMENT_TYPE * entry); +SH_SCOPE void SH_DELETE_ITEM(SH_TYPE * tb, SH_ELEMENT_TYPE * entry) pg_attribute_unused(); /* bool _delete(_hash *tb, key) */ -SH_SCOPE bool SH_DELETE(SH_TYPE * tb, SH_KEY_TYPE key); +SH_SCOPE 
bool SH_DELETE(SH_TYPE * tb, SH_KEY_TYPE key) pg_attribute_unused(); /* void _start_iterate(_hash *tb, _iterator *iter) */ -SH_SCOPE void SH_START_ITERATE(SH_TYPE * tb, SH_ITERATOR * iter); +SH_SCOPE void SH_START_ITERATE(SH_TYPE * tb, SH_ITERATOR * iter) pg_attribute_unused(); /* * void _start_iterate_at(_hash *tb, _iterator *iter, * uint32 at) */ -SH_SCOPE void SH_START_ITERATE_AT(SH_TYPE * tb, SH_ITERATOR * iter, uint32 at); +SH_SCOPE void SH_START_ITERATE_AT(SH_TYPE * tb, SH_ITERATOR * iter, uint32 at) pg_attribute_unused(); /* *_iterate(_hash *tb, _iterator *iter) */ -SH_SCOPE SH_ELEMENT_TYPE *SH_ITERATE(SH_TYPE * tb, SH_ITERATOR * iter); +SH_SCOPE SH_ELEMENT_TYPE *SH_ITERATE(SH_TYPE * tb, SH_ITERATOR * iter) pg_attribute_unused(); /* void _stat(_hash *tb */ -SH_SCOPE void SH_STAT(SH_TYPE * tb); +SH_SCOPE void SH_STAT(SH_TYPE * tb) pg_attribute_unused(); #endif /* SH_DECLARE */ -- 2.34.1
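To illustrate the situation being discussed, here is a minimal sketch (with illustrative names, not committed code) of a file-local simplehash instantiation using "SH_SCOPE static". Because simplehash.h always emits the full set of generated functions, any of them that the file never calls (delete, iterate, and so on) trips -Wunused-function under this scope:

#include "postgres.h"
#include "common/hashfn.h"

typedef struct myhash_entry
{
	uint32		key;
	int			value;
	char		status;			/* required by simplehash */
} myhash_entry;

#define SH_PREFIX		myhash
#define SH_ELEMENT_TYPE	myhash_entry
#define SH_KEY_TYPE		uint32
#define SH_KEY			key
#define SH_HASH_KEY(tb, key)	hash_bytes_uint32(key)
#define SH_EQUAL(tb, a, b)		((a) == (b))
#define SH_SCOPE		static		/* all generated functions are file-local */
#define SH_DECLARE
#define SH_DEFINE
#include "lib/simplehash.h"

static int
lookup_or_insert(myhash_hash *tab, uint32 key, int value)
{
	bool		found;
	myhash_entry *ent = myhash_insert(tab, key, &found);

	if (!found)
		ent->value = value;

	/*
	 * myhash_delete(), myhash_start_iterate() etc. are generated above but
	 * never called in this file, hence the warnings Jeff describes.
	 */
	return ent->value;
}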
Re: removal of '{' from WORD_BREAKS
On Tue, Apr 9, 2024 at 6:18 PM Robert Haas wrote: > On Thu, Apr 4, 2024 at 10:38 PM Michael Paquier wrote: > > > It kind of looks like a libedit bug, but maybe we should dig more > > > deeply. I felt itchy about 927332b95e77 removing '{' from the > > > WORD_BREAKS set, and wondered exactly how that would change readline's > > > behavior. But even if that somehow accounts for the extra backslash > > > before '{', it's not clear how it could lead to '?' and '}' also > > > getting backslashed. > > > > I don't have a clear idea, either. I also feel uneasy about > > 927332b95e77 and its change of WORD_BREAKS, but this has the smell > > of a bug from an outdated libedit version. > > I too felt uneasy about that commit, for the same reason. However, > there is a justification for the change in the commit message which is > not obviously wrong, namely that ":{?name} is the only psql syntax > using the '{' sign". And in fact, SQL basically doesn't use '{' for > anything, either. We do see { showing up inside of quoted strings, for > arrays or JSON, but I would guess that the word-break characters > aren't going to affect behavior within a quoted string. So it still > seems like it should be OK? Another thing that makes me think that my > unease may be unfounded is that the matching character '}' isn't in > WORD_BREAKS either, and I would have thought that if we needed one > we'd need both.

FWIW, the default value of rl_basic_word_break_characters [1] has '{' but doesn't have '}'. The documentation says that this should "break words for completion in Bash", but I failed to find an explanation of why that should be so for Bash. As you correctly note, my idea was that our SQL doesn't make heavy use of '{', unlike Bash.

> But does anyone else have a more specific reason for thinking that > this might be a problem?

I don't particularly object to reverting this commit, but I think we should get to the bottom of this first. Otherwise there is no guarantee that we won't run into the same problem again.

Links. 1. https://tiswww.case.edu/php/chet/readline/readline.html#index-rl_005fbasic_005fword_005fbreak_005fcharacters

-- Regards, Alexander Korotkov
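As a self-contained way to see what the word-break set changes, here is a small readline demo. It is an editorial sketch, not psql's completion code: the break string and the candidate words are made up. With '{' present in rl_basic_word_break_characters, typing ":{?my<TAB>" hands the completer only "?my"; drop the '{' from the string and the completer sees the whole ":{?my" token.

/* build with: cc demo.c -lreadline */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <readline/readline.h>

/* offer two made-up psql-style variable references as completions */
static char *
var_generator(const char *text, int state)
{
	static const char *cands[] = {":{?myvar}", ":{?other}", NULL};
	static int	idx;

	if (state == 0)
		idx = 0;
	while (cands[idx] != NULL)
	{
		const char *c = cands[idx++];

		if (strncmp(c, text, strlen(text)) == 0)
			return strdup(c);
	}
	return NULL;
}

static char **
complete(const char *text, int start, int end)
{
	(void) start;
	(void) end;
	/* "text" is the current word, as delimited by the break characters */
	printf("\n[completer saw: \"%s\"]\n", text);
	return rl_completion_matches(text, var_generator);
}

int
main(void)
{
	/* try the same input with and without '{' in this string */
	rl_basic_word_break_characters = " \t\n\"'`@><=;|&(){";
	rl_attempted_completion_function = complete;

	char	   *line = readline("demo> ");

	printf("line: %s\n", line ? line : "(null)");
	free(line);
	return 0;
}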
Re: macOS Ventura won't generate core dumps
Robert Haas writes: > At least not for me. According to various things I found on the > Internet, it's now required that you codesign your binaries and give > them an entitlement in order to generate core dumps: Still works for me, at least on machines where I have SIP turned off. Admittedly, Apple's busy making that a less and less desirable choice. > Unless I'm missing something, this is positively rage-inducing. I don't disagree. regards, tom lane
Re: macOS Ventura won't generate core dumps
On Tue, Apr 9, 2024 at 2:37 PM Tom Lane wrote: > Robert Haas writes: > > At least not for me. According to various things I found on the > > Internet, it's now required that you codesign your binaries and give > > them an entitlement in order to generate core dumps: > > Still works for me, at least on machines where I have SIP turned > off. Admittedly, Apple's busy making that a less and less desirable > choice. What exact version are you running? -- Robert Haas EDB: http://www.enterprisedb.com
Re: simplehash.h: "SH_SCOPE static" causes warnings
On Tue, Apr 9, 2024 at 2:10 PM Jeff Davis wrote: > If using "SH_SCOPE static" with simplehash.h, it causes a bunch of > warnings about functions that are defined but not used. It's simple > enough to fix by appending pg_attribute_unused() to the declarations > (attached). Hmm. I'm pretty sure that I've run into this problem, but I concluded that I should use either "static inline" or "extern" and didn't think any more of it. I'm not sure that I like the idea of just ignoring the warnings, for fear that the compiler might not actually remove the code for the unused functions from the resulting binary. But I'm not an expert in this area either, so maybe I'm wrong. -- Robert Haas EDB: http://www.enterprisedb.com
Re: simplehash.h: "SH_SCOPE static" causes warnings
Hi, On 2024-04-09 11:10:15 -0700, Jeff Davis wrote: > The reason I'm suggesting it there is because the hash table is used > only for the indexed binary heap, not an ordinary binary heap, so I'd > like to leave it up to the compiler whether to do any inlining or not. FWIW, with just about any modern-ish compiler just using "inline" doesn't actually force inlining, it just changes the cost model to make it more likely. > If someone thinks the attached patch is a good change to commit now, > please let me know. Otherwise, I'll recommend "static inline" in the > above thread and leave the attached patch to be considered for v18. I'm not opposed. I'd however at least add a comment explaining why this is being used. Arguably it doesn't make sense to add it to *_create(), as without that there really isn't a point in having a simplehash instantiation. Might make it slightly easier to notice obsoleted uses of simplehash. Greetings, Andres Freund
Re: post-freeze damage control
On Tue, Apr 9, 2024 at 8:55 AM Tom Lane wrote: > Andrei Lepikhov writes: > > On 9/4/2024 09:12, Tom Lane wrote: > >> I have another one that I'm not terribly happy about: > >> Author: Alexander Korotkov > >> Branch: master [72bd38cc9] 2024-04-08 01:27:52 +0300 > >> Transform OR clauses to ANY expression > > >> * What the medical community would call off-label usage of > >> query jumbling. I'm not sure this is even correct as-used, > >> and for sure it's using that code for something never intended. > >> Nor is the added code adequately (as in, at all) documented. > > > I agree with documentation and disagree with critics on the expression > > jumbling. It was introduced in the core. Why don't we allow it to be > > used to speed up machinery with some hashing? > > I would back up from that a good deal: why do we need to hash here in > the first place? There's no evidence I'm aware of that it's needful > from a performance standpoint.

I think the feature is aimed at dealing with large OR lists. I've seen a significant degradation on 1 or-clause-groups. That might seem like an awful lot, but actually it's not unachievable in generated queries.

Links. 1. https://www.postgresql.org/message-id/CAPpHfduJtO0s9E%3DSHUTzrCD88BH0eik0UNog1_q3XBF2wLmH6g%40mail.gmail.com

-- Regards, Alexander Korotkov
Re: simplehash.h: "SH_SCOPE static" causes warnings
On Tue, 2024-04-09 at 14:49 -0400, Robert Haas wrote: > Hmm. I'm pretty sure that I've run into this problem, but I concluded > that I should use either "static inline" or "extern" and didn't think > any more of it. Pages of warnings is not ideal, though. We should either support "SH_SCOPE static", or have some kind of useful #error that makes it clear that we don't support it (and/or don't think it's a good idea). > I'm not sure that I like the idea of just ignoring the > warnings, for fear that the compiler might not actually remove the > code for the unused functions from the resulting binary. But I'm not > an expert in this area either, so maybe I'm wrong. In a simple "hello world" test with an unreferenced static function, it doesn't seem to be a problem at -O2. I suppose it could be with some compiler somewhere, or perhaps in a more complex scenario, but it would seem strange to me. Regards, Jeff Davis
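For reference, the experiment described above boils down to something like the following throwaway sketch (not project code). Compiling with -Wall reports the first helper as unused; building with -O2 and inspecting the binary with nm shows that neither helper survives into the output.

#include <stdio.h>

/* never called: gcc/clang report it under -Wall (-Wunused-function) */
static int
unused_helper(int x)
{
	return x * 2;
}

/* same, but the attribute (what pg_attribute_unused() amounts to on
 * gcc-like compilers) silences the warning without forcing the code
 * to be kept */
static int __attribute__((unused))
silenced_helper(int x)
{
	return x + 1;
}

int
main(void)
{
	printf("hello world\n");
	return 0;
}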
Re: simplehash.h: "SH_SCOPE static" causes warnings
Hi, On Tue, 2024-04-09 at 11:56 -0700, Andres Freund wrote: > FWIW, with just about any modern-ish compiler just using "inline" > doesn't > actually force inlining, it just changes the cost model to make it > more > likely.

OK. In the linked thread, I didn't see a good reason to encourage the compiler to inline the code. Only one caller uses the hash table, so my instinct would be that the code for manipulating it should not be inlined. But "extern" (which is the scope now) is certainly not right, so "static" made the most sense to me.

> > I'm not opposed. I'd however at least add a comment explaining why > this is > being used. Arguably it doesn't make sense to add it to *_create(), > as without > that there really isn't a point in having a simplehash instantiation. > Might > make it slightly easier to notice obsoleted uses of simplehash.

That's a good idea that preserves some utility for the warnings. Should I go ahead and commit something like that now, or hold it until the other thread concludes, or hold it until the July CF?

Regards, Jeff Davis
Re: macOS Ventura won't generate core dumps
Robert Haas writes: > On Tue, Apr 9, 2024 at 2:37 PM Tom Lane wrote: >> Still works for me, at least on machines where I have SIP turned >> off. Admittedly, Apple's busy making that a less and less desirable >> choice. > What exact version are you running? Works for me on Sonoma 14.4.1 and Ventura 13.6.6, and has done in many versions before those. The usual gotchas apply: you need to have started the postmaster under "ulimit -c unlimited", and the /cores directory has to be writable by whatever user the postmaster is running as. I have occasionally seen system updates reduce the privileges on /cores, although not recently. regards, tom lane
Re: post-freeze damage control
On Tue, Apr 9, 2024 at 5:12 AM Tom Lane wrote: > * OrClauseGroupKey is not a Node type, so why does it have > a NodeTag? I wonder what value will appear in that field, > and what will happen if the struct is passed to any code > that expects real Nodes.

I used that to put the nodes that are not subject to the transform into the same list as the hash entries. This is used to preserve the order of clauses. I think this is an important property, and I have already expressed that in [1]. That could be achieved without adding a NodeTag to the hash entries, but it would require a level of indirection. It's not passed to code that expects real Nodes; it doesn't go anywhere except into lists.

Links. 1. https://www.postgresql.org/message-id/CAPpHfdutHt31sdt2rfU%3D4fsDMWxf6tvtnHARgCzLY2Tf21%2Bfgw%40mail.gmail.com

-- Regards, Alexander Korotkov
Re: simplehash.h: "SH_SCOPE static" causes warnings
On Tue, Apr 9, 2024 at 3:30 PM Jeff Davis wrote: > Pages of warnings is not ideal, though. We should either support > "SH_SCOPE static", or have some kind of useful #error that makes it > clear that we don't support it (and/or don't think it's a good idea). Fair. > > I'm not sure that I like the idea of just ignoring the > > warnings, for fear that the compiler might not actually remove the > > code for the unused functions from the resulting binary. But I'm not > > an expert in this area either, so maybe I'm wrong. > > In a simple "hello world" test with an unreferenced static function, it > doesn't seem to be a problem at -O2. I suppose it could be with some > compiler somewhere, or perhaps in a more complex scenario, but it would > seem strange to me. OK. -- Robert Haas EDB: http://www.enterprisedb.com
Re: simplehash.h: "SH_SCOPE static" causes warnings
On Tue, Apr 9, 2024 at 3:33 PM Jeff Davis wrote: > Should I go ahead and commit something like that now, or hold it until > the other thread concludes, or hold it until the July CF? I think it's fine to commit it now if it makes it usefully easier to fix an open item, and otherwise it should wait until next cycle. But I also wonder if we shouldn't just keep using static inline everywhere. I'm guessing if we do this people are just going to make random decisions about which one to use every time this comes up. -- Robert Haas EDB: http://www.enterprisedb.com
Re: macOS Ventura won't generate core dumps
On Tue, Apr 9, 2024 at 3:44 PM Tom Lane wrote: > Works for me on Sonoma 14.4.1 and Ventura 13.6.6, and has done > in many versions before those. > > The usual gotchas apply: you need to have started the postmaster > under "ulimit -c unlimited", and the /cores directory has to be > writable by whatever user the postmaster is running as. I have > occasionally seen system updates reduce the privileges on /cores, > although not recently. Interesting. I'm on Ventura 13.6.2. I think I've checked all of the stuff you mention, but I'll do some more investigation. -- Robert Haas EDB: http://www.enterprisedb.com
Re: post-freeze damage control
On Tue, Apr 9, 2024 at 8:37 AM Andrei Lepikhov wrote: > On 9/4/2024 09:12, Tom Lane wrote: > > I have another one that I'm not terribly happy about: > > > > Author: Alexander Korotkov > > Branch: master [72bd38cc9] 2024-04-08 01:27:52 +0300 > > > > Transform OR clauses to ANY expression > Because I'm primary author of the idea, let me answer. > > > > I don't know that I'd call it scary exactly, but I do think it > > was premature. A week ago there was no consensus that it was > > ready to commit, but Alexander pushed it (or half of it, anyway) > > despite that. A few concrete concerns: > > > > * Yet another planner GUC. Do we really need or want that? > It is the most interesting question here. Looking around planner > features designed but not applied for the same reason because they can > produce suboptimal plans in corner cases, I think about inventing > flag-type parameters and hiding some features that work better for > different load types under such flagged parameters.

Yes, I spotted that this transformation could cause bitmap scan plan regressions in [1] and [2]. Fixing that required treating ANY the same as OR for bitmap scans. Andrei implemented that in [3], but that increases planning complexity and eliminates a significant part of the advantages of the OR-to-ANY transformation.

Links. 1. https://www.postgresql.org/message-id/CAPpHfduJtO0s9E%3DSHUTzrCD88BH0eik0UNog1_q3XBF2wLmH6g%40mail.gmail.com 2. https://www.postgresql.org/message-id/CAPpHfdtSXxhdv3mLOLjEewGeXJ%2BFtfhjqodn1WWuq5JLsKx48g%40mail.gmail.com 3. https://www.postgresql.org/message-id/6d27d752-db0b-4cac-9843-6ba3dd7a1e94%40postgrespro.ru

-- Regards, Alexander Korotkov
Re: post-freeze damage control
On Tue, Apr 9, 2024 at 7:27 PM Tom Lane wrote: > Peter Geoghegan writes: > > On Mon, Apr 8, 2024 at 10:12 PM Tom Lane wrote: > >> I don't know that I'd call it scary exactly, but I do think it > >> was premature. A week ago there was no consensus that it was > >> ready to commit, but Alexander pushed it (or half of it, anyway) > >> despite that. > > > Some of the most compelling cases for the transformation will involve > > path keys. If the transformation enables the optimizer to build a > > plain index scan (or index-only scan) with useful path keys, then that > > might well lead to a far superior plan compared to what's possible > > with BitmapOrs. > > I did not say it isn't a useful thing to have. I said the patch > did not appear ready to go in. > > > I understand that it'll still be possible to use OR expression > > evaluation in such cases, without applying the transformation (via > > filter quals), so in principle you don't need the transformation to > > get an index scan that can (say) terminate the scan early due to the > > presence of an "ORDER BY ... LIMIT n". But I suspect that that won't > > work out much of the time, because the planner will believe (rightly > > or wrongly) that the filter quals will incur too many heap page > > accesses. > > That's probably related to the fact that we don't have a mechanism > for evaluating non-indexed quals against columns that are retrievable > from the index. We really oughta work on getting that done. But > I've been keeping a side eye on the work to unify plain and index-only > scans, because that's going to touch a lot of the same code so it > doesn't seem profitable to pursue those goals in parallel. > > > Another problem (at least as I see it) with the new > > or_to_any_transform_limit GUC is that its design seems to have nothing > > to say about the importance of these sorts of cases. Most of these > > cases will only have 2 or 3 constants, just because that's what's most > > common in general. > > Yeah, that's one of the reasons I'm dubious that the committed > patch was ready. While inventing this GUC, I was thinking more about avoiding regressions rather than about unleashing the full power of this optimization. But now I see that that wasn't good enough. And it was definitely hasty to commit to this shape. I apologize for this. Tom, I think you are way more experienced in this codebase than me. And, probably more importantly, more experienced in making decisions for planner development. If you see some way forward to polish this post-commit, Andrei and I are ready to work hard on this with you. If you don't see (or don't think that's good), let's revert this. -- Regards, Alexander Korotkov
Re: macOS Ventura won't generate core dumps
Robert Haas writes: > On Tue, Apr 9, 2024 at 3:44 PM Tom Lane wrote: >> The usual gotchas apply: you need to have started the postmaster >> under "ulimit -c unlimited", and the /cores directory has to be >> writable by whatever user the postmaster is running as. I have >> occasionally seen system updates reduce the privileges on /cores, >> although not recently. > Interesting. I'm on Ventura 13.6.2. I think I've checked all of the > stuff you mention, but I'll do some more investigation. Huh, that's odd. One idea that comes to mind is that the core files are frickin' large, for me typically around 3.5GB-4GB even for a postmaster running with default shared memory sizes. (I don't know why, although it seems to me they were less ridiculous a few years ago.) Is it possible you're running out of space, or hitting a quota of some sort? regards, tom lane
Re: post-freeze damage control
Alexander Korotkov writes: > On Tue, Apr 9, 2024 at 5:12 AM Tom Lane wrote: >> * OrClauseGroupKey is not a Node type, so why does it have >> a NodeTag? I wonder what value will appear in that field, >> and what will happen if the struct is passed to any code >> that expects real Nodes. > I used that to put both not-subject-of-transform nodes together with > hash entries into the same list. This is used to save the order of > clauses. I think this is an important property, and I have already > expressed it in [1]. What exactly is the point of having a NodeTag in the struct though? If you don't need it to be a valid Node, that seems pointless and confusing. We certainly have plenty of other lists that contain plain structs without tags, so I don't buy that the List infrastructure is making you do that. regards, tom lane
Re: post-freeze damage control
Alexander Korotkov writes: > On Tue, Apr 9, 2024 at 7:27 PM Tom Lane wrote: >> Yeah, that's one of the reasons I'm dubious that the committed >> patch was ready. > While inventing this GUC, I was thinking more about avoiding > regressions rather than about unleashing the full power of this > optimization. But now I see that that wasn't good enough. And it was > definitely hasty to commit to this shape. I apologize for this. > Tom, I think you are way more experienced in this codebase than me. > And, probably more importantly, more experienced in making decisions > for planner development. If you see some way forward to polish this > post-commit, Andrei and I are ready to work hard on this with you. If > you don't see (or don't think that's good), let's revert this. It wasn't ready to commit, and I think trying to fix it up post feature freeze isn't appropriate project management. Let's revert it and work on it more in the v18 time frame. regards, tom lane
Re: post-freeze damage control
On Tue, Apr 9, 2024 at 11:37 PM Tom Lane wrote: > Alexander Korotkov writes: > > On Tue, Apr 9, 2024 at 5:12 AM Tom Lane wrote: > >> * OrClauseGroupKey is not a Node type, so why does it have > >> a NodeTag? I wonder what value will appear in that field, > >> and what will happen if the struct is passed to any code > >> that expects real Nodes. > > > I used that to put both not-subject-of-transform nodes together with > > hash entries into the same list. This is used to save the order of > > clauses. I think this is an important property, and I have already > > expressed it in [1]. > > What exactly is the point of having a NodeTag in the struct though? > If you don't need it to be a valid Node, that seems pointless and > confusing. We certainly have plenty of other lists that contain > plain structs without tags, so I don't buy that the List > infrastructure is making you do that. This code mixes Expr's and hash entries in the single list. The point of having a NodeTag in the struct is the ability to distinguish them later. -- Regards, Alexander Korotkov
Re: post-freeze damage control
On Tue, Apr 9, 2024 at 11:42 PM Tom Lane wrote: > Alexander Korotkov writes: > > On Tue, Apr 9, 2024 at 7:27 PM Tom Lane wrote: > >> Yeah, that's one of the reasons I'm dubious that the committed > >> patch was ready. > > > While inventing this GUC, I was thinking more about avoiding > > regressions rather than about unleashing the full power of this > > optimization. But now I see that that wasn't good enough. And it was > > definitely hasty to commit to this shape. I apologize for this. > > > Tom, I think you are way more experienced in this codebase than me. > > And, probably more importantly, more experienced in making decisions > > for planner development. If you see some way forward to polish this > > post-commit, Andrei and I are ready to work hard on this with you. If > > you don't see (or don't think that's good), let's revert this. > > It wasn't ready to commit, and I think trying to fix it up post > feature freeze isn't appropriate project management. Let's revert > it and work on it more in the v18 time frame. Ok, let's do this. I'd like to hear from you some directions for further development of this patch if possible. -- Regards, Alexander Korotkov
Re: Improve eviction algorithm in ReorderBuffer
On 09/04/2024 21:04, Jeff Davis wrote: On Fri, 2024-04-05 at 16:58 +0900, Masahiko Sawada wrote: I have some further comments and I believe changes are required for v17. I also noticed that the simplehash is declared in binaryheap.h with "SH_SCOPE extern", which seems wrong. Attached is a rough patch to bring the declarations into binaryheap.c. Note that the attached patch uses "SH_SCOPE static", which makes sense to me in this case, but causes a bunch of warnings in gcc. I will post separately about silencing that warning, but for now you can either use: SH_SCOPE static inline which is probably fine, but will encourage the compiler to inline more code, when not all callers even use the hash table. Alternatively, you can do: SH_SCOPE static pg_attribute_unused() which looks a bit wrong to me, but seems to silence the warnings, and lets the compiler decide whether or not to inline. Also probably needs comment updates, etc. Sorry I'm late to the party, I didn't pay attention to this thread earlier. But I wonder why this doesn't use the existing pairing heap implementation? I would assume that to be at least as good as the binary heap + hash table. And it's cheap to insert into (O(1)), so we could probably remove the MAX_HEAP_TXN_COUNT_THRESHOLD, and always keep the heap up-to-date. -- Heikki Linnakangas Neon (https://neon.tech)
Re: removal of '{' from WORD_BREAKS
Alexander Korotkov writes: > On Tue, Apr 9, 2024 at 6:18 PM Robert Haas wrote: >> I too felt uneasy about that commit, for the same reason. However, >> there is a justification for the change in the commit message which is >> not obviously wrong, namely that ":{?name} is the only psql syntax >> using the '{' sign". And in fact, SQL basically doesn't use '{' for >> anything, either. True. > FWIW, the default value of rl_basic_word_break_characters [1] has '{' > but doesn't have '}'. The documentation says that this should "break > words for completion in Bash". But I failed to find an explanation > why this should be so for Bash. As you correctly get, my idea was > that our SQL isn't not heavily using '{' unlike Bash. Yeah, there's no doubt that what we are currently using for WORD_BREAKS has mostly been cargo-culted from Bash rather than having any solid foundation in SQL syntax. It works all right for us today because we don't really try to complete anything in general SQL expression contexts, so as long as we have whitespace and parens in there we're more or less fine. I wonder a bit why comma isn't in there, though. As an example, vacuum (full,fre fails to complete "freeze", though it works fine with a space after the comma. I've not experimented, but it seems certain that it'd behave better with comma in WORD_BREAKS. Whether that'd pessimize other behaviors, I dunno. The longer-range concern that I have is that if we ever want to complete stuff within expressions, I think we're going to need all the valid operator characters to be in WORD_BREAKS. And that will be a problem for this patch, not because of '{' but because of '?'. So I'd be happier if the parsing were done in a way that did not rely on '{' and '?' being treated as word characters. But I've not looked into how hard that'd be. In any case, it's likely that expanding WORD_BREAKS like that would cause some other problems that'd have to be fixed, so it's not very reasonable to expect this patch to avoid a hypothetical future problem. regards, tom lane
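For concreteness, the comma experiment mentioned above would be roughly a one-character change to psql's break set. A hedged sketch follows; the base contents of the macro are quoted from memory and may not match the current tab-complete.c exactly:

/*
 * Hypothetical: add ',' to psql's completion word breaks so that
 * "vacuum (full,fre<TAB>" can complete "freeze".  The rest of the string
 * is an assumption about the existing definition, not a verified quote.
 */
#define WORD_BREAKS		"\t\n@$><=;|&(), "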
Re: post-freeze damage control
Alexander Korotkov writes: > On Tue, Apr 9, 2024 at 11:37 PM Tom Lane wrote: >> What exactly is the point of having a NodeTag in the struct though? >> If you don't need it to be a valid Node, that seems pointless and >> confusing. We certainly have plenty of other lists that contain >> plain structs without tags, so I don't buy that the List >> infrastructure is making you do that. > This code mixes Expr's and hash entries in the single list. The point > of having a NodeTag in the struct is the ability to distinguish them > later. If you're doing that, it really really ought to be a proper Node. If nothing else, that would aid debugging by allowing the list to be pprint'ed. regards, tom lane
Re: Issue with the PRNG used by Postgres
Hi, On 2024-04-08 22:52:09 -0700, Parag Paul wrote: > We have an interesting problem, where PG went to PANIC due to stuck > spinlock case. > On careful analysis and hours of trying to reproduce this(something that > showed up in production after almost 2 weeks of stress run), I did some > statistical analysis on the RNG generator that PG uses to create the > backoff for the spin locks. ISTM that the fix here is to not use a spinlock for whatever the contention is on, rather than improve the RNG. Greetings, Andres Freund
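Purely as an illustration of the alternative being suggested (the report does not identify which structure is actually contended): when the protected state is as simple as a counter, the spinlock, and with it the RNG-driven backoff, can often be removed entirely by using an atomic. A sketch under that assumption:

#include "postgres.h"
#include "port/atomics.h"

/*
 * For a real use this would live in shared memory; a file-static variable
 * is used here only to keep the sketch self-contained.
 */
static pg_atomic_uint32 hypothetical_counter;

void
hypothetical_counter_init(void)
{
	pg_atomic_init_u32(&hypothetical_counter, 0);
}

void
hypothetical_counter_bump(void)
{
	/* lock-free increment: no spinlock, so no spin-delay loop to tune */
	pg_atomic_fetch_add_u32(&hypothetical_counter, 1);
}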
Re: PostgreSQL 17 Release Management Team & Feature Freeze
On Mon, Apr 8, 2024 at 09:32:14PM +0200, Jelte Fennema-Nio wrote: > I'll sketch a situation: There's a big patch that some non-committer > submitted that has been sitting on the mailinglist for 6 months or > more, only being reviewed by other non-committers, which the submitter > quickly addresses. Then in the final commit fest it is finally > reviewed by a committer, and they say it requires significant changes. > Right now, the submitter can decide to quickly address those changes, > and hope to keep the attention of this committer, to hopefully get it > merged before the deadline after probably a few more back and forths. > But this new rule would mean that those significant changes would be a > reason not to put it in the upcoming release. Which I expect would > first of all really feel like a slap in the face to the submitter, > because it's not their fault that those changes were only requested in > the last commit fest. This would then likely result in the submitter > not following up quickly (why make time right at this moment, if > you're suddenly going to have another full year), which would then > cause the committer to lose context of the patch and thus interest in > the patch. And then finally getting into the exact same situation next > year in the final commit fest, when some other committer didn't agree > with the redesign of the first one and request a new one pushing it > another year. Yes, I can see this happening. :-( -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com Only you can decide what is important to you.
Re: PostgreSQL 17 Release Management Team & Feature Freeze
On Mon, Apr 8, 2024 at 10:41:17AM -0400, Robert Treat wrote: > Unfortunately many humans are hardwired towards procrastination and > last minute heroics (it's one reason why deadline driven development > works even though in the long run it tends to be bad practice), and > historically was one of the driving factors in why we started doing > commitfests in the first place (you should have seen the mad dash of > commits before we had that process), so ISTM that it's not completely > avoidable... > > That said, are you suggesting that the feature freeze deadline be > random, and also held in secret by the RMT, only to be announced after > the freeze time has passed? This feels weird, but might apply enough > deadline related pressure while avoiding last minute shenanigans. Committing code is a hard job, no question. However, committers have to give up the idea that they should wait for brilliant ideas before finalizing patches. If you come up with a better idea later, great, but don't wait to finalize patches. I used to write college papers much too late because I expected some brilliant idea to come to me, and it rarely happened. I learned to write the paper with the ideas I had, and if I come up with a better idea later, I can add it to the end. -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com Only you can decide what is important to you.
Re: Allow non-superuser to cancel superuser tasks.
Hi, thanks for looking into this. On Tue, 9 Apr 2024 at 08:53, Michael Paquier wrote: > On Mon, Apr 08, 2024 at 05:42:05PM +, Leung, Anthony wrote: > > Are you suggesting that we check if the backend is B_AUTOVAC in > > pg_cancel/ terminate_backend? That seems a bit unclean to me since > > pg_cancel_backend & pg_cancel_backend does not access to the > > procNumber to check the type of the backend. > > > > IMHO, we can keep SIGNAL_BACKEND_NOAUTOVACUUM but just improve the > > errmsg / errdetail to not expose that the backend is an AV > > worker. It'll also be helpful if you can suggest what errdetail we > > should use here. > > The thing is that you cannot rely on a lookup of the backend type for > the error information, or you open yourself to letting the caller of > pg_cancel_backend or pg_terminate_backend know if a backend is > controlled by a superuser or if a backend is an autovacuum worker. > Good catch. Thanks. I think we need to update the error message to not leak backend type info. > The choice of pg_signal_autovacuum is a bit inconsistent, as well, > because autovacuum workers operate like regular backends. This name > can also be confused with the autovacuum launcher. Ok. What would be a good choice? Is `pg_signal_autovacuum_worker` good enough?
Re: Speed up clean meson builds by ~25%
Hi, On 2024-04-09 17:13:52 +1200, Thomas Munro wrote: > On Tue, Apr 9, 2024 at 5:01 PM Michael Paquier wrote: > > On Mon, Apr 08, 2024 at 12:23:56PM +0300, Nazir Bilal Yavuz wrote: > > > On Mon, 8 Apr 2024 at 11:00, Alexander Lakhin wrote: > > >> As I wrote in [1], I didn't observe the issue with clang-18, so maybe it > > >> is fixed already. > > >> Perhaps it's worth rechecking... > > >> > > >> [1] > > >> https://www.postgresql.org/message-id/d2bf3727-bae4-3aee-65f6-caec2c4ebaa8%40gmail.com > > > > > > I had this problem on my local computer. My build times are: > > > > > > gcc: 20s > > > clang-15: 24s > > > clang-16: 105s > > > clang-17: 111s > > > clang-18: 25s > > > > Interesting. A parallel build of ecpg shows similar numbers here: > > clang-16: 101s > > clang-17: 112s > > clang-18: 14s > > gcc: 10s > > I don't expect it to get fixed BTW, because it's present in 16.0.6, > and .6 is the terminal release, if I understand their system > correctly. They're currently only doing bug fixes for 18, and even > there not for much longer. Interesting that not everyone saw this at > first, perhaps the bug arrived in a minor release that some people > didn't have yet? Or perhaps there is something special required to > trigger it? I think we need to do something about the compile time of this file, even with gcc. Our main grammar already is an issue and stacking all the ecpg stuff on top makes it considerably worse. ISTM there's a bunch of pretty pointless stuff in the generated preproc.y, which do seem to have some impact on compile time. E.g. a good bit of the file is just stuff like reserved_keyword: ALL { $$ = mm_strdup("all"); } ... Why are strduping all of these? We could instead just use the value of the token, instead of forcing the compiler to generate branches for all individual keywords etc. I don't know off-hand if the keyword lookup machinery ends up with an uppercase keyword, but if so, that'd be easy enough to change. It actually looks to me like the many calls to mm_strdup() might actually be what's driving clang nuts. I hacked up preproc.y to not need those calls for unreserved_keyword col_name_keyword type_func_name_keyword reserved_keyword bare_label_keyword by removing the actions and defining those tokens to be of type str. There are many more such calls that could be dealt with similarly. That alone reduced compile times with clang-16 -O1 from 18.268s to 12.516s clang-16 -O2 from 345.188 to 158.084s clang-19 -O2 from 26.018s to 15.200s I suspect what is happening is that clang tries to optimize the number of calls to mm_strdup(), by separating the argument setup from the function call. Which leads to a control flow graph with *many* incoming edges to the basic block containing the function call to mm_strdup(), triggering a normally harmless O(N^2) or such. Greetings, Andres Freund
Re: Allow non-superuser to cancel superuser tasks.
On Wed, Apr 10, 2024 at 12:52:19AM +0300, Kirill Reshke wrote: > On Tue, 9 Apr 2024 at 08:53, Michael Paquier wrote: >> The thing is that you cannot rely on a lookup of the backend type for >> the error information, or you open yourself to letting the caller of >> pg_cancel_backend or pg_terminate_backend know if a backend is >> controlled by a superuser or if a backend is an autovacuum worker. > > Good catch. Thanks. I think we need to update the error message to not > leak backend type info. Yep, that's necessary I am afraid. >> The choice of pg_signal_autovacuum is a bit inconsistent, as well, >> because autovacuum workers operate like regular backends. This name >> can also be confused with the autovacuum launcher. > > Ok. What would be a good choice? Is `pg_signal_autovacuum_worker` good > enough? Sounds fine to me. Perhaps others have an opinion about that? -- Michael
Re: Speed up clean meson builds by ~25%
Andres Freund writes: > I think we need to do something about the compile time of this file, even with > gcc. Our main grammar already is an issue and stacking all the ecpg stuff on > top makes it considerably worse. Seems reasonable, if we can. > Why are strduping all of these? IIRC, the issue is that the mechanism for concatenating the tokens back together frees the input strings static char * cat2_str(char *str1, char *str2) { char * res_str= (char *)mm_alloc(strlen(str1) + strlen(str2) + 2); strcpy(res_str, str1); if (strlen(str1) != 0 && strlen(str2) != 0) strcat(res_str, " "); strcat(res_str, str2); free(str1); <-- free(str2); <-- return res_str; } So that ought to dump core if you don't make all the productions return malloc'd strings. How did you work around that? (Maybe it'd be okay to just leak all the strings?) regards, tom lane
Re: Add SPLIT PARTITION/MERGE PARTITIONS commands
On Mon, Apr 8, 2024 at 11:43 PM Dmitry Koval wrote: > Attached fix for the problems found by Alexander Lakhin. > > About grammar errors. > Unfortunately, I don't know English well. > Therefore, I plan (in the coming days) to show the text to specialists > who perform technical translation of documentation. Thank you. I've pushed this fix with minor corrections from me. -- Regards, Alexander Korotkov
Re: Speed up clean meson builds by ~25%
Hi, On 2024-04-09 15:33:10 -0700, Andres Freund wrote: > Which leads to a control flow graph with *many* incoming edges to the basic > block containing the function call to mm_strdup(), triggering a normally > harmless O(N^2) or such. With clang-16 -O2 there is a basic block with 3904 incoming basic blocks. With the hacked up preproc.y it's 2968. A 30% increase leading to a doubling of runtime imo seems consistent with my theory of there being some ~quadratic behaviour. I suspect that this is also what's causing gram.c compilation to be fairly slow, with both clang and gcc. There aren't as many pstrdup()s in gram.y as there are mm_strdup()s in preproc.y, but there still are many. ISTM that there are many pstrdup()s that really make very little sense. I think it's largely because there are many rules declared %type <str>, which prevents us from returning a string constant without a warning. There may be (?) some rules that modify strings returned by subsidiary rules, but if so, it can't be many. Greetings, Andres
Re: Speed up clean meson builds by ~25%
Hi, On 2024-04-09 19:00:41 -0400, Tom Lane wrote: > Andres Freund writes: > > I think we need to do something about the compile time of this file, even > > with > > gcc. Our main grammar already is an issue and stacking all the ecpg stuff on > > top makes it considerably worse. > > Seems reasonable, if we can. > > > Why are strduping all of these? > > IIRC, the issue is that the mechanism for concatenating the tokens > back together frees the input strings Ah, that explains it - but also seems somewhat unnecessary. > So that ought to dump core if you don't make all the productions > return malloc'd strings. How did you work around that? I just tried to get to the point of understanding the reasons for slow compilation, not to actually keep it working :). I.e. I didn't. > (Maybe it'd be okay to just leak all the strings?) Hm. The input to ecpg can be fairly large, I guess. And we have fun code like cat_str(), which afaict is O(arguments^2) in its memory usage if we wouldn't free? Not immediately sure what the right path is. Greetings, Andres Freund
Re: post-freeze damage control
On 4/10/24 01:59, Andrey M. Borodin wrote: On 9 Apr 2024, at 18:45, Alvaro Herrera wrote: Maybe we should explicitly advise users to not delete that WAL from their archives, until pg_combinebackup is hammered a bit more. As a backup tool maintainer, I always reference to out-of-the box Postgres tools as some bulletproof alternative. I really would like to stick to this reputation and not discredit these tools. +1. Even so, only keeping WAL for the last backup is a dangerous move in any case. Lots of things can happen to a backup (other than bugs in the software) so keeping WAL back to the last full (or for all backups) is always an excellent idea. Regards, -David
wal_consistency_checking clean on HEAD
Hi all, I have been doing some checks with page masking and WAL consistency now that we are in feature freeze, and there are no failures on HEAD: cd src/test/recovery/ && \ PG_TEST_EXTRA=wal_consistency_checking \ PROVE_TESTS=t/027_stream_regress.pl make check It's been on my TODO list to automate that in one of my buildfarm animals, and never got down to do it. I've looked at the current animal fleet, and it looks that we don't have one yet. Perhaps I've just missed something? Thanks, -- Michael
Re: wal_consistency_checking clean on HEAD
On Tue, Apr 9, 2024 at 7:35 PM Michael Paquier wrote: > It's been on my TODO list to automate that in one of my buildfarm > animals, and never got down to do it. I've looked at the current > animal fleet, and it looks that we don't have one yet. Perhaps I've > just missed something? wal_consistency_checking is very useful in general. I find myself using it fairly regularly. That's probably why it's not finding anything now: most people working on something that touches WAL already know that testing their patch with wal_consistency_checking early is a good idea. Of course it also wouldn't be a bad idea to have a BF animal for that, especially because we already have BF animals that test things far more niche than this. -- Peter Geoghegan
Re: Speed up clean meson builds by ~25%
Andres Freund writes: > On 2024-04-09 19:00:41 -0400, Tom Lane wrote: >> Andres Freund writes: >>> Why are strduping all of these? >> IIRC, the issue is that the mechanism for concatenating the tokens >> back together frees the input strings > Ah, that explains it - but also seems somewhat unnecessary. I experimented with replacing mm_strdup() with #define mm_strdup(x) (x) As you did, I wasn't trying to get to a working result, so I didn't do anything about removing all the free's or fixing the cast-away-const warnings. The result was disappointing though. On my Mac laptop (Apple clang version 15.0.0), the compile time for preproc.o went from 6.7sec to 5.5sec. Which is better, but not enough better to persuade me to do all the janitorial work of restructuring ecpg's string-slinging. I think we haven't really identified the problem. As a comparison point, compiling gram.o on the same machine takes 1.3sec. So I am seeing a problem here, sure enough, although not as bad as it is in some other clang versions. regards, tom lane
Re: post-freeze damage control
On Wed, Apr 10, 2024 at 09:29:38AM +1000, David Steele wrote: > Even so, only keeping WAL for the last backup is a dangerous move in any > case. Lots of things can happen to a backup (other than bugs in the > software) so keeping WAL back to the last full (or for all backups) is > always an excellent idea. Yeah, that's an excellent practice, but it is why I'm less worried about this feature. The docs at [1] caution about "not to remove earlier backups if they might be needed when restoring later incremental backups". Like Alvaro said, should we insist a bit more about the WAL retention part in this section of the docs, down to the last full backup? [1]: https://www.postgresql.org/docs/devel/continuous-archiving.html#BACKUP-INCREMENTAL-BACKUP -- Michael
Re: Fixup some StringInfo usages
On Tue, 9 Apr 2024 at 14:27, Tom Lane wrote: > > David Rowley writes: > > Similar to f736e188c, I've attached a patch that fixes up a few > > misusages of the StringInfo functions. These just swap one function > > call for another function that is more suited to the use case. > > > I feel like it's a good idea to fix these soon while they're new > > rather than wait N years and make backpatching bug fixes possibly > > harder. > > +1. Looks good in a quick eyeball scan. Thank you to both of you for having a look. Pushed. David
Re: "an SQL" vs. "a SQL"
On Tue, 9 Apr 2024 at 16:18, David Rowley wrote: > There's just 1 instance of "a SQL" that crept into PG16 after > d866f0374. This probably means I'd be better off doing this in June a > few weeks before branching... > > Patch attached. Pushed. David
Re: Speed up clean meson builds by ~25%
Hi, On 2024-04-09 19:44:03 -0400, Tom Lane wrote: > I experimented with replacing mm_strdup() with > > #define mm_strdup(x) (x) > > As you did, I wasn't trying to get to a working result, so I didn't do > anything about removing all the free's or fixing the cast-away-const > warnings. The result was disappointing though. On my Mac laptop > (Apple clang version 15.0.0), the compile time for preproc.o went from > 6.7sec to 5.5sec. Which is better, but not enough better to persuade > me to do all the janitorial work of restructuring ecpg's > string-slinging. I think we haven't really identified the problem. With what level of optimization was that? It kinda looks like their version might be from before the worst of the issue... FWIW, just redefining mm_strdup() that way doesn't help much here either, even with an affected compiler. The gain increases substantially after simplifying unreserved_keyword etc to just use the default action. I think having the non-default actions for those branches leaves you with a similar issue, each of the actions just set a register, storing that and going to the loop iteration is the same. FWIW, with clang-19 -O2: "plain" 0m24.354s; mm_strdup redefined 0m23.741s; +use default action 0m14.218s. Greetings, Andres Freund
Re: Speed up clean meson builds by ~25%
Andres Freund writes: > On 2024-04-09 19:44:03 -0400, Tom Lane wrote: >> As you did, I wasn't trying to get to a working result, so I didn't do >> anything about removing all the free's or fixing the cast-away-const >> warnings. The result was disappointing though. On my Mac laptop >> (Apple clang version 15.0.0), the compile time for preproc.o went from >> 6.7sec to 5.5sec. Which is better, but not enough better to persuade >> me to do all the janitorial work of restructuring ecpg's >> string-slinging. I think we haven't really identified the problem. > With what level of optimization was that? It kinda looks like their version > might be from before the worst of the issue... Just the autoconf-default -O2. > FWIW, just redefining mm_strdup() that way doesn't help much here either, > even with an affected compiler. The gain increases substantially after > simplifying unreserved_keyword etc to just use the default action. Hm. In any case, this is all moot unless we can come to a new design for how ecpg does its string-mashing. Thoughts? I thought for a bit about not allocating strings as such, but just passing around pointers into the source text plus lengths, and reassembling the string data only at the end when we need to output it. Not sure how well that would work, but it could be a starting point. regards, tom lane
Re: Speed up clean meson builds by ~25%
Hi, On 2024-04-09 20:12:48 -0400, Tom Lane wrote: > Andres Freund writes: > > FWIW, just redefining mm_strdup() that way doesn't help much here either, > > even with an affected compiler. The gain increases substantially after > > simplifying unreserved_keyword etc to just use the default action. > > Hm. > > In any case, this is all moot unless we can come to a new design for > how ecpg does its string-mashing. Thoughts? There might be a quick-n-dirty way: We could make pgc.l return dynamically allocated keywords. Am I missing something, or is ecpg string handling almost comically inefficient? Building up strings in tiny increments, which then get mashed together to get slightly larger pieces, just to then be mashed together again? It's like an intentional allocator stress test. It's particularly absurd because in the end we just print those strings, after carefully assembling them... > I thought for a bit about not allocating strings as such, but just > passing around pointers into the source text plus lengths, and > reassembling the string data only at the end when we need to output it. > Not sure how well that would work, but it could be a starting point. I was wondering about something vaguely similar: Instead of concatenating individual strings, append them to a stringinfo. The stringinfo can be reset individually... Greetings, Andres Freund
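A minimal sketch of that stringinfo idea, assuming the assembled text is ultimately just written out; everything here other than the stringinfo API itself is invented for illustration:

#include "postgres_fe.h"
#include "lib/stringinfo.h"

static StringInfoData output_buf;

/* append one fragment produced by the grammar, instead of strdup + concat */
static void
emit_fragment(const char *text)
{
	if (output_buf.data == NULL)
		initStringInfo(&output_buf);
	appendStringInfoString(&output_buf, text);
	appendStringInfoChar(&output_buf, ' ');
}

/* write out the accumulated statement and reuse the buffer for the next one */
static void
flush_statement(FILE *out)
{
	if (output_buf.data != NULL)
	{
		fputs(output_buf.data, out);
		resetStringInfo(&output_buf);
	}
}

The main cost of this shape is that rules which need to reorder or inject text can no longer simply splice strings together.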
Re: broken JIT support on Fedora 40
On Tue, Apr 9, 2024 at 10:05 PM Dmitry Dolgov <9erthali...@gmail.com> wrote: > + /* In assertion builds, run the LLVM verify pass. */ > +#ifdef USE_ASSERT_CHECKING > + LLVMPassBuilderOptionsSetVerifyEach(options, true); > +#endif Thanks, that seems nicer. I think the question is whether it will slow down build farm/CI/local meson test runs to a degree that exceeds its value. Another option would be to have some other opt-in macro, like the existing #ifdef LLVM_PASS_DEBUG, for people who maintain JIT-related stuff to turn on. Supposing we go with USE_ASSERT_CHECKING, I have another question: - const char *nm = "llvm.lifetime.end.p0i8"; + const char *nm = "llvm.lifetime.end.p0"; Was that a mistake, or did the mangling rules change in some version? I don't currently feel inclined to go and test this on the ancient versions we claim to support in back-branches. Perhaps we should just do this in master, and then it'd be limited to worrying about LLVM versions 10-18 (see 820b5af7), which have the distinct advantage of being available in package repositories for testing. Or I suppose we could back-patch, but only do it if LLVM_VERSION_MAJOR >= 10. Or we could do it unconditionally, and wait for ancient-LLVM build farm animals to break if they're going to. I pushed the illegal attribute fix though. Thanks for the detective work! (It crossed my mind that perhaps deform functions should have their own template function, but if someone figures out that that's a good idea, I think we'll *still* need that change just pushed.)
Re: Why is parula failing?
On Tue, 9 Apr 2024 at 15:48, David Rowley wrote: > Still no partition_prune failures on master since the compiler version > change. There has been one [1] in REL_16_STABLE. I'm thinking it > might be worth backpatching the partition_prune debug to REL_16_STABLE > to see if we can learn anything new from it. Master failed today for the first time since the compiler upgrade. Again reltuples == 48. https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=parula&dt=2024-04-10%2000%3A27%3A02 David
Re: Speed up clean meson builds by ~25%
Andres Freund writes: > On 2024-04-09 20:12:48 -0400, Tom Lane wrote: >> In any case, this is all moot unless we can come to a new design for >> how ecpg does its string-mashing. Thoughts? > Am I missing something, or is ecpg string handling almost comically > inefficient? Building up strings in tiny increments, which then get mashed > together to get slightly larger pieces, just to then be mashed together again? > It's like an intentional allocator stress test. > It's particularly absurd because in the end we just print those strings, after > carefully assembling them... It is that. Here's what I'm thinking: probably 90% of what ecpg does is to verify that a chunk of its input represents a valid bit of SQL (or C) syntax and then emit a representation of that chunk. Currently, that representation tends to case-normalize tokens and smash inter-token whitespace and comments to a single space. I propose though that neither of those behaviors is mission-critical, or even all that desirable. I think few users would complain if ecpg preserved the input's casing and spacing and comments. Given that definition, most of ecpg's productions (certainly just about all the auto-generated ones) would simply need to return a pointer and length describing a part of the input string. There are places where ecpg wants to insert some text it generates, and I think it might need to re-order text in a few places, so we need a production result representation that can cope with those cases --- but if we can make "regurgitate the input" cases efficient, I think we'll have licked the performance problem. With that in mind, I wonder whether we couldn't make the simple cases depend on bison's existing support for location tracking. In which case, the actual actions for all those cases could be default, achieving one of the goals you mention. Obviously, this is not going to be a small lift, but it kind of seems do-able. regards, tom lane
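A hedged sketch of what that pointer-plus-length representation could look like; every name here is invented, and rules that inject or reorder text would still need a separate materialized form:

#include <stddef.h>

/* a production's semantic value: a slice of the original input, not a copy */
typedef struct ECPGSpan
{
	const char *start;			/* points into the scanner's input buffer */
	size_t		len;
} ECPGSpan;

/* join two slices that are adjacent in the input, without copying anything */
static ECPGSpan
span_join(ECPGSpan a, ECPGSpan b)
{
	ECPGSpan	r;

	r.start = a.start;
	r.len = (size_t) ((b.start + b.len) - a.start);
	return r;
}

With bison location tracking, the default action could in principle produce such a slice for free, which is the attraction of making the regurgitate-the-input cases the cheap path.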
Re: Why is parula failing?
On Wed, 10 Apr 2024 at 10:24, David Rowley wrote: > Master failed today for the first time since the compiler upgrade. > Again reltuples == 48. From the buildfarm members page, parula seems to be the only aarch64 + gcc 13.2 combination today, so I wonder whether this is about gcc v13.2 maturity on aarch64. I'll try to upgrade one of the other aarch64s I have (massasauga or snakefly) and see if this is more about gcc 13.2 maturity on this architecture. - robins
Re: Improve eviction algorithm in ReorderBuffer
On Tue, 2024-04-09 at 23:49 +0300, Heikki Linnakangas wrote: > > I wonder why this doesn't use the existing pairing heap > implementation? I would assume that to be at least as good as the > binary > heap + hash table I agree that an additional hash table is not needed -- there's already a hash table to do a lookup based on xid in reorderbuffer.c. I had suggested that the heap could track the element indexes for efficient update/removal, but that would be a change to the binaryheap.h API, which would require some discussion (and possibly not be acceptable post-freeze). But I think you're right: a pairing heap already solves the problem without modification. (Note: our pairing heap API doesn't explicitly support updating a key, so I think it would need to be done with remove/add.) So we might as well just do that right now rather than trying to fix the way the hash table is being used or trying to extend the binaryheap API. Of course, we should measure to be sure there aren't bad cases around updating/removing a key, but I don't see a fundamental reason that it should be worse. Regards, Jeff Davis
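For reference, the remove/add dance with the existing pairing heap API would look roughly like the sketch below; the entry struct is illustrative and is not the actual ReorderBufferTXN layout:

#include "postgres.h"
#include "lib/pairingheap.h"

typedef struct TxnSizeEntry
{
	pairingheap_node ph_node;	/* embedded heap link */
	Size		size;			/* bytes used by this transaction */
} TxnSizeEntry;

/* "update the key" by detaching the node, changing it, and re-inserting it */
static void
txn_update_size(pairingheap *heap, TxnSizeEntry *entry, Size new_size)
{
	pairingheap_remove(heap, &entry->ph_node);
	entry->size = new_size;
	pairingheap_add(heap, &entry->ph_node);
}

Whether doing this on every size change is cheap enough in practice is the measurement question raised above.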
Requiring LLVM 14+ in PostgreSQL 18
Hi PostgreSQL 18 will ship after these vacuum horizon systems reach EOL[1]: animal | arch | llvm_version | os | os_release | end_of_support ---+-+--+++ branta | s390x | 10.0.0 | Ubuntu | 20.04 | 2025-04-01 splitfin | aarch64 | 10.0.0 | Ubuntu | 20.04 | 2025-04-01 urutau | s390x | 10.0.0 | Ubuntu | 20.04 | 2025-04-01 massasauga | aarch64 | 11.1.0 | Amazon | 2 | 2025-06-30 snakefly | aarch64 | 11.1.0 | Amazon | 2 | 2025-06-30 Therefore, some time after the tree re-opens for hacking, we could rip out a bunch of support code for LLVM 10-13, and then rip out support for pre-opaque-pointer mode. Please see attached. [1] https://www.postgresql.org/message-id/CA%2BhUKG%2B-g61yq7Ce4aoZtBDO98b4GXH8Cu3zxVk-Zn1Vh7TKpA%40mail.gmail.com From f5de5c6535b825033b1829eaf340baacc10ed654 Mon Sep 17 00:00:00 2001 From: Thomas Munro Date: Thu, 19 Oct 2023 04:45:46 +1300 Subject: [PATCH 1/2] jit: Require at least LLVM 14, if enabled. Remove support for LLVM versions 10-13. The default on all non-EOL'd OSes represented in our build farm will be at least LLVM 14 when PostgreSQL 18 ships. Discussion: https://postgr.es/m/CA%2BhUKGLhNs5geZaVNj2EJ79Dx9W8fyWUU3HxcpZy55sMGcY%3DiA%40mail.gmail.com --- config/llvm.m4 | 8 +- configure | 8 +- doc/src/sgml/installation.sgml | 4 +- meson.build | 2 +- src/backend/jit/llvm/llvmjit.c | 109 src/backend/jit/llvm/llvmjit_error.cpp | 25 -- src/backend/jit/llvm/llvmjit_inline.cpp | 13 --- src/backend/jit/llvm/llvmjit_wrap.cpp | 4 - 8 files changed, 11 insertions(+), 162 deletions(-) diff --git a/config/llvm.m4 b/config/llvm.m4 index 44769d819a0..6ca088b5a45 100644 --- a/config/llvm.m4 +++ b/config/llvm.m4 @@ -13,7 +13,7 @@ AC_DEFUN([PGAC_LLVM_SUPPORT], AC_REQUIRE([AC_PROG_AWK]) AC_ARG_VAR(LLVM_CONFIG, [path to llvm-config command]) - PGAC_PATH_PROGS(LLVM_CONFIG, llvm-config llvm-config-17 llvm-config-16 llvm-config-15 llvm-config-14 llvm-config-13 llvm-config-12 llvm-config-11 llvm-config-10) + PGAC_PATH_PROGS(LLVM_CONFIG, llvm-config llvm-config-18 llvm-config-17 llvm-config-16 llvm-config-15 llvm-config-14) # no point continuing if llvm wasn't found if test -z "$LLVM_CONFIG"; then @@ -25,14 +25,14 @@ AC_DEFUN([PGAC_LLVM_SUPPORT], AC_MSG_ERROR([$LLVM_CONFIG does not work]) fi # and whether the version is supported - if echo $pgac_llvm_version | $AWK -F '.' '{ if ([$]1 >= 10) exit 1; else exit 0;}';then -AC_MSG_ERROR([$LLVM_CONFIG version is $pgac_llvm_version but at least 10 is required]) + if echo $pgac_llvm_version | $AWK -F '.' 
'{ if ([$]1 >= 14) exit 1; else exit 0;}';then +AC_MSG_ERROR([$LLVM_CONFIG version is $pgac_llvm_version but at least 14 is required]) fi AC_MSG_NOTICE([using llvm $pgac_llvm_version]) # need clang to create some bitcode files AC_ARG_VAR(CLANG, [path to clang compiler to generate bitcode]) - PGAC_PATH_PROGS(CLANG, clang clang-17 clang-16 clang-15 clang-14 clang-13 clang-12 clang-11 clang-10) + PGAC_PATH_PROGS(CLANG, clang clang-18 clang-17 clang-16 clang-15 clang-14) if test -z "$CLANG"; then AC_MSG_ERROR([clang not found, but required when compiling --with-llvm, specify with CLANG=]) fi diff --git a/configure b/configure index cfbd2a096f4..ba4ce5a5282 100755 --- a/configure +++ b/configure @@ -5065,7 +5065,7 @@ if test "$with_llvm" = yes; then : if test -z "$LLVM_CONFIG"; then - for ac_prog in llvm-config llvm-config-17 llvm-config-16 llvm-config-15 llvm-config-14 llvm-config-13 llvm-config-12 llvm-config-11 llvm-config-10 + for ac_prog in llvm-config llvm-config-18 llvm-config-17 llvm-config-16 llvm-config-15 llvm-config-14 do # Extract the first word of "$ac_prog", so it can be a program name with args. set dummy $ac_prog; ac_word=$2 @@ -5129,8 +5129,8 @@ fi as_fn_error $? "$LLVM_CONFIG does not work" "$LINENO" 5 fi # and whether the version is supported - if echo $pgac_llvm_version | $AWK -F '.' '{ if ($1 >= 10) exit 1; else exit 0;}';then -as_fn_error $? "$LLVM_CONFIG version is $pgac_llvm_version but at least 10 is required" "$LINENO" 5 + if echo $pgac_llvm_version | $AWK -F '.' '{ if ($1 >= 14) exit 1; else exit 0;}';then +as_fn_error $? "$LLVM_CONFIG version is $pgac_llvm_version but at least 14 is required" "$LINENO" 5 fi { $as_echo "$as_me:${as_lineno-$LINENO}: using llvm $pgac_llvm_version" >&5 $as_echo "$as_me: using llvm $pgac_llvm_version" >&6;} @@ -5138,7 +5138,7 @@ $as_echo "$as_me: using llvm $pgac_llvm_version" >&6;} # need clang to create some bitcode files if test -z "$CLANG"; then - for ac_prog in clang clang-17 clang-16 clang-15 clang-14 clang-13 clang-12 clang-11 clang-10 + for ac_prog in clang clang-18 clang-17 clang-16 clang-15 clang-14 do # Extract the first word of "$ac_prog", so it can be a progra
Re: post-freeze damage control
On Tue, Apr 9, 2024 at 2:47 AM Robert Haas wrote: > > - I'm slightly worried about the TID store work (ee1b30f12, 30e144287, > 667e65aac35), perhaps for no reason. Actually, the results seem really > impressive, First, thanks for the compliment. I actually suspect if we had this years ago, it might never have occurred to anyone to go through the trouble of adding parallel index cleanup. In a funny way, it's almost too effective -- the claim is that m_w_m over 1GB is perfectly usable, but I haven't been able to get anywhere near that via vacuum (we have indirectly, via bespoke dev code, but...). I don't have easy access to hardware that can hold a table big enough to do so -- vacuuming 1 billion records stays under 400MB. > and a very quick look at some of the commits seemed kind > of reassuring. But, like the changes to pruning and freezing, this is > making some really fundamental changes to critical code. In this case, > it's code that has evolved very little over the years and seems to > have now evolved quite a lot. True. I'd say that at a high level, storage and retrieval of TIDs is a lot simpler conceptually than other aspects of vacuuming. The low-level guts are now much more complex, but I'm confident it won't just output a wrong answer. That aspect has been working for a long time, and when it has broken during development, it fails very quickly and obviously. The more challenging aspects are less cut-and-dried, like memory management, delegation of responsibility, how to expose locking (which vacuum doesn't even use), readability/maintainability. Those are more subjective, but it seems to have finally clicked into place in a way that feels like the right trade-offs have been made. That's hand-wavy, I realize. The more recent follow-up commits are pieces that were discussed and planned for earlier, but have had less review and were left out from the initial commits since they're not essential to the functionality. I judged that test coverage was enough to have confidence in them. On Tue, Apr 9, 2024 at 6:33 AM Michael Paquier wrote: > > This worries me less than other patches like the ones around heap > changes or the radix tree stuff for TID collection plugged into > vacuum, which don't have explicit on/off switches AFAIK. Yes, there is no switch. Interestingly enough, the previous item array ended up with its own escape hatch to tamp down its eager allocation, autovacuum_work_mem. Echoing what I said to Robert, if we had the new storage years ago, I doubt this GUC would ever have been proposed. I'll also mention the array is still in the code base, but only in a test module as a standard to test against. Hopefully that offers some reassurance.
Re: Improve eviction algorithm in ReorderBuffer
On Tue, Apr 09, 2024 at 06:24:43PM -0700, Jeff Davis wrote: > I had suggested that the heap could track the element indexes for > efficient update/removal, but that would be a change to the > binaryheap.h API, which would require some discussion (and possibly not > be acceptable post-freeze). > > But I think you're right: a pairing heap already solves the problem > without modification. (Note: our pairing heap API doesn't explicitly > support updating a key, so I think it would need to be done with > remove/add.) So we might as well just do that right now rather than > trying to fix the way the hash table is being used or trying to extend > the binaryheap API. > > Of course, we should measure to be sure there aren't bad cases around > updating/removing a key, but I don't see a fundamental reason that it > should be worse. This is going to require a rewrite of 5bec1d6bc5e3 with a new performance study, which strikes me as something that we'd better not do after feature freeze. Wouldn't the best way forward be to revert 5bec1d6bc5e3 and revisit the whole in v18? I have added an open item, for now. -- Michael
Re: Improve eviction algorithm in ReorderBuffer
On Wed, 2024-04-10 at 12:13 +0900, Michael Paquier wrote: > Wouldn't the best way forward be to revert > 5bec1d6bc5e3 and revisit the whole in v18? That's a reasonable conclusion. Also consider commits b840508644 and bcb14f4abc. I had tried to come up with a narrower fix, and I think it's already been implemented here in approach 2: https://www.postgresql.org/message-id/CAD21AoAtf12e9Z9NLBuaO1GjHMMo16_8R-yBu9Q9jrk2QLqMEA%40mail.gmail.com but it does feel wrong to introduce an unnecessary hash table in 17 when we know it's not the right solution. Regards, Jeff Davis
Re: Improve eviction algorithm in ReorderBuffer
On Tue, Apr 09, 2024 at 09:16:53PM -0700, Jeff Davis wrote: > On Wed, 2024-04-10 at 12:13 +0900, Michael Paquier wrote: >> Wouldn't the best way forward be to revert >> 5bec1d6bc5e3 and revisit the whole in v18? > > Also consider commits b840508644 and bcb14f4abc. Indeed. These are also linked. -- Michael
Re: Weird test mixup
On Tue, Apr 09, 2024 at 12:41:57PM +0900, Michael Paquier wrote: > Applied that after a second check. And thanks to Bharath for the > poke. And now that the buildfarm is cooler, I've also applied the final patch in the series as of 5105c9079681 to make the GIN module concurrent-safe using injection_points_set_local(). -- Michael