Re: Add 64-bit XIDs into PostgreSQL 15
Hi; Trying to discuss where we are talking past each other. On Fri, Nov 25, 2022 at 9:38 AM Aleksander Alekseev < aleksan...@timescale.com> wrote: > Hi hackers, > > > I'm wondering whether the safest way to handle this is by creating a > > new TAM called "heap64", so that all storage changes happen there. > > > Many current users see stability as one of the greatest strengths of > > Postgres, so while I very much support this move, I wonder if this > > gives us a way to have both stability and innovation at the same time? > > That would be nice. > > However, from what I see TransactionId is a type used globally in > PostgreSQL. It is part of structures used by the TAM interface, used in > WAL records, etc. So we will have to teach these components to work > with 64-bit XIDs anyway and then start thinking about cases like: when > a user runs a transaction affecting two tables, a heap32 one and a > heap64 one, and we will have to figure out which tuples are visible and > which are not. This perhaps is doable but the maintenance burden for > the project will be too high IMO. > > It seems to me that the best option we can offer for the users looking > for stability is to use the latest PostgreSQL version with 32-bit > XIDs. Assuming these users care that much about this particular design > choice, of course. > I didn't see any changes to pg_upgrade to make this change possible on upgrade. Is that also outside of the scope of your patch set? If so, how is that continuity supposed to be ensured? Also related to that, I think you would have to have a check on streaming replication that both instances use the same xid format (that you don't accidentally upgrade this somehow), since this is set per db cluster, right? > > > The whole project seems to just ignore basic, pertinent questions. > > Questions like: why are we falling behind like this in the first > > place? And: If we don't catch up soon, why should we be able to catch > > up later on? Falling behind on freezing is still a huge problem with > > 64-bit XIDs. > > Is the example I provided above wrong? > > """ > Consider the case when you run a slow OLAP query that takes 12h to > complete and 100K TPS of fast OLTP-type queries on the same system. > The fast queries will consume all 32-bit XIDs in less than 12 hours, > while the OLAP query started 12 hours ago didn't finish yet and thus > its tuples can't be frozen. > """ > > If it is, please let me know. I would very much like to know if my > understanding here is flawed. > So, you have described a scenario we cannot support today (because xids would be exhausted within 5.5 hours at that transactional rate). Additionally, as PostgreSQL becomes more capable, this sort of scale will increasingly be within reach, and that is an important point in favor of this effort. This being said, there is another set of xid wraparound cases, today much larger in number, that I think would be hurt if this patchset were to be accepted into Postgres without mitigating measures which you consider out of bounds -- cases like Mailchimp, Adjust, and the like. This is why I keep stressing this, and I don't think waving away concerns about use cases outside of the one you are focusing on is helpful, particularly from those of us who have faced xid wraparounds in these cases in the past. In these cases, database teams are usually faced with an operational emergency while tools like vacuum, pg_repack, etc. are severely degraded due to getting so far behind on freezing.
The deeper the hole, the harder it will be to dig out of. Every large-scale high-throughput database I have ever worked on had long-running query alerts precisely because of the impact on vacuum and the downstream performance impacts. I would love to get to a point where this wasn't necessary and maybe in a few specific workloads we might be there very soon. The effort you are engaging in here is an important part of the path to get there, but let's not forget the people who today are facing xid wraparounds due to vacuum problems and what this sort of set of changes will mean for them.
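As a rough sanity check on the numbers in this exchange: a 32-bit XID space leaves roughly 2 billion usable XIDs, so a sustained 100K TPS of write transactions burns through them in about 2e9 / 1e5 = 20,000 seconds, i.e. roughly 5.5 hours. A minimal monitoring sketch (ordinary catalog queries, not part of the patch set) for watching how far freezing is falling behind:

-- Age of the oldest unfrozen XID per database; values approaching
-- autovacuum_freeze_max_age mean freezing is not keeping up.
SELECT datname, age(datfrozenxid) AS xid_age
FROM pg_database
ORDER BY xid_age DESC;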
Re: [PATCH] Allow specification of custom slot for custom nodes
On Tue, Nov 22, 2022 at 5:50, Alexander Korotkov wrote: > > On Mon, Nov 21, 2022 at 4:34 PM Pavel Borisov wrote: > > The following review has been posted through the commitfest application: > > make installcheck-world: tested, passed > > Implements feature: tested, passed > > Spec compliant: not tested > > Documentation: not tested > > > > I've looked at this patch and don't see any problems with it. It is > > minimally invasive, it doesn't affect functionality unless anyone (e.g. an > > extension) sets its own slotOps in CustomScanState. > > Furthermore, the current patch very slightly modifies patch 0b03e5951bf0 > > with the intention of introducing extensibility. So I think adding more > > extensibility regarding different tuple formats is an excellent thing to do. > > > > I'm going to mark it as RfC if there are no objections. > > Thank you for your feedback. I also don't see how this patch could > affect anybody. > I'm going to push this if there are no objections. I see this was pushed (cee1209514) so I have closed it in the CF app. Thanks Ian Barwick
Re: Avoid streaming the transaction which are skipped (in corner cases)
On Sat, Nov 26, 2022 at 10:59 AM Dilip Kumar wrote: > > On Fri, Nov 25, 2022 at 4:04 PM Ashutosh Bapat > wrote: > > > > Excellent catch. We were looking at this code last week and wondered > > about the purpose of this abort. Probably we should have some macro or > > function to decide whether to skip a transaction based on the log record. > > That will avoid using different values in different places. > > We do have a common function, i.e. SnapBuildXactNeedsSkip(), but there > are two problems: 1) it has a dependency on the input parameter, so the > result may vary based on the input; 2) it checks only the LSN, but there are other factors -- dbid and originid -- based on which a > transaction could also be skipped during DecodeCommit. So I think one > possible solution could be to remember the dbid and originid in > ReorderBufferTXN as soon as we get the first change which has valid > values for these parameters. > But is the required information, say 'dbid', available in all records? For example, what about XLOG_XACT_INVALIDATIONS? The other thing to consider in this regard is that if we are planning to have additional information, as mentioned by me in another email, to decide whether to stream or not, then the additional checks may be redundant anyway. It is a good idea to have a common check at both places, but if not, we can at least add some comments to say why the check is different. -- With Regards, Amit Kapila.
Re: postgres_fdw: batch inserts vs. before row triggers
On Sat, Nov 26, 2022 at 1:57 AM Tom Lane wrote: > I don't think the committed patch is acceptable at all, at least > not in the back branches, because it creates a severe ABI break. > Specifically, by adding a field to ResultRelInfo you have changed > the array stride of es_result_relations, and that will break any > previously-compiled extension code that accesses that array. Ugh. > I'm not terribly pleased with it having added a field to EState > either. That seems much more global than what we need here. The field stores pending buffered inserts, and I added it to EState so that it can be shared across primary/secondary ModifyTable nodes. (Re-)consider this:

create extension postgres_fdw;
create server loopback foreign data wrapper postgres_fdw options (dbname 'postgres');
create user mapping for current_user server loopback;
create table t1 (a text, b int);
create foreign table ft1 (a text, b int) server loopback options (table_name 't1');
create table w1 (a text, b int);
create function ft1_rowcount_trigf() returns trigger language plpgsql as
$$ begin raise notice '%: there are % rows in ft1', tg_name, (select count(*) from ft1); return new; end; $$;
create trigger ft1_rowcount_trigger before insert on w1 for each row execute function ft1_rowcount_trigf();
alter server loopback options (add batch_size '10');
with t as (insert into w1 values ('foo', 10), ('bar', 20) returning *) insert into ft1 select * from t;
NOTICE: ft1_rowcount_trigger: there are 0 rows in ft1
NOTICE: ft1_rowcount_trigger: there are 1 rows in ft1
INSERT 0 2

For this query, the primary ModifyTable node doing batch insert is executed concurrently with the secondary ModifyTable node doing the modifying CTE, and in the secondary ModifyTable node, any pending buffered insert done in the primary ModifyTable node needs to be flushed before firing the BEFORE ROW trigger, so the row is visible to the trigger. The field is useful for cases like this. > Couldn't we add the field to ModifyTableState, instead? We could probably do so, but I thought having a global list would be more efficient for handling pending buffered inserts than that. Anyway I will work on this further. Thanks for looking at this! Best regards, Etsuro Fujita
Re: Avoid distributing new catalog snapshot again for the transaction being decoded.
On Fri, Nov 25, 2022 at 5:30 PM Ashutosh Bapat wrote: > > Hi Hou, > Thanks for the patch. With a simple condition, we can eliminate the > need to queue the snapshot change in the current transaction and then > apply it. Saves some memory and computation. This looks useful. > > When the queued snapshot change is processed in > ReorderBufferProcessTXN(), we call SetupHistoricSnapshot(). But I > didn't find a path through which SetupHistoricSnapshot() is called > from SnapBuildCommitTxn(). > It will be called after SnapBuildCommitTxn() via ReorderBufferCommit()->ReorderBufferReplay()->ReorderBufferProcessTXN(). But I think what I don't see is how the snapshot we build in SnapBuildCommitTxn() will be assigned as a historic snapshot. We assign base_snapshot as a historic snapshot, and the new snapshot we build in SnapBuildCommitTxn() is assigned as base_snapshot only if the same is not set previously. I might be missing something, but if that is true then I don't think the patch is correct; OTOH, I would expect some existing tests to fail if this change were incorrect. -- With Regards, Amit Kapila.
Re: Small miscellaneous fixes
On Fri, Nov 25, 2022 at 10:21 PM, Michael Paquier wrote: > On Fri, Nov 25, 2022 at 01:15:40PM +0100, Peter Eisentraut wrote: > > Is this something you want to follow up on, since you were involved in > that > > patch? Is the redundant assignment simply to be deleted, or do you want > to > > check the original patch again for context? > > Most of the changes of this thread have been applied as of c42cd05c. > What remains is the SIGALRM business with sig_atomic_t, and I wanted to check > that by myself first. > Thank you, Michael, for taking care of it. regards, Ranier Vilela
Re: MSVC vs Perl
On 2022-11-25 Fr 18:52, Andrew Dunstan wrote: > On 2022-11-25 Fr 18:48, Andrew Dunstan wrote: >> For various reasons (see below) it's preferable to build on Windows with >> Strawberry Perl. This works OK if you're building with Msys2, I upgraded >> the instance on the machine that runs fairywren and drongo today, and >> fairywren seems fine. Not so drongo, however. First it encountered a >> build error, which I attempted to cure using something I found in the >> vim sources [1]. That worked, but then there's a failure at runtime like >> this: >> >> 2022-11-25 21:33:01.073 UTC [3764:3] pg_regress LOG: statement: CREATE >> EXTENSION IF NOT EXISTS "plperl" >> src/pl/plperl/Util.c: loadable library and perl binaries are mismatched (got >> handshake key 12800080, needed 12900080) >> 2022-11-25 21:33:01.100 UTC [5048:5] LOG: server process (PID 3764) exited >> with exit code 1 >> >> I don't know how to debug this. I'm not sure if there's some flag we >> could set which would cure it. It looks like a 1 bit difference, but I >> haven't found what that bit corresponds to.

OK, so this cures the problem for drongo:

diff --git a/src/tools/msvc/Mkvcbuild.pm b/src/tools/msvc/Mkvcbuild.pm
index 83a3e40425..dc6b94b74f 100644
--- a/src/tools/msvc/Mkvcbuild.pm
+++ b/src/tools/msvc/Mkvcbuild.pm
@@ -707,6 +707,7 @@ sub mkvcbuild
 print "CFLAGS recommended by Perl: $Config{ccflags}\n";
 print "CFLAGS to compile embedded Perl: ",
 (join ' ', map { "-D$_" } @perl_embed_ccflags), "\n";
+ push @perl_embed_ccflags,'NO_THREAD_SAFE_LOCALE';
 foreach my $f (@perl_embed_ccflags)
 {
 $plperl->AddDefine($f);

I'll see if it also works for bowerbird. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
Re: postgres_fdw: batch inserts vs. before row triggers
Etsuro Fujita writes: > On Sat, Nov 26, 2022 at 1:57 AM Tom Lane wrote: >> Couldn't we add the field to ModifyTableState, instead? > We could probably do so, but I thought having a global list would be > more efficient to handle pending buffered inserts than that. OK, as long as there's a reason for doing it that way, it's OK by me. I don't think that adding a field at the end of EState is an ABI problem. We have to do something else than add to ResultRelInfo, though. regards, tom lane
Re: CI and test improvements
[ resending to -hackers because of list issues ] On 2022-05-28 Sa 11:37, Justin Pryzby wrote: > I'm "joining" a bunch of unresolved threads hoping to present them better > since > they're related and I'm maintaining them together anyway. > > https://www.postgresql.org/message-id/flat/20220219234148.GC9008%40telsasoft.com > - set TESTDIR from perl rather than Makefile I looked at the latest set here; patch 1 still doesn't look right: I think vcregress.pl should be setting PG_SUBDIR like the Makefile.global does. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
Re: Decoupling antiwraparound autovacuum from special rules around auto cancellation
On Fri, 2022-11-25 at 14:47 -0800, Peter Geoghegan wrote: > Attached WIP patch invents the idea of a regular autovacuum that is > tasked with advancing relfrozenxid -- which is really just another > trigger criteria, reported on in the server log in its autovacuum > reports. Of course we retain the idea of antiwraparound autovacuums. > The only difference is that they are triggered when table age has > advanced by twice the usual amount, which is presumably only possible > because a regular autovacuum couldn't start or couldn't complete in > time (most likely due to continually being auto-cancelled). > > As I said before, I think that the most important thing is to give > regular autovacuuming a chance to succeed. The exact approach taken > has a relatively large amount of slack, but that probably isn't > needed. So the new antiwraparound cutoffs were chosen because they're > easy to understand and remember, which is fairly arbitrary. The target is a table that receives no DML at all, right? I think that is a good idea. Wouldn't it make sense to trigger that at *half* "autovacuum_freeze_max_age"? Yours, Laurenz Albe
Re: Decoupling antiwraparound autovacuum from special rules around auto cancellation
On Sat, Nov 26, 2022 at 9:58 AM Laurenz Albe wrote: > The target is a table that receives no DML at all, right? Sort of, but not really. The target is a table that doesn't get vacuumed for any other reason -- I don't make any claims beyond that one. It seems a little too optimistic to suppose that such a table really didn't need any vacuuming to deal with bloat just because autovacuum.c didn't seem to think that it did. > I think that is a good idea. > Wouldn't it make sense to trigger that at *half* "autovacuum_freeze_max_age"? That would be equivalent to what I've done here, provided you also double the autovacuum_freeze_max_age setting. I did it this way because I believe that it has fewer problems. The approach I took makes the general perception that antiwraparound autovacuum are a scary thing (really just needed for emergencies) become true, for the first time. We should expect to see very few antiwraparound autovacuums with the patch, but when we do see even one it'll be after a less aggressive approach was given the opportunity to succeed, but (for whatever reason) failed. Just seeing any antiwraparound autovacuums will become a useful signal of something being amiss in a way that it just isn't at the moment. The docs can be updated to talk about antiwraparound autovacuum as a thing that you should encounter approximately never. This is possible even though the patch isn't invasive at all. -- Peter Geoghegan
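For readers following along, a minimal sketch (ordinary catalog queries, independent of the patch discussed here) for seeing where individual tables sit relative to the antiwraparound trigger point:

SHOW autovacuum_freeze_max_age;

-- Per-table age of relfrozenxid; 'r' = ordinary table, 'm' = materialized view, 't' = TOAST.
SELECT c.oid::regclass AS rel, age(c.relfrozenxid) AS xid_age
FROM pg_class c
WHERE c.relkind IN ('r', 'm', 't')
ORDER BY xid_age DESC
LIMIT 10;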
Re: TAP output format in pg_regress
In a message of Friday, November 25, 2022 00:20:01 MSK, Daniel Gustafsson wrote:

+ /*
+ * The width of the testname field when printing to ensure vertical alignment
+ * of test runtimes. Thius number is somewhat arbitrarily chosen to match the
+ * older pre-TAP output format.
+ */

"Thius" seems to be a typo :-)

-
+ #define bail_noatexit(...) bail_out(true, __VA_ARGS__)

BTW what does "noat" stand for? I thought it was a typo too :-) and originally meant to be "not".

-
- snprintf(buf, sizeof(buf),
-_(" All %d tests passed. "),
-success_count);
- else if (fail_count == 0) /* fail_count=0, fail_ignore_count>0 */
- snprintf(buf, sizeof(buf),
-_(" %d of %d tests passed, %d failed test(s) ignored. "),
-success_count,
-success_count + fail_ignore_count,
-fail_ignore_count);
- else if (fail_ignore_count == 0)/* fail_count>0 && fail_ignore_count=0 */
- snprintf(buf, sizeof(buf),
-_(" %d of %d tests failed. "),
-fail_count,
-success_count + fail_count);
+ note(_("All %d tests passed.\n"), success_count);
+ /* fail_count=0, fail_ignore_count>0 */
+ else if (fail_count == 0)
+ note(_("%d of %d tests passed, %d failed test(s) ignored.\n"),
+success_count,
+success_count + fail_ignore_count,
+fail_ignore_count);
+ /* fail_count>0 && fail_ignore_count=0 */
+ else if (fail_ignore_count == 0)
+ diag(_("%d of %d tests failed.\n"),
+fail_count,
+success_count + fail_count);
+ /* fail_count>0 && fail_ignore_count>0 */

Just out of overaccuracy: Logic here has not changed. Can we keep ifs, elses and maybe indent offsets of lines that did not change as they were, to have a nicer diff? Would make understanding this changeset easier... Or is this the work of pgindent that spoils it?

While looking at my output I am getting a wrong offset for sanity_check:

ok 84 hash_func 121 ms
ok 85 errors 68 ms
ok 86 infinite_recurse 233 ms
ok 87 sanity_check 144 ms
# parallel group (20 tests): select_into delete random select_having select_distinct_on namespace select_implicit case prepared_xacts subselect transactions portals select_distinct union arrays update hash_index join aggregates btree_index
ok 88 select_into 134 ms
ok 89 select_distinct 812 ms

(also for select_parallel write_parallel vacuum_parallel and fast_default) I guess the intention was to align them too...

As for the rest: I see no other problems in the code, and consider it should be passed to committer (or maybe a more experienced reviewer)

> > On 24 Nov 2022, at 20:32, Andres Freund wrote: > > > > On November 24, 2022 11:07:43 AM PST, Daniel Gustafsson wrote: > >>> On 24 Nov 2022, at 18:07, Nikolay Shaplov wrote: > >> One option could be to redefine bail() to take the exit function as a > >> parameter and have the caller pass the preferred exit handler. > >> > >> -bail_out(bool non_rec, const char *fmt,...) > >> +bail(void (*exit_func)(int), const char *fmt,...) > >> > >> The callsites would then look like the below, which puts a reference to > >> the > >> actual exit handler used in the code where it is called. > > > > I'd just rename _bail to bail_noatexit(). > > That's probably the best option, done in the attached along with the comment > fixup to mention the recursion issue. > > >>> This magic spell "...%-5i %s%-*s %8.0f ms\n" is too dark to repeat it > >>> even two times. I understand problems with spaces... But may be it > >>> would be better somehow narrow it to one ugly print... Print "ok %-5i > >>> "|"not ok %-5i" to buffer first, and then have one "%s%-*s %8.0f > >>> ms%s\n" print or something like that...
> >> > >> I'm not convinced that this printf format is that hard to read (which may > >> well be attributed to Stockholm Syndrome), and I do think that breaking > >> it up and adding more code to print the line will make it less readable > >> instead.> > > I don't think it's terrible either. I do think it'd also be ok to switch > > between ok / not ok within a single printf, making it easier to keep them > > in sync. > I made it into a single printf to see what it would look like, with some > additional comments to make it more readable (I'm not a fan of where > pgindent moves those but..). > > -- > Daniel Gustafsson https://vmware.com/ -- Nikolay Shaplov aka Nataraj Fuzzing Engineer at Postgres Professional Matrix IM: @dhyan:nataraj.su
Re: User functions for building SCRAM secrets
On 11/16/22 10:09 PM, Michael Paquier wrote: On Thu, Nov 10, 2022 at 11:14:34PM -0500, Jonathan S. Katz wrote: On 10/31/22 8:56 PM, Michael Paquier wrote: Well, one could pass a salt based on something generated by random() to emulate what we currently do in the default case, as well. The salt length is an extra possibility, letting it be randomly generated by pg_strong_random(). Sure, this is a good point. From a SQL level we can get that from pgcrypto "gen_random_bytes". Could it be something we could just push into core? FWIW, I've used that quite a bit in the past to cheaply build long random strings of data for other things. Without pgcrypto, random() with generate_series() has always been kind of... fun. I would be a +1 for moving that into core, given we did something similar with gen_random_uuid [1]. Separate patch, of course :)

+SELECT scram_build_secret_sha256(NULL);
+ERROR: password must not be null
+SELECT scram_build_secret_sha256(NULL, NULL);
+ERROR: password must not be null
+SELECT scram_build_secret_sha256(NULL, NULL, NULL);
+ERROR: password must not be null

This is just testing three times the same thing as per the defaults. I would cut the second and third cases. AFAICT it's not returning the defaults. Quick other example:

CREATE FUNCTION ab (a int DEFAULT 0) RETURNS int AS $$ SELECT a; $$ LANGUAGE SQL;
SELECT ab();
 ab
  0
(1 row)
SELECT ab(NULL::int);
 ab
(1 row)

Given scram_build_secret_sha256 is not a strict function, I'd prefer to test all of the NULL cases in case anything in the underlying code changes in the future. It's a cheap cost to be a bit more careful. git diff --check reports some whitespaces. Ack. Will fix on the next pass. (I've been transitioning editors, which could have resulted in that.) scram_build_secret_sha256_internal() is missing SASLprep on the password string. Perhaps the best thing to do here is just to extend pg_be_scram_build_secret() with more arguments so that callers can optionally pass down a custom salt with its length, leaving the responsibility to pg_be_scram_build_secret() to create a random salt if nothing has been given? Ah, good catch! I think if we go with passing down the salt, we'd also have to allow for the passing down of the iterations, too, and we're close to rebuilding "scram_build_secret". I'll stare a bit at this on the next pass and either 1/ just SASLprep the string in the new "scram_build_secret_sha256_internal" func, or 2/ change the definition of "pg_be_scram_build_secret" to accommodate more overrides. Thanks, Jonathan [1] https://www.postgresql.org/docs/current/functions-uuid.html
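For context, a small usage sketch of the functions being discussed. gen_random_bytes() is the existing pgcrypto function mentioned above; scram_build_secret_sha256() is the function proposed by this thread's patch, so its name, argument order, and defaults may still change before commit:

CREATE EXTENSION IF NOT EXISTS pgcrypto;

-- Existing way to get strong random bytes from SQL, e.g. for a custom salt.
SELECT encode(gen_random_bytes(16), 'base64') AS salt;

-- Proposed (not yet committed) function from the patch under review:
-- build a SCRAM-SHA-256 secret for a password, with optional salt/iterations.
SELECT scram_build_secret_sha256('correct horse battery staple');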
pg_regress: Treat child process failure as test failure
In the thread about TAP output format in pg_regress, Andres pointed out [0] that we allow a test to pass even if the test child process failed. While it's probably pretty rare to have a test pass if the process failed, this brings a risk of false positives (and it seems questionable that any regress test will have a child process failing as part of its intended run). The attached makes child failures an error condition for the test as a belts and suspenders type check. Thoughts? -- Daniel Gustafsson https://vmware.com/ [0] https://postgr.es/m/20221122235636.4frx7hjterq6b...@awork3.anarazel.de v1-0001-Consider-a-failed-test-process-as-a-failed-test.patch
Re: TAP output format in pg_regress
> On 26 Nov 2022, at 20:37, Nikolay Shaplov wrote:
> In a message of Friday, November 25, 2022 00:20:01 MSK, Daniel Gustafsson wrote:
> "Thius" seems to be a typo :-)

Correct, fixed in the attached.

> + #define bail_noatexit(...) bail_out(true, __VA_ARGS__)
>
> BTW what does "noat" stand for? I thought it was a typo too :-) and originally meant to be "not".

Calling _exit() will cause exit handler functions registered with atexit() to not be invoked, so noatexit was intentional spelling.

> Just out of overaccuracy: Logic here has not changed. Can we keep ifs, elses and maybe indent offsets of lines that did not change as they were, to have a nicer diff? Would make understanding this changeset easier... Or is this the work of pgindent that spoils it?

The diff algorithm decided that this was the compact way of displaying the unified diff, probably because too many lines in proximity changed. While avoiding moving the comments to before the line might mitigate that somewhat, I prefer this greatly to a slightly easier to read diff.

> While looking at my output I am getting a wrong offset for sanity_check:
>
> ok 84 hash_func 121 ms
> ok 85 errors 68 ms
> ok 86 infinite_recurse 233 ms
> ok 87 sanity_check 144 ms
> # parallel group (20 tests): select_into delete random select_having select_distinct_on namespace select_implicit case prepared_xacts subselect transactions portals select_distinct union arrays update hash_index join aggregates btree_index
> ok 88 select_into 134 ms
> ok 89 select_distinct 812 ms
>
> (also for select_parallel write_parallel vacuum_parallel and fast_default)
>
> I guess the intention was to align them too...

No, they are aligned in such a way because they are running outside of a parallel group. Note that it's not part of the "parallel group" note preceding the tests:

# parallel group (6 tests): collate.linux.utf8 amutils psql_crosstab psql rules stats_ext
ok 146 rules 507 ms
ok 147 psql 448 ms
ok 148 psql_crosstab 47 ms
ok 149 amutils 39 ms
ok 150 stats_ext 2578 ms
ok 151 collate.linux.utf8 27 ms
ok 152 select_parallel 668 ms
ok 153 write_parallel 84 ms
ok 154 vacuum_parallel 90 ms

In the previous format it's a bit clearer, and maybe we should adopt that for TAP as well?

parallel group (6 tests): collate.linux.utf8 amutils psql_crosstab psql rules stats_ext
rules... ok 488 ms
psql ... ok 430 ms
psql_crosstab... ok 47 ms
amutils ... ok 38 ms
stats_ext... ok 2301 ms
collate.linux.utf8 ... ok 24 ms
test select_parallel ... ok 641 ms
test write_parallel ... ok 83 ms
test vacuum_parallel ... ok 87 ms

If so, that would make the output something like the below. Personally I think the "test" prefix adds little value since everything printed is a test suite, and we are already today using indentation for grouping parallel tests.

# parallel group (6 tests): collate.linux.utf8 amutils psql_crosstab psql rules stats_ext
ok 146 rules 507 ms
ok 147 psql 448 ms
ok 148 psql_crosstab 47 ms
ok 149 amutils 39 ms
ok 150 stats_ext 2578 ms
ok 151 collate.linux.utf8 27 ms
ok 152 test select_parallel 668 ms
ok 153 test write_parallel 84 ms
ok 154 test vacuum_parallel 90 ms

-- Daniel Gustafsson https://vmware.com/ v13-0001-Change-pg_regress-output-format-to-be-TAP-compli.patch
Re: pg_regress: Treat child process failure as test failure
Hi, On 2022-11-26 21:11:39 +0100, Daniel Gustafsson wrote: > In the thread about TAP format out in pg_regress, Andres pointed out [0] that > we allow a test to pass even if the test child process failed. While its > probably pretty rare to have a test pass if the process failed, this brings a > risk for false positives (and it seems questionable that any regress test will > have a child process failing as part of its intended run). > The attached makes child failures an error condition for the test as a belts > and suspenders type check. Thoughts? I wonder if it's the right thing to treat a failed psql that's then also ignored as "failed (ignored)". Perhaps it'd be better to move the statuses[i] != 0 check to before the if (differ)? > - if (differ) > + if (differ || statuses[i] != 0) > { > boolignore = false; > _stringlist *sl; > @@ -1815,7 +1815,7 @@ run_single_test(const char *test, test_start_function > startfunc, > differ |= newdiff; > } > > - if (differ) > + if (differ || exit_status != 0) > { > status(_("FAILED")); > fail_count++; It certainly is a bit confusing that we print a psql failure separately from the if "FAILED" vs "ok" bit. Greetings, Andres Freund
Re: MSVC vs Perl
Hi, On 2022-11-26 09:43:19 -0500, Andrew Dunstan wrote: > OK, so this cures the problem for drongo: > > > diff --git a/src/tools/msvc/Mkvcbuild.pm b/src/tools/msvc/Mkvcbuild.pm > index 83a3e40425..dc6b94b74f 100644 > --- a/src/tools/msvc/Mkvcbuild.pm > +++ b/src/tools/msvc/Mkvcbuild.pm > @@ -707,6 +707,7 @@ sub mkvcbuild > print "CFLAGS recommended by Perl: $Config{ccflags}\n"; > print "CFLAGS to compile embedded Perl: ", > (join ' ', map { "-D$_" } @perl_embed_ccflags), "\n"; > + push @perl_embed_ccflags,'NO_THREAD_SAFE_LOCALE'; > foreach my $f (@perl_embed_ccflags) > { > $plperl->AddDefine($f); This likely is just a test patch, in case it is not, it seems we should add NO_THREAD_SAFE_LOCALE to @perl_embed_ccflags before printing it. Do we need a "configure" check for this? I guess it's ok to define this whenever building with msvc - I don't currently see a scenario where it could hurt. We already define flags unconditionally, c.f. PLPERL_HAVE_UID_GID. Given how fragile the embedding is (we've had several prior iterations of problems around this), I think it'd be good to test that the current flags avoid the "got handshake key" at configure time, rather than having to debug runtime failures. As noted by Noah in [1], the Mkvcbuild.pm actually has code to do so - but only does for 32bit builds. I don't think it's worth generalizing this for src/tools/msvc at this point, but it might be worth copying the test to meson and running the binary (except when cross building, of course). Greetings, Andres Freund [1] https://postgr.es/m/20220130231432.GA2658915%40rfd.leadboat.com
Re: Decouple last important WAL record LSN from WAL insert locks
Hi, On 2022-11-23 19:12:07 +0530, Bharath Rupireddy wrote: > While working on something else, I noticed that each WAL insert lock > tracks its own last important WAL record's LSN (lastImportantAt) and > both the bgwriter and checkpointer later computes the max > value/server-wide last important WAL record's LSN via > GetLastImportantRecPtr(). While doing so, each WAL insertion lock is > acquired in exclusive mode in a for loop. This seems like too much > overhead to me. GetLastImportantRecPtr() should be a very rare operation, so it's fine for it to be expensive. The important thing is for the maintenance of the underlying data to be very cheap. > I quickly coded a patch (attached herewith) that > tracks the server-wide last important WAL record's LSN in > XLogCtlInsert (lastImportantPos) protected with a spinlock and gets > rid of lastImportantAt from each WAL insert lock. That strikes me as a very bad idea. It adds another point of contention to a very very hot code path, to make a very rare code path cheaper. Greetings, Andres Freund
Re: Allow processes to reset procArrayGroupNext themselves instead of leader resetting for all the followers
Hi, On 2022-11-24 10:43:46 +0530, Bharath Rupireddy wrote: > While working on something else, I noticed that the proc array group > XID clearing leader resets procArrayGroupNext of all the followers > atomically along with procArrayGroupMember. ISTM that it's enough for > the followers to exit the wait loop and continue if the leader resets > just procArrayGroupMember, the followers can reset procArrayGroupNext > by themselves. This relieves the leader a bit, especially when there > are many followers, as it avoids a bunch of atomic writes and > pg_write_barrier() for the leader . I doubt this is a useful change - the leader already has to modify the relevant cacheline (for procArrayGroupMember). That makes it pretty much free to modify another field in the same cacheline. Greetings, Andres Freund
Re: MSVC vs Perl
On 2022-11-26 Sa 16:05, Andres Freund wrote: > Hi, > > On 2022-11-26 09:43:19 -0500, Andrew Dunstan wrote: >> OK, so this cures the problem for drongo: >> >> >> diff --git a/src/tools/msvc/Mkvcbuild.pm b/src/tools/msvc/Mkvcbuild.pm >> index 83a3e40425..dc6b94b74f 100644 >> --- a/src/tools/msvc/Mkvcbuild.pm >> +++ b/src/tools/msvc/Mkvcbuild.pm >> @@ -707,6 +707,7 @@ sub mkvcbuild >> print "CFLAGS recommended by Perl: $Config{ccflags}\n"; >> print "CFLAGS to compile embedded Perl: ", >> (join ' ', map { "-D$_" } @perl_embed_ccflags), "\n"; >> + push @perl_embed_ccflags,'NO_THREAD_SAFE_LOCALE'; >> foreach my $f (@perl_embed_ccflags) >> { >> $plperl->AddDefine($f); > This likely is just a test patch, in case it is not, it seems we should add > NO_THREAD_SAFE_LOCALE to @perl_embed_ccflags before printing it. Sure > Do we need a "configure" check for this? I guess it's ok to define this > whenever building with msvc - I don't currently see a scenario where it could > hurt. We already define flags unconditionally, c.f. PLPERL_HAVE_UID_GID. > > Given how fragile the embedding is (we've had several prior iterations of > problems around this), I think it'd be good to test that the current flags > avoid the "got handshake key" at configure time, rather than having to debug > runtime failures. > > As noted by Noah in [1], the Mkvcbuild.pm actually has code to do so - but > only does for 32bit builds. > > I don't think it's worth generalizing this for src/tools/msvc at this point, > but it might be worth copying the test to meson and running the binary (except > when cross building, of course). Yeah, given that we are planning on ditching this build system as soon as we can I'm not inclined to do anything very heroic. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
Re: pg_regress: Treat child process failure as test failure
> On 26 Nov 2022, at 21:55, Andres Freund wrote: > On 2022-11-26 21:11:39 +0100, Daniel Gustafsson wrote: >> The attached makes child failures an error condition for the test as a belts >> and suspenders type check. Thoughts? > > I wonder if it's the right thing to treat a failed psql that's then also > ignored as "failed (ignored)". Perhaps it'd be better to move the statuses[i] > != 0 check to before the if (differ)? I was thinking about that too, but I think you're right. The "ignore" part is about the test content and not the test run structure. > It certainly is a bit confusing that we print a psql failure separately from > the if "FAILED" vs "ok" bit. I've moved the statuses[i] check before the differ check, such that there is a separate block for this not mixed up with the differs check and printing. It does duplicate things a little bit but also makes it a lot clearer. -- Daniel Gustafsson https://vmware.com/ v2-0001-Consider-a-failed-test-process-as-a-failed-test.patch
CF 2022-11: entries "Waiting for Committer" but no recent activity
Hi

Apologies for the intermission in activity; among other "fun" things I had a family visitation of the influenza virus, which is becoming fashionable in these parts again. In an attempt to salvage something from the situation I am having a crack at the older entries marked "Ready for Committer", some of which probably need some sort of action, but it's not always clear (to me at least) what. If there's some sort of consensus for individual items, per previous practice I'll update the individual threads. The date in parentheses is that of the most recent message on the thread.

pg_stat_statements: Track statement entry timestamp (2022-04-08)
- https://commitfest.postgresql.org/40/3048/
- https://www.postgresql.org/message-id/flat/72e80e7b160a6eb189df9ef6f068cce3765d37f8.ca...@moonset.ru
Patch is applying but no recent activity.

psql - refactor echo code (2022-07-24)
- https://commitfest.postgresql.org/40/3140/
- https://www.postgresql.org/message-id/flat/alpine.DEB.2.22.394.2105301104400.3020016@pseudo
Seems like a small patch which can be applied easily.

Add Amcheck option for checking unique constraints in btree indexes (2022-09-28)
- https://commitfest.postgresql.org/40/3464/
- https://www.postgresql.org/message-id/flat/calt9zehrn5xam5boga0qnrcmpv52bscek2qnq1hmuzdd301...@mail.gmail.com
There seems to be consensus that it is actually RfC, but it needs a committer to show interest.

meson PGXS compatibility (2022-10-13)
- https://commitfest.postgresql.org/40/3932/
- https://www.postgresql.org/message-id/flat/20221005200710.luvw5evhwf6cl...@awork3.anarazel.de
Seems to be RfC.

pg_receivewal fail to streams when the partial file to write [...] (2022-10-13)
- https://commitfest.postgresql.org/40/3503/
- https://www.postgresql.org/message-id/flat/cahg+qdcvuss6ocomblbv5f4yeglhoct+1x2ylnfg2h_edwu...@mail.gmail.com
The author of the latest patch (not the original patch author) indicates this needs further review; should the status be changed? Setting it back to "Needs review" seems the obvious thing to do, but it feels like it would put it back in the pool of possibly unreviewed entries (maybe we need a "Needs further review" status?).

Update relfrozenxmin when truncating temp tables (2022-11-05)
- https://commitfest.postgresql.org/40/3358/
- https://www.postgresql.org/message-id/flat/cam-w4hnnbdexixrj0b+_-wvp5nz6of0rlueqfuzfyqblcbe...@mail.gmail.com
I'll set this one to "Waiting on Author" based on Tom's latest feedback.

Faster pglz compression (2021-11-17)
- https://commitfest.postgresql.org/40/2897/
- https://www.postgresql.org/message-id/flat/fef3dc5e-4bc4-44e1-8deb-dadc67046...@yandex-team.ru
This one, prior to my reminder, has a committer promising to commit but was inactive for over a year.

Regards

Ian Barwick
Re: CF 2022-11: entries "Waiting for Committer" but no recent activity
On 11/27/22 01:43, Ian Lawrence Barwick wrote: > ... > Faster pglz compression (2021-11-17) > > > - https://commitfest.postgresql.org/40/2897/ > - > https://www.postgresql.org/message-id/flat/fef3dc5e-4bc4-44e1-8deb-dadc67046...@yandex-team.ru > > This one, prior to my reminder, has a committer promising to commit but was > inactive for over a year. > Ugh, I see that slacker is me, so I'll get this committed (unless someone else wants to). regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Removing another gen_node_support.pl special case
I got confused about how we were managing EquivalenceClass pointers in the copy/equal infrastructure, and it took me awhile to remember that the reason it works is that gen_node_support.pl has hard-wired knowledge about that. I think that's something we'd be best off dropping in favor of explicit annotations on affected fields. Hence, I propose the attached. This results in zero change in the generated copy/equal code. regards, tom lane diff --git a/src/backend/nodes/gen_node_support.pl b/src/backend/nodes/gen_node_support.pl index b6f086e262..9f7b4b833f 100644 --- a/src/backend/nodes/gen_node_support.pl +++ b/src/backend/nodes/gen_node_support.pl @@ -166,9 +166,6 @@ my @custom_read_write; # Track node types with manually assigned NodeTag numbers. my %manual_nodetag_number; -# EquivalenceClasses are never moved, so just shallow-copy the pointer -push @scalar_types, qw(EquivalenceClass* EquivalenceMember*); - # This is a struct, so we can copy it by assignment. Equal support is # currently not required. push @scalar_types, qw(QualCost); @@ -458,9 +455,14 @@ foreach my $infile (@ARGV) && $attr !~ /^copy_as\(\w+\)$/ && $attr !~ /^read_as\(\w+\)$/ && !elem $attr, -qw(equal_ignore equal_ignore_if_zero read_write_ignore -write_only_relids write_only_nondefault_pathtarget write_only_req_outer) - ) +qw(copy_as_scalar +equal_as_scalar +equal_ignore +equal_ignore_if_zero +read_write_ignore +write_only_relids +write_only_nondefault_pathtarget +write_only_req_outer)) { die "$infile:$lineno: unrecognized attribute \"$attr\"\n"; @@ -692,6 +694,8 @@ _equal${n}(const $n *a, const $n *b) # extract per-field attributes my $array_size_field; my $copy_as_field; + my $copy_as_scalar = 0; + my $equal_as_scalar = 0; foreach my $a (@a) { if ($a =~ /^array_size\(([\w.]+)\)$/) @@ -702,19 +706,41 @@ _equal${n}(const $n *a, const $n *b) { $copy_as_field = $1; } + elsif ($a eq 'copy_as_scalar') + { +$copy_as_scalar = 1; + } + elsif ($a eq 'equal_as_scalar') + { +$equal_as_scalar = 1; + } elsif ($a eq 'equal_ignore') { $equal_ignore = 1; } } - # override type-specific copy method if copy_as is specified + # override type-specific copy method if requested if (defined $copy_as_field) { print $cff "\tnewnode->$f = $copy_as_field;\n" unless $copy_ignore; $copy_ignore = 1; } + elsif ($copy_as_scalar) + { + print $cff "\tCOPY_SCALAR_FIELD($f);\n" + unless $copy_ignore; + $copy_ignore = 1; + } + + # override type-specific equal method if requested + if ($equal_as_scalar) + { + print $eff "\tCOMPARE_SCALAR_FIELD($f);\n" + unless $equal_ignore; + $equal_ignore = 1; + } # select instructions by field type if ($t eq 'char*') diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h index a80f43e540..d98f2c91d9 100644 --- a/src/include/nodes/nodes.h +++ b/src/include/nodes/nodes.h @@ -86,6 +86,12 @@ typedef enum NodeTag * * - copy_as(VALUE): In copyObject(), replace the field's value with VALUE. * + * - copy_as_scalar: In copyObject(), copy the field's pointer value + * even if it is a node-type pointer. + * + * - equal_as_scalar: In equal(), compare the field by pointer equality + * even if it is a node-type pointer. + * * - equal_ignore: Ignore the field for equality. * * - equal_ignore_if_zero: Ignore the field for equality if it is zero. 
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h index a544b313d3..885ad42327 100644 --- a/src/include/nodes/pathnodes.h +++ b/src/include/nodes/pathnodes.h @@ -1273,7 +1273,9 @@ typedef struct StatisticExtInfo * * NB: EquivalenceClasses are never copied after creation. Therefore, * copyObject() copies pointers to them as pointers, and equal() compares - * pointers to EquivalenceClasses via pointer equality. + * pointers to EquivalenceClasses via pointer equality. This is implemented + * by putting copy_as_scalar and equal_as_scalar attributes on fields that + * are pointers to EquivalenceClasses. The same goes for EquivalenceMembers. */ typedef struct EquivalenceClass { @@ -1364,7 +1366,8 @@ typedef struct PathKey NodeTag type; - EquivalenceClass *pk_eclass; /* the value that is ordered */ + /* the value that is ordered */ + EquivalenceClass *pk_eclass pg_node_attr(copy_as_scalar, equal_as_scalar); Oid pk_opfamily; /* btree opfamily defining the ordering */ int pk_strategy; /* sort direction (ASC or DESC) */ bool pk_nulls_first; /* do NULLs come before normal values? */ @@ -2472,7 +2475,7 @@ typedef struct RestrictInfo * Generating EquivalenceClass. This field is NULL unless clause is * potentially redundant. */ - EquivalenceClass *parent_ec pg_node_attr(equal_ignore, read_write_ignore); + EquivalenceClass *paren
Re: Add tracking of backend memory allocated to pg_stat_activity
Code rebased to current master. Updated to incorporate additional recommendations from the the list - add units to variables in view - remove 'backend_' prefix from variables/functions - update documentation - add functional test for allocated_bytes - refactor allocation reporting to reduce number of functions and branches/reduce performance hit - zero allocated bytes after fork to avoid double counting postmaster allocations -- Reid Thompson Senior Software Engineer Crunchy Data, Inc. reid.thomp...@crunchydata.com www.crunchydata.com From 3d772d8620faba4bd4e091d6618c63557fbf6749 Mon Sep 17 00:00:00 2001 From: Reid Thompson Date: Thu, 11 Aug 2022 12:01:25 -0400 Subject: [PATCH] Add tracking of backend memory allocated to pg_stat_activity This new field displays the current bytes of memory allocated to the backend process. It is updated as memory for the process is malloc'd/free'd. Memory allocated to items on the freelist is included in the displayed value. Dynamic shared memory allocations are included only in the value displayed for the backend that created them, they are not included in the value for backends that are attached to them to avoid double counting. On occasion, orphaned memory segments may be cleaned up on postmaster startup. This may result in decreasing the sum without a prior increment. We limit the floor of backend_mem_allocated to zero. Updated pg_stat_activity documentation for the new column. --- doc/src/sgml/monitoring.sgml| 15 src/backend/catalog/system_views.sql| 1 + src/backend/postmaster/autovacuum.c | 6 ++ src/backend/postmaster/postmaster.c | 13 src/backend/postmaster/syslogger.c | 3 + src/backend/storage/ipc/dsm_impl.c | 81 + src/backend/utils/activity/backend_status.c | 45 src/backend/utils/adt/pgstatfuncs.c | 4 +- src/backend/utils/mmgr/aset.c | 17 + src/backend/utils/mmgr/generation.c | 15 src/backend/utils/mmgr/slab.c | 21 ++ src/include/catalog/pg_proc.dat | 6 +- src/include/utils/backend_status.h | 59 ++- src/test/regress/expected/rules.out | 9 ++- src/test/regress/expected/stats.out | 11 +++ src/test/regress/sql/stats.sql | 3 + 16 files changed, 300 insertions(+), 9 deletions(-) diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml index 5579b8b9e0..ffe7d2566c 100644 --- a/doc/src/sgml/monitoring.sgml +++ b/doc/src/sgml/monitoring.sgml @@ -947,6 +947,21 @@ postgres 27093 0.0 0.0 30096 2752 ?Ss 11:34 0:00 postgres: ser + + + allocated_bytes bigint + + + Memory currently allocated to this backend in bytes. This is the balance + of bytes allocated and freed by this backend. Dynamic shared memory + allocations are included only in the value displayed for the backend that + created them, they are not included in the value for backends that are + attached to them to avoid double counting. Use pg_size_pretty + described in to make this value + more easily readable. 
+ + + query text diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql index 2d8104b090..9ea8f78c95 100644 --- a/src/backend/catalog/system_views.sql +++ b/src/backend/catalog/system_views.sql @@ -864,6 +864,7 @@ CREATE VIEW pg_stat_activity AS S.state, S.backend_xid, s.backend_xmin, +S.allocated_bytes, S.query_id, S.query, S.backend_type diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c index 601834d4b4..f54606104d 100644 --- a/src/backend/postmaster/autovacuum.c +++ b/src/backend/postmaster/autovacuum.c @@ -407,6 +407,9 @@ StartAutoVacLauncher(void) #ifndef EXEC_BACKEND case 0: + /* Zero allocated bytes to avoid double counting parent allocation */ + pgstat_zero_my_allocated_bytes(); + /* in postmaster child ... */ InitPostmasterChild(); @@ -1485,6 +1488,9 @@ StartAutoVacWorker(void) #ifndef EXEC_BACKEND case 0: + /* Zero allocated bytes to avoid double counting parent allocation */ + pgstat_zero_my_allocated_bytes(); + /* in postmaster child ... */ InitPostmasterChild(); diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c index a8a246921f..89a6caec78 100644 --- a/src/backend/postmaster/postmaster.c +++ b/src/backend/postmaster/postmaster.c @@ -4102,6 +4102,9 @@ BackendStartup(Port *port) { free(bn); + /* Zero allocated bytes to avoid double counting parent allocation */ + pgstat_zero_my_allocated_bytes(); + /* Detangle from postmaster */ InitPostmasterChild(); @@ -5307,6 +5310,11 @@ StartChildProcess(AuxProcType type) MemoryContextDelete(Postmast
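Assuming the view definition from the patch above, the new column could be inspected with something like the following; note that allocated_bytes does not exist in released servers, it is the column this patch proposes to add:

-- Hypothetical until the patch is applied: allocated_bytes is the proposed column.
SELECT pid, backend_type, state,
       pg_size_pretty(allocated_bytes) AS allocated
FROM pg_stat_activity
ORDER BY allocated_bytes DESC NULLS LAST;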
Re: Add tracking of backend memory allocated to pg_stat_activity
On Wed, 2022-11-09 at 09:23 -0500, Reid Thompson wrote: > Hi Melanie, > Thank you for looking at this and for the feedback. Responses inline > below. > > On Mon, 2022-11-07 at 16:17 -0500, Melanie Plageman wrote: > > > > It doesn't seem like you need the backend_ prefix in the view since > > that is implied by it being in pg_stat_activity. > > I will remove the prefix. done > > > For the wording on the description, I find "memory allocated to > > this > > backend" a bit confusing. Perhaps you could reword it to make clear > > you mean that the number represents the balance of allocations by > > this > > backend. Something like: > > > > Memory currently allocated to this backend in bytes. This > > is the > > balance of bytes allocated and freed by this backend. > > I would also link to the system administration function > > pg_size_pretty() so users know how to easily convert the value. > > Thanks, I'll make these changes done > > > +/* > > > + * pgstat_report_backend_allocated_bytes_increase() - > > > + * > > > + * Called to report increase in memory allocated for this > > > backend > > > + * > > > + */ > > > > It seems like you could combine the > > pgstat_report_backend_allocated_bytes_decrease/increase() by either > > using a signed integer to represent the allocation/deallocation or > > passing in > > a "direction" that is just a positive or negative multiplier enum. > > > > Especially if you don't use the write barriers, I think you could > > simplify the logic in the two functions. > > > > If you do combine them, you might shorten the name to > > pgstat_report_backend_allocation() or pgstat_report_allocation(). > > Agreed. This seems a cleaner, simpler way to go. I'll add it to the > TODO list. done > > > > + /* > > > + * Do not allow backend_allocated_bytes to go below zero. > > > ereport if we > > > + * would have. There's no need for a lock around the read > > > here as it's > > > + * being referenced from the same backend which means > > > that > > > there shouldn't > > > + * be concurrent writes. We want to generate an ereport > > > in > > > these cases. > > > + */ > > > + if (deallocation > beentry->backend_allocated_bytes) > > > + { > > > + ereport(LOG, errmsg("decrease reduces reported > > > backend memory allocated below zero; setting reported to 0")); > > > + > > > > I also think it would be nice to include the deallocation amount > > and > > backend_allocated_bytes amount in the log message. > > It also might be nice to start the message with something more > > clear > > than "decrease". > > For example, I would find this clear as a user: > > > > Backend [backend_type or pid] deallocated [deallocation > > number] bytes, > > [backend_allocated_bytes - deallocation number] more than > > this backend > > has reported allocating. > > Sounds good, I'll implement these changes done > > > diff --git a/src/include/utils/backend_status.h > > > b/src/include/utils/backend_status.h > > > index b582b46e9f..75d87e8308 100644 > > > --- a/src/include/utils/backend_status.h > > > +++ b/src/include/utils/backend_status.h > > > @@ -169,6 +169,9 @@ typedef struct PgBackendStatus > > > > > > /* query identifier, optionally computed using > > > post_parse_analyze_hook */ > > > uint64 st_query_id; > > > + > > > + /* Current memory allocated to this backend */ > > > + uint64 backend_allocated_bytes; > > > } PgBackendStatus; > > > > I don't think you need the backend_ prefix here since it is in > > PgBackendStatus. > > Agreed again, I'll remove the prefix. 
done > > > /* -- > > > * Support functions for the SQL-callable functions to > > > diff --git a/src/test/regress/expected/rules.out > > > b/src/test/regress/expected/rules.out > > > index 624d0e5aae..ba9f494806 100644 > > > --- a/src/test/regress/expected/rules.out > > > +++ b/src/test/regress/expected/rules.out > > > @@ -1753,10 +1753,11 @@ pg_stat_activity| SELECT s.datid, > > > s.state, > > > s.backend_xid, > > > s.backend_xmin, > > > + s.backend_allocated_bytes, > > > s.query_id, > > > s.query, > > > s.backend_type > > > > Seems like it would be possible to add a functional test to > > stats.sql. > > I will look at adding this. done patch attached to https://www.postgresql.org/message-id/06b4922193b80776a31e08a3809f2414b0d4bf90.camel%40crunchydata.com -- Reid Thompson Senior Software Engineer Crunchy Data, Inc. reid.thomp...@crunchydata.com www.crunchydata.com
Re: Add tracking of backend memory allocated to pg_stat_activity
On Wed, 2022-11-09 at 08:54 -0500, Reid Thompson wrote: > Hi Andres, > Thanks for looking at this and for the feedback. Responses inline > below. > >> On Fri, 2022-11-04 at 19:41 -0700, Andres Freund wrote: > > Hi, > > > On 2022-11-04 08:56:13 -0400, Reid Thompson wrote: > > > > I'm not convinced that counting DSM values this way is quite right. > > There are a few uses of DSMs that track shared resources, with the biggest > > likely being the stats for relations etc. I suspect that tracking that via > > backend memory usage will be quite confusing, because fairly random > > backends that > > had to grow the shared state end up being blamed with the memory usage in > > perpituity - and then suddenly that memory usage will vanish when that > > backend exits, > > despite the memory continuing to exist. > > Ok, I'll make an attempt to identify these allocations and manage > them elsewhere. still TBD > > > > > > > @@ -734,6 +747,7 @@ AllocSetAlloc(MemoryContext context, Size > > > size) > > > return NULL; > > > > > > context->mem_allocated += blksize; > > > + pgstat_report_backend_allocated_bytes_increase(bl > > > ksize); > > > > I suspect this will be noticable cost-wise. Even though these paths aren't > > the > > hottest memory allocation paths, by nature of going down into malloc, adding > > an external function call that then does a bunch of branching etc. seems > > likely to add up to some. See below for how I think we can deal with > > that... > > > > This is quite a few branches, including write/read barriers. > > > > It doesn't really make sense to use the > > PGSTAT_BEGIN_WRITE_ACTIVITY() pattern > > here - you're just updating a single value, there's nothing to be gained by > > it. The point of PGSTAT_BEGIN_*ACTIVITY() stuff is to allow readers to get a > > consistent view of multiple values - but there aren't multiple values > > here! > > I'll remove the barriers - initially I copied how prior functions were barriers removed > > > > To avoid the overhead of checking (!beentry || !pgstat_track_activities) and > > the external function call, I think you'd be best off copying the trickery I > > introduced for pgstat_report_wait_start(), in 225a22b19ed. > > > > I.e. make pgstat_report_backend_allocated_bytes_increase() a static inline > > function that unconditionally updates something like > > *my_backend_allocated_memory. To deal with the case of (!beentry || > > !pgstat_track_activities), that variable initially points to some backend > > local state and is set to the shared state in pgstat_bestart(). > > > > This additionally has the nice benefit that you can track memory usage from > > before pgstat_bestart(), it'll be in the local variable. > > OK, I think I can mimic the code you reference. done patch attached to https://www.postgresql.org/message-id/06b4922193b80776a31e08a3809f2414b0d4bf90.camel%40crunchydata.com -- Reid Thompson Senior Software Engineer Crunchy Data, Inc. reid.thomp...@crunchydata.com www.crunchydata.com
Re: Add the ability to limit the amount of memory that can be allocated to backends.
On Thu, 2022-11-03 at 11:48 -0400, Reid Thompson wrote: > On Tue, 2022-10-25 at 11:49 -0400, Reid Thompson wrote: > > Rebased to current. Add a couple changes per conversation with D > Christensen (include units in field name, group field with > backend_xid > and backend_xmin fields in pg_stat_activity view, rather than between > query_id and query) > rebased/patched to current master && current pg-stat-activity-backend-memory-allocated -- Reid Thompson Senior Software Engineer Crunchy Data, Inc. reid.thomp...@crunchydata.com www.crunchydata.com From 1470f45e086bef0757cc262d10e08904e46b9a88 Mon Sep 17 00:00:00 2001 From: Reid Thompson Date: Sat, 4 Jun 2022 22:23:59 -0400 Subject: [PATCH 2/2] Add the ability to limit the amount of memory that can be allocated to backends. This builds on the work that adds backend memory allocated to pg_stat_activity. Add GUC variable max_total_backend_memory. Specifies a limit to the amount of memory (in MB) that may be allocated to backends in total (i.e. this is not a per user or per backend limit). If unset, or set to 0 it is disabled. It is intended as a resource to help avoid the OOM killer on LINUX and manage resources in general. A backend request that would push the total over the limit will be denied with an out of memory error causing that backend's current query/transaction to fail. Due to the dynamic nature of memory allocations, this limit is not exact. If within 1.5MB of the limit and two backends request 1MB each at the same time both may be allocated, and exceed the limit. Further requests will not be allocated until dropping below the limit. Keep this in mind when setting this value. This limit does not affect auxiliary backend processes. Backend memory allocations are displayed in the pg_stat_activity view. --- doc/src/sgml/config.sgml | 26 + src/backend/storage/ipc/dsm_impl.c| 12 ++ src/backend/utils/activity/backend_status.c | 108 ++ src/backend/utils/misc/guc_tables.c | 11 ++ src/backend/utils/misc/postgresql.conf.sample | 3 + src/backend/utils/mmgr/aset.c | 17 +++ src/backend/utils/mmgr/generation.c | 9 ++ src/backend/utils/mmgr/slab.c | 8 ++ src/include/utils/backend_status.h| 3 + 9 files changed, 197 insertions(+) diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml index 24b1624bad..c2db3ace7a 100644 --- a/doc/src/sgml/config.sgml +++ b/doc/src/sgml/config.sgml @@ -2079,6 +2079,32 @@ include_dir 'conf.d' + + max_total_backend_memory (integer) + + max_total_backend_memory configuration parameter + + + + +Specifies a limit to the amount of memory (MB) that may be allocated to +backends in total (i.e. this is not a per user or per backend limit). +If unset, or set to 0 it is disabled. A backend request that would +push the total over the limit will be denied with an out of memory +error causing that backend's current query/transaction to fail. Due to +the dynamic nature of memory allocations, this limit is not exact. If +within 1.5MB of the limit and two backends request 1MB each at the same +time both may be allocated, and exceed the limit. Further requests will +not be allocated until dropping below the limit. Keep this in mind when +setting this value. This limit does not affect auxiliary backend +processes . Backend memory +allocations (allocated_bytes) are displayed in the +pg_stat_activity +view. 
+ + + + diff --git a/src/backend/storage/ipc/dsm_impl.c b/src/backend/storage/ipc/dsm_impl.c index 65d59fc43e..8d9df676af 100644 --- a/src/backend/storage/ipc/dsm_impl.c +++ b/src/backend/storage/ipc/dsm_impl.c @@ -254,6 +254,10 @@ dsm_impl_posix(dsm_op op, dsm_handle handle, Size request_size, return true; } + /* Do not exceed maximum allowed memory allocation */ + if (op == DSM_OP_CREATE && exceeds_max_total_bkend_mem(request_size)) + return false; + /* * Create new segment or open an existing one for attach. * @@ -525,6 +529,10 @@ dsm_impl_sysv(dsm_op op, dsm_handle handle, Size request_size, int flags = IPCProtection; size_t segsize; + /* Do not exceed maximum allowed memory allocation */ + if (op == DSM_OP_CREATE && exceeds_max_total_bkend_mem(request_size)) + return false; + /* * Allocate the memory BEFORE acquiring the resource, so that we don't * leak the resource if memory allocation fails. @@ -719,6 +727,10 @@ dsm_impl_windows(dsm_op op, dsm_handle handle, Size request_size, return true; } + /* Do not exceed maximum allowed memory allocation */ + if (op == DSM_OP_CREATE && exceeds_max_total_bkend_mem(request_size)) + return false; + /* Create new segment or open an existing one for attach. */ if (op == DSM_OP_CREATE)
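The commit message above notes that the limit is not exact. A minimal sketch
of why - with illustrative names; the GUC matches the patch but the accessor
function here is hypothetical - is that the cluster-wide total is read without
a lock, so two backends that are both just under the limit can pass the check
concurrently and together overshoot it slightly:

```
/*
 * Sketch only, not the patch code.  max_total_backend_memory is the GUC
 * (in MB, 0 = disabled); total_allocated_bytes() stands in for however the
 * running total is read from shared memory.
 */
static bool
exceeds_max_total_backend_memory(Size request_size)
{
	uint64		limit_bytes;

	if (max_total_backend_memory == 0)	/* feature disabled */
		return false;

	limit_bytes = (uint64) max_total_backend_memory * 1024 * 1024;

	/* unlocked read: concurrent requests may both pass and overshoot */
	return total_allocated_bytes() + request_size > limit_bytes;
}
```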
RE: Perform streaming logical transactions by background workers and parallel apply
On Friday, November 25, 2022 10:54 AM Peter Smith wrote: > > Here are some review comments for v51-0001. Thanks for the comments! > == > > .../replication/logical/applyparallelworker.c > > 1. General - Error messages, get_worker_name() > > I previously wrote a comment to ask if the get_worker_name() should be used > in more places but the reply [1, #2b] was: > > > 2b. > > Consider if maybe all of these ought to be calling get_worker_name() > > which is currently static in worker.c. Doing this means any future > > changes to get_worker_name won't cause more inconsistencies. > > The most error message in applyparallelxx.c can only use "xx parallel worker", > so I think it's fine not to call get_worker_name > > ~ > > I thought the reply missed the point I was trying to make -- I meant if it was > arranged now so *every* message would go via > get_worker_name() then in future somebody wanted to change the names (e.g. > from "logical replication parallel apply worker" to "LR PA > worker") then it would only need to be changed in one central place instead of > hunting down every hardwired error message. > Thanks for the suggestion. I understand your point, but I feel that using get_worker_name() at some places where the worker type is decided could make developer think that all kind of worker can enter this code which I am not sure is better. So I didn't change this. > > 2. HandleParallelApplyMessage > > + case 'X': /* Terminate, indicating clean exit. */ > + shm_mq_detach(winfo->error_mq_handle); > + winfo->error_mq_handle = NULL; > + break; > + default: > + elog(ERROR, "unrecognized message type received from logical > replication parallel apply worker: %c (message length %d bytes)", > + msgtype, msg->len); > > The case 'X' code indentation is too much. Changed. > == > > src/backend/replication/logical/origin.c > > 3. replorigin_session_setup(RepOriginId node, int acquired_by) > > @@ -1075,12 +1075,20 @@ ReplicationOriginExitCleanup(int code, Datum arg) > * array doesn't have to be searched when calling > * replorigin_session_advance(). > * > - * Obviously only one such cached origin can exist per process and the > current > + * Normally only one such cached origin can exist per process and the > + current > * cached value can only be set again after the previous value is torn down > * with replorigin_session_reset(). > + * > + * However, we do allow multiple processes to point to the same origin > + slot if > + * requested by the caller by passing PID of the process that has > + already > + * acquired it as acquired_by. This is to allow multiple parallel apply > + * processes to use the same origin, provided they maintain commit > + order, for > + * example, by allowing only one process to commit at a time. For the > + first > + * process requesting this origin, the acquired_by parameter needs to > + be set to > + * 0. > */ > void > -replorigin_session_setup(RepOriginId node) > +replorigin_session_setup(RepOriginId node, int acquired_by) > > I think the meaning of the acquired_by=0 is not fully described here: > "For the first process requesting this origin, the acquired_by parameter needs > to be set to 0." > IMO that seems to be describing it only from POV that you are always going to > want to allow multiple processes. But really this is an optional feature so > you > might pass acquired_by=0, not just because this is the first of multiple, but > also > because you *never* want to allow multiple at all. The comment does not > convey this meaning. 
> > Maybe something worded like below is better? > > SUGGESTION > Normally only one such cached origin can exist per process so the cached value > can only be set again after the previous value is torn down with > replorigin_session_reset(). For this normal case pass > acquired_by=0 (meaning the slot is not allowed to be already acquired by > another process). > > However, sometimes multiple processes can safely re-use the same origin slot > (for example, multiple parallel apply processes can safely use the same > origin, > provided they maintain commit order by allowing only one process to commit > at a time). For this case the first process must pass acquired_by=0, and then > the > other processes sharing that same origin can pass acquired_by=PID of the first > process. Changes as suggested. > == > > src/backend/replication/logical/worker.c > > 4. GENERAL - get_worker_name() > > If you decide it is OK to hardwire some error messages instead of > unconditionally calling the get_worker_name() -- see my #1 review comment in > this post -- then there are some other messages in this file that also seem > like > they can be also hardwired because the type of worker is already known. > > Here are some examples: > > 4a. > > + else if (am_parallel_apply_worker()) > + { > + if (rel->state != SUBREL_STATE_READY) > + ereport(ERROR, > + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), > + /* translator: first
RE: Perform streaming logical transactions by background workers and parallel apply
On Tuesday, November 22, 2022 9:53 PM Kuroda, Hayato wrote:
>
> Thanks for updating the patch!
> I tested whether the deadlock caused by a foreign key constraint could
> be detected, and it worked well.
>
> Followings are my review comments. They are basically related to 0001, but
> some contents may not be. It takes time to understand 0002 correctly...

Thanks for the comments!

> 01. typedefs.list
>
> LeaderFileSetState should be added to typedefs.list.
>
> 02. 032_streaming_parallel_apply.pl
>
> As I said in [1]: the test name may not be matched. Do you have reasons to
> revert the change?

The original parallel safety check has been removed, so I changed the name.
After rethinking about this, I named it stream_parallel_conflict.

> 03. 032_streaming_parallel_apply.pl
>
> The test does not cover the case where the backend process is involved in
> the deadlock. IIUC this is another motivation to use a stream/transaction
> lock. I think it should be added.

The main deadlock cases that the stream/transaction lock can detect are
1) LA->PA and 2) LA->PA->PA, as explained atop applyparallelworker.c. So I
think the backend-process-related case is a variant that is already covered
by the existing tests in the patch.

> 04. log output
>
> While spooled changes are being applied by the PA, there are many messages
> like "replayed %d changes from file..." and "applied %u changes...". They
> come from apply_handle_stream_stop() and apply_spooled_messages(). They
> have the same meaning, so I think one of them can be removed.

Changed.

> 05. system_views.sql
>
> In the previous version you modified the pg_stat_subscription system view.
> Why did you revert that?

I was not sure whether we should include that in the main patch set. I added
a top-up patch that changes the view.

> 06. interrupt.c - SignalHandlerForShutdownRequest()
>
> In the comment atop SignalHandlerForShutdownRequest(), some processes that
> assign the function (other than for SIGTERM) are mentioned. We may be able
> to add the parallel apply worker.

Changed.

> 08. guc_tables.c - ConfigureNamesInt
>
> ```
> &max_sync_workers_per_subscription,
> + 2, 0, MAX_PARALLEL_WORKER_LIMIT,
> + NULL, NULL, NULL
> + },
> ```
>
> The upper limit for max_sync_workers_per_subscription seems to be wrong; it
> should be used for max_parallel_apply_workers_per_subscription.

That's my miss, sorry for that.

> 10. worker.c - maybe_reread_subscription()
>
> ```
> + if (am_parallel_apply_worker())
> +     ereport(LOG,
> +             /* translator: first %s is the name of logical replication worker */
> +             (errmsg("%s for subscription \"%s\" will stop because of a parameter change",
> +                     get_worker_name(), MySubscription->name)));
> ```
>
> I was not sure get_worker_name() is needed. I think "logical replication
> apply worker" should be embedded.

Changed.

> 11. worker.c - ApplyWorkerMain()
>
> ```
> + (errmsg_internal("%s for subscription \"%s\" two_phase is %s",
> +                  get_worker_name(),
> ```

Changed.

Best regards,
Hou zj
CF 2022-11: entries "Ready for Committer" with recent activity
Hi

This is an overview of the current patches marked "Ready for Committer" which
have been actively worked on during the current CF. They largely fall into two
categories:

- active participation from likely committers
- have been reviewed and marked as "RfC", but need committer interest

Dates in parentheses represent the last mail on the relevant thread.

meson PGXS compatibility (2022-10-13)
--------------------------------------
- https://commitfest.postgresql.org/40/3932/
- https://www.postgresql.org/message-id/flat/20221005200710.luvw5evhwf6cl...@awork3.anarazel.de

Sounds like it is committable, presumably just needs one of the committers on
the thread to do it.

Let libpq reject unexpected authentication requests (2022-11-16)
-----------------------------------------------------------------
- https://commitfest.postgresql.org/40/3716/
- https://www.postgresql.org/message-id/flat/9e5a8ccddb8355ea9fa4b75a1e3a9edc88a70cd3.ca...@vmware.com
- Named committer: Michael Paquier (michael-kun)

Patch refreshed per Michael's comments.

Add LSN to error messages reported for WAL file (2022-11-17)
-------------------------------------------------------------
- https://commitfest.postgresql.org/40/3909/
- https://www.postgresql.org/message-id/flat/CALj2ACWV=FCddsxcGbVOA=cvPyMr75YCFbSQT6g4KDj=gcj...@mail.gmail.com

Thread consensus is that it is RfC, but it needs interest from a committer.

Parallel Hash Full Join (2022-11-17)
-------------------------------------
- https://commitfest.postgresql.org/40/2903/
- https://www.postgresql.org/message-id/flat/CA+hUKG+A6ftXPz4oe92+x8Er+xpGZqto70-Q_ERwRaSyA=a...@mail.gmail.com
- Named committer: Thomas Munro (macdice)

Thomas has indicated he will look at this.

Fix assertion failure with barriers in parallel hash join (2022-11-18)
------------------------------------------------------------------------
- https://commitfest.postgresql.org/40/3662/
- https://www.postgresql.org/message-id/flat/20200929061142.ga29...@paquier.xyz

Thomas has indicated he will look at this.

XID formatting and SLRU refactorings ... (2022-11-21)
------------------------------------------------------
- https://commitfest.postgresql.org/40/3489/
- https://www.postgresql.org/message-id/flat/caj7c6tpdoybyrncaeyndkbkto0wg2xsdydutf0nxq+vfkmt...@mail.gmail.com

Patch has been updated several times very recently; needs interest from a
committer.

Reducing power consumption when idle (2022-11-21)
---------------------------------------------------
- https://commitfest.postgresql.org/40/3566/
- https://www.postgresql.org/message-id/flat/canbhv-hk8yvo_g4vwbmz__vzu_vz4_jjwstwmanmxietdzq...@mail.gmail.com

This is mainly awaiting resolution of the decision whether to deprecate or
remove "promote_trigger_file"; consensus seems to be towards "remove".

pg_dump - read data for some options from external file (2022-11-22)
----------------------------------------------------------------------
- https://commitfest.postgresql.org/40/2573/
- https://www.postgresql.org/message-id/flat/CAFj8pRB10wvW0CC9Xq=1XDs=zcqxer3cblcnza+qix4cuh-...@mail.gmail.com

Updated patch, needs interest from a committer. This one has been floating
around for a couple of years...

Support % wildcard in extension upgrade scripts (2022-11-22)
-------------------------------------------------------------
- https://commitfest.postgresql.org/40/3654/
- https://www.postgresql.org/message-id/flat/YgakFklJyM5pNdt+@c19

Patch has feedback, needs interest from a committer; I haven't looked into it
in detail but maybe it needs a bit more discussion?

Compress KnownAssignedXids more frequently (2022-11-22)
---------------------------------------------------------
- https://commitfest.postgresql.org/40/3902/
- https://www.postgresql.org/message-id/flat/CALdSSPgahNUD_=pB_j=1zsndbaiotqvfzo8ejt5j_k7qziu...@mail.gmail.com

Active, with feedback from committers.

Transaction Management docs (2022-11-23)
------------------------------------------
- https://commitfest.postgresql.org/40/3899/
- https://www.postgresql.org/message-id/flat/canbhv-e_iy9fmrerxrch8tztyenpfo72hf_xd2hldppva4d...@mail.gmail.com

Seems committable.
Fix order of checking ICU options in initdb and create database (2022-11-24)
------------------------------------------------------------------------------
- https://commitfest.postgresql.org/40/3976/
- https://www.postgresql.org/message-id/flat/534fed4a262fee534662bd07a691c...@postgrespro.ru
- Named committer: Peter Eisentraut (petere)

Concerns expressed by Peter, will change to WoA.

Use fadvise in wal replay (2022-11-25)
---------------------------------------
- https://commitfest.postgresql.org/40/3707/
- https://www.postgresql.org/message-id/flat/CADVKa1WsQMBYK_02_Ji=pbofnms+ct7tv6qjxddhsfcic9v...@mail.gmail.com

There seems to be consensus that this patch is small and will bring a small
benefit with minimal code. Needs interest from a committer.

PGDOCS - Stats views and functions not in order? (2022-11-26)
---------------------------------------------------------------
Re: Small miscellaneous fixes
On Sat, Nov 26, 2022 at 11:30:07AM -0300, Ranier Vilela wrote: > Thank you Michael, for taking care of it. (As of 1e31484, after finishing the tests I wanted.) -- Michael signature.asc Description: PGP signature
Re: Reducing power consumption on idle servers
On Tue, Nov 22, 2022 at 5:50, Laurenz Albe wrote:
>
> On Mon, 2022-11-21 at 12:11 -0500, Tom Lane wrote:
> > Robert Haas writes:
> > > The reason that I pushed back -- not as successfully as I would have
> > > liked -- on the changes to pg_stop_backup / pg_start_backup is that I
> > > know there are people using the old method successfully, and it's not
> > > just a 1:1 substitution. Here I don't, and it is. I'm totally open to
> > > the feedback that such people exist and to hearing why adopting one of
> > > the newer methods would be a problem for them, if that's the case. But
> > > if there's no evidence that such people exist or that changing is a
> > > problem for them, I don't think waiting 5 years on principle is good
> > > for the project.
> >
> > We make incompatible changes in every release; see the release notes.
> > Unless somebody can give a plausible use-case where this'd be a
> > difficult change to deal with, I concur that we don't need to
> > deprecate it ahead of time.
>
> Since I am the only one that seems to worry, I'll shut up. You are probably
> right that the feature won't be missed by many users.

FWIW, though I prefer to err on the side of caution when removing features
from anything, I am struggling to remember ever having used
"promote_trigger_file" (or "trigger_file" as it was in Pg 11 and earlier);
grepping my private notes brings up a single example from ca. 2012 when I
appear to have been experimenting with replication.

On a quick web search, a large part of the results are related to its change
to a GUC in Pg 12 and/or commented-out entries in sample postgresql.conf
files; most of the rest seem to be blog articles/tutorials on setting up
replication.

Regards

Ian Barwick
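For context, the promotion methods that remain if "promote_trigger_file" is
removed are the SQL function and pg_ctl, for example:

```
-- On the standby: either of these replaces touching a promote trigger file.
SELECT pg_promote();   -- waits up to 60 seconds by default for promotion

-- or, from the shell:
--   pg_ctl promote -D /path/to/standby/data
```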
Re: User functions for building SCRAM secrets
On 11/26/22 2:53 PM, Jonathan S. Katz wrote: On 11/16/22 10:09 PM, Michael Paquier wrote: git diff --check reports some whitespaces. Ack. Will fix on the next pass. (I've been transitioning editors, which could have resulted in that), Fixed (and have run that check subsequently). scram_build_secret_sha256_internal() is missing SASLprep on the password string. Perhaps the best thing to do here is just to extend pg_be_scram_build_secret() with more arguments so as callers can optionally pass down a custom salt with its length, leaving the responsibility to pg_be_scram_build_secret() to create a random salt if nothing has been given? Ah, good catch! I think if we go with passing down the salt, we'd also have to allow for the passing down of the iterations, too, and we're close to rebuilding "scram_build_secret". I'll stare a bit at this on the next pass and either 1/ just SASLprep the string in the new "scram_build_secret_sha256_internal" func, or 2/ change the definition of "pg_be_scram_build_secret" to accommodate more overrides. In the end I went with your suggested approach as it limited the amount of code duplication. I did keep in all the permutations of the tests as it did help me catch an error in my code that let to a panic. As this seems to be closer to completion, I did include docs in this patch. I added this function as part of the "string functions" section of the docs as "md5" was already there. If we continue to add more authentication helper functions, perhaps we should consider breaking those out into their own documentation section. Thanks, Jonathan From 2b04b1aca0415b726fdacc7c0cc4903ee864257c Mon Sep 17 00:00:00 2001 From: "Jonathan S. Katz" Date: Mon, 31 Oct 2022 16:13:08 -0400 Subject: [PATCH] Add "scram_build_secret_sha_256" SQL function This function lets users build SCRAM secrets from SQL functions and provides the ability for the user to select the password, salt, and number of iterations for the password hashing algorithm. --- doc/src/sgml/func.sgml | 42 +++ src/backend/catalog/system_functions.sql | 6 ++ src/backend/libpq/auth-scram.c | 29 +--- src/backend/libpq/crypt.c| 2 +- src/backend/utils/adt/Makefile | 1 + src/backend/utils/adt/authfuncs.c| 69 ++ src/backend/utils/adt/meson.build| 1 + src/include/catalog/pg_proc.dat | 4 ++ src/include/libpq/scram.h| 3 +- src/test/regress/expected/scram.out | 91 src/test/regress/parallel_schedule | 2 +- src/test/regress/sql/scram.sql | 56 +++ 12 files changed, 295 insertions(+), 11 deletions(-) create mode 100644 src/backend/utils/adt/authfuncs.c create mode 100644 src/test/regress/expected/scram.out create mode 100644 src/test/regress/sql/scram.sql diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index 6e0425cb3d..f582da138f 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -3485,6 +3485,48 @@ repeat('Pg', 4) PgPgPgPg + + + + scram_build_secret_sha256 + +scram_build_secret_sha256 ( password text +[, salt bytea +[, iterations integer ] ]) +text + + +Using the value provided in password, builds a +SCRAM secret equilvaent to what is stored in +pg_authid.rolpassword +and used with scram-sha-256 +authentication. If not provided or set to NULL, +salt is randomly generated and +iterations defaults to 4096. 
+ + +SELECT scram_build_secret_sha256('secret password', decode('MTIzNDU2Nzg5MGFiY2RlZg==', 'base64')); + + + SCRAM-SHA-256$4096:MTIzNDU2Nzg5MGFiY2RlZg==$D5BmucT796UQKargx2k3fdqjDYR7cH/L0viKKhGo6kA=:M33+iHFOESP8C3DKLDb2k5QAhkNVWEbp/YUIFd2CxN4= + + + +SELECT scram_build_secret_sha256('secret password', '\xabba5432'); + + + SCRAM-SHA-256$4096:q7pUMg==$05Nb9QHwHkMA0CRcYaEfwtgZ+3kStIefz8fLMjTEtio=:P126h1ycyP938E69yxktEfhoAILbiwL/UMsMk3Efb6o= + + + +SELECT scram_build_secret_sha256('secret password', decode('MTIzNDU2Nzg5MGFiY2RlZg==', 'base64'), 1); + + + SCRAM-SHA-256$1:MTIzNDU2Nzg5MGFiY2RlZg==$9NkDu1TFpx3L30zMgHUqjRNSq3GRZRrdWU4TuGOnT3Q=:svuIH9L6HH8loyKWguT64XXoOLCrr4FkVViPd2JVR4M= + + + + diff --git a/src/backend/catalog/system_functions.sql b/src/backend/catalog/system_functions.sql index 30a048f6b0..4aa76b81d9 100644 --- a/src/backend/catalog/system_functions.sql +++ b/src/backend/catalog/system_functions.sql @@ -594,6 +594,12 @@ LANGUAGE internal STRICT IMMUTABLE PARALLEL SAFE AS 'unicode_is_normalized'; +-- defaults for building a "scram-sha-256" secret +CREATE OR REPLACE FUNCTION + scram_build_secret_sha256(text, bytea DEFAULT NULL, int DEFAULT NULL) +RETURNS text +LANGUA
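As a possible usage sketch (assuming the patch as posted; "app_user" is a
placeholder role): the generated string is a standard SCRAM verifier, so it
can be assigned with ALTER ROLE, which stores a password that already looks
like a SCRAM secret verbatim. From psql:

```
-- Build a SCRAM secret server-side and assign it to a role.
SELECT scram_build_secret_sha256('secret password') AS secret \gset
ALTER ROLE app_user PASSWORD :'secret';
```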
Re: Add tracking of backend memory allocated to pg_stat_activity
On Sat, 2022-11-26 at 22:10 -0500, Reid Thompson wrote: > Code rebased to current master. > Updated to incorporate additional recommendations from the the list > - add units to variables in view > - remove 'backend_' prefix from variables/functions > - update documentation > - add functional test for allocated_bytes > - refactor allocation reporting to reduce number of functions and > branches/reduce performance hit > - zero allocated bytes after fork to avoid double counting > postmaster allocations > > > > attempt to remedy cfbot windows build issues -- Reid Thompson Senior Software Engineer Crunchy Data, Inc. reid.thomp...@crunchydata.com www.crunchydata.com -- Reid Thompson Senior Software Engineer Crunchy Data, Inc. reid.thomp...@crunchydata.com www.crunchydata.com From fb43761181925a87cb0674b6744b009fea614796 Mon Sep 17 00:00:00 2001 From: Reid Thompson Date: Thu, 11 Aug 2022 12:01:25 -0400 Subject: [PATCH] Add tracking of backend memory allocated to pg_stat_activity This new field displays the current bytes of memory allocated to the backend process. It is updated as memory for the process is malloc'd/free'd. Memory allocated to items on the freelist is included in the displayed value. Dynamic shared memory allocations are included only in the value displayed for the backend that created them, they are not included in the value for backends that are attached to them to avoid double counting. On occasion, orphaned memory segments may be cleaned up on postmaster startup. This may result in decreasing the sum without a prior increment. We limit the floor of backend_mem_allocated to zero. Updated pg_stat_activity documentation for the new column. --- doc/src/sgml/monitoring.sgml| 15 src/backend/catalog/system_views.sql| 1 + src/backend/postmaster/autovacuum.c | 6 ++ src/backend/postmaster/postmaster.c | 13 src/backend/postmaster/syslogger.c | 3 + src/backend/storage/ipc/dsm_impl.c | 81 + src/backend/utils/activity/backend_status.c | 45 src/backend/utils/adt/pgstatfuncs.c | 4 +- src/backend/utils/mmgr/aset.c | 17 + src/backend/utils/mmgr/generation.c | 15 src/backend/utils/mmgr/slab.c | 21 ++ src/include/catalog/pg_proc.dat | 6 +- src/include/utils/backend_status.h | 58 ++- src/test/regress/expected/rules.out | 9 ++- src/test/regress/expected/stats.out | 11 +++ src/test/regress/sql/stats.sql | 3 + 16 files changed, 299 insertions(+), 9 deletions(-) diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml index 5579b8b9e0..ffe7d2566c 100644 --- a/doc/src/sgml/monitoring.sgml +++ b/doc/src/sgml/monitoring.sgml @@ -947,6 +947,21 @@ postgres 27093 0.0 0.0 30096 2752 ?Ss 11:34 0:00 postgres: ser + + + allocated_bytes bigint + + + Memory currently allocated to this backend in bytes. This is the balance + of bytes allocated and freed by this backend. Dynamic shared memory + allocations are included only in the value displayed for the backend that + created them, they are not included in the value for backends that are + attached to them to avoid double counting. Use pg_size_pretty + described in to make this value + more easily readable. 
+ + + query text diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql index 2d8104b090..9ea8f78c95 100644 --- a/src/backend/catalog/system_views.sql +++ b/src/backend/catalog/system_views.sql @@ -864,6 +864,7 @@ CREATE VIEW pg_stat_activity AS S.state, S.backend_xid, s.backend_xmin, +S.allocated_bytes, S.query_id, S.query, S.backend_type diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c index 601834d4b4..f54606104d 100644 --- a/src/backend/postmaster/autovacuum.c +++ b/src/backend/postmaster/autovacuum.c @@ -407,6 +407,9 @@ StartAutoVacLauncher(void) #ifndef EXEC_BACKEND case 0: + /* Zero allocated bytes to avoid double counting parent allocation */ + pgstat_zero_my_allocated_bytes(); + /* in postmaster child ... */ InitPostmasterChild(); @@ -1485,6 +1488,9 @@ StartAutoVacWorker(void) #ifndef EXEC_BACKEND case 0: + /* Zero allocated bytes to avoid double counting parent allocation */ + pgstat_zero_my_allocated_bytes(); + /* in postmaster child ... */ InitPostmasterChild(); diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c index a8a246921f..89a6caec78 100644 --- a/src/backend/postmaster/postmaster.c +++ b/src/backend/postmaster/postmaster.c @@ -4102,6 +4102,9 @@ BackendStartup(Port *port) { free(bn); + /* Zero allocated
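As a usage illustration (assuming the patch applies and the column is named
allocated_bytes, as in the documentation hunk above), the new field can be
inspected like any other pg_stat_activity column:

```
-- Largest backends by tracked allocation, most readable with pg_size_pretty.
SELECT pid, backend_type, state,
       pg_size_pretty(allocated_bytes) AS allocated
FROM pg_stat_activity
ORDER BY allocated_bytes DESC
LIMIT 10;
```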
Re: Avoid streaming the transaction which are skipped (in corner cases)
On Sat, Nov 26, 2022 at 12:15 PM Amit Kapila wrote: > > On Fri, Nov 25, 2022 at 5:38 PM Amit Kapila wrote: > > > > On Fri, Nov 25, 2022 at 1:35 PM Dilip Kumar wrote: > > > > > > During DecodeCommit() for skipping a transaction we use ReadRecPtr to > > > check whether to skip this transaction or not. Whereas in > > > ReorderBufferCanStartStreaming() we use EndRecPtr to check whether to > > > stream or not. Generally it will not create a problem but if the > > > commit record itself is adding some changes to the transaction(e.g. > > > snapshot) and if the "start_decoding_at" is in between ReadRecPtr and > > > EndRecPtr then streaming will decide to stream the transaction where > > > as DecodeCommit will decide to skip it. And for handling this case in > > > ReorderBufferForget() we call stream_abort(). > > > > > > > The other cases are probably where we don't have FilterByOrigin or > > dbid check, for example, XLOG_HEAP2_NEW_CID/XLOG_XACT_INVALIDATIONS. > > We anyway actually don't send anything for such cases except empty > > start/stop messages. Can we add some flag to txn which says that there > > is at least one change like DML that we want to stream? > > > > We can probably think of using txn_flags for this purpose. In the attached patch I have used txn_flags to identify whether it has any streamable change or not and the transaction will not be selected for streaming unless it has at least one streamable change. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com From f330d37f6ac1930cde1d2773dcd568b9a35454c9 Mon Sep 17 00:00:00 2001 From: Dilip Kumar Date: Fri, 25 Nov 2022 13:11:44 +0530 Subject: [PATCH v2] Fix thinko in when to stream a transaction Actually, during DecodeCommit() for skipping a transaction we use ReadRecPtr to check whether to skip this transaction or not. Whereas in ReorderBufferCanStartStreaming() we use EndRecPtr to check whether to stream or not. Generally it will not create a problem but if the commit record itslef is adding some changes to the transaction(e.g. snapshot) and if the start_decoding_at is in between ReadRecPtr and EndRecPtr then streaming will decide to stream the transaction where as DecodeCommit will decide to skip it. And for handling this case in ReorderBufferForget() we call stream_abort() in order to abort any streamed changes. So ideally if we are planning to skip the transaction we should never stream it hence there is no need to stream abort such transaction in case of skip. Along with that we also skip the transaction if the transaction dbid is not same slot dbid or it is filtered by origin id. So in corner cases it is possible that we might stream the transaction but later it will be skipped in DecodeCommit. For fixing that do not select any transaction for streaming unless there is any streamable change and if there is any streamable change then we can safely select it for streaming as it will not be skipped by DecodeCommit. 
--- src/backend/replication/logical/reorderbuffer.c | 34 + src/include/replication/reorderbuffer.h | 23 +++-- 2 files changed, 44 insertions(+), 13 deletions(-) diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c index 31f7381..e1a031d 100644 --- a/src/backend/replication/logical/reorderbuffer.c +++ b/src/backend/replication/logical/reorderbuffer.c @@ -793,6 +793,30 @@ ReorderBufferQueueChange(ReorderBuffer *rb, TransactionId xid, XLogRecPtr lsn, return; } + /* + * If there are any streamable changes getting queued then get the top + * transaction and mark it has streamable change. This is required for + * streaming in-progress transactions, the in-progress transaction will + * not be selected for streaming unless it has at least one streamable + * change. + */ + if (change->action == REORDER_BUFFER_CHANGE_INSERT || + change->action == REORDER_BUFFER_CHANGE_UPDATE || + change->action == REORDER_BUFFER_CHANGE_DELETE || + change->action == REORDER_BUFFER_CHANGE_INTERNAL_SPEC_INSERT || + change->action == REORDER_BUFFER_CHANGE_TRUNCATE) + { + ReorderBufferTXN *toptxn; + + /* get the top transaction */ + if (txn->toptxn != NULL) + toptxn = txn->toptxn; + else + toptxn = txn; + + toptxn->txn_flags |= RBTXN_HAS_STREAMABLE_CHANGE; + } + change->lsn = lsn; change->txn = txn; @@ -2942,9 +2966,8 @@ ReorderBufferForget(ReorderBuffer *rb, TransactionId xid, XLogRecPtr lsn) if (txn == NULL) return; - /* For streamed transactions notify the remote node about the abort. */ - if (rbtxn_is_streamed(txn)) - rb->stream_abort(rb, txn, lsn); + /* the transaction which is being skipped shouldn't have been streamed */ + Assert(!rbtxn_is_streamed(txn)); /* cosmetic... */ txn->final_lsn = lsn; @@ -3502,7 +3525,8 @@ ReorderBufferLargestTopTXN(ReorderBuffer *rb) Assert(txn->base_snapshot != NULL); if ((largest == NULL || txn->total_size > largest_size) && - (txn->total_
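To show how the new flag would be consumed when picking a transaction to
stream, here is a sketch under the assumption that a helper macro in the
style of the existing rbtxn_* macros accompanies RBTXN_HAS_STREAMABLE_CHANGE;
this is not necessarily the exact code in the posted patch:

```
/* Sketch only: helper macro plus the kind of guard the selection code
 * (e.g. ReorderBufferLargestTopTXN()) would add. */
#define rbtxn_has_streamable_change(txn) \
	(((txn)->txn_flags & RBTXN_HAS_STREAMABLE_CHANGE) != 0)

	/*
	 * Never select a transaction for streaming unless it queued at least
	 * one streamable change; otherwise its only changes came from the
	 * commit record itself (snapshot, invalidations) and DecodeCommit()
	 * would later skip the transaction that we already streamed.
	 */
	if (!rbtxn_has_streamable_change(txn))
		continue;
```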
Re: Allow file inclusion in pg_hba and pg_ident files
On Fri, Nov 25, 2022 at 11:25, Julien Rouhaud wrote:
>
> On Fri, Nov 25, 2022 at 07:41:59AM +0900, Michael Paquier wrote:
> > On Thu, Nov 24, 2022 at 05:07:24PM +0800, Julien Rouhaud wrote:
> > > So I went with CONF_FILE_START_DEPTH and CONF_FILE_MAX_DEPTH. Attached
> > > v22 that fixes it in all the places I found.
> >
> > Sounds fine. Added one comment, fixed one comment, and applied.
> > Thanks!
>
> Thanks a lot!

Hi

I'm trying to reconcile open CommitFest entries with their actual status; the
entry for this:

https://commitfest.postgresql.org/40/3558/

shows as "Waiting on Author", but it looks like it's all been committed; is
there anything left to do on this?

Thanks

Ian Barwick
Re: Allow file inclusion in pg_hba and pg_ident files
On Sun, Nov 27, 2022 at 15:31, Ian Lawrence Barwick wrote:
>
> I'm trying to reconcile open CommitFest entries with their actual
> status; the entry for this:
>
> https://commitfest.postgresql.org/40/3558/
>
> shows as "Waiting on Author", but looks like it's all been committed;
> is there anything left to do on this?

Right, the CF entry is out of date. There's still the additional TAP test
that needs to be taken care of. I sent a new version recently that works on
Windows CI, so I guess the correct status should now be Needs Review,
although the latest patchset probably doesn't apply anymore since the first
patch has been committed.

I'm traveling today; I'll try to send a rebased version in the evening.