Re: making relfilenodes 56 bits
On Sat, Jul 2, 2022 at 9:38 AM Andres Freund wrote: Thanks for the review, > I'm not feeling inspired by "locator", tbh. But I don't really have a great > alternative, so ... > > > On 2022-07-01 16:12:01 +0530, Dilip Kumar wrote: > > From f07ca9ef19e64922c6ee410707e93773d1a01d7c Mon Sep 17 00:00:00 2001 > > From: dilip kumar > > Date: Sat, 25 Jun 2022 10:43:12 +0530 > > Subject: [PATCH v4 2/4] Preliminary refactoring for supporting larger > > relfilenumber > > I don't think we have abbreviated buffer as 'buff' in many places? I think we > should either spell buffer out or use 'buf'. This is in regard to BuffTag etc. Okay, I will change it to 'buf' > > Subject: [PATCH v4 3/4] Use 56 bits for relfilenumber to avoid wraparound > Normally we don't capitalize the first character of a comment that's not a > full sentence (i.e. ending with a punctuation mark). Okay. > "logged for use" - looks like you reformulated the sentence incompletely. Right, I will fix it. > > + if (ShmemVariableCache->relnumbercount == 0) > > + { > > + > > XLogPutNextRelFileNumber(ShmemVariableCache->nextRelFileNumber + > > + > > VAR_RFN_PREFETCH); > > I know this is just copied, but I find "XLogPut" as a prefix pretty unhelpful. Maybe we can change to LogNextRelFileNumber()? > What's the story behind moving relfilenode to the front? Alignment > consideration? Seems odd to move the relfilenode before the oid. If there's an > alignment issue, can't you just swap it with reltablespace or such to resolve > it? Because of a test case added in this commit (79b716cfb7a1be2a61ebb4418099db1258f35e30). Even I did not like to move relfilenode before oid, but under this commit it is expected this to aligned as well as before any NameData check this comments === +-- +-- Keep such columns before the first NameData column of the +-- catalog, since packagers can override NAMEDATALEN to an odd number. +-- === > > > From f6e8e0e7412198b02671e67d1859a7448fe83f38 Mon Sep 17 00:00:00 2001 > > From: dilip kumar > > Date: Wed, 29 Jun 2022 13:24:32 +0530 > > Subject: [PATCH v4 4/4] Don't delay removing Tombstone file until next > > checkpoint > > > > Currently, we can not remove the unused relfilenode until the > > next checkpoint because if we remove them immediately then > > there is a risk of reusing the same relfilenode for two > > different relations during single checkpoint due to Oid > > wraparound. > > Well, not quite "currently", because at this point we've fixed that in a prior > commit ;) Right, I will change, but I'm not sure whether we want to commit 0003 and 0004 as an independent patch or as a simple patch. > > Now as part of the previous patch set we have made relfilenode > > 56 bit wider and removed the risk of wraparound so now we don't > > need to wait till the next checkpoint for removing the unused > > relation file and we can clean them up on commit. > > Hm. Wasn't there also some issue around crash-restarts benefiting from having > those files around until later? I think what I'm remembering is what is > referenced in this comment: I think due to wraparound if relfilenode gets reused by another relation in the same checkpoint then there was an issue in crash recovery with wal level minimal. But the root of the issue is a wraparound, right? > > > - * For regular relations, we don't unlink the first segment file of the > > rel, > > - * but just truncate it to zero length, and record a request to unlink it > > after > > - * the next checkpoint. Additional segments can be unlinked immediately, > > - * however. Leaving the empty file in place prevents that relfilenumber > > - * from being reused. The scenario this protects us from is: > > - * 1. We delete a relation (and commit, and actually remove its file). > > - * 2. We create a new relation, which by chance gets the same > > relfilenumber as > > - * the just-deleted one (OIDs must've wrapped around for that to > > happen). > > - * 3. We crash before another checkpoint occurs. > > - * During replay, we would delete the file and then recreate it, which is > > fine > > - * if the contents of the file were repopulated by subsequent WAL entries. > > - * But if we didn't WAL-log insertions, but instead relied on fsyncing the > > - * file after populating it (as we do at wal_level=minimal), the contents > > of > > - * the file would be lost forever. By leaving the empty file until after > > the > > - * next checkpoint, we prevent reassignment of the relfilenumber until it's > > - * safe, because relfilenumber assignment skips over any existing file. > > This isn't related to oid wraparound, just crashes. It's possible that the > XLogFlush() in XLogPutNextRelFileNumber() prevents such a scenario, but if so > it still ought to be explained here, I think. I think the root cause of the problem is oid reuse which is due to relfilenode wraparound, and the problem is created if the
Re: making relfilenodes 56 bits
On Sat, Jul 2, 2022 at 4:53 AM Dilip Kumar wrote: > > I'm doubtful it's a good idea to start dropping at the first segment. I'm > > fairly certain that there's smgrexists() checks in some places, and they'll > > now stop working, even if there are later segments that remained, e.g. > > because > > of an error in the middle of removing later segments. > > Okay, so you mean to say that we can first drop the remaining segment > and at last we drop the segment 0 right? I think we need to do it in descending order, starting with the highest-numbered segment and working down. md.c isn't smart about gaps in the sequence of files, so it's better if we don't create any gaps. -- Robert Haas EDB: http://www.enterprisedb.com
Re: pg15b2: large objects lost on upgrade
On Fri, Jul 1, 2022 at 7:14 PM Justin Pryzby wrote: > I noticed this during beta1, but dismissed the issue when it wasn't easily > reproducible. Now, I saw the same problem while upgrading from beta1 to > beta2, > so couldn't dismiss it. It turns out that LOs are lost if VACUUM FULL was > run. Yikes. That's really bad, and I have no idea what might be causing it, either. I'll plan to investigate this on Tuesday unless someone gets to it before then. -- Robert Haas EDB: http://www.enterprisedb.com
Re: replacing role-level NOINHERIT with a grant-level option
On Fri, Jul 1, 2022 at 5:12 PM Nathan Bossart wrote: > > I have some hesitations about this approach. On the positive side, I > > believe it's fully backward compatible, and that's something to like. > > On the negative side, I've always felt that the role-level properties > > - [NO]INHERIT, [NO]CREATEROLE, [NO]SUPERUSER, [NO]CREATEDB were hacky > > kludges, and I'd be happier they all went away in favor of more > > principled ways of controlling those behaviors. I think that the > > previous design moves us in the direction of being able to eventually > > remove [NO]INHERIT, whereas this, if anything, entrenches it. > > +1. As mentioned upthread [0], I think attributes like CREATEROLE, > SUPERUSER, and CREATEDB could be replaced with predefined roles. However, > since role attributes aren't inherited, that would result in a behavior > change. I'm curious what you have in mind. It might be worth spinning off > a new thread for this sooner than later. I don't think there is a whole lot of point in replacing role-level flags with predefined roles that work exactly the same way. Maybe there is some point, but I'm not excited about it. The problem with these settings in my opinion is that they are too monolithic. Either you can create any role with basically any privileges or you can create no roles at all. Either you are a superuser and can bypass all permissions checks or you are not and can't bypass any permissions checks. > unparenthesized syntax or add WARNINGs when it is used?) For [NO]INHERIT > and WITH INHERIT DEFAULT, presumably we could do something similar. Down > the road, those would be removed in favor of only using grant-level > options. I think it'd be hard to do that if WITH INHERIT DEFAULT is actually state stored in the catalog. Maybe I should revise this again so that the catalog column is just true or false, and the role-level property only sets the default for future grants. That might be more like what Tom had in mind originally. More clear sketch: Remove WITH INHERIT DEFAULT and just have WITH INHERIT { TRUE | FALSE }. If neither is specified, the default for a new GRANT is set based on the role-level property. I want more control than that. -- Robert Haas EDB: http://www.enterprisedb.com
Re: Support logical replication of DDLs
On Sat, Jul 2, 2022 at 8:51 AM Amit Kapila wrote: > > On Fri, Jul 1, 2022 at 10:22 PM vignesh C wrote: > > > > On Wed, Jun 29, 2022 at 3:25 PM houzj.f...@fujitsu.com > > wrote: > > > > > > > Thanks for the updated patch. > > Few comments on 0002 patch: > > 1) When we create a subscription for a publication with the existing > > default PUBLISH parameter having default value as > > 'insert,update,delete,truncate', we do an initial table sync to get > > the initial table data from the publisher to the subscriber. But in > > case of a publication created with 'ddl', the subscription expects the > > existing initial tables present in the publisher to be created > > beforehand in the subscriber. Should this be the default behavior? > > Should we do a ddl dump for all the tables and restore the ddl to the > > subscription while creating the subscription? Or is this planned as an > > option for the later version. > > > > The idea is to develop initial sync (for ddl replication) as a > separate patch. But both need to be integrated at some point. Yes, that approach makes sense. > > > > 3) SYNTAX Support: > > Currently creation of "FOR TABLE" publication with ddl is supported. > > Should we allow support of ddl for "FOR TABLE" publication. > > > > The above comment is unclear to me. It seems to me in the first > sentence, you are saying that the "FOR TABLE" syntax is supported and > in the second sentence, you are asking to allow support of it? I think > at this stage, the focus is to build the core part of the feature > (allow ddl replication and deparsing support), and then we can discuss > more on Syntax. Having said that, it will be good if we can support > table-level DDL replication as well in the patch as you seem to be > suggesting. I initially thought that supporting "FOR TABLE" publication for ddl might not be useful as currently the create subscription fails with table does not exist error. Now that the initial sync for ddl replication will also be implemented as mentioned in [1], this issue will be handled. I agree with supporting table-level DDL replication. [1] - https://www.postgresql.org/message-id/CAA4eK1%2B5zJAT_RYOAEOq8M33s196kR5sDyLQLUXd8Rnqr%2BiB0Q%40mail.gmail.com Regards, Vignesh
Re: pg15b2: large objects lost on upgrade
On Sat, Jul 02, 2022 at 08:34:04AM -0400, Robert Haas wrote: > On Fri, Jul 1, 2022 at 7:14 PM Justin Pryzby wrote: > > I noticed this during beta1, but dismissed the issue when it wasn't easily > > reproducible. Now, I saw the same problem while upgrading from beta1 to > > beta2, > > so couldn't dismiss it. It turns out that LOs are lost if VACUUM FULL was > > run. > > Yikes. That's really bad, and I have no idea what might be causing it, > either. I'll plan to investigate this on Tuesday unless someone gets > to it before then. I suppose it's like Bruce said, here. https://www.postgresql.org/message-id/20210601140949.GC22012%40momjian.us |One tricky case is pg_largeobject, which is copied from the old to new |cluster since it has user data. To preserve that relfilenode, you would |need to have pg_upgrade perform cluster surgery in each database to |renumber its relfilenode to match since it is created by initdb. I |can't think of a case where pg_upgrade already does something like that. Rather than setting the filenode of the next object as for user tables, pg-upgrade needs to UPDATE the relfilenode. This patch "works for me" but feel free to improve on it. >From 88cbe118d4eb4dcf9b6d2831d81f723587d80942 Mon Sep 17 00:00:00 2001 From: Justin Pryzby Date: Sat, 2 Jul 2022 00:48:47 -0500 Subject: [PATCH] wip: preserve relfilenode of pg_largeobject and its indexes An empty table is created by initdb, but its filenode was not preserved, so pg_largeobject was empty after upgrades. See also: 9a974cbcba005256a19991203583a94b4f9a21a9 https://www.postgresql.org/message-id/20210601140949.GC22012%40momjian.us https://www.postgresql.org/message-id/20220701231413.GI13040%40telsasoft.com --- src/bin/pg_dump/pg_dump.c | 37 +++--- src/bin/pg_upgrade/t/002_pg_upgrade.pl | 2 ++ 2 files changed, 24 insertions(+), 15 deletions(-) diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c index d9c5bcafd20..c629f4154fc 100644 --- a/src/bin/pg_dump/pg_dump.c +++ b/src/bin/pg_dump/pg_dump.c @@ -3135,7 +3135,7 @@ dumpDatabase(Archive *fout) /* * pg_largeobject comes from the old system intact, so set its - * relfrozenxids and relminmxids. + * relfrozenxids, relminmxids and relfilenode. */ if (dopt->binary_upgrade) { @@ -3143,34 +3143,41 @@ dumpDatabase(Archive *fout) PQExpBuffer loFrozenQry = createPQExpBuffer(); PQExpBuffer loOutQry = createPQExpBuffer(); int i_relfrozenxid, + i_relfilenode, + i_oid, i_relminmxid; /* * pg_largeobject */ if (fout->remoteVersion >= 90300) - appendPQExpBuffer(loFrozenQry, "SELECT relfrozenxid, relminmxid\n" + appendPQExpBuffer(loFrozenQry, "SELECT relfrozenxid, relminmxid, relfilenode, oid\n" "FROM pg_catalog.pg_class\n" - "WHERE oid = %u;\n", - LargeObjectRelationId); + "WHERE oid IN (%u, %u);\n", + LargeObjectRelationId, LargeObjectLOidPNIndexId); else - appendPQExpBuffer(loFrozenQry, "SELECT relfrozenxid, 0 AS relminmxid\n" + appendPQExpBuffer(loFrozenQry, "SELECT relfrozenxid, 0 AS relminmxid, relfilenode, oid\n" "FROM pg_catalog.pg_class\n" - "WHERE oid = %u;\n", - LargeObjectRelationId); + "WHERE oid IN (%u, %u);\n", + LargeObjectRelationId, LargeObjectLOidPNIndexId); - lo_res = ExecuteSqlQueryForSingleRow(fout, loFrozenQry->data); + lo_res = ExecuteSqlQuery(fout, loFrozenQry->data, PGRES_TUPLES_OK); i_relfrozenxid = PQfnumber(lo_res, "relfrozenxid"); i_relminmxid = PQfnumber(lo_res, "relminmxid"); + i_relfilenode = PQfnumber(lo_res, "relfilenode"); + i_oid = PQfnumber(lo_res, "oid"); + + appendPQExpBufferStr(loOutQry, "\n-- For binary upgrade, set pg_largeobject relfrozenxid, relminmxid and relfilenode\n"); + for (int i = 0; i < PQntuples(lo_res); ++i) + appendPQExpBuffer(loOutQry, "UPDATE pg_catalog.pg_class\n" + "SET relfrozenxid = '%u', relminmxid = '%u', relfilenode = '%u'\n" + "WHERE oid = %u;\n", + atooid(PQgetvalue(lo_res, i, i_relfrozenxid)), + atooid(PQgetvalue(lo_res, i, i_relminmxid)), + atooid(PQgetvalue(lo_res, i, i_relfilenode)), + atooid(PQgetvalue(lo_res, i, i_oid))); - appendPQExpBufferStr(loOutQry, "\n-- For binary upgrade, set pg_largeobject relfrozenxid and relminmxid\n"); - appendPQExpBuffer(loOutQry, "UPDATE pg_catalog.pg_class\n" - "SET relfrozenxid = '%u', relminmxid = '%u'\n" - "WHERE oid = %u;\n", - atooid(PQgetvalue(lo_res, 0, i_relfrozenxid)), - atooid(PQgetvalue(lo_res, 0, i_relminmxid)), - LargeObjectRelationId); ArchiveEntry(fout, nilCatalogId, createDumpId(), ARCHIVE_OPTS(.tag = "pg_largeobject", .description = "pg_largeobject", diff --git a/src/bin/pg_upgrade/t/002_pg_upgrade.pl b/src/bin/pg_upgrade/t/002_pg_upgrade.pl index 67e0be68568..a22f0f8c885 100644 --- a/src/bin/pg_upgrade/t/002_pg_upgrade.pl +++ b/src/bin/pg_upgrade/t
Re: pg15b2: large objects lost on upgrade
On Sat, Jul 02, 2022 at 08:34:04AM -0400, Robert Haas wrote: > On Fri, Jul 1, 2022 at 7:14 PM Justin Pryzby wrote: > > I noticed this during beta1, but dismissed the issue when it wasn't easily > > reproducible. Now, I saw the same problem while upgrading from beta1 to > > beta2, > > so couldn't dismiss it. It turns out that LOs are lost if VACUUM FULL was > > run. > > Yikes. That's really bad, and I have no idea what might be causing it, > either. I'll plan to investigate this on Tuesday unless someone gets > to it before then. As far as I can see the data is still there, it's just that the new cluster keeps its default relfilenode instead of preserving the old cluster's value: regression=# table pg_largeobject; loid | pageno | data --++-- (0 rows) regression=# select oid, relfilenode from pg_class where relname = 'pg_largeobject'; oid | relfilenode --+- 2613 |2613 (1 row) -- using the value from the old cluster regression=# update pg_class set relfilenode = 39909 where oid = 2613; UPDATE 1 regression=# table pg_largeobject; loid | pageno | ---++- 33211 | 0 | \x0a4920776[...] 34356 | 0 | \xdeadbeef (2 rows)
Re: PSA: Autoconf has risen from the dead
Peter Eisentraut writes: > To summarize: > - Autoconf 2.71 has been out for 1.5 years. > - It is available in many recently updated OSs. > - It allows us to throw away several workarounds. > Hence: >> Currently, I think early PG16 might be good time to do this update. In preparation for reviewing this, I tried to install autoconf 2.71 from source locally. All went well on my RHEL8 workstation, but autoconf's testsuite falls over rather badly on my macOS laptop [1]. It fails differently on another Mac where I have a MacPorts installation at the head of the search path [2]. After sending the requested reports, I tried scanning the bug-autoconf archives, and found a similar report that was answered thus [3]: > I *think* this is the same problem as > https://savannah.gnu.org/support/?110492 > : current Autoconf doesn't work correctly with the (rather old) version of > GNU > M4 that ships with MacOS. Please try installing a current version of GNU M4 > in > your PATH and then retry the build and testsuite. So that explains part of it: most of the failures are down to using Apple's hoary m4 instead of the one from MacPorts. We could usefully warn about that in our own docs, perhaps. But there's still these scary failures: 509: AC_CHECK_HEADER_STDBOOL FAILED (acheaders.at:9) 514: AC_HEADER_STDBOOL FAILED (acheaders.at:14) The generated autoconf program builds the same output files as you have in your patch, and running the configure script gives the correct answer from AC_HEADER_STDBOOL, so I'm not sure what these test failures are unhappy about. Still, this is not a good look for a mainstream development platform. I wonder if we ought to wait for a fix. regards, tom lane [1] https://lists.gnu.org/archive/html/bug-autoconf/2022-07/msg0.html [2] https://lists.gnu.org/archive/html/bug-autoconf/2022-07/msg1.html [3] https://lists.gnu.org/archive/html/bug-autoconf/2022-04/msg2.html
Re: Slow standby snapshot
Hello, Simon. Sorry for calling you directly, but you know the subject better than anyone else. It is related to your work from 2010 - replacing KnownAssignedXidsHash with the KnownAssignedXids array. I have added additional optimization to the data structure you implemented. Initially, it was caused by the huge usage of CPU (70%+) for KnownAssignedXidsGetAndSetXmin in case of long (few seconds) transactions on primary and high (few thousands) max_connections in Postgres 11. After snapshot scalability optimization by Andres Freund (2), it is not so crucial but still provides a significant performance impact (+10% TPS) for a typical workload, see benchmark (3). Last patch version is here - (4). Does such optimisation look worth committing? Thanks in advance, Michail. [1]: https://github.com/postgres/postgres/commit/2871b4618af1acc85665eec0912c48f8341504c4#diff-8879f0173be303070ab7931db7c757c96796d84402640b9e386a4150ed97b179 [2]: https://commitfest.postgresql.org/29/2500/ [3]: https://www.postgresql.org/message-id/flat/CANtu0ohzBFTYwdLtcanWo4%2B794WWUi7LY2rnbHyorJdE8_ZnGg%40mail.gmail.com#379c1be7b8134ada5a574078d51b64c6 [4]: https://www.postgresql.org/message-id/flat/CANtu0ogzo4MsR7My9%2BNhu3to5%3Dy7G9zSzUbxfWYOn9W5FfHjTA%40mail.gmail.com#341a3c3b033f69b260120b3173a66382
Re: PSA: Autoconf has risen from the dead
I wrote: > So that explains part of it: most of the failures are down to using > Apple's hoary m4 instead of the one from MacPorts. We could usefully > warn about that in our own docs, perhaps. Hmm. I have just spent a very frustrating hour trying, and failing, to build any version of GNU m4 from source on either RHEL8 or current macOS. I don't quite understand why: neither the RPM specfile nor the MacPorts recipe for their respective m4 packages seem to contain any special hacks, so that it looks like the usual "configure; make; make check; make install" procedure ought to work fine. But it doesn't. I hit build failures (apparently because the source code is far too much in bed with nonstandard aspects of libc), or get an executable that SIGABRT's instantly, or if it doesn't do that it still fails some self-tests. With the latest 1.4.19 on macOS, the configure script hangs up, for crissakes. I am now feeling *very* hesitant about doing anything where we might be effectively asking people to build m4 for themselves. On the whole, I'm questioning the value of messing with our autoconf infrastructure at this stage. We did agree at PGCon that we'd keep it going for a couple years more, but it's not real clear to me why we can't limp along with 2.69 until we decide to drop it. regards, tom lane
AIX support - alignment issues
Hi, (sorry for sending this twice to you Noah, forgot -hackers the first time round) We've had a bunch of changes to manually deal with our alignment code not understanding AIX alignment. commit f3b421da5f4addc95812b9db05a24972b8fd9739 Author: Peter Eisentraut Date: 2016-12-21 12:00:00 -0500 Reorder pg_sequence columns to avoid alignment issue commit 79b716cfb7a1be2a61ebb4418099db1258f35e30 Author: Amit Kapila Date: 2022-04-07 09:39:25 +0530 Reorder subskiplsn in pg_subscription to avoid alignment issues. A good explanation of the problem is in https://postgr.es/m/20220402081346.GD3719101%40rfd.leadboat.com I strikes me as a remarkably bad idea to manually try to maintain the correct alignment. Even with the tests added it's still quite manual and requires contorted struct layouts (see e.g. [1]). I think we should either teach our system the correct alignment rules or we should drop AIX support. If we decide we want to continue supporting AIX we should bite the bullet and add a 64bit-int TYPALIGN_*. It might be worth to translate that to bytes when building tupledescs, so we don't need more branches (reducing them compared to today). Personally I think we should just drop AIX. The amount of effort to keep it working is substantial due to being quite different from other unices ([2]), the is very outdated, the whole ecosystem is barely on lifesupport ([3]). And all of that for very little real world use. Afaics we don't have access to an up2date AIX system. Some of have access to 7.2 via the gcc compile farm, but not 7.3. Most other niche-y operating systems we can start in a VM, but I've yet to see a legal and affordable way to do that with AIX. I think Noah has done quite a heroic effort at keeping the AIX animals in a kind-of-healthy state, but without more widespread access and more widespread usage it seems like a doomed effort. Greetings, Andres Freund [1] https://www.postgresql.org/message-id/CAFiTN-uiAngcW50Trwa94F1EWY2BxEx%2BB38QSyX3DtV3dzEGhA%40mail.gmail.com [2] linking etc is handled entirely different, so there's a fair bit of dedicated AIX code around the buildsystem - a lot of it vestigial stuff, see references to aix3.2.5 etc. [3] 7.2 was released in 2015-10-05, 7.3 in 2021-12-10, the set of changes is pretty darn small for that timeframe https://www.ibm.com/common/ssi/cgi-bin/ssialias?infotype=AN&subtype=CA&htmlfid=897/ENUS221-328&appname=USN Bull / Atos stopped their AIX work in 2022-03-01 - unfortunately they didn't even keep the announcement of that online. https://www.linkedin.com/pulse/said-say-bull-closing-down-aix-open-source-platform-michaelis https://github.com/power-devops/bullfreeware
Re: Emit extra debug message when executing extension script.
On Fri, 2022-07-01 at 15:33 -0700, Nathan Bossart wrote: > On Fri, Jul 01, 2022 at 03:24:27PM -0700, Jeff Davis wrote: > > + ereport(DEBUG1, errmsg("executing extension update > > script from version '%s' to '%s'", from_version, version)); > > nitpick: I would suggest "executing extension script for update from > version X to Y." Thank you. Committed with minor modification to include the extension name. I did end up using Peter's suggestion. I reviewed other DEBUG messages and it seems nearly all use elog() or errmsg_internal(). > I personally would rather this output the name of the file. If > revealing > the directory is a concern, perhaps we could just trim everything but > the > file name. I could have slightly refactored the code to do this, but it didn't quite seem worth it for a single debug message. Regards, Jeff Davis
Re: Probable memory leak with ECPG and AIX
Noah Misch writes: > I had expected to use pthread_once() for the newlocale() call, but there would > be no useful way to report failure and try again later. Instead, I called > newlocale() while ECPGconnect() holds connections_mutex. See log message and > comments for details. I tested "./configure ac_cv_func_uselocale=no ..." and > tested the scenario of newlocale() failing every time. This looks solid to me. The only nit I can find to pick is that I'd have added one more comment, along the lines of diff --git a/src/interfaces/ecpg/ecpglib/connect.c b/src/interfaces/ecpg/ecpglib/connect.c index 9f958b822c..96f99ae072 100644 --- a/src/interfaces/ecpg/ecpglib/connect.c +++ b/src/interfaces/ecpg/ecpglib/connect.c @@ -508,6 +508,11 @@ ECPGconnect(int lineno, int c, const char *name, const char *user, const char *p #ifdef ENABLE_THREAD_SAFETY pthread_mutex_lock(&connections_mutex); #endif + + /* +* ... but first, make certain we have created ecpg_clocale. Rely on +* holding connections_mutex to ensure this is done by only one thread. +*/ #ifdef HAVE_USELOCALE if (!ecpg_clocale) { I've marked it RFC. regards, tom lane
Re: AIX support - alignment issues
On Sat, Jul 2, 2022 at 11:34 AM Andres Freund wrote: > Personally I think we should just drop AIX. The amount of effort to keep it > working is substantial due to being quite different from other unices ([2]), > the is > very outdated, the whole ecosystem is barely on lifesupport ([3]). And all of > that > for very little real world use. I tend to agree about dropping AIX. But I wonder if there is an argument against that proposal that doesn't rely on AIX being relevant to at least one user. Has supporting AIX ever led to the discovery of a bug that didn't just affect AIX? In other words, are AIX systems peculiar in some particular way that clearly makes them more likely to flush out a certain class of bugs? What is the best argument *against* desupporting AIX that you know of? Desupporting AIX doesn't mean that any AIX users will be left in the lurch immediately. Obviously these users will be able to use a supported version of Postgres for several more years. -- Peter Geoghegan
Re: AIX support - alignment issues
Peter Geoghegan writes: > I tend to agree about dropping AIX. But I wonder if there is an > argument against that proposal that doesn't rely on AIX being relevant > to at least one user. Has supporting AIX ever led to the discovery of > a bug that didn't just affect AIX? Searching the commit log quickly finds 591e088dd datetime.c's parsing logic has assumed that strtod() will accept a string that looks like ".", which it does in glibc, but not on some less-common platforms such as AIX. glibc's behavior is clearly not meeting the letter of the POSIX spec here. a745b9365 I'm not sure how we've managed not to notice this problem, but it seems to explain slow execution of the 017_shm.pl test script on AIX since commit 4fdbf9af5, which added a speculative "pg_ctl stop" with the idea of making real sure that the postmaster isn't there. In the test steps that kill-9 and then restart the postmaster, it's possible to get past the initial signal attempt before kill() stops working for the doomed postmaster. If that happens, pg_ctl waited till PGCTLTIMEOUT before giving up ... and the buildfarm's AIX members have that set very high. Admittedly, this one is more about "slow" than about "AIX". 57b5a9646 Most versions of tar are willing to overlook the missing terminator, but the AIX buildfarm animals were not. Fix by inventing a new kind of bbstreamer that just blindly adds a terminator, and using it whenever we don't parse the tar archive. Another place where we failed to conform to relevant standards. b9b610577 Fix ancient violation of zlib's API spec. And another. Now, it's certainly possible that AIX is the only surviving platform that hasn't adopted bug-compatible-with-glibc interpretations of POSIX. But I think the standard is the standard, and we ought to stay within it. So I find value in these fixes. regards, tom lane
Re: making relfilenodes 56 bits
Hi, On 2022-07-02 14:23:08 +0530, Dilip Kumar wrote: > > > + if (ShmemVariableCache->relnumbercount == 0) > > > + { > > > + > > > XLogPutNextRelFileNumber(ShmemVariableCache->nextRelFileNumber + > > > + > > > VAR_RFN_PREFETCH); > > > > I know this is just copied, but I find "XLogPut" as a prefix pretty > > unhelpful. > > Maybe we can change to LogNextRelFileNumber()? Much better. Hm. Now that I think about it, isn't the XlogFlush() in XLogPutNextRelFileNumber() problematic performance wise? Yes, we'll spread the cost across a number of GetNewRelFileNumber() calls, but still, an additional f[data]sync for every 64 relfilenodes assigned isn't cheap - today there's zero fsyncs when creating a sequence or table inside a transaction (there are some for indexes, but there's patches to fix that). Not that I really see an obvious alternative. I guess we could try to invent a flush-log-before-write type logic for relfilenodes somehow? So that the first block actually written to a file needs to ensure the WAL record that created the relation is flushed. But getting that to work reliably seems nontrivial. One thing that would be good is to add an assertion to a few places ensuring that relfilenodes aren't above ->nextRelFileNumber, most importantly somewhere in the recovery path. Why did you choose a quite small value for VAR_RFN_PREFETCH? VAR_OID_PREFETCH is 8192, but you chose 64 for VAR_RFN_PREFETCH? I'd spell out RFN in VAR_RFN_PREFETCH btw, it took me a bit to expand RFN to relfilenode. > > What's the story behind moving relfilenode to the front? Alignment > > consideration? Seems odd to move the relfilenode before the oid. If there's > > an > > alignment issue, can't you just swap it with reltablespace or such to > > resolve > > it? > > Because of a test case added in this commit > (79b716cfb7a1be2a61ebb4418099db1258f35e30). Even I did not like to > move relfilenode before oid, but under this commit it is expected this > to aligned as well as before any NameData check this comments > > === > +-- > +-- Keep such columns before the first NameData column of the > +-- catalog, since packagers can override NAMEDATALEN to an odd number. > +-- > === This is embarassing. Trying to keep alignment match between C and catalog alignment on AIX, without actually making the system understand the alignment rules, is a remarkably shortsighted approach. I started a separate thread about it, since it's not really relevant to this thread: https://postgr.es/m/20220702183354.a6uhja35wta7agew%40alap3.anarazel.de Maybe we could at least make the field order to be something like oid, relam, relfilenode, relname that should be ok alignment wise, keep oid first, and seems to make sense from an "importance" POV? Can't really interpret later fields without knowing relam etc. > > > Now as part of the previous patch set we have made relfilenode > > > 56 bit wider and removed the risk of wraparound so now we don't > > > need to wait till the next checkpoint for removing the unused > > > relation file and we can clean them up on commit. > > > > Hm. Wasn't there also some issue around crash-restarts benefiting from > > having > > those files around until later? I think what I'm remembering is what is > > referenced in this comment: > > I think due to wraparound if relfilenode gets reused by another > relation in the same checkpoint then there was an issue in crash > recovery with wal level minimal. But the root of the issue is a > wraparound, right? I'm not convinced the tombstones were required solely in the oid wraparound case before, despite the comment saying so, with wal_level=minimal. I gotta do some non-work stuff for a bit, so I need to stop pondering this now :) I think it might be a good idea to have a few weeks in which we do *not* remove the tombstones, but have assertion checks against such files existing when we don't expect them to. I.e. commit 1-3, add the asserts, then commit 4 a bit later. > > I'm doubtful it's a good idea to start dropping at the first segment. I'm > > fairly certain that there's smgrexists() checks in some places, and they'll > > now stop working, even if there are later segments that remained, e.g. > > because > > of an error in the middle of removing later segments. > > Okay, so you mean to say that we can first drop the remaining segment > and at last we drop the segment 0 right? I'd use the approach Robert suggested and delete from the end, going down. Greetings, Andres Freund
Re: Add red-black tree missing comparison searches
Hey Alexander, > But I think we can support strict inequalities too (e.g. less and greater without equals). Could you please make it a bool argument equal_matches? Sure, an argument is a good idea to keep the code shorter. > Could you please extract this change as a separate patch. Done! On Thu, 30 Jun 2022 at 14:34, Alexander Korotkov wrote: > Hi, Steve! > > Thank you for working on this. > > On Thu, Jun 30, 2022 at 7:51 PM Steve Chavez wrote: > > Currently the red-black tree implementation only has an equality search. > Other extensions might need other comparison searches, like less-or-equal > or greater-or-equal. For example OrioleDB defines a greater-or-equal search > on its postgres fork: > > > > > https://github.com/orioledb/postgres/blob/4c18ae94c20e3e95c374b9947f1ace7d1d6497a1/src/backend/lib/rbtree.c#L164-L186 > > > > So I thought this might be valuable to have in core. I've added > less-or-equal and greater-or-equal searches functions plus tests in the > test_rbtree module. I can add the remaining less/great searches if this is > deemed worth it. > > Looks good. But I think we can support strict inequalities too (e.g. > less and greater without equals). Could you please make it a bool > argument equal_matches? > > > Also I refactored the sentinel used in the rbtree.c to use C99 > designators. > > Could you please extract this change as a separate patch. > > -- > Regards, > Alexander Korotkov > From 8046463e2c98fb4c51ec05740cee130bdf2ad533 Mon Sep 17 00:00:00 2001 From: steve-chavez Date: Tue, 28 Jun 2022 23:46:38 -0500 Subject: [PATCH 1/2] Change rbtree sentinel to C99 designator --- src/backend/lib/rbtree.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/backend/lib/rbtree.c b/src/backend/lib/rbtree.c index a9981dbada..e1cc4110bd 100644 --- a/src/backend/lib/rbtree.c +++ b/src/backend/lib/rbtree.c @@ -62,7 +62,7 @@ struct RBTree static RBTNode sentinel = { - RBTBLACK, RBTNIL, RBTNIL, NULL + .color = RBTBLACK,.left = RBTNIL,.right = RBTNIL,.parent = NULL }; -- 2.34.1 From 5d49ddd764b05083e1c276df0b50d43ec6896931 Mon Sep 17 00:00:00 2001 From: steve-chavez Date: Wed, 29 Jun 2022 16:28:02 -0500 Subject: [PATCH 2/2] Add rbtree missing comparison searches * Add find great/less or equal by adding an argument --- src/backend/lib/rbtree.c | 61 + src/include/lib/rbtree.h | 2 + src/test/modules/test_rbtree/test_rbtree.c | 101 + 3 files changed, 164 insertions(+) diff --git a/src/backend/lib/rbtree.c b/src/backend/lib/rbtree.c index e1cc4110bd..92f6b99164 100644 --- a/src/backend/lib/rbtree.c +++ b/src/backend/lib/rbtree.c @@ -161,6 +161,67 @@ rbt_find(RBTree *rbt, const RBTNode *data) return NULL; } +/* + * rbt_find_great: search for a greater value in an RBTree + * + * If equal_match is true, this will be a great or equal search. + * + * Returns the matching tree entry, or NULL if no match is found. + */ +RBTNode * +rbt_find_great(RBTree *rbt, const RBTNode *data, bool equal_match) +{ + RBTNode*node = rbt->root; + RBTNode*greater = NULL; + + while (node != RBTNIL) + { + int cmp = rbt->comparator(data, node, rbt->arg); + + if (equal_match && cmp == 0) + return node; + else if (cmp < 0) + { + greater = node; + node = node->left; + } + else + node = node->right; + } + + return greater; +} + +/* + * rbt_find_less: search for a lesser value in an RBTree + * + * If equal_match is true, this will be a less or equal search. + * + * Returns the matching tree entry, or NULL if no match is found. + */ +RBTNode * +rbt_find_less(RBTree *rbt, const RBTNode *data, bool equal_match) +{ + RBTNode*node = rbt->root; + RBTNode*lesser = NULL; + + while (node != RBTNIL) + { + int cmp = rbt->comparator(data, node, rbt->arg); + + if (equal_match && cmp == 0) + return node; + else if (cmp > 0){ + lesser = node; + node = node->right; + } + else + node = node->left; + } + + return lesser; +} + /* * rbt_leftmost: fetch the leftmost (smallest-valued) tree node. * Returns NULL if tree is empty. diff --git a/src/include/lib/rbtree.h b/src/include/lib/rbtree.h index 580a3e3439..13cf68415e 100644 --- a/src/include/lib/rbtree.h +++ b/src/include/lib/rbtree.h @@ -67,6 +67,8 @@ extern RBTree *rbt_create(Size node_size, void *arg); extern RBTNode *rbt_find(RBTree *rbt, const RBTNode *data); +extern RBTNode *rbt_find_great(RBTree *rbt, const RBTNode *data, bool equal_match); +extern RBTNode *rbt_find_less(RBTree *rbt, const RBTNode *data, bool equal_match); extern RBTNode *rbt_leftmost(RBTree *rbt); extern RBTNode *rbt_insert(RBTree *rbt, const RBTNode *data, bool *isNew); diff --git a/src/test/modules/test_rbtree/test_rbtree.c b/src/test/modules/test_rbtree/test_rbtree.c index 7cb38759a2..c17d524e01 100644 --- a/src/test/modules/test_rbtree/test_rbtree.c +++ b/src/test/modules/test_rbtree/test_rbtree.c @@ -278,6 +278,105 @@ testfind(int si
Re: AIX support - alignment issues
Hi, On 2022-07-02 11:54:16 -0700, Peter Geoghegan wrote: > I tend to agree about dropping AIX. But I wonder if there is an > argument against that proposal that doesn't rely on AIX being relevant > to at least one user. Has supporting AIX ever led to the discovery of > a bug that didn't just affect AIX? Yes, it clearly has. But I tend to think that that's far outweighed by the complications triggered by AIX support. It'd be a different story if AIX hadn't a very peculiar linking model and was more widely accessible. > What is the best argument *against* desupporting AIX that you know of? Hm. - a distinct set of system libraries that can help find portability issues - With IBM's compiler it adds a, not otherwise used, compiler that PG builds with. So the warnings could theoretically help find issues that we wouldn't otherwise see - but I don't think that's been particularly useful (nor monitored). And the compiler is buggy enough to add a fair bit work over the years. > Desupporting AIX doesn't mean that any AIX users will be left in the > lurch immediately. Obviously these users will be able to use a > supported version of Postgres for several more years. Right. Greetings, Andres Freund
Re: doc: Make selectivity example match wording
On Thu Jun 9, 2022 at 11:57 AM EDT, David G. Johnston wrote: > Reposting this to its own thread. > > https://www.postgresql.org/message-id/flat/CAKFQuwby1aMsJDMeibaBaohgoaZhivAo4WcqHC1%3D9-GDZ3TSng%40mail.gmail.com > > doc: make unique non-null join selectivity example match the prose > > The description of the computation for the unique, non-null, > join selectivity describes a division by the maximum of two values, > while the example shows a multiplication by their reciprocal. While > equivalent the max phrasing is easier to understand; which seems > more important here than precisely adhering to the formula used > in the code (for which either variant is still an approximation). > > While both num_distinct and num_rows are equal for a unique column > both the concept and formula use row count (10,000) and the > field num_distinct has already been set to mean the specific value > present in the pg_stats table (i.e, -1), so use num_rows here. Pointing out that n_distinct = -1 is helpful but changing "because" to "and" suggests that the missing MCV info is coincidental or a side effect. Is there any case in which the stronger "because" wouldn't be appropriate? The second parenthetical (num_rows, not shown, but "tenk") took me a minute to get since the row counts are only apparent on looking somewhat closely at the other examples in the chapter. num_rows also isn't a column in pg_stats which the "not shown" could be taken to imply; it's sourced from somewhere else and only given as num_rows in this example. How's '(as num_rowsN, 10,000 for both "tenk" example tables)'? By "this value does get scaled in the non-unique case" do you mean it relies on n_distinct as in the uncorrected algorithm listing? If so I think it'd help to specify that. You didn't take this line on but "This is, subtract the null fraction..." omits the step of multiplying the complements of the null fractions together before dividing. Should n_distinct and num_rows be d in the text?
Re: AIX support - alignment issues
On Sat, Jul 2, 2022 at 12:22 PM Tom Lane wrote: > Now, it's certainly possible that AIX is the only surviving platform > that hasn't adopted bug-compatible-with-glibc interpretations of > POSIX. But I think the standard is the standard, and we ought to > stay within it. So I find value in these fixes. I imagine that there is strong evolutionary pressure pushing minority platforms in the direction of bug-compatible-with-glibc. There is definitely a similar trend around things like endianness and alignment pickiness. But it wasn't always so. It seems fair to wonder if AIX bucks the glibc-compatible trend because it is already on the verge of extinction. If it wasn't just about dead already then somebody would have gone to the trouble of making it bug-compatible-with-glibc by now. (To be clear, I'm not arguing that this is a good thing.) Maybe it is still worth hanging on to AIX support for the time being, but it would be nice to have some idea of where we *will* finally draw the line. If the complaints from Andres aren't a good enough reason now, then what other hypothetical reasons might be good enough in the future? It seems fairly likely that Postgres desupporting AIX will happen (say) at some time in the next decade, no matter what happens today. -- Peter Geoghegan
Re: AIX support - alignment issues
Peter Geoghegan writes: > Maybe it is still worth hanging on to AIX support for the time being, > but it would be nice to have some idea of where we *will* finally draw > the line. If the complaints from Andres aren't a good enough reason > now, then what other hypothetical reasons might be good enough in the > future? It seems fairly likely that Postgres desupporting AIX will > happen (say) at some time in the next decade, no matter what happens > today. Agreed. But I think that this sort of thing is better driven by "when there's no longer anyone willing to do the legwork" than by project policy. IOW, we'll stop when Noah gets tired of doing it (and no one steps up to take his place). In the case at hand, given that the test added by 79b716cfb/c1da0acbb correctly detects troublesome catalog layouts (and no I've not studied it myself), I don't see that we have to do more right now. I am a little concerned though that we don't have access to the latest version of AIX --- that seems like a non-maintainable situation. regards, tom lane
Re: AIX support - alignment issues
Hi, On 2022-07-02 16:34:35 -0400, Tom Lane wrote: > Agreed. But I think that this sort of thing is better driven by > "when there's no longer anyone willing to do the legwork" than > by project policy. IOW, we'll stop when Noah gets tired of doing > it (and no one steps up to take his place). I do think we should take the impact it has on everyone into account, not just Noah's willingness. If it's just "does somebody still kind of maintain it" then we'll bear the distributed cost of complications for irrelevant platforms way longer than worthwhile. > In the case at hand, given that the test added by 79b716cfb/c1da0acbb > correctly detects troublesome catalog layouts (and no I've not studied > it myself), I don't see that we have to do more right now. What made me look at this issue right now is that the alignment issue lead the 56bit relfilenode patch to move the relfilenode field to the start of pg_class (ahead of the oid), because a 64bit value cannot be after a NameData. Now, I think we can do a bit better by moving a few more fields around. But the restriction still seems quite onerous. Greetings, Andres Freund
Re: AIX support - alignment issues
Andres Freund writes: > What made me look at this issue right now is that the alignment issue lead the > 56bit relfilenode patch to move the relfilenode field to the start of pg_class > (ahead of the oid), Agreed, up with that we should not put. However ... > because a 64bit value cannot be after a NameData. ... this coding rule strikes me as utterly ridiculous. Why can't we instead insist that NAMEDATALEN must be a multiple of 8? Anyone who tries to make it different from that is likely to be wasting padding space even on platforms where there's not a deeper problem. regards, tom lane
Re: O(n) tasks cause lengthy startups and checkpoints
On Fri, Jun 24, 2022 at 11:45:22AM +0100, Simon Riggs wrote: > On Thu, 23 Jun 2022 at 18:15, Nathan Bossart wrote: >> I'm grateful for the discussion in this thread so far, but I'm not seeing a >> clear path forward. > > +1 to add the new auxiliary process. I went ahead and put together a new patch set for this in which I've attempted to address most of the feedback from upthread. Notably, I've abandoned 0007 and 0008, added a way for processes to request specific tasks for the custodian, and removed all the checks for ShutdownRequestPending. I haven't addressed the existing transaction ID wraparound risk with the logical replication files. My instinct is that this deserveѕ its own thread, and it might need to be considered a prerequisite to this change based on the prior discussion here. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com >From 68e3005c14ba116e372a1724dad079914108ab2d Mon Sep 17 00:00:00 2001 From: Nathan Bossart Date: Wed, 5 Jan 2022 19:24:22 + Subject: [PATCH v6 1/6] Introduce custodian. The custodian process is a new auxiliary process that is intended to help offload tasks could otherwise delay startup and checkpointing. This commit simply adds the new process; it does not yet do anything useful. --- src/backend/postmaster/Makefile | 1 + src/backend/postmaster/auxprocess.c | 8 + src/backend/postmaster/custodian.c | 252 src/backend/postmaster/postmaster.c | 44 - src/backend/storage/ipc/ipci.c | 3 + src/backend/storage/lmgr/proc.c | 1 + src/backend/utils/activity/wait_event.c | 3 + src/backend/utils/init/miscinit.c | 3 + src/include/miscadmin.h | 3 + src/include/postmaster/custodian.h | 20 ++ src/include/storage/proc.h | 11 +- src/include/utils/wait_event.h | 1 + 12 files changed, 345 insertions(+), 5 deletions(-) create mode 100644 src/backend/postmaster/custodian.c create mode 100644 src/include/postmaster/custodian.h diff --git a/src/backend/postmaster/Makefile b/src/backend/postmaster/Makefile index 3a794e54d6..e1e1d1123f 100644 --- a/src/backend/postmaster/Makefile +++ b/src/backend/postmaster/Makefile @@ -18,6 +18,7 @@ OBJS = \ bgworker.o \ bgwriter.o \ checkpointer.o \ + custodian.o \ fork_process.o \ interrupt.o \ pgarch.o \ diff --git a/src/backend/postmaster/auxprocess.c b/src/backend/postmaster/auxprocess.c index 39ac4490db..620a0b1bae 100644 --- a/src/backend/postmaster/auxprocess.c +++ b/src/backend/postmaster/auxprocess.c @@ -20,6 +20,7 @@ #include "pgstat.h" #include "postmaster/auxprocess.h" #include "postmaster/bgwriter.h" +#include "postmaster/custodian.h" #include "postmaster/startup.h" #include "postmaster/walwriter.h" #include "replication/walreceiver.h" @@ -74,6 +75,9 @@ AuxiliaryProcessMain(AuxProcType auxtype) case CheckpointerProcess: MyBackendType = B_CHECKPOINTER; break; + case CustodianProcess: + MyBackendType = B_CUSTODIAN; + break; case WalWriterProcess: MyBackendType = B_WAL_WRITER; break; @@ -153,6 +157,10 @@ AuxiliaryProcessMain(AuxProcType auxtype) CheckpointerMain(); proc_exit(1); + case CustodianProcess: + CustodianMain(); + proc_exit(1); + case WalWriterProcess: WalWriterMain(); proc_exit(1); diff --git a/src/backend/postmaster/custodian.c b/src/backend/postmaster/custodian.c new file mode 100644 index 00..db00282658 --- /dev/null +++ b/src/backend/postmaster/custodian.c @@ -0,0 +1,252 @@ +/*- + * + * custodian.c + * + * The custodian process handles a variety of non-critical tasks that might + * otherwise delay startup, checkpointing, etc. Offloaded tasks should not + * be synchronous (e.g., checkpointing shouldn't wait for the custodian to + * complete a task before proceeding). Also, ensure that any offloaded + * tasks are either not required during single-user mode or are performed + * separately during single-user mode. + * + * The custodian is not an essential process and can shutdown quickly when + * requested. The custodian will wake up approximately once every 5 + * minutes to perform its tasks, but backends can (and should) set its + * latch to wake it up sooner. + * + * + * Copyright (c) 2022, PostgreSQL Global Development Group + * + * + * IDENTIFICATION + * src/backend/postmaster/custodian.c + * + *- + */ +#include "postgres.h" + +#include + +#include "libpq/pqsignal.h" +#include "pgstat.h" +#include "postmaster/custodian.h" +#include "postmaster/interrupt.h" +#include "storage/bufmgr.h" +#include "storage/condition_variable.h" +#include "storage/fd.h" +#include "storage/proc.h" +#include "storage/procsignal.h" +#include "storage/smgr.h" +#include "utils/memutils.h" + +#define CUSTODIAN_TIMEOUT_S (300) /* 5 minutes */ + +typedef struct +{ +
Re: replacing role-level NOINHERIT with a grant-level option
On Sat, Jul 02, 2022 at 08:45:50AM -0400, Robert Haas wrote: > I don't think there is a whole lot of point in replacing role-level > flags with predefined roles that work exactly the same way. Maybe > there is some point, but I'm not excited about it. The problem with > these settings in my opinion is that they are too monolithic. Either > you can create any role with basically any privileges or you can > create no roles at all. Either you are a superuser and can bypass all > permissions checks or you are not and can't bypass any permissions > checks. Okay. I see. >> unparenthesized syntax or add WARNINGs when it is used?) For [NO]INHERIT >> and WITH INHERIT DEFAULT, presumably we could do something similar. Down >> the road, those would be removed in favor of only using grant-level >> options. > > I think it'd be hard to do that if WITH INHERIT DEFAULT is actually > state stored in the catalog. Maybe I should revise this again so that > the catalog column is just true or false, and the role-level property > only sets the default for future grants. That might be more like what > Tom had in mind originally. I was thinking that when DEFAULT was removed, pg_dump would just need to generate WITH INHERIT TRUE/FALSE based on the value of rolinherit for older versions. Using the role-level property as the default for future grants seems a viable strategy, although it would break backward compatibility. For example, if I create a NOINHERIT role, grant a bunch of roles to it, and then change it to INHERIT, the role won't begin inheriting the privileges of the roles it is a member of. Right now, it does. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: O(n) tasks cause lengthy startups and checkpoints
Hi, On 2022-07-02 15:05:54 -0700, Nathan Bossart wrote: > + /* Obtain requested tasks */ > + SpinLockAcquire(&CustodianShmem->cust_lck); > + flags = CustodianShmem->cust_flags; > + CustodianShmem->cust_flags = 0; > + SpinLockRelease(&CustodianShmem->cust_lck); Just resetting the flags to 0 is problematic. Consider what happens if there's two tasks and and the one processed first errors out. You'll loose information about needing to run the second task. > + /* TODO: offloaded tasks go here */ Seems we're going to need some sorting of which tasks are most "urgent" / need to be processed next if we plan to make this into some generic facility. > +/* > + * RequestCustodian > + * Called to request a custodian task. > + */ > +void > +RequestCustodian(int flags) > +{ > + SpinLockAcquire(&CustodianShmem->cust_lck); > + CustodianShmem->cust_flags |= flags; > + SpinLockRelease(&CustodianShmem->cust_lck); > + > + if (ProcGlobal->custodianLatch) > + SetLatch(ProcGlobal->custodianLatch); > +} With this representation we can't really implement waiting for a task or such. And it doesn't seem like a great API for the caller to just specify a mix of flags. > + /* Calculate how long to sleep */ > + end_time = (pg_time_t) time(NULL); > + elapsed_secs = end_time - start_time; > + if (elapsed_secs >= CUSTODIAN_TIMEOUT_S) > + continue; /* no sleep for us */ > + cur_timeout = CUSTODIAN_TIMEOUT_S - elapsed_secs; > + > + (void) WaitLatch(MyLatch, > + WL_LATCH_SET | WL_TIMEOUT | > WL_EXIT_ON_PM_DEATH, > + cur_timeout * 1000L /* convert > to ms */ , > + WAIT_EVENT_CUSTODIAN_MAIN); > + } I don't think we should have this thing wake up on a regular basis. We're doing way too much of that already, and I don't think we should add more. Either we need a list of times when tasks need to be processed and wake up at that time, or just wake up if somebody requests a task. > From 5e95666efa31d6c8aa351e430c37ead6e27acb72 Mon Sep 17 00:00:00 2001 > From: Nathan Bossart > Date: Sun, 5 Dec 2021 21:16:44 -0800 > Subject: [PATCH v6 3/6] Split pgsql_tmp cleanup into two stages. > > First, pgsql_tmp directories will be renamed to stage them for > removal. Then, all files in pgsql_tmp are removed before removing > the staged directories themselves. This change is being made in > preparation for a follow-up change to offload most temporary file > cleanup to the new custodian process. > Note that temporary relation files cannot be cleaned up via the > aforementioned strategy and will not be offloaded to the custodian. > --- > src/backend/postmaster/postmaster.c | 8 +- > src/backend/storage/file/fd.c | 174 > src/include/storage/fd.h| 2 +- > 3 files changed, 160 insertions(+), 24 deletions(-) > > diff --git a/src/backend/postmaster/postmaster.c > b/src/backend/postmaster/postmaster.c > index e67370012f..82aa0c6307 100644 > --- a/src/backend/postmaster/postmaster.c > +++ b/src/backend/postmaster/postmaster.c > @@ -1402,7 +1402,8 @@ PostmasterMain(int argc, char *argv[]) >* Remove old temporary files. At this point there can be no other >* Postgres processes running in this directory, so this should be safe. >*/ > - RemovePgTempFiles(); > + RemovePgTempFiles(true, true); > + RemovePgTempFiles(false, false); This is imo hard to read and easy to get wrong. Make it multiple functions or pass named flags in. > + * StagePgTempDirForRemoval > + * > + * This function renames the given directory with a special prefix that > + * RemoveStagedPgTempDirs() will know to look for. An integer is appended to > + * the end of the new directory name in case previously staged pgsql_tmp > + * directories have not yet been removed. > + */ It doesn't seem great to need to iterate through a directory that contains other files, potentially a significant number. How about having a staged_for_removal/ directory, and then only scanning that? > +static void > +StagePgTempDirForRemoval(const char *tmp_dir) > +{ > + DIR*dir; > + charstage_path[MAXPGPATH * 2]; > + charparent_path[MAXPGPATH * 2]; > + struct stat statbuf; > + > + /* > + * If tmp_dir doesn't exist, there is nothing to stage. > + */ > + dir = AllocateDir(tmp_dir); > + if (dir == NULL) > + { > + if (errno != ENOENT) > + ereport(LOG, > + (errcode_for_file_access(), > + errmsg("could not open directory > \"%s\": %m", tmp_dir))); > + return; > + } > +
Re: First draft of the PG 15 release notes
On Fri, Jul 1, 2022 at 09:56:17AM -0700, Peter Geoghegan wrote: > On Fri, Jul 1, 2022 at 9:41 AM Bruce Momjian wrote: > > > It might be worth explaining the shift directly in the release notes. > > > The new approach is simpler and makes a lot more sense -- why should > > > the relfrozenxid be closely tied to freezing? We don't necessarily > > > > I don't think this is an appropriate detail for the release notes. > > Okay. What about saying something about relminmxid advancement where > the database consumes lots of multixacts? No. same issue. -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com Indecision is a decision. Inaction is an action. Mark Batterson
Re: First draft of the PG 15 release notes
> > Okay, thanks Bruce. -- Peter Geoghegan
Re: replacing role-level NOINHERIT with a grant-level option
On Sat, Jul 2, 2022 at 6:16 PM Nathan Bossart wrote: > I was thinking that when DEFAULT was removed, pg_dump would just need to > generate WITH INHERIT TRUE/FALSE based on the value of rolinherit for older > versions. Using the role-level property as the default for future grants > seems a viable strategy, although it would break backward compatibility. > For example, if I create a NOINHERIT role, grant a bunch of roles to it, > and then change it to INHERIT, the role won't begin inheriting the > privileges of the roles it is a member of. Right now, it does. I think the idea you propose here is interesting, because I think it proves that committing v2 or something like it doesn't really lock us into the role-level property any more than we already are, which at least makes me feel slightly less bad about that option. However, if there's implacable opposition to any compatibility break at any point, then maybe this plan would never actually be implemented in practice. And if there's not, maybe we can be bolder now. -- Robert Haas EDB: http://www.enterprisedb.com
Re: Probable memory leak with ECPG and AIX
On Sat, Jul 02, 2022 at 02:53:34PM -0400, Tom Lane wrote: > This looks solid to me. The only nit I can find to pick is that I'd > have added one more comment, along the lines of > > diff --git a/src/interfaces/ecpg/ecpglib/connect.c > b/src/interfaces/ecpg/ecpglib/connect.c > index 9f958b822c..96f99ae072 100644 > --- a/src/interfaces/ecpg/ecpglib/connect.c > +++ b/src/interfaces/ecpg/ecpglib/connect.c > @@ -508,6 +508,11 @@ ECPGconnect(int lineno, int c, const char *name, const > char *user, const char *p > #ifdef ENABLE_THREAD_SAFETY > pthread_mutex_lock(&connections_mutex); > #endif > + > + /* > + * ... but first, make certain we have created ecpg_clocale. Rely on > + * holding connections_mutex to ensure this is done by only one thread. > + */ > #ifdef HAVE_USELOCALE > if (!ecpg_clocale) > { > > I've marked it RFC. Thanks for reviewing. Pushed with that comment. prairiedog complains[1]: ld: common symbols not allowed with MH_DYLIB output format with the -multi_module option connect.o definition of common _ecpg_clocale (size 4) I bet this would fix it: --- a/src/interfaces/ecpg/ecpglib/connect.c +++ b/src/interfaces/ecpg/ecpglib/connect.c @@ -11,7 +11,7 @@ #include "sqlca.h" #ifdef HAVE_USELOCALE -locale_t ecpg_clocale; +locale_t ecpg_clocale = (locale_t) 0; #endif #ifdef ENABLE_THREAD_SAFETY I hear[1] adding -fno-common to compiler options would also fix that. Still, in the absence of other opinions, I'll just add the no-op initialization. [1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=prairiedog&dt=2022-07-03%2001%3A14%3A19 [2] https://gcc.gnu.org/legacy-ml/gcc/2005-06/msg00378.html
Re: Probable memory leak with ECPG and AIX
Noah Misch writes: > Thanks for reviewing. Pushed with that comment. prairiedog complains[1]: > ld: common symbols not allowed with MH_DYLIB output format with the > -multi_module option > connect.o definition of common _ecpg_clocale (size 4) Blah. > I bet this would fix it: > -locale_t ecpg_clocale; > +locale_t ecpg_clocale = (locale_t) 0; Hmm, I was considering suggesting that just on stylistic grounds, but decided it was too nitpicky even for me. Do you want me to test it on prairiedog? > I hear[1] adding -fno-common to compiler options would also fix that. I've got -fno-common turned on on my other macOS animals, but in those cases I did it to detect bugs not fix them. I'm not sure whether prairiedog's ancient toolchain has that switch at all, or whether it behaves the same as in more recent platforms. Still, that gcc.gnu.org message you cite is of the right era. regards, tom lane
Re: Probable memory leak with ECPG and AIX
On Sat, Jul 02, 2022 at 11:37:08PM -0400, Tom Lane wrote: > Noah Misch writes: > > Thanks for reviewing. Pushed with that comment. prairiedog complains[1]: > > ld: common symbols not allowed with MH_DYLIB output format with the > > -multi_module option > > connect.o definition of common _ecpg_clocale (size 4) > > Blah. > > > I bet this would fix it: > > > -locale_t ecpg_clocale; > > +locale_t ecpg_clocale = (locale_t) 0; > > Hmm, I was considering suggesting that just on stylistic grounds, > but decided it was too nitpicky even for me. > Do you want me to test it on prairiedog? Sure, if it's easy enough. If not, I'm 87% sure it will suffice.
Re: Probable memory leak with ECPG and AIX
Noah Misch writes: > On Sat, Jul 02, 2022 at 11:37:08PM -0400, Tom Lane wrote: >> Do you want me to test it on prairiedog? > Sure, if it's easy enough. If not, I'm 87% sure it will suffice. On it now, but it'll take a few minutes :-( regards, tom lane
Re: Probable memory leak with ECPG and AIX
I wrote: > On it now, but it'll take a few minutes :-( Confirmed that either initializing ecpg_clocale or adding -fno-common allows the ecpglib build to succeed. (Testing it beyond that would require another hour or so to build the rest of the system, so I won't.) As I said, I was already leaning to the idea that initializing the variable explicitly is better style, so I recommend we do that. regards, tom lane