Re: Direct SSL connection with ALPN and HBA rules
On Tue, Apr 23, 2024 at 01:48:04AM +0300, Heikki Linnakangas wrote: > Here's the patch for that. The error message is: > > "direct SSL connection was established without ALPN protocol negotiation > extension" WFM. > That's accurate, but I wonder if we could make it more useful to a user > who's wondering what went wrong. I'd imagine that if the server doesn't > support ALPN, it's because you have some kind of a (not necessarily > malicious) generic SSL man-in-the-middle that doesn't support it. Or you're > trying to connect to an HTTPS server. Suggestions welcome. Hmm. Is there any point in calling SSL_get0_alpn_selected() in open_client_SSL() to get the ALPN if current_enc_method is not ENC_DIRECT_SSL? In the documentation of PQsslAttribute(), it is mentioned that empty string is returned for "alpn" if ALPN was not used, however the code returns NULL in this case: SSL_get0_alpn_selected(conn->ssl, &data, &len); if (data == NULL || len == 0 || len > sizeof(alpn_str) - 1) return NULL; -- Michael signature.asc Description: PGP signature
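For reference, a minimal sketch of how that attribute could be brought in line with the documented behaviour, returning "" rather than NULL when no ALPN protocol was negotiated. This is based on the snippet quoted above; the surrounding PQsslAttribute() structure and the alpn_str buffer are assumptions for illustration, not verbatim libpq code:

    if (strcmp(attribute_name, "alpn") == 0)
    {
        const unsigned char *data;
        unsigned int len;
        static char alpn_str[256];      /* assumed static buffer, as in the quoted snippet */

        SSL_get0_alpn_selected(conn->ssl, &data, &len);
        if (data == NULL || len == 0)
            return "";                  /* ALPN was not used: empty string, per the docs */
        if (len > sizeof(alpn_str) - 1)
            return NULL;                /* unexpectedly long protocol name */
        memcpy(alpn_str, data, len);
        alpn_str[len] = '\0';
        return alpn_str;
    }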
Re: pg_stat_statements and "IN" conditions
> On Mon, Apr 15, 2024 at 06:09:29PM +0900, Sutou Kouhei wrote:
> Thanks. I'm not familiar with this code base but I've reviewed these
> patches because I'm interested in this feature too.

Thanks for the review! The comments on the first patch make sense to me, will apply.

> 0003:
>
> > diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
> > index d7841b51cc9..00eec30feb1 100644
> > --- a/contrib/pg_stat_statements/pg_stat_statements.c
> > +++ b/contrib/pg_stat_statements/pg_stat_statements.c
> > ...
> > @@ -2883,12 +2886,22 @@ generate_normalized_query(JumbleState *jstate, const char *query,
> >  			/* The firsts merged constant */
> >  			else if (!skip)
> >  			{
> > +				static const uint32 powers_of_ten[] = {
> > +					1, 10, 100,
> > +					1000, 10000, 100000,
> > +					1000000, 10000000, 100000000,
> > +					1000000000
> > +				};
> > +				int			lower_merged = powers_of_ten[magnitude - 1];
> > +				int			upper_merged = powers_of_ten[magnitude];
>
> How about adding a reverse function of decimalLength32() to
> numutils.h and use it here?

I was pondering that at some point, but eventually decided to keep it this way, because:

* the use case is quite specific, I can't imagine it being used anywhere else
* it would not be a strict reverse, as the transformation itself is not reversible in the strict sense

> > -				n_quer_loc += sprintf(norm_query + n_quer_loc, "...");
> > +				n_quer_loc += sprintf(norm_query + n_quer_loc, "... [%d-%d entries]",
> > +									  lower_merged, upper_merged - 1);
>
> Do we still have enough space in norm_query for this change?
> It seems that norm_query expects up to 10 additional bytes
> per jstate->clocations[i].

As far as I understand there should be enough space, because we're going to replace at least 10 constants with one merge record. But it's a good point, this should be called out in the commentary explaining why 10 additional bytes are added.

> It seems that we can merge 0001, 0003 and 0004 to one patch.
> (Sorry. I haven't read all discussions yet. If we already
> discuss this, sorry for this noise.)

There is a certain disagreement about which portion of this feature makes sense to go with first, thus I think keeping all options open is a good idea. In the end a committer can squash the patches if needed.
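For illustration, a rough sketch of what the suggested inverse of decimalLength32() could look like; the function name and its placement in numutils.h are only the reviewer's suggestion, not existing code:

    /*
     * Hypothetical inverse of decimalLength32(): smallest uint32 with the
     * given number of decimal digits, i.e. 10^(digits - 1), for 1..10 digits.
     */
    static inline uint32
    pg_pow10_uint32(int digits)
    {
        static const uint32 powers_of_ten[] = {
            1, 10, 100, 1000, 10000, 100000,
            1000000, 10000000, 100000000, 1000000000
        };

        Assert(digits >= 1 && digits <= 10);
        return powers_of_ten[digits - 1];
    }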
POC: make mxidoff 64 bits
Hi! I've been trying to introduce 64-bit transaction identifiers to Postgres for quite a while [0]. This implies, of course, an enormous amount of change across various modules, and the patch set has become too big to be committed "at once". The obvious solution was to split the patch set into smaller ones. But that brings a new challenge: not every one of these parts makes Postgres better on its own. Actually, even switching to a FullTransactionId in PGPROC has the disadvantage of increasing WAL size [1]. In fact, I believe we're dealing with a chicken-and-egg problem here: we cannot commit the full patch set since it is too big to handle, and we cannot commit parts of it, since they only make sense together and do not improve Postgres on their own. But it's not that bad. Since commit 4ed8f0913bfdb5f, added in [3], we are able to use 64 bits for indexing SLRUs.

PROPOSAL
Make multixact offsets 64 bit.

RATIONALE
It is not very difficult to overflow a 32-bit mxidoff, since a multixact is created for every unique combination of transactions locking a tuple, including XIDs and their respective flags, and when a transaction is added to a specific multixact, the member set is rewritten at a new offset. In other words, it is possible to overflow the multixact offsets without overflowing the multixacts themselves and/or ordinary transactions. I believe there was something about this on the hackers mailing list, but I could not find the links now.

PFA the patch. This is a WIP version; upgrade machinery should be added later. As always, any opinions on the subject are very welcome!

[0] https://www.postgresql.org/message-id/flat/CACG%3DezZe1NQSCnfHOr78AtAZxJZeCvxrts0ygrxYwe%3DpyyjVWA%40mail.gmail.com
[1] https://www.postgresql.org/message-id/flat/CACG%3DezY7msw%2Bjip%3Drtfvnfz051dRqz4s-diuO46v3rAoAE0T0g%40mail.gmail.com
[3] https://postgr.es/m/CAJ7c6TPDOYBYrnCAeyndkBktO0WG2xSdYduTF0nxq%2BvfkmTF5Q%40mail.gmail.com

--
Best regards, Maxim Orlov.

0001-WIP-mxidoff-to-64bit.patch Description: Binary data
Re: Should consider materializing the cheapest inner path in consider_parallel_nestloop()
Andrey M. Borodin wrote on Mon, Apr 8, 2024 at 17:40:
>
>> On 27 Sep 2023, at 16:06, tender wang wrote:
>>
>> Do you have any comments or suggestions on this issue? Thanks.
>
> Hi Tender,
>
> there are some review comments in the thread that you might be interested in.
> I'll mark this [0] entry "Waiting on Author" and move it to the next CF.
>
> Thanks!
>
> Best regards, Andrey Borodin.
>
> [0] https://commitfest.postgresql.org/47/4549/

I have rebased onto master and fixed a plan diff case.

--
Tender Wang
OpenPie: https://en.openpie.com/

v3-0001-Support-materializing-inner-path-on-parallel-oute.patch Description: Binary data
Re: POC: make mxidoff 64 bits
On 23/04/2024 11:23, Maxim Orlov wrote:
> PROPOSAL
> Make multixact offsets 64 bit.

+1, this is a good next step and useful regardless of 64-bit XIDs.

> @@ -156,7 +148,7 @@
>  	((uint32) ((0xFFFFFFFF % MULTIXACT_MEMBERS_PER_PAGE) + 1))
>
>  /* page in which a member is to be found */
> -#define MXOffsetToMemberPage(xid) ((xid) / (TransactionId) MULTIXACT_MEMBERS_PER_PAGE)
> +#define MXOffsetToMemberPage(xid) ((xid) / (MultiXactOffset) MULTIXACT_MEMBERS_PER_PAGE)
>  #define MXOffsetToMemberSegment(xid) (MXOffsetToMemberPage(xid) / SLRU_PAGES_PER_SEGMENT)
>
>  /* Location (byte offset within page) of flag word for a given member */

This is really a bug fix. It didn't matter when TransactionId and MultiXactOffset were both typedefs of uint32, but it was always wrong. The argument name 'xid' is also misleading.

I think there are some more like that, MXOffsetToFlagsBitShift for example.

--
Heikki Linnakangas
Neon (https://neon.tech)
Re: query_id, pg_stat_activity, extended query protocol
On 4/23/24 12:49, Michael Paquier wrote: On Tue, Apr 23, 2024 at 11:42:41AM +0700, Andrei Lepikhov wrote: On 23/4/2024 11:16, Imseih (AWS), Sami wrote: + pgstat_report_query_id(linitial_node(Query, psrc->query_list)->queryId, true); set_ps_display("BIND"); @@ -2146,6 +2147,7 @@ exec_execute_message(const char *portal_name, long max_rows) debug_query_string = sourceText; pgstat_report_activity(STATE_RUNNING, sourceText); + pgstat_report_query_id(portal->queryDesc->plannedstmt->queryId, true); cmdtagname = GetCommandTagNameAndLen(portal->commandTag, &cmdtaglen); In exec_bind_message, how can you be sure that queryId exists in query_list before the call of GetCachedPlan(), which will validate and lock the plan? What if some OIDs were altered in the middle? I am also a bit surprised with the choice of using the first Query available in the list for the ID, FWIW. Did you consider using \bind to show how this behaves in a regression test? I'm not sure how to invent a test based on the \bind command - we need some pause in the middle. But simplistic case with a prepared statement shows how the value of queryId can be changed if you don't acquire all the objects needed for the execution: CREATE TABLE test(); PREPARE name AS SELECT * FROM test; EXPLAIN (ANALYSE, VERBOSE, COSTS OFF) EXECUTE name; DROP TABLE test; CREATE TABLE test(); EXPLAIN (ANALYSE, VERBOSE, COSTS OFF) EXECUTE name; /* QUERY PLAN --- Seq Scan on public.test (actual time=0.002..0.004 rows=0 loops=1) Query Identifier: 6750745711909650694 QUERY PLAN --- Seq Scan on public.test (actual time=0.004..0.004 rows=0 loops=1) Query Identifier: -2597546769858730762 */ We have different objects which can be changed - I just have invented the most trivial example to discuss. -- regards, Andrei Lepikhov
Re: Why does pgindent's README say to download typedefs.list from the buildfarm?
On 2024-Apr-22, Tom Lane wrote: > The main reason there's a delta is that people don't manage to > maintain the in-tree copy perfectly (at least, they certainly > haven't done so for this past year). So we need to do that > to clean up every now and then. Out of curiosity, I downloaded the buildfarm-generated file and re-indented the whole tree. It turns out that most commits seem to have maintained the in-tree typedefs list correctly when adding entries (even if out of alphabetical order), but a few haven't; and some people have added entries that the buildfarm script does not detect. So the import from BF will delete those entries and mess up the overall indent. For example it does stuff like +++ b/src/backend/commands/async.c @@ -399,7 +399,7 @@ typedef struct NotificationList typedef struct NotificationHash { Notification *event;/* => the actual Notification struct */ -} NotificationHash; +} NotificationHash; There's a good half dozen of those. I wonder if we're interested in keeping a (very short) manually- maintained list of symbols that we know are in use but the scripts don't extract for whatever reason. The change of NotificationHash looks surprising at first sight: apparently 095d109ccd7 deleted the only use of that type as a variable anywhere. But then I wonder if that datatype is useful at all anymore, since it only contains one pointer -- it seems we could just remove it. But there are others: InjectionPointEntry, ResourceOwnerData, JsonNonTerminal, JsonParserSem, ... -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/ "This is what I like so much about PostgreSQL. Most of the surprises are of the "oh wow! That's cool" Not the "oh shit!" kind. :)" Scott Marlowe, http://archives.postgresql.org/pgsql-admin/2008-10/msg00152.php
Minor document typo
Hi, doc/src/sgml/monitoring.sgml seems to have a minor typo. In the pg_stat_database_conflicts section (around line 3621) we have:

    Number of uses of logical slots in this database that have been
    canceled due to old snapshots or too low a <xref linkend="guc-wal-level"/>
    on the primary

I think "too low a" should be "too low" ('a' is not necessary). Attached is the patch.

Best regards,
--
Tatsuo Ishii
SRA OSS LLC
English: http://www.sraoss.co.jp/index_en/
Japanese: http://www.sraoss.co.jp

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 053da8d6e4..56e79b1304 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -3618,7 +3618,7 @@ description | Waiting for a newly initialized WAL file to reach durable storage
        Number of uses of logical slots in this database that have been
-       canceled due to old snapshots or too low a <xref linkend="guc-wal-level"/>
+       canceled due to old snapshots or too low <xref linkend="guc-wal-level"/>
        on the primary
Re: Race condition in FetchTableStates() breaks synchronization of subscription tables
On Wed, Mar 13, 2024 at 11:59 AM vignesh C wrote:
>
> On Wed, 13 Mar 2024 at 10:12, Zhijie Hou (Fujitsu) wrote:
> >
> > For 0002, instead of avoid resetting the latch, is it possible to let the
> > logical rep worker wake up the launcher once after attaching ?
>
> Waking up of the launch process uses the same latch that is used for
> subscription creation/modification and apply worker process exit. As
> the handling of this latch for subscription creation/modification and
> worker process exit can be done only by ApplyLauncherMain, we will not
> be able to reset the latch in WaitForReplicationWorkerAttach. I feel
> waking up the launcher process might not help in this case as
> currently we will not be able to differentiate between worker
> attached, subscription creation/modification and apply worker process
> exit.

IIUC, even if we set the latch once the worker attaches, the other set_latch by subscription creation/modification or apply_worker_exit could also be consumed due to the reset of the latch in WaitForReplicationWorkerAttach(). Is that understanding correct? If so, can we use some other way to wake up WaitForReplicationWorkerAttach(), say, a condition variable? The current proposal can fix the issue but looks a bit ad hoc.

--
With Regards,
Amit Kapila.
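For illustration, a rough sketch of the condition-variable idea. The ConditionVariable API below is the real one from storage/condition_variable.h, but the wattach_cv field in LogicalRepCtx is hypothetical, added here only to show the shape of the change:

    /* launcher side, replacing the latch wait in WaitForReplicationWorkerAttach() */
    ConditionVariablePrepareToSleep(&LogicalRepCtx->wattach_cv);
    while (worker->in_use && worker->proc == NULL)
        ConditionVariableSleep(&LogicalRepCtx->wattach_cv, WAIT_EVENT_BGWORKER_STARTUP);
    ConditionVariableCancelSleep();

    /* worker side, once it has attached to its LogicalRepWorker slot */
    ConditionVariableBroadcast(&LogicalRepCtx->wattach_cv);

A dedicated CV would not be consumed by subscription creation/modification or apply-worker exit, since those would keep using the launcher's latch. (The real code would still need to handle postmaster death and worker startup failure, which this sketch omits.)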
Re: Minor document typo
On Tue, 23 Apr 2024 at 23:17, Tatsuo Ishii wrote:
> Number of uses of logical slots in this database that have been
> canceled due to old snapshots or too low a <xref linkend="guc-wal-level"/>
> on the primary
>
> I think "too low a" should be "too low" ('a' is not
> necessary). Attached is the patch.

The existing text looks fine to me. The other form would use "of a" and become "too low of a wal_level on the primary".

"too low wal_level on the primary" sounds wrong to my native English-speaking ear.

There's some discussion in [1] that might be of interest to you.

David

[1] https://www.reddit.com/r/grammar/comments/qr9z6e/i_need_help_with_sothat_adj_of_a_sing_noun/?ref=share&ref_source=link
Re: subscription/026_stats test is intermittently slow?
On Tue, Apr 23, 2024 at 11:49 AM vignesh C wrote: > > On Sat, 20 Apr 2024 at 10:30, Alexander Lakhin wrote: > > > > Hello Michael and Robert, > > > > 20.04.2024 05:57, Michael Paquier wrote: > > > On Fri, Apr 19, 2024 at 01:57:41PM -0400, Robert Haas wrote: > > >> It looks to me like in the first run it took 3 minutes for the > > >> replay_lsn to catch up to the desired value, and in the second run, > > >> two seconds. I think I have seen previous instances where something > > >> similar happened, although in those cases I did not stop to record any > > >> details. Have others seen this? Is there something we can/should do > > >> about it? > > > FWIW, I've also seen delays as well with this test on a few occasions. > > > Thanks for mentioning it. > > > > It reminds me of > > https://www.postgresql.org/message-id/858a7622-2c81-1687-d1df-1322dfcb2e72%40gmail.com > > Thanks Alexander for the test, I was able to reproduce the issue with > the test you shared and also verify that the patch at [1] fixes the > same. > One of the issues reported in the thread you referred to has the same symptoms [1]. I'll review and analyze your proposal. [1] - https://www.postgresql.org/message-id/858a7622-2c81-1687-d1df-1322dfcb2e72%40gmail.com -- With Regards, Amit Kapila.
Re: Minor document typo
>> I think "too low a" should be "too low" ('a' is not >> necessary). Attached is the patch. > > The existing text looks fine to me. The other form would use "of a" > and become "too low of a wal_level on the primary". > > "too low wal_level on the primary" sounds wrong to my native > English-speaking ear. > > There's some discussion in [1] that might be of interest to you. > > David > > [1] > https://www.reddit.com/r/grammar/comments/qr9z6e/i_need_help_with_sothat_adj_of_a_sing_noun/?ref=share&ref_source=link Thank you for the explanation. English is difficult :-) Just out of a curiosity, is it possible to say "low a wal_level on the primary"? (just "too" is removed) -- Tatsuo Ishii SRA OSS LLC English: http://www.sraoss.co.jp/index_en/ Japanese:http://www.sraoss.co.jp
Re: Why does pgindent's README say to download typedefs.list from the buildfarm?
On Tue, Apr 23, 2024 at 6:23 AM Alvaro Herrera wrote: > I wonder if we're interested in keeping a (very short) manually- > maintained list of symbols that we know are in use but the scripts > don't extract for whatever reason. +1. I think this idea has been proposed and rejected before, but I think it's more important to have our code indented correctly than to be able to rely on a 100% automated process for gathering typedefs. There is of course the risk that the manually generated file will accumulate stale cruft over time, but I don't really see that being a big problem. First, it doesn't cost much to have a few extra symbols in there. Second, I suspect someone will go through it at least every couple of years, if not more often, and figure out which entries are still doing something. -- Robert Haas EDB: http://www.enterprisedb.com
RE: Slow catchup of 2PC (twophase) transactions on replica in LR
Dear hackers, Per recent commit (b29cbd3da), our patch needed to be rebased. Here is an updated version. Best Regards, Hayato Kuroda FUJITSU LIMITED https://www.fujitsu.com/ v6-0001-Allow-altering-of-two_phase-option-of-a-SUBSCRIPT.patch Description: v6-0001-Allow-altering-of-two_phase-option-of-a-SUBSCRIPT.patch v6-0002-Alter-slot-option-two_phase-only-when-altering-tr.patch Description: v6-0002-Alter-slot-option-two_phase-only-when-altering-tr.patch v6-0003-Abort-prepared-transactions-while-altering-two_ph.patch Description: v6-0003-Abort-prepared-transactions-while-altering-two_ph.patch v6-0004-Add-force_alter-option.patch Description: v6-0004-Add-force_alter-option.patch
Re: slightly misleading Error message in guc.c
> On 22 Apr 2024, at 18:04, Tom Lane wrote: > Seems like a useful change Agreed. > ... about like this? Patch LGTM. -- Daniel Gustafsson
Re: GUC-ify walsender MAX_SEND_SIZE constant
On Tue, Apr 23, 2024 at 2:24 AM Michael Paquier wrote:
>
> On Mon, Apr 22, 2024 at 03:40:01PM +0200, Majid Garoosi wrote:
> > Any news, comments, etc. about this thread?
>
> FWIW, I'd still be in favor of doing a GUC-ification of this part, but
> at this stage I'd need more time to do a proper study of a case where
> this shows benefits to prove your point, or somebody else could come
> in and show it.
>
> Andres has objected to this change, on the ground that this was not
> worth it, though you are telling the contrary. I would be curious to
> hear from others, first, so as we gather more opinions to reach a
> consensus.

I'm more with Andres on this one. I vaguely remember researching the impact of MAX_SEND_SIZE on two independent occasions (an earlier async case and a more recent sync case where I investigated a variety of ways to keep latency down), and my summary would be:

First: it's very hard to get a *reliable* replication setup for benchmarking, where one could demonstrate a correlation between e.g. increasing MAX_SEND_SIZE and observing benefits (in sync rep it is easier, as you are simply stalled in pgbench). Part of the problem are the following things:
a) the workload can be tricky; for this purpose it needs to be trivial but bulky
b) it needs to be on an isolated network with guaranteed bandwidth
c) the wal_init_zero impact should be ruled out
d) the OS should be properly tuned: TCP autotuning max (3rd value) plus rmem_max/wmem_max set up properly
e) a more serious TCP congestion control algorithm should be used than the default one in the OS
f) one should prevent any I/O stalls on walreceiver writeback during huge WAL activity and restart points on the standby (dirty_bytes and so on, isolated pg_xlog, BDI max_ratio)

Second: once you do all of the above and ensure that there are no network or I/O stalls, I *think* I couldn't see any impact of playing with MAX_SEND_SIZE, from what I remember, as probably something else is saturated first.

I can offer help with trying to test this with artificial tests and even with injecting proper latency (WAN-like), but I think the OP (Majid) first needs to describe his environment much better (exact latency, bandwidth, workload, TCP sysctl values, duration of the tests, number of attempts tried, exact commands used, iperf3 TCP results demonstrating the hardware used, and so on). So in short the patch is easy, but demonstrating the effect and persuading others here would be hard.

-J.
Re: POC: make mxidoff 64 bits
> On 23 Apr 2024, at 11:23, Maxim Orlov wrote:
>
> Make multixact offsets 64 bit.

- ereport(ERROR,
-         (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
-          errmsg("multixact \"members\" limit exceeded"),

Personally, I'd be happy with this! We had some incidents where the only mitigation was tweaking vacuum settings.

BTW, as a side note... I see lots of casts to (unsigned long long); can't we just cast to MultiXactOffset?

Best regards, Andrey Borodin.
Re: POC: make mxidoff 64 bits
Hi Maxim Orlov,

Thank you so much for your tireless work on this. Increasing the WAL size by a few bytes should have very little impact with today's disk performance. (Logical replication also increased the WAL volume quite a lot, and logical replication was a milestone feature that the community keeps improving.) I believe removing PostgreSQL's troublesome transaction ID wraparound would likewise be a milestone feature, so adding a few bytes is worth it!

Best regards

On Tue, 23 Apr 2024 at 17:37, Heikki Linnakangas wrote:
> On 23/04/2024 11:23, Maxim Orlov wrote:
> > PROPOSAL
> > Make multixact offsets 64 bit.
>
> +1, this is a good next step and useful regardless of 64-bit XIDs.
>
> > @@ -156,7 +148,7 @@
> >  	((uint32) ((0xFFFFFFFF % MULTIXACT_MEMBERS_PER_PAGE) + 1))
> >
> >  /* page in which a member is to be found */
> > -#define MXOffsetToMemberPage(xid) ((xid) / (TransactionId) MULTIXACT_MEMBERS_PER_PAGE)
> > +#define MXOffsetToMemberPage(xid) ((xid) / (MultiXactOffset) MULTIXACT_MEMBERS_PER_PAGE)
> >  #define MXOffsetToMemberSegment(xid) (MXOffsetToMemberPage(xid) / SLRU_PAGES_PER_SEGMENT)
> >
> >  /* Location (byte offset within page) of flag word for a given member */
>
> This is really a bug fix. It didn't matter when TransactionId and
> MultiXactOffset were both typedefs of uint32, but it was always wrong.
> The argument name 'xid' is also misleading.
>
> I think there are some more like that, MXOffsetToFlagsBitShift for example.
>
> --
> Heikki Linnakangas
> Neon (https://neon.tech)
Re: Minor document typo
On Wed, 24 Apr 2024 at 00:11, Tatsuo Ishii wrote: > Just out of a curiosity, is it possible to say "low a wal_level on the > primary"? (just "too" is removed) Prefixing the adjective with "too" means it's beyond the acceptable range. "This coffee is too hot". https://dictionary.cambridge.org/grammar/british-grammar/too David
Re: Why does pgindent's README say to download typedefs.list from the buildfarm?
Alvaro Herrera writes: > On 2024-Apr-22, Tom Lane wrote: >> The main reason there's a delta is that people don't manage to >> maintain the in-tree copy perfectly (at least, they certainly >> haven't done so for this past year). So we need to do that >> to clean up every now and then. > Out of curiosity, I downloaded the buildfarm-generated file and > re-indented the whole tree. It turns out that most commits seem to have > maintained the in-tree typedefs list correctly when adding entries (even > if out of alphabetical order), but a few haven't; and some people have > added entries that the buildfarm script does not detect. Yeah. I believe that happens when there is no C variable or field anywhere that has that specific struct type. In your example, NotificationHash appears to only be referenced in a sizeof() call, which suggests that maybe the coding is a bit squirrely and could be done another way. Having said that, there already are manually-curated lists of inclusions and exclusions hard-wired into pgindent (see around line 70). I wouldn't have any great objection to adding more entries there. Or if somebody wanted to do the work, they could be pulled out into separate files. regards, tom lane
pg_trgm comparison bug on cross-architecture replication due to different char implementation
Hi all,

I would like to report an issue with the pg_trgm extension in cross-architecture replication scenarios. When an x86_64 standby server is replicating from an aarch64 primary server or vice versa, the gist_trgm_ops opclass returns different results on the primary and standby. Masahiko previously reported a similar issue affecting the pg_bigm extension [1].

To reproduce, execute the following on the x86_64 primary server:

CREATE EXTENSION pg_trgm;
CREATE TABLE tbl (c text);
CREATE INDEX ON tbl USING gist (c gist_trgm_ops);
INSERT INTO tbl VALUES ('Bóbr');

On the x86_64 primary server:

postgres=> select * from tbl where c like '%Bób%';
  c
------
 Bóbr
(1 row)

On the aarch64 replica server:

postgres=> select * from tbl where c like '%Bób%';
 c
---
(0 rows)

The root cause looks the same as the pg_bigm issue that Masahiko reported. To compare trigrams, pg_trgm uses a numerical comparison of chars [2]. On x86_64 a char is signed by default, whereas on aarch64 it is unsigned by default. gist_trgm_ops expects the trigram list to be sorted, but due to the different signedness of chars, the sort order is broken when replicating the values across architectures. The different sort behaviour can be demonstrated using show_trgm.

On the x86_64 primary server:

postgres=> SELECT show_trgm('Bóbr');
                show_trgm
------------------------------------------
 {0x89194c," b","br ",0x707c72,0x7f7849}
(1 row)

On the aarch64 replica server:

postgres=> SELECT show_trgm('Bóbr');
                show_trgm
------------------------------------------
 {" b","br ",0x707c72,0x7f7849,0x89194c}
(1 row)

One simple solution for this specific case is to declare the char signedness in the CMPPCHAR macro.

--- a/contrib/pg_trgm/trgm.h
+++ b/contrib/pg_trgm/trgm.h
@@ -42,7 +42,7 @@ typedef char trgm[3];

 #define CMPCHAR(a,b) ( ((a)==(b)) ? 0 : ( ((a)<(b)) ? -1 : 1 ) )
-#define CMPPCHAR(a,b,i)  CMPCHAR( *(((const char*)(a))+i), *(((const char*)(b))+i) )
+#define CMPPCHAR(a,b,i)  CMPCHAR( *(((unsigned char*)(a))+i), *(((unsigned char*)(b))+i) )
 #define CMPTRGM(a,b) ( CMPPCHAR(a,b,0) ? CMPPCHAR(a,b,0) : ( CMPPCHAR(a,b,1) ? CMPPCHAR(a,b,1) : CMPPCHAR(a,b,2) ) )

 #define CPTRGM(a,b) do { \

Alternatively, Postgres can be compiled with -funsigned-char or -fsigned-char. I came across a code comment suggesting that this may not be a good idea in general [3].

Given that this problem has come up before and seems likely to come up again, I'm curious what other broad solutions there might be to resolve it?

Looking forward to any feedback, thanks!

Best,
Adam Guo
Amazon Web Services: https://aws.amazon.com

[1] https://osdn.net/projects/pgbigm/lists/archive/hackers/2024-February/000370.html
[2] https://github.com/postgres/postgres/blob/480bc6e3ed3a5719cdec076d4943b119890e8171/contrib/pg_trgm/trgm.h#L45
[3] https://github.com/postgres/postgres/blob/master/src/backend/utils/adt/cash.c#L114-L123
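To make the signedness point concrete, a small standalone illustration (not PostgreSQL code) of how the CMPCHAR comparison flips for a byte above 0x7f, such as 0xB3, the second byte of 'ó' in UTF-8:

    #include <stdio.h>

    #define CMPCHAR(a,b) ( ((a)==(b)) ? 0 : ( ((a)<(b)) ? -1 : 1 ) )

    int
    main(void)
    {
        signed char     s = (signed char) 0xB3;     /* "char" on x86_64 defaults to signed */
        unsigned char   u = (unsigned char) 0xB3;   /* "char" on aarch64 defaults to unsigned */

        /* 0xB3 is -77 as a signed char, so it sorts before 'b' (98)... */
        printf("signed:   %d\n", CMPCHAR(s, (signed char) 'b'));    /* prints -1 */

        /* ...but 179 as an unsigned char, so it sorts after 'b' */
        printf("unsigned: %d\n", CMPCHAR(u, (unsigned char) 'b'));  /* prints 1 */

        return 0;
    }

Any trigram sort order baked into an on-disk GiST index therefore depends on which default the build architecture uses.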
Re: pg_trgm comparison bug on cross-architecture replication due to different char implementation
"Guo, Adam" writes: > I would like to report an issue with the pg_trgm extension on > cross-architecture replication scenarios. When an x86_64 standby > server is replicating from an aarch64 primary server or vice versa, > the gist_trgm_ops opclass returns different results on the primary > and standby. I do not think that is a supported scenario. Hash functions and suchlike are not guaranteed to produce the same results on different CPU architectures. As a quick example, I get regression=# select hashfloat8(34); hashfloat8 21570837 (1 row) on x86_64 but postgres=# select hashfloat8(34); hashfloat8 -602898821 (1 row) on ppc32 thanks to the endianness difference. > Given that this has problem has come up before and seems likely to > come up again, I'm curious what other broad solutions there might be > to resolve it? Reject as not a bug. Discourage people from thinking that physical replication will work across architectures. regards, tom lane
Re: Add new protocol message to change GUCs for usage with future protocol-only GUCs
On Mon, Apr 22, 2024 at 5:19 PM Jelte Fennema-Nio wrote: > On Mon, 22 Apr 2024 at 16:26, Robert Haas wrote: > > That's a fair point, but I'm still not seeing much practical > > advantage. It's unlikely that a client is going to set a random bit in > > a format parameter for no reason. > > I think you're missing an important point of mine here. The client > wouldn't be "setting a random bit in a format parameter for no > reason". The client would decide it is allowed to set this bit, > because the PG version it connected to supports column encryption > (e.g. PG18). But this completely breaks protocol and application layer > separation. I can't see what the problem is here. If the client is connected to a database that contains encrypted columns, and its response to seeing an encrypted column is to set this bit, that's fine and nothing should break. If a client doesn't know about encrypted columns and sets that bit at random, that will break things, and formally I think that's a risk, because I don't believe we document anywhere that you shouldn't set unused bits in the format mask. But practically, it's not likely. (And also, maybe we should document that you shouldn't do that.) > It doesn't seem completely outside of the realm of possibility for a > pooler to gather some statistics on the amount of Bind messages that > use text vs binary query parameters. That's very easily doable now, > while looking only at the protocol layer. If a client then sets the > new format parameter bit, this pooler could then get confused and > close the connection. Right, this is the kind of risk I was worried about. I think it's similar to my example of a client setting an unused bit for no reason and breaking everything. Here, you've hypothesized a pooler that tries to interpret the bit and just errors out when it sees something it doesn't understand. I agree that *formally* this is enough to justify bumping the protocol version, but I think *practically* it isn't, because the incompatibility is so minor as to inconvenience almost nobody, whereas changing the protocol version affects everybody. Let's consider a hypothetical country much like Canada except that there are three official languages rather than two: English, French, and Robertish. Robertish is just like English except that the meanings of the words cabbage and rutabaga are reversed. Shall we mandate that all signs in the country be printed in three languages rather than two? Formally, we ought, because the substantial minority of our hypothetical country that proudly speaks Robertish as their mother tongue will not want to feel that they are second class citizens. But practically, there are very few situations where the differences between the two languages are going to inconvenience anyone. Indeed, the French speakers might be a bit put out if English is effectively represented twice on every sign while their mother tongue is there only once. Of course, people are entitled to organize their countries politically in any way that works for the people who live in them, but as a practical matter, English and Robertish are mutually intelligible. And so here. If someone codes a connection pooler in the way you suppose, then it will break. But, first of all, they probably won't do that, both because it's not particularly likely that someone wants to gather that particular set of statistics and also because erroring out seems like an overreaction. 
And secondly, let's imagine that we do bump the protocol version and think about whether and how that solves the problem. A client will request from the pooler a version 3.1 connection and the pooler will say, sorry, no can do, I only understand 3.0. So the client will now say, oh ok, no problem, I'm going to refrain from setting that parameter format bit. Cool, right? Well, no, not really. First, now the client application is probably broken. If the client is varying its behavior based on the server's protocol version, that must mean that it cares about accessing encrypted columns, and that means that the bit in question is not an optional feature. So actually, the fact that the pooler can force the client to downgrade hasn't fixed anything at all. Second, if the connection pooler were written to do something other than close the connection, like say mask out the one bit that it knows how to deal with or have an "unknown" bucket to count values that it doesn't recognize, then it wouldn't have needed to care about the protocol version in the first place. It would have been better off not even knowing, because then it wouldn't have forced a downgrade onto the client application for no real reason. Throwing an error wasn't a wrong decision on the part of the person writing the pooler, but there are other things they could have done that would have been less brittle. Third, applications, drivers, and connection poolers now all need to worry about handling downgrades smoothly. If a connection pooler requests a v3.1 conne
Background Processes in Postgres Extension
Hey, I'm developing a postgres extension as a custom Table Interface method definition. Within the extension, I'm planning to create two background processes using `fork()` that will process data in the background. Are there any recommendations / guidelines around creating background processes within extensions in postgres? Thanks, Sushrut
Re: pgsql: Introduce "builtin" collation provider.
On 2024-03-14 Th 02:39, Jeff Davis wrote: Introduce "builtin" collation provider. The new value "b" for pg_collation.collprovider doesn't seem to be documented. Is that deliberate? cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
Re: slightly misleading Error message in guc.c
Daniel Gustafsson writes: >> On 22 Apr 2024, at 18:04, Tom Lane wrote: >> Seems like a useful change > Agreed. >> ... about like this? > Patch LGTM. Pushed. regards, tom lane
Re: Background Processes in Postgres Extension
Sushrut Shivaswamy writes: > I'm developing a postgres extension as a custom Table Interface method > definition. > WIthin the extension, I"m planning to create two background processes using > `fork()` that will process data in the background. > Are there any recommendations / guidelines around creating background > processes within extensions in postgres? fork() is entirely the wrong way to do it, mainly because when the creating session exits the postmaster will be unaware of those now-disconnected child processes. See the mechanisms for creating background worker processes (start with bgworker.h). regards, tom lane
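For anyone landing on this thread later, a minimal sketch of the bgworker route. The field values, names, and the worker entry point below are illustrative only; see bgworker.h and the worker_spi test module for the real API and a complete example:

    #include "postgres.h"

    #include "miscadmin.h"
    #include "postmaster/bgworker.h"

    /* assumed entry point exported by this extension's shared library */
    PGDLLEXPORT void my_worker_main(Datum main_arg);

    static void
    start_my_worker(void)
    {
        BackgroundWorker worker;
        BackgroundWorkerHandle *handle;

        memset(&worker, 0, sizeof(worker));
        worker.bgw_flags = BGWORKER_SHMEM_ACCESS | BGWORKER_BACKEND_DATABASE_CONNECTION;
        worker.bgw_start_time = BgWorkerStart_RecoveryFinished;
        worker.bgw_restart_time = BGW_NEVER_RESTART;
        snprintf(worker.bgw_library_name, BGW_MAXLEN, "my_extension");
        snprintf(worker.bgw_function_name, BGW_MAXLEN, "my_worker_main");
        snprintf(worker.bgw_name, BGW_MAXLEN, "my_extension worker");
        snprintf(worker.bgw_type, BGW_MAXLEN, "my_extension worker");
        worker.bgw_notify_pid = MyProcPid;  /* ask the postmaster to notify us on start */

        if (!RegisterDynamicBackgroundWorker(&worker, &handle))
            ereport(ERROR,
                    (errmsg("could not register background worker")));
    }

Workers registered this way are tracked (and optionally restarted) by the postmaster, and they can attach to shared memory and connect to a database via BackgroundWorkerInitializeConnection(), none of which a bare fork() child gets.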
Re: Popcount optimization using AVX512
On Thu, Apr 18, 2024 at 05:13:58PM -0500, Nathan Bossart wrote: > Makes sense, thanks. I'm planning to commit this fix sometime early next > week. Committed. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: BitmapHeapScan streaming read user and prelim refactoring
On Mon, Apr 22, 2024 at 1:01 PM Melanie Plageman wrote: > > On Thu, Apr 18, 2024 at 5:39 AM Tomas Vondra > wrote: > > > > On 4/18/24 09:10, Michael Paquier wrote: > > > On Sun, Apr 07, 2024 at 10:54:56AM -0400, Melanie Plageman wrote: > > >> I've added an open item [1], because what's one open item when you can > > >> have two? (me) > > > > > > And this is still an open item as of today. What's the plan to move > > > forward here? > > > > AFAIK the plan is to replace the asserts with actually resetting the > > rs_empty_tuples_pending field to 0, as suggested by Melanie a week ago. > > I assume she was busy with the post-freeze AM reworks last week, so this > > was on a back burner. > > yep, sorry. Also I took a few days off. I'm just catching up today. I > want to pop in one of Richard or Tomas' examples as a test, since it > seems like it would add some coverage. I will have a patch soon. The patch with a fix is attached. I put the test in src/test/regress/sql/join.sql. It isn't the perfect location because it is testing something exercisable with a join but not directly related to the fact that it is a join. I also considered src/test/regress/sql/select.sql, but it also isn't directly related to the query being a SELECT query. If there is a better place for a test of a bitmap heap scan edge case, let me know. One other note: there is some concurrency effect in the parallel schedule group containing "join" where you won't trip the assert if all the tests in that group in the parallel schedule are run. But, if you would like to verify that the test exercises the correct code, just reduce the group containing "join". - Melanie From 6ad777979c335f6cc16d3936defb634176a44995 Mon Sep 17 00:00:00 2001 From: Melanie Plageman Date: Tue, 23 Apr 2024 11:45:37 -0400 Subject: [PATCH v1] BitmapHeapScan: Remove incorrect assert 04e72ed617be pushed the skip fetch optimization (allowing bitmap heap scans to operate like index-only scans if none of the underlying data is needed) into heap AM-specific bitmap heap scan code. 04e72ed617be added an assert that all tuples in blocks eligible for the optimization had been NULL-filled and emitted by the end of the scan. This assert is incorrect when not all tuples need be scanned to execute the query; for example: a join in which not all inner tuples need to be scanned before skipping to the next outer tuple. Author: Melanie Plageman Reviewed-by: Richard Guo, Tomas Vondra Discussion: https://postgr.es/m/CAMbWs48orzZVXa7-vP9Nt7vQWLTE04Qy4PePaLQYsVNQgo6qRg%40mail.gmail.com --- src/backend/access/heap/heapam.c | 4 ++-- src/test/regress/expected/join.out | 36 ++ src/test/regress/sql/join.sql | 24 3 files changed, 62 insertions(+), 2 deletions(-) diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c index 4a4cf76269d..8906f161320 100644 --- a/src/backend/access/heap/heapam.c +++ b/src/backend/access/heap/heapam.c @@ -1184,7 +1184,7 @@ heap_rescan(TableScanDesc sscan, ScanKey key, bool set_params, scan->rs_vmbuffer = InvalidBuffer; } - Assert(scan->rs_empty_tuples_pending == 0); + scan->rs_empty_tuples_pending = 0; /* * The read stream is reset on rescan. This must be done before @@ -1216,7 +1216,7 @@ heap_endscan(TableScanDesc sscan) if (BufferIsValid(scan->rs_vmbuffer)) ReleaseBuffer(scan->rs_vmbuffer); - Assert(scan->rs_empty_tuples_pending == 0); + scan->rs_empty_tuples_pending = 0; /* * Must free the read stream before freeing the BufferAccessStrategy. 
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out index 8b640c2fc2f..ce73939c267 100644 --- a/src/test/regress/expected/join.out +++ b/src/test/regress/expected/join.out @@ -8956,3 +8956,39 @@ where exists (select 1 from j3 (13 rows) drop table j3; +-- Check the case when: +-- 1. A join doesn't require all inner tuples to be scanned for each outer +-- tuple, and +-- 2. The inner side is scanned using a bitmap heap scan, and +-- 3. The bitmap heap scan is eligible for the "skip fetch" optimization. +-- This optimization is usable when no data from the underlying table is +-- needed. +CREATE TABLE skip_fetch (a INT, b INT) WITH (fillfactor=10); +INSERT INTO skip_fetch SELECT i % 3, i FROM GENERATE_SERIES(0,30) i; +CREATE INDEX ON skip_fetch(a); +VACUUM (ANALYZE) skip_fetch; +SET enable_indexonlyscan = off; +set enable_indexscan = off; +SET enable_seqscan = off; +EXPLAIN (COSTS OFF) +SELECT t1.a FROM skip_fetch t1 LEFT JOIN skip_fetch t2 ON t2.a = 1 WHERE t2.a IS NULL; + QUERY PLAN +- + Nested Loop Anti Join + -> Seq Scan on skip_fetch t1 + -> Materialize + -> Bitmap Heap Scan on skip_fetch t2 + Recheck Cond: (a = 1) + -> Bitmap Index Scan on skip_fetch_a_idx + Index Cond: (a = 1) +(7 rows)
Re: Avoid orphaned objects dependencies, take 3
Hi, On Tue, Apr 23, 2024 at 04:59:09AM +, Bertrand Drouvot wrote: > Hi, > > On Mon, Apr 22, 2024 at 03:00:00PM +0300, Alexander Lakhin wrote: > > 22.04.2024 13:52, Bertrand Drouvot wrote: > > > > > > That's weird, I just launched it several times with the patch applied and > > > I'm not > > > able to see the seg fault (while I can see it constently failing on the > > > master > > > branch). > > > > > > Are you 100% sure you tested it against a binary with the patch applied? > > > > > > > Yes, at least I can't see what I'm doing wrong. Please try my > > self-contained script attached. > > Thanks for sharing your script! > > Yeah your script ensures the patch is applied before the repro is executed. > > I do confirm that I can also see the issue with the patch applied (but I had > to > launch multiple attempts, while on master one attempt is enough). > > I'll have a look. Please find attached v2 that should not produce the issue anymore (I launched a lot of attempts without any issues). v1 was not strong enough as it was not always checking for the dependent object existence. v2 now always checks if the object still exists after the additional lock acquisition attempt while recording the dependency. I still need to think about v2 but in the meantime could you please also give v2 a try on you side? Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com >From f80fdfc32e791463555aa318f26ff5e7363ac3ac Mon Sep 17 00:00:00 2001 From: Bertrand Drouvot Date: Fri, 29 Mar 2024 15:43:26 + Subject: [PATCH v2] Avoid orphaned objects dependencies It's currently possible to create orphaned objects dependencies, for example: Scenario 1: session 1: begin; drop schema schem; session 2: create a function in the schema schem session 1: commit; With the above, the function created in session 2 would be linked to a non existing schema. Scenario 2: session 1: begin; create a function in the schema schem session 2: drop schema schem; session 1: commit; With the above, the function created in session 1 would be linked to a non existing schema. To avoid those scenarios, a new lock (that conflicts with a lock taken by DROP) has been put in place when the dependencies are being recorded. With this in place, the drop schema in scenario 2 would be locked. Also, after locking the object, the patch checks that the object still exists: with this in place session 2 in scenario 1 would be locked and would report an error once session 1 committs (that would not be the case should session 1 abort the transaction). 
The patch adds a few tests for some dependency cases (that would currently produce orphaned objects): - schema and function (as the above scenarios) - function and type - table and type --- src/backend/catalog/dependency.c | 48 + src/backend/catalog/objectaddress.c | 70 +++ src/backend/catalog/pg_depend.c | 6 ++ src/include/catalog/dependency.h | 1 + src/include/catalog/objectaddress.h | 1 + src/test/modules/Makefile | 1 + src/test/modules/meson.build | 1 + .../test_dependencies_locks/.gitignore| 3 + .../modules/test_dependencies_locks/Makefile | 14 .../expected/test_dependencies_locks.out | 49 + .../test_dependencies_locks/meson.build | 12 .../specs/test_dependencies_locks.spec| 39 +++ 12 files changed, 245 insertions(+) 38.1% src/backend/catalog/ 30.7% src/test/modules/test_dependencies_locks/expected/ 19.5% src/test/modules/test_dependencies_locks/specs/ 8.7% src/test/modules/test_dependencies_locks/ diff --git a/src/backend/catalog/dependency.c b/src/backend/catalog/dependency.c index d4b5b2ade1..e3b66025dd 100644 --- a/src/backend/catalog/dependency.c +++ b/src/backend/catalog/dependency.c @@ -1517,6 +1517,54 @@ AcquireDeletionLock(const ObjectAddress *object, int flags) } } +/* + * depLockAndCheckObject + * + * Lock the object that we are about to record a dependency on. + * After it's locked, verify that it hasn't been dropped while we + * weren't looking. If the object has been dropped, this function + * does not return! + */ +void +depLockAndCheckObject(const ObjectAddress *object) +{ + char *object_description; + + /* + * Those don't rely on LockDatabaseObject() when being dropped (see + * AcquireDeletionLock()). Also it looks like they can not produce + * orphaned dependent objects when being dropped. + */ + if (object->classId == RelationRelationId || object->classId == AuthMemRelationId) + return; + + object_description = getObjectDescription(object, true); + + /* assume we should lock the whole object not a sub-object */ + LockDatabaseObject(object->classId, object->objectId, 0, AccessShareLock); + + /* check if object still exists */ + if (!ObjectByIdExist(object, false)) + { + /* + *
Re: Statistics Import and Export
On Tue, 23 Apr 2024, 05:52 Tom Lane, wrote:
> Jeff Davis writes:
> > On Mon, 2024-04-22 at 16:19 -0400, Tom Lane wrote:
> >> Loading data without stats, and hoping
> >> that auto-analyze will catch up sooner not later, is exactly the
> >> current behavior that we're doing all this work to get out of.
>
> > That's the disconnect, I think. For me, the main reason I'm excited
> > about this work is as a way to solve the bad-plans-after-upgrade
> > problem and to repro planner issues outside of production. Avoiding the
> > need to ANALYZE at the end of a data load is also a nice convenience,
> > but not a primary driver (for me).
>
> Oh, I don't doubt that there are use-cases for dumping stats without
> data. I'm just dubious about the reverse. I think data+stats should
> be the default, even if only because pg_dump's default has always
> been to dump everything. Then there should be a way to get stats
> only, and maybe a way to get data only. Maybe this does argue for a
> four-section definition, despite the ensuing churn in the pg_dump API.

I've heard of use cases where dumping stats without data would help with production database planner debugging on a non-prod system. Sure, some planner inputs would have to be taken into account too, but having an exact copy of production stats is at least a start and can help build models and alerts for what'll happen when the tables grow larger with the current stats.

As for other planner inputs: table size is relatively easy to shim with sparse files; cumulative statistics can be copied from a donor replica if needed, and btree indexes only really need to contain their highest and lowest values (and need their height set correctly).

Kind regards,

Matthias van de Meent
Re: gcc 12.1.0 warning
Hi, On 2024-04-15 11:25:05 +0300, Nazir Bilal Yavuz wrote: > I am able to reproduce this. I regenerated the debian bookworm image > and ran CI on REL_15_STABLE with this image. > > CI Run: https://cirrus-ci.com/task/4978799442395136 Hm, not sure why I wasn't able to repro - now I can. It actually seems like a legitimate warning: The caller allocates the key as static struct config_generic * find_option(const char *name, bool create_placeholders, bool skip_errors, int elevel) { const char **key = &name; and then does res = (struct config_generic **) bsearch((void *) &key, (void *) guc_variables, num_guc_variables, sizeof(struct config_generic *), guc_var_compare); while guc_var_compare() assume it's being passed a full config_generic: static int guc_var_compare(const void *a, const void *b) { const struct config_generic *confa = *(struct config_generic *const *) a; const struct config_generic *confb = *(struct config_generic *const *) b; return guc_name_compare(confa->name, confb->name); } which several versions of gcc then complain about: In function ‘guc_var_compare’, inlined from ‘bsearch’ at /usr/include/x86_64-linux-gnu/bits/stdlib-bsearch.h:33:23, inlined from ‘find_option’ at /home/andres/src/postgresql-15/src/backend/utils/misc/guc.c:5640:35: /home/andres/src/postgresql-15/src/backend/utils/misc/guc.c:5727:38: warning: array subscript ‘const struct config_generic[0]’ is partly outside array bounds of ‘const char[8]’ [-Warray-bounds=] 5727 | return guc_name_compare(confa->name, confb->name); | ~^~ /home/andres/src/postgresql-15/src/backend/utils/misc/guc.c: In function ‘find_option’: /home/andres/src/postgresql-15/src/backend/utils/misc/guc.c:5627:25: note: object ‘name’ of size 8 5627 | find_option(const char *name, bool create_placeholders, bool skip_errors, Which seems entirely legitimate. ISTM that guc_var_compare() ought to only cast the pointers to the key type, i.e. char *. And incidentally that does prevent the warning. The reason it doesn't happen in newer versions of postgres is that we aren't using guc_var_compare() in the relevant places anymore... Greetings, Andres Freund
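For illustration, a rough sketch of the cast-only-to-the-key-type idea. Like the existing trick, it relies on name being the first field of struct config_generic; it is shown here only to make the suggestion concrete, not as the committed fix:

    static int
    guc_var_compare(const void *a, const void *b)
    {
        /* both the bsearch key and the array elements begin with the name pointer */
        const char *namea = **(const char **const *) a;
        const char *nameb = **(const char **const *) b;

        return guc_name_compare(namea, nameb);
    }

Since the comparator now only ever dereferences a pointer to a pointer to char, gcc no longer has to assume a full config_generic lives behind the 8-byte name variable, which is what triggers the -Warray-bounds warning.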
Re: Cleanup: remove unused fields from nodes
Michael Paquier writes: > On Mon, Apr 22, 2024 at 06:46:27PM +0200, Matthias van de Meent wrote: >> On Mon, 22 Apr 2024 at 17:41, Tom Lane wrote: >>> I think it would be a good idea to push 0003 for v17, just so nobody >>> grows an unnecessary dependency on that field. 0001 and 0005 could >>> be left for v18, but on the other hand they're so trivial that it >>> could also be sensible to just push them to get them out of the way. > Tweaking the APIs should be OK until GA, as long as we agree that the > current interfaces can be improved. > 0003 is new in v17, so let's apply it now. I don't see much a strong > argument in waiting for the removal of 0001 and 0005, either, to keep > the interfaces cleaner moving on. However, this is not a regression > and these have been around for years, so I'd suggest for v18 to open > before moving on with the removal. I went ahead and pushed 0001 and 0003, figuring there was little point in waiting on 0001. I'd intended to push 0005 (remove "isall") as well, but it failed check-world: diff -U3 /home/postgres/pgsql/contrib/pg_stat_statements/expected/utility.out /home/postgres/pgsql/contrib/pg_stat_statements/results/utility.out --- /home/postgres/pgsql/contrib/pg_stat_statements/expected/utility.out 2023-12-08 15:14:55.689347888 -0500 +++ /home/postgres/pgsql/contrib/pg_stat_statements/results/utility.out 2024-04-23 12:17:22.187721947 -0400 @@ -536,12 +536,11 @@ SELECT calls, rows, query FROM pg_stat_statements ORDER BY query COLLATE "C"; calls | rows | query ---+--+ - 2 |0 | DEALLOCATE $1 - 2 |0 | DEALLOCATE ALL + 4 |0 | DEALLOCATE $1 2 |2 | PREPARE stat_select AS SELECT $1 AS a 1 |1 | SELECT $1 as a 1 |1 | SELECT pg_stat_statements_reset() IS NOT NULL AS t -(5 rows) +(4 rows) SELECT pg_stat_statements_reset() IS NOT NULL AS t; That is, query jumbling no longer distinguishes "DEALLOCATE x" from "DEALLOCATE ALL", because the DeallocateStmt.name field is marked query_jumble_ignore. Now maybe that's fine, but it's a point we'd not considered so far in this thread. Thoughts? regards, tom lane
Re: Direct SSL connection with ALPN and HBA rules
On Fri, Apr 19, 2024 at 2:43 PM Heikki Linnakangas wrote: > > On 19/04/2024 19:48, Jacob Champion wrote: > > On Fri, Apr 19, 2024 at 6:56 AM Heikki Linnakangas wrote: > >> With direct SSL negotiation, we always require ALPN. > > > > (As an aside: I haven't gotten to test the version of the patch that > > made it into 17 yet, but from a quick glance it looks like we're not > > rejecting mismatched ALPN during the handshake as noted in [1].) > > Ah, good catch, that fell through the cracks. Agreed, the client should > reject a direct SSL connection if the server didn't send ALPN. I'll add > that to the Open Items so we don't forget again. Yes, the client should also reject, but that's not what I'm referring to above. The server needs to fail the TLS handshake itself with the proper error code (I think it's `no_application_protocol`?); otherwise a client implementing a different protocol could consume the application-level bytes coming back from the server and act on them. That's the protocol confusion attack from ALPACA we're trying to avoid. > > Well, assuming the user is okay with plaintext negotiation at all. > > (Was that fixed before the patch went in? Is GSS negotiation still > > allowed even with requiredirect?) > > To disable sending the startup packet in plaintext, you need to use > sslmode=require. Same as before the patch. GSS is still allowed, as it > takes precedence over SSL if both are enabled in libpq. Per the docs: > > > Note that if gssencmode is set to prefer, a GSS connection is > > attempted first. If the server rejects GSS encryption, SSL is > > negotiated over the same TCP connection using the traditional > > postgres protocol, regardless of sslnegotiation. In other words, the > > direct SSL handshake is not used, if a TCP connection has already > > been established and can be used for the SSL handshake. Oh. That's actually disappointing, since gssencmode=prefer is the default. A question I had in the original thread was, what's the rationale behind a "require direct ssl" option that doesn't actually require it? > >> What would be the use case of requiring > >> direct SSL in the server? What extra security do you get? > > > > You get protection against attacks that could have otherwise happened > > during the plaintext portion of the handshake. That has architectural > > implications for more advanced uses of SCRAM, and it should prevent > > any repeats of CVE-2021-23222/23214. And if the peer doesn't pass the > > TLS handshake, they can't send you anything that you might forget is > > untrusted (like, say, an error message). > > Can you elaborate on the more advanced uses of SCRAM? If you're using SCRAM to authenticate the server, as opposed to just a really strong password auth, then it really helps an analysis of the security to know that there are no plaintext bytes that have been interpreted by the client. This came up briefly in the conversations that led to commit d0f4824a. To be fair, it's a more academic concern at the moment; my imagination can only come up with problems for SCRAM-based TLS that would also be vulnerabilities for standard certificate-based TLS. But whether or not it's an advantage for the code today is also kind of orthogonal to my point. The security argument of direct SSL mode is that it reduces risk for the system as a whole, even in the face of future code changes or regressions. If you can't force its use, you're not reducing that risk very much. 
(If anything, a "require" option that doesn't actually require it makes the analysis more complicated, not less...) > >> Controlling these in HBA is a bit inconvenient, because you only find > >> out after authentication if it's allowed or not. So if e.g. direct SSL > >> connections are disabled for a user, > > > > Hopefully disabling direct SSL piecemeal is not a desired use case? > > I'm not sure it makes sense to focus on that. Forcing it to be enabled > > shouldn't have the same problem, should it? > > Forcing it to be enabled piecemeal based on role or database has similar > problems. Hm. For some reason I thought it was easier the other direction, but I can't remember why I thought that. I'll withdraw the comment for now :) --Jacob
Re: Direct SSL connection with ALPN and HBA rules
On Mon, Apr 22, 2024 at 10:42 PM Michael Paquier wrote: > > On Mon, Apr 22, 2024 at 10:47:51AM +0300, Heikki Linnakangas wrote: > > On 22/04/2024 10:19, Michael Paquier wrote: > >> As a whole, I can get behind a unique GUC that forces the use of > >> direct. Or, we could extend the existing "ssl" GUC with a new > >> "direct" value to accept only direct connections and restrict the > >> original protocol (and a new "postgres" for the pre-16 protocol, > >> rejecting direct?), while "on" is able to accept both. > > > > I'd be OK with that, although I still don't really see the point of forcing > > this from the server side. We could also add this later. > > I'd be OK with doing something only in v18, if need be. Jacob, what > do you think? I think it would be nice to have an option like that. Whether it's done now or in 18, I don't have a strong opinion about. But I do think it'd be helpful to have a consensus on whether or not this is a security improvement, or a performance enhancement only, before adding said option. As it's implemented, if the requiredirect option doesn't actually requiredirect, I think it looks like security but isn't really. (My ideal server-side option removes all plaintext negotiation and forces the use of direct SSL for every connection, paired with a new postgresqls:// scheme for the client. But I don't have any experience making a switchover like that at scale, and I'd like to avoid a StartTLS-vs-LDAPS sort of situation. That's obviously not a conversation for 17.) As for HBA control: overall, I don't see a burning need for an HBA-based configuration, honestly. I'd prefer to reduce the number of knobs and make it easier to apply the strongest security with a broad brush. --Jacob
Re: soliciting patches to review
Hi, Just a quick update. We have so far had 8 suggested patches from 6 people, if I haven't missed anything. I'm fairly certain that not all of those patches are going to be good candidates for this session, so it would be great if a few more people wanted to volunteer their patches. Thanks, ...Robert
Re: soliciting patches to review
On Tue, Apr 23, 2024 at 1:27 PM Robert Haas wrote: > > Hi, > > Just a quick update. We have so far had 8 suggested patches from 6 > people, if I haven't missed anything. I'm fairly certain that not all > of those patches are going to be good candidates for this session, so > it would be great if a few more people wanted to volunteer their > patches. Since you are going to share the patches anyway at the workshop, do you mind giving an example of a patch that is a good fit for the workshop? Alternatively, you could provide a hypothetical example. I, of course, have patches that I'd like reviewed. But, I'm unconvinced any of them would be particularly interesting in a workshop. - Melanie
Re: Add new protocol message to change GUCs for usage with future protocol-only GUCs
On Mon, Apr 22, 2024 at 2:20 PM Jelte Fennema-Nio wrote: > 1. I strongly believe minor protocol version bumps after the initial > 3.1 one can be made painless for clients/poolers (so the ones to > 3.2/3.3/etc). Similar to how TLS 1.3 can be safely introduced, and not > having to worry about breaking TLS 1.2 communication. Apologies for focusing on a single portion of your argument, but this claim in particular stuck out to me. To my understanding, IETF worried a _lot_ about breaking TLS 1.2 implementations with the TLS 1.3 change, to the point that TLS 1.3 clients and servers advertise themselves as TLS 1.2 and sneak the actual version used into a TLS extension (roughly analogous to the _pq_ stuff). I vaguely recall that the engineering work done for that update was pretty far from painless. --Jacob
Re: Direct SSL connection with ALPN and HBA rules
On Tue, Apr 23, 2024 at 1:22 PM Jacob Champion wrote: > On Mon, Apr 22, 2024 at 10:42 PM Michael Paquier wrote: > > On Mon, Apr 22, 2024 at 10:47:51AM +0300, Heikki Linnakangas wrote: > > > On 22/04/2024 10:19, Michael Paquier wrote: > > >> As a whole, I can get behind a unique GUC that forces the use of > > >> direct. Or, we could extend the existing "ssl" GUC with a new > > >> "direct" value to accept only direct connections and restrict the > > >> original protocol (and a new "postgres" for the pre-16 protocol, > > >> rejecting direct?), while "on" is able to accept both. > > > > > > I'd be OK with that, although I still don't really see the point of > > > forcing > > > this from the server side. We could also add this later. > > > > I'd be OK with doing something only in v18, if need be. Jacob, what > > do you think? > > I think it would be nice to have an option like that. Whether it's > done now or in 18, I don't have a strong opinion about. But I do think > it'd be helpful to have a consensus on whether or not this is a > security improvement, or a performance enhancement only, before adding > said option. As it's implemented, if the requiredirect option doesn't > actually requiredirect, I think it looks like security but isn't > really. I've not followed this thread closely enough to understand the comment about requiredirect maybe not actually requiring direct, but if that were true it seems like it might be concerning. But as far as having a GUC to force direct SSL or not, I agree that's a good idea, and that it's better than only being able to control the behavior through pg_hba.conf, because it removes room for any possible doubt about whether you're really enforcing the behavior you want to be enforcing. It might also mean that the connection can be rejected earlier in the handshaking process on the basis of the GUC value, which could conceivably prevent a client from reaching some piece of code that turns out to have a security vulnerability. For example, if we found out that direct SSL connections let you take over the Pentagon before reaching the authentication stage, but for some reason regular connections don't have the same problem, being able to categorically shut off direct SSL would be valuable. However, I don't really see why this has to be done for this release. It seems like a separate feature from direct SSL itself. If direct SSL hadn't been committed at the very last minute, then it would have been great if this had been done for this release, too. But it was. The moral we ought to take from that is "perhaps get the big features in a bit further in advance of the freeze," not "well we'll just keep hacking after the freeze." -- Robert Haas EDB: http://www.enterprisedb.com
Re: soliciting patches to review
On Tue, Apr 23, 2024 at 1:39 PM Melanie Plageman wrote: > Since you are going to share the patches anyway at the workshop, do > you mind giving an example of a patch that is a good fit for the > workshop? Alternatively, you could provide a hypothetical example. I, > of course, have patches that I'd like reviewed. But, I'm unconvinced > any of them would be particularly interesting in a workshop. Andres and I haven't discussed our selection criteria yet, but my feeling is that we're going to want patches that are somewhat medium-sized. If your patch makes PostgreSQL capable of faster-than-light travel, it's probably too big to be reviewed meaningfully in the time we will have. If your patch merely corrects a bunch of typos, it probably lacks enough substance to be worth discussing. I hesitate to propose more specific parameters. On the one hand, a patch that changes something user-visible that someone could reasonably like or dislike is probably easier to review, in some sense, than a patch that refactors code or tries to improve performance. However, talking about how to review patches where it's less obvious what you should be trying to evaluate might be an important part of the workshop, so my feeling is that I would prefer it if more people would volunteer and then let Andres and me sort through what we think makes sense to include. I would also be happy to have people "blanket submit" without naming patches, i.e. if anyone wants to email and say "hey, feel free to include any of my stuff if you want", that is great. Our concern was that we didn't want to look like we were picking on anyone who wasn't up for it. I'm happy to keep getting emails from people with specific patches they want reviewed -- if we can hit a patch that someone wants reviewed, that is better for everyone than if we just pick randomly -- but my number one concern is not offending anyone. -- Robert Haas EDB: http://www.enterprisedb.com
Re: Direct SSL connection with ALPN and HBA rules
On Tue, Apr 23, 2024 at 10:43 AM Robert Haas wrote: > I've not followed this thread closely enough to understand the comment > about requiredirect maybe not actually requiring direct, but if that > were true it seems like it might be concerning. It may be my misunderstanding. This seems to imply bad behavior: > If the server rejects GSS encryption, SSL is > negotiated over the same TCP connection using the traditional postgres > protocol, regardless of sslnegotiation. As does this comment: > + /* > +* If enabled, try direct SSL. Unless we have a valid TCP connection that > +* failed negotiating GSSAPI encryption or a plaintext connection in case > +* of sslmode='allow'; in that case we prefer to reuse the connection with > +* negotiated SSL, instead of reconnecting to do direct SSL. The point of > +* direct SSL is to avoid the roundtrip from the negotiation, but > +* reconnecting would also incur a roundtrip. > +*/ but when I actually try those cases, I see that requiredirect does actually cause a direct SSL connection to be done, even with sslmode=allow. So maybe it's just misleading documentation (or my misreading of it) that needs to be expanded? Am I missing a different corner case where requiredirect is ignored, Heikki? I still question the utility of allowing sslmode=allow with sslnegotiation=requiredirect, because it seems like you've made both the performance and security characteristics actively worse if you choose that combination. But I want to make sure I understand the current behavior correctly before I derail the discussion too much... Thanks, --Jacob
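For anyone who wants to reproduce that check, a connection string along these lines exercises exactly the combination in question (host and database names here are only placeholders):

    psql "host=db.example.com dbname=postgres sslmode=allow sslnegotiation=requiredirect"

As described above, the observed behavior is that a direct SSL handshake is attempted despite sslmode=allow.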
Re: Why does pgindent's README say to download typedefs.list from the buildfarm?
On Mon, Apr 22, 2024 at 03:20:10PM -0500, Nathan Bossart wrote: > On Mon, Apr 22, 2024 at 04:08:08PM -0400, Tom Lane wrote: >> The problem with the README is that it describes that process, >> rather than the now-typical workflow of incrementally keeping >> the tree indented. I don't think we want to remove the info >> about how to do the full-monty process, but you're right that >> the README needs to explain the incremental method as being >> the one most developers would usually use. >> >> Want to write some text? > > Yup, I'll put something together. Here is a first attempt. I'm not tremendously happy with it, but it at least gets something on the page to build on. I was originally going to copy/paste the relevant steps into the description of the incremental process, but that seemed kind-of silly, so I instead just pointed to the relevant steps of the "full" process, along with the deviations from those steps. That's a little more work for the reader, but maybe it isn't too bad... I plan to iterate on this patch some more. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com diff --git a/src/tools/pgindent/README b/src/tools/pgindent/README index b649a21f59..221e3fc010 100644 --- a/src/tools/pgindent/README +++ b/src/tools/pgindent/README @@ -1,8 +1,11 @@ pgindent'ing the PostgreSQL source tree === -We run this process at least once in each development cycle, -to maintain uniform layout style in our C and Perl code. +The following process is used to maintain uniform layout style in our C and Perl +code. Once in each development cycle, a complete, independent run is performed. +Additionally, an abridged run should be performed on every new commit to +minimize deviations from pgindent's preferred style. See below for more details +on each type of run. You might find this blog post interesting: http://adpgtech.blogspot.com/2015/05/running-pgindent-on-non-core-code-or.html @@ -25,7 +28,20 @@ PREREQUISITES: Or if you have cpanm installed, you can just use: cpanm https://cpan.metacpan.org/authors/id/S/SH/SHANCOCK/Perl-Tidy-20230309.tar.gz -DOING THE INDENT RUN: +DOING AN INCREMENTAL, PER-COMMIT INDENT RUN: + +First, complete steps 1-3 of a complete run listed below. Note that there is +no need to download the typedef file from the buildfarm. Standard practice is +to manually update this file as needed in the same commit that adds or removes +types. The once-per-development-cycle run will resolve any differences between +the incrementally updated version of the file and a clean, autogenerated one. + +Finally, complete steps 1-3 of the validation section below. Ensure that any +changes made by pgindent are included in the patch prior to committing. If you +forget to do so, add the missed pgindent changes in a follow-up commit and also +do step 4 of the validation section below. + +DOING A COMPLETE, ONCE-PER-DEVELOPMENT-CYCLE INDENT RUN: 1) Change directory to the top of the source tree.
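For illustration, the per-commit workflow the new text describes boils down to something like this (the file path is just an example, and the --show-diff option is assumed to be available in the in-tree pgindent):

    # from the top of the source tree, with pg_bsd_indent built
    src/tools/pgindent/pgindent --show-diff src/backend/commands/copy.c    # preview
    src/tools/pgindent/pgindent src/backend/commands/copy.c                # apply

plus manually adding any new typedef names to src/tools/pgindent/typedefs.list in the same commit.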
Re: Cutting support for OpenSSL 1.0.1 and 1.0.2 in 17~?
> On 19 Apr 2024, at 10:06, Peter Eisentraut wrote: > > On 19.04.24 07:37, Michael Paquier wrote: >> On Thu, Apr 18, 2024 at 12:53:43PM +0200, Peter Eisentraut wrote: >>> If everything is addressed, I agree that 0001, 0003, and 0004 can go into >>> PG17, the rest later. >> About the PG17 bits, would you agree about a backpatch? Or perhaps >> you disagree? > > I don't think any of these need to be backpatched, at least right now. > > 0001 is just a cosmetic documentation tweak, has no reason to be backpatched. > > 0003 adds new functionality for LibreSSL. While the code looks > straightforward, we have little knowledge about how it works in practice. > How is the buildfarm coverage of LibreSSL (with SSL tests enabled!)? If > people are keen on this, it might be better to get it into PG17 and at least > let it go through a few months of beta testing. > > 0004 effectively just enhances an error message for LibreSSL; there is little > reason to backpatch this. Hearing no objections to this plan (and the posted v10), I'll go ahead with 0001, 0003 and 0004 into v17 tomorrow. -- Daniel Gustafsson
Re: Direct SSL connection with ALPN and HBA rules
On 23/04/2024 22:33, Jacob Champion wrote: On Tue, Apr 23, 2024 at 10:43 AM Robert Haas wrote: I've not followed this thread closely enough to understand the comment about requiredirect maybe not actually requiring direct, but if that were true it seems like it might be concerning. It may be my misunderstanding. This seems to imply bad behavior: If the server rejects GSS encryption, SSL is negotiated over the same TCP connection using the traditional postgres protocol, regardless of sslnegotiation. As does this comment: + /* +* If enabled, try direct SSL. Unless we have a valid TCP connection that +* failed negotiating GSSAPI encryption or a plaintext connection in case +* of sslmode='allow'; in that case we prefer to reuse the connection with +* negotiated SSL, instead of reconnecting to do direct SSL. The point of +* direct SSL is to avoid the roundtrip from the negotiation, but +* reconnecting would also incur a roundtrip. +*/ but when I actually try those cases, I see that requiredirect does actually cause a direct SSL connection to be done, even with sslmode=allow. So maybe it's just misleading documentation (or my misreading of it) that needs to be expanded? Am I missing a different corner case where requiredirect is ignored, Heikki? You're right, the comment is wrong about sslmode=allow. There is no negotiation of a plaintext connection, the client just sends the startup packet directly. The HBA rules can reject it, but the client will have to disconnect and reconnect in that case. The documentation and that comment are misleading about failed GSSAPI encryption too, and I also misremembered that. With sslnegotiation=requiredirect, libpq never uses negotiated SSL mode. It will reconnect after a rejected GSSAPI request. So that comment applies to sslnegotiation=direct, but not sslnegotiation=requiredirect. Attached patch tries to fix and clarify those. (Note that the client will only attempt GSSAPI encryption if it can find kerberos credentials in the environment.) -- Heikki Linnakangas Neon (https://neon.tech) From 664555decb695123a4bf25ea56f789202b926ea0 Mon Sep 17 00:00:00 2001 From: Heikki Linnakangas Date: Wed, 24 Apr 2024 00:10:24 +0300 Subject: [PATCH 1/1] Fix documentation and comments on what happens after GSS rejection The paragraph in the docs and the comment applied to sslnegotiaton=direct, but not sslnegotiation=requiredirect. In 'requiredirect' mode, negotiated SSL is never used. Move the paragraph in the docs under the description of 'direct' mode, and rephrase it. Also the comment's reference to reusing a plaintext connection was bogus. Authentication failure in plaintext mode only happens after sending the startup packet, so the connection cannot be reused. --- doc/src/sgml/libpq.sgml | 19 +-- src/interfaces/libpq/fe-connect.c | 11 ++- 2 files changed, 15 insertions(+), 15 deletions(-) diff --git a/doc/src/sgml/libpq.sgml b/doc/src/sgml/libpq.sgml index 9199d0d2e5..7f854edfa2 100644 --- a/doc/src/sgml/libpq.sgml +++ b/doc/src/sgml/libpq.sgml @@ -1803,6 +1803,15 @@ postgresql://%2Fvar%2Flib%2Fpostgresql/dbname process adds significant latency if the initial SSL connection fails. + + An exception is if gssencmode is set + to prefer, but the server rejects GSS encryption. + In that case, SSL is negotiated over the same TCP connection using + PostgreSQL protocol negotiation. In + other words, the direct SSL handshake is not used, if a TCP + connection has already been established and can be used for the + SSL handshake. 
+ @@ -1816,16 +1825,6 @@ postgresql://%2Fvar%2Flib%2Fpostgresql/dbname - - -Note that if gssencmode is set -to prefer, a GSS connection is -attempted first. If the server rejects GSS encryption, SSL is -negotiated over the same TCP connection using the traditional postgres -protocol, regardless of sslnegotiation. In other -words, the direct SSL handshake is not used, if a TCP connection has -already been established and can be used for the SSL handshake. - diff --git a/src/interfaces/libpq/fe-connect.c b/src/interfaces/libpq/fe-connect.c index ec20e3f3a9..b1d3bfa59d 100644 --- a/src/interfaces/libpq/fe-connect.c +++ b/src/interfaces/libpq/fe-connect.c @@ -4430,11 +4430,12 @@ select_next_encryption_method(PGconn *conn, bool have_valid_connection) /* * If enabled, try direct SSL. Unless we have a valid TCP connection that - * failed negotiating GSSAPI encryption or a plaintext connection in case - * of sslmode='allow'; in that case we prefer to reuse the connection with - * negotiated SSL, instead of reconnecting to do direct SSL. The point of - * direct SSL is to avoid the ro
Re: GUC-ify walsender MAX_SEND_SIZE constant
Hi, On 2024-04-23 14:47:31 +0200, Jakub Wartak wrote: > On Tue, Apr 23, 2024 at 2:24 AM Michael Paquier wrote: > > > > > Any news, comments, etc. about this thread? > > > > FWIW, I'd still be in favor of doing a GUC-ification of this part, but > > at this stage I'd need more time to do a proper study of a case where > > this shows benefits to prove your point, or somebody else could come > > in and show it. > > > > Andres has objected to this change, on the ground that this was not > > worth it, though you are telling the contrary. I would be curious to > > hear from others, first, so as we gather more opinions to reach a > > consensus. I think it's a bad idea to make it configurable. It's just one more GUC that nobody has a chance of realistically tuning. I'm not saying we shouldn't improve the code - just that making MAX_SEND_SIZE configurable doesn't really seem like a good answer. FWIW, I have a hard time believing that MAX_SEND_SIZE is going to be the only or even primary issue with high latency, high bandwidth storage devices. > First: it's very hard to get *reliable* replication setup for > benchmark, where one could demonstrate correlation between e.g. > increasing MAX_SEND_SIZE and observing benefits (in sync rep it is > easier, as you are simply stalled in pgbench). Part of the problem are > the following things: Depending on the workload, it's possible to measure streaming-out performance without actually regenerating WAL, e.g. by using pg_receivewal to stream the data out multiple times. Another way to get fairly reproducible WAL workloads is to drive pg_logical_emit_message() from pgbench; that tends to have a lot less variability than running tpcb-like or such. > Second: once you perform above and ensure that there are no network or > I/O stalls back then I *think* I couldn't see any impact of playing > with MAX_SEND_SIZE from what I remember as probably something else is > saturated first. My understanding of Majid's use-case for tuning MAX_SEND_SIZE is that the bottleneck is storage, not network. The reason MAX_SEND_SIZE affects that is that it determines the max size passed to WALRead(), which in turn determines how much we read from the OS at once. If the storage has high latency but also high throughput, and readahead is disabled or just not aggressive enough after crossing segment boundaries, larger reads reduce the number of times you're likely to be blocked waiting for read IO. Which is also why I think that making MAX_SEND_SIZE configurable is a really poor proxy for improving the situation. We're imo much better off working on read_stream.[ch] support for reading WAL. Greetings, Andres Freund
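For what it's worth, a minimal pgbench custom script along those lines could look like this (the file name and payload sizes are arbitrary; pg_logical_emit_message() with transactional = false writes a logical message WAL record without touching any table):

    -- emit_wal.sql
    \set len random(512, 8192)
    SELECT pg_logical_emit_message(false, 'bench', repeat('x', :len));

Running it with something like "pgbench -n -c 8 -j 8 -T 60 -f emit_wal.sql" produces a reasonably steady WAL stream to test streaming against.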
Tarball builds in the new world order
With one eye on the beta-release calendar, I thought it'd be a good idea to test whether our tarball build script works for the new plan where we'll use "git archive" instead of the traditional process. It doesn't. It makes tarballs all right, but whatever commit ID you specify is semi-ignored, and you get a tarball corresponding to HEAD of master. (The PDFs come from the right version, though!) The reason for that is that the mk-one-release script does this (shorn of not-relevant-here details): export BASE=/home/pgsql export GIT_DIR=$BASE/postgresql.git mkdir pgsql # Export the selected git ref git archive ${gitref} | tar xf - -C pgsql cd pgsql ./configure # Produce .tar.gz and .tar.bz2 files make dist Since 619bc23a1, what "make dist" does is $(GIT) -C $(srcdir) -c core.autocrlf=false archive --format tar.gz -9 --prefix $(distdir)/ HEAD -o $(abs_top_builddir)/$@ $(GIT) -C $(srcdir) -c core.autocrlf=false -c tar.tar.bz2.command='$(BZIP2) -c' archive --format tar.bz2 --prefix $(distdir)/ HEAD -o $(abs_top_builddir)/$@ Since GIT_DIR is set, git consults that repo not the current working directory, so HEAD means whatever it means in a just-fetched repo, and mk-one-release's efforts to select the ${gitref} commit mean nothing. (If git had tried to consult the current working directory, it would've failed for lack of any .git subdir therein.) I really really don't want to put version-specific coding into mk-one-release, but fortunately I think we don't have to. What I suggest is doing this in mk-one-release: -make dist +make dist PG_COMMIT_HASH=${gitref} and changing the "make dist" rules to write $(PG_COMMIT_HASH) not HEAD. The extra make variable will have no effect in the back branches, while it should cause the right thing to happen with the new implementation of "make dist". This change seems like a good thing anyway for anyone who's tempted to use "make dist" manually, since they wouldn't necessarily want to package HEAD either. Now, if we just do it exactly like that then trying to "make dist" without setting PG_COMMIT_HASH will fail, since "git archive" has no default for its argument. I can't quite decide if that's a good thing, or if we should hack the makefile a little further to allow PG_COMMIT_HASH to default to HEAD. Thoughts, better ideas? regards, tom lane
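To make the optional-default variant concrete, the rules could end up looking roughly like this (untested sketch; only the HEAD -> $(PG_COMMIT_HASH) substitution and the default assignment are new):

    PG_COMMIT_HASH ?= HEAD

    $(GIT) -C $(srcdir) -c core.autocrlf=false archive --format tar.gz -9 --prefix $(distdir)/ $(PG_COMMIT_HASH) -o $(abs_top_builddir)/$@
    $(GIT) -C $(srcdir) -c core.autocrlf=false -c tar.tar.bz2.command='$(BZIP2) -c' archive --format tar.bz2 --prefix $(distdir)/ $(PG_COMMIT_HASH) -o $(abs_top_builddir)/$@

That way a bare "make dist" still packages HEAD, while mk-one-release can pass PG_COMMIT_HASH=${gitref} explicitly.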
Re: Cutting support for OpenSSL 1.0.1 and 1.0.2 in 17~?
On Tue, Apr 23, 2024 at 10:08:13PM +0200, Daniel Gustafsson wrote: > Hearing no objections to this plan (and the posted v10), I'll go ahead with > 0001, 0003 and 0004 into v17 tomorrow. WFM, thanks. -- Michael signature.asc Description: PGP signature
Re: BitmapHeapScan streaming read user and prelim refactoring
On 4/23/24 18:05, Melanie Plageman wrote: > On Mon, Apr 22, 2024 at 1:01 PM Melanie Plageman > wrote: >> >> On Thu, Apr 18, 2024 at 5:39 AM Tomas Vondra >> wrote: >>> >>> On 4/18/24 09:10, Michael Paquier wrote: On Sun, Apr 07, 2024 at 10:54:56AM -0400, Melanie Plageman wrote: > I've added an open item [1], because what's one open item when you can > have two? (me) And this is still an open item as of today. What's the plan to move forward here? >>> >>> AFAIK the plan is to replace the asserts with actually resetting the >>> rs_empty_tuples_pending field to 0, as suggested by Melanie a week ago. >>> I assume she was busy with the post-freeze AM reworks last week, so this >>> was on a back burner. >> >> yep, sorry. Also I took a few days off. I'm just catching up today. I >> want to pop in one of Richard or Tomas' examples as a test, since it >> seems like it would add some coverage. I will have a patch soon. > > The patch with a fix is attached. I put the test in > src/test/regress/sql/join.sql. It isn't the perfect location because > it is testing something exercisable with a join but not directly > related to the fact that it is a join. I also considered > src/test/regress/sql/select.sql, but it also isn't directly related to > the query being a SELECT query. If there is a better place for a test > of a bitmap heap scan edge case, let me know. > I don't see a problem with adding this to join.sql - why wouldn't this count as something related to a join? Sure, it's not like this code matters only for joins, but if you look at join.sql that applies to a number of other tests (e.g. there are a couple btree tests). That being said, it'd be good to explain in the comment why we're testing this particular plan, not just what the plan looks like. > One other note: there is some concurrency effect in the parallel > schedule group containing "join" where you won't trip the assert if > all the tests in that group in the parallel schedule are run. But, if > you would like to verify that the test exercises the correct code, > just reduce the group containing "join". > That is ... interesting. Doesn't that mean that most test runs won't actually detect the problem? That would make the test a bit useless. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: pg_combinebackup does not detect missing files
On 4/22/24 23:53, Robert Haas wrote: On Sun, Apr 21, 2024 at 8:47 PM David Steele wrote: I figured that wouldn't be particularly meaningful, and that's pretty much the only kind of validation that's even theoretically possible without a bunch of extra overhead, since we compute checksums on entire files rather than, say, individual blocks. And you could really only do it for the final backup in the chain, because you should end up accessing all of those files, but the same is not true for the predecessor backups. So it's a very weak form of verification. I don't think it is weak if you can verify that the output is exactly as expected, i.e. all files are present and have the correct contents. I don't understand what you mean here. I thought we were in agreement that verifying contents would cost a lot more. The verification that we can actually do without much cost can only check for missing files in the most recent backup, which is quite weak. pg_verifybackup is available if you want more comprehensive verification and you're willing to pay the cost of it. I simply meant that it is *possible* to verify the output of pg_combinebackup without explicitly verifying all the backups. There would be overhead, yes, but it would be less than verifying each backup individually. For my 2c that efficiency would make it worth doing verification in pg_combinebackup, with perhaps a switch to turn it off if the user is confident in their sources. I think it is a worthwhile change and we are still a month away from beta1. We'll see if anyone disagrees. I don't plan to press forward with this in this release unless we get a couple of +1s from disinterested parties. We're now two weeks after feature freeze and this is design behavior, not a bug. Perhaps the design should have been otherwise, but two weeks after feature freeze is not the time to debate that. It doesn't appear that anyone but me is terribly concerned about verification, even in this weak form, so probably best to hold this patch until the next release. As you say, it is late in the game. Regards, -David
Re: Requiring LLVM 14+ in PostgreSQL 18
Rebased over ca89db5f. I looked into whether we could drop the "old pass manager" code too[1]. Almost, but nope, even the C++ API lacks a way to set the inline threshold before LLVM 16, so that would cause a regression. Although we just hard-code the threshold to 512 with a comment that sounds like it's pretty arbitrary, a change to the default (225?) would be unjustifiable just for code cleanup. Oh well. [1] https://github.com/macdice/postgres/commit/0d40abdf1feb75210c3a3d2a35e3d6146185974c v2-0001-jit-Require-at-least-LLVM-14-if-enabled.patch Description: Binary data v2-0002-jit-Use-opaque-pointers-in-all-supported-LLVM-ver.patch Description: Binary data
Re: query_id, pg_stat_activity, extended query protocol
> I am also a bit surprised with the choice of using the first Query > available in the list for the ID, FWIW. IIUC, the query trees returned from QueryRewrite will all have the same queryId, so it appears valid to use the queryId from the first tree in the list. Right? Here is an example I was working with that includes user-defined rules that has a list with more than 1 tree. postgres=# explain (verbose, generic_plan) insert into mytab values ($1) RETURNING pg_sleep($1), id ; QUERY PLAN --- Insert on public.mytab (cost=0.00..0.01 rows=1 width=4) Output: pg_sleep(($1)::double precision), mytab.id -> Result (cost=0.00..0.01 rows=1 width=4) Output: $1 Query Identifier: 3703848357297795425 Insert on public.mytab2 (cost=0.00..0.01 rows=0 width=0) -> Result (cost=0.00..0.01 rows=1 width=4) Output: $1 Query Identifier: 3703848357297795425 Insert on public.mytab3 (cost=0.00..0.01 rows=0 width=0) -> Result (cost=0.00..0.01 rows=1 width=4) Output: $1 Query Identifier: 3703848357297795425 Insert on public.mytab4 (cost=0.00..0.01 rows=0 width=0) -> Result (cost=0.00..0.01 rows=1 width=4) Output: $1 Query Identifier: 3703848357297795425 (20 rows) > Did you consider using \bind to show how this behaves in a regression > test? Yes, this is precisely how I tested. Without the patch, I could not see a queryId after 9 seconds of a pg_sleep, but with the patch it appears. See the test below. ## test query select pg_sleep($1) \bind 30 ## unpatched postgres=# select query_id, query, now()-query_start query_duration, state from pg_stat_activity where pid <> pg_backend_pid() and state = 'active'; query_id |query | query_duration | state --+--+-+ | select pg_sleep($1) +| 00:00:08.604845 | active | ;| | (1 row) ## patched postgres=# truncate table large;^C postgres=# select query_id, query, now()-query_start query_duration, state from pg_stat_activity where pid <> pg_backend_pid() and state = 'active'; query_id |query | query_duration | state -+--++ 2433215470630378210 | select pg_sleep($1) +| 00:00:09.6881 | active | ;|| (1 row) For exec_execute_message, I realized that to report queryId for Utility and non-utility statements, we need to report the queryId inside the portal routines where PlannedStmt contains the queryId. Attached is the first real attempt at the fix. Regards, Sami 0001-v1-Fix-Extended-QUery-Protocol-handling-of-queryId.patch Description: 0001-v1-Fix-Extended-QUery-Protocol-handling-of-queryId.patch
Re: Streaming I/O, vectored I/O (WIP)
I've attached a patch with a few typo fixes and what looks like an incorrect type for max_ios. It's an int16 and I think it needs to be an int. Doing "max_ios = Min(max_ios, PG_INT16_MAX);" doesn't do anything when max_ios is int16. David diff --git a/src/backend/storage/aio/read_stream.c b/src/backend/storage/aio/read_stream.c index 634cf4f0d1..74b9bae631 100644 --- a/src/backend/storage/aio/read_stream.c +++ b/src/backend/storage/aio/read_stream.c @@ -26,12 +26,12 @@ * * B) I/O is necessary, but fadvise is undesirable because the access is * sequential, or impossible because direct I/O is enabled or the system - * doesn't support advice. There is no benefit in looking ahead more than - * io_combine_limit, because in this case only goal is larger read system + * doesn't support fadvise. There is no benefit in looking ahead more than + * io_combine_limit, because in this case the only goal is larger read system * calls. Looking further ahead would pin many buffers and perform * speculative work looking ahead for no benefit. * - * C) I/O is necesssary, it appears random, and this system supports fadvise. + * C) I/O is necessary, it appears random, and this system supports fadvise. * We'll look further ahead in order to reach the configured level of I/O * concurrency. * @@ -418,7 +418,7 @@ read_stream_begin_relation(int flags, ReadStream *stream; size_t size; int16 queue_size; - int16 max_ios; + int max_ios; int strategy_pin_limit; uint32 max_pinned_buffers; Oid tablespace_id; @@ -447,6 +447,8 @@ read_stream_begin_relation(int flags, max_ios = get_tablespace_maintenance_io_concurrency(tablespace_id); else max_ios = get_tablespace_io_concurrency(tablespace_id); + + /* Cap to INT16_MAX to avoid overflowing below */ max_ios = Min(max_ios, PG_INT16_MAX); /*
Extend ALTER DEFAULT PRIVILEGES for large objects
Hi, Currently, ALTER DEFAULT PRIVILEGE doesn't support large objects, so if we want to allow users other than the owner to use the large object, we need to grant a privilege on it every time a large object is created. One of our clients feels that this is annoying, so I would like propose to extend ALTER DEFAULT PRIVILEGE to large objects. Here are the new actions allowed in abbreviated_grant_or_revoke; +GRANT { { SELECT | UPDATE } +[, ...] | ALL [ PRIVILEGES ] } +ON LARGE OBJECTS +TO { [ GROUP ] role_name | PUBLIC } [, ...] [ WITH GRANT OPTION ] +REVOKE [ GRANT OPTION FOR ] +{ { SELECT | UPDATE } +[, ...] | ALL [ PRIVILEGES ] } +ON LARGE OBJECTS +FROM { [ GROUP ] role_name | PUBLIC } [, ...] +[ CASCADE | RESTRICT ] A new keyword OBJECTS is introduced for using plural form in the syntax as other supported objects. A schema name is not allowed to be specified for large objects since any large objects don't belong to a schema. The attached patch is originally proposed by Haruka Takatsuka and some fixes and tests are made by me. Regards, Yugo Nagata -- Yugo NAGATA >From fe2cb39bd83d09a052d5d63889acd0968c1817b6 Mon Sep 17 00:00:00 2001 From: Yugo Nagata Date: Fri, 8 Mar 2024 17:43:43 +0900 Subject: [PATCH] Extend ALTER DEFAULT PRIVILEGES for large objects Original patch by Haruka Takatsuka, some fixes and tests by Yugo Nagata. --- doc/src/sgml/catalogs.sgml| 3 +- .../sgml/ref/alter_default_privileges.sgml| 15 ++- src/backend/catalog/aclchk.c | 21 src/backend/catalog/pg_largeobject.c | 18 ++- src/backend/parser/gram.y | 5 +- src/include/catalog/pg_default_acl.h | 1 + src/include/parser/kwlist.h | 1 + src/test/regress/expected/privileges.out | 104 +- src/test/regress/sql/privileges.sql | 47 9 files changed, 208 insertions(+), 7 deletions(-) diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml index 2907079e2a..b8cc822aeb 100644 --- a/doc/src/sgml/catalogs.sgml +++ b/doc/src/sgml/catalogs.sgml @@ -3330,7 +3330,8 @@ SCRAM-SHA-256$:&l S = sequence, f = function, T = type, - n = schema + n = schema, + L = large object diff --git a/doc/src/sgml/ref/alter_default_privileges.sgml b/doc/src/sgml/ref/alter_default_privileges.sgml index 1de4c5c1b4..3b358b7a88 100644 --- a/doc/src/sgml/ref/alter_default_privileges.sgml +++ b/doc/src/sgml/ref/alter_default_privileges.sgml @@ -50,6 +50,11 @@ GRANT { USAGE | CREATE | ALL [ PRIVILEGES ] } ON SCHEMAS TO { [ GROUP ] role_name | PUBLIC } [, ...] [ WITH GRANT OPTION ] +GRANT { { SELECT | UPDATE } +[, ...] | ALL [ PRIVILEGES ] } +ON LARGE OBJECTS +TO { [ GROUP ] role_name | PUBLIC } [, ...] [ WITH GRANT OPTION ] + REVOKE [ GRANT OPTION FOR ] { { SELECT | INSERT | UPDATE | DELETE | TRUNCATE | REFERENCES | TRIGGER | MAINTAIN } [, ...] | ALL [ PRIVILEGES ] } @@ -81,6 +86,13 @@ REVOKE [ GRANT OPTION FOR ] ON SCHEMAS FROM { [ GROUP ] role_name | PUBLIC } [, ...] [ CASCADE | RESTRICT ] + +REVOKE [ GRANT OPTION FOR ] +{ { SELECT | UPDATE } +[, ...] | ALL [ PRIVILEGES ] } +ON LARGE OBJECTS +FROM { [ GROUP ] role_name | PUBLIC } [, ...] +[ CASCADE | RESTRICT ] @@ -159,7 +171,8 @@ REVOKE [ GRANT OPTION FOR ] If IN SCHEMA is omitted, the global default privileges are altered. IN SCHEMA is not allowed when setting privileges - for schemas, since schemas can't be nested. + for schemas and large objects, since schemas can't be nested and + large objects don't belong to a schema. 
diff --git a/src/backend/catalog/aclchk.c b/src/backend/catalog/aclchk.c index 7abf3c2a74..41baf81a1d 100644 --- a/src/backend/catalog/aclchk.c +++ b/src/backend/catalog/aclchk.c @@ -1077,6 +1077,10 @@ ExecAlterDefaultPrivilegesStmt(ParseState *pstate, AlterDefaultPrivilegesStmt *s all_privileges = ACL_ALL_RIGHTS_SCHEMA; errormsg = gettext_noop("invalid privilege type %s for schema"); break; + case OBJECT_LARGEOBJECT: + all_privileges = ACL_ALL_RIGHTS_LARGEOBJECT; + errormsg = gettext_noop("invalid privilege type %s for large object"); + break; default: elog(ERROR, "unrecognized GrantStmt.objtype: %d", (int) action->objtype); @@ -1268,6 +1272,16 @@ SetDefaultACL(InternalDefaultACL *iacls) this_privileges = ACL_ALL_RIGHTS_SCHEMA; break; + case OBJECT_LARGEOBJECT: + if (OidIsValid(iacls->nspid)) +ereport(ERROR, + (errcode(ERRCODE_INVALID_GRANT_OPERATION), + errmsg("cannot use IN SCHEMA clause when using GRANT/REVOKE ON LARGE OBJECTS"))); + objtype = DEFACLOBJ_LARGEOBJECT; + if (iacls->all_privs && this_privileges == ACL_NO_RIGHTS) +this_privileges = ACL_ALL_RIGHTS_LARGEOBJECT; + break; + default: elog(ERROR, "unrecogniz
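As a usage sketch of the proposed syntax (role names are only examples):

    -- large objects created by alice from now on are readable and writable by bob
    ALTER DEFAULT PRIVILEGES FOR ROLE alice
        GRANT SELECT, UPDATE ON LARGE OBJECTS TO bob;

    -- and the default can be revoked again
    ALTER DEFAULT PRIVILEGES FOR ROLE alice
        REVOKE SELECT, UPDATE ON LARGE OBJECTS FROM bob;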
Re: Cleanup: remove unused fields from nodes
On Tue, Apr 23, 2024 at 01:01:04PM -0400, Tom Lane wrote: > That is, query jumbling no longer distinguishes "DEALLOCATE x" from > "DEALLOCATE ALL", because the DeallocateStmt.name field is marked > query_jumble_ignore. Now maybe that's fine, but it's a point > we'd not considered so far in this thread. Thoughts? And of course, I've managed to forget about bb45156f342c and the reason behind the addition of the field is to be able to make the difference between the named and ALL cases for DEALLOCATE, around here: https://www.postgresql.org/message-id/ZNq9kRwWbKzvR%2B2a%40paquier.xyz This is new in v17, so perhaps it could be changed, but I think that's important to make the difference here for monitoring purposes as DEALLOCATE ALL could be used as a way to clean up prepared statements in connection poolers (for example, pgbouncer's server_reset_query). And doing this tweak in the Node structure of DeallocateStmt is simpler than having to maintain a new pg_node_attr for query jumbling. -- Michael signature.asc Description: PGP signature
Re: Cleanup: remove unused fields from nodes
Michael Paquier writes: > On Tue, Apr 23, 2024 at 01:01:04PM -0400, Tom Lane wrote: >> That is, query jumbling no longer distinguishes "DEALLOCATE x" from >> "DEALLOCATE ALL", because the DeallocateStmt.name field is marked >> query_jumble_ignore. Now maybe that's fine, but it's a point >> we'd not considered so far in this thread. Thoughts? > And of course, I've managed to forget about bb45156f342c and the > reason behind the addition of the field is to be able to make the > difference between the named and ALL cases for DEALLOCATE, around > here: > https://www.postgresql.org/message-id/ZNq9kRwWbKzvR%2B2a%40paquier.xyz Hah. Seems like the comment for isall needs to explain that it exists for this purpose, so we don't make this mistake again. regards, tom lane
Re: Row pattern recognition
Hi Vik and Champion, I think the current RPR patch is not quite correct in handling count(*). (using a slightly modified version of Vik's example query) SELECT v.a, count(*) OVER w FROM (VALUES ('A'),('B'),('B'),('C')) AS v (a) WINDOW w AS ( ORDER BY v.a ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING PATTERN (B+) DEFINE B AS a = 'B' ) a | count ---+--- A | 0 B | 2 B | C | 0 (4 rows) Here row 3 is skipped because the pattern B matches rows 2 and 3. In this case I think count(*) should return 0 rather than NULL for row 3. What do you think? Best regards, -- Tatsuo Ishii SRA OSS LLC English: http://www.sraoss.co.jp/index_en/ Japanese:http://www.sraoss.co.jp
Re: Disallow changing slot's failover option in transaction block
On Mon, Apr 22, 2024 at 2:31 PM Bertrand Drouvot wrote: > > On Mon, Apr 22, 2024 at 11:32:14AM +0530, shveta malik wrote: > > On Mon, Apr 22, 2024 at 5:57 AM Zhijie Hou (Fujitsu) > > wrote: > > > Attach the V3 patch which also addressed Shveta[1] and Bertrand[2]'s > > > comments. > > Thanks! > > > Tested the patch, works well. > > Same here. > Pushed! -- With Regards, Amit Kapila.
Re: Extend ALTER DEFAULT PRIVILEGES for large objects
Yugo NAGATA writes: > Currently, ALTER DEFAULT PRIVILEGE doesn't support large objects, > so if we want to allow users other than the owner to use the large > object, we need to grant a privilege on it every time a large object > is created. One of our clients feels that this is annoying, so I would > like propose to extend ALTER DEFAULT PRIVILEGE to large objects. I wonder how this plays with pg_dump, and in particular whether it breaks the optimizations that a45c78e32 installed for large numbers of large objects. The added test cases seem to go out of their way to leave no trace behind that the pg_dump/pg_upgrade tests might encounter. I think you broke psql's \ddp, too. And some other places; grepping for DEFACLOBJ_NAMESPACE finds other oversights. On the whole I find this proposed feature pretty unexciting and dubiously worthy of the implementation/maintenance effort. regards, tom lane
Re: promotion related handling in pg_sync_replication_slots()
On Tue, Apr 23, 2024 at 9:07 AM Amit Kapila wrote: > > On Mon, Apr 22, 2024 at 7:04 PM Masahiko Sawada wrote: > > > > On Mon, Apr 22, 2024 at 9:02 PM shveta malik wrote: > > > > > > Thanks for the patch, the changes look good Amit. Please find the merged > > > patch. > > > > > > > I've reviewed the patch and have some comments: Thanks for the comments. > > --- > > /* > > -* Early initialization. > > +* Register slotsync_worker_onexit() before we register > > +* ReplicationSlotShmemExit() in BaseInit(), to ensure that during the > > +* exit of the slot sync worker, ReplicationSlotShmemExit() is called > > +* first, followed by slotsync_worker_onexit(). The startup process > > during > > +* promotion invokes ShutDownSlotSync() which waits for slot sync to > > +* finish and it does that by checking the 'syncing' flag. Thus worker > > +* must be done with the slots' release and cleanup before it marks > > itself > > +* as finished syncing. > > */ > > > > I'm slightly worried that we register the slotsync_worker_onexit() > > callback before BaseInit(), because it could be a blocker when we want > > to add more work in the callback, for example sending the stats. > > > > The other possibility is that we do slot release/clean up in the > slotsync_worker_onexit() call itself and then we can do it after > BaseInit(). Do you have any other/better idea for this? I have currently implemented it this way in v11. > > --- > > synchronize_slots(wrconn); > > + > > + /* Cleanup the temporary slots */ > > + ReplicationSlotCleanup(); > > + > > + /* We are done with sync, so reset sync flag */ > > + reset_syncing_flag(); > > > > I think it ends up removing other temp slots that are created by the > > same backend process using > > pg_create_{physical,logical_replication_slots() function, which could > > be a large side effect of this function for users. Yes, this is a problem. Thanks for catching it. > > True, I think here we should either remove only temporary and synced > marked slots. The other possibility is to create slots as RS_EPHEMERAL > initially when called from the SQL function but that doesn't sound > like a neat approach. Modified the logic to remove only synced temporary slots during SQL-function exit. Please find v11 with above changes. thanks Shveta v11-0001-Handle-stopSignaled-during-sync-function-call.patch Description: Binary data
Re: promotion related handling in pg_sync_replication_slots()
On Tue, Apr 23, 2024 at 12:37 PM Amit Kapila wrote: > > On Mon, Apr 22, 2024 at 7:04 PM Masahiko Sawada wrote: > > > > On Mon, Apr 22, 2024 at 9:02 PM shveta malik wrote: > > > > > > Thanks for the patch, the changes look good Amit. Please find the merged > > > patch. > > > > > > > I've reviewed the patch and have some comments: > > > > --- > > /* > > -* Early initialization. > > +* Register slotsync_worker_onexit() before we register > > +* ReplicationSlotShmemExit() in BaseInit(), to ensure that during the > > +* exit of the slot sync worker, ReplicationSlotShmemExit() is called > > +* first, followed by slotsync_worker_onexit(). The startup process > > during > > +* promotion invokes ShutDownSlotSync() which waits for slot sync to > > +* finish and it does that by checking the 'syncing' flag. Thus worker > > +* must be done with the slots' release and cleanup before it marks > > itself > > +* as finished syncing. > > */ > > > > I'm slightly worried that we register the slotsync_worker_onexit() > > callback before BaseInit(), because it could be a blocker when we want > > to add more work in the callback, for example sending the stats. > > > > The other possibility is that we do slot release/clean up in the > slotsync_worker_onexit() call itself and then we can do it after > BaseInit(). This approach sounds clearer and safer to me. The current approach relies on the callback registration order of ReplicationSlotShmemExit(). If it changes in the future, we will silently have the same problem. Every slot sync related work should be done before allowing someone to touch synced slots by clearing the 'syncing' flag. > > > --- > > synchronize_slots(wrconn); > > + > > + /* Cleanup the temporary slots */ > > + ReplicationSlotCleanup(); > > + > > + /* We are done with sync, so reset sync flag */ > > + reset_syncing_flag(); > > > > I think it ends up removing other temp slots that are created by the > > same backend process using > > pg_create_{physical,logical_replication_slots() function, which could > > be a large side effect of this function for users. > > > > True, I think here we should either remove only temporary and synced > marked slots. The other possibility is to create slots as RS_EPHEMERAL > initially when called from the SQL function but that doesn't sound > like a neat approach. > > > > Also, if users want > > to have a process periodically calling pg_sync_replication_slots() > > instead of the slotsync worker, it doesn't support a case where we > > create a temp not-ready slot and turn it into a persistent slot if > > it's ready for sync. > > > > True, but eventually the API should be able to directly create the > persistent slots and anyway this can happen only for the first time > (till the slots are created and marked persistent) and one who wants > to use this function periodically should be able to see regular syncs. I agree that we remove temp-and-synced slots created via the API at the end of the API . We end up creating and dropping slots in every API call but since the pg_sync_replication_slots() function is a kind of debug-purpose function and it will not be common to call this function regularly instead of using the slot sync worker, we can live with such overhead. Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
Re: Add memory context type to pg_backend_memory_contexts view
On Tue, 16 Apr 2024 at 13:30, David Rowley wrote: > In [1] Andres mentioned that there's no way to determine the memory > context type in pg_backend_memory_contexts. This is a bit annoying as > I'd like to add a test to exercise BumpStats(). > > Having the context type in the test's expected output helps ensure we > are exercising BumpStats() and any future changes to the choice of > context type in tuplesort.c gets flagged up by the test breaking. bea97cd02 added a new regression test in sysviews.sql to call pg_backend_memory_contexts to test the BumpStats() function. The attached updates the v1 patch to add the new type column to the new call to pg_backend_memory_contexts() to ensure the type = "Bump" No other changes. David From 632be6de363e8f9975722debbd620665f3a0ea71 Mon Sep 17 00:00:00 2001 From: David Rowley Date: Tue, 16 Apr 2024 13:05:34 +1200 Subject: [PATCH v2] Add context type field to pg_backend_memory_contexts --- doc/src/sgml/system-views.sgml | 9 + src/backend/utils/adt/mcxtfuncs.c | 50 ++ src/include/catalog/pg_proc.dat| 6 ++-- src/test/regress/expected/rules.out| 5 +-- src/test/regress/expected/sysviews.out | 16 - src/test/regress/sql/sysviews.sql | 4 +-- 6 files changed, 61 insertions(+), 29 deletions(-) diff --git a/doc/src/sgml/system-views.sgml b/doc/src/sgml/system-views.sgml index 7ed617170f..18ae5b9138 100644 --- a/doc/src/sgml/system-views.sgml +++ b/doc/src/sgml/system-views.sgml @@ -463,6 +463,15 @@ + + + type text + + + Type of the memory context + + + name text diff --git a/src/backend/utils/adt/mcxtfuncs.c b/src/backend/utils/adt/mcxtfuncs.c index 4d4a70915b..fe52d57fd4 100644 --- a/src/backend/utils/adt/mcxtfuncs.c +++ b/src/backend/utils/adt/mcxtfuncs.c @@ -36,12 +36,13 @@ PutMemoryContextsStatsTupleStore(Tuplestorestate *tupstore, TupleDesc tupdesc, MemoryContext context, const char *parent, int level) { -#define PG_GET_BACKEND_MEMORY_CONTEXTS_COLS9 +#define PG_GET_BACKEND_MEMORY_CONTEXTS_COLS10 Datum values[PG_GET_BACKEND_MEMORY_CONTEXTS_COLS]; boolnulls[PG_GET_BACKEND_MEMORY_CONTEXTS_COLS]; MemoryContextCounters stat; MemoryContext child; + const char *type; const char *name; const char *ident; @@ -67,10 +68,31 @@ PutMemoryContextsStatsTupleStore(Tuplestorestate *tupstore, memset(values, 0, sizeof(values)); memset(nulls, 0, sizeof(nulls)); + switch (context->type) + { + case T_AllocSetContext: + type = "AllocSet"; + break; + case T_GenerationContext: + type = "Generation"; + break; + case T_SlabContext: + type = "Slab"; + break; + case T_BumpContext: + type = "Bump"; + break; + default: + type = "???"; + break; + } + + values[0] = CStringGetTextDatum(type); + if (name) - values[0] = CStringGetTextDatum(name); + values[1] = CStringGetTextDatum(name); else - nulls[0] = true; + nulls[1] = true; if (ident) { @@ -86,22 +108,22 @@ PutMemoryContextsStatsTupleStore(Tuplestorestate *tupstore, memcpy(clipped_ident, ident, idlen); clipped_ident[idlen] = '\0'; - values[1] = CStringGetTextDatum(clipped_ident); + values[2] = CStringGetTextDatum(clipped_ident); } else - nulls[1] = true; + nulls[2] = true; if (parent) - values[2] = CStringGetTextDatum(parent); + values[3] = CStringGetTextDatum(parent); else - nulls[2] = true; - - values[3] = Int32GetDatum(level); - values[4] = Int64GetDatum(stat.totalspace); - values[5] = Int64GetDatum(stat.nblocks); - values[6] = Int64GetDatum(stat.freespace); - values[7] = Int64GetDatum(stat.freechunks); - values[8] = Int64GetDatum(stat.totalspace - stat.freespace); + nulls[3] = true; + + values[4] = Int32GetDatum(level); + values[5] 
= Int64GetDatum(stat.totalspace); + values[6] = Int64GetDatum(stat.nblocks); + values[7] = Int64GetDatum(stat.freespace); + values[8] = Int64GetDatum(stat.freechunks); + values[9] = Int64GetDatum(stat.totalspace - stat.freespace); tuplestore_putvalues(tupstore, tupdesc, values, nulls); for (child = context->firstchild; child != NULL; child = child->nextchild)
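Once the column is in place, checking which context types dominate a backend is straightforward; for example (a sketch against the patched view):

    SELECT type, name, total_bytes
      FROM pg_backend_memory_contexts
     ORDER BY total_bytes DESC
     LIMIT 5;

This is also the view the updated sysviews.sql test queries to confirm that the tuplesort context shows up with type = "Bump".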
Re: POC: GROUP BY optimization
hi. I found an interesting case. CREATE TABLE t1 AS SELECT (i % 10)::numeric AS x,(i % 10)::int8 AS y,'abc' || i % 10 AS z, i::int4 AS w FROM generate_series(1, 100) AS i; CREATE INDEX t1_x_y_idx ON t1 (x, y); ANALYZE t1; SET enable_hashagg = off; SET enable_seqscan = off; EXPLAIN (COSTS OFF) SELECT count(*) FROM t1 GROUP BY x,z,y,w; EXPLAIN (COSTS OFF) SELECT count(*) FROM t1 GROUP BY x,w,y,z; EXPLAIN (COSTS OFF) SELECT count(*) FROM t1 GROUP BY x,z,w,y; EXPLAIN (COSTS OFF) SELECT count(*) FROM t1 GROUP BY x,w,z,y; the above part will use: -> Incremental Sort Sort Key: x, $, $, $ Presorted Key: x -> Index Scan using t1_x_y_idx on t1 EXPLAIN (COSTS OFF) SELECT count(*) FROM t1 GROUP BY z,y,w,x; EXPLAIN (COSTS OFF) SELECT count(*) FROM t1 GROUP BY w,y,z,x; EXPLAIN (COSTS OFF) SELECT count(*) FROM t1 GROUP BY y,z,x,w; EXPLAIN (COSTS OFF) SELECT count(*) FROM t1 GROUP BY y,w,x,z; EXPLAIN (COSTS OFF) SELECT count(*) FROM t1 GROUP BY y,x,z,w; EXPLAIN (COSTS OFF) SELECT count(*) FROM t1 GROUP BY y,x,w,z; these will use: -> Incremental Sort Sort Key: x, y, $, $ Presorted Key: x, y -> Index Scan using t1_x_y_idx on t1 I guess this is fine, but not optimal?
Re: Race condition in FetchTableStates() breaks synchronization of subscription tables
On Wed, Mar 13, 2024 at 9:19 AM vignesh C wrote: > > On Tue, 12 Mar 2024 at 09:34, Ajin Cherian wrote: > > > > > > > > On Tue, Mar 12, 2024 at 2:59 PM vignesh C wrote: > >> > >> > >> Thanks, I have created the following Commitfest entry for this: > >> https://commitfest.postgresql.org/47/4816/ > >> > >> Regards, > >> Vignesh > > > > > > Thanks for the patch, I have verified that the fix works well by following > > the steps mentioned to reproduce the problem. > > Reviewing the patch, it seems good and is well documented. Just one minor > > comment I had was probably to change the name of the variable > > table_states_valid to table_states_validity. The current name made sense > > when it was a bool, but now that it is a tri-state enum, it doesn't fit > > well. > > Thanks for reviewing the patch, the attached v6 patch has the changes > for the same. > v6_0001* looks good to me. This should be backpatched unless you or others think otherwise. -- With Regards, Amit Kapila.
Re: ecpg_config.h symbol missing with meson
On 17.04.24 18:15, Tom Lane wrote: Peter Eisentraut writes: I checked the generated ecpg_config.h with make and meson, and the meson one is missing #define HAVE_LONG_LONG_INT 1 This is obviously quite uninteresting, since that is required by C99. But it would be more satisfactory if we didn't have discrepancies like that. Note that we also kept ENABLE_THREAD_SAFETY in ecpg_config.h for compatibility. ... Alternatively, we could remove the symbol from the make side. Think I'd vote for removing it, since we use it nowhere. The ENABLE_THREAD_SAFETY precedent feels a little bit different, since there's not the C99-requires-the-feature angle. Ok, fixed by removing instead.
Re: Extend ALTER DEFAULT PRIVILEGES for large objects
On Tue, 23 Apr 2024 23:47:38 -0400 Tom Lane wrote: > Yugo NAGATA writes: > > Currently, ALTER DEFAULT PRIVILEGE doesn't support large objects, > > so if we want to allow users other than the owner to use the large > > object, we need to grant a privilege on it every time a large object > > is created. One of our clients feels that this is annoying, so I would > > like propose to extend ALTER DEFAULT PRIVILEGE to large objects. > > I wonder how this plays with pg_dump, and in particular whether it > breaks the optimizations that a45c78e32 installed for large numbers > of large objects. The added test cases seem to go out of their way > to leave no trace behind that the pg_dump/pg_upgrade tests might > encounter. Thank you for your comments. The previous patch did not work with pg_dump since I forgot some fixes. I attached a updated patch including fixes. I believe a45c78e32 is about already-existing large objects and does not directly related to default privileges, so will not be affected by this patch. > I think you broke psql's \ddp, too. And some other places; grepping > for DEFACLOBJ_NAMESPACE finds other oversights. Yes, I did. The attached patch include fixes for psql, too. > On the whole I find this proposed feature pretty unexciting > and dubiously worthy of the implementation/maintenance effort. I believe this feature is beneficial to some users allows because this enables to omit GRANT that was necessary every large object creation. It seems to me that implementation/maintenance cost is not so high compared to other objects (e.g. default privileges on schemas) unless I am still missing something wrong. Regards, Yugo Nagata -- Yugo NAGATA >From 0cfcdc2b297556248cfb64d67779d5fcb8dab227 Mon Sep 17 00:00:00 2001 From: Yugo Nagata Date: Fri, 8 Mar 2024 17:43:43 +0900 Subject: [PATCH v2] Extend ALTER DEFAULT PRIVILEGES for large objects Original patch by Haruka Takatsuka, some fixes and tests by Yugo Nagata. --- doc/src/sgml/catalogs.sgml| 3 +- .../sgml/ref/alter_default_privileges.sgml| 15 ++- src/backend/catalog/aclchk.c | 21 src/backend/catalog/objectaddress.c | 18 ++- src/backend/catalog/pg_largeobject.c | 18 ++- src/backend/parser/gram.y | 5 +- src/bin/pg_dump/dumputils.c | 3 +- src/bin/pg_dump/pg_dump.c | 3 + src/bin/psql/describe.c | 6 +- src/bin/psql/tab-complete.c | 2 +- src/include/catalog/pg_default_acl.h | 1 + src/include/parser/kwlist.h | 1 + src/test/regress/expected/privileges.out | 104 +- src/test/regress/sql/privileges.sql | 47 14 files changed, 235 insertions(+), 12 deletions(-) diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml index 2907079e2a..b8cc822aeb 100644 --- a/doc/src/sgml/catalogs.sgml +++ b/doc/src/sgml/catalogs.sgml @@ -3330,7 +3330,8 @@ SCRAM-SHA-256$:&l S = sequence, f = function, T = type, - n = schema + n = schema, + L = large object diff --git a/doc/src/sgml/ref/alter_default_privileges.sgml b/doc/src/sgml/ref/alter_default_privileges.sgml index 1de4c5c1b4..3b358b7a88 100644 --- a/doc/src/sgml/ref/alter_default_privileges.sgml +++ b/doc/src/sgml/ref/alter_default_privileges.sgml @@ -50,6 +50,11 @@ GRANT { USAGE | CREATE | ALL [ PRIVILEGES ] } ON SCHEMAS TO { [ GROUP ] role_name | PUBLIC } [, ...] [ WITH GRANT OPTION ] +GRANT { { SELECT | UPDATE } +[, ...] | ALL [ PRIVILEGES ] } +ON LARGE OBJECTS +TO { [ GROUP ] role_name | PUBLIC } [, ...] [ WITH GRANT OPTION ] + REVOKE [ GRANT OPTION FOR ] { { SELECT | INSERT | UPDATE | DELETE | TRUNCATE | REFERENCES | TRIGGER | MAINTAIN } [, ...] 
| ALL [ PRIVILEGES ] } @@ -81,6 +86,13 @@ REVOKE [ GRANT OPTION FOR ] ON SCHEMAS FROM { [ GROUP ] role_name | PUBLIC } [, ...] [ CASCADE | RESTRICT ] + +REVOKE [ GRANT OPTION FOR ] +{ { SELECT | UPDATE } +[, ...] | ALL [ PRIVILEGES ] } +ON LARGE OBJECTS +FROM { [ GROUP ] role_name | PUBLIC } [, ...] +[ CASCADE | RESTRICT ] @@ -159,7 +171,8 @@ REVOKE [ GRANT OPTION FOR ] If IN SCHEMA is omitted, the global default privileges are altered. IN SCHEMA is not allowed when setting privileges - for schemas, since schemas can't be nested. + for schemas and large objects, since schemas can't be nested and + large objects don't belong to a schema. diff --git a/src/backend/catalog/aclchk.c b/src/backend/catalog/aclchk.c index 7abf3c2a74..41baf81a1d 100644 --- a/src/backend/catalog/aclchk.c +++ b/src/backend/catalog/aclchk.c @@ -1077,6 +1077,10 @@ ExecAlterDefaultPrivilegesStmt(ParseState *pstate, AlterDefaultPrivilegesStmt *s all_privileges = ACL_ALL_RIGHTS_SCHEMA; errormsg = gettext_noop(
Re: Fix parallel vacuum buffer usage reporting
On Mon, Apr 22, 2024 at 5:07 PM Anthonin Bonnefoy wrote: > > On Sat, Apr 20, 2024 at 2:00 PM Alena Rybakina > wrote: >> >> Hi, thank you for your work on this subject. >> >> While I was reviewing your code, I noticed that your patch conflicts with >> another patch [0] that has been committed. >> >> [0] >> https://www.postgresql.org/message-id/flat/CA%2BhUKGJkOiOCa%2Bmag4BF%2BzHo7qo%3Do9CFheB8%3Dg6uT5TUm2gkvA%40mail.gmail.com > > > I've rebased the patch and also split the changes: Thank you for updating the patch! > 1: Use pgBufferUsage in Vacuum and Analyze block reporting I think that if the analyze command doesn't have the same issue, we don't need to change it. Making vacuum and analyze consistent is a good point, but I'd like to avoid making unnecessary changes in back branches. I think the patch set would contain: (a) make lazy vacuum use BufferUsage instead of VacuumPage{Hit,Miss,Dirty} (backpatched down to pg13), and (b) make analyze use BufferUsage and remove the VacuumPage{Hit,Miss,Dirty} variables for consistency and simplicity (only for HEAD, if we agree). BTW I realized that VACUUM VERBOSE running on a temp table always shows the number of dirtied buffers as 0, which seems to be a bug. Patch (a) will resolve it as well. Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
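As a side note, the temporary-table behaviour mentioned at the end can be observed with something along these lines (a minimal sketch; the table name and row count are arbitrary):

-- Create and modify a temporary table, then vacuum it verbosely and inspect
-- the "buffer usage: ... dirtied" line of the output:
CREATE TEMP TABLE vac_tmp AS SELECT g AS id FROM generate_series(1, 100000) AS g;
DELETE FROM vac_tmp WHERE id % 2 = 0;
VACUUM (VERBOSE) vac_tmp;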
Feature request: schema diff tool
Hello, A diff tool that would generate CREATE, DROP, ALTER, etc. commands from the differences between two specified schemas would be very useful. The diff could even handle data, producing INSERT, DELETE, and UPDATE command output, although I think schema diff handling is much more important and necessary. Today, all modern applications are version-tracked, including the SQL schema. At present, schema changes must be handled twice: on the one hand, the schema must be modified, and on the other hand, the schema modification commands must also be written for the upgrade process. A good diff tool would mean only the schema needs to be modified. Such a tool already exists because the community needed it, e.g. apgdiff, but I think the problem is that the concept itself isn't right. I think this tool should be part of PostgreSQL, because only PostgreSQL always knows exactly what the current SQL syntax is; an external program such as apgdiff can only follow changes after the fact, which generates continual problems. Not to mention that an external application can be abandoned: apgdiff, for example, is no longer actively developed, so users who built on that diff tool are now in trouble. Furthermore, it would be the least amount of work to do this on the PostgreSQL development side; you have the expertise, the SQL language parser, etc. What is your opinion on this? Regards, Neszt Tibor
Small fix to the documentation of ALTER DEFAULT PRIVILEGES
Hi, We can specify more than one privilege type in "ALTER DEFAULT PRIVILEGES GRANT/REVOKE ON SCHEMAS", for example, ALTER DEFAULT PRIVILEGES GRANT USAGE,CREATE ON SCHEMAS TO PUBLIC; However, the syntax described in the documentation appears to allow only one, GRANT { USAGE | CREATE | ALL [ PRIVILEGES ] } ON SCHEMAS TO { [ GROUP ] role_name | PUBLIC } [, ...] [ WITH GRANT OPTION ] while the syntaxes for tables and sequences are described correctly, e.g. GRANT { { SELECT | INSERT | UPDATE | DELETE | TRUNCATE | REFERENCES | TRIGGER } [, ...] | ALL [ PRIVILEGES ] } ON TABLES TO { [ GROUP ] role_name | PUBLIC } [, ...] [ WITH GRANT OPTION ] I attached a small patch to fix the description. Regards, Yugo Nagata -- Yugo NAGATA diff --git a/doc/src/sgml/ref/alter_default_privileges.sgml b/doc/src/sgml/ref/alter_default_privileges.sgml index 1de4c5c1b4..89aacec4fa 100644 --- a/doc/src/sgml/ref/alter_default_privileges.sgml +++ b/doc/src/sgml/ref/alter_default_privileges.sgml @@ -46,7 +46,8 @@ GRANT { USAGE | ALL [ PRIVILEGES ] } ON TYPES TO { [ GROUP ] role_name | PUBLIC } [, ...] [ WITH GRANT OPTION ] -GRANT { USAGE | CREATE | ALL [ PRIVILEGES ] } +GRANT { { USAGE | CREATE } +[, ...] | ALL [ PRIVILEGES ] } ON SCHEMAS TO { [ GROUP ] role_name | PUBLIC } [, ...] [ WITH GRANT OPTION ] @@ -77,7 +78,8 @@ REVOKE [ GRANT OPTION FOR ] [ CASCADE | RESTRICT ] REVOKE [ GRANT OPTION FOR ] -{ USAGE | CREATE | ALL [ PRIVILEGES ] } +{ { USAGE | CREATE } +[, ...] | ALL [ PRIVILEGES ] } ON SCHEMAS FROM { [ GROUP ] role_name | PUBLIC } [, ...] [ CASCADE | RESTRICT ]
Re: GUC-ify walsender MAX_SEND_SIZE constant
Hi, > My understanding of Majid's use-case for tuning MAX_SEND_SIZE is that the > bottleneck is storage, not network. The reason MAX_SEND_SIZE affects that is > that it determines the max size passed to WALRead(), which in turn determines > how much we read from the OS at once. If the storage has high latency but > also high throughput, and readahead is disabled or just not aggressive enough > after crossing segment boundaries, larger reads reduce the number of times > you're likely to be blocked waiting for read IO. > > Which is also why I think that making MAX_SEND_SIZE configurable is a really > poor proxy for improving the situation. > > We're imo much better off working on read_stream.[ch] support for reading WAL. Well, then that would at least be a consistent message, because earlier in [1] a proposal to prefetch the WAL segment, albeit on the standby side, was rejected; that patch only helped in configurations that had readahead *disabled* for some reason. Now, Majid stated that he uses "RBD" - Majid, any chance you could specify what that RBD really is? What's the tech? What fs? Any ioping or fio results? What's the blockdev --report /dev/XXX output? (You stated "high" latency and "high" bandwidth, but that is relative; for me, 15ms+ is high latency and >1000MB/s is high sequential bandwidth. It would help others in the future if you could give the exact numbers, please.) Maybe it's just a matter of enabling readahead (like in [1]) there and/or using a larger WAL segment size during initdb. [1] - https://www.postgresql.org/message-id/flat/CADVKa1WsQMBYK_02_Ji%3DpbOFnms%2BCT7TV6qJxDdHsFCiC9V_hw%40mail.gmail.com
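On that last point, the WAL segment size the cluster was initialized with can be checked from SQL before considering a re-initdb with a larger --wal-segsize (a small sketch, not specific to the reported setup):

-- Show the configured WAL segment size (read-only setting fixed at initdb time):
SHOW wal_segment_size;
SELECT name, setting, unit FROM pg_settings WHERE name = 'wal_segment_size';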