Re: CFbot does not recognize patch contents
Hi,

On Sun, 12 May 2024 at 09:50, Tatsuo Ishii wrote:
> After sending out my v18 patches:
> https://www.postgresql.org/message-id/20240511.162307.2246647987352188848.t-ishii%40sranhm.sra.co.jp
>
> CFbot complains that the patch was broken:
> http://cfbot.cputube.org/patch_48_4460.log
>
> === Applying patches on top of PostgreSQL commit ID 31e8f4e619d9b5856fa2bd5713cb1e2e170a9c7d ===
> === applying patch ./v18-0001-Row-pattern-recognition-patch-for-raw-parser.patch
> gpatch: Only garbage was found in the patch input.
>
> The patch was generated by git-format-patch (same as previous patches). I failed to find any patch format problem in the patch. Does anybody know what's wrong here?

I am able to apply all your patches. I found that a similar thing happened before [0] and I guess your case is similar. Adding Thomas to CC, he may be able to help more.

Nitpick: There is a trailing space warning while applying one of your patches:

Applying: Row pattern recognition patch (docs).
.git/rebase-apply/patch:81: trailing whitespace.
 company | tdate | price | first_value | max | count

[0] postgr.es/m/CA%2BhUKGLiY1e%2B1%3DpB7hXJOyGj1dJOfgde%2BHmiSnv3gDKayUFJMA%40mail.gmail.com

--
Regards,
Nazir Bilal Yavuz
Microsoft
Re: CFbot does not recognize patch contents
> I am able to apply all your patches. I found that a similar thing happened before [0] and I guess your case is similar. Adding Thomas to CC, he may be able to help more.

Ok. Thanks for the info.

> Nitpick: There is a trailing space warning while applying one of your patches:
> Applying: Row pattern recognition patch (docs).
> .git/rebase-apply/patch:81: trailing whitespace.
>  company | tdate | price | first_value | max | count

Yes, I know. The reason there's a trailing whitespace is that I copied the psql output and pasted it into the docs. I wonder why psql adds the whitespace. Unless there's a good reason to do that, I think it's better to fix psql so that it does not emit trailing spaces in its output.

Best regards,
--
Tatsuo Ishii
SRA OSS LLC
English: http://www.sraoss.co.jp/index_en/
Japanese: http://www.sraoss.co.jp
Re: Why is citext/regress failing on hamerkop?
Hello Tom,

12.05.2024 08:34, Tom Lane wrote:
> BTW, I've also been wondering why hamerkop has been failing isolation-check in the 12 and 13 branches for the last six months or so. It is surely unrelated to this issue, and it looks like it must be due to some platform change rather than anything we committed at the time. I'm not planning on looking into that question myself, but really somebody ought to. Or is Windows just as dead as AIX, in terms of anybody being willing to put effort into supporting it?

I've reproduced the failure locally with GSS enabled, so I'll try to figure out what's going on here in the next few days.

Best regards,
Alexander
Re: pg_stat_statements and "IN" conditions
> On Tue, Apr 23, 2024 at 10:18:15AM +0200, Dmitry Dolgov wrote:
> > On Mon, Apr 15, 2024 at 06:09:29PM +0900, Sutou Kouhei wrote:
> > > Thanks. I'm not familiar with this code base but I've reviewed these patches because I'm interested in this feature too.
> >
> > Thanks for the review! The commentaries for the first patch make sense to me, will apply.

Here is the new version. It turned out you were right about memory for the normalized query: if the number of constants goes close to INT_MAX, there was indeed not enough allocated. I've added a fix for this on top of the applied changes, and also improved readability of the pg_stat_statements part.

From 324707496d7ec9a71b16f58d8df25e957e41c073 Mon Sep 17 00:00:00 2001
From: Dmitrii Dolgov <9erthali...@gmail.com>
Date: Wed, 3 Apr 2024 20:02:51 +0200
Subject: [PATCH v20 1/4] Prevent jumbling of every element in ArrayExpr

pg_stat_statements produces multiple entries for queries like

    SELECT something FROM table WHERE col IN (1, 2, 3, ...)

depending on the number of parameters, because every element of ArrayExpr is jumbled. In certain situations it's undesirable, especially if the list becomes too large. Make an array of Const expressions contribute only the first/last elements to the jumble hash. Allow enabling this behavior via the new pg_stat_statements parameter query_id_const_merge, with the default value off.
Reviewed-by: Zhihong Yu, Sergey Dudoladov, Robert Haas, Tom Lane, Michael Paquier, Sergei Kornilov, Alvaro Herrera, David Geier, Sutou Kouhei
Tested-by: Chengxi Sun, Yasuo Honda
---
 contrib/pg_stat_statements/Makefile             |   2 +-
 .../pg_stat_statements/expected/merging.out     | 167 ++
 contrib/pg_stat_statements/meson.build          |   1 +
 .../pg_stat_statements/pg_stat_statements.c     |  74 +++-
 contrib/pg_stat_statements/sql/merging.sql      |  58 ++
 doc/src/sgml/pgstatstatements.sgml              |  57 +-
 src/backend/nodes/gen_node_support.pl           |  21 ++-
 src/backend/nodes/queryjumblefuncs.c            | 105 ++-
 src/backend/postmaster/launch_backend.c         |   3 +
 src/backend/utils/misc/postgresql.conf.sample   |   1 -
 src/include/nodes/nodes.h                       |   3 +
 src/include/nodes/primnodes.h                   |   2 +-
 src/include/nodes/queryjumble.h                 |   9 +-
 13 files changed, 478 insertions(+), 25 deletions(-)
 create mode 100644 contrib/pg_stat_statements/expected/merging.out
 create mode 100644 contrib/pg_stat_statements/sql/merging.sql

diff --git a/contrib/pg_stat_statements/Makefile b/contrib/pg_stat_statements/Makefile
index 414a30856e4..03a62d685f3 100644
--- a/contrib/pg_stat_statements/Makefile
+++ b/contrib/pg_stat_statements/Makefile
@@ -19,7 +19,7 @@ LDFLAGS_SL += $(filter -lm, $(LIBS))
 REGRESS_OPTS = --temp-config $(top_srcdir)/contrib/pg_stat_statements/pg_stat_statements.conf
 REGRESS = select dml cursors utility level_tracking planning \
-	user_activity wal entry_timestamp cleanup oldextversions
+	user_activity wal entry_timestamp cleanup oldextversions merging
 # Disabled because these tests require "shared_preload_libraries=pg_stat_statements",
 # which typical installcheck users do not have (e.g. buildfarm clients).
 NO_INSTALLCHECK = 1

diff --git a/contrib/pg_stat_statements/expected/merging.out b/contrib/pg_stat_statements/expected/merging.out
new file mode 100644
index 000..1e58283afe8
--- /dev/null
+++ b/contrib/pg_stat_statements/expected/merging.out
@@ -0,0 +1,167 @@
+--
+-- Const merging functionality
+--
+CREATE EXTENSION pg_stat_statements;
+CREATE TABLE test_merge (id int, data int);
+-- IN queries
+-- No merging is performed, as a baseline result
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+ t
+---
+ t
+(1 row)
+
+SELECT * FROM test_merge WHERE id IN (1, 2, 3, 4, 5, 6, 7, 8, 9);
+ id | data
+----+------
+(0 rows)
+
+SELECT * FROM test_merge WHERE id IN (1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
+ id | data
+----+------
+(0 rows)
+
+SELECT * FROM test_merge WHERE id IN (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11);
+ id | data
+----+------
+(0 rows)
+
+SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+                                        query                                        | calls
+-------------------------------------------------------------------------------------+-------
+ SELECT * FROM test_merge WHERE id IN ($1, $2, $3, $4, $5, $6, $7, $8, $9)           |     1
+ SELECT * FROM test_merge WHERE id IN ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10)      |     1
+ SELECT * FROM test_merge WHERE id IN ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11) |     1
+ SELECT pg_stat_statements_reset() IS NOT NULL AS t                                  |     1
+(4 rows)
+
+-- Normal scenario, too many simple constants for an IN query
+SET pg_stat_statements.query_id_const_merge = on;
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+ t
+---
+ t
+(1 row)
+
+SELECT * FROM test_merge WHERE id IN (1);
+ id |
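Based on the baseline test excerpt above, here is a short SQL sketch of what the patch is meant to change. Note that query_id_const_merge is the GUC proposed by this patch and does not exist in released pg_stat_statements; the table and queries follow the test file above.

```sql
-- With merging off (the default), each IN-list length above produced its own
-- pg_stat_statements entry. The patch's intent is that with:
SET pg_stat_statements.query_id_const_merge = on;
SELECT pg_stat_statements_reset() IS NOT NULL AS t;

-- ...queries whose ArrayExpr consists only of Consts, e.g.
SELECT * FROM test_merge WHERE id IN (1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
SELECT * FROM test_merge WHERE id IN (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12);

-- ...contribute only their first/last array elements to the jumble hash, so
-- both should collapse into a single normalized entry rather than one entry
-- per list length:
SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
```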
Re: Add TAP tests for backtrace functionality (was Re: Add test module for verifying backtrace functionality)
On 16.03.24 05:25, Bharath Rupireddy wrote:
> Postgres has a good amount of code for dealing with backtraces - two GUCs, backtrace_functions and backtrace_on_internal_error, plus errbacktrace; all of which use the core function set_backtrace from elog.c. I've not seen this code being tested at all; see the code coverage report - https://coverage.postgresql.org/src/backend/utils/error/elog.c.gcov.html. I think adding a simple test module (containing no .c files) with only TAP tests will help cover this code. I ended up having it as a separate module under src/test/modules/test_backtrace, as I was not able to find an existing TAP file in src/test to add these tests to. I'm able to verify the backtrace-related code with the attached patch consistently. The TAP tests rely on the fact that the server emits the text "BACKTRACE: " to the server logs before logging the backtrace, and that the backtrace contains the name of the function in which the error occurs. I've turned off query statement logging (set log_statement = none, log_min_error_statement = fatal) so that the tests see the functions only in the backtrace. Although the CF bot is happy with the attached patch https://github.com/BRupireddy2/postgres/tree/add_test_module_for_bcktrace_functionality_v1, there might be some more flakiness to it. Thoughts?
>
> Ran pgperltidy on the new TAP test file added. Please see the attached v2 patch. I've now moved the new TAP test file to src/test/modules/test_misc/t, as opposed to a new test module, to keep it simple. I was not sure why I hadn't done that in the first place.

Note that backtrace_on_internal_error has been removed, so this patch will need to be adjusted for that.

I suggest you consider joining forces with thread [0], where a replacement for backtrace_on_internal_error is being discussed. Having some test coverage for whatever is being developed there might be useful.

[0]: https://www.postgresql.org/message-id/flat/CAGECzQTpdujCEt2SH4DBwRLoDq4HJArGDaxJSsWX0G=tnnz...@mail.gmail.com
Re: Build the docs if there are changes in docs and don't run other tasks if the changes are only in docs
On 14.12.23 14:40, Nazir Bilal Yavuz wrote:
> On Fri, 6 Oct 2023 at 17:07, Tom Lane wrote:
> > As a quick cross-check, I searched our commit log to see how many README-only commits there were so far this year. I found 11 since January. (Several were triggered by the latest round of pgindent code and process changes, so maybe this is more than typical.) Not sure what that tells us about the value of changing the CI logic, but it does seem like it could be worth the one-liner change needed to teach buildfarm animals to ignore READMEs.
>
> I agree that it could be worth implementing this logic on buildfarm animals. In case we want to implement the same logic on the CI, I added a new version of the patch; it skips CI completely if the changes are only in the README files.

I don't see how this could be applicable widely enough to be useful:

- While there are some patches that touch README files, very few of those end up in a commitfest.
- If someone manually pushes a change to their own CI environment, I don't see why we need to second-guess that.
- Buildfarm runs generally batch several commits together, so it is very unlikely that this would be hit.

I think unless some concrete reason for this change can be shown, we should drop it.
Re: [PATCH] Replace magic constant 3 with NUM_MERGE_MATCH_KINDS
On 19.04.24 11:47, Aleksander Alekseev wrote:
> Thanks. I see a few pieces of code that use special FOO_NUMBER enum values instead of a macro. Should we refactor these pieces accordingly? PFA another patch.

I think this is a sensible improvement. But please keep the trailing commas on the last enum items.
Re: Logging which interface was connected to in log_line_prefix
On 06.03.24 16:59, Greg Sabino Mullane wrote:
> Someone on -general was asking about this, as they are listening on multiple IPs and would like to know which exact one clients were hitting. I took a quick look and we already have that information, so I grabbed some stuff from inet_server_addr and added it as part of a "%L" (for 'local interface'). Quick patch / POC attached.

I was confused by this patch title. This feature does not log the interface (like "eth0" or "lo") but the local address. Please adjust the terminology.

I noticed that for Unix-domain socket connections, %r and %h write "[local]". I think that should be done for this new placeholder as well.
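For illustration, a minimal sketch of how the proposed placeholder might be enabled. The %L escape is the one introduced by this patch and is hypothetical until committed; %m, %p, and %h are existing log_line_prefix escapes.

```sql
-- %L (local address) is the escape proposed in this patch and does not exist
-- in released PostgreSQL; %m (timestamp), %p (PID), and %h (remote host) do.
ALTER SYSTEM SET log_line_prefix = '%m [%p] %h -> %L ';
SELECT pg_reload_conf();
```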
Re: Logging which interface was connected to in log_line_prefix
On 01.05.24 19:04, Greg Sabino Mullane wrote:
> Thank you for taking the time to review this. I've attached a new rebased version, which has no significant changes.
>
> > There is a comment in the patch that states:
> >
> >     /* We do not need clean_ipv6_addr here: just report verbatim */
> >
> > I am not quite sure what it means, but I am guessing it means that the patch does not need to format the IPv6 addresses in any specific way.
>
> Yes, basically correct. There is a kluge (their word, not mine) in utils/adt/network.c to strip the zone - see the comment for the clean_ipv6_addr() function in that file. I added the patch comment in case some future person wonders why we don't "clean up" the IPv6 address like other places in the code base do. We don't need to pass it back to anything else, so we can simply output the correct version, zone and all.

clean_ipv6_addr() needs to be called before trying to convert a string representation into the inet/cidr types. That is not what is happening here, so the comment is not applicable.
Re: Why is citext/regress failing on hamerkop?
On 2024-05-12 Su 01:34, Tom Lane wrote:
> BTW, I've also been wondering why hamerkop has been failing isolation-check in the 12 and 13 branches for the last six months or so. It is surely unrelated to this issue, and it looks like it must be due to some platform change rather than anything we committed at the time.

Possibly. It looks like this might be the issue:

+Connection 2 failed: could not initiate GSSAPI security context: Unspecified GSS failure. Minor code may provide more information: Credential cache is empty
+FATAL: sorry, too many clients already

There are several questions here, including:

1. Why isn't it failing on later branches?
2. Why isn't it failing on drongo (which has a more modern compiler and OS)?

I think we'll need the help of the animal owner to dig into the issue.

> I'm not planning on looking into that question myself, but really somebody ought to. Or is Windows just as dead as AIX, in terms of anybody being willing to put effort into supporting it?

Well, this is more or less where I came in back in about 2002 :-) I've been trying to help support it ever since, motivated more by stubborn persistence than anything else. Still, I agree that the lack of support for the Windows port from Microsoft over the years has been more than disappointing.

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com
Re: SQL:2011 application time
On Sun, 12 May 2024 at 05:26, Paul Jungwirth wrote:
> On 5/9/24 17:44, Matthias van de Meent wrote:
> > I haven't really been following this thread, but after playing around a bit with the feature I feel there are new gaps in error messages. I also think there are gaps in the functionality regarding the (lack of) support for CREATE UNIQUE INDEX, and attaching these indexes to constraints.
>
> Thank you for trying this out and sharing your thoughts! I think these are good points about CREATE UNIQUE INDEX and then creating the constraint by handing it an existing index. This is something that I am hoping to add, but it's not covered by the SQL:2011 standard, so I think it needs some discussion, and I don't think it needs to go into v17.

Okay.

> For instance you are saying:
>
> > pg=# CREATE UNIQUE INDEX ON temporal_testing USING gist (id, valid_during);
> > ERROR: access method "gist" does not support unique indexes
>
> To me that error message seems correct. The programmer hasn't said anything about the special temporal behavior they are looking for.

But I showed that I had a GiST index that does have the indisunique flag set, which shows that GiST does support indexes with unique semantics. That I can't use CREATE UNIQUE INDEX to create such an index doesn't mean the feature doesn't exist, which is what the error message implies.

> To get non-overlapping semantics from an index, this more explicit syntax seems better, similar to PKs in the standard:

Yes, agreed on that part.

> > pg=# CREATE UNIQUE INDEX ON temporal_testing USING gist (id, valid_during WITHOUT OVERLAPS);
> > ERROR: access method "gist" does not support unique indexes
>
> We could also support *non-temporal* unique GiST indexes, particularly now that we have the stratnum support function. Those would use the syntax you gave, omitting WITHOUT OVERLAPS. But that seems like a separate effort to me.

No objection on that.

Kind regards,

Matthias van de Meent
Re: Comments about TLS (no SSLRequest) and ALPN
On Sat, 11 May 2024 at 21:36, AJ ONeal wrote:
> I just joined the mailing list and I don't know how to respond to old messages. However, I have a few suggestions on the upcoming TLS and ALPN changes.
>
> TL;DR
>
> Prefer TLS over SSLRequest or plaintext (from the start)
>
>     ?sslmode=default # try tls, then sslrequest, then plaintext
>     ?sslmode=tls|tlsv1.3 # require tls, no fallback
>     ?sslmode=tls-noverify|tlsv1.3-noverify # require tls, ignore CA

I'm against adding a separate mini configuration language within our options.

> Allow the user to specify ALPN (i.e. for privacy or advanced routing)
>
>     ?alpn=pg3|disable|
>     --alpn 'pg3|disable|' # same as curl, openssl
>
> (I don't have much to argue against the long form "postgres/3" other than that the trend is to keep it short and sweet, and all mindshare (and SEO) for "pg" is pretty well captured by Postgres already)

The "postgresql" ALPN identifier has been registered, and I don't think it's a good idea to further change this unless you have good arguments as to why we'd need to. Additionally, I don't think psql needs to request any protocol other than Postgres' own, so I don't see any need for an "arbitrary string" option, as it'd incorrectly imply that we support arbitrary protocols.

> Rationales
>
> I don't really intend to sway anyone who has considered these things and decided against them. My intent is just to shed light on any of these aspects that haven't been carefully considered already.
>
> Prefer the Happy Path [...]
>
> Postgres versions (naturally) take years to make it into mainstream LTS server distros (without explicit user interaction anyway)

Usually, the latest version is picked up by the LTS distro on release. Add a feature freeze window, and you're likely no more than one major version behind on launch. Using an LTS release for its full support window would then indeed imply a long time of using that version, but that's the user's choice for choosing the LTS distro.

> Prefer Standard TLS
>
> As I experience it (and understand others to experience it), the one-time round trip isn't the main concern for the switch to standard TLS, it's the ability to route and proxy.

No, the one RTT saved is one of the main benefits here for both clients and servers. Server *owners* may benefit from the improved routing capabilities, but we're not developing a database connection router; we're developing database clients and servers.

> Having an extra round trip (try TLS first, then SSLRequest) for increasingly older versions of Postgres will, definitionally, become less and less important as time goes on.

Yes. But right now, there are approximately zero servers that run the latest (not even beta) version of PostgreSQL that supports direct SSL/TLS connections. So, for now, we need to support connecting to older databases, and I don't think we can just decide to regress those users' connections when they upgrade their client binaries.

> Having postgres TLS/SNI/ALPN routable by default will just be more intuitive (it's what I assumed would have been the default anyway), and help increase adoption in cloud, enterprise, and other settings.

AFAIK, there are very few companies that actually route PostgreSQL client traffic without a bouncer that load-balances the contents of those connections. While TLS/SNI/ALPN does bring benefits to these companies, I don't think the use of these features is widespread enough to default to a more expensive path for older server versions, and for newer servers that can't or won't support direct SSL connections for some reason.

> We live in the world of ACME / Let's Encrypt / ZeroSSL. Many proxies have that built in. As such, optimizing for unverified TLS takes the user down a path that's just more difficult to begin with (it's easier to get a valid TLS cert than it is to get a self-signed cert these days), and more nuanced (upcoming implementors are accustomed to TLS being verified). It's easy to document how to use the letsencrypt client with postgres. It will also be increasingly easy to configure an ACME-enabled proxy for postgres and not worry about it in the server at all.

I don't think we should build specifically to support decrypting connection proxies, and thus I don't think that proxy argument holds value.

> With all that, there's still this issue of downgrade attacks that can't be solved without a breaking change (or unless the user is skilled enough to know to be explicit). I wish that could change with the next major version of postgres - for the client to have to opt in to insecure connections (I assume that more and more TLS on the server side will be handled by proxies).

AFAIK, --sslmode=require already prevents downgrade attacks (assuming your SSL library does its job correctly). What more would PostgreSQL need to do?

> I assume that more and more TLS on the serverside will be handled by proxies

I see only negative va
Re: Requiring LLVM 14+ in PostgreSQL 18
On 24.04.24 01:43, Thomas Munro wrote:
> Rebased over ca89db5f.

These patches look fine to me. The new cut-off makes sense, and it does save quite a bit of code. We do need to get the Cirrus CI Debian images updated first, as you had already written.

As part of this patch, you also sneak in support for LLVM 18 (llvm-config-18, clang-18 in configure). Should this be a separate patch?

And as I'm looking up how this was previously handled, I notice that this list of clang-NN versions was last updated equally sneakily as part of your patch to trim off LLVM <10 (820b5af73dc). I wonder if the original intention of that configure code was that maintaining the versioned list above clang-7/llvm-config-7 was not needed, because the unversioned programs could be used, or maybe because pkg-config could be used. It would be nice if we could get rid of having to update that.
Re: Add SPLIT PARTITION/MERGE PARTITIONS commands
Hi!

Attached is a draft version of a fix for [1].

[1] https://www.postgresql.org/message-id/86b4f1e3-0b5d-315c-9225-19860d64d685%40gmail.com

--
With best regards,
Dmitry Koval

Postgres Professional: http://postgrespro.com

From ece01564aeb848bab2a61617412a1d175e45b934 Mon Sep 17 00:00:00 2001
From: Koval Dmitry
Date: Sun, 12 May 2024 17:17:10 +0300
Subject: [PATCH v1 3/3] Fix for the search of temporary partition for the SPLIT PARTITION operation

---
 src/backend/commands/tablecmds.c              | 1 +
 src/test/regress/expected/partition_split.out | 8 ++++++++
 src/test/regress/sql/partition_split.sql      | 8 ++++++++
 3 files changed, 17 insertions(+)

diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index fe66d9e58d..a5babcfbc6 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -21389,6 +21389,7 @@ ATExecSplitPartition(List **wqueue, AlteredTableInfo *tab, Relation rel,
 	 * against concurrent drop, and mark stmt->relation as
 	 * RELPERSISTENCE_TEMP if a temporary namespace is selected.
 	 */
+	sps->name->relpersistence = rel->rd_rel->relpersistence;
 	namespaceId =
 		RangeVarGetAndCheckCreationNamespace(sps->name, NoLock, NULL);
diff --git a/src/test/regress/expected/partition_split.out b/src/test/regress/expected/partition_split.out
index 461318db86..b1108c92a2 100644
--- a/src/test/regress/expected/partition_split.out
+++ b/src/test/regress/expected/partition_split.out
@@ -1569,6 +1569,14 @@ Partition constraint: ((i IS NOT NULL) AND (i >= 0) AND (i < 1))
 Partition of: t FOR VALUES FROM (1) TO (2)
 Partition constraint: ((i IS NOT NULL) AND (i >= 1) AND (i < 2))
+DROP TABLE t;
+-- Check that the search for a temporary partition is correct during
+-- the SPLIT PARTITION operation.
+CREATE TEMP TABLE t (a int) PARTITION BY RANGE (a);
+CREATE TEMP TABLE tp_0 PARTITION OF t FOR VALUES FROM (0) TO (2);
+ALTER TABLE t SPLIT PARTITION tp_0 INTO
+  (PARTITION tp_0 FOR VALUES FROM (0) TO (1),
+   PARTITION tp_1 FOR VALUES FROM (1) TO (2));
 DROP TABLE t;
 --
 DROP SCHEMA partition_split_schema;
diff --git a/src/test/regress/sql/partition_split.sql b/src/test/regress/sql/partition_split.sql
index dc7424256e..7f231b0d39 100644
--- a/src/test/regress/sql/partition_split.sql
+++ b/src/test/regress/sql/partition_split.sql
@@ -939,6 +939,14 @@ ALTER TABLE t SPLIT PARTITION tp_0_2 INTO
 \d+ tp_1_2
 DROP TABLE t;
+-- Check that the search for a temporary partition is correct during
+-- the SPLIT PARTITION operation.
+CREATE TEMP TABLE t (a int) PARTITION BY RANGE (a);
+CREATE TEMP TABLE tp_0 PARTITION OF t FOR VALUES FROM (0) TO (2);
+ALTER TABLE t SPLIT PARTITION tp_0 INTO
+  (PARTITION tp_0 FOR VALUES FROM (0) TO (1),
+   PARTITION tp_1 FOR VALUES FROM (1) TO (2));
+DROP TABLE t;
 --
 DROP SCHEMA partition_split_schema;
--
2.34.1
Re: An improved README experience for PostgreSQL
On 17.04.24 04:36, Nathan Bossart wrote:
> On Wed, Feb 28, 2024 at 02:21:49PM -0600, Nathan Bossart wrote:
> > I see many projects have files like SECURITY.md, CODE_OF_CONDUCT.md, and CONTRIBUTING.md, and I think it would be relatively easy to add content to each of those for PostgreSQL, even if they just link elsewhere.
>
> Here's a first attempt at this. You can see some of the effects of these files at [0]. More information about these files is available at [1] [2] [3]. I figured we'd want to keep these pretty bare-bones to avoid duplicating information that's already available at postgresql.org, but perhaps it's worth filling them out a bit more. Anyway, I just wanted to gauge interest in this stuff.

I don't know, I find these files kind of "yelling". It's fine to have a couple, but now it's getting a bit much, and there are more that could be added.

If we want to enhance the GitHub experience, we can also add these files to the organization instead: https://docs.github.com/en/communities/setting-up-your-project-for-healthy-contributions/creating-a-default-community-health-file
Re: SQL:2011 application time
On 5/12/24 05:55, Matthias van de Meent wrote:
> > > pg=# CREATE UNIQUE INDEX ON temporal_testing USING gist (id, valid_during);
> > > ERROR: access method "gist" does not support unique indexes
> >
> > To me that error message seems correct. The programmer hasn't said anything about the special temporal behavior they are looking for.
>
> But I showed that I had a GiST index that does have the indisunique flag set, which shows that GiST does support indexes with unique semantics. That I can't use CREATE UNIQUE INDEX to create such an index doesn't mean the feature doesn't exist, which is what the error message implies.

True, the error message is not really telling the truth anymore. I do think most people who hit this error are not thinking about temporal constraints at all though, and for non-temporal constraints it is still true. It's also true for CREATE INDEX, since WITHOUT OVERLAPS is only available on the *constraint*. So how about adding a hint, something like this?

ERROR: access method "gist" does not support unique indexes
HINT: To create a unique constraint with non-overlap behavior, use ADD CONSTRAINT ... WITHOUT OVERLAPS.

Yours,

--
Paul ~{:-) p...@illuminatedcomputing.com
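To make the proposed hint concrete, here is a hedged SQL sketch of the syntax it would point users at. The WITHOUT OVERLAPS form is the in-development SQL:2011 temporal syntax from this patch series (not guaranteed to match any released version); the EXCLUDE form is the long-standing alternative, which needs btree_gist for equality on the scalar column. Table and column names are made up for illustration.

```sql
-- Long-standing way to get non-overlap semantics via a GiST-backed
-- exclusion constraint (btree_gist provides "=" support for int):
CREATE EXTENSION IF NOT EXISTS btree_gist;
CREATE TABLE rooms (
    id int,
    valid_during tsrange,
    EXCLUDE USING gist (id WITH =, valid_during WITH &&)
);

-- The syntax this patch series adds: WITHOUT OVERLAPS belongs to the
-- constraint, not to CREATE UNIQUE INDEX:
ALTER TABLE rooms
    ADD CONSTRAINT rooms_pk PRIMARY KEY (id, valid_during WITHOUT OVERLAPS);
```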
Re: Why is citext/regress failing on hamerkop?
On 2024-05-12 Su 08:26, Andrew Dunstan wrote:
> On 2024-05-12 Su 01:34, Tom Lane wrote:
> > BTW, I've also been wondering why hamerkop has been failing isolation-check in the 12 and 13 branches for the last six months or so. It is surely unrelated to this issue, and it looks like it must be due to some platform change rather than anything we committed at the time.
>
> Possibly. It looks like this might be the issue:
>
> +Connection 2 failed: could not initiate GSSAPI security context: Unspecified GSS failure. Minor code may provide more information: Credential cache is empty
> +FATAL: sorry, too many clients already
>
> There are several questions here, including:
>
> 1. Why isn't it failing on later branches?
> 2. Why isn't it failing on drongo (which has a more modern compiler and OS)?
>
> I think we'll need the help of the animal owner to dig into the issue.

Aha! drongo doesn't have GSSAPI enabled. Will work on that.

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com
Hot standby queries see transient all-zeros pages
XLogReadBufferForRedoExtended() precedes RestoreBlockImage() with RBM_ZERO_AND_LOCK. Per src/backend/storage/buffer/README:

    Once one has determined that a tuple is interesting (visible to the current transaction) one may drop the content lock, yet continue to access the tuple's data for as long as one holds the buffer pin.

The use of RBM_ZERO_AND_LOCK is incompatible with that. See a similar argument at https://postgr.es/m/flat/5101.1328219...@sss.pgh.pa.us that led me to the cause. Adding a 10ms sleep just after RBM_ZERO_AND_LOCK, I got 2 failures in 7 runs of 027_stream_regress.pl, at Assert(ItemIdIsNormal(lpp)) in heapgettup_pagemode(). In the core file, lpp pointed into an all-zeros page. RestoreBkpBlocks() had been doing RBM_ZERO years before hot standby existed, but it wasn't a bug until queries could run concurrently.

I suspect the fix is to add a ReadBufferMode specified as, "If the block is already in shared_buffers, do RBM_NORMAL and exclusive-lock the buffer. Otherwise, do RBM_ZERO_AND_LOCK." That avoids RBM_NORMAL for a block past the current end of the file. Like RBM_ZERO_AND_LOCK, it avoids wasting disk reads on data we discard. Are there other strategies to consider?

I got here from a Windows CI failure, https://cirrus-ci.com/task/6247605141766144. That involved patched code, but adding the sleep suffices on Linux, with today's git master:

--- a/src/backend/access/transam/xlogutils.c
+++ b/src/backend/access/transam/xlogutils.c
@@ -388,6 +388,8 @@ XLogReadBufferForRedoExtended(XLogReaderState *record,
 		*buf = XLogReadBufferExtended(rlocator, forknum, blkno,
 									  get_cleanup_lock ? RBM_ZERO_AND_CLEANUP_LOCK : RBM_ZERO_AND_LOCK,
 									  prefetch_buffer);
+		if (!get_cleanup_lock)
+			pg_usleep(10 * 1000);
 		page = BufferGetPage(*buf);
 		if (!RestoreBlockImage(record, block_id, page))
 			ereport(ERROR,
Re: Weird test mixup
On Fri, May 10, 2024 at 10:04:17AM +0900, Michael Paquier wrote:
> Attached is an updated patch for now, indented with a happy CI. I am still planning to look at that a second time on Monday with a fresher mind, in case I'm missing something now.

This looks correct, and it works well in my tests. Thanks.
Re: elog/ereport VS misleading backtrace_function function address
On 07.05.24 09:43, Jakub Wartak wrote:
> > > NOTE: in case one will be testing this: one cannot ./configure with --enable-debug, as it prevents the compiler optimizations that actually end up with the ".cold" branch optimizations that cause backtrace() to return the wrong address.
> >
> > Is that configuration useful? If you're interested in backtraces, wouldn't you also want debug symbols? Don't production builds use debug symbols nowadays as well? I recall speculating about whether we could somehow invoke gdb to get a more comprehensive and accurate backtrace, but I don't really have a concrete idea how that could be made to work.
>
> Oh no, I'm more about a micro-fix rather than doing some bigger revolution. The patch simply adds this one instruction in the macro, so that backtrace_functions now behaves better:
>
>     0x00773d28 <+105>: call 0x79243f
>     0x00773d2d <+110>: movzbl -0x12(%rbp),%eax  << this ends up being added by the patch
>     0x00773d31 <+114>: call 0xdc1a0
>
> I'll put that as for PG18 in the CF, but maybe it could be backpatched too - no hard opinion on that (I don't see how that ERROR/FATAL path could cause any performance regressions).

I'm missing a principled explanation of what this does. I just see that it sprinkles some no-op code to make this particular thing work a bit differently, but it might just behave differently with the next compiler next year. I'd like to see a bit more of an abstract explanation of what is happening here.
Re: 039_end_of_wal: error in "xl_tot_len zero" test
On Mon, 6 May 2024 at 15:06, Tom Lane wrote: > My best guess is that that changed the amount of WAL generated by > initdb just enough to make the problem reproduce on this animal. > However, why's it *only* happening on this animal? The amount of > WAL we generate isn't all that system-specific. I'd say that's a good theory as it's now passing again [1] after the recent system_views.sql change done in 521a7156ab. David [1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skimmer&dt=2024-05-06%2017%3A43%3A38
Re: [PATCH] Improve amcheck to also check UNIQUE constraint in btree index.
On Sat, May 11, 2024 at 4:13 AM Mark Dilger wrote: > > On May 10, 2024, at 12:05 PM, Alexander Korotkov > > wrote: > > The only bt_target_page_check() caller is > > bt_check_level_from_leftmost(), which overrides state->target in the > > next iteration anyway. I think the patch is just refactoring to > > eliminate the confusion pointed out by Peter Geoghegan upthread. > > I find your argument unconvincing. > > After bt_target_page_check() returns at line 919, and before > bt_check_level_from_leftmost() overrides state->target in the next iteration, > bt_check_level_from_leftmost() conditionally fetches an item from the page > referenced by state->target. See line 963. > > I'm left with four possibilities: > > > 1) bt_target_page_check() never gets to the code that uses "rightpage" > rather than "state->target" in the same iteration where > bt_check_level_from_leftmost() conditionally fetches an item from > state->target, so the change you're making doesn't matter. > > 2) The code prior to v2-0003 was wrong, having changed state->target in an > inappropriate way, causing the wrong thing to happen at what is now line 963. > The patch fixes the bug, because state->target no longer gets overwritten > where you are now using "rightpage" for the value. > > 3) The code used to work, having set up state->target correctly in the place > where you are now using "rightpage", but v2-0003 has broken that. > > 4) It's been broken all along and your patch just changes from wrong to > wrong. > > > If you believe (1) is true, then I'm complaining that you are relying far too > much on action at a distance, and that you are not documenting it. Even with > documentation of this interrelationship, I'd be unhappy with how brittle the > code is. I cannot easily discern that the two don't ever happen in the same > iteration, and I'm not at all convinced one way or the other.
I tried to set > up some Asserts about that, but none of the test cases actually reach the new > code, so adding Asserts doesn't help to investigate the question. > > If (2) is true, then I'm complaining that the commit message doesn't mention > the fact that this is a bug fix. Bug fixes should be clearly documented as > such, otherwise future work might assume the commit can be reverted with only > stylistic consequences. > > If (3) is true, then I'm complaining that the patch is flat busted. > > If (4) is true, then maybe we should revert the entire feature, or have a > discussion of mitigation efforts that are needed. > > Regardless of which of 1..4 you pick, I think it could all do with more > regression test coverage. > > > For reference, I said something similar earlier today in another email to > this thread: > > This patch introduces a change that stores a new page into variable > "rightpage" rather than overwriting "state->target", which the old > implementation most certainly did. That means that after returning from > bt_target_page_check() into the calling function > bt_check_level_from_leftmost() the value in state->target is not what it > would have been prior to this patch. Now, that'd be irrelevant if nobody > goes on to consult that value, but just 44 lines further down in > bt_check_level_from_leftmost() state->target is clearly used. So the > behavior at that point is changing between the old and new versions of the > code, and I think I'm within reason to ask if it was wrong before the patch, > wrong after the patch, or something else? Is this a bug being introduced, > being fixed, or ... ? Thank you for your analysis. I'm inclined to believe in 2, but not yet completely sure. It's a real pity that our tests don't cover this. I'm investigating this area. -- Regards, Alexander Korotkov
Re: Direct SSL connection with ALPN and HBA rules
On 11/05/2024 23:45, Jelte Fennema-Nio wrote: On Fri, 10 May 2024 at 15:50, Heikki Linnakangas wrote: New proposal: - Remove the "try both" mode completely, and rename "requiredirect" to just "direct". So there would be just two modes: "postgres" and "direct". On reflection, the automatic fallback mode doesn't seem very useful. It would make sense as the default, because then you would get the benefits automatically in most cases but still be compatible with old servers. But if it's not the default, you have to fiddle with libpq settings anyway to enable it, and then you might as well use the "requiredirect" mode when you know the server supports it. There isn't anything wrong with it as such, but given how much confusion there's been on how this all works, I'd prefer to cut this back to the bare minimum now. We can add it back in the future, and perhaps make it the default at the same time. This addresses points 2. and 3. above. and: - Only allow sslnegotiation=direct with sslmode=require or higher. This is what you, Jacob, wanted to do all along, and addresses point 1. Thoughts? Sounds mostly good to me. But I think we'd want to automatically increase sslmode to require if it is unset, but sslnegotiation is set to direct. Similar to how we bump sslmode to verify-full if sslrootcert is set to system, but sslmode is unset. i.e. it seems unnecessary/unwanted to throw an error if the connection string only contains sslnegotiation=direct. I find that error-prone. For example: 1. Try to connect to a server with direct negotiation: psql "host=foobar dbname=mydb sslnegotiation=direct" 2. It fails. Maybe it was an old server? Let's change it to sslnegotiation=postgres. 3. Now it succeeds. Great! You might miss that by changing sslnegotiation to 'postgres', or by removing it altogether, you not only made it compatible with older server versions, but you also allowed falling back to a plaintext connection. Maybe you're fine with that, but maybe not.
I'd like to nudge people to use sslmode=require, not rely on implicit stuff like this just to make connection strings a little shorter. I'm not a fan of sslrootcert=system implying sslmode=verify-full either, for the same reasons. But at least "sslrootcert" is a clearly security-related setting, so removing it might give you pause, whereas sslnegotiation is about performance and compatibility. In v18, I'd like to make sslmode=require the default. Or maybe introduce a new setting like "encryption=ssl|gss|none", defaulting to 'ssl'. If we want to encourage encryption, that's the right way to do it. (I'd still recommend everyone to use an explicit sslmode=require in their connection strings for many years, though, because you might be using an older client without realizing it.) -- Heikki Linnakangas Neon (https://neon.tech)
Re: 039_end_of_wal: error in "xl_tot_len zero" test
David Rowley writes: > On Mon, 6 May 2024 at 15:06, Tom Lane wrote: >> My best guess is that that changed the amount of WAL generated by >> initdb just enough to make the problem reproduce on this animal. >> However, why's it *only* happening on this animal? The amount of >> WAL we generate isn't all that system-specific. > I'd say that's a good theory as it's now passing again [1] after the > recent system_views.sql change done in 521a7156ab. Hm. It occurs to me that there *is* a system-specific component to the amount of WAL emitted during initdb: the number of locales that "locale -a" prints translates directly to the number of rows inserted into pg_collation. So maybe skimmer has a locale set that's a bit different from anybody else's, and that's what let it see this issue. regards, tom lane
Re: Why is citext/regress failing on hamerkop?
On Mon, May 13, 2024 at 12:26 AM Andrew Dunstan wrote: > Well, this is more or less where I came in back in about 2002 :-) I've been > trying to help support it ever since, mainly motivated by stubborn > persistence than anything else. Still, I agree that the lack of support for > the Windows port from Microsoft over the years has been more than > disappointing. I think "state of the Windows port" would make a good discussion topic at pgconf.dev (with write-up for those who can't be there). If there is interest, I could organise that with a short presentation of the issues I am aware of so far and possible improvements and smaller-things-we-could-drop-instead-of-dropping-the-whole-port. I would focus on technical stuff, not who-should-be-doing-what, 'cause I can't make anyone do anything. For citext_utf8, I pushed cff4e5a3. Hamerkop runs infrequently, so here's hoping for 100% green on master by Tuesday or so.
Re: Weird test mixup
On Sun, May 12, 2024 at 10:48:51AM -0700, Noah Misch wrote: > This looks correct, and it works well in my tests. Thanks. Thanks for looking. While looking at it yesterday I've decided to split the change into two commits, one for the infra and one for the module. While doing so, I've noticed that the case of a private area passed as NULL was not completely correct as memcpy would be undefined. The open item has been marked as fixed. -- Michael signature.asc Description: PGP signature
Re: Weird test mixup
On Sat, May 11, 2024 at 11:45:33AM +0500, Andrey M. Borodin wrote: > I see that you store condition in private_data. So "private" means > that this is a data specific to extension, do I understand it right? Yes, it is possible to pass down some custom data to the callbacks registered, generated in a module. One example would be more complex condition grammar, like a JSON-based thing. I don't really see the need for this amount of complexity in the tree yet, but one could do that outside the tree easily. > As long as I started anyway, I also want to ask some more stupid > questions: > 1. Where is the border between responsibility of an extension and > the core part? I mean can we define in simple words what > functionality must be in extension? Rule 0 I've been using here: keep the footprint on the backend as simple as possible. These have as absolute minimum requirement: - A function name. - A library name. - A point name. The private area contents and size are added to address the concurrency cases with runtime checks. I didn't see a strong use for that at first, but Noah has been convincing enough with his use cases and the fact that the race between detach and run was not completely closed because we lacked consistency with the shmem hash table lookup. > 2. If we have some concurrency issues, why can't we just protect > everything with one giant LWLock\SpinLock. We have some locking > model instead of serializing access from enter until exit. This reduces the test infrastructure flexibility, because one may want to attach or detach injection points while a point is running. So it is by design that the LWLock protecting the shmem hash table is not held when a point is running. This has been covered a bit upthread, and I want to be able to do that as well. -- Michael signature.asc Description: PGP signature
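The private-area plumbing described above, including the guard for a private area passed as NULL whose unguarded memcpy was fixed upthread, can be illustrated with a self-contained miniature of the attach path. All names here are invented for the sketch; the real implementation lives in the backend's injection point code.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Invented mini-registry entry: one named point with a copied private area. */
typedef struct MiniInjectionPoint
{
	char	name[64];
	void   *private_data;	/* caller-supplied payload, copied at attach */
	size_t	private_size;
} MiniInjectionPoint;

/*
 * Copy the caller's private area into the registry entry so the callback
 * can consult it later.  Guard the memcpy: passing a NULL pointer to
 * memcpy is undefined behavior even when the size is zero, which is the
 * subtle NULL-private-area case mentioned above.
 */
static void
mini_attach(MiniInjectionPoint *p, const char *name,
			const void *private_data, size_t private_size)
{
	snprintf(p->name, sizeof(p->name), "%s", name);
	p->private_size = private_size;
	p->private_data = NULL;
	if (private_data != NULL && private_size > 0)
	{
		p->private_data = malloc(private_size);
		memcpy(p->private_data, private_data, private_size);
	}
}
```

Copying at attach time (rather than keeping the caller's pointer) is what lets the callback run without holding the lock that protects the registry, matching the design point above.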
Re: WAL record CRC calculated incorrectly because of underlying buffer modification
On Sat, May 11, 2024 at 5:00 PM Alexander Lakhin wrote: > 11.05.2024 07:25, Thomas Munro wrote: > > On Sat, May 11, 2024 at 4:00 PM Alexander Lakhin > > wrote: > >> 11.05.2024 06:26, Thomas Munro wrote: > >>> Perhaps a no-image, no-change registered buffer should not be > >>> including an image, even for XLR_CHECK_CONSISTENCY? It's actually > >>> useless for consistency checking too I guess, this issue aside, > >>> because it doesn't change anything so there is nothing to check. > >> Yes, I think something wrong is here. I've reduced the reproducer to: > > Does it reproduce if you do this? > > > > - include_image = needs_backup || (info & > > XLR_CHECK_CONSISTENCY) != 0; > > + include_image = needs_backup || > > + ((info & XLR_CHECK_CONSISTENCY) != 0 && > > +(regbuf->flags & REGBUF_NO_CHANGE) == 0); > > No, it doesn't (at least with the latter, more targeted reproducer). OK so that seems like a candidate fix, but ... > > Unfortunately the back branches don't have that new flag from 00d7fb5e > > so, even if this is the right direction (not sure, I don't understand > > this clean registered buffer trick) then ... but wait, why are there > > are no failures like this in the back branches (yet at least)? Does > > your reproducer work for 16? I wonder if something relevant changed > > recently, like f56a9def. CC'ing Michael and Amit K for info. > > Maybe it's hard to hit (autovacuum needs to process the index page in a > narrow time frame), but locally I could reproduce the issue even on > ac27c74de(~1 too) from 2018-09-06 (I tried several last commits touching > hash indexes, didn't dig deeper). ... we'd need to figure out how to fix this in the back-branches too. One idea would be to back-patch REGBUF_NO_CHANGE, and another might be to deduce that case from other variables. 
Let me CC a couple more people from this thread, which most recently hacked on this stuff, to see if they have insights: https://www.postgresql.org/message-id/flat/d2c31606e6bb9b83a02ed4835d65191b38d4ba12.camel%40j-davis.com
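The candidate fix quoted above is a pure three-input boolean decision, so it can be sanity-checked standalone. The flag values below are illustrative constants, not the real ones from the PostgreSQL headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative flag bits; the real values live in PostgreSQL headers. */
#define XLR_CHECK_CONSISTENCY	0x02	/* wal_consistency_checking is on */
#define REGBUF_NO_CHANGE		0x10	/* buffer registered as unmodified */

/*
 * Candidate fix from the thread: include a full-page image if a backup
 * block is needed, or if consistency checking is enabled -- but no longer
 * for clean (REGBUF_NO_CHANGE) buffers, whose images are useless for
 * consistency checking (nothing changed, so there is nothing to check)
 * and racy to copy while the page may be concurrently modified.
 */
static bool
include_image(bool needs_backup, uint8_t info, uint8_t regbuf_flags)
{
	return needs_backup ||
		((info & XLR_CHECK_CONSISTENCY) != 0 &&
		 (regbuf_flags & REGBUF_NO_CHANGE) == 0);
}
```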
Re: SQL:2011 application time
On 5/5/24 20:01, jian he wrote: hi. I hope I understand the problem correctly. my understanding is that we are trying to solve a corner case: create table t(a int4range, b int4range, primary key(a, b WITHOUT OVERLAPS)); insert into t values ('[1,2]','empty'), ('[1,2]','empty'); I think the entry point is ATAddCheckNNConstraint and index_create. in a chain of DDL commands, you cannot be sure which one (primary key constraint or check constraint) is being created first, you just want to make sure that after both constraints are created, then add a dependency between primary key and check constraint. so you need to validate at different functions (ATAddCheckNNConstraint, index_create) that these two constraints are indeed created, only after that we have a dependency linking these two constraints. I've attached a patch trying to solve this problem. the patch is not totally polished, but works as expected, and also has lots of comments. Thanks for this! I've incorporated it into the CHECK constraint patch with some changes. In particular I thought index_create was a strange place to change the conperiod value of a pg_constraint record, and it is not actually needed if we are copying that value correctly. Some other comments on the patch file: > N.B. we also need to have special care for case > where check constraint was readded, e.g. ALTER TYPE. > if ALTER TYPE is altering the PERIOD column of the primary key, > alter column of primary key makes the index recreate, check constraint recreate, > however, former interally also including add a check constraint. > so we need to take care of merging two check constraint. This is a good point. I've included tests for this based on your patch. > N.B. the check constraint name is hard-wired, so if you create the constraint > with the same name, PERIOD primary key cannot be created. Yes, it may be worth doing something like other auto-named constraints and trying to avoid duplicates. 
I haven't taken that on yet; I'm curious what others have to say about it. > N.B. what about UNIQUE constraint? See my previous posts on this thread about allowing 'empty' in UNIQUE constraints. > N.B. seems ok to not care about FOREIGN KEY regarding this corner case? Agreed. v3 patches attached, rebased to 3ca43dbbb6. Yours, -- Paul ~{:-) p...@illuminatedcomputing.comFrom 4f4428fb41ea79056a13e425826fdac9c7b5d349 Mon Sep 17 00:00:00 2001 From: "Paul A. Jungwirth" Date: Tue, 2 Apr 2024 15:39:04 -0700 Subject: [PATCH v3 1/2] Don't treat WITHOUT OVERLAPS indexes as unique in planner Because the special rangetype 'empty' never overlaps another value, it is possible for WITHOUT OVERLAPS tables to have two rows with the same key, despite being indisunique, if the application-time range is 'empty'. So to be safe we should not treat WITHOUT OVERLAPS indexes as unique in any proofs. This still needs a test, but I'm having trouble finding a query that gives wrong results. --- src/backend/optimizer/path/indxpath.c | 5 +++-- src/backend/optimizer/plan/analyzejoins.c | 6 +++--- src/backend/optimizer/util/plancat.c | 1 + src/include/nodes/pathnodes.h | 2 ++ 4 files changed, 9 insertions(+), 5 deletions(-) diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c index c0fcc7d78df..72346f78ebe 100644 --- a/src/backend/optimizer/path/indxpath.c +++ b/src/backend/optimizer/path/indxpath.c @@ -3498,13 +3498,14 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel, /* * If the index is not unique, or not immediately enforced, or if it's - * a partial index, it's useless here. We're unable to make use of + * a partial index, or if it's a WITHOUT OVERLAPS index (so not + * literally unique), it's useless here. We're unable to make use of * predOK partial unique indexes due to the fact that * check_index_predicates() also makes use of join predicates to * determine if the partial index is usable. 
Here we need proofs that * hold true before any joins are evaluated. */ - if (!ind->unique || !ind->immediate || ind->indpred != NIL) + if (!ind->unique || !ind->immediate || ind->indpred != NIL || ind->hasperiod) continue; /* diff --git a/src/backend/optimizer/plan/analyzejoins.c b/src/backend/optimizer/plan/analyzejoins.c index c3fd4a81f8a..dc8327d5769 100644 --- a/src/backend/optimizer/plan/analyzejoins.c +++ b/src/backend/optimizer/plan/analyzejoins.c @@ -814,8 +814,8 @@ rel_supports_distinctness(PlannerInfo *root, RelOptInfo *rel) * For a plain relation, we only know how to prove uniqueness by * reference to unique indexes. Make sure there's at least one * suitable unique index. It must be immediately enforced, and not a - * partial index. (Keep these conditions in sync with - * relation_has_unique_index_for!) + * partial index, and not WITHOUT OVERLAPS (Keep these conditions + * in sync with relation_has_unique_index_for!) *
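The corner case motivating both patches — the range value 'empty' overlaps nothing, not even itself, so two rows with equal keys can coexist under WITHOUT OVERLAPS — can be demonstrated with a minimal standalone model of range overlap semantics. This is a sketch, not PostgreSQL's range code.

```c
#include <assert.h>
#include <stdbool.h>

/* Minimal model of an int4range: half-open [lower, upper) or 'empty'. */
typedef struct Int4Range
{
	bool	empty;
	int		lower;			/* inclusive bound */
	int		upper;			/* exclusive bound */
} Int4Range;

/*
 * Range overlap (the && operator): an empty range overlaps nothing,
 * including another empty range.  That is why two rows such as
 * ('[1,2]','empty') and ('[1,2]','empty') both satisfy a WITHOUT
 * OVERLAPS constraint, so such an index is not truly unique and must
 * not be used in planner uniqueness proofs.
 */
static bool
range_overlaps(Int4Range a, Int4Range b)
{
	if (a.empty || b.empty)
		return false;
	return a.lower < b.upper && b.lower < a.upper;
}
```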
Re: [PATCH] Improve amcheck to also check UNIQUE constraint in btree index.
On Mon, May 13, 2024 at 12:23 AM Alexander Korotkov wrote: > On Sat, May 11, 2024 at 4:13 AM Mark Dilger > wrote: > > > On May 10, 2024, at 12:05 PM, Alexander Korotkov > > > wrote: > > > The only bt_target_page_check() caller is > > > bt_check_level_from_leftmost(), which overrides state->target in the > > > next iteration anyway. I think the patch is just refactoring to > > > eliminate the confusion pointed out by Peter Geoghegan upthread. > > > > I find your argument unconvincing. > > > > After bt_target_page_check() returns at line 919, and before > > bt_check_level_from_leftmost() overrides state->target in the next > > iteration, bt_check_level_from_leftmost() conditionally fetches an item > > from the page referenced by state->target. See line 963. > > > > I'm left with four possibilities: > > > > > > 1) bt_target_page_check() never gets to the code that uses "rightpage" > > rather than "state->target" in the same iteration where > > bt_check_level_from_leftmost() conditionally fetches an item from > > state->target, so the change you're making doesn't matter. > > > > 2) The code prior to v2-0003 was wrong, having changed state->target in an > > inappropriate way, causing the wrong thing to happen at what is now line > > 963. The patch fixes the bug, because state->target no longer gets > > overwritten where you are now using "rightpage" for the value. > > > > 3) The code used to work, having set up state->target correctly in the > > place where you are now using "rightpage", but v2-0003 has broken that. > > > > 4) It's been broken all along and your patch just changes from wrong to > > wrong. > > > > > > If you believe (1) is true, then I'm complaining that you are relying far > > too much on action at a distance, and that you are not documenting it. Even > > with documentation of this interrelationship, I'd be unhappy with how > > brittle the code is.
I cannot easily discern that the two don't ever > > happen in the same iteration, and I'm not at all convinced one way or the > > other. I tried to set up some Asserts about that, but none of the test > > cases actually reach the new code, so adding Asserts doesn't help to > > investigate the question. > > > > If (2) is true, then I'm complaining that the commit message doesn't > > mention the fact that this is a bug fix. Bug fixes should be clearly > > documented as such, otherwise future work might assume the commit can be > > reverted with only stylistic consequences. > > > > If (3) is true, then I'm complaining that the patch is flat busted. > > > > If (4) is true, then maybe we should revert the entire feature, or have a > > discussion of mitigation efforts that are needed. > > > > Regardless of which of 1..4 you pick, I think it could all do with more > > regression test coverage. > > > > > > For reference, I said something similar earlier today in another email to > > this thread: > > > > This patch introduces a change that stores a new page into variable > > "rightpage" rather than overwriting "state->target", which the old > > implementation most certainly did. That means that after returning from > > bt_target_page_check() into the calling function > > bt_check_level_from_leftmost() the value in state->target is not what it > > would have been prior to this patch. Now, that'd be irrelevant if nobody > > goes on to consult that value, but just 44 lines further down in > > bt_check_level_from_leftmost() state->target is clearly used. So the > > behavior at that point is changing between the old and new versions of the > > code, and I think I'm within reason to ask if it was wrong before the > > patch, wrong after the patch, or something else? Is this a bug being > > introduced, being fixed, or ... ? > > Thank you for your analysis. I'm inclined to believe in 2, but not > yet completely sure. It's a real pity that our tests don't cover > this.
I'm investigating this area. It seems that I got to the bottom of this. Changing BtreeCheckState.target for a cross-page unique constraint check is wrong, but that happens only for leaf pages. After that BtreeCheckState.target is only used for setting the low key. The low key is only used for non-leaf pages. So, that didn't lead to any visible bug. I've revised the commit message to reflect this. So, the picture for the patches is the following now. 0001 – optimization, but rather simple and giving huge effect 0002 – refactoring 0003 – fix for the bug 0004 – better error reporting -- Regards, Alexander Korotkov v3-0002-amcheck-Refactoring-the-storage-of-the-last-visib.patch Description: Binary data v3-0004-amcheck-Report-an-error-when-the-next-page-to-a-l.patch Description: Binary data v3-0003-amcheck-Don-t-load-the-right-sibling-page-into-Bt.patch Description: Binary data v3-0001-amcheck-Optimize-speed-of-checking-for-unique-con.patch Description: Binary data
Re: pgsql: Fix overread in JSON parsing errors for incomplete byte sequence
On Fri, May 10, 2024 at 02:23:09PM +0200, Alvaro Herrera wrote: > Ah, I ran 'git clean -dfx' and now it works correctly. I must have had > an incomplete rebuild. I am going to assume that this was an incorrect build. It seems to me that src/common/ was compiled with a stale version, which was not sufficient to make the new test pass: more bytes got pushed to the error output, and the pre-855517307db8 code could point to some random junk. -- Michael signature.asc Description: PGP signature
Convert sepgsql tests to TAP
The sepgsql tests have not been integrated into the Meson build system yet. I propose to fix that here. One problem there was that the tests use a very custom construction where a top-level shell script internally calls make. I have converted this to a TAP script that does the preliminary checks and then calls pg_regress directly, without make. This seems to get the job done. Also, once you have your SELinux environment set up as required, the test now works fully automatically; you don't have to do any manual prep work. The whole thing is guarded by PG_TEST_EXTRA=sepgsql now. Some comments and questions: - Do we want to keep the old way to run the test? I don't know all the testing scenarios that people might be interested in, but of course it would also be good to cut down on the duplication in the test files. - Strangely, there was apparently so far no way to get to the build directory from a TAP script. They only ever want to read files from the source directory. So I had to add that. - If you go through the pre-test checks in contrib/sepgsql/test_sepgsql, I have converted most of these checks to the Perl script. Some of the checks are obsolete, because they check whether the database has been correctly initialized, which is now done by the TAP script anyway. One check that I wasn't sure about is the # 'psql' command must be executable from test domain The old test was checking the installation tree, which I guess could be set up in random ways. But do we need this kind of check if we are using a temporary installation? As mentioned in the patch, the documentation needs to be updated. This depends on the outcome of the question above whether we want to keep the old tests in some way. From 2f8e73932b1068caf696582487de9e100fcd46be Mon Sep 17 00:00:00 2001 From: Peter Eisentraut Date: Mon, 13 May 2024 07:55:55 +0200 Subject: [PATCH v1] Convert sepgsql tests to TAP Add a TAP test for sepgsql. This automates the previously required manual setup before the test. 
The actual tests are still run by pg_regress, but now called from within the TAP Perl script. TODO: remove the old test scripts? --- contrib/sepgsql/.gitignore | 3 + contrib/sepgsql/Makefile | 3 + contrib/sepgsql/meson.build | 11 +- contrib/sepgsql/t/001_sepgsql.pl | 248 +++ doc/src/sgml/regress.sgml| 11 ++ doc/src/sgml/sepgsql.sgml| 2 + meson.build | 1 + src/Makefile.global.in | 1 + 8 files changed, 279 insertions(+), 1 deletion(-) create mode 100644 contrib/sepgsql/t/001_sepgsql.pl diff --git a/contrib/sepgsql/.gitignore b/contrib/sepgsql/.gitignore index 31613e011f5..7cadb9419e9 100644 --- a/contrib/sepgsql/.gitignore +++ b/contrib/sepgsql/.gitignore @@ -1,7 +1,10 @@ /sepgsql.sql +# FIXME /sepgsql-regtest.fc /sepgsql-regtest.if /sepgsql-regtest.pp /tmp # Generated subdirectories +/log/ /results/ +/tmp_check/ diff --git a/contrib/sepgsql/Makefile b/contrib/sepgsql/Makefile index afca75b693f..5cc9697736c 100644 --- a/contrib/sepgsql/Makefile +++ b/contrib/sepgsql/Makefile @@ -15,6 +15,9 @@ OBJS = \ DATA_built = sepgsql.sql PGFILEDESC = "sepgsql - SELinux integration" +TAP_TESTS = 1 + +# FIXME # Note: because we don't tell the Makefile there are any regression tests, # we have to clean those result files explicitly EXTRA_CLEAN = -r $(pg_regress_clean_files) tmp/ *.pp sepgsql-regtest.if sepgsql-regtest.fc diff --git a/contrib/sepgsql/meson.build b/contrib/sepgsql/meson.build index 9544efe0287..5817ba30a58 100644 --- a/contrib/sepgsql/meson.build +++ b/contrib/sepgsql/meson.build @@ -40,4 +40,13 @@ contrib_targets += custom_target('sepgsql.sql', install_dir: contrib_data_args['install_dir'], ) -# TODO: implement sepgsql tests +tests += { + 'name': 'sepgsql', + 'sd': meson.current_source_dir(), + 'bd': meson.current_build_dir(), + 'tap': { +'tests': [ + 't/001_sepgsql.pl', +], + }, +} diff --git a/contrib/sepgsql/t/001_sepgsql.pl b/contrib/sepgsql/t/001_sepgsql.pl new file mode 100644 index 000..82bba5774ce --- /dev/null +++ b/contrib/sepgsql/t/001_sepgsql.pl @@ -0,0
+1,248 @@ + +# Copyright (c) 2024, PostgreSQL Global Development Group + +use strict; +use warnings FATAL => 'all'; + +use PostgreSQL::Test::Cluster; +use PostgreSQL::Test::Utils; + +use Test::More; + +if (!$ENV{PG_TEST_EXTRA} || $ENV{PG_TEST_EXTRA} !~ /\bsepgsql\b/) +{ + plan skip_all => + 'Potentially unsafe test sepgsql not enabled in PG_TEST_EXTRA'; +} + +note "checking selinux environment"; + +# matchpathcon must be present to assess whether the installation environment +# is OK. +note "checking for matchpathcon"; +if (system('matchpathcon -n . >/dev/null 2>&1') != 0) +{ + diag &1') != 0) +{ + diag &1') != 0) +{ + diag
Re: Slow catchup of 2PC (twophase) transactions on replica in LR
Hi, Here are some review comments for v8-0003. == src/sgml/ref/alter_subscription.sgml 1. + + The two_phase parameter can only be altered when the + subscription is disabled. When altering the parameter from on + to off, the backend process checks prepared + transactions done by the logical replication worker and aborts them. + The text may be OK as-is, but I was wondering if it might be better to give a more verbose explanation. BEFORE ... the backend process checks prepared transactions done by the logical replication worker and aborts them. SUGGESTION ... the backend process checks for any incomplete prepared transactions done by the logical replication worker (from when the two_phase parameter was still "on") and, if any are found, those are aborted. == src/backend/commands/subscriptioncmds.c 2. AlterSubscription - /* - * Since the altering two_phase option of subscriptions - * also leads to the change of slot option, this command - * cannot be rolled back. So prevent we are in the - * transaction block. + * If two_phase was enabled, there is a possibility the + * transactions has already been PREPARE'd. They must be + * checked and rolled back. */ BEFORE ... there is a possibility the transactions has already been PREPARE'd. SUGGESTION ... there is a possibility that transactions have already been PREPARE'd. ~~~ 3. AlterSubscription + /* + * Since the altering two_phase option of subscriptions + * (especially on->off case) also leads to the + * change of slot option, this command cannot be rolled + * back. So prevent we are in the transaction block. + */ PreventInTransactionBlock(isTopLevel, "ALTER SUBSCRIPTION ... SET (two_phase = off)"); This comment is a bit vague and includes some typos, but IIUC these problems will already be addressed by the 0002 patch changes. AFAIK patch 0003 is only moving the 0002 comment. == src/test/subscription/t/099_twophase_added.pl 4.
+# Check the case that prepared transactions exist on the subscriber node +# +# If the two_phase is altering from "on" to "off" and there are prepared +# transactions on the subscriber, they must be aborted. This test checks it. + Similar to the comment that I gave for v8-0002. I think there should be a comment for the major test, to distinguish it from the comments for the sub-steps. ~~~ 5. +# Verify the prepared transaction are aborted because two_phase is changed to +# "off". +$result = $node_subscriber->safe_psql('postgres', +"SELECT count(*) FROM pg_prepared_xacts;"); +is($result, q(0), "prepared transaction done by worker is aborted"); + /the prepared transaction are aborted/any prepared transactions are aborted/ == Kind Regards, Peter Smith Fujitsu Australia
Re: Slow catchup of 2PC (twophase) transactions on replica in LR
Hi, Here are some comments for v8-0004 == 0.1 General - Patch name /SUBSCIRPTION/SUBSCRIPTION/ == 0.2 General - Apply FYI, there are whitespace warnings: git apply ../patches_misc/v8-0004-Add-force_alter-option-for-ALTER-SUBSCIRPTION-.-S.patch ../patches_misc/v8-0004-Add-force_alter-option-for-ALTER-SUBSCIRPTION-.-S.patch:191: trailing whitespace. # command will abort the prepared transaction and succeed. warning: 1 line adds whitespace errors. == 0.3 General - Regression test fails The subscription regression tests are not working. ok 158 + publication 1187 ms not ok 159 + subscription 123 ms See review comments #4 and #5 below for the reason why. == src/sgml/ref/alter_subscription.sgml 1. The two_phase parameter can only be altered when the - subscription is disabled. When altering the parameter from on - to off, the backend process checks prepared - transactions done by the logical replication worker and aborts them. + subscription is disabled. Altering the parameter from on + to off will be failed when there are prepared + transactions done by the logical replication worker. If you want to alter + the parameter forcibly in this case, force_alter + option must be set to on at the same time. If + specified, the backend process aborts prepared transactions. 1a. That "will be failed when..." seems strange. Maybe say "will give an error when..." ~ 1b. Because "force" is a verb, I think true/false is more natural than on/off for this new boolean option. e.g. it acts more like a "flag" than a "mode". See all the other boolean options in CREATE SUBSCRIPTION -- those are mostly all verbs too and are all true/false AFAIK. == 2. CREATE SUBSCRIPTION For my previous review, two comments [v7-0004#2] and [v7-0004#3] were not addressed. Kuroda-san wrote: Hmm, but the parameter cannot be used for CREATE SUBSCRIPTION. Should we modify to accept and add the description in the doc? ~ Yes, that is what I am suggesting. 
IMO it is odd for the user to be able to ALTER a parameter that cannot
be included in the CREATE SUBSCRIPTION in the first place. AFAIK there
are no other parameters that behave that way.

==
src/backend/commands/subscriptioncmds.c

3. AlterSubscription

+ if (!opts.force_alter)
+ 	ereport(ERROR,
+ 			(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ 			 errmsg("cannot alter %s when there are prepared transactions",
+ 					"two_phase = off"),
+ 			 errhint("Resolve these transactions or set %s at the same time, and then try again.",
+ 					 "force_alter = true")));

I think saying "at the same time" in the hint is unnecessary. Surely
the user is allowed to set this parameter separately if they want to?

e.g.
ALTER SUBSCRIPTION sub SET (force_alter=true);
ALTER SUBSCRIPTION sub SET (two_phase=off);

==
src/test/regress/expected/subscription.out

4.
+-- fail - force_alter cannot be set alone
+ALTER SUBSCRIPTION regress_testsub SET (force_alter = true);
+ERROR: force_alter must be specified with two_phase

This error cannot happen. You removed that error!

==
src/test/regress/sql/subscription.sql

5.
+-- fail - force_alter cannot be set alone
+ALTER SUBSCRIPTION regress_testsub SET (force_alter = true);

Why is this being tested? You removed that error condition.

==
src/test/subscription/t/099_twophase_added.pl

6.
+# Try altering the two_phase option to "off." The command will fail since there
+# is a prepared transaction and the force option is not specified.
+my $stdout;
+my $stderr;
+
+($result, $stdout, $stderr) = $node_subscriber->psql(
+	'postgres', "ALTER SUBSCRIPTION regress_sub SET (two_phase = off);");
+ok($stderr =~ /cannot alter two_phase = off when there are prepared transactions/,
+	'ALTER SUBSCRIPTION failed');

/force option is not specified./'force_alter' option is not specified as true./

~~~

7.
+# Verify the prepared transaction still exists
+$result = $node_subscriber->safe_psql('postgres',
+	"SELECT count(*) FROM pg_prepared_xacts;");
+is($result, q(1), "prepared transaction still exits");
+

TYPO: /exits/exists/

~~~

8.
+# Alter the two_phase with the force_alter option. Apart from the above, the
+# command will abort the prepared transaction and succeed.
+$node_subscriber->safe_psql('postgres',
+	"ALTER SUBSCRIPTION regress_sub SET (two_phase = off, force_alter = true);");
+$node_subscriber->safe_psql('postgres', "ALTER SUBSCRIPTION regress_sub ENABLE;");
+

What does "Apart from the above" mean? Be more explicit.

~~~

9.
+# Verify the prepared transaction are aborted
 $result = $node_subscriber->safe_psql('postgres',
 	"SELECT count(*) FROM pg_prepared_xacts;");
 is($result, q(0), "prepared transaction done by worker is aborted");

/transaction are aborted/transaction was aborted/

==
Response to my v7-0004 review --
https://www.postgresql.org/message-id/OSBPR01MB2552F738ACF1DA6838025C4FF5E62%40OSBPR01MB2
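The separate-statement sequence suggested in review comment #3 would presumably look like the following. This is a hypothetical sketch: force_alter is only an option proposed in this patch series, not released PostgreSQL syntax, and the subscription name is illustrative.

```sql
-- Hypothetical: force_alter exists only in this patch series.
-- two_phase can only be altered while the subscription is disabled.
ALTER SUBSCRIPTION regress_sub DISABLE;
ALTER SUBSCRIPTION regress_sub SET (force_alter = true);
ALTER SUBSCRIPTION regress_sub SET (two_phase = off);  -- aborts any prepared transactions
ALTER SUBSCRIPTION regress_sub ENABLE;
```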
Re: Slow catchup of 2PC (twophase) transactions on replica in LR
Hi, Here are some review comments for v8-0002.

==
Commit message

1.
Regarding the off->on case, the logical replication already has a
mechanism for it, so there is no need to do anything special for the
on->off case; however, we must connect to the publisher and expressly
change the parameter. The operation cannot be rolled back, and altering
the parameter from "on" to "off" within a transaction is prohibited.

~

I think the difference between "off"-to-"on" and "on"-to-"off" needs to
be explained in more detail. Specifically, "already has a mechanism for
it" seems very vague.

==
src/backend/commands/subscriptioncmds.c

2.
 /*
- * The changed two_phase option of the slot can't be rolled
- * back.
+ * Since the altering two_phase option of subscriptions
+ * also leads to the change of slot option, this command
+ * cannot be rolled back. So prevent we are in the
+ * transaction block.
  */
- PreventInTransactionBlock(isTopLevel, "ALTER SUBSCRIPTION ... SET (two_phase)");
+ if (!opts.twophase)
+ 	PreventInTransactionBlock(isTopLevel,
+

2a.
There is a typo: "So prevent we are".

SUGGESTION (minor adjustment and typo fix)
Since altering the two_phase option of subscriptions also leads to
changing the slot option, this command cannot be rolled back. So
prevent this if we are in a transaction block.

~

2b.
But, in my previous review [v7-0002#3] I asked if the comment could
explain why this check is only needed for two_phase "on"-to-"off" but
not for "off"-to-"on". That explanation/reason is still not yet given
in the latest comment.

~~~

3.
 /*
- * Try to acquire the connection necessary for altering slot.
+ * Check the need to alter the replication slot. Failover and two_phase
+ * options are controlled by both the publisher (as a slot option) and the
+ * subscriber (as a subscription option).
+ */
+ update_failover = replaces[Anum_pg_subscription_subfailover - 1];
+ update_two_phase = (replaces[Anum_pg_subscription_subtwophasestate - 1] &&
+ 					!opts.twophase);

(This is similar to the previous comment). In my previous review
[v7-0002#3a] I asked why update_two_phase is TRUE only if 'two-phase'
is being updated "on"-to-"off", but not when it is being updated
"off"-to-"on". That explanation/reason is still not yet given in the
latest comment.

==
src/backend/replication/libpqwalreceiver/libpqwalreceiver.c

4.
- appendStringInfo(&cmd, "ALTER_REPLICATION_SLOT %s ( FAILOVER %s, TWO_PHASE %s )",
- 				 quote_identifier(slotname),
- 				 failover ? "true" : "false",
- 				 two_phase ? "true" : "false");
+ appendStringInfo(&cmd, "ALTER_REPLICATION_SLOT %s ( ",
+ 				 quote_identifier(slotname));
+
+ if (failover)
+ 	appendStringInfo(&cmd, "FAILOVER %s",
+ 					 *failover ? "true" : "false");
+
+ if (failover && two_phase)
+ 	appendStringInfo(&cmd, ", ");
+
+ if (two_phase)
+ 	appendStringInfo(&cmd, "TWO_PHASE %s",
+ 					 *two_phase ? "true" : "false");
+
+ appendStringInfoString(&cmd, ");");

It will be better if that last line includes the extra space like I had
suggested in [v7-0002#4a], so the result will have the same spacing as
in the original code. e.g.

+ appendStringInfoString(&cmd, " );");

==
src/test/subscription/t/099_twophase_added.pl

5.
+# Check the case that prepared transactions exist on the publisher node.
+#
+# Since the two_phase is "off", then normally, this PREPARE will do nothing
+# until the COMMIT PREPARED, but in this test, we toggle the two_phase to "on"
+# again before the COMMIT PREPARED happens.

This is a major test part so IMO this comment should have # like it had
before, to distinguish it from all the sub-step comments.

==
My v7-0002 review -
https://www.postgresql.org/message-id/CAHut%2BPtu_w_UCGR-5DbenA%2By7wRiA8QPi_ZP%3DCCJ3SGdTn-%3D%3Dg%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia.
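To make the separator logic under review in comment #4 concrete (the comma is emitted only when both options are present, and the reviewer asks for the original " );" trailing spacing), here is a small standalone sketch. It is only an approximation of the patch's code: it uses plain snprintf instead of PostgreSQL's StringInfo API, hard-codes the slot name, and the helper name is invented for illustration.

```c
#include <stdio.h>
#include <stdbool.h>

/*
 * Sketch of the command-building logic from the patch under review.
 * A NULL pointer means "do not alter this option". The comma separator
 * is only emitted when both options are present, and the command keeps
 * the " );" spacing the reviewer suggests.
 */
static void
build_alter_slot_cmd(char *buf, size_t len,
                     const bool *failover, const bool *two_phase)
{
    size_t n = 0;

    n += snprintf(buf + n, len - n, "ALTER_REPLICATION_SLOT slot ( ");

    if (failover)
        n += snprintf(buf + n, len - n, "FAILOVER %s",
                      *failover ? "true" : "false");

    /* separator only when both options are being altered */
    if (failover && two_phase)
        n += snprintf(buf + n, len - n, ", ");

    if (two_phase)
        n += snprintf(buf + n, len - n, "TWO_PHASE %s",
                      *two_phase ? "true" : "false");

    snprintf(buf + n, len - n, " );");
}
```

Passing NULL for one of the options drops that clause and its separator, which is the case the original single appendStringInfo call could not express.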
Re: Use WALReadFromBuffers in more places
On Wed, May 8, 2024 at 9:51 AM Jingtang Zhang wrote:
>
> Hi, Bharath. I've been testing this. It's cool. Is there any way we could
> monitor the hit rate about directly reading from WAL buffers by exporting
> to some views?

Thanks for looking into this. I used purpose-built patches for
verifying the WAL buffers hit ratio, please check
USE-ON-HEAD-Collect-WAL-read-from-file-stats.txt and
USE-ON-PATCH-Collect-WAL-read-from-buffers-and-file-stats.txt at
https://www.postgresql.org/message-id/CALj2ACU9cfAcfVsGwUqXMace_7rfSBJ7%2BhXVJfVV1jnspTDGHQ%40mail.gmail.com.

In the long run, it's better to extend what's proposed in the thread
https://www.postgresql.org/message-id/CALj2ACU_f5_c8F%2BxyNR4HURjG%3DJziiz07wCpQc%3DAqAJUFh7%2B8w%40mail.gmail.com.
I haven't had a chance to dive deep into that thread yet, but a quick
glance over the v8 patch tells me that it hasn't yet implemented the
idea of collecting WAL read stats for both walsenders and the backends
reading WAL. If that's done, we can extend it for WAL read from WAL
buffers.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
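Purely to illustrate the kind of monitoring being asked about above: if such stats were ever exported, a hit-ratio query might look like the following. The view and column names here are entirely hypothetical; nothing like this exists in PostgreSQL or in the posted patches.

```sql
-- Hypothetical view and columns, for illustration only.
SELECT wal_read_buffers::float8
       / NULLIF(wal_read_buffers + wal_read_file, 0) AS wal_buffer_hit_ratio
FROM pg_stat_wal_reads;
```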
Re: SQL:2011 application time
On 5/12/24 08:51, Paul Jungwirth wrote:
> On 5/12/24 05:55, Matthias van de Meent wrote:
>>> pg=# CREATE UNIQUE INDEX ON temporal_testing USING gist (id, valid_during);
>>> ERROR: access method "gist" does not support unique indexes
>>>
>>> To me that error message seems correct. The programmer hasn't said
>>> anything about the special temporal behavior they are looking for.
>>
>> But I showed that I had a GIST index that does have the indisunique
>> flag set, which shows that GIST does support indexes with unique
>> semantics. That I can't use CREATE UNIQUE INDEX to create such an
>> index doesn't mean the feature doesn't exist, which is what the error
>> message implies.
>
> True, the error message is not really telling the truth anymore. I do
> think most people who hit this error are not thinking about temporal
> constraints at all though, and for non-temporal constraints it is
> still true. It's also true for CREATE INDEX, since WITHOUT OVERLAPS is
> only available on the *constraint*. So how about adding a hint,
> something like this?:
>
> ERROR: access method "gist" does not support unique indexes
> HINT: To create a unique constraint with non-overlap behavior, use
> ADD CONSTRAINT ... WITHOUT OVERLAPS.

I thought a little more about eventually implementing WITHOUT OVERLAPS
support for CREATE INDEX, and how it relates to this error message in
particular. Even when that is done, it will still depend on the
stratnum support function for the keys' opclasses, so the GiST AM
itself will still have false amcanunique, I believe. Probably the
existing error message is still the right one. The hint won't need to
mention ADD CONSTRAINT anymore. It should still point users to WITHOUT
OVERLAPS, and possibly the stratnum support function too. I think what
we are doing for v17 is all compatible with that plan.

Yours,

--
Paul              ~{:-)
p...@illuminatedcomputing.com
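For readers following the thread, the constraint form the proposed hint points users toward looks roughly like this, per the temporal constraints work discussed here (the table and column names come from the example above; the WITHOUT OVERLAPS syntax is from this patch series targeting v17 and could still change, and the scalar key part requires the btree_gist extension):

```sql
-- Sketch of the hinted alternative to CREATE UNIQUE INDEX ... USING gist.
CREATE EXTENSION IF NOT EXISTS btree_gist;

CREATE TABLE temporal_testing (
    id int,
    valid_during daterange
);

ALTER TABLE temporal_testing
    ADD CONSTRAINT temporal_testing_uniq
    UNIQUE (id, valid_during WITHOUT OVERLAPS);
```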