Re: Log connection establishment timings
On 12.03.25 16:43, Melanie Plageman wrote: On Tue, Mar 11, 2025 at 6:27 PM Melanie Plageman wrote: I did more manual testing of my patches, and I think they are mostly ready for commit except for the IsConnectionBackend() macro (if we have something to change it to). I've committed this and marked it as such in the CF app. Thanks to everyone for the review. log_connections has been changed from a Boolean parameter to a string one, but a number of places in the documentation and in various pieces of test code still use the old values. I think it would be better if they were adjusted to the new style. There are two places in doc/src/sgml/config.sgml where log_connections=yes is used as an example. This is a relatively prominent place, so it should not use deprecated values. In src/backend/tcop/postgres.c, there is a call SetConfigOption("log_connections", "true", context, source); that could be adjusted. Various uses in test code: src/interfaces/libpq/t/005_negotiate_encryption.pl src/test/authentication/t/001_password.pl src/test/authentication/t/003_peer.pl src/test/authentication/t/005_sspi.pl src/test/authentication/t/007_pre_auth.pl src/test/kerberos/t/001_auth.pl src/test/ldap/t/001_auth.pl src/test/ldap/t/002_bindpasswd.pl src/test/modules/ldap_password_func/t/001_mutated_bindpasswd.pl src/test/modules/oauth_validator/t/001_server.pl src/test/modules/oauth_validator/t/002_client.pl src/test/postmaster/t/002_connection_limits.pl src/test/postmaster/t/003_start_stop.pl src/test/recovery/t/013_crash_restart.pl src/test/recovery/t/022_crash_temp_files.pl src/test/recovery/t/032_relfilenode_reuse.pl src/test/recovery/t/037_invalid_database.pl src/test/ssl/t/SSL/Server.pm src/tools/ci/pg_ci_base.conf I suspect in some of these cases using one of the new more granular values would be appropriate? This could also serve as examples and testing of the parameter itself.
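For illustration only (a sketch; the parameter's accepted granular values are 'receipt', 'authentication', and 'authorization', with 'all' as a shorthand), a deprecated boolean spelling could be updated to the new style like this:

```
-- old, now-deprecated boolean spelling:
--   log_connections = 'yes'
-- new multi-value form, enabling the same connection logging explicitly:
ALTER SYSTEM SET log_connections = 'receipt,authentication,authorization';
SELECT pg_reload_conf();
```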
Re: Regression in statement locations
On Tue, May 20, 2025 at 09:58:04AM +0200, Anthonin Bonnefoy wrote: > Looking at the tests, there are two additional empty DO blocks: > +DO $$ > +DECLARE BEGIN > +END $$; > > What's the point of those? They won't be visible in the output since > we have 'toplevel IS FALSE' in the pg_stat_statements calls and they > don't fit the "DO block --- multiple inner queries with separators". That's a copy-pasto. Will remove. > Other than that, the patch looks good. Thanks for the review, Anthonin and Jian. -- Michael
Re: Make wal_receiver_timeout configurable per subscription
On Tue, 20 May 2025 at 03:16, Michael Paquier wrote: > > On Mon, May 19, 2025 at 11:19:48AM -0400, Robert Haas wrote: > > The advantage of Fujii-san's proposal is that it is very simple to > > implement. A subscription option would indeed be better, but it would > > also be considerably more complex. Why not start simple and if someone > > wants to do the work to add something more complicated, that is fine? > > Logically, adding that as an option of CREATE SUBSCRIPTION would just > be a duplication of what connection strings are already able to do > with "options='-c foo=fooval'", isn't it? Although the value is set in the session that creates the subscription, it will not be used by the apply worker because the launcher process, which starts the apply worker after subscription creation, is unaware of session-specific settings. > It seems to me that the issue of downgrading wal_receiver_timeout to > become user-settable is if we're OK to allow non-superusers to play with > it in the code path where it's used currently. Knowing that physical > WAL receivers are only spawned in a controlled manner by the startup > process, this does not sound like an issue. If we set the wal_receiver_timeout configuration using ALTER ROLE for the subscription owner's role, the apply worker will start with that value. However, any changes made via ALTER ROLE ... SET wal_receiver_timeout will not take effect for an already running apply worker unless the subscription is disabled and re-enabled. In contrast, this is handled automatically during CREATE SUBSCRIPTION, where parameter changes are detected. Regards, Vignesh
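For illustration only, and assuming wal_receiver_timeout were actually made user-settable as proposed, the two existing routes discussed above would look roughly like this (subscription, publication, and role names are made up):

```
-- passing the value through the subscription's connection string,
-- as Michael describes with options='-c foo=fooval':
CREATE SUBSCRIPTION mysub
    CONNECTION 'host=publisher dbname=postgres options=''-c wal_receiver_timeout=60s'''
    PUBLICATION mypub;

-- the per-role route; per Vignesh, a change here is only picked up by an
-- already-running apply worker after the subscription is disabled and
-- re-enabled:
ALTER ROLE sub_owner SET wal_receiver_timeout = '60s';
```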
Re: PG 18 release notes draft committed
On Thu, May 15, 2025 at 02:02:02PM -0400, Álvaro Herrera wrote: > As related anecdote, the rename of columns in pg_stat_activity a few > years back caused some pg_upgrade pain, which IMO was understandable; > it's reasonable to call that kind of thing out as a possible > incompatibility, at least IMO. > > > The "Migration" section also doesn't list changes to the exported > > PostgreSQL functions, which has bitten me as an extension developer several > > times. > > Hmm. I have thought about such issues, and I feel there are so many items that would be interesting to extension developers that adding just a few to the release notes would not be very helpful, and if we add all of them it would be distracting for the majority of users. It might be helpful for someone to write a wiki page specifically for that audience. -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com Do not let urgent matters crowd out time for investment in the future.
Re: Should we optimize the `ORDER BY random() LIMIT x` case?
Andrei, > > ``` > > -- imagine replacing inefficient array_sample(array_agg(t), 10) > > -- with more efficient array_sample_reservoir(t, 10) > > SELECT (unnest(agg)).* AS k FROM > > ( SELECT array_sample(array_agg(t), 10) AS agg FROM ( > > ... here goes the subquery ... > > ) AS t > > ); > > ``` > > > > ... if only we supported such a column expansion for not registered > > records. Currently such a query fails with: > > > > ``` > > ERROR: record type has not been registered > > ``` > I know about this issue. Having resolved it in a limited number of local > cases (like FDW push-down of row types), I still do not have a universal > solution worth proposing upstream. Do you have any public implementation > of the array_sample_reservoir to play with? array_sample_reservoir() is purely a figment of my imagination at the moment. Semantically it does the same as array_sample(array_agg(t), N) except the fact that array_sample(..., N) requires the array to have at least N items. You can experiment with array_sample(array_agg(...), N) as long as the subquery returns much more than N rows. -- Best regards, Aleksander Alekseev
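A minimal, self-contained way to experiment with the existing-function approach Aleksander mentions (a sketch only; generate_series simply stands in for "the subquery", and array_sample requires PostgreSQL 16 or later):

```
-- works today as long as the inner query returns at least 10 rows,
-- since array_sample() errors out if the array has fewer than N elements
SELECT unnest(array_sample(array_agg(x), 10)) AS k
FROM generate_series(1, 1000) AS x;
```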
Re: Please update the pgconf.dev Unconference notes
On 5/20/25 08:49, Hannu Krosing wrote: > Hi to all note-takers > > (I added two who I *think* I remember took notes) > > Please upload the notes to the Unconference section in > https://wiki.postgresql.org/wiki/PGConf.dev_2025 > > I have also some random notes on scraps of paper from the two sessions > I attended and did not present and would like to add these if I think > something relevant is missing once the main notes are there > I only took notes during the "better logging" session, and I already posted those to the wiki. The discussion moved too quickly in different directions for me to catch all the details, so the notes have gaps etc. If others can improve that / clarify, that'd be great. regards -- Tomas Vondra
Re: Conflict detection for update_deleted in logical replication
On Fri, May 16, 2025 at 5:01 PM Amit Kapila wrote: > Please find some more comments on the 0001 patch: 1. We need to know about such transactions + * for conflict detection and resolution in logical replication. See + * GetOldestTransactionIdInCommit and its use. Do we need to mention resolution in the above sentence? This patch is all about detecting conflict reliably. 2. In wait_for_publisher_status(), we use remote_epoch, remote_nextxid, and remote_oldestxid to compute full transaction id's. Why can't we send FullTransactionIds for remote_oldestxid and remote_nextxid from publisher? If these are required, maybe a comment somewhere for that would be good. 3. /* + * Note it is important to set committs value after marking ourselves as + * in the commit critical section (DELAY_CHKPT_IN_COMMIT). This is because + * we want to ensure all such transactions are finished before we allow + * the logical replication client to advance its xid which is used to hold + * back dead rows for conflict detection. See + * maybe_advance_nonremovable_xid. + */ + committs = GetCurrentTimestamp(); How does setting committs after setting DELAY_CHKPT_IN_COMMIT help in advancing client-side xid? IIUC, on client-side, we simply wait for such an xid to be finished based on the remote_oldestxid and remote_nextxid sent via the server. So, the above comment is not completely clear to me. I have updated this and related comments in the attached diff patch to make it clear. See if that makes sense to you. 4. In 0001's commit message, we have: "Furthermore, the preserved commit timestamps and origin data are essential for consistently detecting update_origin_differs conflicts." But it is not clarified how this patch helps in consistently detecting update_origin_differs conflict? 5. I have tried to add some more details in comments on why oldest_nonremovable_xid needs to be FullTransactionId. See attached. -- With Regards, Amit Kapila. diff --git a/src/backend/access/transam/twophase.c b/src/backend/access/transam/twophase.c index 41fb4fc5025..ea508542745 100644 --- a/src/backend/access/transam/twophase.c +++ b/src/backend/access/transam/twophase.c @@ -2329,9 +2329,9 @@ RecordTransactionCommitPrepared(TransactionId xid, /* * Note it is important to set committs value after marking ourselves as * in the commit critical section (DELAY_CHKPT_IN_COMMIT). This is because -* we want to ensure all such transactions are finished before we allow -* the logical replication client to advance its xid which is used to hold -* back dead rows for conflict detection. See +* we want to ensure all transactions that have acquired commit timestamp +* are finished before we allow the logical replication client to advance +* its xid which is used to hold back dead rows for conflict detection. See * maybe_advance_nonremovable_xid. */ committs = GetCurrentTimestamp(); diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c index dfc5108da3c..6e1dc76744f 100644 --- a/src/backend/access/transam/xact.c +++ b/src/backend/access/transam/xact.c @@ -1431,6 +1431,10 @@ RecordTransactionCommit(void) * without holding the ProcArrayLock, since we're the only one * modifying it. This makes checkpoint's determination of which xacts * are delaying the checkpoint a bit fuzzy, but it doesn't matter. +* +* Note, it is important to get the commit timestamp after marking the +* transaction in the commit critical section. See +* RecordTransactionCommitPrepared. 
*/ Assert((MyProc->delayChkptFlags & DELAY_CHKPT_IN_COMMIT) == 0); START_CRIT_SECTION(); diff --git a/src/include/replication/worker_internal.h b/src/include/replication/worker_internal.h index 0fdf49a1938..7d1e2096232 100644 --- a/src/include/replication/worker_internal.h +++ b/src/include/replication/worker_internal.h @@ -99,7 +99,9 @@ typedef struct LogicalRepWorker * conditions. Such scenarios might occur if the replication slot is not * yet created by the launcher while the apply worker has already * initialized this field. During this period, a transaction ID wraparound -* could falsely make this ID appear as if it originates from the future. +* could falsely make this ID appear as if it originates from the future +* w.r.t the transaction ID stored in the slot maintained by launcher. See +* advance_conflict_slot_xmin. */ FullTransactionId oldest_nonremovable_xid; diff --git a/src/include/storage/proc.h b/src/include/storage/proc.h index 65ecf3280fb..c6f5ebceefd 100644 --- a/src/include/storage/proc.h +++ b/src/include/storage/proc.h @@ -133,10 +133,10 @@ extern PGDLLIMPORT int FastPathLockGroupsPerBackend; * * Setting DELAY_CHK
Re: Regression in statement locations
Tested the patch and it looks good to me. Not that I thought it would fail, but I also confirmed the pgaudit case works as expected. ``` LOG: AUDIT: SESSION,10,2,DDL,CREATE TABLE,,,"CREATE TABLE do_table (""weird name"" INT)", LOG: AUDIT: SESSION,10,3,DDL,DROP TABLE,,,DROP table do_table, DO ``` -- Sami
Re: queryId constant squashing does not support prepared statements
> On Tue, May 20, 2025 at 11:03:52AM GMT, Dmitry Dolgov wrote: > > By the way, the new test cases for ARRAY lists are sent in the last > > patch of the series posted on this thread: > > https://www.postgresql.org/message-id/7zbzwk4btnxoo4o3xbtzefoqvht54cszjj4bol22fmej5nmgkf@dbcn4wtakw4y > > > > These should be first in the list, IMO, so as it is possible to track > > what the behavior was before the new logic as of HEAD, and what the > > behavior would become after the new logic. > > Sure, I can reshuffle that. Here is it, but the results are pretty much expected. >From 8e22941aab88976631729ed43e3d4afaf616b691 Mon Sep 17 00:00:00 2001 From: Dmitrii Dolgov <9erthali...@gmail.com> Date: Tue, 20 May 2025 16:12:05 +0200 Subject: [PATCH v3 1/3] Extend ARRAY squashing tests Testing coverage for ARRAY expressions is not enough. Add more test cases, similar to already existing ones. --- .../pg_stat_statements/expected/squashing.out | 184 ++ contrib/pg_stat_statements/sql/squashing.sql | 60 ++ 2 files changed, 244 insertions(+) diff --git a/contrib/pg_stat_statements/expected/squashing.out b/contrib/pg_stat_statements/expected/squashing.out index 7b138af098c..730a52d6917 100644 --- a/contrib/pg_stat_statements/expected/squashing.out +++ b/contrib/pg_stat_statements/expected/squashing.out @@ -429,3 +429,187 @@ SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C"; SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1 (2 rows) +-- Nested arrays are squashed only at constants level +SELECT pg_stat_statements_reset() IS NOT NULL AS t; + t +--- + t +(1 row) + +SELECT ARRAY[ +ARRAY[1, 2, 3, 4, 5, 6, 7, 8, 9, 10], +ARRAY[1, 2, 3, 4, 5, 6, 7, 8, 9, 10], +ARRAY[1, 2, 3, 4, 5, 6, 7, 8, 9, 10], +ARRAY[1, 2, 3, 4, 5, 6, 7, 8, 9, 10] +]; + array +--- + {{1,2,3,4,5,6,7,8,9,10},{1,2,3,4,5,6,7,8,9,10},{1,2,3,4,5,6,7,8,9,10},{1,2,3,4,5,6,7,8,9,10}} +(1 row) + +SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C"; + query| calls ++--- + SELECT ARRAY[ +| 1 + ARRAY[$1 /*, ... */], +| + ARRAY[$2 /*, ... */], +| + ARRAY[$3 /*, ... */], +| + ARRAY[$4 /*, ... */] +| + ] | + SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1 +(2 rows) + +-- Relabel type +SELECT pg_stat_statements_reset() IS NOT NULL AS t; + t +--- + t +(1 row) + +SELECT ARRAY[1::oid, 2::oid, 3::oid, 4::oid, 5::oid, 6::oid, 7::oid, 8::oid, 9::oid]; +array +- + {1,2,3,4,5,6,7,8,9} +(1 row) + +SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C"; + query| calls ++--- + SELECT ARRAY[$1 /*, ... */::oid] | 1 + SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1 +(2 rows) + +-- Some casting expression are simplified to Const +SELECT pg_stat_statements_reset() IS NOT NULL AS t; + t +--- + t +(1 row) + +SELECT ARRAY[ +('"1"')::jsonb, ('"2"')::jsonb, ('"3"')::jsonb, ('"4"')::jsonb, + ( '"5"')::jsonb, ( '"6"')::jsonb, ( '"7"')::jsonb, ( '"8"')::jsonb, + ( '"9"')::jsonb, ( '"10"')::jsonb +]; + array + + {"\"1\"","\"2\"","\"3\"","\"4\"","\"5\"","\"6\"","\"7\"","\"8\"","\"9\"","\"10\""} +(1 row) + +SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C"; + query| calls ++--- + SELECT ARRAY[ +| 1 + ($1 /*, ... 
*/)::jsonb+| + ] | + SELECT pg_stat_statements_reset() IS NOT NULL AS t | 1 +(2 rows) + +-- CoerceViaIO +SELECT pg_stat_statements_reset() IS NOT NULL AS t; + t +--- + t +(1 row) + +SELECT ARRAY[ + 1::int4::casttesttype, 2::int4::casttesttype, 3::int4::casttesttype, + 4::int4::casttesttype, 5::int4::casttesttype, 6::int4::casttesttype, + 7::int4::casttesttype, 8::int4::casttesttype, 9::int4::casttesttype, + 10::int4::casttesttype, 11::int4::casttesttype +]; + array +--- + {1,2,3,4,5,6,7,8,9,10,11} +(1 row) + +SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C"; + query
Re: PG 18 release notes draft committed
On Tue, May 20, 2025 at 05:15:54PM +0300, Nazir Bilal Yavuz wrote: > > As of the date of the commit, "Co-authored-by:" is listed as: > > > > https://wiki.postgresql.org/wiki/Commit_Message_Guidance > > > > "Co-authored-by:" is used by committers when they want to give full > > credit > > to the named individuals, but also indicate that they made > > significant > > changes. > > > > > A minor fix; I co-authored this with Thomas Munro, he is the actual > > > author. > > > > Uh, does this mean I should add Thomas Munro before or after your name, > > or remove your name and list only Thomas Munro? > > Sorry for taking your time, I did not know that. Then, I am okay with > how it is right now. No problem. The "Co-authored-by:" guidance was only written down in January of this year. I assume Thomas Munro was following that guidance when he wrote the commit message. -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com Do not let urgent matters crowd out time for investment in the future.
Re: Proposal for enabling auto-vectorization for checksum calculations
Hi, On Tue, 20 May 2025 at 02:54, Matthew Sterrett wrote: > > Hello! Thanks for helping me with this. > I'm still trying to figure out what is going on with the Bookworm test > failures. I'm pretty sure this patchset should resolve all the issues > with the macOS build, but I don't think it will help the linux > failures unfortunately. You can see the failure at the artifacts -> 'log/tmp_install/log/install.log' file on the CI web page [1]. If you want to replicate that on your local: $ ./configure --with-llvm CLANG="ccache clang-16" $ make -s -j8 world-bin $ make -j8 check-world should be enough. I was able to replicate it with these commands. I hope these help. [1] https://cirrus-ci.com/task/4834162550505472 -- Regards, Nazir Bilal Yavuz Microsoft
Re: Log connection establishment timings
On Tue, May 20, 2025 at 4:52 AM Peter Eisentraut wrote: > log_connections has been changed from a Boolean parameter to a string > one, but a number of places in the documentation and in various pieces > of test code still use the old values. I think it would be better if > they were adjusted to the new style. > > There are two places in doc/src/sgml/config.sgml where > log_connections=yes is used as an example. This is a relatively > prominent place, so it should not use deprecated values. > > In earlier versions of my patch, I played around with replacing these references in the docs. I ended up not doing it because I wasn't sure we had consensus on deprecating the "on", "true", "yes" options and that we would continue to support them indefinitely. Thinking about it now, by no longer documenting "on" and "off", I was obviously deprecating them (not to mention removing support for log_connections = "y", "ye", etc). I'll write a patch to change these. > In src/backend/tcop/postgres.c, there is a call > > SetConfigOption("log_connections", "true", context, source); > > that could be adjusted. > Do you think the debug option should be 'all' or a list of the options covered by "true" (which is a subset of 'all')? > Various uses in test code: > > src/interfaces/libpq/t/005_negotiate_encryption.pl > src/test/authentication/t/001_password.pl > src/test/authentication/t/003_peer.pl > src/test/authentication/t/005_sspi.pl > src/test/authentication/t/007_pre_auth.pl > src/test/kerberos/t/001_auth.pl > src/test/ldap/t/001_auth.pl > src/test/ldap/t/002_bindpasswd.pl > src/test/modules/ldap_password_func/t/001_mutated_bindpasswd.pl > src/test/modules/oauth_validator/t/001_server.pl > src/test/modules/oauth_validator/t/002_client.pl > src/test/postmaster/t/002_connection_limits.pl > src/test/postmaster/t/003_start_stop.pl > src/test/recovery/t/013_crash_restart.pl > src/test/recovery/t/022_crash_temp_files.pl > src/test/recovery/t/032_relfilenode_reuse.pl > src/test/recovery/t/037_invalid_database.pl > src/test/ssl/t/SSL/Server.pm > src/tools/ci/pg_ci_base.conf > > I suspect in some of these cases using one of the new more granular > values would be appropriate? This could also serve as examples and > testing of the parameter itself. Yep, for these, some of them specifically parse and test output generated by one or more of the log_connections options. In those cases, obviously those options should be provided at a minimum. However, I assume, based on how I use the logs from TAP tests, that when a test fails, folks use the logging to understand more about what went wrong even when the log output is not specifically parsed and tested against. The reason I didn't change these was that I was unsure which of these tests would want which options. And I didn't want to risk increasing the volume of logging output by replacing any of them with "all". I can, of course, go through and make my best guess. Or I could just replace them with the equivalent subset of options. In the absence of clarity on this, I did add tests to 001_password.pl specifically exercising and testing combinations of log_connections options. - Melanie
Re: strange perf regression with data checksums
On Tue, May 20, 2025 at 8:43 AM Peter Geoghegan wrote: > I imagine bitmap index scans will be similar to plain index scans. To be clear, I meant that the magnitude of the problem will be similar. I do not mean that they aren't fixed by my v2. They should be fully fixed by v2. -- Peter Geoghegan
Re: generic plans and "initial" pruning
Hi Tom, On Tue, May 20, 2025 at 12:06 PM Tom Lane wrote: > My attention was drawn to commit 525392d57 after observing that > Valgrind complained about a memory leak in some code that commit added > to BuildCachedPlan(). I tried to make sense of said code so I could > remove the leak, and eventually arrived at the attached patch, which > is part of a series of leak-fixing things hence the high sequence > number. > > Unfortunately, the bad things I speculated about in the added comments > seem to be reality. The second attached file is a test case that > triggers > > TRAP: failed Assert("list_length(plan_list) == > list_length(plan->stmt_list)"), File: "plancache.c", Line: 1259, PID: 602087 > > because it adds a DO ALSO rule that causes the rewriter to generate > more PlannedStmts than it did before. > > This is quite awful, because it does more than simply break the klugy > (and undocumented) business about keeping the top-level List in a > different context. What it means is that any outside code that is > busy iterating that List is very fundamentally broken: it's not clear > what List index it ought to resume at, except that "the one it was at" > is demonstrably incorrect. > > I also don't really believe the (also undocumented) assumption that > such outside code is in between executions of PlannedStmts of the > List and hence can tolerate those being ripped out and replaced. > I have not attempted to build an example, because the one I have > seems sufficiently damning. But I bet that a recursive function > could be constructed in such a way that an outer execution is > still in progress when an inner call triggers UpdateCachedPlan. > > Another small problem (much more easily fixable than the above, > probably) is that summarily setting "plan->is_valid = true" > at the end is not okay. We could already have received an > invalidation that should result in marking the plan stale. > (Holding locks on the tables involved is not sufficient to > prevent that, as there are other sources of inval events.) Thanks for pointing out the hole in the current handling of CachedPlan->stmt_list. You're right that the approach of preserving the list structure while replacing its contents in-place doesn’t hold up when the rewriter adds or removes statements dynamically. There might be other cases that neither of us have tried. I don’t think that mechanism is salvageable. To address the issue without needing a full revert, I’m considering dropping UpdateCachedPlan() and removing the associated MemoryContext dance to preserve CachedPlan->stmt_list structure. Instead, the executor would replan the necessary query into a transient list of PlannedStmts, leaving the original CachedPlan untouched. That avoids mutating shared plan state during execution and still enables deferred locking in the vast majority of cases. There are two variants of this approach. In the simpler form, the transient PlannedStmt list exists only in executor-local memory and isn’t registered with the invalidation machinery. That might be acceptable in practice, since all referenced relations are locked at that point -- but it would mean any invalidation events delivered during execution are ignored. The more robust variant is to build a one-query standalone CachedPlan using something like GetTransientCachedPlanForQuery(), which I had proposed back in [1]. This gets added to a standalone_plan_list so that invalidation callbacks can still reach it. 
I dropped that design earlier [2] due to the cleanup overhead, but I’d be happy to bring it back in a simplified form if that seems preferable. One open question in either case is what to do if the number of PlannedStmts in the rewritten plan changes as with your example. Would it be reasonable to just go ahead and execute the additional statements from the transient plan, even though the original CachedPlan wouldn’t have known about them until the next use? That would avoid introducing any new failure behavior while still handling the invalidation correctly for the current execution. > It's possible that this code can be fixed, but I fear it's > going to involve some really fundamental redesign, which > probably shouldn't be happening after beta1. I think there > is no alternative but to revert for v18. ...Beyond that, I think I’ve run out of clean options for making deferred locking executor-local while keeping invalidation safe. I know you'd previously objected (with good reason) to making GetCachedPlan() itself run pruning logic to determine which partitions to lock -- and to the idea of carrying or sharing the result of that pruning back to the executor via interface changes in the path from plancache.c through its callers down to ExecutorStart(). So I’ve steered away from revisiting that direction. But if we’re not comfortable with either of the transient replanning options, then we may end up shelving the deferred locking idea entirely -- which would be unfortunate, given how much it helps
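For readers following along, here is a rough, hypothetical sketch of the kind of scenario Tom describes, where a DO ALSO rule changes how many PlannedStmts the rewriter produces for a cached plan. This is not the attached test case, and on an unpatched server the sequence simply replans without tripping any assertion; it only shows how the statement count can change under a cached plan:

```
CREATE TABLE t1 (a int);
CREATE TABLE t2 (a int);

PREPARE p AS INSERT INTO t1 VALUES (1);
EXECUTE p;                       -- plan is built and cached: one PlannedStmt

CREATE RULE r1 AS ON INSERT TO t1
    DO ALSO INSERT INTO t2 VALUES (NEW.a);

EXECUTE p;                       -- rewriting now yields two PlannedStmts
```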
Re: PG 18 release notes draft committed
On Tue, May 20, 2025 at 03:46:44PM +0300, Nazir Bilal Yavuz wrote: > Hi, > > Thanks for working on this! > > On Fri, 2 May 2025 at 05:44, Bruce Momjian wrote: > > > > I will continue improving it until beta 1, and until the final release. > > I will probably add markup in 1-3 weeks. Let the feedback begin. ;-) > > + > + > +Add server variable file_copy_method to control the file copying > method (Nazir Bilal Yavuz) > +§ > + Uh, the commit is: commit f78ca6f3ebb Author: Thomas Munro Date: Tue Apr 8 20:52:47 2025 +1200 Introduce file_copy_method setting. It can be set to either COPY (the default) or CLONE if the system supports it. CLONE causes callers of copydir(), currently CREATE DATABASE ... STRATEGY=FILE_COPY and ALTER DATABASE ... SET TABLESPACE = ..., to use copy_file_range (Linux, FreeBSD) or copyfile (macOS) to copy files instead of a read-write loop over the contents. CLONE gives the kernel the opportunity to share block ranges on copy-on-write file systems and push copying down to storage on others, depending on configuration. On some systems CLONE can be used to clone large databases quickly with CREATE DATABASE ... TEMPLATE=source STRATEGY=FILE_COPY. Other operating systems could be supported; patches welcome. Co-authored-by: Nazir Bilal Yavuz Reviewed-by: Robert Haas Reviewed-by: Ranier Vilela Discussion: https://postgr.es/m/CA%2BhUKGLM%2Bt%2BSwBU-cHeMUXJCOgBxSHLGZutV5zCwY4qrCcE02w%40mail.gmail.com As of the date of the commit, "Co-authored-by:" is listed as: https://wiki.postgresql.org/wiki/Commit_Message_Guidance "Co-authored-by:" is used by committers when they want to give full credit to the named individuals, but also indicate that they made significant changes. > A minor fix; I co-authored this with Thomas Munro, he is the actual author. Uh, does this mean I should add Thomas Munro before or after your name, or remove your name and list only Thomas Munro? -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com Do not let urgent matters crowd out time for investment in the future.
Re: Regression in statement locations
On 5/20/25 07:34, Sami Imseih wrote: Tested the patch and it looks good to me. Not that I thought it would fail, but I also confirmed the pgaudit case works as expected. I also tested and everything looks good with the patch. I did a careful examination of the remaining diffs (there are quite a few) and in every case I consider them to be beneficial, i.e. they make the output more targeted and readable. I did not do a real code review, but I did notice that the test table column is called weird_name as in our tests. I would argue that since it is missing the quotes and space it is not really all that weird and should maybe get a normal name so developers in the future don't wonder what is weird about it. Thank you for this improvement and the quick fix! Regards, -David
Re: PG 18 release notes draft committed
Hi, On Tue, 20 May 2025 at 16:52, Bruce Momjian wrote: > > On Tue, May 20, 2025 at 03:46:44PM +0300, Nazir Bilal Yavuz wrote: > > Hi, > > > > Thanks for working on this! > > > > On Fri, 2 May 2025 at 05:44, Bruce Momjian wrote: > > > > > > I will continue improving it until beta 1, and until the final release. > > > I will probably add markup in 1-3 weeks. Let the feedback begin. ;-) > > > > + > > + > > +Add server variable file_copy_method to control the file copying > > method (Nazir Bilal Yavuz) > > +§ > > + > > Uh, the commit is: > > commit f78ca6f3ebb > Author: Thomas Munro > Date: Tue Apr 8 20:52:47 2025 +1200 > > Introduce file_copy_method setting. > > It can be set to either COPY (the default) or CLONE if the system > supports it. CLONE causes callers of copydir(), currently CREATE > DATABASE ... STRATEGY=FILE_COPY and ALTER DATABASE ... SET > TABLESPACE = > ..., to use copy_file_range (Linux, FreeBSD) or copyfile (macOS) > to copy > files instead of a read-write loop over the contents. > > CLONE gives the kernel the opportunity to share block ranges on > copy-on-write file systems and push copying down to storage on > others, > depending on configuration. On some systems CLONE can be used to > clone > large databases quickly with CREATE DATABASE ... TEMPLATE=source > STRATEGY=FILE_COPY. > > Other operating systems could be supported; patches welcome. > > Co-authored-by: Nazir Bilal Yavuz > Reviewed-by: Robert Haas > Reviewed-by: Ranier Vilela > Discussion: > https://postgr.es/m/CA%2BhUKGLM%2Bt%2BSwBU-cHeMUXJCOgBxSHLGZutV5zCwY4qrCcE02w%40mail.gmail.com > > As of the date of the commit, "Co-authored-by:" is listed as: > > https://wiki.postgresql.org/wiki/Commit_Message_Guidance > > "Co-authored-by:" is used by committers when they want to give full > credit > to the named individuals, but also indicate that they made significant > changes. > > > A minor fix; I co-authored this with Thomas Munro, he is the actual author. > > Uh, does this mean I should add Thomas Munro before or after your name, > or remove your name and list only Thomas Munro? Sorry for taking your time, I did not know that. Then, I am okay with how it is right now. -- Regards, Nazir Bilal Yavuz Microsoft
Re: PG 18 release notes draft committed
Hi, Thanks for working on this! On Fri, 2 May 2025 at 05:44, Bruce Momjian wrote: > > I will continue improving it until beta 1, and until the final release. > I will probably add markup in 1-3 weeks. Let the feedback begin. ;-) + + +Add server variable file_copy_method to control the file copying method (Nazir Bilal Yavuz) +§ + A minor fix; I co-authored this with Thomas Munro, he is the actual author. -- Regards, Nazir Bilal Yavuz Microsoft
Re: RFC: Logging plan of the running query
On Sat, Apr 5, 2025 at 3:14 PM Atsushi Torikoshi wrote: On Thu, Apr 3, 2025 at 11:10 PM Robert Haas wrote: Do we really need ExecProcNodeOriginal? Can we find some way to reuse ExecProcNodeReal instead of making the structure bigger? I also wanted to implement this without adding elements to PlanState if possible, but I haven't found a good solution, so the patch uses ExecSetExecProcNode. I tackled this again and the attached patch removes ExecProcNodeOriginal from Planstate. Instead of adding a new field, this version builds the behavior into the existing wrapper function, ExecProcNodeFirst(). Since ExecProcNodeFirst() is already handling instrumentation-related logic, the patch has maybe become a bit more complex to accommodate both that and the new behavior. While it might make sense to introduce a more general mechanism that allows for stacking an arbitrary number of wrappers around ExecProcNode, I’m not sure it's possible or worth the added complexity—such layered wrapping doesn't seem like something we typically need. What do you think? -- Regards, -- Atsushi Torikoshi Seconded from NTT DATA GROUP CORPORATION to SRA OSS K.K.From 41944eb943f8f6b2fb731125ed0d50ad29bbd338 Mon Sep 17 00:00:00 2001 From: Atsushi Torikoshi Date: Tue, 20 May 2025 22:01:40 +0900 Subject: [PATCH v45] Add function to log the plan of the currently running query Currently, we have to wait for the query execution to finish to know its plan either using EXPLAIN ANALYZE or auto_explain. This is not so convenient, for example when investigating long-running queries on production environments. To improve this situation, this patch adds pg_log_query_plan() function that requests to log the plan of the currently executing query. On receipt of the request, codes for logging plan is wrapped to every ExecProcNode under the current executing plan node, and when the executor runs one of ExecProcNode, the plan is actually logged. These wrappers are unwrapped when once the plan is logged. In this way, we can avoid adding the overhead which we'll face when adding CHECK_FOR_INTERRUPTS() like mechanisms in somewhere in executor codes safely. Our initial idea was to send a signal to the target backend process, which invokes EXPLAIN logic at the next CHECK_FOR_INTERRUPTS() call. However, we realized during prototyping that EXPLAIN is complex and may not be safely executed at arbitrary interrupt points. By default, only superusers are allowed to request to log the plans because allowing any users to issue this request at an unbounded rate would cause lots of log messages and which can lead to denial of service. 
--- contrib/auto_explain/auto_explain.c | 24 +- doc/src/sgml/func.sgml | 55 +++ src/backend/access/transam/xact.c| 13 + src/backend/catalog/system_functions.sql | 2 + src/backend/commands/Makefile| 1 + src/backend/commands/dynamic_explain.c | 387 +++ src/backend/commands/explain.c | 38 +- src/backend/commands/meson.build | 1 + src/backend/executor/execMain.c | 10 + src/backend/executor/execProcnode.c | 13 +- src/backend/storage/ipc/procsignal.c | 4 + src/backend/tcop/postgres.c | 4 + src/backend/utils/init/globals.c | 2 + src/include/catalog/pg_proc.dat | 6 + src/include/commands/dynamic_explain.h | 28 ++ src/include/commands/explain.h | 5 + src/include/commands/explain_state.h | 1 + src/include/executor/executor.h | 1 + src/include/miscadmin.h | 1 + src/include/storage/procsignal.h | 2 + src/include/tcop/pquery.h| 1 - src/test/regress/expected/misc_functions.out | 54 ++- src/test/regress/sql/misc_functions.sql | 41 +- 23 files changed, 648 insertions(+), 46 deletions(-) create mode 100644 src/backend/commands/dynamic_explain.c create mode 100644 src/include/commands/dynamic_explain.h diff --git a/contrib/auto_explain/auto_explain.c b/contrib/auto_explain/auto_explain.c index cd6625020a..e720ddf39f 100644 --- a/contrib/auto_explain/auto_explain.c +++ b/contrib/auto_explain/auto_explain.c @@ -15,6 +15,7 @@ #include #include "access/parallel.h" +#include "commands/dynamic_explain.h" #include "commands/explain.h" #include "commands/explain_format.h" #include "commands/explain_state.h" @@ -412,26 +413,9 @@ explain_ExecutorEnd(QueryDesc *queryDesc) es->format = auto_explain_log_format; es->settings = auto_explain_log_settings; - ExplainBeginOutput(es); - ExplainQueryText(es, queryDesc); - ExplainQueryParameters(es, queryDesc->params, auto_explain_log_parameter_max_length); - ExplainPrintPlan(es, queryDesc); - if (es->analyze && auto_explain_log_triggers) -ExplainPrintTriggers(es, queryDesc); - if (es->costs) -ExplainPrintJITSummary(es, quer
Re: Violation of principle that plan trees are read-only
Hi, On 2025-05-18 19:31:33 -0400, Tom Lane wrote: > While chasing down Valgrind leakage reports, I was disturbed > to realize that some of them arise from a case where the > executor scribbles on the plan tree it's given, which it is > absolutely not supposed to do: > > /* > * Initialize result tuple slot and assign its rowtype using the first > * RETURNING list. We assume the rest will look the same. > */ > mtstate->ps.plan->targetlist = (List *) linitial(returningLists); > > A bit of git archaeology fingers Andres' commit 4717fdb14, which we > can't easily revert since he later got rid of ExecAssignResultType > altogether. But I think we need to do something about it --- it's > purest luck that this doesn't cause serious problems in some cases. I have no idea what I was smoking at that time, this clearly is wrong. I think the reason it doesn't cause problems is that we're just using the first child's targetlist every time and we're just assigning the same value back every time. What's even more absurd is: Why do we even need to assign the result type at all? Before & after 4717fdb14. The planner ought to have already figured this out, no? It seems that if not, we'd have a problem anyway, who says the "calling" nodes (say a wCTE) can cope with whatever output we come up with? Except of course, there is exactly one case in our tests where the tupledescs aren't equal :( I've not dug fully into this, but I thought I should send this email to confirm that I'm looking into the issue. Greetings, Andres Freund
Re: Re: proposal: schema variables
On Thu, May 15, 2025 at 08:48:37AM +0200, Pavel Stehule wrote: > Hi > > fresh rebase I will again ask why this patch set is being reposted when there is no plan to apply it to git master? -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com Do not let urgent matters crowd out time for investment in the future.
Re: Please update the pgconf.dev Unconference notes
On 20/5/2025 08:49, Hannu Krosing wrote: Hi to all note-takers (I added two who I *think* I remember took notes) Please upload the notes to the Unconference section in https://wiki.postgresql.org/wiki/PGConf.dev_2025 I have also some random notes on scraps of paper from the two sessions I attended and did not present and would like to add these if I think something relevant is missing once the main notes are there I am glad to see that, in discussing pg_stat_statements, the community in Canada arrived at exactly the same set of issues as we did recently at PGConf.DE.2025. I recall we also concluded that moving it into core is too much, and discussed the idea of letting another module filter the queries/columns/length of the query string stored in pg_stat_statements through hooks - 'extend the extension'. -- regards, Andrei Lepikhov
Re: Prevent internal error at concurrent CREATE OR REPLACE FUNCTION
Hi! On 31.03.25 13:22, Yugo Nagata wrote: > On Mon, 31 Mar 2025 20:00:57 +0900 > Yugo Nagata wrote: > >> Hi, >> >> I found that multiple sessions concurrently execute CREATE OR REPLACE >> FUNCTION >> for a same function, the error "tuple concurrently updated" is raised. This >> is >> an internal error output by elog, also the message is not user-friendly. >> >> I've attached a patch to prevent this internal error by locking an exclusive >> lock before the command and get the read tuple after acquiring the lock. >> Also, if the function has been removed during the lock waiting, the new entry >> is created. > I also found the same error is raised when concurrent ALTER FUNCTION commands > are > executed. I've added a patch to fix this in the similar way. > > Regards, > Yugo Nagata I just briefly tested this patch and it seems to work as expected for CREATE OR REPLACE FUNCTION: -- Session 1 (t1): postgres=# BEGIN; BEGIN postgres=*# CREATE OR REPLACE FUNCTION f1() RETURNS INT LANGUAGE plpgsql AS $$ BEGIN RETURN 1; END;$$; CREATE FUNCTION -- Session 2 (t2) postgres=# CREATE OR REPLACE FUNCTION f1() RETURNS INT LANGUAGE plpgsql AS $$ BEGIN RETURN 2; END;$$; (wait) -- Session 3 (t3) postgres=# CREATE OR REPLACE FUNCTION f1() RETURNS INT LANGUAGE plpgsql AS $$ BEGIN RETURN 3; END;$$; (wait) -- Session 4 (t4) postgres=# CREATE OR REPLACE FUNCTION f1() RETURNS INT LANGUAGE plpgsql AS $$ BEGIN RETURN 4; END;$$; CREATE FUNCTION (wait) -- Session 1 (t5) postgres=*# END; COMMIT at this point Sessions 2, 3, and 4 were released with: CREATE FUNCTION -- Session 1 (t6) postgres=# \sf f1 CREATE OR REPLACE FUNCTION public.f1() RETURNS integer LANGUAGE plpgsql AS $function$ BEGIN RETURN 4; END;$function$ So... it no longer shows the error message: ERROR: tuple concurrently updated I did the same for ALTER FUNCTION but I was unable to reproduce the error you reported. Could you provide your script? Best regards, Jim
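In case it helps with the ALTER FUNCTION case, a minimal sketch of the kind of concurrent sequence the second patch seems to target might look like this (a guess on my part, not Yugo's actual script):

```
-- Session 1
BEGIN;
ALTER FUNCTION f1() COST 100;

-- Session 2 (before session 1 commits): this is where, per Yugo's report,
-- an unpatched server can raise "tuple concurrently updated"
ALTER FUNCTION f1() COST 200;

-- Session 1
COMMIT;
```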
Re: Add comment explaining why queryid is int64 in pg_stat_statements
Supporting unsigned INTs will be valuable for more than just this obviously, and if we do ever have that capability, we would likely want to go that route. I know I've been asked about why queryIds could be negative more than a few times. Until then, I think the patch being suggested is reasonable and cleaner. A few comments on the patches: 1/ this was missed in pg_stat_statements ./contrib/pg_stat_statements/pg_stat_statements.c:uint64 saved_queryId = pstmt->queryId; 2/ Should "DatumGetInt64(hash_any_extended(.." be turned into a macro since it's used in several places? Also, we can add a comment in the macro such as " Output the queryId as an int64 rather than a uint64, to match the BIGINT column used to emit queryId in system views " I think this comment is needed to address the confusion that is the original subject of this thread. Otherwise, the confusion will be moved from pg_stat_statements.c to queryjumblefuncs.c DoJumble(JumbleState *jstate, Node *node) { /* Jumble the given node */ @@ -208,9 +208,9 @@ DoJumble(JumbleState *jstate, Node *node) FlushPendingNulls(jstate); /* Process the jumble buffer and produce the hash value */ - return DatumGetUInt64(hash_any_extended(jstate->jumble, - jstate->jumble_len, - 0)); + return DatumGetInt64(hash_any_extended(jstate->jumble, + jstate->jumble_len, + 0)); } /* @@ -256,10 +256,10 @@ AppendJumbleInternal(JumbleState *jstate, const unsigned char *item, if (unlikely(jumble_len >= JUMBLE_SIZE)) { - uint64 start_hash; + int64 start_hash; - start_hash = DatumGetUInt64(hash_any_extended(jumble, - JUMBLE_SIZE, 0)); + start_hash = DatumGetInt64(hash_any_extended(jumble, + JUMBLE_SIZE, 0)); memcpy(jumble, &start_hash, sizeof(start_hash)); jumble_len = sizeof(start_hash); -- Sami Imseih Amazon Web Services (AWS)
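The user-facing side of the confusion Sami mentions is easy to see in the view itself (a sketch, assuming the pg_stat_statements extension is installed): because queryid is exposed as a signed bigint, roughly half of all hash values display as negative numbers.

```
SELECT queryid, calls, left(query, 40) AS query
FROM pg_stat_statements
ORDER BY calls DESC
LIMIT 3;
```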
Re: Re: proposal: schema variables
On Tue, May 20, 2025 at 11:56, Bruce Momjian wrote: > I will again ask why this patch set is being reposted when there is no > plan to apply it to git master Too bad. I would love to have this functionality; from the user's point of view there are problems it would solve wonderfully. I don't know technically what prevents it from being natively in core, but it would be great, it definitely would be. regards Marcos
Re: Capturing both IP address and hostname in the log
On Thu, Apr 10, 2025 at 12:00 PM Tom Lane wrote: > Melanie recently committed a patch (9219093ca) that purports to > generalize our log_connections logging ability: > > Convert the boolean log_connections GUC into a list GUC comprised of > the > connection aspects to log. > > This gives users more control over the volume and kind of connection > logging. > > The current log_connections options are 'receipt', 'authentication', > and > 'authorization'. The empty string disables all connection logging. > 'all' > enables all available connection logging. > > I wonder if it'd be reasonable to remove the separate log_hostname GUC > and fold it into this infrastructure, and while doing so make it > possible to log either or both of the client IP address and hostname. > (For that matter, I think there is interest in being able to capture > the server IP address too, cf 3516ea768. You might wish to log the > IP address only once, not in every log line.) Seems reasonable to me. I'd be willing to move such a thing forward but would want to see the feature requestors' specific desired behavior (e.g. an option for each of hostname, client ip address, and server address?). Also, if they write a WIP patch at least to config.sgml, it would also help me gauge how serious of a request it is or, rather, how satisfactory a solution a log_connections option is. Perhaps there are people who absolutely love log_hostname and don't want to see it deprecated as a separate GUC? I also think folding in log_disconnections as a "disconnection" option makes sense. I do have some anxiety that a very long list of options will anger users -- but I suppose that ship mostly sailed. - Melanie
Re: Violation of principle that plan trees are read-only
Hi, On 2025-05-20 13:04:46 -0400, Tom Lane wrote: > I wrote: > > I think we can just delete this assignment (and fix these comments). > > As attached. Largely makes sense, the only thing I see is that the !returningLists branch does: /* * We still must construct a dummy result tuple type, because InitPlan * expects one (maybe should change that?). */ mtstate->ps.plan->targetlist = NIL; which we presumably shouldn't do anymore either. It never changes anything afaict, but still. > I'm tempted to back-patch this: the plan tree damage seems harmless at > present, but maybe it'd become less harmless with future fixes. I am not sure, I can see arguments either way. There are *some* cases where this changes the explain output, but but the new output is more correct, I think: --- /tmp/a.out 2025-05-20 14:19:04.947945026 -0400 +++ /tmp/b.out 2025-05-20 14:19:12.878975702 -0400 @@ -18,7 +18,7 @@ QUERY PLAN - Update on public.part_abc - Output: part_abc_1.a, part_abc_1.b, part_abc_1.c + Output: a, b, c Update on public.part_abc_2 part_abc_1 -> Append Subplans Removed: 1 I suspect this is an argument for backpatching, not against - seems that deparsing could end up creating bogus output in cases where it could matter? Not sure if such cases are reachable via views (and thus pg_dump) or postgres_fdw, but it seems possible. Greetings, Andres Freund
Re: Violation of principle that plan trees are read-only
I wrote: > I think we can just delete this assignment (and fix these comments). As attached. I'm tempted to back-patch this: the plan tree damage seems harmless at present, but maybe it'd become less harmless with future fixes. regards, tom lane diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c index 46d533b7288..71bc174cfc9 100644 --- a/src/backend/executor/nodeModifyTable.c +++ b/src/backend/executor/nodeModifyTable.c @@ -4830,12 +4830,11 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags) ExprContext *econtext; /* - * Initialize result tuple slot and assign its rowtype using the first - * RETURNING list. We assume the rest will look the same. + * Initialize result tuple slot and assign its rowtype using the plan + * node's declared targetlist, which the planner set up to be the same + * as the first RETURNING list. We assume the rest will produce + * compatible output. */ - mtstate->ps.plan->targetlist = (List *) linitial(returningLists); - - /* Set up a slot for the output of the RETURNING projection(s) */ ExecInitResultTupleSlotTL(&mtstate->ps, &TTSOpsVirtual); slot = mtstate->ps.ps_ResultTupleSlot; diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c index 150e9f060ee..8e1eb77dd49 100644 --- a/src/backend/optimizer/plan/setrefs.c +++ b/src/backend/optimizer/plan/setrefs.c @@ -1097,9 +1097,10 @@ set_plan_refs(PlannerInfo *root, Plan *plan, int rtoffset) /* * Set up the visible plan targetlist as being the same as - * the first RETURNING list. This is for the use of - * EXPLAIN; the executor won't pay any attention to the - * targetlist. We postpone this step until here so that + * the first RETURNING list. This is mostly for the use + * of EXPLAIN; the executor won't execute that targetlist, + * although it does use it to prepare the node's result + * tuple slot. We postpone this step until here so that * we don't have to do set_returning_clause_references() * twice on identical targetlists. */
Re: Adding skip scan (including MDAM style range skip scan) to nbtree
On Mon, May 12, 2025 at 8:58 AM Peter Geoghegan wrote: > I wonder if we can fix this problem by getting rid of the old support > routine #5, "options". It currently doesn't do anything, and I always > thought it was strange that it was added "for uniformity" with other > index AMs. Attached patch completely removes the nbtree "options" support function, while changing the support function number of skip support: it becomes support function #5 (the number previously used by "options"). This patch should fix the regression that Tomas complained about in an expedient way. It's likely that somebody else will run into the same problem in the future, the next time that a new support function is needed. But I think that it makes sense to do this much now -- we need a short term solution for Postgres 18. Usually I would never suggest breaking compatibility like this, but, remarkably, we have never actually done anything with our current support function 5. It's not possible to break compatibility with code that can never be called in the first place, so I see no compatibility to preserve. Questions for Alexander about the "options" support function: * Why did you invent the whole idea of an "options" support function, given that it doesn't actually do anything? I get that it might be a good idea to add these kinds of functions in the future, but why didn't you wait until nbtree *actually had a use* for them? * I've removed some of the tests that you added, that (for whatever reason) cover nbtree specifically. The test from alter_generic.sql. There might be some kind of loss of test coverage. What do you think? -- Peter Geoghegan v1-0001-Remove-OPTIONS-support-proc-from-nbtree.patch Description: Binary data
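For context, the support function numbers that existing built-in opclasses actually register can be inspected directly in the catalogs; a rough sketch for the integer btree opfamily (standard catalogs, nothing patch-specific):

```
SELECT opf.opfname, ap.amprocnum, ap.amproc::regproc
FROM pg_amproc ap
JOIN pg_opfamily opf ON opf.oid = ap.amprocfamily
JOIN pg_am am ON am.oid = opf.opfmethod
WHERE am.amname = 'btree'
  AND ap.amproclefttype = 'integer'::regtype
ORDER BY ap.amprocnum;
```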
Re: Please update the pgconf.dev Unconference notes
Hi Hannu, On Tue, May 20, 2025 at 08:49:28AM +0200, Hannu Krosing wrote: > (I added two who I *think* I remember took notes) > > Please upload the notes to the Unconference section in > https://wiki.postgresql.org/wiki/PGConf.dev_2025 > > I have also some random notes on scraps of paper from the two sessions > I attended and did not present and would like to add these if I think > something relevant is missing once the main notes are there Thanks Hannu for the reminder. I will add my notes in a few days, along with a short translation of the slides I was not able to show during the "TOAST fixing" session. These were just structure suggestions for the last bit case in toast headers, with a few notes written at 5AM the morning before the unconference. I would also be really interested in getting a copy of the slides you have yourself produced on the matter for the session. Could you send me a PDF in private at least? Let's say that you have convinced me to help with a review of what you have planned, as far as I understand, for the v19 cycle on the matter with the OID -> TID lookup replacement logic for the external toast pointers. -- Michael
Re: fixing CREATEROLE
On Tue, May 20, 2025 at 2:32 PM Robert Haas wrote: > trying to create a role that already exists. At this point, my head is > already kind of exploding, because I thought we were pretty careful to > try to make it so that pg_dump output can be restored without error even > in the face of pre-existing objects like the public schema and > the plpgsql language, but apparently we haven't applied the same principle > to pg_dumpall.[1] > This has always been my understanding, even if we are not explicitly stating it anywhere. pg_dump -> no errors. pg_dumpall -> always at least one error :) But if, as you posit above, we were to try running the output of pg_dumpall > through psql as a non-superuser, the problem is a whole lot > worse. I'm of the camp that pg_dumpall should almost always be run as superuser. That said, I find myself using pg_dumpall less and less with every year, and cannot think of the last time I advised a client to use it (other than a pg_dumpall --globals and ignore the errors as a poor-man's role duplication system. Even that is getting rarer, as we generally don't want the same passwords) Cheers, Greg -- Crunchy Data - https://www.crunchydata.com Enterprise Postgres Software Products & Tech Support
Re: Regression in statement locations
On Tue, May 20, 2025 at 02:04:20PM +, David Steele wrote: > I did a careful examination of the remaining diffs (there are quite a few) > and in every case I consider them to be beneficial, i.e. they make the > output more targeted and readable. > > I did not do a real code review, but I did notice that the test table column > is called weird_name as in our tests. I would argue that since it is missing > the quotes and space it is not really all that weird and should maybe get a > normal name so developers in the future don't wonder what is weird about it. I have fixed that, as it is not a weird attribute, removed the unneeded DO blocks in the tests as pointed out by Anthonin, and moved one comment as pointed out by Jian. > Thank you for this improvement and the quick fix! Yeah, thanks all for pointing out that sometimes my analysis of things can go off track. The fix has been applied now on HEAD. I've also checked the state of pgaudit on branch dev-pg18, with the regression getting fixed. Things look clear now, at least from my side. -- Michael
Minor adjustment to pg_aios output naming
Hi, I've noticed a minor inconsistency in the output of pg_aios. According to the documentation, the values for 'operation' are described as: readv, a vectored read ... writev, a vectored write However, in actual output, they appear as read and write -- without the trailing v: =# select operation from pg_aios; .. operation | read While this discrepancy is unlikely to cause any real issues, it would be better to keep the naming consistent. I was a bit unsure which form to align to. There are currently no other types of read/write operations in pg_aios, so the shorter form might have been sufficient. However, using the 'v'-suffixed names makes the vectored nature of these operations explicit and future-proofs the naming in case other variants are introduced later. What do you think? Best regards, -- Regards, -- Atsushi Torikoshi Seconded from NTT DATA GROUP CORPORATION to SRA OSS K.K.From 71c5d0b8f1a67026e8db2e7ee27ce18f6e0689e0 Mon Sep 17 00:00:00 2001 From: Atsushi Torikoshi Date: Wed, 21 May 2025 10:50:53 +0900 Subject: [PATCH v1] Adjust pg_aios operation names to match documentation pg_aios output currently shows 'read' and 'write' for vectored I/O operations, while the documentation refers to them as 'readv' and 'writev'. This patch updates the output to use the 'v'-suffixed forms for consistency. --- src/backend/storage/aio/aio_io.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/src/backend/storage/aio/aio_io.c b/src/backend/storage/aio/aio_io.c index 00e176135a..520b5077df 100644 --- a/src/backend/storage/aio/aio_io.c +++ b/src/backend/storage/aio/aio_io.c @@ -181,9 +181,9 @@ pgaio_io_get_op_name(PgAioHandle *ioh) case PGAIO_OP_INVALID: return "invalid"; case PGAIO_OP_READV: - return "read"; + return "readv"; case PGAIO_OP_WRITEV: - return "write"; + return "writev"; } return NULL;/* silence compiler */ base-commit: a6060f1cbec39575634043baeeaeb11e86042fa6 -- 2.43.0
Re: Assert("vacrel->eager_scan_remaining_successes > 0")
On Tue, May 20, 2025 at 5:23 PM Masahiko Sawada wrote: > > On Fri, May 2, 2025 at 11:59 AM Masahiko Sawada > wrote: > > > > However, there is a possibility that we have already eagerly scanned > > another page and returned it to the read stream before we freeze the > > eagerly-scanned page and disable the eager scan. In this case, the > > next block that we retrieved from the read stream could also be an > > eagerly-scanned page. > > I've added it to Open Items for v18. > > If I understand correctly, there's a potential scenario where we might > eagerly scan more pages than permitted by the success and failure > caps. One question is: what is the practical magnitude of this excess > scanning? If the overflow could be substantial (for instance, eagerly > scanning 30% of table pages), we should consider revising our eager > scanning mechanism. Thanks for reporting this. Sorry I missed it initially. I need to do some more investigation, but IIUC you are saying that this is an interaction between the read stream and eager scan code? I tried to confirm that was the case by setting io_combine_limit and maintenance_io_concurrency to 1, which should be similar behavior to not using the read stream WRT read combining etc. But, even doing so, your repro tripped the assert. What made you suspect an interaction with the read stream? - Melanie
Re: Assert("vacrel->eager_scan_remaining_successes > 0")
On Tue, May 20, 2025 at 3:22 PM Melanie Plageman wrote: > > > On Tue, May 20, 2025 at 5:23 PM Masahiko Sawada wrote: >> >> >> On Fri, May 2, 2025 at 11:59 AM Masahiko Sawada >> wrote: >> > >> > However, there is a possibility that we have already eagerly scanned >> > another page and returned it to the read stream before we freeze the >> > eagerly-scanned page and disable the eager scan. In this case, the >> > next block that we retrieved from the read stream could also be an >> > eagerly-scanned page. >> >> I've added it to Open Items for v18. >> >> If I understand correctly, there's a potential scenario where we might >> eagerly scan more pages than permitted by the success and failure >> caps. One question is: what is the practical magnitude of this excess >> scanning? If the overflow could be substantial (for instance, eagerly >> scanning 30% of table pages), we should consider revising our eager >> scanning mechanism. > > > Thanks for reporting this. Sorry I missed it initially. > > I need to do some more investigation, but IIUC you are saying that this is an > interaction between the read stream and eager scan code? Yes. > I tried to confirm that was the case by setting io_combine_limit and > maintenance_io_concurrency to 1, which should be similar behavior to not > using the read stream WRT read combining etc. But, even doing so, your repro > tripped the assert. What made you suspect an interaction with the read stream? While I haven't identified how exactly the read stream is related to this issue, what I've observed through debugging is that, during a single read_stream_next_buffer() call, heap_vac_scan_next_block() is invoked multiple times to return blocks to the read stream, and that we continue processing eagerly scanned pages even after the success counter reaches zero while processing the previous block. Even when I set both io_combine_limit and maintenance_io_concurrency to 1, the debug logs showed that vacuum still performs eager scanning two blocks ahead of the current block being processed. Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
Re: Make wal_receiver_timeout configurable per subscription
On Tue, May 20, 2025 at 2:13 AM vignesh C wrote: > > On Tue, 20 May 2025 at 03:16, Michael Paquier wrote: > > > > On Mon, May 19, 2025 at 11:19:48AM -0400, Robert Haas wrote: > > > The advantage of Fujii-san's proposal is that it is very simple to > > > implement. A subscription option would indeed be better, but it would > > > also be considerably more complex. Why not start simple and if someone > > > wants to do the work to add something more complicated, that is fine? > > > > Logically, adding that as an option of CREATE SUBSCRIPTION would just > > be a duplication of what a connection strings are already able to do > > with "options='-c foo=fooval'", isn't it? I think there is a difference in the point that Vignesh made below; the worker can detect wal_receiver_timeout change and restart. > > > It seems to me that the issue of downgrading wal_receiver_timeout to > > become user-settable is if we're OK to allow non-superusers play with > > it in the code path where it's used currently. Knowing that physical > > WAL receivers are only spawned in a controlled manner by the startup > > process, this does not sound like an issue. > > If we set the wal_receiver_timeout configuration using ALTER ROLE for > the subscription owner's role, the apply worker will start with that > value. However, any changes made via ALTER ROLE ... SET > wal_receiver_timeout will not take effect for an already running apply > worker unless the subscription is disabled and re-enabled. In > contrast, this is handled automatically during CREATE SUBSCRIPTION, > where parameter changes are detected. Right. But given changing wal_receiver_timeout doesn't happen frequently in practice I guess this would not be a big downside of the proposed idea. Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
Re: Add comment explaining why queryid is int64 in pg_stat_statements
On Tue, May 20, 2025 at 11:28:17PM +1200, David Rowley wrote: > Certainly, a bit late, yes. It basically requires implementing > unsigned types on the SQL level. I believe there are a few sunken > ships along that coastline, and plenty of history in the archives if > you want to understand some of the difficulties. To provide some more context and information beyond this reply, the latest thread that I can find on the matter is this one, from December 2024: https://www.postgresql.org/message-id/CAN3gO4sPBKbfWYK10i294u3kzsfDb4WX891FMbjLnKjMS08u7A%40mail.gmail.com It summarizes the situation nicely and contains some more general points. This particular point from Tom about numeric promotion and casting hierarchy resonates as a very good one: https://www.postgresql.org/message-id/3180774.1733588677%40sss.pgh.pa.us My own bet is that, if we don't want to lose any existing functionality, perhaps we should just be more aggressive with casting requirements on input to begin with, even if it means more work when writing queries and comes at some loss of usability. FWIW, I've wanted an unsigned in-core type more than once. It would be a lot of work, but we have a lot of things that exist in the core code that map to unsigned 32b or 64b integer values. Just naming one, WAL LSN difference calculations. XLogRecPtr is a 64b unsigned value, but its representation at SQL level is not, meaning that we'd overflow once the sign bit is reached. It's true that a system would need to live long enough to reach such LSNs, but we also have 32b integers plugged in here and there. Another one is block numbers, or OID, which is an in-core type. Having an in-core unsigned integer type would scale better than creating types mapping to every single internal core concept. -- Michael signature.asc Description: PGP signature
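For illustration, the signedness problem is easy to see at the SQL level: a 64-bit pattern whose high bit is set has no positive bigint representation, so it shows up as a negative number once reinterpreted as signed. A minimal sketch:

```sql
-- Minimal sketch: reinterpreting 64-bit patterns as signed bigint.
SELECT x'FFFFFFFFFFFFFFFF'::bit(64)::bigint;  -- -1
SELECT x'8000000000000000'::bit(64)::bigint;  -- -9223372036854775808
-- A query ID computed as uint64 can therefore appear negative when exposed
-- through a signed bigint column such as pg_stat_statements.queryid.
```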
Re: Add SPLIT PARTITION/MERGE PARTITIONS commands
On Mon, May 12, 2025 at 4:31 PM Dmitry Koval wrote: > > Hi! > > We (with colleagues) discussed further improvements to SPLIT/MERGE > PARTITION(S). As a result of the discussion, the following answers to > the questions remained: > > 1. Who should be the owner of new partitions (SPLIT PARTITION command) > if owner of the partition being split is not the same as the current user? > a) current user (since he is the one who creates new tables); > b) the owner of the partitioned partition (since it is the owner of the > table and should continue to own it). Per https://www.postgresql.org/docs/current/sql-altertable.html, "You must own the table to use ALTER TABLE." That means the current user must own the partition being split, and the new partitions' owner should be the current user (the one who did the ALTER TABLE). > 2. Who should be the owner of the new partition (MERGE PARTITIONS > command) if the partitions being merged have different owners? > a) current user (since he is the one who creates new table); > b) merging of partitions should be forbidden if they have different owners. > Let's say: alter table whatever_range merge partitions ( whatever_range_c, whatever_range_de ) into whatever_range_cde; In this case, the current user must own whatever_range_c and whatever_range_de to perform this operation, and the current user will be the owner of whatever_range_cde.
Re: queryId constant squashing does not support prepared statements
> At the same time AFAICT there isn't much more code paths > to worry about in case of a LocationExpr as a node I can imagine there are others like value expressions, row expressions, json array expressions, etc. that we may want to also normalize. > I believe it's worth to not only to keep amount of work to support > LocationExpr as minimal as possible, but also impact on the existing > code. What I see as a problem is keeping such specific information as > the location boundaries in such a generic expression as A_Expr, where it > will almost never be used. Do I get it right, you folks are ok with > that? There are other examples of fields that are minimally used in other structs. Here is one I randomly spotted in SelectStmt such as SortClause, Limit options, etc. I think the IN list is quite a common case, otherwise we would not care as much as we do. Adding another 8 bytes to the structs does not seem like that big of a problem to me, especially since the structs will remain below 64 bytes. ``` (gdb) ptype /o A_Expr type = struct A_Expr { /* 0 | 4 */NodeTag type; /* 4 | 4 */A_Expr_Kind kind; /* 8 | 8 */List *name; /* 16 | 8 */Node *lexpr; /* 24 | 8 */Node *rexpr; /* 32 | 4 */ParseLoc location; /* XXX 4-byte padding */ /* total size (bytes): 40 */ } (gdb) ptype \o A_ArrayExpr Invalid character '\' in expression. (gdb) ptype /o A_ArrayExpr type = struct A_ArrayExpr { /* 0 | 4 */NodeTag type; /* XXX 4-byte hole */ /* 8 | 8 */List *elements; /* 16 | 4 */ParseLoc location; /* XXX 4-byte padding */ /* total size (bytes): 24 */ } ``` In general, making something like T_LocationExpr a query node seems totally wrong to me. It's not a node, but rather a temporary wrapper of some location information, and it does not seem to have any business being used by the time we get to the expression transformations. It seems very odd considering that location information consists of simple fields in the parse node itself. >I was a bit worried about not using a Node but Sami has reminded me > last week that we already have in gram.y the concept of using some > private structures to track intermediate results done by the parsing Attached is a sketch of what I mean. There is a private struct that tracks the list boundaries and this can wrap in_expr or whatever else makes sense in the future. +typedef struct ListWithBoundary +{ + Node *expr; + ParseLocstart; + ParseLocend; +} ListWithBoundary; + /* ConstraintAttributeSpec yields an integer bitmask of these flags: */ #define CAS_NOT_DEFERRABLE 0x01 #define CAS_DEFERRABLE 0x02 @@ -269,6 +276,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query); struct KeyAction *keyaction; ReturningClause *retclause; ReturningOptionKind retoptionkind; + struct ListWithBoundary *listwithboundary; } +%type in_expr The values are then added to start_location/end_location ParseLoc in A_ArrayExpr and A_Expr. Doing it this way will keep changes to the parse_expr.c code to a minimum; only the IN transformation will need to set the values of the A_Expr into the final A_ArrayExpr.
-- Sami Imseih Amazon Web Services (AWS) diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y index 0b5652071d1..bec24aab720 100644 --- a/src/backend/parser/gram.y +++ b/src/backend/parser/gram.y @@ -136,6 +136,13 @@ typedef struct KeyActions KeyAction *deleteAction; } KeyActions; +typedef struct ListWithBoundary +{ + Node *expr; + ParseLocstart; + ParseLocend; +} ListWithBoundary; + /* ConstraintAttributeSpec yields an integer bitmask of these flags: */ #define CAS_NOT_DEFERRABLE 0x01 #define CAS_DEFERRABLE 0x02 @@ -269,6 +276,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query); struct KeyAction *keyaction; ReturningClause *retclause; ReturningOptionKind retoptionkind; + struct ListWithBoundary *listwithboundary; } %typestmt toplevel_stmt schema_stmt routine_body_stmt @@ -523,8 +531,9 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query); %type def_elem reloption_elem old_aggr_elem operator_def_elem %typedef_arg columnElem where_clause where_or_current_clause a_expr b_expr c_expr Aexp
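For readers unfamiliar with the squashing work being discussed, here is a minimal sketch of the target case (illustrative only; the exact normalized text produced by pg_stat_statements may differ):

```sql
-- A long IN list of constants...
SELECT * FROM orders WHERE status IN ('new', 'paid', 'shipped', 'cancelled');
-- ...which constant squashing aims to record as a single squashed placeholder,
-- roughly: SELECT * FROM orders WHERE status IN (...)
-- rather than one parameter symbol per constant ($1, $2, $3, $4).
```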
Re: Re: proposal: schema variables
On Tue, May 20, 2025 at 10:28:31PM +0200, Pavel Stehule wrote: > This topic is difficult, because there is no common solution. SQL/PSM is > almost dead. T-SQL (and MySQL) design is weak and cannot be used for > security. > Oracle's design is joined with just one environment. And although almost > all widely used databases have supported session variables for decades, no > one design > is perfect. Proposed design is not perfect too (it introduces possible > ambiguity) , but I think it can support most wanted use cases (can be > enhanced in future), > and it is consistent with Postgres. There are more ways to reduce risk of > unwanted ambiguity to zero. But it increases the size of the patch. One thing that I keep hearing about this feature is that this would be really helpful for migration from Oracle to PostgreSQL, helping a lot with rewrites of pl/pgsql functions. There is one page on the wiki about private variables, dating back to 2016: https://wiki.postgresql.org/wiki/CREATE_PRIVATE_VARIABLE Perhaps it would help to summarize the pros and cons of existing implementations on a wiki page, to help decide which implementation would be the most suitable? The way standards are defined is by overwriting existing standards, and we don't have one in the SQL specification. It's hard to say if there will be one at some point, but if the main SQL products around the world have one, it pretty much is a point in favor of having a standard. Another possible angle that could be taken is to invest in a proposal for the SQL committee to consider, forcing an actual standard that PostgreSQL could rely on rather than implementing one behavior only to have it become standard-incompatible a few years later. And luckily, we have Vik Fearing and Peter Eisentraut in this community who invest time and resources in this area. FWIW, I do agree with the opinion that if you want to propose rebased versions of this patch set on a periodic basis, you are free to do so. This is the core concept of an open community. In terms of my committer time, I'm pretty much already booked in terms of features planned for the next release, so I guess that I won't be able to invest time on this thread. Just saying. I know that this patch set has been discussed at FOSDEM at some point, so others may be able to comment more about that. That's just one opinion among many others. -- Michael signature.asc Description: PGP signature
Addition of %b/backend_type in log_line_prefix of TAP test logs
Hi all, While navigating through the logs in the CI, the buildfarm, or even my own machine, I've found myself spending a lot of time looking at very specific log entries to find out the PID of a specific process, sometimes mistaking a normal backend vs a logical WAL sender while looking for a PID, or just looking for an auxiliary process. I'd like to suggest the following change in Cluster.pm: - print $conf "log_line_prefix = '%m [%p] %q%a '\n"; + print $conf "log_line_prefix = '%m [%b,%p] %q%a '\n"; It is then possible to match a backend_type with a PID. That increases the quantity of the logs, of course, but I'm finding that really helpful to have. But perhaps it's only me? Thanks, -- Michael diff --git a/src/test/perl/PostgreSQL/Test/Cluster.pm b/src/test/perl/PostgreSQL/Test/Cluster.pm index 1c11750ac1d0..3bbd4a3f7548 100644 --- a/src/test/perl/PostgreSQL/Test/Cluster.pm +++ b/src/test/perl/PostgreSQL/Test/Cluster.pm @@ -684,7 +684,7 @@ sub init print $conf "\n# Added by PostgreSQL::Test::Cluster.pm\n"; print $conf "fsync = off\n"; print $conf "restart_after_crash = off\n"; - print $conf "log_line_prefix = '%m [%p] %q%a '\n"; + print $conf "log_line_prefix = '%m [%b,%p] %q%a '\n"; print $conf "log_statement = all\n"; print $conf "log_replication_commands = on\n"; print $conf "wal_retrieve_retry_interval = '500ms'\n"; signature.asc Description: PGP signature
Re: POC: Parallel processing of indexes in autovacuum
On Thu, May 15, 2025 at 10:10 PM Daniil Davydov <3daniss...@gmail.com> wrote: > > Hi, > > On Fri, May 16, 2025 at 4:06 AM Matheus Alcantara > wrote: > > I've reviewed the v1-0001 patch, the build on MacOS using meson+ninja is > > failing: > > ❯❯❯ ninja -C build install > > ninja: Entering directory `build' > > [1/126] Compiling C object > > src/backend/postgres_lib.a.p/utils_misc_guc_tables.c.o > > FAILED: src/backend/postgres_lib.a.p/utils_misc_guc_tables.c.o > > ../src/backend/utils/misc/guc_tables.c:3613:4: error: incompatible > > pointer to integer conversion initializing 'int' with an expression of > > type 'void *' [-Wint-conversion] > > 3613 | NULL, > > | ^~~~ > > > > Thank you for reviewing this patch! > > > It seems that the "autovacuum_reserved_workers_num" declaration on > > guc_tables.c has an extra gettext_noop() call? > > Good catch, I fixed this warning in the v2 version. > > > > > One other point is that as you've added TAP tests for the autovacuum I > > think you also need to create a meson.build file as you already create > > the Makefile. > > > > You also need to update the src/test/modules/meson.build and > > src/test/modules/Makefile to include the new test/modules/autovacuum > > path. > > > > OK, I should clarify this moment : modules/autovacuum is not a normal > test but a sandbox - just an example of how we can trigger parallel > index autovacuum. Also it may be used for debugging purposes. > In fact, 001_autovac_parallel.pl is not verifying anything. > I'll do as you asked (add all meson and Make stuff), but please don't > focus on it. The creation of the real test is still in progress. (I'll > try to complete it as soon as possible). > > In this letter I will divide the patch into 2 parts : implementation > and sandbox. What do you think about implementation? Thank you for updating the patches. I have some comments on v2-0001 patch: + { + {"autovacuum_reserved_workers_num", PGC_USERSET, RESOURCES_WORKER_PROCESSES, + gettext_noop("Number of worker processes, reserved for participation in parallel index processing during autovacuum."), + gettext_noop("This parameter is depending on \"max_worker_processes\" (not on \"autovacuum_max_workers\"). " +"*Only* autovacuum workers can use these additional processes. " +"Also, these processes are taken into account in \"max_parallel_workers\"."), + }, + &av_reserved_workers_num, + 0, 0, MAX_BACKENDS, + check_autovacuum_reserved_workers_num, NULL, NULL + }, I find that the name "autovacuum_reserved_workers_num" is generic. It would be better to have a more specific name for parallel vacuum such as autovacuum_max_parallel_workers. This parameter is related to neither autovacuum_worker_slots nor autovacuum_max_workers, which seems fine to me. Also, max_parallel_maintenance_workers doesn't affect this parameter. Which number does this parameter mean to specify: the maximum number of parallel vacuum workers that can be used during autovacuum or the maximum number of parallel vacuum workers that each autovacuum can use? --- The patch includes the changes to bgworker.c so that we can reserve some slots for autovacuums. I guess that this change is not necessarily necessary because if the user sets the related GUC parameters correctly the autovacuum workers can use parallel vacuum as expected. Even if we need this change, I would suggest implementing it as a separate patch. 
--- + { + { + "parallel_idx_autovac_enabled", + "Allows autovacuum to process indexes of this table in parallel mode", + RELOPT_KIND_HEAP, + ShareUpdateExclusiveLock + }, + false + }, The proposed reloption name doesn't align with our naming conventions: looking at our existing reloptions, we typically write out full words rather than using abbreviations such as 'autovac' or 'idx'. I guess we can implement this parameter as an integer parameter so that the user can specify the number of parallel vacuum workers for the table. For example, we can have a reloption autovacuum_parallel_workers. Setting 0 (by default) means to disable parallel vacuum during autovacuum, and setting the special value -1 means to let PostgreSQL calculate the parallel degree for the table (same as the default VACUUM command behavior). I've also considered some alternative names. If we were to use parallel_maintenance_workers, it sounds like it controls the parallel degree for all operations using max_parallel_maintenance_workers, including CREATE INDEX. Similarly, vacuum_parallel_workers could be interpreted as affecting both autovacuum and manual VACUUM commands, suggesting that when users run "VACUUM (PARALLEL) t", the sys
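To make the naming proposals above concrete, here is a hypothetical usage sketch; neither the autovacuum_parallel_workers reloption nor the autovacuum_max_parallel_workers GUC exists yet, these are only the names suggested in this review:

```sql
-- Hypothetical reloption, per the suggestion above:
ALTER TABLE big_table SET (autovacuum_parallel_workers = 4);   -- fixed parallel degree
ALTER TABLE big_table SET (autovacuum_parallel_workers = -1);  -- let the server decide
ALTER TABLE big_table SET (autovacuum_parallel_workers = 0);   -- disabled (default)

-- Hypothetical GUC capping the parallel vacuum workers that autovacuum may use in total:
ALTER SYSTEM SET autovacuum_max_parallel_workers = 8;
```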
Re: Please update the pgconf.dev Unconference notes
Thanks a lot! On Tue, May 20, 2025 at 9:42 AM Tomas Vondra wrote: > > On 5/20/25 08:49, Hannu Krosing wrote: > > Hi to all note-takers > > > > (I added two who I *think* I remember took notes) > > > > Please upload the notes to the Unconference section in > > https://wiki.postgresql.org/wiki/PGConf.dev_2025 > > > > I have also some random notes on scraps of paper from the two sessions > > I attended and did not present and would like to add these if I think > > something relevant is missing once the main notes are there > > > > I only took notes during the "better logging" session, and I already > posted those to the wiki. The discussion moved too quickly in different > directions for me to catch all the details, so the notes have gaps etc. > If others can improve that / clarify, that'd be great. > > > regards > > -- > Tomas Vondra >
Re: Addition of %b/backend_type in log_line_prefix of TAP test logs
On Wed, May 21, 2025 at 8:51 AM Michael Paquier wrote: > > Hi all, > > While navigating through the logs in the CI, the buildfarm, or even my > own machine, I've found myself spending a lot of time looking at > very specific log entries to find out the PID of a specific process, > sometimes mistaking a normal backend vs a logical WAL sender while > looking for a PID, or just looking for an auxiliary process. > > I'd like to suggest the following change in Cluster.pm: > - print $conf "log_line_prefix = '%m [%p] %q%a '\n"; > + print $conf "log_line_prefix = '%m [%b,%p] %q%a '\n"; +1 to this change. Since pg_regress uses log_line_prefix = '%m %b[%p] %q%a ', isn't it better to use the same setting in TAP tests as well? Regards, -- Fujii Masao
Re: Minor adjustment to pg_aios output naming
On Wed, May 21, 2025 at 11:14:11AM +0900, torikoshia wrote: > Hi, > > I've noticed a minor inconsistency in the output of pg_aios. > > According to the documentation, the values for 'operation' are described as: > > readv, a vectored read > ... > writev, a vectored write > > However, in actual output, they appear as read and write -- without the > trailing v: > > =# select operation from pg_aios; > .. > operation | read > > While this discrepancy is unlikely to cause any real issues, it would be > better to keep the naming consistent. > > I was a bit unsure which form to align to. > There are currently no other types of read/write operations in pg_aios, so > the shorter form might have been sufficient. > However, using the 'v'-suffixed names makes the vectored nature of these > operations explicit and future-proofs > the naming in case other variants are introduced later. > > What do you think? --- a/src/backend/storage/aio/aio_io.c +++ b/src/backend/storage/aio/aio_io.c @@ -181,9 +181,9 @@ pgaio_io_get_op_name(PgAioHandle *ioh) case PGAIO_OP_INVALID: return "invalid"; case PGAIO_OP_READV: -return "read"; +return "readv"; case PGAIO_OP_WRITEV: -return "write"; +return "writev"; } I think that your suggested fix is right. The properties are marked as "WRITEV" and "READV" for vectored operations. So the documentation is right, not the name used in the code. Will fix, thanks for the report. -- Michael signature.asc Description: PGP signature
Re: Addition of %b/backend_type in log_line_prefix of TAP test logs
On Wed, May 21, 2025 at 11:41:18AM +0900, Fujii Masao wrote: > Since pg_regress uses log_line_prefix = '%m %b[%p] %q%a ', > isn't it better to use the same setting in TAP tests as well? Oh, right, good point. I've managed to miss this part in pg_regress.c. Using the same value makes even more sense. -- Michael signature.asc Description: PGP signature
Re: [BUG] Skipped initialization of some xl_xact_parsed_prepare fields
On Thu, May 15, 2025 at 8:30 PM Fujii Masao wrote: > > I suppose to add initialization of `nstats` and `nabortstats` to > > ParsePrepareRecord (see attached patch). > > LGTM. Barring any objection, I will commit this patch. I've pushed the patch. Thanks! Regards, -- Fujii Masao
Re: domain over virtual generated column
On Mon, Apr 28, 2025 at 10:45 AM jian he wrote: > > summary of attached patch: > v1-0001 > we need to compute the generation expression for the domain with constraints, > thus rename ExecComputeStoredGenerated to ExecComputeGenerated. > > v1-0002 > soft error variant of ExecPrepareExpr, ExecInitExpr. for soft error > processing > of CoerceToDomain. we don't want error messages like "value for domain d2 > violates check constraint "d2_check"" while validating existing domain data, > we > want something like: > ERROR: column "b" of table "gtest24" contains values that violate the > new constraint > > v1-0003 virtual generation columns over domain. hi. new patches attached. mainly code refactoring, also try to reduce some unnecessary regress tests. From ffd307605c1cfadd5a28090f90d54dceb75e6bc1 Mon Sep 17 00:00:00 2001 From: jian he Date: Wed, 21 May 2025 10:59:07 +0800 Subject: [PATCH v2 1/3] rename ExecComputeStoredGenerated to ExecComputeGenerated to support virtual generated column over domain type, we actually need compute the virtual generated expression. we did it at ExecComputeGenerated for stored, we can use it for virtual. so do the rename. discussion: https://postgr.es/m/CACJufxFNUHxuSE=g20c2alo3d+4t_j9h0x8esmirzcc4frl...@mail.gmail.com --- src/backend/commands/copyfrom.c| 4 ++-- src/backend/executor/execReplication.c | 8 src/backend/executor/nodeModifyTable.c | 20 ++-- src/include/executor/nodeModifyTable.h | 5 ++--- 4 files changed, 18 insertions(+), 19 deletions(-) diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c index fbbbc09a97b..906b6581e11 100644 --- a/src/backend/commands/copyfrom.c +++ b/src/backend/commands/copyfrom.c @@ -1346,8 +1346,8 @@ CopyFrom(CopyFromState cstate) /* Compute stored generated columns */ if (resultRelInfo->ri_RelationDesc->rd_att->constr && resultRelInfo->ri_RelationDesc->rd_att->constr->has_generated_stored) - ExecComputeStoredGenerated(resultRelInfo, estate, myslot, - CMD_INSERT); + ExecComputeGenerated(resultRelInfo, estate, myslot, + CMD_INSERT); /* * If the target is a plain table, check the constraints of diff --git a/src/backend/executor/execReplication.c b/src/backend/executor/execReplication.c index 53ddd25c42d..ca300ac0f00 100644 --- a/src/backend/executor/execReplication.c +++ b/src/backend/executor/execReplication.c @@ -587,8 +587,8 @@ ExecSimpleRelationInsert(ResultRelInfo *resultRelInfo, /* Compute stored generated columns */ if (rel->rd_att->constr && rel->rd_att->constr->has_generated_stored) - ExecComputeStoredGenerated(resultRelInfo, estate, slot, - CMD_INSERT); + ExecComputeGenerated(resultRelInfo, estate, slot, + CMD_INSERT); /* Check the constraints of the tuple */ if (rel->rd_att->constr) @@ -684,8 +684,8 @@ ExecSimpleRelationUpdate(ResultRelInfo *resultRelInfo, /* Compute stored generated columns */ if (rel->rd_att->constr && rel->rd_att->constr->has_generated_stored) - ExecComputeStoredGenerated(resultRelInfo, estate, slot, - CMD_UPDATE); + ExecComputeGenerated(resultRelInfo, estate, slot, + CMD_UPDATE); /* Check the constraints of the tuple */ if (rel->rd_att->constr) diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c index 46d533b7288..ab41ad8a8bc 100644 --- a/src/backend/executor/nodeModifyTable.c +++ b/src/backend/executor/nodeModifyTable.c @@ -537,12 +537,12 @@ ExecInitGenerated(ResultRelInfo *resultRelInfo, } /* - * Compute stored generated columns for a tuple + * Compute generated columns for a tuple. 
+ * we might support virtual generated column in future, currently not. */ void -ExecComputeStoredGenerated(ResultRelInfo *resultRelInfo, - EState *estate, TupleTableSlot *slot, - CmdType cmdtype) +ExecComputeGenerated(ResultRelInfo *resultRelInfo, EState *estate, + TupleTableSlot *slot, CmdType cmdtype) { Relation rel = resultRelInfo->ri_RelationDesc; TupleDesc tupdesc = RelationGetDescr(rel); @@ -931,8 +931,8 @@ ExecInsert(ModifyTableContext *context, */ if (resultRelationDesc->rd_att->constr && resultRelationDesc->rd_att->constr->has_generated_stored) - ExecComputeStoredGenerated(resultRelInfo, estate, slot, - CMD_INSERT); + ExecComputeGenerated(resultRelInfo, estate, slot, + CMD_INSERT); /* * If the FDW supports batching, and batching is requested, accumulate @@ -1058,8 +1058,8 @@ ExecInsert(ModifyTableContext *context, */ if (resultRelationDesc->rd_att->constr && resultRelationDesc->rd_att->constr->has_generated_stored) - ExecComputeStoredGenerated(resultRelInfo, estate, slot, - CMD_INSERT); + ExecComputeGenerated(resultRelInfo, estate, slot, + CMD_INSERT); /* * Check any RLS WITH CHECK policies. @@ -2146,8 +2146,8 @@ ExecUpdatePrepareSlot(ResultRelInfo *re
Re: Assertion failure in smgr.c when using pg_prewarm with partitioned tables
On Mon, May 19, 2025 at 5:18 PM Masahiro Ikeda wrote: > > Thanks for your work and feedback! > > I've updated the patches and added regular regression tests for > both pg_prewarm and amcheck. Thanks for updating the patches! Regarding the 0001 patch: +CREATE TABLE test_part1 PARTITION OF test FOR VALUES FROM (1) TO (1000); +INSERT INTO test SELECT generate_series(1, 100); These lines don't seem necessary for the test. How about removing them? It would simplify the test and slightly reduce the execution time - though the time savings are very minimal. +-- To specify the relation which does not have storage should fail. This comment could be clearer as: "pg_prewarm() should fail if the target relation has no storage." + /* Check that the storage exists. */ Maybe rephrase to: "Check that the relation has storage."? Regarding the 0002 patch: - errdetail("Relation \"%s\" is a %s index.", -RelationGetRelationName(rel), NameStr(((Form_pg_am) GETSTRUCT(amtuprel))->amname; + errdetail("Relation \"%s\" is a %s %sindex.", +RelationGetRelationName(rel), NameStr(((Form_pg_am) GETSTRUCT(amtuprel))->amname), +(rel->rd_rel->relkind == RELKIND_PARTITIONED_INDEX) ? "partitioned " : ""))); Instead of manually building the message, how about using errdetail_relkind_not_supported()? It would be more consistent with similar error reporting elsewhere. > I agree with back-patching v3-0001. I was able to reproduce the issue > on the REL_17_STABLE branch. Thanks for the confirmation. > One concern is that this patch changes > the error message in production: > > * v17.5 (without --enable-cassert) > > ERROR: fork "main" does not exist for this relation > > * REL_17_STABLE with the v3-0001 patch (without --enable-cassert) > > ERROR: relation "test" does not have storage > > However, I think preventing the assertion failure should take priority. Yes. Regards, -- Fujii Masao
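As a minimal sketch of the case under discussion (table names are illustrative, not the exact regression test):

```sql
-- A partitioned table has no storage of its own, so prewarming it should
-- raise an error rather than trip an assertion.
CREATE EXTENSION IF NOT EXISTS pg_prewarm;
CREATE TABLE test (a int) PARTITION BY RANGE (a);
SELECT pg_prewarm('test');  -- expected: ERROR, the relation has no storage
```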
Re: Add comment explaining why queryid is int64 in pg_stat_statements
On Tue, May 20, 2025 at 10:35:39AM -0500, Sami Imseih wrote: > 1/ this was missed in pg_stat_statements > ./contrib/pg_stat_statements/pg_stat_statements.c:uint64 > saved_queryId = pstmt->queryId; Right. Some greps based on "queryid" and "query_id" point that this is the last reference to uint in the tree. > 2/ Should "DatumGetInt64(hash_any_extended(.." be turned into a > macro since it's used in > several places? Also, we can add a comment in the macro such as > " > Output the queryId as an int64 rather than a uint64, to match the > BIGINT column used to emit > queryId in system views > " We are talking about two places in queryjumblefuncs.c. Leaving the code as in David's original proposal is fine IMO. Adding a comment about the implied uint64 -> int64 conversion when calling hash_any_extended() sounds like a good idea to document what we want from the hash. I've had a closer look at David's patch, and I did not spot other places that required similar adjustments. I may have missed something, of course. David, how would you like to move on with this patch set? I can take responsibility for both patches as the plan ID change can qualify as an open item. -- Michael From 64df69093c52a17287aabb8551e8ab2b62d1b16a Mon Sep 17 00:00:00 2001 From: David Rowley Date: Tue, 20 May 2025 12:43:18 +1200 Subject: [PATCH v2 1/2] Change internal queryid type to int64 Previously, this was uint64, which perhaps was chosen in cff440d36 as the previous type was uint32 prior to that widening. Having this as uint64 does require us to have to cast the value in various places since the value is output in its signed form in various places, such as EXPLAIN VERBOSE and in the pg_stat_statements.queryid column. We should likely note down this change in the release notes "Source Code" section as some extensions may wish to adjust their code. Discussion: https://postgr.es/m/50cb0c8b-994b-48f9-a1c4-13039eb35...@eisentraut.org --- src/include/nodes/parsenodes.h| 2 +- src/include/nodes/plannodes.h | 2 +- src/include/utils/backend_status.h| 6 +-- src/backend/access/brin/brin.c| 2 +- src/backend/access/nbtree/nbtsort.c | 2 +- src/backend/commands/explain.c| 8 +--- src/backend/commands/vacuumparallel.c | 2 +- src/backend/nodes/gen_node_support.pl | 5 +++ src/backend/nodes/outfuncs.c | 6 +++ src/backend/nodes/queryjumblefuncs.c | 28 -- src/backend/nodes/readfuncs.c | 6 +++ src/backend/rewrite/rewriteHandler.c | 2 +- src/backend/tcop/postgres.c | 4 +- src/backend/utils/activity/backend_status.c | 12 +++--- src/backend/utils/adt/pgstatfuncs.c | 4 +- .../pg_stat_statements/pg_stat_statements.c | 38 +-- 16 files changed, 73 insertions(+), 56 deletions(-) diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h index 4610fc61293b..b976a40a75d6 100644 --- a/src/include/nodes/parsenodes.h +++ b/src/include/nodes/parsenodes.h @@ -128,7 +128,7 @@ typedef struct Query * might not be set; also not stored. This is the result of the query * jumble, hence ignored. */ - uint64 queryId pg_node_attr(equal_ignore, query_jumble_ignore, read_write_ignore, read_as(0)); + int64 queryId pg_node_attr(equal_ignore, query_jumble_ignore, read_write_ignore, read_as(0)); /* do I set the command result tag? 
*/ bool canSetTag pg_node_attr(query_jumble_ignore); diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h index 658d76225e47..56cfdca074de 100644 --- a/src/include/nodes/plannodes.h +++ b/src/include/nodes/plannodes.h @@ -53,7 +53,7 @@ typedef struct PlannedStmt CmdType commandType; /* query identifier (copied from Query) */ - uint64 queryId; + int64 queryId; /* plan identifier (can be set by plugins) */ uint64 planId; diff --git a/src/include/utils/backend_status.h b/src/include/utils/backend_status.h index 430ccd7d78e4..bbebe517501f 100644 --- a/src/include/utils/backend_status.h +++ b/src/include/utils/backend_status.h @@ -170,7 +170,7 @@ typedef struct PgBackendStatus int64 st_progress_param[PGSTAT_NUM_PROGRESS_PARAM]; /* query identifier, optionally computed using post_parse_analyze_hook */ - uint64 st_query_id; + int64 st_query_id; /* plan identifier, optionally computed using planner_hook */ uint64 st_plan_id; @@ -321,7 +321,7 @@ extern void pgstat_clear_backend_activity_snapshot(void); /* Activity reporting functions */ extern void pgstat_report_activity(BackendState state, const char *cmd_str); -extern void pgstat_report_query_id(uint64 query_id, bool force); +extern void pgstat_report_query_id(int64 query_id, bool force); extern void pgstat_report_plan_id(uint64 plan_id, bool force); extern void pgstat_report_tempfile(size_t filesize); extern void pgstat_report_appname(const char *appname); @@ -329
Re: POC: enable logical decoding when wal_level = 'replica' without a server restart
On Wed, May 21, 2025 at 12:45 AM Masahiko Sawada wrote: > > On Mon, May 19, 2025 at 2:05 AM Amit Kapila wrote: > > > > On Sun, May 18, 2025 at 1:09 AM Masahiko Sawada > > wrote: > > > > > > On Sat, May 10, 2025 at 7:08 AM Amit Kapila > > > wrote: > > > > > > > > > > > > Can we have a parameter like immediately_reserve in > > > > create_logical_slot API, similar to what we have for physical slots? > > > > We need to work out the details, but that should address the kind of > > > > use case you are worried about, unless I am missing something. > > > > > > Interesting idea. One concern in my mind is that in the use case I > > > mentioned above, users would need to carefully manage the extra > > > logical slot to keep the logical decoding active. The logical decoding > > > is deactivated on the standby as soon as users drop all logical slots > > > on the primary. > > > > > > > Yes, but the same is true for a physical slot in the case of physical > > replication used via primary_slot_name parameter. > > Could you elaborate on this? > I am trying to correlate this with the case where the standby no longer needs the physical slot for some reason, like the standby machine failing, or say someone uses pg_createsubscriber on the standby to make it a subscriber, etc. In such a case, the user needs to manually remove the physical slot on the primary. There is a difference between the two cases, but the point is that one may need to manage a physical slot as well. > > I recently had a discussion with Ashtosh at PGConf.dev regarding an > alternative approach: introducing a new command syntax such as "ALTER > SYSTEM UPDATE wal_level TO 'logical'". In his presentation[1], he > outlined this proposed command as a means to modify specific GUC > parameters synchronously. The backend executing this command would > manage the transition, allowing users to interrupt the process via > Ctrl-C if necessary. In the specific context of wal_level change, this > command could be designed to reject operations like "ALTER SYSTEM > UPDATE wal_level TO 'minimal'" with an error, effectively preventing > undesirable wal_level transitions to or from 'minimal'. While this > approach shares similarities with our previous proposal of > implementing a dedicated SQL function for WAL level modifications, it > offers a more standardized interface for users. > > Though I find merit in this proposal, I remain uncertain about its > implementation details and whether it represents the optimal solution > for online wal_level changes, particularly given that our current > approach of automatic WAL level adjustment appears viable. > Yeah, I find the idea that the presence of a logical slot will allow the user to enable logical decoding/replication more appealing than this new alternative, leaving aside the challenges of realizing it. -- With Regards, Amit Kapila.
Re: Fix slot synchronization with two_phase decoding enabled
On Tue, May 6, 2025 at 4:52 PM Zhijie Hou (Fujitsu) wrote: > > On Mon, May 5, 2025 at 6:59 PM Amit Kapila wrote: > > > > On Sun, May 4, 2025 at 2:33 PM Masahiko Sawada > > wrote: > > > > > > While I cannot be entirely certain of my analysis, I believe the root > > > cause might be related to the backward movement of the confirmed_flush > > > LSN. The following scenario seems possible: > > > > > > 1. The walsender enables the two_phase and sets two_phase_at (which > > > should be the same as confirmed_flush). > > > 2. The slot's confirmed_flush regresses for some reason. > > > 3. The slotsync worker retrieves the remote slot information and > > > enables two_phase for the local slot. > > > > > > > Yes, this is possible. Here is my theory as to how it can happen in the > > current > > case. In the failed test, after the primary has prepared a transaction, the > > transaction won't be replicated to the subscriber as two_phase was not > > enabled for the slot. However, subsequent keepalive messages can send the > > latest WAL location to the subscriber and get the confirmation of the same > > from > > the subscriber without its origin being moved. Now, after we restart the > > apply > > worker (due to disable/enable for a subscription), it will use the previous > > origin_lsn to temporarily move back the confirmed flush LSN as explained in > > one of the previous emails in another thread [1]. During this temporary > > movement of confirm flush LSN, the slotsync worker fetches the two_phase_at > > and confirm_flush_lsn values, leading to the assertion failure. We see this > > issue intermittently because it depends on the timing of slotsync worker's > > request to fetch the slot's value. > > Based on this theory, I can reproduce the BF failure in the 040 tap-test on > HEAD after applying the 0001 patch. This is achieved by using the injection > point to stop the walsender from sending a keepalive before receiving the old > origin position from the apply worker, ensuring the confirmed_flush > consistently moves backward before slotsync. > > Additionally, I've reproduced the duplicate data issue on HEAD without > slotsync > using the attached script (after applying the injection point patch). This > issue arises if we immediately disable the subscription after the > confirm_flush_lsn moves backward, preventing the walsender from advancing the > confirm_flush_lsn. > > In this case, if a prepared transaction exists before two_phase_at, then after > re-enabling the subscription, it will replicate that prepared transaction when > decoding the PREPARE record and replicate that again when decoding the COMMIT > PREPARED record. In such cases, the apply worker keeps reporting the error: > > ERROR: transaction identifier "pg_gid_16387_755" is already in use. > > Apart from above, we're investigating whether the same issue can occur in > back-branches and will share the results once ready. > The issue was confirmed to occur on back branches as well, due to confirmed_flush_lsn moving backward. It has now been fixed on HEAD and all supported back-branches down to PG13. For details, refer to the separate thread [1]; the fix was committed (commit: ad5eaf3)[2]. The BF failure has not occurred since the fix, but we’ll continue to keep an eye. [1] https://www.postgresql.org/message-id/CAJpy0uDZ29P=BYB1JDWMCh-6wXaNqMwG1u1mB4=10ly0x7h...@mail.gmail.com [2] https://github.com/postgres/postgres/commit/ad5eaf390c58294e2e4c1509aa87bf13261a5d15 -- Thanks, Nisha
Re: fix: propagate M4 env variable to flex subprocess
Hi, On Tue, 13 May 2025 at 18:54, Andres Freund wrote: > > Hi, > > On 2025-05-12 23:14:59 -0400, J. Javier Maestro wrote: > > The pgflex wrapper runs flex with an explicit environment, so it doesn't > > inherit environment variables from the parent process. However, flex can > > use the M4 env variable and/or the PATH (via execvp) to find the m4 macro > > processor. > > > Thus, since flex honors the M4 env variable, it should be propagated to the > > subprocess environment if it's set in the parent environment. This patch > > fixes it. > > I think it probably was not intentional to fully specify the environment, > rather than just *adding* FLEX_TMP_DIR to the caller's environment. Bilal, I > think you wrote this originally, do you recall? I found the original pull request [1] but it does not include the 'FLEX_TMP_DIR' part. I tried to find where the 'FLEX_TMP_DIR' part was added [2]; it seems that it was added while rebasing, so there is no specific commit. [1] https://github.com/anarazel/postgres/pull/51 [2] https://github.com/anarazel/postgres/commit/cd817afd4ab006b90307a940d96b5116d649165c -- Regards, Nazir Bilal Yavuz Microsoft
Re: Regression in statement locations
On Tue, May 20, 2025 at 5:59 AM Michael Paquier wrote: > Once I have fixed that, I've been a bit puzzled by the difference in > output in the tests of pg_overexplain, but I think that the new output > is actually the correct one: the queries whose plan outputs have > changed are passed as arguments of the explain_filter() function, > hence the location of the inner queries point at the start location of > the inner query instead of the start of the top-level query. Note > that if you add a semicolon at the end of these three queries in the > pg_overexplain tests, we finish with an end location reported. Indeed, by dumping more details on the query string and parsed string in pg_overexplain, the reported location does match the inner SELECT. This seems appropriate since it is for the planned select statement. Executor Parameter Types: none Query String: EXPLAIN (DEBUG, RANGE_TABLE, COSTS OFF) SELECT genus, array_agg(name ORDER BY name) FROM vegetables GROUP BY genus Parse Location: 41 to end Parsed String: SELECT genus, array_agg(name ORDER BY name) FROM vegetables GROUP BY genus Looking at the tests, there are 2 additional empty DO blocks: +DO $$ +DECLARE BEGIN +END $$; What's the point of those? They won't be visible in the output since we have 'toplevel IS FALSE' in the pg_stat_statements calls and they don't fit the "DO block --- multiple inner queries with separators". Other than that, the patch looks good.
Re: wrong query results on bf leafhopper
On 5/20/25 07:50, David Rowley wrote: > On Tue, 20 May 2025 at 16:07, Tom Lane wrote: >> Failures like this one [1]: >> >> @@ -340,9 +340,13 @@ >> create function myinthash(myint) returns integer strict immutable language >>internal as 'hashint4'; >> NOTICE: argument type myint is only a shell >> +ERROR: ROWS is not applicable when function does not return a set >> >> are hard to explain as anything besides "that machine is quite >> broken". Whether it's flaky hardware, broken compiler, or what is >> undeterminable from here, but I don't believe it's our bug. So I'm >> unexcited about putting effort into it. > > There are certainly much fewer moving parts in PostgreSQL code for > that one as this failure doesn't seem to rely on anything stored in > any tables or the catalogues. > > I'd have thought it would be unlikely to be a compiler bug as wouldn't > that mean it'd fail every time? > > Are there any Prime95-like stress testers for ARM that could be run on > this machine? > > It would be good to kick this one out the pool if there's hardware issues. > There are tools like "stress" and "stressant", etc. Works on my rpi5, but depends on the packager. I'd probably just look at dmesg first. In my experience hardware issues are often pretty visible there - reports of failed I/O requests, thermal issues on the CPU, that kind of stuff. regards -- Tomas Vondra
Re: Should we optimize the `ORDER BY random() LIMIT x` case?
Nico, > > Can the detection of such queries be done by the yacc/bison parser > > grammar? > > Maybe the `sortby` rule could check if the expression is `random()`, > then `sort_clause` could check if `$3` is a one-item `sortby_list` of > just `random()` and mark `$$` as special -- this should be cheap, yes? > We'd still need to check for `LIMIT` somewhere else. Although partially implementing an optimizer on the parser level is possible, it doesn't strike me as the right long-term solution. Firstly, it's a hacky solution: it might work for random() but, for some reason, not for -random()*2. Secondly, imagine adding a dozen optimizations like this in the upcoming years, all of them interacting with each other. Imagine you are the person who has to maintain this and not break things when adding another optimization. All in all, this would be a poor design choice in my opinion. -- Best regards, Aleksander Alekseev
Re: generic plans and "initial" pruning
On 5/20/25 05:06, Tom Lane wrote: > Amit Langote writes: >> Pushed after some tweaks to comments and the test case. > > My attention was drawn to commit 525392d57 after observing that > Valgrind complained about a memory leak in some code that commit added > to BuildCachedPlan(). I tried to make sense of said code so I could > remove the leak, and eventually arrived at the attached patch, which > is part of a series of leak-fixing things hence the high sequence > number. > > Unfortunately, the bad things I speculated about in the added comments > seem to be reality. The second attached file is a test case that > triggers > > ... FYI I added this as a PG18 open item: https://wiki.postgresql.org/wiki/PostgreSQL_18_Open_Items regards -- Tomas Vondra
Re: Add comment explaining why queryid is int64 in pg_stat_statements
On Tue, 20 May 2025 at 18:12, Julien Rouhaud wrote: > I don't think it was discussed, but why not go the other way, keep everything > as uint64 and expose an uint64 datatype at the SQL level instead? That's > something I actually want pretty often so I would be happy to have that > natively, whether it's called bigoid, oid8 or anything else. That's probably > too late for pg18 though. Certainly, a bit late, yes. It basically requires implementing unsigned types on the SQL level. I believe there are a few sunken ships along that coastline, and plenty of history in the archives if you want to understand some of the difficulties. David
Re: [PATCH] Allow parallelism for plpgsql return expression after commit 556f7b7
> On Tue, May 20, 2025 at 11:57 AM Dilip Kumar wrote: > > On Mon, May 5, 2025 at 11:19 AM DIPESH DHAMELIYA > wrote: > > > > Hello everyone, > > > > With the commit 556f7b7bc18d34ddec45392965c3b3038206bb62, Any plpgsql > > function that returns scalar value would not be able to use parallelism to > > evaluate a return statement. It will not be considered for parallel > > execution because we are passing maxtuples = 2 to exec_run_select from > > exec_eval_expr to evaluate the return expression of the function. > > > I could not find commit '556f7b7bc18d34ddec45392965c3b3038206bb62' in > git log on the master branch, but here is my analysis after looking at > your patch. Here is the github link to commit - https://github.com/postgres/postgres/commit/556f7b7bc18d34ddec45392965c3b3038206bb62 and discussion - https://www.postgresql.org/message-id/flat/20241206062549.710dc01cf91224809dd6c0e1%40sraoss.co.jp > > I don't think we can remove the 'maxtuples' parameter from > exec_run_select(). In this particular case, the query itself is > returning a single tuple, so we are good. Still, in other cases where > the query returns more tuples, it makes sense to stop the execution as > soon as we have got enough tuples otherwise, it will do the execution > until we produce all the tuples. Consider the below example where we > just need to use the first tuple, but if we apply your patch, the > executor will end up processing all the tuples, and it will impact the > performance. So IMHO, the benefit you get by enabling a parallelism > in some cases may hurt badly in other cases, as you will end up > processing more tuples than required. > > CREATE OR REPLACE FUNCTION get_first_user_email() > RETURNS TEXT AS $$ > DECLARE > user_email TEXT; > BEGIN > user_email = (SELECT email FROM users); > RETURN user_email; > END; > $$ LANGUAGE plpgsql; > I understand, but aren't we blocking parallelism for genuine cases with a very complex query where parallelism can help improve execution time to some extent? Users can always rewrite a query (for example, using a LIMIT clause) if they are expecting one tuple to be returned.
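For instance, the earlier example could be rewritten so the subquery itself is bounded and the executor stops after one tuple, independently of how the return expression is planned (a sketch only):

```sql
CREATE OR REPLACE FUNCTION get_first_user_email()
RETURNS TEXT AS $$
DECLARE
    user_email TEXT;
BEGIN
    -- Bound the subquery explicitly instead of relying on maxtuples = 2
    -- in exec_run_select() to cut execution short.
    user_email := (SELECT email FROM users LIMIT 1);
    RETURN user_email;
END;
$$ LANGUAGE plpgsql;
```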
cookConstraint dead code
hi. in cookConstraint /* * Make sure no outside relations are referred to (this is probably dead * code now that add_missing_from is history). */ if (list_length(pstate->p_rtable) != 1) ereport(ERROR, (errcode(ERRCODE_INVALID_COLUMN_REFERENCE), errmsg("only table \"%s\" can be referenced in check constraint", relname))); It looks like this is indeed dead code, because we cannot use a subquery in a check constraint, and if you want to reference another table, you need to mention it explicitly, like: CREATE TABLE t3 (a int CHECK ( (a > v3.a))); Here "v3.a" is a ColumnRef node, and transformColumnRef/refnameNamespaceItem will error out if there is no table with that name specified in the FROM clause.
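A quick illustration of why that path looks unreachable (a sketch; the exact error text may vary across versions): both ways of referring to another relation are rejected before cookConstraint's check is reached.

```sql
-- A subquery is rejected while transforming the expression:
CREATE TABLE t3 (a int CHECK (a > (SELECT max(a) FROM v3)));
-- ERROR:  cannot use subquery in check constraint

-- A direct column reference to another relation fails in transformColumnRef,
-- since only the table being defined is in the range table:
CREATE TABLE t4 (a int CHECK (a > v3.a));
-- ERROR:  missing FROM-clause entry for table "v3"
```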
Re: Log connection establishment timings
> On 20 May 2025, at 10:52, Peter Eisentraut wrote: > src/test/ssl/t/SSL/Server.pm While I don't have an immediate use case or test in mind, having the SSL tests use log_connections=all could be handy when hacking on the TLS support. -- Daniel Gustafsson
Re: Conflict detection for update_deleted in logical replication
On Tue, May 20, 2025 at 8:38 AM shveta malik wrote: > > Please find few more comments: > > 1) > ProcessStandbyPSRequestMessage: > + /* > + * This shouldn't happen because we don't support getting primary status > + * message from standby. > + */ > + if (RecoveryInProgress()) > + elog(ERROR, "the primary status is unavailable during recovery"); > > > a) This error is not clear. Is it supposed to be user oriented error > or internal error? As a user, it is difficult to interpret this error > and take some action. > > b) What I understood is that there is no user of enabling > 'retain_conflict_info' for a subscription which is subscribing to a > publisher which is physical standby too. So shall we emit such an > ERROR during 'Create Sub(retain_conflict_info=on)' itself? But that > would need checking whether the publisher is physical standby or not > during CREATE-SUB. Is that a possibility? > > 2) > > -- > send_feedback(last_received, requestReply, requestReply); > > + maybe_advance_nonremovable_xid(&data, false); > + > /* > * Force reporting to ensure long idle periods don't lead to > * arbitrarily delayed stats. Stats can only be reported outside > -- > > Why do we need this call to 'maybe_advance_nonremovable_xid' towards > end of LogicalRepApplyLoop() i.e. the last call? Can it make any > further 'data.phase' change here? IIUC, there are 2 triggers for > 'data.phase' change through LogicalRepApplyLoop(). First one is for > the very first time when we start this loop, it will set data.phase to > 0 (RCI_GET_CANDIDATE_XID) and will call > maybe_advance_nonremovable_xid(). After that, LogicalRepApplyLoop() > function can trigger a 'data.phase' change only when it receives a > response from the publisher. Shouldn't the first 4 calls > to maybe_advance_nonremovable_xid() from LogicalRepApplyLoop() suffice? > > 3) > Code is almost the same for GetOldestActiveTransactionId() and > GetOldestTransactionIdInCommit(). I think we can merge these two. > GetOldestActiveTransactionId() can take new arg "getTxnInCommit". > GetOldestTransactionIdInCommit() can call > GetOldestActiveTransactionId() with that arg as true, whereas other 2 > callers can pass it as false. > Few more comments mostly for patch001: 4) For this feature, since we are only interested in remote UPDATEs happening concurrently, so shall we ask primary to provide oldest "UPDATE" transaction-id in commit-phase rather than any operation's transaction-id? This may avoid unnecessarily waiting and pinging at subscriber's end in order to advance oldest_nonremovable-xid. Thoughts? 5) + +/* + * GetOldestTransactionIdInCommit() + * + * Similar to GetOldestActiveTransactionId but returns the oldest transaction ID + * that is currently in the commit phase. + */ +TransactionId +GetOldestTransactionIdInCommit(void) If there is no transaction currently in 'commit' phase, this function will then return the next transaction-id. Please mention this in the comments. I think the doc 'protocol-replication.html' should also be updated for the same. 6) + data->flushpos_update_time = 0; Do we really need to reset this 'flushpos_update_time' at the end of wait_for_local_flush()? Even in the next cycle (when the phase restarts from RCI_GET_CANDIDATE_XID), we can reuse this time to decide whether we should call get_flush_position() again or skip it, when in wait_for_local_flush(). 7) +/* + * The remote WAL position that has been applied and flushed locally. 
We + * record this information while sending feedback to the server and use this + * both while sending feedback and advancing oldest_nonremovable_xid. + */ +static XLogRecPtr last_flushpos = InvalidXLogRecPtr; I think we record/update last_flushpos in wait_for_local_flush() as well. Shall we update comments accordingly? 8) Shall we rename "check_conflict_info_retaintion" to "check_conflict_info_retention" or "check_remote_for_retainconflictinfo"? ('retaintion' is not a correct word) 9) In RetainConflictInfoData, we can keep reply_time along with other remote_* variables. The idea is to separate the variables received in remote's response from the ones purely set and reset by the local node. thanks Shveta
Re: Logical Replication of sequences
On Tue, May 20, 2025 at 8:35 AM Nisha Moond wrote: > > > > > Thanks for the comments, these are handled in the attached v20250516 > > version patch. > > > > Thanks for the patches. Here are my review comments - > > Patch-0004: src/backend/replication/logical/sequencesync.c > Hi, Currently, the behavior of the internal query used to fetch sequence info from the pub is inconsistent and potentially misleading. Case 1: If a single non-existent sequence is passed (e.g., VALUES ('public','n10')), the query throws an ERROR, so we get an error on the subscriber - ERROR: could not receive list of sequence information from the publisher: ERROR: sequence "public.n10" does not exist Case 2: If multiple non-existent sequences are passed (e.g., VALUES ('public','n8'),('public','n9')), it silently returns zero rows, resulting only in a LOG message instead of an error. LOG: Logical replication sequence synchronization for subscription "subs" - total unsynchronized: 2; batch #1 = 2 attempted, 0 succeeded, 0 mismatched IMO, this inconsistency can be confusing for users. I think we should make the behavior uniform. Either - (a) Raise an error if any/all of the requested sequences are missing on the publisher, or (b) Instead of raising an error, emit a LOG (as is done in case 2) and maybe include the count of missing sequences too. I'm fine with either option. -- Thanks, Nisha
Re: plan shape work
On 5/19/25 20:01, Robert Haas wrote: > Hi, > > A couple of people at pgconf.dev seemed to want to know more about my > ongoing plan shape work, so here are the patches I have currently. > This is a long way from something that actually looks like a usable > feature, but these are bits of infrastructure that I think will be > necessary to get to a usable feature. As a recap, my overall goal here > is to make it so that you can examine a finished plan, figure out what > decisions the planner made, and then somehow get the planner to make > those same decisions over again in a future planning cycle. Since > doing this for all types of planner decisions seems too difficult for > an initial goal, I'm focusing on scans and joins for now. A further > goal is that I want it to be possible for extensions to use this > infrastructure to implement a variety of different policies that they > might feel to be beneficial, so I'm looking to minimize the amount of > stuff that has to be done in core PostgreSQL or can only be used by > core PostgreSQL. > > ... Thanks for the overview. I don't have any immediate feedback, but it sounds like it might be related to the "making planner decisions clear" session from the unconference ... The basic premise of that session was about how to give users better info about the planner decisions - why paths were selected/rejected, etc. A simple example would be "why was the index not used", and the possible answers include "dominated by cost by another path" or "does not match the index keys" etc. I wonder if this work might be useful for something like that. regards -- Tomas Vondra
Re: Conflict detection for update_deleted in logical replication
On Tue, May 20, 2025 at 10:24 AM Amit Kapila wrote: > > On Tue, May 20, 2025 at 8:38 AM shveta malik wrote: > > > > Please find few more comments: > > > > 1) > > ProcessStandbyPSRequestMessage: > > + /* > > + * This shouldn't happen because we don't support getting primary status > > + * message from standby. > > + */ > > + if (RecoveryInProgress()) > > + elog(ERROR, "the primary status is unavailable during recovery"); > > > > > > a) This error is not clear. Is it supposed to be user oriented error > > or internal error? As a user, it is difficult to interpret this error > > and take some action. > > > > This is an internal error for developers to understand that they have > sent the wrong message. Do you have any suggestions to change it? The current message is fine if point b) is already taken care of. > > > b) What I understood is that there is no user of enabling > > 'retain_conflict_info' for a subscription which is subscribing to a > > publisher which is physical standby too. So shall we emit such an > > ERROR during 'Create Sub(retain_conflict_info=on)' itself? But that > > would need checking whether the publisher is physical standby or not > > during CREATE-SUB. Is that a possibility? > > > > The 0003 patch already took care of this, see check_conflict_info_retaintion. > Okay, thanks. Missed it somehow during review. thanks Shveta
Re: Prevent an error on attaching/creating a DSM/DSA from an interrupt handler.
> On 20 May 2025, at 04:29, Tom Lane wrote: > I want to argue for reverting, at least for v18. I do not think that > ProcessGetMemoryContextInterrupt is anywhere near release-quality. > I found out while poking into Valgrind leak reports that it leaks > memory --- and does so in TopMemoryContext. Ugh, that's indeed a showstopper. Attached is a revert of the feature; I want to re-read and re-test before pushing to make sure I didn't miss anything, but I can go ahead with it later today. -- Daniel Gustafsson 0001-Revert-function-to-get-memory-context-stats-for-proc.patch Description: Binary data
Re: PG 18 release notes draft committed
On Tue, May 13, 2025 at 08:05:24AM +0200, Laurenz Albe wrote: > On Mon, 2025-05-12 at 20:27 -0700, David G. Johnston wrote: > > Should all columns removed from system views and/or catalogs be listed in > > “Migration” or is there some filtering criteria? There are at minimum quite > > a few statistics related ones we’ve dropped that only appear in the Changes > > section (e.g., pg_stat_io, pg_stat_wal). > > I am not sure. > > On the one hand, the catalogs don't promise to be a stable API, so there > would be no need to enumerate such changes as compatibility breaks. > The "Migration" section also doesn't list changes to the exported > PostgreSQL functins, which has bitten me as extension developer several > times. > > On the other hand, the catalogs are described in the documentation, which > gives them more exposure, and it doesn't seem unreasonable to document > breaking changes as well. > > Do you have an idea how many changes there are? If there are not too many, > and somebody is willing to do the work, I wouldn't be against it. First, I apologize for the delay in my replying --- I was on vacation last week. Second, let me explain the criteria I use for table changes, and then we can discuss if the criteria is correct, and whether I followed the criteria accurately for PG 18. So, there are system views and system tables. Most system views are important to users, because we created them mostly for user consumption, while system tables might or might not hold useful information for users. Second, we have three possible changes --- column addition, column renaming, and column removal. And third, we can list the changes in the incompatibility section, or in the main release notes. So, for column additions, I would never list them in the incompatibility section, though it could break SELECT *. For renames and deletes, they would normally appear in the incompatibility section, unless they are system tables that do not normally hold user-helpful information, in which case I might list it in the main section, or not at all. I believe I followed that criteria for PG 18. There might be a few cases in PG 18 where columns used for monitoring were renamed or deleted because they were replaced, and I felt it was too complex to list them in the incompatibility section because there were new features mixed into the process so I listed them in the main section. I thought that was the proper balance. -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com Do not let urgent matters crowd out time for investment in the future.
Re: strange perf regression with data checksums
On Mon, May 19, 2025 at 8:21 PM Tomas Vondra wrote: > With v2 the regression disappears, both for index-only scans and regular > index scans. I haven't tried anything with bitmap scans - I hit the > regression mostly by accident, on a workload that does IOS only. I may > try constructing something with bitmap scans, but I didn't have time for > that right now. I imagine bitmap index scans will be similar to plain index scans. > I don't know what "fully fix" means in this context. I see a massive > improvement with v2, I have no idea if that's the best we could do. You expected there to be *zero* performance impact from enabling checksums for this workload, since it is a pure read-only workload. That's what I meant by "fully fix". > But you're right - it seems sensitive to how many rows are returned, and > at some point the contention goes away and there's no regression. > > I need to do proper automated testing, to get reliable results. I've > been doing manual testing, but it's easy to make mistakes that way. > > Do you have any suggestions what cases you'd like me to test? Nothing comes to mind. Again, just be aware that completely avoiding the call to BufferGetLSNAtomic is only possible when:

* Scan is an index-only scan OR
* Scan is a bitmap index scan OR
* Scan is a plain index scan, reading a page that _bt_readpage() returned "false" for when called.

In other words, plain index scans that read a lot of tuples might receive no benefit whatsoever. It's possible that it already matters less there anyway. It's also possible that there is some unforeseen benefit from merely *delaying* the call to BufferGetLSNAtomic. But in all likelihood these "unfixed" plain index scans are just as fast with v2 as they are when run on master/baseline. -- Peter Geoghegan
Re: generic plans and "initial" pruning
Amit Langote writes: > Thanks for pointing out the hole in the current handling of > CachedPlan->stmt_list. You're right that the approach of preserving > the list structure while replacing its contents in-place doesn’t hold > up when the rewriter adds or removes statements dynamically. There > might be other cases that neither of us have tried. I don’t think > that mechanism is salvageable. > To address the issue without needing a full revert, I’m considering > dropping UpdateCachedPlan() and removing the associated MemoryContext > dance to preserve CachedPlan->stmt_list structure. Instead, the > executor would replan the necessary query into a transient list of > PlannedStmts, leaving the original CachedPlan untouched. That avoids > mutating shared plan state during execution and still enables deferred > locking in the vast majority of cases. Yeah, I think messing with the CachedPlan is just fundamentally wrong. It breaks the invariant that the executor should not scribble on what it's handed --- maybe not as obviously as some other cases, but it's still not a good design. I kind of feel that we ought to take two steps back and think about what it even means to have a generic plan in this situation. Perhaps we should simply refuse to use that code path if there are prunable partitioned tables involved? > Let me know what you think -- I’ll hold off on posting a revert or a > replacement until we’ve agreed on the path forward. I had not looked at 525392d57 in any detail before (the claim in the commit message that I reviewed it is a figment of someone's imagination). Now that I have, I'm still going to argue for revert. Aside from the points above, I really hate what's been done to the fundamental executor APIs. The fact that ExecutorStart callers have to know about this is as ugly as can be. I also don't like the fact that it's added overhead in cases where there can be no benefit (notice that my test case doesn't even involve a partitioned table). I still like the core idea of deferring locking, but I don't like anything about this implementation of it. It seems like there has to be a better and simpler way. regards, tom lane
Re: Suggestion: Update Copyright Year to 2025 in Recently Added Files
On Sun, May 18, 2025 at 11:34:07PM -0400, Tom Lane wrote: > Michael Paquier writes: > > On Sun, May 18, 2025 at 12:09:23PM +0530, Shaik Mohammad Mujeeb wrote: > >> Just wanted to check - should this be updated to reflect 1996-2025 instead? > > > This process is automated by src/tools/copyright.pl on a yearly-basis, > > but it is possible that holes appear when some code gets committed > > that predates the new year. > > I had the idea that this was part of our pre-branch checklist, > but it was not mentioned there. I added it. I only run it in early January of each year. -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com Do not let urgent matters crowd out time for investment in the future.
Re: Violation of principle that plan trees are read-only
Andres Freund writes: > Afaict the mtstate->ps.plan->targetlist assignment, and the ExecTypeFromTL(), > ExecAssignResultTypeFromTL() before that, are completely superfluous. Am I > missing something? I think you are right. The two tlists are completely identical in most cases, because of this bit in setrefs.c: /* * Set up the visible plan targetlist as being the same as * the first RETURNING list. This is for the use of * EXPLAIN; the executor won't pay any attention to the * targetlist. We postpone this step until here so that * we don't have to do set_returning_clause_references() * twice on identical targetlists. */ splan->plan.targetlist = copyObject(linitial(newRL)); I added a quick check + if (!equal(mtstate->ps.plan->targetlist, + linitial(returningLists))) + elog(WARNING, "not matched targetlist"); to verify that. There's one regression case in which it complains, and that's one where we pruned the first result relation so that linitial(returningLists) is now the second result rel's RETURNING list. But as you say, that should still produce the same tupdesc. I think we can just delete this assignment (and fix these comments). > Wonder if the targetlist assignment is superfluous made me wonder if we would > detect mismatches - and afaict we largely wouldn't. There's basically no > verification in ExecBuildProjectionInfo(). And indeed, adding something very > basic shows: Hmm, seems like an independent issue. regards, tom lane
Re: fixing CREATEROLE
On Wed, Apr 30, 2025 at 4:29 PM Tom Lane wrote: > However, I'd at least like to complain about the fact that > it breaks pg_dumpall, which is surely not expecting anything > but the default behavior. If for any reason the restore is > run under a non-default setting of createrole_self_grant, > there's a potential of creating role grants that were not > there in the source database. Admittedly the damage is > probably limited by the fact that it only applies if the > restoring user has CREATEROLE but not SUPERUSER, which > I imagine is a rare case. But don't we need to add > createrole_self_grant to the set of GUCs that pg_dump[all] > resets in the emitted SQL? I spent some time on this today. As you might imagine, it's quite easy to make pg_dumpall emit SET createrole_self_grant = '', but doing so seems less useful than I had expected. I wonder if I'm missing something important here, so I thought I'd better inquire before proceeding further. As I see it, the core difficulty is that the output of pg_dumpall always contains a CREATE ROLE statement for every single role in the system, even the bootstrap superuser. For starters, that means that if you simply run initdb and create cluster #1, run initdb again and create cluster #2, dump the first and restore to the second, you will get an error, because the same bootstrap superuser will exist in both systems, so when you feed the output of pg_dumpall to psql, you end up trying to create a role that already exists. At this point, my head is already kind of exploding, because I thought we were pretty careful to try to make it so that pg_dump output can be restored without error even in the face of pre-existing objects like the public schema and the plpgsql language, but apparently we haven't applied the same principle to pg_dumpall.[1] But if, as you posit above, we were to try running the output of pg_dumpall through psql as a non-superuser, the problem is a whole lot worse. You can imagine a pg_dumpall feature that only tries to dump (on the source system) roles that the dumping user can administer, and only tries to recreate those roles on the target system, but we haven't got that feature, so we're going to try to recreate every single source role on the target system, including the bootstrap user and the non-superuser who is restoring the dump if they exist on the source side and any other superusers and any other users created by other CREATEROLE superusers and it seems to me that under any set of somewhat-reasonable assumptions you're going to expect a bunch of error messages to start showing up at this point. In short, trying to restore pg_dumpall output as a non-superuser appears to be an unsupported scenario, so the fact that we don't SET createrole_self_grant = '' to cater to it doesn't really seem like a bug to many any more. In fact, I think there's a decent argument that we ought to let the prevailing value of createrole_self_grant take effect in this scenario. One pretty likely scenario, at least as it seems to me, is that the user was superuser on the source system and is not superuser on the target system but wants to recreate the same set of roles. If they want to freely access objects owned by those roles as they could on the source system, then they're going to need self-grants, and we have no better clue than the value of createrole_self_grant to help us figure out whether they want that or not. 
To state the concern another way, if this is a bug, it should be possible to construct a test case that fails without the patch and passes with the patch. But it appears to me that the only way I could do that is if I programmatically edit the dump. And that seems like cheating, because if we are talking about a scenario where the user is editing the dump, they can also add SET createrole_self_grant = '' if desired. I don't want to make it sound like I now hate the idea of doing as you proposed here, because I do see the point of nailing down critical GUCs that can affect the interpretation of SQL statements in places like pg_dumpall output, and maybe we should do that here ... kinda just in case? But I'm not altogether sure that's a sufficient justification, and at any rate I think we need to be clear on whether that *is* the justification or whether there's something more concrete that we're trying to make work. -- Robert Haas EDB: http://www.enterprisedb.com [1] Exception: When --binary-upgrade is used, we emit only ALTER ROLE and not CREATE ROLE for the bootstrap superuser. Why we think the error is only worth avoiding in the --binary-upgrade case is unclear to me.
Re: proposal: schema variables
On Tue, 2025-05-20 at 16:28 -0400, Bruce Momjian wrote: > On Tue, May 20, 2025 at 08:47:36PM +0200, Daniel Gustafsson wrote: > > > On 20 May 2025, at 18:39, Bruce Momjian wrote: > > > My only point is that we should only be using email lists for work that > > > is being actively worked on to be added to community Postgres. There > > > has been talk of a trimmed-down version of this being applied, but I > > > don't see any work in that direction. > > > > > > This patch should be moved to a separate location where perhaps people > > > can subscribe to updates when they are posted, perhaps github. > > > > As a project with no roadmap governed by open forum consensus I don't think > > we > > have any right to tell community members what they can or cannot work on > > here, > > any technical discussion which conforms with our published policies should > > be > > welcome. If Pavel want's to continue rebasing his patchset here then he > > has, > > IMHO, every right to do so. > > > > Whether or not a committer will show interest at some point is another > > thing, > > but we are seeing a very good role-model for taking responsibility for ones > > work here at the very least =) > > Well, we do have a right, e.g., we would not allow someone to repeatedly > post patches for a Postgres extension we don't manage, or the jdbc > driver. I also don't think we would allow someone to continue posting > patches for a feature we have decided to reject, and I think we have > decided to reject the patch in in its current form. I think we might > accept a trimmed-down version, but I don't see the patch moving in that > direction. > > Now, of course, if I am the only one who feels this way, I can suppress > these emails on my end. In my opinion, this patch set is adding something that would be valuable to have in core. If no committer intends to pick it up and commit it, I think the proper action would be to step up and reject the patch set, not complain about the insistence of the author. Yours, Laurenz Albe
Re: Violation of principle that plan trees are read-only
Andres Freund writes: > Largely makes sense, the only thing I see is that the !returningLists branch > does: > /* >* We still must construct a dummy result tuple type, because > InitPlan >* expects one (maybe should change that?). >*/ > mtstate->ps.plan->targetlist = NIL; > which we presumably shouldn't do anymore either. It never changes anything > afaict, but still. D'oh ... I had seen that branch before, but missed fixing it. Yeah, the targetlist will be NIL already, but it's still wrong. >> I'm tempted to back-patch this: the plan tree damage seems harmless at >> present, but maybe it'd become less harmless with future fixes. > There are *some* cases where this changes the explain output, but but the new > output is more correct, I think: > ... > I suspect this is an argument for backpatching, not against - seems that > deparsing could end up creating bogus output in cases where it could matter? > Not sure if such cases are reachable via views (and thus pg_dump) or > postgres_fdw, but it seems possible. I don't believe that we guarantee EXPLAIN output to be 100% valid SQL, so I doubt there's a correctness argument here; certainly it'd not affect pg_dump. I'm curious though: what was the test case you were looking at? regards, tom lane
Re: Re: proposal: schema variables
Hi út 20. 5. 2025 v 18:39 odesílatel Bruce Momjian napsal: > On Tue, May 20, 2025 at 01:33:18PM -0300, Marcos Pegoraro wrote: > > Em ter., 20 de mai. de 2025 às 11:56, Bruce Momjian > > escreveu: > > > > I will again ask why this patch set is being reposted when there is > no > > plan to apply it to git master > > > > Too bad. I would love to have this functionality, from the user's point > of view > > there are problems where it would solve them wonderfully. I don't know > > technically of what prevents it from being natively on core, but it > would be > > great, it would definitely be. > > My only point is that we should only be using email lists for work that > is being actively worked on to be added to community Postgres. There > has been talk of a trimmed-down version of this being applied, but I > don't see any work in that direction. > I sent a reduced version a few months ago - from 21 patches to 8 (and it can be reduced to six if we postpone tools for detection ambiguity). The timing was not perfect - the focus was and it is concentrated to finish pg18. I am very sorry if this topic and patches bother anyone. I am afraid if I close it to some personal github, it will not be visible, and I am sure this feature is missing in Postgres. Today we have few workarounds. Some workarounds are not available everywhere, some workarounds cannot be used for security. With integrated solutions some scenarios can be done more easily, more secure, faster, more comfortable. It is true, so mentioned scenarios are not "hot" today. Stored procedures or RLS or migration procedures from other databases are not extra common. But who uses it, then he misses session variables. This topic is difficult, because there is no common solution. SQL/PSM is almost dead. T-SQL (and MySQL) design is weak and cannot be used for security. Oracle's design is joined with just one environment. And although almost all widely used databases have supported session variables for decades, no one design is perfect. Proposed design is not perfect too (it introduces possible ambiguity) , but I think it can support most wanted use cases (can be enhanced in future), and it is consistent with Postgres. There are more ways to reduce risk of unwanted ambiguity to zero. But it increases the size of the patch. Regards Pavel > > This patch should be moved to a separate location where perhaps people > can subscribe to updates when they are posted, perhaps github. > > -- > Bruce Momjian https://momjian.us > EDB https://enterprisedb.com > > Do not let urgent matters crowd out time for investment in the future. >
Re: proposal: schema variables
On Tue, May 20, 2025 at 08:47:36PM +0200, Daniel Gustafsson wrote: > > On 20 May 2025, at 18:39, Bruce Momjian wrote: > > My only point is that we should only be using email lists for work that > > is being actively worked on to be added to community Postgres. There > > has been talk of a trimmed-down version of this being applied, but I > > don't see any work in that direction. > > > > This patch should be moved to a separate location where perhaps people > > can subscribe to updates when they are posted, perhaps github. > > As a project with no roadmap governed by open forum consensus I don't think we > have any right to tell community members what they can or cannot work on here, > any technical discussion which conforms with our published policies should be > welcome. If Pavel want's to continue rebasing his patchset here then he has, > IMHO, every right to do so. > > Whether or not a committer will show interest at some point is another thing, > but we are seeing a very good role-model for taking responsibility for ones > work here at the very least =) Well, we do have a right, e.g., we would not allow someone to repeatedly post patches for a Postgres extension we don't manage, or the jdbc driver. I also don't think we would allow someone to continue posting patches for a feature we have decided to reject, and I think we have decided to reject the patch in its current form. I think we might accept a trimmed-down version, but I don't see the patch moving in that direction. Now, of course, if I am the only one who feels this way, I can suppress these emails on my end. -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com Do not let urgent matters crowd out time for investment in the future.
Re: Violation of principle that plan trees are read-only
Hi, On 2025-05-20 16:18:57 -0400, Tom Lane wrote: > Andres Freund writes: > >> I'm tempted to back-patch this: the plan tree damage seems harmless at > >> present, but maybe it'd become less harmless with future fixes. > > > There are *some* cases where this changes the explain output, but but the > > new > > output is more correct, I think: > > ... > > I suspect this is an argument for backpatching, not against - seems that > > deparsing could end up creating bogus output in cases where it could matter? > > Not sure if such cases are reachable via views (and thus pg_dump) or > > postgres_fdw, but it seems possible. > > I don't believe that we guarantee EXPLAIN output to be 100% valid SQL, > so I doubt there's a correctness argument here; certainly it'd not > affect pg_dump. I wasn't thinking of EXPLAIN itself, but was wondering whether it's possible to create a view, rule or such that is affected by the output change. Would be a weird case, if it existed. > I'm curious though: what was the test case you were looking at? It's a modified query from our regression tests, I had added some debugging to find cases where the targetlists differed. I attached the extracted, somewhat modified, sql script. Greetings, Andres Freund repro-dml-returning.sql Description: application/sql
Re: Violation of principle that plan trees are read-only
On Mon, May 19, 2025 at 08:41:06AM -0400, Isaac Morland wrote: > I assume this question has an obvious negative answer, but why can't we > attach const declarations to the various structures that make up the plan > tree (at all levels, all the way down)? I know const doesn't actually > prevent a value from changing, but at least the compiler would complain if > code accidentally tried. What you want is for C to have a type attribute that denotes immutability all the way down. `const` doesn't do that. One thing that could be done is to write a utility that creates const-all-the-way-down clones of given types, but such a tool can't be written as C pre-processor macros -- it would have to be a tool that parses the actual type definitions, or uses DWARF or similar output from the compilation of a file (I've done the latter before, but this weds you to compilers that output DWARF, which MSVC doesn't, for example). Really, the C committee ought to add this at some point, darn it. It would be the opposite of Rust's &mut. Nico --
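To make the point concrete, here is a minimal compilable illustration using stand-in struct definitions (not the real Plan/List nodes): const applies to the object the pointer designates, but stops at the first pointer-typed member, so everything below it stays mutable.

typedef struct List
{
	int		length;
} List;

typedef struct Plan
{
	List   *targetlist;
} Plan;

void
mutate(const Plan *plan)
{
	/* plan->targetlist = NULL;        error: plan points to const */
	plan->targetlist->length = 0;	/* fine: const does not reach the List */
}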
Re: Join removal and attr_needed cleanup
Given that some time has passed I'd like to raise this again. Any chance of this making it back to 16 to fix the regression? We'll be using 16 for the better part of a year still and we're having to resort to some annoying workarounds for this. Regards, Bennie On 2024/11/10 18:17, Tom Lane wrote: Bennie Swart writes: -- join elimination fails -- expect both b and c to be eliminated, but b remains explain (costs off) select a.* from foo a left join foo b on (b.id1, b.id2) = (a.id1, a.id2) left join foo c on (c.id1, c.id2) = (a.id1, b.id2); -- ^^ This does work in HEAD, presumably as a consequence of a3179ab69: regression=# explain (costs off) select a.* from foo a left join foo b on (b.id1, b.id2) = (a.id1, a.id2) left join foo c on (c.id1, c.id2) = (a.id1, b.id2); QUERY PLAN --- Seq Scan on foo a (1 row) I think it's still too soon to consider back-patching that though, since it's only been in the tree for six weeks. regards, tom lane
Re: Violation of principle that plan trees are read-only
I wrote: > [ confused... ] I get the same output from that script with or > without the patch. Oh, scratch that: I'd gotten confused about which branch I was working in. It does change the output as you say. I still think it's irrelevant to view display, including pg_dump, though. Those operations work from a parse tree not a plan tree. regards, tom lane
Re: Disable parallel query by default
On Wed, May 14, 2025, at 4:06 AM, Laurenz Albe wrote: > On Tue, 2025-05-13 at 17:53 -0400, Scott Mead wrote: > > On Tue, May 13, 2025, at 5:07 PM, Greg Sabino Mullane wrote: > > > On Tue, May 13, 2025 at 4:37 PM Scott Mead wrote: > > > > I'll open by proposing that we prevent the planner from automatically > > > > selecting parallel plans by default > > > > > > That seems a pretty heavy hammer, when we have things like > > > parallel_setup_cost that should be tweaked first. > > > > I agree it's a big hammer and I thought through parallel_setup_cost > > quite a bit myself. The problem with parallel_setup_cost is that it > > doesn't actually represent the overhead of a setting up parallel > > query for a busy system. It does define the cost of setup for a > > *single* parallel session, but it cannot accurately express the > > cost of CPU and other overhead associated with the second, third, > > fourth, etc... query that is executed as parallel. The expense to > > the operating system is a function of the _rate_ of parallel query > > executions being issued. Without new infrastructure, there's no way > > to define something that will give me a true representation of the > > cost of issuing a query with parallelism. > > There is no way for the optimizer to represent that your system is > under CPU overload currently. But I agree with Greg that > parallel_setup_cost is the setting that should be adjusted. > If PostgreSQL is more reluctant to even start considering a parallel plan, > that would be a move in the right direction in a case like this: > > > > > What is the fallout? When a high-volume, low-latency query flips to > > > > parallel execution on a busy system, we end up in a situation where > > > > the database is effectively DDOSing itself with a very high rate of > > > > connection establish and tear-down requests. Even if the query ends > > > > up being faster (it generally does not), the CPU requirements for the > > > > same workload rapidly double or worse, with most of it being spent > > > > in the OS (context switch, fork(), destroy()). When looking at the > > > > database, you'll see a high load average, and high wait for CPU with > > > > very little actual work being done within the database. > > You are painting a bleak picture indeed. I get to see PostgreSQL databases > in trouble regularly, but I have not seen anything like what you describe. > If a rather cheap, very frequent query is suddenly estimated to be > expensive enough to warrant a parallel plan, I'd suspect that the estimates > must be seriously off. > > With an argument like that, you may as well disable nested loop joins. > I have seen enough cases where disabling nested loop joins, without any > deeper analysis, made very slow queries reasonably fast. My argument is that parallel query should not be allowed to be invoked without user intervention. Yes, nestedloop can have a similar impact, but let's take a look at the breakdown at scale of PQ: 1. pgbench -i -s 100 2. Make a query that will execute in parallel SELECT aid, a.bid, bbalance FROM pgbench_accounts a, pgbench_branches b WHERE a.bid = b.bid ORDER BY bbalance desc; Non Parallel query = 4506.559 ms Parallel query = 2849.073 Arguably, much better. 3. Use pgbench to execute these with a concurrency of 10, rate limit of 1 tps pgbench -R 1 -r -T 120 -P 5 --no-vacuum -f pselect.sql -c 10 4. 
The parallel query was executing in ~2.8 seconds in isolation, but when running with 10 concurrent sessions it breaks down to 5.8 seconds; the non-parallel version executes in 5.5 seconds on average. You've completely erased the gains and only have a concurrency of 5 (that's with max_parallel_workers = 8). If you increase max_parallel_workers, this quickly becomes worse. Even though parallel query is faster in isolation, even a small amount of concurrency has a compounding effect that degrades performance very quickly (again, defaults with a 16 processor machine).

Concurrency - Non Parallel Runtime - Parallel Runtime
1  - 5003.951 - 3681.452
5  - 4936.565 - 4565.171
10 - 5573.239 - 5894.397
15 - 6224.292 - 8470.982
20 - 5632.948 - 13277.857

Even with max_parallel_workers protecting us with '8' (default), we erase our advantage by the time we go to a concurrency of 5 clients. Going back to the original commit which enabled PQ by default[1], it was done so that the feature would be tested during beta. I think it's time that we limit the accidental impact this can have on users by disabling the feature by default. [1]- https://github.com/postgres/postgres/commit/77cd477c4ba885cfa1ba67beaa82e06f2e182b85 " Enable parallel query by default. Change max_parallel_degree default from 0 to 2. It is possible that this is not a good idea, or that we should g
Re: proposal: schema variables
On Tue, May 20, 2025 at 10:36:54PM +0200, Laurenz Albe wrote: > On Tue, 2025-05-20 at 16:28 -0400, Bruce Momjian wrote: > > Well, we do have a right, e.g., we would not allow someone to repeatedly > > post patches for a Postgres extension we don't manage, or the jdbc > > driver. I also don't think we would allow someone to continue posting > > patches for a feature we have decided to reject, and I think we have > > decided to reject the patch in in its current form. I think we might > > accept a trimmed-down version, but I don't see the patch moving in that > > direction. > > > > Now, of course, if I am the only one who feels this way, I can suppress > > these emails on my end. > > In my opinion, this patch set is adding something that would be valuable to > have in core. > > If no committer intends to pick it up and commit it, I think the proper > action would be to step up and reject the patch set, not complain about the > insistence of the author. Are you saying I should not complain until we have officially rejected the patch set? If we officially reject it, the patch author would no longer post it? -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com Do not let urgent matters crowd out time for investment in the future.
Re: Re: proposal: schema variables
On Tue, May 20, 2025 at 10:28:31PM +0200, Pavel Stehule wrote: > út 20. 5. 2025 v 18:39 odesílatel Bruce Momjian napsal: > I sent a reduced version a few months ago - from 21 patches to 8 (and it can > be > reduced to six if we postpone tools for detection ambiguity). > The timing was not perfect - the focus was and it is concentrated to finish > pg18. It was not clear to me that this patch set was being reduced to make it more likely it would be accepted? I thought it was the same patch set since 20??. > I am very sorry if this topic and patches bother anyone. I am afraid if I > close > it to some personal github, it will not be visible, and I am sure this > feature is missing in Postgres. Today we have few workarounds. Some > workarounds > are not available everywhere, some workarounds cannot > be used for security. With integrated solutions some scenarios can be done > more > easily, more secure, faster, more comfortable. It is true, so > mentioned scenarios are not "hot" today. Stored procedures or RLS or migration > procedures from other databases are not extra common. But > who uses it, then he misses session variables. Understood. If people feel it is progressing toward acceptance, I certainly withdraw my objection and apologize. -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com Do not let urgent matters crowd out time for investment in the future.
Re: Violation of principle that plan trees are read-only
Andres Freund writes: > On 2025-05-20 16:18:57 -0400, Tom Lane wrote: >> I'm curious though: what was the test case you were looking at? > It's a modified query from our regression tests, I had added some debugging to > find cases where the targetlists differed. I attached the extracted, somewhat > modified, sql script. [ confused... ] I get the same output from that script with or without the patch. regards, tom lane
Re: Avoid orphaned objects dependencies, take 3
On Mon, 2025-05-19 at 14:07 -0400, Robert Haas wrote: > I agree with that, but I think that it may also be error-prone to > assume that it's OK to acquire heavyweight locks on other catalog > objects at any place in the code where we record a dependency. I will > not be surprised at all if that turns out to have some negative > consequences. For example, I think it might result in acquiring the > locks on those other objects at a subtly wrong time (leading to race > conditions) or acquiring two locks on the same object with different > lock modes where we should really only acquire one. I'm all in favor > of solving this problem using heavyweight locks, but I think that > implicitly acquiring them as a side effect of recording dependencies > feels too surprising. I see what you mean now, in the sense that other code that calls LockDatabaseObject (and other variants of LockAcquire) are mostly higher-level operations like AlterPublication(), and not side-effects of something else. But relation_open() is sort of an exception. There are lots of places where that takes a lock because we happen to want something out of the relcache, like generate_partition_qual() taking a lock on the parent or CheckAttributeType() taking a lock on the typrelid. You could say those are fairly obvious, but that's because we already know, and we could make it more widely known that recording a dependency takes a lock. One compromise might be to have recordDependencyOn() take a LOCKMODE parameter, which would both inform the caller that a lock will be taken, and allow the caller to do it their own way and specify NoLock if necessary. That still results in a huge diff, but the end result would not be any more complex than the current code. Regards, Jeff Davis
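For illustration, the compromise could look something like the sketch below. The existing three-argument recordDependencyOn() is real; the added LOCKMODE parameter and the particular lock modes shown are only meant to convey the idea, not a worked-out proposal.

extern void recordDependencyOn(const ObjectAddress *depender,
							   const ObjectAddress *referenced,
							   DependencyType behavior,
							   LOCKMODE lockmode);

/* typical caller: let the dependency machinery take the lock */
recordDependencyOn(&myself, &referenced, DEPENDENCY_NORMAL, AccessShareLock);

/* caller that already holds a suitable lock, or must not take one */
recordDependencyOn(&myself, &referenced, DEPENDENCY_NORMAL, NoLock);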
Re: Violation of principle that plan trees are read-only
On 20/5/25 22:43, Nico Williams wrote: [snip] What you want is for C to have a type attribute that denotes immutability all the way down. `const` doesn't do that. One thing that could be done is to write a utility that creates const-all-the-way-down clones of given types, but such a tool can't be written as C pre-processor macros -- it would have to be a tool that parses the actual type definitions, or uses DWARF or similar output from the compilation of a file (I've done the latter before, but this weds you to compilers that output DWARF, which MSVC doesn't, for example). Really, the C committee ought to add this at some point, darn it. It would be the opposite of Rust's &mut. Like C++'s const specifier, especially const references to objects? This is actually natively compatible with C code, "just" by throwing extern "C" around. https://en.cppreference.com/w/cpp/language/cv (most of Postgres' code is already "Object-oriented in C" in my view...) The fact that the C++ compiler is usually able to optimize deeper/better than a C one due to aggressive inlining+global optimization and inferred "strict"ness doesn't hurt either :) My €.02. HTH. / J.L. -- Parkinson's Law: Work expands to fill the time allotted to it.
Re: Assert("vacrel->eager_scan_remaining_successes > 0")
(CC'ed to Melanie) On Fri, May 2, 2025 at 11:59 AM Masahiko Sawada wrote: > > Hi, > > I hit the assertion failure in the subject line with the following script: > > create table t (a int) with (autovacuum_enabled = off); > insert into t select generate_series(1, 1_000_000); > vacuum t; > insert into t select generate_series(1, 1_000_000); > set vacuum_freeze_min_age to 0; > vacuum t; > > When the success count reaches to 0, we disable the eager scan by > resetting related fields as follows: > > /* > * If we hit our success cap, permanently disable eager > * scanning by setting the other eager scan management > * fields to their disabled values. > */ > vacrel->eager_scan_remaining_fails = 0; > vacrel->next_eager_scan_region_start = InvalidBlockNumber; > vacrel->eager_scan_max_fails_per_region = 0; > > However, there is a possibility that we have already eagerly scanned > another page and returned it to the read stream before we freeze the > eagerly-scanned page and disable the eager scan. In this case, the > next block that we retrieved from the read stream could also be an > eagerly-scanned page. I've added it to Open Items for v18. If I understand correctly, there's a potential scenario where we might eagerly scan more pages than permitted by the success and failure caps. One question is: what is the practical magnitude of this excess scanning? If the overflow could be substantial (for instance, eagerly scanning 30% of table pages), we should consider revising our eager scanning mechanism. One potential solution would be to implement a counter tracking the number of eagerly scanned but unprocessed pages. This counter could then inform the decision-making process in find_next_unskippable_block() regarding whether to proceed with eager scanning of additional pages. Alternatively, if the excess scanning proves negligible in practice, we could adopt a simpler solution by allowing vacrel->eager_scan_remaining_successes to accept negative values and removing the assertion in question. Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
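To sketch the counter idea (every name below is invented for this illustration; none of it comes from a patch): track how many pages have been eagerly handed to the read stream but not yet processed, and stop queueing further eager pages once that in-flight count reaches the remaining success budget.

/* hypothetical new field in LVRelState */
BlockNumber eager_pages_in_flight;	/* eagerly queued, not yet processed */

/* in find_next_unskippable_block(), when deciding whether to eager-scan */
if (vacrel->eager_pages_in_flight >= vacrel->eager_scan_remaining_successes)
	consider_eager_scan = false;	/* budget may already be spoken for */
else
	vacrel->eager_pages_in_flight++;

/* once an eagerly scanned page has actually been processed */
vacrel->eager_pages_in_flight--;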
Re: Slot's restart_lsn may point to removed WAL segment after hard restart unexpectedly
Hi Alexander, Thank you very much for the review! > The patchset doesn't seem to build after 371f2db8b0, which adjusted > the signature of the INJECTION_POINT() macro. Could you, please, > update the patchset accordingly. I've updated the patch (see attached). Thanks. > I see in 0004 patch we're calling XLogGetReplicationSlotMinimumLSN() > before slots synchronization then use it for WAL truncation. > Generally looks good. But what about the "if > (InvalidateObsoleteReplicationSlots(...))" branch? It calls > XLogGetReplicationSlotMinimumLSN() again. Why would the value > obtained from the latter call reflect slots as they are synchronized > to the disk? In patch 0004 I call XLogGetReplicationSlotMinimumLSN() again to keep the old behaviour - this function was called in KeepLogSeg prior to my change. I also call CheckPointReplicationSlots at the next line to save the invalidated and other dirty slots on disk again to make sure, the new oldest LSN is in sync. The problem I tried to solve in this if-branch is to fix test src/test/recovery/t/019_replslot_limit.pl which was failed because the WAL was not truncated enought for the test to pass ok. In general, this branch is not necessary and we may fix the test by calling checkpoint twice (please, see the alternative.rej patch for this case). If you think, we should incorporate this new change, I'm ok to do it. But the WAL will be truncating more lazily. Furthermore, I think we can save slots on disk right after invalidation, not in CheckPointGuts to avoid saving invalidated slots twice. With best regards, Vitaly From 41eed2a90d68f4d9ac1ee3d00c89879358d19fd1 Mon Sep 17 00:00:00 2001 From: Tomas Vondra Date: Thu, 21 Nov 2024 23:07:22 +0100 Subject: [PATCH 2/5] Add TAP test to check logical repl slot advance during checkpoint The test verifies that logical replication slot is still valid after immediate restart on checkpoint completion in case when the slot was advanced during checkpoint. 
Original patch by: Tomas Vondra Modified by: Vitaly Davydov Discussion: https://www.postgresql.org/message-id/flat/1d12d2-67235980-35-19a406a0%4063439497 --- src/test/modules/Makefile | 4 +- src/test/modules/meson.build | 1 + .../test_replslot_required_lsn/Makefile | 18 +++ .../test_replslot_required_lsn/meson.build| 15 +++ .../t/001_logical_slot.pl | 124 ++ 5 files changed, 160 insertions(+), 2 deletions(-) create mode 100644 src/test/modules/test_replslot_required_lsn/Makefile create mode 100644 src/test/modules/test_replslot_required_lsn/meson.build create mode 100644 src/test/modules/test_replslot_required_lsn/t/001_logical_slot.pl diff --git a/src/test/modules/Makefile b/src/test/modules/Makefile index aa1d27bbed3..53d3dd8e0ed 100644 --- a/src/test/modules/Makefile +++ b/src/test/modules/Makefile @@ -46,9 +46,9 @@ SUBDIRS = \ ifeq ($(enable_injection_points),yes) -SUBDIRS += injection_points gin typcache +SUBDIRS += injection_points gin typcache test_replslot_required_lsn else -ALWAYS_SUBDIRS += injection_points gin typcache +ALWAYS_SUBDIRS += injection_points gin typcache test_replslot_required_lsn endif ifeq ($(with_ssl),openssl) diff --git a/src/test/modules/meson.build b/src/test/modules/meson.build index 9de0057bd1d..ac0dbd1f10f 100644 --- a/src/test/modules/meson.build +++ b/src/test/modules/meson.build @@ -43,3 +43,4 @@ subdir('typcache') subdir('unsafe_tests') subdir('worker_spi') subdir('xid_wraparound') +subdir('test_replslot_required_lsn') diff --git a/src/test/modules/test_replslot_required_lsn/Makefile b/src/test/modules/test_replslot_required_lsn/Makefile new file mode 100644 index 000..e5ff8af255b --- /dev/null +++ b/src/test/modules/test_replslot_required_lsn/Makefile @@ -0,0 +1,18 @@ +# src/test/modules/test_replslot_required_lsn/Makefile + +EXTRA_INSTALL=src/test/modules/injection_points \ + contrib/test_decoding + +export enable_injection_points +TAP_TESTS = 1 + +ifdef USE_PGXS +PG_CONFIG = pg_config +PGXS := $(shell $(PG_CONFIG) --pgxs) +include $(PGXS) +else +subdir = src/test/modules/test_replslot_required_lsn +top_builddir = ../../../.. +include $(top_builddir)/src/Makefile.global +include $(top_srcdir)/contrib/contrib-global.mk +endif diff --git a/src/test/modules/test_replslot_required_lsn/meson.build b/src/test/modules/test_replslot_required_lsn/meson.build new file mode 100644 index 000..999c16201fb --- /dev/null +++ b/src/test/modules/test_replslot_required_lsn/meson.build @@ -0,0 +1,15 @@ +# Copyright (c) 2025, PostgreSQL Global Development Group + +tests += { + 'name': 'test_replslot_required_lsn', + 'sd': meson.current_source_dir(), + 'bd': meson.current_build_dir(), + 'tap': { +'env': { + 'enable_injection_points': get_option('injection_points') ? 'yes' : 'no', +}, +'tests': [ + 't/001_logical_slot.pl' +], + }, +} diff --git a/src/test/modules/test_replslot_required_ls
Re: Violation of principle that plan trees are read-only
Hi, On 2025-05-20 10:59:22 -0400, Andres Freund wrote: > On 2025-05-18 19:31:33 -0400, Tom Lane wrote: > > While chasing down Valgrind leakage reports, I was disturbed > > to realize that some of them arise from a case where the > > executor scribbles on the plan tree it's given, which it is > > absolutely not supposed to do: > > > > /* > > * Initialize result tuple slot and assign its rowtype using the > > first > > * RETURNING list. We assume the rest will look the same. > > */ > > mtstate->ps.plan->targetlist = (List *) linitial(returningLists); > > > > A bit of git archaeology fingers Andres' commit 4717fdb14, which we > > can't easily revert since he later got rid of ExecAssignResultType > > altogether. But I think we need to do something about it --- it's > > purest luck that this doesn't cause serious problems in some cases. > > I have no idea what I was smoking at that time, this clearly is wrong. > > I think the reason it doesn't cause problems is that we're just using the > first child's targetlist every time and we're just assigning the same value > back every time. > > > What's even more absurd is: Why do we even need to assign the result type at > all? Before & after 4717fdb14. The planner ought to have already figured this > out, no? > > It seems that if not, we'd have a problem anyway, who says the "calling" nodes > (say a wCTE) can cope with whatever output we come up with? > > Except of course, there is exactly one case in our tests where the tupledescs > aren't equal :( That's a harmless difference, as it turns out. The varnos / varattnos differ, but that's fine, because we don't actually build the projection with that tlist, we just use to allocate the slot, and that doesn't care about the tlist. The building of the projection uses the specific child's returning list. Afaict the mtstate->ps.plan->targetlist assignment, and the ExecTypeFromTL(), ExecAssignResultTypeFromTL() before that, are completely superfluous. Am I missing something? Wonder if the targetlist assignment is superfluous made me wonder if we would detect mismatches - and afaict we largely wouldn't. There's basically no verification in ExecBuildProjectionInfo(). 
And indeed, adding something very basic shows: --- /home/andres/src/postgresql/src/test/regress/expected/merge.out 2025-04-06 22:54:14.078394968 -0400 +++ /srv/dev/build/postgres/m-dev-assert/testrun/regress/regress/results/merge.out 2025-05-20 11:51:51.549525728 -0400 @@ -2653,20 +2653,95 @@ MERGE into measurement m USING new_measurement nm ON (m.city_id = nm.city_id and m.logdate=nm.logdate) WHEN MATCHED AND nm.peaktemp IS NULL THEN DELETE WHEN MATCHED THEN UPDATE SET peaktemp = greatest(m.peaktemp, nm.peaktemp), unitsales = m.unitsales + coalesce(nm.unitsales, 0) WHEN NOT MATCHED THEN INSERT (city_id, logdate, peaktemp, unitsales) VALUES (city_id, logdate, peaktemp, unitsales); +WARNING: type mismatch: resno 1: slot type 0 vs expr type 23 +WARNING: TargetEntry: +DETAIL: {TARGETENTRY + :expr + {VAR + :varno -1 + :varattno 3 + :vartype 23 + :vartypmod -1 + :varcollid 0 + :varnullingrels (b) + :varlevelsup 0 + :varreturningtype 0 + :varnosyn 2 + :varattnosyn 1 + :location 384 + } + :resno 1 + :resname city_id + :ressortgroupref 0 + :resorigtbl 0 + :resorigcol 0 + :resjunk false + } + +WARNING: type mismatch: resno 2: slot type 23 vs expr type 1082 +WARNING: TargetEntry: +DETAIL: {TARGETENTRY + :expr + {VAR + :varno -1 + :varattno 4 + :vartype 1082 + :vartypmod -1 + :varcollid 0 + :varnullingrels (b) + :varlevelsup 0 + :varreturningtype 0 + :varnosyn 2 + :varattnosyn 2 + :location 393 + } + :resno 2 + :resname logdate + :ressortgroupref 0 + :resorigtbl 0 + :resorigcol 0 + :resjunk false + } + +WARNING: type mismatch: resno 3: slot type 1082 vs expr type 23 +WARNING: TargetEntry: +DETAIL: {TARGETENTRY + :expr + {VAR + :varno -1 + :varattno 1 + :vartype 23 + :vartypmod -1 + :varcollid 0 + :varnullingrels (b) + :varlevelsup 0 + :varreturningtype 0 + :varnosyn 2 + :varattnosyn 3 + :location 402 + } + :resno 3 ... Greetings, Andres Freund
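The debugging check itself isn't shown above, but judging from the WARNING format it is presumably something along these lines -- comparing the slot's declared column types against the types of the projection's target expressions (a guess at the shape of the check, not the actual patch):

ListCell   *lc;

foreach(lc, targetList)
{
	TargetEntry *tle = (TargetEntry *) lfirst(lc);
	Oid			exprtype = exprType((Node *) tle->expr);
	Form_pg_attribute attr;

	if (tle->resno <= 0 || tle->resno > slot->tts_tupleDescriptor->natts)
		continue;

	attr = TupleDescAttr(slot->tts_tupleDescriptor, tle->resno - 1);
	if (attr->atttypid != exprtype)
		elog(WARNING, "type mismatch: resno %d: slot type %u vs expr type %u",
			 tle->resno, attr->atttypid, exprtype);
}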
Re: Re: proposal: schema variables
On Tue, May 20, 2025 at 01:33:18PM -0300, Marcos Pegoraro wrote: > Em ter., 20 de mai. de 2025 às 11:56, Bruce Momjian > escreveu: > > I will again ask why this patch set is being reposted when there is no > plan to apply it to git master > > Too bad. I would love to have this functionality, from the user's point of > view > there are problems where it would solve them wonderfully. I don't know > technically of what prevents it from being natively on core, but it would be > great, it would definitely be. My only point is that we should only be using email lists for work that is being actively worked on to be added to community Postgres. There has been talk of a trimmed-down version of this being applied, but I don't see any work in that direction. This patch should be moved to a separate location where perhaps people can subscribe to updates when they are posted, perhaps github. -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com Do not let urgent matters crowd out time for investment in the future.
Re: plan shape work
+1, this seems like it could be very useful. A somewhat related issue is being able to tie plan nodes back to the query text: it can be hard to understand the planner's decisions if it's not even clear what part of the query it's making decisions about. I'm sure this is not an easy problem in general, but I wonder if you think that could be improved in the course of this work, or if you have other thoughts about it. Thanks, Maciek
Re: POC: enable logical decoding when wal_level = 'replica' without a server restart
On Mon, May 19, 2025 at 2:05 AM Amit Kapila wrote: > > On Sun, May 18, 2025 at 1:09 AM Masahiko Sawada wrote: > > > > On Sat, May 10, 2025 at 7:08 AM Amit Kapila wrote: > > > > > > > > > Can we have a parameter like immediately_reserve in > > > create_logical_slot API, similar to what we have for physical slots? > > > We need to work out the details, but that should address the kind of > > > use case you are worried about, unless I am missing something. > > > > Interesting idea. One concern in my mind is that in the use case I > > mentioned above, users would need to carefully manage the extra > > logical slot to keep the logical decoding active. The logical decoding > > is deactivated on the standby as soon as users drop all logical slots > > on the primary. > > > > Yes, but the same is true for a physical slot in the case of physical > replication used via primary_slot_name parameter. Could you elaborate on this? IIUC the purpose of using a physical slot in a physical replication case is obvious; users don't want to lose WAL files necessary for replication. On the other hand, this empty logical slot needs to be maintained just for keeping the logical decoding active. > > > Also, with this idea of automatically increasing WAL level, do we want > > to keep the 'logical' WAL level? If so, it requires an extra step of > > creating a non-reserved logical slot on the primary in order for the > > standby to activate the logical decoding. On the other hand, we can > > also keep the 'logical' WAL level for the compatibility and for making > > the logical decoding enabled without the coordination of WAL level > > transition. > > Right, I also feel we should retain both ways to enable logical > replication at least initially. Once we get some feedback, we may > think of removing 'logical' as wal_level. > > > But wal_level GUC parameter would no longer tell the > > actual WAL level to users when 'replica' + logical slots. > > > > Right. > > > Is it > > sufficient to provide a read-only GUC parameter, say > > effective_wal_level showing the actual WAL level being used? > > > > I am not so sure about how we want to communicate this to the user, > but I guess to start with, this is a good idea. I recently had a discussion with Ashtosh at PGConf.dev regarding an alternative approach: introducing a new command syntax such as "ALTER SYSTEM UPDATE wal_level TO 'logical'". In his presentation[1], he outlined this proposed command as a means to modify specific GUC parameters synchronously. The backend executing this command would manage the transition, allowing users to interrupt the process via Ctrl-C if necessary. In the specific context of wal_level change, this command could be designed to reject operations like "ALTER SYSTEM UPDATE wal_level TO 'minimal'" with an error, effectively preventing undesirable wal_level transitions to or from 'minimal'. While this approach shares similarities with our previous proposal of implementing a dedicated SQL function for WAL level modifications, it offers a more standardized interface for users. Though I find merit in this proposal, I remain uncertain about its implementation details and whether it represents the optimal solution for online wal_level changes, particularly given that our current approach of automatic WAL level adjustment appears viable. Ashtosh plans to initiate a separate discussion thread where we can explore these considerations in greater detail. 
Regards, [1] https://www.pgevents.ca/events/pgconfdev2025/schedule/session/286-changing-shared_buffers-on-the-fly/ -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
Re: queryId constant squashing does not support prepared statements
> On Tue, May 20, 2025 at 06:30:25AM GMT, Michael Paquier wrote: > On Mon, May 12, 2025 at 06:40:43PM -0400, Sami Imseih wrote: > > Also, LocationExpr is not really an expression node, but a wrapper to > > an expression node, so I think it's wrong to define it as a Node and be > > required to add the necessary handling for it in nodeFuncs.c. I think we > > can just define it as a struct in gram.y so it can carry the locations of > > the > > expression and then set the List of the location boundaries in > > A_Expr and A_ArrayExpr. right? > > Right. LocationExpr is not a full Node, so if we can do these > improvements without it we have less maintenance to worry about across > the board with less code paths. At the end, I think that we should > try to keep the amount of work done by PGSS as minimal as possible. > > I was a bit worried about not using a Node but Sami has reminded me > last week that we already have in gram.y the concept of using some > private structures to track intermediate results done by the parsing > that we sometimes do not want to push down to the code calling the > parser. If we can do the same, the result could be nicer. I believe it's worth to not only to keep amount of work to support LocationExpr as minimal as possible, but also impact on the existing code. What I see as a problem is keeping such specific information as the location boundaries in such a generic expression as A_Expr, where it will almost never be used. Do I get it right, you folks are ok with that? At the same time AFAICT there isn't much more code paths to worry about in case of a LocationExpr as a node -- in the end all options would have to embed the location information into ArrayExpr during transformation, independently from how this information was conveyed. Aside that the only extra code we've got is node functions (exprType, etc). Is there anything I'm missing here? > By the way, the new test cases for ARRAY lists are sent in the last > patch of the series posted on this thread: > https://www.postgresql.org/message-id/7zbzwk4btnxoo4o3xbtzefoqvht54cszjj4bol22fmej5nmgkf@dbcn4wtakw4y > > These should be first in the list, IMO, so as it is possible to track > what the behavior was before the new logic as of HEAD, and what the > behavior would become after the new logic. Sure, I can reshuffle that. BTW, I'm going to be away for a couple of weeks soon. So if you want to decide one way or another soonish, let's do it now.
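As a concrete illustration of the gram.y-private approach discussed above (the field names are illustrative only): because the wrapper is not a Node, it never leaves the parser -- the boundaries get copied into the A_Expr / A_ArrayExpr during the grammar reduction and the wrapper itself is discarded.

/* private to gram.y, never emitted as part of the parse tree */
typedef struct LocationExpr
{
	Node   *expr;		/* the wrapped expression */
	int		startloc;	/* offset of the first token of the list */
	int		endloc;		/* offset just past the last token */
} LocationExpr;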
Re: Statistics Import and Export
Ah yes, forgot that reltuples is not always accurate. This sounds reasonable to me. On Mon, May 19, 2025 at 2:32 PM Nathan Bossart wrote: > On Mon, May 19, 2025 at 02:13:45PM -0700, Hari Krishna Sunder wrote: > > I think it would be better to revert 9879105 since there can be a > > considerable number of true empty tables that we don't need to process. > > I'm not sure that's a use-case we really need to optimize. Even with > 100,000 empty tables, "vacuumdb --analyze-only --missing-stats-only --jobs > 64" completes in ~5.5 seconds on my laptop. Plus, even if reltuples is 0, > there might actually be rows in the table, in which case analyzing it will > produce rows in pg_statistic. > > -- > nathan >
Re: proposal: schema variables
> On 20 May 2025, at 18:39, Bruce Momjian wrote: > > On Tue, May 20, 2025 at 01:33:18PM -0300, Marcos Pegoraro wrote: >> On Tue., May 20, 2025 at 11:56, Bruce Momjian >> wrote: >> >>I will again ask why this patch set is being reposted when there is no >>plan to apply it to git master >> >> Too bad. I would love to have this functionality, from the user's point of view >> there are problems where it would solve them wonderfully. I don't know >> technically of what prevents it from being natively on core, but it would be >> great, it would definitely be. > > My only point is that we should only be using email lists for work that > is being actively worked on to be added to community Postgres. There > has been talk of a trimmed-down version of this being applied, but I > don't see any work in that direction. > > This patch should be moved to a separate location where perhaps people > can subscribe to updates when they are posted, perhaps github. As a project with no roadmap governed by open forum consensus, I don't think we have any right to tell community members what they can or cannot work on here; any technical discussion which conforms with our published policies should be welcome. If Pavel wants to continue rebasing his patchset here then he has, IMHO, every right to do so. Whether or not a committer will show interest at some point is another thing, but we are seeing a very good role model for taking responsibility for one's work here at the very least =) -- Daniel Gustafsson
Re: Hash table scans outside transactions
Ashutosh Bapat wrote 2023-03-28 15:58:

Bumping it to attract some attention.

On Tue, Mar 21, 2023 at 12:51 PM Ashutosh Bapat wrote:

Hi,

Hash table scans (seq_scan_table/level) are cleaned up at the end of a transaction in AtEOXact_HashTables(). If a hash seq scan continues beyond transaction end it will meet "ERROR: no hash_seq_search scan for hash table" in deregister_seq_scan(). That seems like a limiting the hash table usage.

Our use case is
1. Add/update/remove entries in hash table
2. Scan the existing entries and perform one transaction per entry
3. Close scan
repeat above steps in an infinite loop. Note that we do not add/modify/delete entries in step 2. We can't use linked lists since the entries need to be updated or deleted using hash keys. Because the hash seq scan is cleaned up at the end of the transaction, we encounter error in the 3rd step. I don't see that the actual hash table scan depends upon the seq_scan_table/level[] which is cleaned up at the end of the transaction.

I have following questions
1. Is there a way to avoid cleaning up seq_scan_table/level() when the transaction ends?
2. Is there a way that we can use hash table implementation in PostgreSQL code for our purpose?

--
Best Wishes,
Ashutosh Bapat

Hi!

I tried to resend this thread directly to myself, but for some reason it ended up in the whole hackers list.. I thought I'd chime in on this topic since it hasn't really been discussed anywhere else (or maybe I missed it). I've attached two patches: the first one is a little test extension to demonstrate the problem (just add "hash_test" to "shared_preload_libraries"), and the second is a possible solution. Basically, the idea is that we don't reset the scan counter if we find scans that started outside of the current transaction at the end. I have to admit, though, that I can't immediately say what other implications this might have or what else we need to watch out for if we try this. Maybe any thoughts on that?

regards, Aidar Imamov

diff --git a/contrib/Makefile b/contrib/Makefile
index 2f0a88d3f77..b4ff1391387 100644
--- a/contrib/Makefile
+++ b/contrib/Makefile
@@ -21,6 +21,7 @@ SUBDIRS = \
 	earthdistance \
 	file_fdw \
 	fuzzystrmatch \
+	hash_test \
 	hstore \
 	intagg \
 	intarray \
diff --git a/contrib/hash_test/Makefile b/contrib/hash_test/Makefile
new file mode 100644
index 000..a4ce6692127
--- /dev/null
+++ b/contrib/hash_test/Makefile
@@ -0,0 +1,18 @@
+# contrib/hash_test/Makefile
+
+MODULE_big = hash_test
+OBJS = \
+	hash_test.o
+
+PGFILEDESC = "hash_test - demostrate hash_seq_search and transaction issue"
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = contrib/hash_test
+top_builddir = ../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
diff --git a/contrib/hash_test/hash_test.c b/contrib/hash_test/hash_test.c
new file mode 100644
index 000..7c9e24f40cc
--- /dev/null
+++ b/contrib/hash_test/hash_test.c
@@ -0,0 +1,73 @@
+
+#include "postgres.h"
+
+#include "access/xact.h"
+#include "fmgr.h"
+#include "miscadmin.h"
+#include "postmaster/bgworker.h"
+#include "utils/hsearch.h"
+
+#define HASH_TEST_TABLE_NELEMS 2
+
+PG_MODULE_MAGIC;
+
+PGDLLEXPORT void hash_test_workhorse(Datum main_arg);
+
+static HTAB *hash_test_table;
+
+typedef struct
+{
+	/* key */
+	int			num_key;
+} hash_test_table_entry;
+
+void
+_PG_init(void)
+{
+	BackgroundWorker worker;
+
+	if (!process_shared_preload_libraries_in_progress)
+		return;
+
+	MemSet(&worker, 0, sizeof(BackgroundWorker));
+	worker.bgw_flags = BGWORKER_SHMEM_ACCESS |
+		BGWORKER_BACKEND_DATABASE_CONNECTION;
+	worker.bgw_start_time = BgWorkerStart_ConsistentState;
+	worker.bgw_restart_time = BGW_NEVER_RESTART;
+	sprintf(worker.bgw_library_name, "hash_test");
+	sprintf(worker.bgw_function_name, "hash_test_workhorse");
+	sprintf(worker.bgw_name, "hash_test proccess");
+
+	RegisterBackgroundWorker(&worker);
+}
+
+void
+hash_test_workhorse(Datum main_arg)
+{
+	hash_test_table_entry *h_entry;
+	HASHCTL		ctl;
+	HASH_SEQ_STATUS hs;
+
+	BackgroundWorkerInitializeConnection(NULL, NULL, 0);
+
+	ctl.keysize = sizeof(int);
+	ctl.entrysize = sizeof(hash_test_table_entry);
+
+	hash_test_table = hash_create("hash_test_table",
+								  HASH_TEST_TABLE_NELEMS,
+								  &ctl,
+								  HASH_ELEM | HASH_BLOBS);
+
+	/* insert elements */
+	for (int i = 0; i < HASH_TEST_TABLE_NELEMS; i++)
+		hash_search(hash_test_table, &i, HASH_ENTER, NULL);
+
+	/* go through hash table */
+	hash_seq_init(&hs, hash_test_table);
+	while ((h_entry = hash_seq_search(&hs)) != NULL)
+	{
+		StartTransactionCommand();
+		/* do some stuff */
+		CommitTransactionCommand();
+	}
+}
diff --git a/contrib/hash_test/meson.build b/contrib/hash_test/meson.build
new file mode 100644
index 000..bc3aac4e66d
--- /dev/null
+++ b/contrib/hash_test/meson.build
@@ -0,0 +1,10 @@
+
+hash_test_sources = files(
+	'has
Re: [BUG] Skipped initialization of some xl_xact_parsed_prepare fields
Hi, On Wed, May 21, 2025 at 9:59 AM Fujii Masao wrote: > I've pushed the patch. Thanks! > Glad to hear it, thank you! -- Best regards, Daniil Davydov
Re: Avoid orphaned objects dependencies, take 3
Hi, On Tue, May 20, 2025 at 02:12:41PM -0700, Jeff Davis wrote: > On Mon, 2025-05-19 at 14:07 -0400, Robert Haas wrote: > > I agree with that, but I think that it may also be error-prone to > > assume that it's OK to acquire heavyweight locks on other catalog > > objects at any place in the code where we record a dependency. I will > > not be surprised at all if that turns out to have some negative > > consequences. For example, I think it might result in acquiring the > > locks on those other objects at a subtly wrong time (leading to race > > conditions) or acquiring two locks on the same object with different > > lock modes where we should really only acquire one. I'm all in favor > > of solving this problem using heavyweight locks, but I think that > > implicitly acquiring them as a side effect of recording dependencies > > feels too surprising. > > I see what you mean now, in the sense that other code that calls > LockDatabaseObject (and other variants of LockAcquire) are mostly > higher-level operations like AlterPublication(), and not side-effects > of something else. > > But relation_open() is sort of an exception. There are lots of places > where that takes a lock because we happen to want something out of the > relcache, like generate_partition_qual() taking a lock on the parent or > CheckAttributeType() taking a lock on the typrelid. You could say those > are fairly obvious, but that's because we already know, and we could > make it more widely known that recording a dependency takes a lock. > > One compromise might be to have recordDependencyOn() take a LOCKMODE > parameter, which would both inform the caller that a lock will be > taken, and allow the caller to do it their own way and specify NoLock > if necessary. That still results in a huge diff, but the end result > would not be any more complex than the current code. Thanks for sharing your thoughts! I had in mind to "just" check if there is an existing lock (and if so, skip acquiring a new one) but your proposal sounds better. Indeed it would make the locking behavior explicit and also be flexible (allowing the callers to choose the LOCKMODE). I'll prepare a new version implementing your proposal. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
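To give that compromise a concrete shape, here is a rough sketch of a recordDependencyOn() variant that takes an explicit LOCKMODE, so the caller sees the lock and can pass NoLock when it already holds one; recordDependencyOnLocked() is a made-up name for illustration, not an actual patch, though recordDependencyOn() and LockDatabaseObject() are the existing functions:

/*
 * Sketch only, assuming the existing recordDependencyOn() and
 * LockDatabaseObject() signatures.
 */
static void
recordDependencyOnLocked(const ObjectAddress *depender,
						 const ObjectAddress *referenced,
						 DependencyType behavior,
						 LOCKMODE lockmode)
{
	/* make the lock on the referenced object explicit at the call site */
	if (lockmode != NoLock)
		LockDatabaseObject(referenced->classId,
						   referenced->objectId,
						   referenced->objectSubId,
						   lockmode);

	recordDependencyOn(depender, referenced, behavior);
}

/*
 * A call site that wants the protection would then read, e.g.:
 *		recordDependencyOnLocked(&myself, &referenced,
 *								 DEPENDENCY_NORMAL, AccessShareLock);
 * while one that already locked the referenced object passes NoLock.
 */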
Re: Violation of principle that plan trees are read-only
On Mon, May 19, 2025 at 7:45 PM Tom Lane wrote: > Isaac Morland writes: > > I assume this question has an obvious negative answer, but why can't we > > attach const declarations to the various structures that make up the plan > > tree (at all levels, all the way down)? I know const doesn't actually > > prevent a value from changing, but at least the compiler would complain > if > > code accidentally tried. > > The big problem is that a "const" attached to a top-level pointer > doesn't inherently propagate down to sub-nodes. So if I had, say, > "const Query *stmt", the compiler would complain about > > stmt->jointree = foo; > > but not about > > stmt->jointree->quals = foo; > > I guess we could imagine developing an entirely parallel set of > struct declarations with "const" on all pointer fields, like > > typedef struct ConstQuery > { > ... > const ConstFromExpr *jointree; > ... > } ConstQuery; > > but even with automated maintenance of the ConstFoo doppelganger > typedefs, it seems like that'd be a notational nightmare. For > one thing, I'm not sure how to teach the compiler that casting > "Query *" to "ConstQuery *" is okay but vice versa isn't. > > Does C++ have a better story in this area? I haven't touched it > in so long that I don't remember. > > regards, tom lane > > One unconventional but potentially effective approach to detect unexpected modifications in the plan tree can be as follows: - Implement a function that can deeply compare two plan trees for structural or semantic differences. - Before passing the original plan tree to the executor, make a deep copy of it. - After execution (or at strategic checkpoints), compare the current plan tree against the original copy. - If any differences are detected, emit a warning or log it for further inspection. Yes, this approach introduces some memory and performance overhead. However, we can limit its impact by enabling it conditionally via a compile-time flag or #define, making it suitable for debugging or assertion-enabled builds. It might sound a bit unconventional, but it could serve as a useful sanity check especially during development or when investigating plan tree integrity issues. Pardon me if this sounds naive! Yasir Data Bene
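As a small standalone illustration of the const-propagation point above (stand-in structs rather than the real Query/FromExpr definitions), the first assignment is rejected by the compiler while the second one is accepted without complaint:

/* Simplified stand-in structs, only to show the compiler behavior described */
typedef struct FromExpr
{
	void	   *quals;
} FromExpr;

typedef struct Query
{
	FromExpr   *jointree;
} Query;

void
mutate(const Query *stmt, FromExpr *foo, void *new_quals)
{
	(void) foo;							/* only used in the commented-out line */

	/* stmt->jointree = foo; */			/* error: member of a const object */
	stmt->jointree->quals = new_quals;	/* accepted: const stops at the first pointer */
}

For the copy-and-compare idea, the existing copyObject()/equal() node infrastructure would presumably be the natural building block, guarded by an assertion-only flag.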
Re: Re: proposal: schema variables
On Wed, May 21, 2025 at 2:21 AM, Michael Paquier wrote: > On Tue, May 20, 2025 at 10:28:31PM +0200, Pavel Stehule wrote: > > This topic is difficult, because there is no common solution. SQL/PSM is > > almost dead. T-SQL (and MySQL) design is weak and cannot be used for > > security. > > Oracle's design is joined with just one environment. And although almost > > all widely used databases have supported session variables for decades, no > > one design > > is perfect. Proposed design is not perfect too (it introduces possible > > ambiguity) , but I think it can support most wanted use cases (can be > > enhanced in future), > > and it is consistent with Postgres. There are more ways to reduce risk of > > unwanted ambiguity to zero. But it increases the size of the patch. > > One thing that I keep hearing about this feature is that this would be > really helpful for migration from Oracle to PostgreSQL, helping a lot > with rewrites of pl/pgsql functions. > > There is one page on the wiki about private variables, dating back to > 2016: > https://wiki.postgresql.org/wiki/CREATE_PRIVATE_VARIABLE > > I wrote this mail https://www.postgresql.org/message-id/cafj8prb8kdwqcdn2x1_63c58+07oy4z+rudk_xptup+pe8r...@mail.gmail.com and there is another wiki page https://wiki.postgresql.org/wiki/Variable_Design > Perhaps it would help to summarize a bit the pros and cons of existing > implementations to drive which implementation would be the most suited > on a wiki page? The way standards are defined is by overwriting > existing standards, and we don't have one in the SQL specification. > It's hard to say if there will be one at some point, but if the main > SQL products around the world have one, it pretty much is a point in > favor of having a standard. > Although it is maybe an unorthodox idea - I can imagine two different implementations of server-side session variables with different syntaxes (and different advantages and disadvantages, and different use cases). The implementations would not work against each other, but we would have to accept the fact that one feature is implemented twice. We should choose just the one that will be implemented first. The proposed design helps with migration from PL/SQL. > > Another possible angle that could be taken is to invest in a proposal > for the SQL committee to consider, forcing an actual standard that > PostgreSQL could rely on rather than having one behavior implemented > to have it standard-incompatible a few years after. And luckily, we > have Vik Fearing and Peter Eisentraut in this community who invest > time and resources in this area. > Theoretically, the proposed design is a subset of the implementation in DB2 - I designed it without knowledge of this DB2 feature. But without introducing the concept of modules (which is partially redundant with schemas), this design is very natural, and I am quite sure there is no other way to design it. We can ask Peter or Vik about the real possibilities in this area. I don't have any information on that; I just haven't seen changes in SQL/PSM for decades, so I didn't think about it. > FWIW, I do agree with the opinion that if you want to propose rebased > versions of this patch set on a periodic basis, you are free to do so. > This is the core concept of an open community. In terms of my > committer time, I'm pretty much already booked in terms of features > planned for the next release, so I guess that I won't be able to > invest time on this thread. Just saying. 
> thank you for the info > I know that this patch set has been discussed at FOSDEM at some point, > so others may be able to comment more about that. That's just one > opinion among many others. > -- > Michael >