Re: Implement missing join selectivity estimation for range types
I cannot figure out why it aborts. As Tom mentioned upthread about the test cases, similar to the check_estimated_rows function in src/test/regress/sql/stats_ext.sql, we can test it with something like:

create or replace function check_estimated_rows(text) returns table (ok bool)
language plpgsql as
$$
declare
    ln text;
    tmp text[];
    first_row bool := true;
begin
    for ln in execute format('explain analyze %s', $1)
    loop
        if first_row then
            first_row := false;
            tmp := regexp_match(ln, 'rows=(\d*) .* rows=(\d*)');
            return query select 0.2 < tmp[1]::float8 / tmp[2]::float8
                            and tmp[1]::float8 / tmp[2]::float8 < 5;
        end if;
    end loop;
end;
$$;

select * from check_estimated_rows($$select * from test_range_join_1, test_range_join_2 where ir1 && ir2$$);
select * from check_estimated_rows($$select * from test_range_join_1, test_range_join_2 where ir1 << ir2$$);
select * from check_estimated_rows($$select * from test_range_join_1, test_range_join_2 where ir1 >> ir2$$);

Do you need 3 tables to do the test? Since we need to actually run the query and then compare the estimated rows with the actually returned rows, executing a query that joins 3 tables will take a lot of time. So a two-table join with a WHERE qual should be fine?

+ /* Fast-forwards i and j to start of iteration */
+ for (i = 0; range_cmp_bound_values(typcache, &hist1[i], &hist2[0]) < 0; i++);
+ for (j = 0; range_cmp_bound_values(typcache, &hist2[j], &hist1[0]) < 0; j++);
+
+ /* Do the estimation on overlapping regions */
+ while (i < nhist1 && j < nhist2)
+ {
+     double cur_sel1,
+            cur_sel2;
+     RangeBound cur_sync;
+
+     if (range_cmp_bound_values(typcache, &hist1[i], &hist2[j]) < 0)
+         cur_sync = hist1[i++];
+     else if (range_cmp_bound_values(typcache, &hist1[i], &hist2[j]) > 0)
+         cur_sync = hist2[j++];
+     else
+     {
+         /* If equal, skip one */
+         cur_sync = hist1[i];

In this part, range_cmp_bound_values() is computed twice in the "if / else if". You can compute it just once:

    int cmp;

    cmp = range_cmp_bound_values(typcache, &hist1[i], &hist2[j]);
    if (cmp < 0)
        ...
    else if (cmp > 0)
        ...
    else
        ...

(a fuller sketch is at the end of this mail). Also, I think you can fold the following into the main while loop:

+ for (i = 0; range_cmp_bound_values(typcache, &hist1[i], &hist2[0]) < 0; i++);
+ for (j = 0; range_cmp_bound_values(typcache, &hist2[j], &hist1[0]) < 0; j++);

Splitting range and multirange into 2 patches might be a good idea. It seems the same function (calc_hist_join_selectivity) with the same signature appears in both src/backend/utils/adt/multirangetypes_selfuncs.c and src/backend/utils/adt/rangetypes_selfuncs.c, and the complaints from the previous mail are not resolved.
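For illustration, here is roughly how that part could look with a single comparison per iteration (an untested sketch based only on the quoted hunk; the rest of the loop body is assumed unchanged):

    /* Do the estimation on overlapping regions */
    while (i < nhist1 && j < nhist2)
    {
        double      cur_sel1,
                    cur_sel2;   /* used further down, as in the original */
        RangeBound  cur_sync;
        int         cmp;

        /* Compare the two current bounds only once per iteration */
        cmp = range_cmp_bound_values(typcache, &hist1[i], &hist2[j]);

        if (cmp < 0)
            cur_sync = hist1[i++];
        else if (cmp > 0)
            cur_sync = hist2[j++];
        else
        {
            /* If equal, skip one */
            cur_sync = hist1[i];
            /* ... */
        }

        /* ... cur_sel1/cur_sel2 computation as in the original patch ... */
    }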
Re: [PoC] Improve dead tuple storage for lazy vacuum
On Mon, Jan 22, 2024 at 2:24 PM Masahiko Sawada wrote: > > For the next version patch, I'll work on this idea and try to clean up > locking stuff both in tidstore and radix tree. Or if you're already > working on some of them, please let me know. I'll review it. Okay go ahead, sounds good. I plan to look at the tests since they haven't been looked at in a while.
Bug report and fix about building historic snapshot
Hello, I found a bug in building the historic snapshot; the steps to reproduce are as follows:

Prepare:
(pub)create table t1 (id int primary key);
(pub)insert into t1 values (1);
(pub)create publication pub for table t1;
(sub)create table t1 (id int primary key);

Reproduce:
(pub)begin; insert into t1 values (2); (txn1 in session1)
(sub)create subscription sub connection 'hostaddr=127.0.0.1 port=5432 user=xxx dbname=postgres' publication pub; (pub will switch to BUILDING_SNAPSHOT state soon)
(pub)begin; insert into t1 values (3); (txn2 in session2)
(pub)create table t2 (id int primary key); (session3)
(pub)commit; (commit txn1, and pub will switch to FULL_SNAPSHOT state soon)
(pub)begin; insert into t2 values (1); (txn3 in session3)
(pub)commit; (commit txn2, and pub will switch to CONSISTENT state soon)
(pub)commit; (commit txn3; replaying txn3 will fail because its snapshot cannot see t2)

Reasons: We currently don't track transactions that begin after BUILDING_SNAPSHOT and commit before FULL_SNAPSHOT when building the historic snapshot in logical decoding. This can cause a transaction that begins after FULL_SNAPSHOT to take an incorrect historic snapshot, because transactions committed in the BUILDING_SNAPSHOT state are not processed by SnapBuildCommitTxn(). To fix it, we can forcibly track transactions that begin after BUILDING_SNAPSHOT and commit before FULL_SNAPSHOT by using SnapBuildCommitTxn().

--
Regards, ChangAo Chen

0001-Track-transactions-committed-in-BUILDING_SNAPSHOT.patch Description: Binary data
Re: heavily contended lwlocks with long wait queues scale badly
On Fri, Jan 19, 2024 at 01:49:59PM +0100, Jakub Wartak wrote: > Hi Michael, just to reassure you that it is a good thing. We have a > customer who reported much better performance on 16.x than on 13~15 in > very heavy duty LWLock/lockmanager scenarios (ofc, before that was > committed/released), so I gave it a try here today to see how much can > be attributed to that single commit. Ahh. Thanks for the feedback. -- Michael signature.asc Description: PGP signature
RE: Synchronizing slots from primary to standby
On Monday, January 22, 2024 11:36 AM shveta malik wrote:
Hi,

> On Fri, Jan 19, 2024 at 4:18 PM shveta malik wrote:
> >
> > PFA v64.
>
> V64 fails to apply to HEAD due to a recent commit. Rebased it. PFA v64_2. It has
> no new changes.

I noticed a few things while analyzing the patch.

1.
sleep_ms = Min(sleep_ms * 2, MAX_WORKER_NAPTIME_MS);

The initial value of sleep_ms is 0 (the default value for a static variable), which will never advance in this expression. We should initialize sleep_ms to a positive number.

2.
/* Wait a bit, we don't expect to have to wait long */
rc = WaitLatch(MyLatch, WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH, 10L, WAIT_EVENT_BGWORKER_SHUTDOWN);

The slotsync worker is not a bgworker anymore after the 0003 patch, so I think a new wait event is needed.

3.
slot->effective_catalog_xmin = xmin_horizon;

The same assignment is also needed in local_slot_update() to make ReplicationSlotsComputeRequiredXmin() work.

Best Regards,
Hou zj
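To illustrate point 1, the intent is presumably something like the following (just a sketch; names other than sleep_ms and MAX_WORKER_NAPTIME_MS are made up for the example, and Min() is the usual c.h macro):

/*
 * Sketch for point 1: start from a positive nap time so the doubling in
 * Min(sleep_ms * 2, MAX_WORKER_NAPTIME_MS) actually advances.
 */
#define MIN_WORKER_NAPTIME_MS	200		/* hypothetical starting nap time */
#define MAX_WORKER_NAPTIME_MS	30000

static long sleep_ms = MIN_WORKER_NAPTIME_MS;	/* was 0 by default */

static void
update_nap_time(bool did_work)
{
	if (did_work)
		sleep_ms = MIN_WORKER_NAPTIME_MS;	/* reset after activity */
	else
		sleep_ms = Min(sleep_ms * 2, MAX_WORKER_NAPTIME_MS);	/* back off */
}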
Re: Test slots invalidations in 035_standby_logical_decoding.pl only if dead rows are removed
Hi, On Mon, Jan 22, 2024 at 03:54:44PM +0900, Michael Paquier wrote: > On Fri, Jan 19, 2024 at 09:03:01AM +, Bertrand Drouvot wrote: > > On Fri, Jan 19, 2024 at 09:00:01AM +0300, Alexander Lakhin wrote: > +# Launch $sql and wait for a new snapshot that has a newer horizon before > +# doing the vacuum with $vac_option on $to_vac. > +sub wait_until_vacuum_can_remove > > This had better document what the arguments of this routine are, > because that's really unclear. $to_vac is the relation that will be > vacuumed, for one. Agree, done that way in v8 attached. > Also, wouldn't it be better to document in the test why > txid_current_snapshot() is chosen? Contrary to txid_current(), it > does not generate a Transaction/COMMIT to make the test more > predictible, something you have mentioned upthread, and implied in the > test. Good point, added more comments in v8 (but not mentioning txid_current() as after giving more thought about it then I think it was not the right approach in any case). > > - INSERT INTO flush_wal DEFAULT VALUES; -- see create table flush_wal > > This removes two INSERTs on flush_wal and refactors the code to do the > INSERT in wait_until_vacuum_can_remove(), using a SQL comment to > document a reference about the reason why an INSERT is used. Couldn't > you just use a normal comment here? Agree, done in v8. > >> I've re-tested the v6 patch today and confirmed that it makes the test > >> more stable. I ran (with bgwriter_delay = 1 in temp.config) 20 tests in > >> parallel and got failures ('inactiveslot slot invalidation is logged with > >> vacuum on pg_authid') on iterations 2, 6, 6 with no patch applied. > >> (With unlimited CPU, when average test duration is around 70 seconds.) > >> > >> But with v6 applied, 60 iterations succeeded. > > > > Nice! Thanks for the testing! > > I need to review what you have more thoroughly, but is it OK to assume > that both of you are happy with the latest version of the patch in > terms of stability gained? That's still not the full picture, still a > good step towards that. Yeah, I can clearly see how this patch helps from a theoritical perspective but rely on Alexander's testing to see how it actually stabilizes the test. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com >From 1380408965d8c7cb1a2e63043af7d1b8033a7e89 Mon Sep 17 00:00:00 2001 From: bdrouvot Date: Tue, 9 Jan 2024 05:08:35 + Subject: [PATCH v8] Fix 035_standby_logical_decoding.pl race condition We want to ensure that vacuum was able to remove dead rows (aka no other transactions holding back global xmin) before testing for slots invalidation on the standby. While at it, also fixing some typos/bad test description in it. --- .../t/035_standby_logical_decoding.pl | 78 +++ 1 file changed, 45 insertions(+), 33 deletions(-) 100.0% src/test/recovery/t/ diff --git a/src/test/recovery/t/035_standby_logical_decoding.pl b/src/test/recovery/t/035_standby_logical_decoding.pl index 8bc39a5f03..6664405772 100644 --- a/src/test/recovery/t/035_standby_logical_decoding.pl +++ b/src/test/recovery/t/035_standby_logical_decoding.pl @@ -238,6 +238,35 @@ sub check_for_invalidation ) or die "Timed out waiting confl_active_logicalslot to be updated"; } +# Launch $sql and wait for a new snapshot that has a newer horizon before +# doing the vacuum. Arguments are: $vac_option (the option to be passed to the +# vacuum command), $sql (the sql to launch before triggering the vacuum) and +# $to_vac (the relation to vacuum). 
+# Note that to get the horizon we're using pg_current_snapshot() (it does not +# generate a Transaction/COMMIT wal record, so we don't increase the risk to see +# a xl_running_xacts that would advance active replication slot's catalog_xmin). +# Indeed advancing the active replication slot's catalog_xmin would break some +# tests that expect the active slot to conflict with the catalog xmin horizon. +sub wait_until_vacuum_can_remove +{ + my ($vac_option, $sql, $to_vac) = @_; + + # Get the current xid horizon + my $xid_horizon = $node_primary->safe_psql('testdb', qq[select pg_snapshot_xmin(pg_current_snapshot());]); + + # Launch our sql + $node_primary->safe_psql('testdb', qq[$sql]); + + # Wait until we get a newer horizon + $node_primary->poll_query_until('testdb', + "SELECT (select pg_snapshot_xmin(pg_current_snapshot())::text::int - $xid_horizon) > 0") + or die "new snapshot does not have a newer horizon"; + + # Launch the vacuum command and insert into flush_wal (see create table + # flush_wal for the reason why). + $node_primary->safe_psql('testdb', qq[VACUUM $vac_option verbose $to_vac; + INSERT INTO flush_wal DEFAULT VALUES;]); +} # Initialize primary node @@ -248,6 +277,7 @@ $node_primary->append_conf( wal_level = 'logical' max_replication_slots = 4 max_wal_senders = 4 +autovac
RE: speed up a logical replica setup
Dear Peter, > > > Yet other options could be > > pg_buildsubscriber, pg_makesubscriber as 'build' or 'make' in the name > > sounds like we are doing some work to create the subscriber which I > > think is the case here. > > I see your point here. pg_createsubscriber is not like createuser in > that it just runs an SQL command. It does something different than > CREATE SUBSCRIBER. So a different verb would make that clearer. Maybe > something from here: https://www.thesaurus.com/browse/convert I read the link and found a good verb "switch". So, how about using "pg_switchsubscriber"? Best Regards, Hayato Kuroda FUJITSU LIMITED
Re: Add \syncpipeline command to pgbench
That looks like a bug with how opened pipelines are not caught at the end of the script processing. startpipeline seems to have similar related issues. $ cat only_startpipeline.sql \startpipeline SELECT 1; With only 1 transaction, pgbench will consider this a success despite not sending anything since the pipeline was not flushed: pgbench -t1 -Mextended -f only_startpipeline.sql [...] number of transactions per client: 1 number of transactions actually processed: 1/1 With 2 transactions, the error will happen when \startpipeline is called a second time: pgbench -t2 -Mextended -f only_startpipeline.sql [...] pgbench: error: client 0 aborted in command 0 (startpipeline) of script 0; already in pipeline mode number of transactions per client: 2 number of transactions actually processed: 1/2 I've split the changes into two patches. 0001 introduces a new error when the end of a pgbench script is reached while there's still an ongoing pipeline. 0002 adds the \syncpipeline command (original patch with an additional test case). Regards, Anthonin On Mon, Jan 22, 2024 at 7:16 AM Michael Paquier wrote: > > On Fri, Jan 19, 2024 at 08:55:31AM +0100, Anthonin Bonnefoy wrote: > > I've updated the doc to group the commands. It does look better and > > more consistent with similar command groups like \if. > > I was playing with a few meta command scenarios while looking at this > patch, and this sequence generates an error that should never happen: > $ cat /tmp/test.sql > \startpipeline > \syncpipeline > $ pgbench -n -f /tmp/test.sql -M extended > [...] > pgbench: error: unexpected transaction status 1 > pgbench: error: client 0 aborted while receiving the transaction status > > It looks to me that we need to be much smarter than that for the error > handling we'd need when a sync request is optionally sent when a > transaction stops at the end of pgbench. Could you look at it? > -- > Michael v3-0001-Abort-pgbench-if-script-end-is-reached-with-an-op.patch Description: Binary data v3-0002-pgbench-Add-syncpipeline-to-pgbench.patch Description: Binary data
Re: speed up a logical replica setup
On Mon, Jan 22, 2024 at 2:38 PM Hayato Kuroda (Fujitsu) wrote: > > > > > Yet other options could be > > > pg_buildsubscriber, pg_makesubscriber as 'build' or 'make' in the name > > > sounds like we are doing some work to create the subscriber which I > > > think is the case here. > > > > I see your point here. pg_createsubscriber is not like createuser in > > that it just runs an SQL command. It does something different than > > CREATE SUBSCRIBER. Right. So a different verb would make that clearer. Maybe > > something from here: https://www.thesaurus.com/browse/convert > > I read the link and found a good verb "switch". So, how about using > "pg_switchsubscriber"? > I also initially thought on these lines and came up with a name like pg_convertsubscriber but didn't feel strongly about it as that would have sounded meaningful if we use a name like pg_convertstandbytosubscriber. Now, that has become too long. Having said that, I am not opposed to it having a name on those lines. BTW, another option that occurred to me today is pg_preparesubscriber. We internally create slots and then wait for wal, etc. which makes me sound like adding 'prepare' in the name can also explain the purpose. -- With Regards, Amit Kapila.
Re: Synchronizing slots from primary to standby
On Mon, Jan 22, 2024 at 1:11 PM Bertrand Drouvot wrote: > > Hi, > Thanks for sharing the feedback. > > > Then we also discussed whether extending libpqwalreceiver's connect > > API is a good idea and whether we need to further extend it in the > > future. As far as I can see, slotsync worker's primary requirement is > > to execute SQL queries which the current API is sufficient, and don't > > see something that needs any drastic change in this API. Note that > > tablesync worker that executes SQL also uses these APIs, so we may > > need something in the future for either of those. Then finally we need > > a slotsync worker to also connect to a database to use SQL and fetch > > results. > > > > On my side the nits concerns about using the libpqrcv_connect / > walrcv_connect are: > > - cosmetic: the "rcv" do not really align with the sync slot worker > But note that the same API is even used for apply worker as well. One can think that this is a connection used to receive WAL or slot_info. minor comments on the patch: === 1. + /* First time slot update, the function must return true */ + Assert(local_slot_update(remote_slot)); Isn't moving this code to Assert in update_and_persist_slot() wrong? It will make this function call no-op in non-assert builds? 2. + ereport(LOG, + errmsg("newly locally created slot \"%s\" is sync-ready now", I think even without 'locally' in the above LOG message, it is clear. 3. +/* + * walrcv_get_dbinfo_for_failover_slots_fn + * + * Run LIST_DBID_FOR_FAILOVER_SLOTS on primary server to get the + * list of unique DBIDs for failover logical slots + */ +typedef List *(*walrcv_get_dbinfo_for_failover_slots_fn) (WalReceiverConn *conn); This looks like a leftover from the previous version of the patch. -- With Regards, Amit Kapila.
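Regarding comment 1, keeping the call outside the Assert avoids losing the side effect in non-assert builds; the usual pattern is something like this (sketch only):

	bool		slot_updated PG_USED_FOR_ASSERTS_ONLY;

	/* First time slot update, the function must return true */
	slot_updated = local_slot_update(remote_slot);
	Assert(slot_updated);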
Re: initdb's -c option behaves wrong way?
> On 19 Jan 2024, at 17:33, Tom Lane wrote: > > Daniel Gustafsson writes: >> I'll give some more time for opinions, then I'll go ahead with one of the >> patches with a backpatch to v16. > > OK, I take back my previous complaint that just using strncasecmp > would be enough --- I was misremembering how the code worked, and > you're right that it would use the spelling from the command line > rather than that from the file. > > However, the v3 patch is flat broken. You can't assume you have > found a match until you've verified that whitespace and '=' > appear next --- otherwise, you'll be fooled by a guc_name that > is a prefix of one that appears in the file. I think the simplest > change that does it correctly is as attached. The attached v4 looks good to me, I don't think it moves the needle wrt readability (ie, no need to move the search). Feel free to go ahead with that version, or let me know if you want me to deal with it. -- Daniel Gustafsson
Re: Prefetch the next tuple's memory during seqscans
On Sat, 20 Jan 2024 at 16:35, vignesh C wrote: > I'm seeing that there has been no activity in this thread for more > than 6 months, I'm planning to close this in the current commitfest > unless someone is planning to take it forward. Thanks for the reminder about this. Since the heapgettup/heapgettup_pagemode refactor I was unable to see the same performance gains as I was before. Also, since reading "The Art of Writing Efficient Programs" I'm led to believe that modern processor hardware prefetchers can detect and prefetch on both forward and backward access patterns. I also saw some discussion on twitter about this [1]. I'm not sure yet how this translates to non-uniform access patterns, e.g. tuples are varying cachelines apart and we do something like only deform attributes in the first cacheline. Will the prefetcher still see the pattern in this case? If it's non-uniform, then how does it know which cacheline to fetch? If the tuple spans multiple cacheline and we deform the whole tuple, will accessing the next cacheline in a forward direction make the hardware prefetcher forget about the more general backward access that's going on? These are questions I'll need to learn the answers to before I can understand what's the best thing to do in this area. The only way to tell is to design a benchmark and see how far we can go before the hardware prefetcher no longer detects the pattern. I've withdrawn the patch. I can resubmit once I've done some more experimentation if that experimentation yields positive results. David [1] https://twitter.com/ID_AA_Carmack/status/1470832912149135360
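For context, the explicit-prefetch approach the withdrawn patch was based on can be sketched roughly as follows (illustrative standalone C only, using the GCC/Clang __builtin_prefetch builtin; this is not the actual patch):

/* Toy backward scan: hint the cache about the next (lower-addressed) item. */
typedef struct DemoTuple
{
	char		data[64];		/* pretend one tuple spans about a cacheline */
} DemoTuple;

static long
scan_backward(DemoTuple *tuples, int ntuples)
{
	long		sum = 0;

	for (int i = ntuples - 1; i >= 0; i--)
	{
		if (i > 0)
			__builtin_prefetch(&tuples[i - 1], 0, 3);	/* read, high locality */

		sum += tuples[i].data[0];	/* stand-in for tuple deforming */
	}
	return sum;
}

A benchmark along these lines, varying the stride between items, would show where the hardware prefetcher stops detecting the pattern.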
Re: Make documentation builds reproducible
On 20.01.24 23:59, Tom Lane wrote: Peter Eisentraut writes: On 20.01.24 17:03, Tom Lane wrote: * I gather that the point here is to change some generated anchor tags. Would any of these tags be things people would be likely to have bookmarked? No, because the problem is that the anchor names are randomly generated in each build. D'oh. No objection then. Thanks, committed.
Re: XLog size reductions: smaller XLRec block header for PG17
Hi,

> I'm seeing that there has been no activity in this thread for nearly 4
> months, I'm planning to close this in the current commitfest unless
> someone is planning to take it forward.

I don't think that closing CF entries purely due to inactivity is a good practice (nor something we have done before) as long as there is code, it applies, etc. There are a lot of patches and few people working on them. Inactivity in a given thread doesn't necessarily indicate lack of interest; more likely it indicates lack of resources.

--
Best regards,
Aleksander Alekseev
Re: FEATURE REQUEST: Role vCPU limit/priority
Hi Yoni, > It would be useful to have the ability to define for a role default vCPU > affinity limits/thread priority settings so that more active sessions could > coexist similar to MySQL resource groups. To me this sounds like a valuable feature. Would you be interested in working on it? Typically it is a good idea to start with an RFC document and discuss it with the community. -- Best regards, Aleksander Alekseev
Re: Remove unused fields in ReorderBufferTupleBuf
Hi, > Hi, this patch was marked in CF as "Needs Review" [1], but there has > been no activity on this thread for 5+ months. > > Do you wish to keep this open, or can you post something to elicit > more interest in reviews for the latest patch set? Otherwise, if > nothing happens then the CF entry will be closed ("Returned with > feedback") at the end of this CF. I don't think that closing CF entries only due to inactivity is a good practice, nor something we typically do. When someone will have spare time this person will (hopefully) review the code. -- Best regards, Aleksander Alekseev
Re: BUG: Former primary node might stuck when started as a standby
Hi,

> But node1 knows that it's a standby now and it's expected to get all the
> WAL records from the primary, doesn't it?

Yes, but node1 doesn't know whether it always was a standby or not. What if node1 was always a standby, node2 was the primary, then node2 died and node3 is the new primary? If node1 sees an inconsistency in the WAL records, it should report it and stop doing anything, since it doesn't have all the information needed to resolve the inconsistencies in all the possible cases. Only the DBA has this information.

> > It's been a while since I seriously played with replication, but if
> > memory serves, a proper way to switch node1 to a replica mode would be
> > to use pg_rewind on it first.
>
> Perhaps that's true generally, but as we can see, without the extra
> records replayed, this scenario works just fine. Moreover, existing tests
> rely on it, e.g., 009_twophase.pl or 012_subtransactions.pl (in fact, my
> research of the issue was initiated per a test failure).

I suggest focusing on the particular flaky tests, then, and on how to fix them.

--
Best regards,
Aleksander Alekseev
Re: Removing const-false IS NULL quals and redundant IS NOT NULL quals
On Thu, 28 Dec 2023 at 00:38, Andy Fan wrote:
> I also want to add notnullattnums for the UniqueKey stuff as well, by
> comparing your implementation with mine, I found you didn't consider
> the NOT NULL generated by filter. After apply your patch:
>
> create table a(a int);
> explain (costs off) select * from a where a > 3 and a is null;
>             QUERY PLAN
> -----------------------------------
>  Seq Scan on a
>    Filter: ((a IS NULL) AND (a > 3))
> (2 rows)
>
> [1] https://www.postgresql.org/message-id/attachment/151254/v1-0001-uniquekey-on-base-relation-and-used-it-for-mark-d.patch

I believe these are two different things and we should not mix the two up.

Looking at your patch, I see you have:

+ /* The not null attrs from catalogs or baserestrictinfo. */
+ Bitmapset *notnullattrs;

Whereas, I have:

/* zero-based set containing attnums of NOT NULL columns */
Bitmapset *notnullattnums;

I'm a bit worried that your definition of notnullattrs could lead to confusion about which optimisations will be possible. Let's say, for example, I want to write some code that optimises the expression evaluation code to transform EEOP_FUNCEXPR_STRICT into EEOP_FUNCEXPR when all function arguments are Vars that have NOT NULL constraints and are not nullable by any outer join. With my definition, it should be safe to do this, but with your definition, we can't trust that we'll not see any NULLs: if the strict function is evaluated before the strict base qual that filters the NULLs, then the strict function could be called with NULL.

Perhaps we'd want another Bitmapset that has members for strict OpExprs that filter NULLs, and we could document that it's only safe to assume there are no NULLs beyond the scan level, but I'd say that's another patch and I don't want to feed you design ideas here and derail this patch.

David
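For illustration, the kind of safety check being described might look roughly like this (a rough hand-written sketch, not from either patch; it assumes the zero-based notnullattnums field quoted above, the usual planner headers, and that Var.varnullingrels tracks outer-join nullability):

/*
 * Sketch only: decide whether a strict FuncExpr over base-relation Vars can
 * never see a NULL input, based purely on catalog NOT NULL constraints
 * (rel->notnullattnums, zero-based as quoted above) and the Var not being
 * nullable by any outer join (empty varnullingrels).
 */
static bool
strict_func_args_never_null(FuncExpr *fexpr, RelOptInfo *rel)
{
	ListCell   *lc;

	foreach(lc, fexpr->args)
	{
		Node	   *arg = (Node *) lfirst(lc);
		Var		   *var;

		if (!IsA(arg, Var))
			return false;
		var = (Var *) arg;

		if (var->varlevelsup != 0 || !bms_is_empty(var->varnullingrels))
			return false;

		/* system columns and whole-row Vars are not covered by the set */
		if (var->varattno <= 0)
			return false;

		if (!bms_is_member(var->varattno - 1, rel->notnullattnums))
			return false;
	}
	return true;
}

With the "catalog or baserestrictinfo" definition, a true result from a check like this would not be enough, for the reason given above.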
Re: Remove unused fields in ReorderBufferTupleBuf
On Wed, Jul 26, 2023 at 7:22 PM Aleksander Alekseev wrote: > > > Here is the corrected patch. I added it to the nearest CF [1]. > > I played a bit more with the patch. There was an idea to make > ReorderBufferTupleBufData an opaque structure known only within > reorderbyffer.c but it turned out that replication/logical/decode.c > accesses it directly so I abandoned that idea for now. > > > Alternatively we could convert ReorderBufferTupleBufData macro to an > > inlined function. At least it will add some type safety. > > Here is v3 that implements it too as a separate patch. > But why didn't you pursue your idea of getting rid of the wrapper structure ReorderBufferTupleBuf which after this patch will have just one member? I think there could be hassles in backpatching bug-fixes in some cases but in the longer run it would make the code look clean. -- With Regards, Amit Kapila.
Re: Removing const-false IS NULL quals and redundant IS NOT NULL quals
On Mon, 22 Jan 2024 at 17:32, Peter Smith wrote: > Hi, This patch has a CF status of "Needs Review" [1], but it seems > there were CFbot test failures last time it was run [2]. I've attached v11 which updates the expected results in some newly added regression tests. No other changes. David From 095744f5583ab5446c1cdb75bfd3b40c7ab493d8 Mon Sep 17 00:00:00 2001 From: David Rowley Date: Thu, 7 Dec 2023 22:52:34 +1300 Subject: [PATCH v11] Reduce NullTest quals to constant TRUE or FALSE --- .../postgres_fdw/expected/postgres_fdw.out| 16 +- contrib/postgres_fdw/sql/postgres_fdw.sql | 4 +- src/backend/optimizer/plan/initsplan.c| 202 +-- src/backend/optimizer/util/joininfo.c | 28 ++ src/backend/optimizer/util/plancat.c | 19 ++ src/backend/optimizer/util/relnode.c | 3 + src/include/nodes/pathnodes.h | 7 +- src/include/optimizer/planmain.h | 4 + src/test/regress/expected/equivclass.out | 18 +- src/test/regress/expected/join.out| 67 +++-- src/test/regress/expected/predicate.out | 244 ++ src/test/regress/parallel_schedule| 2 +- src/test/regress/sql/predicate.sql| 122 + 13 files changed, 663 insertions(+), 73 deletions(-) create mode 100644 src/test/regress/expected/predicate.out create mode 100644 src/test/regress/sql/predicate.sql diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out index d83f6ae8cb..b5a38aeb21 100644 --- a/contrib/postgres_fdw/expected/postgres_fdw.out +++ b/contrib/postgres_fdw/expected/postgres_fdw.out @@ -656,20 +656,20 @@ EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM ft1 t1 WHERE t1.c1 = 100 AND t1.c2 = Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (("C 1" = 100)) AND ((c2 = 0)) (3 rows) -EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM ft1 t1 WHERE c1 IS NULL;-- NullTest - QUERY PLAN -- +EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM ft1 t1 WHERE c3 IS NULL;-- NullTest + QUERY PLAN +-- Foreign Scan on public.ft1 t1 Output: c1, c2, c3, c4, c5, c6, c7, c8 - Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (("C 1" IS NULL)) + Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE ((c3 IS NULL)) (3 rows) -EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM ft1 t1 WHERE c1 IS NOT NULL;-- NullTest - QUERY PLAN -- +EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM ft1 t1 WHERE c3 IS NOT NULL;-- NullTest +QUERY PLAN +-- Foreign Scan on public.ft1 t1 Output: c1, c2, c3, c4, c5, c6, c7, c8 - Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (("C 1" IS NOT NULL)) + Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE ((c3 IS NOT NULL)) (3 rows) EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM ft1 t1 WHERE round(abs(c1), 0) = 1; -- FuncExpr diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql index 90c8fa4b70..f410c3db4e 100644 --- a/contrib/postgres_fdw/sql/postgres_fdw.sql +++ b/contrib/postgres_fdw/sql/postgres_fdw.sql @@ -332,8 +332,8 @@ EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM ft_empty ORDER BY c1; -- === EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM ft1 t1 WHERE t1.c1 = 1; -- Var, OpExpr(b), Const EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM ft1 t1 WHERE t1.c1 = 100 AND t1.c2 = 0; -- BoolExpr -EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM ft1 t1 WHERE c1 IS NULL;-- NullTest -EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM ft1 t1 WHERE c1 IS NOT NULL;-- NullTest +EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM ft1 t1 WHERE c3 IS NULL;-- NullTest +EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM ft1 t1 WHERE c3 IS 
NOT NULL;-- NullTest EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM ft1 t1 WHERE round(abs(c1), 0) = 1; -- FuncExpr EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM ft1 t1 WHERE c1 = -c1; -- OpExpr(l) EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM ft1 t1 WHERE (c1 IS NOT NULL) IS DISTINCT FROM (c1 IS NOT NULL); -- DistinctExpr diff --git a
Re: introduce dynamic shared memory registry
On Mon, Jan 22, 2024 at 3:43 AM Nathan Bossart wrote:
>
> Oops. I've attached an attempt at fixing this. I took the opportunity to
> clean up the surrounding code a bit.

The code looks cleaner and more readable with the patch. All the call sites take care of dsm_attach() returning a NULL value. So, the attached patch looks good to me.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
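For reference, the call-site pattern being checked is roughly the following (a sketch only, not taken from the attached patch; the handle variable is illustrative):

	dsm_segment *seg = dsm_attach(handle);

	if (seg == NULL)
		ereport(ERROR,
				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
				 errmsg("could not attach to dynamic shared memory segment")));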
Re: Synchronizing slots from primary to standby
On Mon, Jan 22, 2024 at 12:28 PM Amit Kapila wrote: > > On Fri, Jan 19, 2024 at 3:55 PM shveta malik wrote: > > > > On Fri, Jan 19, 2024 at 10:35 AM Masahiko Sawada > > wrote: > > > > > > > > > Thank you for updating the patch. I have some comments: > > > > > > --- > > > +latestWalEnd = GetWalRcvLatestWalEnd(); > > > +if (remote_slot->confirmed_lsn > latestWalEnd) > > > +{ > > > +elog(ERROR, "exiting from slot synchronization as the > > > received slot sync" > > > + " LSN %X/%X for slot \"%s\" is ahead of the > > > standby position %X/%X", > > > + LSN_FORMAT_ARGS(remote_slot->confirmed_lsn), > > > + remote_slot->name, > > > + LSN_FORMAT_ARGS(latestWalEnd)); > > > +} > > > > > > IIUC GetWalRcvLatestWalEnd () returns walrcv->latestWalEnd, which is > > > typically the primary server's flush position and doesn't mean the LSN > > > where the walreceiver received/flushed up to. > > > > yes. I think it makes more sense to use something which actually tells > > flushed-position. I gave it a try by replacing GetWalRcvLatestWalEnd() > > with GetWalRcvFlushRecPtr() but I see a problem here. Lets say I have > > enabled the slot-sync feature in a running standby, in that case we > > are all good (flushedUpto is the same as actual flush-position > > indicated by LogstreamResult.Flush). But if I restart standby, then I > > observed that the startup process sets flushedUpto to some value 'x' > > (see [1]) while when the wal-receiver starts, it sets > > 'LogstreamResult.Flush' to another value (see [2]) which is always > > greater than 'x'. And we do not update flushedUpto with the > > 'LogstreamResult.Flush' value in walreceiver until we actually do an > > operation on primary. Performing a data change on primary sends WALs > > to standby which then hits XLogWalRcvFlush() and updates flushedUpto > > same as LogstreamResult.Flush. Until then we have a situation where > > slots received on standby are ahead of flushedUpto and thus slotsync > > worker keeps one erroring out. I am yet to find out why flushedUpto is > > set to a lower value than 'LogstreamResult.Flush' at the start of > > standby. Or maybe am I using the wrong function > > GetWalRcvFlushRecPtr() and should be using something else instead? > > > > Can we think of using GetStandbyFlushRecPtr()? We probably need to > expose this function, if this works for the required purpose. I think we can. For the records, the problem while using flushedUpto (or GetWalRcvFlushRecPtr()) directly is that it is not set to the latest flushed position immediately after startup. It points to some prior location (perhaps segment or page start) after startup until some data is flushed next which then updates it to the latest flushed position, thus we can not use it directly. GetStandbyFlushRecPtr() OTOH takes care of it i.e. it returns correct flushed-location at any point of time. I have changed v65 to use this one. thanks Shveta
Re: Synchronizing slots from primary to standby
On Fri, Jan 19, 2024 at 11:48 AM Peter Smith wrote: > > Here are some review comments for patch v63-0003. Thanks Peter. I have addressed all in v65. > > 4b. > It was a bit different when there were ERRORs but now they are LOGs > somehow it seems wrong for this function to say what the *caller* will > do. Maybe you can rewrite all the errmsg so the don't say "skipping" > but they just say "bad configuration for slot synchronization" > > If valid is false then you can LOG "skipping" at the caller... I have made this change but now in the log file we see 3 logs like below, does it seem apt? Was the earlier one better where we get the info in 2 lines? [34416] LOG: bad configuration for slot synchronization [34416] HINT: hot_standby_feedback must be enabled. [34416] LOG: skipping slot synchronization thanks Shveta
Re: Synchronizing slots from primary to standby
On Sat, Jan 20, 2024 at 7:44 PM Amit Kapila wrote: > > On Sat, Jan 20, 2024 at 10:52 AM Dilip Kumar wrote: > > > > On Fri, Jan 19, 2024 at 5:24 PM Amit Kapila wrote: > > > > > > On Wed, Jan 17, 2024 at 4:00 PM shveta malik > > > wrote: > > > > > > > > > > I had some off-list discussions with Sawada-San, Hou-San, and Shveta > > > on the topic of extending replication commands instead of using the > > > current model where we fetch the required slot information via SQL > > > using a database connection. I would like to summarize the discussion > > > and would like to know the thoughts of others on this topic. > > > > > > In the current patch, we launch the slotsync worker on physical > > > standby which connects to the specified database (currently we let > > > users specify the required dbname in primary_conninfo) on the primary. > > > It then fetches the required information for failover marked slots > > > from the primary and also does some primitive checks on the upstream > > > node via SQL (the additional checks are like whether the upstream node > > > has a specified physical slot or whether the upstream node is a > > > primary node or a standby node). To fetch the required information it > > > uses a libpqwalreciever API which is mostly apt for this purpose as it > > > supports SQL execution but for this patch, we don't need a replication > > > connection, so we extend the libpqwalreciever connect API. > > > > What sort of extension we have done to 'libpqwalreciever'? Is it > > something like by default this supports replication connections so we > > have done an extension to the API so that we can provide an option > > whether to create a replication connection or a normal connection? > > > > Yeah and in the future there could be more as well. The other function > added walrcv_get_dbname_from_conninfo doesn't appear to be a problem > either for now. > > > > Now, the concerns related to this could be that users would probably > > > need to change existing mechanisms/tools to update priamry_conninfo I'm concerned about this. In fact, a primary_conninfo value generated by pg_basebackup does not work with enable_syncslot. > > > and one of the alternatives proposed is to have an additional GUC like > > > slot_sync_dbname. Users won't be able to drop the database this worker > > > is connected to aka whatever is specified in slot_sync_dbname but as > > > the user herself sets up the configuration it shouldn't be a big deal. > > > > Yeah for this purpose users may use template1 or so which they > > generally don't plan to drop. > > > > Using template1 has other problems like users won't be able to create > a new database. See [2] (point number 2.2) > > > > > So in case the user wants to drop that > > database user needs to turn off the slot syncing option and then it > > can be done? > > > > Right. If the user wants to continue using slot syncing, they need to switch the database to connect. Which requires modifying primary_conninfo and reloading the configuration file. Which further leads to restarting the physical replication. If they use synchronous replication, it means the application temporarily stops during that. > > > > Then we also discussed whether extending libpqwalreceiver's connect > > > API is a good idea and whether we need to further extend it in the > > > future. As far as I can see, slotsync worker's primary requirement is > > > to execute SQL queries which the current API is sufficient, and don't > > > see something that needs any drastic change in this API. 
Note that > > > tablesync worker that executes SQL also uses these APIs, so we may > > > need something in the future for either of those. Then finally we need > > > a slotsync worker to also connect to a database to use SQL and fetch > > > results. > > > > While looking into the patch v64-0002 I could not exactly point out > > what sort of extensions are there in libpqwalreceiver.c, I just saw > > one extra API for fetching the dbname from connection info? > > > > Right, the worry was that we may need it in the future. Yes. IIUC the slotsync worker uses libpqwalreceiver to establish a non-replication connection and to execute SQL query. But neither of them are relevant with replication. I'm a bit concerned that when we need to extend the slotsync feature in the future we will end up extending libpqwalreceiver, even if the new feature is not also relevant with replication. > > > > Now, let us consider if we extend the replication commands like > > > READ_REPLICATION_SLOT and or introduce a new set of replication > > > commands to fetch the required information then we don't need a DB > > > connection with primary or a connection in slotsync worker. As per my > > > current understanding, it is quite doable but I think we will slowly > > > go in the direction of making replication commands something like SQL > > > because today we need to extend it to fetch all slots info that have > > > failover marked as true, the existenc
Re: partitioning and identity column
On 17.01.24 06:36, Ashutosh Bapat wrote: On Wed, Jan 17, 2024 at 12:30 AM Peter Eisentraut wrote: On 09.01.24 15:10, Ashutosh Bapat wrote: Here's complete patch-set. Looks good! Committed. Thanks a lot Peter. I found another piece of code that might need updating, or at least the comment. In MergeAttributes(), in the part that merges the specified column definitions into the inherited ones, it says /* * Identity is never inherited. The new column can have an * identity definition, so we always just take that one. */ def->identity = newdef->identity; This is still correct for regular inheritance, but not for partitioning. I think for partitioning, this is not reachable because you can't specify identity information when you create a partition(?). So maybe something like if (newdef->identity) { Assert(!is_partioning); /* * Identity is never inherited. The new column can have an * identity definition, so we always just take that one. */ def->identity = newdef->identity; } Thoughts?
Re: make dist using git archive
Hi, On Mon, Jan 22, 2024 at 3:32 PM Peter Eisentraut wrote: > > One of the goals is to make the creation of the distribution tarball > more directly traceable to the git repository. That is why we removed > the "make distprep" step. > > Here I want to take another step in that direction, by changing "make > dist" to directly use "git archive", rather than the custom shell script > it currently runs. > > The simple summary is that it would run > > git archive --format tar.gz --prefix postgresql-17devel/ HEAD -o > postgresql-17devel.tar.gz > > (with appropriate version numbers, of course), and that's the tarball we > would ship. > > There are analogous commands for other compression formats. > > The actual command gets subtly more complicated if you need to run this > in a separate build directory. In my attached patch, the make version > doesn't support vpath at the moment, just so that it's easier to > understand for now. The meson version is a bit hairy. > > I have studied and tested this quite a bit, and I have found that the > archives produced this way are deterministic and reproducible, meaning > for a given commit the result file should always be bit-for-bit identical. > > The exception is that if you use a git version older than 2.38.0, gzip > records the platform in the archive, so you'd get a different output on > Windows vs. macOS vs. "UNIX" (everything else). In git 2.38.0, this was > changed so that everything is recorded as "UNIX" now. This is just > something to keep in mind. This issue is specific to the gzip format, > it does not affect other compression formats. > > Meson has its own distribution building command (meson dist), but opted > against using that. The main problem is that the way they have > implemented it, it is not deterministic in the above sense. (Another > point is of course that we probably want a "make" version for the time > being.) > > But the target name "dist" in meson is reserved for that reason, so I > needed to call the custom target "pgdist". > > I did take one idea from meson: It runs a check before git archive that > the checkout is clean. That way, you avoid mistakes because of > uncommitted changes. This works well in my "make" implementation. In > the meson implementation, I had to find a workaround, because a > custom_target cannot have a dependency on a run_target. As also > mentioned above, the whole meson implementation is a bit ugly. > > Anyway, with the attached patch you can do > > make dist > > or > > meson compile -C build pgdist I played this with meson build on macOS, the packages are generated in source root but not build root, I'm sure if this is by design but I think polluting *working directory* is not good. Another thing I'd like to point out is, should we also introduce *git commit* or maybe *git tag* to package name, something like: git archive --format tar.gz --prefix postgresql-17devel/ HEAD -o postgresql-17devel-`git rev-parse --short HEAD`.tar.gz git archive --format tar.gz --prefix postgresql-17devel/ HEAD -o postgresql-`git describe --tags`.tar.gz > > and it produces the same set of tarballs as before, except it's done > differently. > > The actual build scripts need some fine-tuning, but the general idea is > correct, I think. I think this is a good idea, thanks for working on this. -- Regards Junwang Zhao
Re: Remove unused fields in ReorderBufferTupleBuf
On Mon, Jan 22, 2024 at 4:17 PM Aleksander Alekseev wrote: > > Hi, > > > Hi, this patch was marked in CF as "Needs Review" [1], but there has > > been no activity on this thread for 5+ months. > > > > Do you wish to keep this open, or can you post something to elicit > > more interest in reviews for the latest patch set? Otherwise, if > > nothing happens then the CF entry will be closed ("Returned with > > feedback") at the end of this CF. > > I don't think that closing CF entries only due to inactivity is a good > practice, nor something we typically do. When someone will have spare > time this person will (hopefully) review the code. Agree. IMHO, a patch not picked up by anyone doesn't mean it's the author's problem to mark it "Returned with feedback". And, the timing to mark things this way is not quite right as we are getting close to the PG17 release. The patch may be fixing a problem (like the one that's closed due to inactivity https://commitfest.postgresql.org/46/3503/) or may be a feature or an improvement that no reviewer/committer has had a chance to look at. I think a new label such as "Returned due to lack of interest" or "Returned due to disinterest" (or some better wording) helps reviewers/committers to pick things up. This new label can be attributed to the class of patches for which initially there's some positive feedback on the idea, the patch is being kept latest, but finds no one to get it reviewed for say X number of months. -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: partitioning and identity column
On Mon, Jan 22, 2024 at 5:32 PM Peter Eisentraut wrote:
>
> I found another piece of code that might need updating, or at least the
> comment.
>
> In MergeAttributes(), in the part that merges the specified column
> definitions into the inherited ones, it says
>
>     /*
>      * Identity is never inherited. The new column can have an
>      * identity definition, so we always just take that one.
>      */
>     def->identity = newdef->identity;
>
> This is still correct for regular inheritance, but not for partitioning.
> I think for partitioning, this is not reachable because you can't
> specify identity information when you create a partition(?). So maybe
> something like

You may specify the information when creating a partition, but it will cause an error. We have tests in identity.sql for the same (look for pitest1_pfail).

>
> if (newdef->identity)
> {
>     Assert(!is_partioning);
>     /*
>      * Identity is never inherited. The new column can have an
>      * identity definition, so we always just take that one.
>      */
>     def->identity = newdef->identity;
> }
>
> Thoughts?

That code block already has Assert(!is_partition) at line 3085. I thought that Assert is enough.

There's another thing I found: the file isn't using check_stack_depth() in the functions that traverse inheritance hierarchies. This isn't just a problem with the identity-related function but with most of the functions in that file. Do you think it's worth fixing?

--
Best Wishes,
Ashutosh Bapat
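For reference, the check_stack_depth() guard being suggested typically goes at the top of each recursive step, e.g. (a made-up walker, not actual tablecmds.c code, shown only to illustrate where the call sits):

static void
walk_inheritance_tree(Oid relid)
{
	List	   *children;
	ListCell   *lc;

	/*
	 * Raise a clean "stack depth limit exceeded" error instead of crashing
	 * on pathologically deep inheritance hierarchies.
	 */
	check_stack_depth();

	children = find_inheritance_children(relid, NoLock);
	foreach(lc, children)
		walk_inheritance_tree(lfirst_oid(lc));
}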
Re: Synchronizing slots from primary to standby
On Mon, Jan 22, 2024 at 5:28 PM Masahiko Sawada wrote: > > On Sat, Jan 20, 2024 at 7:44 PM Amit Kapila wrote: > > > > On Sat, Jan 20, 2024 at 10:52 AM Dilip Kumar wrote: > > > > > > On Fri, Jan 19, 2024 at 5:24 PM Amit Kapila > > > wrote: > > > > > > > > On Wed, Jan 17, 2024 at 4:00 PM shveta malik > > > > wrote: > > > > > > > > > > > > > I had some off-list discussions with Sawada-San, Hou-San, and Shveta > > > > on the topic of extending replication commands instead of using the > > > > current model where we fetch the required slot information via SQL > > > > using a database connection. I would like to summarize the discussion > > > > and would like to know the thoughts of others on this topic. > > > > > > > > In the current patch, we launch the slotsync worker on physical > > > > standby which connects to the specified database (currently we let > > > > users specify the required dbname in primary_conninfo) on the primary. > > > > It then fetches the required information for failover marked slots > > > > from the primary and also does some primitive checks on the upstream > > > > node via SQL (the additional checks are like whether the upstream node > > > > has a specified physical slot or whether the upstream node is a > > > > primary node or a standby node). To fetch the required information it > > > > uses a libpqwalreciever API which is mostly apt for this purpose as it > > > > supports SQL execution but for this patch, we don't need a replication > > > > connection, so we extend the libpqwalreciever connect API. > > > > > > What sort of extension we have done to 'libpqwalreciever'? Is it > > > something like by default this supports replication connections so we > > > have done an extension to the API so that we can provide an option > > > whether to create a replication connection or a normal connection? > > > > > > > Yeah and in the future there could be more as well. The other function > > added walrcv_get_dbname_from_conninfo doesn't appear to be a problem > > either for now. > > > > > > Now, the concerns related to this could be that users would probably > > > > need to change existing mechanisms/tools to update priamry_conninfo > > I'm concerned about this. In fact, a primary_conninfo value generated > by pg_basebackup does not work with enable_syncslot. > Right, but if we want can't we extend pg_basebackup to do that? It is just that I am not sure that it is a good idea to extend pg_basebackup in the first version. > > > > and one of the alternatives proposed is to have an additional GUC like > > > > slot_sync_dbname. Users won't be able to drop the database this worker > > > > is connected to aka whatever is specified in slot_sync_dbname but as > > > > the user herself sets up the configuration it shouldn't be a big deal. > > > > > > Yeah for this purpose users may use template1 or so which they > > > generally don't plan to drop. > > > > > > > Using template1 has other problems like users won't be able to create > > a new database. See [2] (point number 2.2) > > > > > > > > So in case the user wants to drop that > > > database user needs to turn off the slot syncing option and then it > > > can be done? > > > > > > > Right. > > If the user wants to continue using slot syncing, they need to switch > the database to connect. Which requires modifying primary_conninfo and > reloading the configuration file. Which further leads to restarting > the physical replication. If they use synchronous replication, it > means the application temporarily stops during that. 
> Yes, that would be an inconvenience but the point is we don't expect this to change often. > > > > > > Then we also discussed whether extending libpqwalreceiver's connect > > > > API is a good idea and whether we need to further extend it in the > > > > future. As far as I can see, slotsync worker's primary requirement is > > > > to execute SQL queries which the current API is sufficient, and don't > > > > see something that needs any drastic change in this API. Note that > > > > tablesync worker that executes SQL also uses these APIs, so we may > > > > need something in the future for either of those. Then finally we need > > > > a slotsync worker to also connect to a database to use SQL and fetch > > > > results. > > > > > > While looking into the patch v64-0002 I could not exactly point out > > > what sort of extensions are there in libpqwalreceiver.c, I just saw > > > one extra API for fetching the dbname from connection info? > > > > > > > Right, the worry was that we may need it in the future. > > Yes. IIUC the slotsync worker uses libpqwalreceiver to establish a > non-replication connection and to execute SQL query. But neither of > them are relevant with replication. > But we are already using libpqwalreceiver to execute SQL queries via tablesync worker. I'm a bit concerned that when we > need to extend the slotsync feature in the future we will end up > extending libpqwalreceiver, even if the new feature is n
Does redundant extension exist During faster COPY in PG16
Hi, I'm learning about the faster COPY in PG16 and have some questions about the relation extension lock improvement. From ./src/backend/storage/buffer/bufmgr.c:1901 (ExtendBufferedRelShared):

```
	/*
	 * Lock relation against concurrent extensions, unless requested not to.
	 *
	 * We use the same extension lock for all forks. That's unnecessarily
	 * restrictive, but currently extensions for forks don't happen often
	 * enough to make it worth locking more granularly.
	 *
	 * Note that another backend might have extended the relation by the time
	 * we get the lock.
	 */
	if (!(flags & EB_SKIP_EXTENSION_LOCK))
	{
		LockRelationForExtension(bmr.rel, ExclusiveLock);
		if (bmr.rel)
			bmr.smgr = RelationGetSmgr(bmr.rel);
	}
	...
	smgrzeroextend(bmr.smgr, fork, first_block, extend_by, false);
```

During concurrent extension, once we obtain the extension lock, we use smgrzeroextend() to extend the relation file instead of searching the FSM through GetPageWithFreeSpace(). Is this approach reasonable? During concurrent extensions, if one backend bulk-extends successfully, the other backends waiting on the extension lock already have free pages to use. If all concurrently extending backends extend the relation file after getting the extension lock, the extension lock will be held for (number of extending backends * smgrzeroextend() execution time).

Any feedback is welcome.

--
Best Regards
Kewen He
from 阿里邮箱 macOS
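For comparison, the alternative the question implies, i.e. re-checking free space after acquiring the lock instead of always extending, might look roughly like this (a sketch only, not actual bufmgr.c/hio.c code; some_extend_routine is a stand-in for the real extension path):

static BlockNumber
get_block_for_insert(Relation rel, Size spaceNeeded)
{
	BlockNumber targetBlock;

	LockRelationForExtension(rel, ExclusiveLock);

	/* Re-check the FSM: a concurrent bulk extension may have added pages. */
	targetBlock = GetPageWithFreeSpace(rel, spaceNeeded);
	if (targetBlock == InvalidBlockNumber)
	{
		/* Nothing usable recorded; extend the relation while holding the lock. */
		targetBlock = some_extend_routine(rel);
	}

	UnlockRelationForExtension(rel, ExclusiveLock);
	return targetBlock;
}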
Re: Improving EXPLAIN's display of SubPlan nodes
Hi, > EXPLAIN has always been really poor at displaying SubPlan nodes > in expressions: you don't get much more than "(SubPlan N)". > This is mostly because every time I thought about it, I despaired > of trying to represent all the information in a SubPlan accurately. > However, a recent discussion[1] made me realize that we could do > a lot better just by displaying the SubLinkType and the testexpr > (if relevant). So here's a proposed patch. You can see what > it does by examining the regression test changes. > > There's plenty of room to bikeshed about exactly how to display > this stuff, and I'm open to suggestions. > > BTW, I was somewhat depressed to discover that we have exactly > zero regression coverage of the ROWCOMPARE_SUBLINK code paths; > not only was EXPLAIN output not covered, but the planner and > executor too. So I added some simple tests for that. Otherwise > I think existing coverage is enough for this. I reviewed the code and tested the patch on MacOS. It looks good to me. Although something like: ``` + Filter: (ANY (base_tbl.a = $1) FROM SubPlan 1 (returns $1)) + SubPlan 1 (returns $1) ``` ... arguably doesn't give much more information to the user comparing to what we have now: ``` - Filter: (SubPlan 1) - SubPlan 1 ``` ... I believe this is the right step toward more detailed EXPLAINs, and perhaps could be useful for debugging and/or educational purposes. Also the patch improves code coverage. -- Best regards, Aleksander Alekseev
Re: tablecmds.c/MergeAttributes() cleanup
On 06.12.23 09:23, Peter Eisentraut wrote: The (now) second patch is also still of interest to me, but I have since noticed that I think [0] should be fixed first, but to make that fix simpler, I would like the first patch from here. [0]: https://www.postgresql.org/message-id/flat/24656cec-d6ef-4d15-8b5b-e8dfc9c833a7%40eisentraut.org The remaining patch in this series needed a rebase and adjustment. The above precondition still applies. From c4c4462fa58c73fc383c4935c8d2753b7a0b4c9d Mon Sep 17 00:00:00 2001 From: Peter Eisentraut Date: Mon, 22 Jan 2024 13:23:43 +0100 Subject: [PATCH v5] MergeAttributes: convert pg_attribute back to ColumnDef before comparing MergeAttributes() has a loop to merge multiple inheritance parents into a column column definition. The merged-so-far definition is stored in a ColumnDef node. If we have to merge columns from multiple inheritance parents (if the name matches), then we have to check whether various column properties (type, collation, etc.) match. The current code extracts the pg_attribute value of the currently-considered inheritance parent and compares it against the merged-so-far ColumnDef value. If the currently considered column doesn't match any previously inherited column, we make a new ColumnDef node from the pg_attribute information and add it to the column list. This patch rearranges this so that we create the ColumnDef node first in either case. Then the code that checks the column properties for compatibility compares ColumnDef against ColumnDef (instead of ColumnDef against pg_attribute). This makes the code more symmetric and easier to follow. Also, later in MergeAttributes(), there is a similar but separate loop that merges the new local column definition with the combination of the inheritance parents established in the first loop. That comparison is already ColumnDef-vs-ColumnDef. With this change, both of these can use similar-looking logic. (A future project might be to extract these two sets of code into a common routine that encodes all the knowledge of whether two column definitions are compatible. But this isn't currently straightforward because we want to give different error messages in the two cases.) Furthermore, by avoiding the use of Form_pg_attribute here, we make it easier to make changes in the pg_attribute layout without having to worry about the local needs of tablecmds.c. Because MergeAttributes() is hugely long, it's sometimes hard to know where in the function you are currently looking. To help with that, I also renamed some variables to make it clearer where you are currently looking. The first look is "prev" vs. "new", the second loop is "inh" vs. "new". Discussion: https://www.postgresql.org/message-id/flat/52a125e4-ff9a-95f5-9f61-b87cf447e...@eisentraut.org --- src/backend/commands/tablecmds.c | 198 --- 1 file changed, 102 insertions(+), 96 deletions(-) diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c index 2a56a4357c9..04e674bbdae 100644 --- a/src/backend/commands/tablecmds.c +++ b/src/backend/commands/tablecmds.c @@ -2704,7 +2704,8 @@ MergeAttributes(List *columns, const List *supers, char relpersistence, parent_attno - 1); char *attributeName = NameStr(attribute->attname); int exist_attno; - ColumnDef *def; + ColumnDef *newdef; + ColumnDef *savedef; /* * Ignore dropped columns in the parent. 
@@ -2713,14 +2714,38 @@ MergeAttributes(List *columns, const List *supers, char relpersistence, continue; /* leave newattmap->attnums entry as zero */ /* -* Does it conflict with some previously inherited column? +* Create new column definition +*/ + newdef = makeColumnDef(attributeName, attribute->atttypid, + attribute->atttypmod, attribute->attcollation); + newdef->storage = attribute->attstorage; + newdef->generated = attribute->attgenerated; + if (CompressionMethodIsValid(attribute->attcompression)) + newdef->compression = + pstrdup(GetCompressionMethodName(attribute->attcompression)); + + /* +* Regular inheritance children are independent enough not to +* inherit identity columns. But partitions are integral part +* of a partitioned table and inherit identity column. +
Re: Add \syncpipeline command to pgbench
On 2024-Jan-22, Anthonin Bonnefoy wrote: > That looks like a bug with how opened pipelines are not caught at the > end of the script processing. startpipeline seems to have similar > related issues. Ah, yeah. Your fix looks necessary on a quick look. I'll review and see about backpatching this. > 0002 adds the \syncpipeline command (original patch with an additional > test case). I can look into this one later, unless Michael wants to. -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/ "No me acuerdo, pero no es cierto. No es cierto, y si fuera cierto, no me acuerdo." (Augusto Pinochet a una corte de justicia)
Re: [17] CREATE SUBSCRIPTION ... SERVER
Hi Jeff, On Tue, Jan 16, 2024 at 7:25 AM Jeff Davis wrote: > > On Fri, 2024-01-12 at 17:17 -0800, Jeff Davis wrote: > > I think 0004 needs a bit more work, so I'm leaving it off for now, > > but > > I'll bring it back in the next patch set. > > Here's the next patch set. 0001 - 0003 are mostly the same with some > improved error messages and some code fixes. I am looking to start > committing 0001 - 0003 soon, as they have received some feedback > already and 0004 isn't required for the earlier patches to be useful. > I am reviewing the patches. Here are some random comments. 0002 adds a prefix "regress_" to almost every object that is created in foreign_data.sql. The commit message doesn't say why it's doing so. But more importantly, the new tests added are lost in all the other changes. It will be good to have prefix adding changes into its own patch explaining the reason. The new tests may stay in 0002. Interestingly the foreign server created in the new tests doesn't have "regress_" prefix. Why? Dummy FDW makes me nervous. The way it's written, it may grow into a full-fledged postgres_fdw and in the process might acquire the same concerns that postgres_fdw has today. But I will study the patches and discussion around it more carefully. I enhanced the postgres_fdw TAP test to use foreign table. Please see the attached patch. It works as expected. Of course a follow-on work will require linking the local table and its replica on the publisher table so that push down will work on replicated tables. But the concept at least works with your changes. Thanks for that. I am not sure we need a full-fledged TAP test for testing subscription. I wouldn't object to it, but TAP tests are heavy. It should be possible to write the same test as a SQL test by creating two databases and switching between them. Do you think it's worth trying that way? > 0004 could use more discussion. The purpose is to split the privileges > of pg_create_subscription into two: pg_create_subscription, and > pg_create_connection. By separating the privileges, it's possible to > allow someone to create/manage subscriptions to a predefined set of > foreign servers (on which they have USAGE privileges) without allowing > them to write an arbitrary connection string. Haven't studied this patch yet. Will continue reviewing the patches. -- Best Wishes, Ashutosh Bapat diff --git a/contrib/postgres_fdw/t/010_subscription.pl b/contrib/postgres_fdw/t/010_subscription.pl index d1d80d0679..3ae2b6da4a 100644 --- a/contrib/postgres_fdw/t/010_subscription.pl +++ b/contrib/postgres_fdw/t/010_subscription.pl @@ -20,7 +20,7 @@ $node_subscriber->start; # Create some preexisting content on publisher $node_publisher->safe_psql('postgres', - "CREATE TABLE tab_ins AS SELECT generate_series(1,1002) AS a"); + "CREATE TABLE tab_ins AS SELECT a, a + 1 as b FROM generate_series(1,1002) AS a"); # Replicate the changes without columns $node_publisher->safe_psql('postgres', "CREATE TABLE tab_no_col()"); @@ -29,7 +29,7 @@ $node_publisher->safe_psql('postgres', # Setup structure on subscriber $node_subscriber->safe_psql('postgres', "CREATE EXTENSION postgres_fdw"); -$node_subscriber->safe_psql('postgres', "CREATE TABLE tab_ins (a int)"); +$node_subscriber->safe_psql('postgres', "CREATE TABLE tab_ins (a int, b int)"); # Setup logical replication my $publisher_connstr = $node_publisher->connstr . 
' dbname=postgres'; @@ -45,6 +45,9 @@ $node_subscriber->safe_psql('postgres', "CREATE USER MAPPING FOR PUBLIC SERVER tap_server" ); +$node_subscriber->safe_psql('postgres', + "CREATE FOREIGN TABLE f_tab_ins (a int, b int) SERVER tap_server OPTIONS(table_name 'tab_ins')" +); $node_subscriber->safe_psql('postgres', "CREATE SUBSCRIPTION tap_sub SERVER tap_server PUBLICATION tap_pub WITH (password_required=false)" ); @@ -53,16 +56,16 @@ $node_subscriber->safe_psql('postgres', $node_subscriber->wait_for_subscription_sync($node_publisher, 'tap_sub'); my $result = - $node_subscriber->safe_psql('postgres', "SELECT count(*) FROM tab_ins"); + $node_subscriber->safe_psql('postgres', "SELECT count(*) FROM (SELECT f.b = l.b as match FROM tab_ins l, f_tab_ins f WHERE l.a = f.a) WHERE match"); is($result, qq(1002), 'check initial data was copied to subscriber'); $node_publisher->safe_psql('postgres', - "INSERT INTO tab_ins SELECT generate_series(1,50)"); + "INSERT INTO tab_ins SELECT a, a + 1 FROM generate_series(1003,1050) a"); $node_publisher->wait_for_catchup('tap_sub'); $result = - $node_subscriber->safe_psql('postgres', "SELECT count(*) FROM tab_ins"); -is($result, qq(1052), 'check initial data was copied to subscriber'); + $node_subscriber->safe_psql('postgres', "SELECT count(*) FROM (SELECT f.b = l.b as match FROM tab_ins l, f_tab_ins f WHERE l.a = f.a) WHERE match"); +is($result, qq(1050), 'check initial data was copied to subscriber'); done_testing();
Re: Move walreceiver state assignment (to WALRCV_STREAMING) in WalReceiverMain()
Hi, On Mon, Jan 22, 2024 at 04:14:46PM +1100, Peter Smith wrote: > 2024-01 Commitfest. > > Hi, This patch has a CF status of "Needs Review" [1], but it seems > there were CFbot test failures last time it was run [2]. Please have a > look and post an updated version if necessary. Thanks for the warning! I don't think the current code is causing any issues so given the feedback I've had so far I think I'll withdraw the patch. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: [DOC] Add detail regarding resource consumption wrt max_connections
On Fri, 2024-01-19 at 17:37 -0500, reid.thomp...@crunchydata.com wrote: > On Sat, 2024-01-13 at 10:31 -0700, Roberto Mello wrote: > > > > I can add a suggestion for the user to consider increasing > > shared_buffers in accordance with higher max_connections, but it > > would be better if there was a "rule of thumb" guideline to go > > along. I'm open to suggestions. > > > > I can revise with a similar warning in max_prepared_xacts as well. > > > > Sincerely, > > > > Roberto > > Can a "close enough" rule of thumb be calculated from: > postgresql.conf -> log_min_messages = debug3 > > start postgresql with varying max_connections to get > CreateSharedMemoryAndSemaphores() sizes to generate a rough equation > or maybe it would be sufficient to advise to set log_min_messages = debug3 on a test DB and start/stop it with varying values of max_connections and look at the differing values in DEBUG: invoking IpcMemoryCreate(size=...) log messages for themselves.
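For what it's worth, on v13 and later the per-component numbers can also be pulled from SQL on a running test instance, without debug logging; a rough sketch (repeat at each max_connections value being compared):
```
-- total shared memory actually allocated with the current settings
SELECT pg_size_pretty(sum(allocated_size)) FROM pg_shmem_allocations;

-- the biggest consumers, to see which allocations grow with max_connections
SELECT name, pg_size_pretty(allocated_size) AS size
  FROM pg_shmem_allocations
 ORDER BY allocated_size DESC
 LIMIT 10;
```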
Re: Remove unused fields in ReorderBufferTupleBuf
Hi, > > I played a bit more with the patch. There was an idea to make > > ReorderBufferTupleBufData an opaque structure known only within > > reorderbuffer.c but it turned out that replication/logical/decode.c > > accesses it directly so I abandoned that idea for now. > > > > > Alternatively we could convert ReorderBufferTupleBufData macro to an > > > inlined function. At least it will add some type safety. > > > > Here is v3 that implements it too as a separate patch. > > > > But why didn't you pursue your idea of getting rid of the wrapper > structure ReorderBufferTupleBuf which after this patch will have just > one member? I think there could be hassles in backpatching bug-fixes > in some cases but in the longer run it would make the code look clean. Indeed. In fact it turned out that I suggested the same above but apparently forgot: > On top of that IMO it doesn't make much sense to keep a one-field > wrapper structure. Perhaps we should get rid of it entirely and just > use HeapTupleData instead. After actually trying the refactoring I agree that the code becomes cleaner and it's going to be beneficial in the long run. Here is the patch. -- Best regards, Aleksander Alekseev v4-0001-Remove-ReorderBufferTupleBuf-structure.patch Description: Binary data
Re: Improve WALRead() to suck data directly from WAL buffers when possible
Hi Bharath, Thanks for working on this. It seems like a nice improvement to have. Here are some comments on 0001 patch. 1- xlog.c + /* + * Fast paths for the following reasons: 1) WAL buffers aren't in use when + * server is in recovery. 2) WAL is inserted into WAL buffers on current + * server's insertion TLI. 3) Invalid starting WAL location. + */ Shouldn't the comment be something like "2) WAL is *not* inserted into WAL buffers on current server's insertion TLI" since the condition to break is tli != GetWALInsertionTimeLine() 2- + /* + * WAL being read doesn't yet exist i.e. past the current insert position. + */ + if ((startptr + count) > reservedUpto) + return ntotal; This question may not even make sense but I wonder whether we can read from startptr only to reservedUpto in case of startptr+count exceeds reservedUpto? 3- + /* Wait for any in-progress WAL insertions to WAL buffers to finish. */ + if ((startptr + count) > LogwrtResult.Write && + (startptr + count) <= reservedUpto) + WaitXLogInsertionsToFinish(startptr + count); Do we need to check if (startptr + count) <= reservedUpto as we already verified this condition a few lines above? 4- + Assert(nread > 0); + memcpy(dst, data, nread); + + /* + * Make sure we don't read xlblocks down below before the page + * contents up above. + */ + pg_read_barrier(); + + /* Recheck if the read page still exists in WAL buffers. */ + endptr = pg_atomic_read_u64(&XLogCtl->xlblocks[idx]); + + /* Return if the page got initalized while we were reading it. */ + if (expectedEndPtr != endptr) + break; + + /* + * Typically, we must not read a WAL buffer page that just got + * initialized. Because we waited enough for the in-progress WAL + * insertions to finish above. However, there can exist a slight + * window after the above wait finishes in which the read buffer page + * can get replaced especially under high WAL generation rates. After + * all, we are reading from WAL buffers without any locks here. So, + * let's not count such a page in. + */ + phdr = (XLogPageHeader) page; + if (!(phdr->xlp_magic == XLOG_PAGE_MAGIC && + phdr->xlp_pageaddr == (ptr - (ptr % XLOG_BLCKSZ)) && + phdr->xlp_tli == tli)) + break; I see that you recheck if the page still exists and so at the end. What would you think about memcpy'ing only after being sure that we will need and use the recently read data? If we break the loop during the recheck, we simply discard the data read in the latest attempt. I guess that this may not be a big deal but the data would be unnecessarily copied into the destination in such a case. 5- xlogreader.c + nread = XLogReadFromBuffers(startptr, tli, count, buf); + + if (nread > 0) + { + /* + * Check if its a full read, short read or no read from WAL buffers. + * For short read or no read, continue to read the remaining bytes + * from WAL file. + * + * XXX: It might be worth to expose WAL buffer read stats. + */ + if (nread == count) /* full read */ + return true; + else if (nread < count) /* short read */ + { + buf += nread; + startptr += nread; + count -= nread; + } Typo in the comment. Should be like "Check if *it's* a full read, short read or no read from WAL buffers." Also I don't think XLogReadFromBuffers() returns anything less than 0 and more than count. Is verifying nread > 0 necessary? I think if nread does not equal to count, we can simply assume that it's a short read. 
(or no read at all in case nread is 0 which we don't need to handle specifically) 6- + /* + * We determined how much of the page can be validly read as 'count', read + * that much only, not the entire page. Since WALRead() can read the page + * from WAL buffers, in which case, the page is not guaranteed to be + * zero-padded up to the page boundary because of the concurrent + * insertions. + */ I'm not sure about pasting this into most of the places we call WALRead(). Wouldn't it be better if we mentioned this somewhere around WALRead() only once? Best, -- Melih Mutlu Microsoft
Re: Remove unused fields in ReorderBufferTupleBuf
Hi, > > But why didn't you pursue your idea of getting rid of the wrapper > > structure ReorderBufferTupleBuf which after this patch will have just > > one member? I think there could be hassles in backpatching bug-fixes > > in some cases but in the longer run it would make the code look clean. > > Indeed. In fact it turned out that I suggested the same above but > apparently forgot: > > > On top of that IMO it doesn't make much sense to keep a one-field > > wrapper structure. Perhaps we should get rid of it entirely and just > > use HeapTupleData instead. > > After actually trying the refactoring I agree that the code becomes > cleaner and it's going to be beneficial in the long run. Here is the > patch. I made a mistake in v4: ``` -alloc_len = tuple_len + SizeofHeapTupleHeader; +alloc_len = tuple_len + HEAPTUPLESIZE; ``` Here is the corrected patch. -- Best regards, Aleksander Alekseev v5-0001-Remove-ReorderBufferTupleBuf-structure.patch Description: Binary data
Re: UUID v7
Hi, > But now (after big timeseries project with multiple time zones and DST > problems) I think differently. > Even though timestamp and timestamptz are practically the same, timestamptz > should be used to store the time in UTC. > Choosing timestamp is more likely to lead to problems and misunderstandings > than timestamptz. As somebody who contributed TZ support to TimescaleDB I'm more or less aware about the pros and cons of Timestamp and TimestampTz :) Engineering is all about compromises. I can imagine a project where it makes sense to use only TimestampTz for the entire database, and the opposite - when it's better to use only UTC and Timestamp. In this particular case I was merely concerned that the particular choice could be confusing for the users but I think I changed my mind by now, see below. >> Here's v12. Changes: >> 1. Documentation improvements >> 2. Code comments >> 3. Better commit message and reviews list > > > Thank you, Andrey! I have just checked v12 – cleanly applied to HEAD, and > functions work well. I especially like that fact that we keep > uuid_extract_time(..) here – this is a great thing to have for time-based > partitioning, and in many cases we will be able to decide not to have a > creation column timestamp (e.g., "created_at") at all, saving 8 bytes. > > The docs and comments look great too. > > Overall, the patch looks mature enough. It would be great to have it in pg17. > Yes, the RFC is not fully finalized yet, but it's very close. And many > libraries are already including implementation of UUIDv7 – here are some > examples: > > - https://www.npmjs.com/package/uuidv7 > - https://crates.io/crates/uuidv7 > - https://github.com/google/uuid/pull/139 Thanks! After playing with v12 I'm inclined to agree that it's RfC. I only have a couple of silly nitpicks: - It could make sense to decompose the C implementation of uuidv7() in two functions, for readability. - It could make sense to get rid of curly braces in SQL tests when calling uuid_extract_ver() and uuid_extract_ver(), for consistency. I'm not going to insist on these changes though and prefer leaving it to the author and the committer to decide. Also I take back what I said above about using Timestamp instead of TimestampTz. I forgot that Timestamps are implicitly casted to TimestampTz's, so users preferring Timestamps can do this: ``` =# select uuidv7('2024-01-22 12:34:56' :: timestamp); uuidv7 -- 018d3085-de00-77c1-9e7b-7b04ddb9ebb9 ``` Cfbot also seems to be happy with the patch so I'm changing the CF entry status to RfC. -- Best regards, Aleksander Alekseev
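To spell out the partitioning use case mentioned above: with uuid_extract_time() the creation time can be recovered straight from the key, so something like the following sketch (using the UUID generated in the example above; output formatting depends on the session time zone) should give back the embedded 2024-01-22 12:34:56 timestamp:
```
=# SELECT uuid_extract_time('018d3085-de00-77c1-9e7b-7b04ddb9ebb9');
```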
psql: Allow editing query results with \gedit
Assuming a SELECT statement reading from a single table, it is quite an effort to transform that statement to an UPDATE statement on that table, perhaps to fix a typo that the user has spotted in the query result. First, the general syntax is not the same with the order of syntax elements changed. Then the row in question needs to be pinned down by the primary key, requiring cut-and-paste of the PK columns. Furthermore, the value to be updated needs to be put into the command, with proper quoting. If the original value spans multiple line, copy-and-pasting it for editing is especially tedious. Suppose the following query where we spot a typo in the 2nd message: =# select id, language, message from messages where language = 'en'; id | language | message 1 | en | Good morning 2 | en | Hello warld The query needs to be transformed into this update: =# update messages set message = 'Hello world' where id = 2; This patch automates the tedious parts by opening the query result in a editor in JSON format, where the user can edit the data. On closing the editor, the JSON data is read back, and the differences are sent as UPDATE commands. New rows are INSERTed, and deleted rows are DELETEd. =# select id, language, message from messages where language = 'en' \gedit An editor opens: [ { "id": 1, "language": "en", "message": "Good morning" }, { "id": 2, "language": "en", "message": "Hello warld" } ] Let's fix the typo and save the file: [ { "id": 1, "language": "en", "message": "Good morning" }, { "id": 2, "language": "en", "message": "Hello world" } ] UPDATE messages SET message = 'Hello world' WHERE id = '2'; UPDATE 1 In this example, typing "WHERE id = 2" would not be too hard, but the primary key might be a composite key, with complex non-numeric values. This is supported as well. If expanded mode (\x) is enabled, \gedit will use the expanded JSON format, best suitable for long values. This patch requires the "psql JSON output format" patch. Christoph >From be3fef72af0adac7d3a6962f8cc78e167785a702 Mon Sep 17 00:00:00 2001 From: Christoph Berg Date: Mon, 22 Jan 2024 14:40:33 +0100 Subject: [PATCH] psql: Allow editing query results with \gedit Assuming a SELECT statement reading from a single table, it is quite an effort to transform that statement to an UPDATE statement on that table, perhaps to fix a typo that the user has spotted in the query result. First, the general syntax is not the same with the order of syntax elements changed. Then the row in question needs to be pinned down by the primary key, requiring cut-and-paste of the PK columns. Furthermore, the value to be updated needs to be put into the command, with proper quoting. If the original value spans multiple line, copy-and-pasting it for editing is especially tedious. Suppose the following query where we spot a typo in the 2nd message: =# select id, language, message from messages where language = 'en'; id | language | message 1 | en | Good morning 2 | en | Hello warld The query needs to be transformed into this update: =# update messages set message = 'Hello world' where id = 2; This patch automates the tedious parts by opening the query result in a editor in JSON format, where the user can edit the data. On closing the editor, the JSON data is read back, and the differences are sent as UPDATE commands. New rows are INSERTed, and deleted rows are DELETEd. 
=# select id, language, message from messages where language = 'en' \gedit An editor opens: [ { "id": 1, "language": "en", "message": "Good morning" }, { "id": 2, "language": "en", "message": "Hello warld" } ] Let's fix the typo and save the file: [ { "id": 1, "language": "en", "message": "Good morning" }, { "id": 2, "language": "en", "message": "Hello world" } ] UPDATE messages SET message = 'Hello world' WHERE id = '2'; UPDATE 1 In this example, typing "WHERE id = 2" would not be too hard, but the primary key might be a composite key, with complex non-numeric values. This is supported as well. If expanded mode (\x) is enabled, \gedit will use the expanded JSON format, best suitable for long values: =# \x =# select id, language, message from messages where language = 'en' \gedit [{ "id": 1, "language": "en", "message": "Good morning" },{ "id": 2, "language": "en", "message": "Hello world" }] \gedit requires the PK (or any UNIQUE key) to be part of the select statement: =# select language, message from messages \gedit \gedit: no key of table "messages" is contained in the returned query columns. Select more columns or manually specify a key using \gedit (key=...) If the user knows that their operation is sound even without a PK/UNIQUE key, they can force a (set of) column(s) to be used: =# select language, message from messages \gedit (key=message) [ { "language": "en", "message": "Good morning" }, { "language": "en_US", "message": "Hello world" } ] UPDATE messages SET language = 'en_US' WHERE message = 'Hello world'; UPDATE 1 -
Re: Synchronizing slots from primary to standby
On Mon, Jan 22, 2024 at 9:26 PM Amit Kapila wrote: > > On Mon, Jan 22, 2024 at 5:28 PM Masahiko Sawada wrote: > > > > On Sat, Jan 20, 2024 at 7:44 PM Amit Kapila wrote: > > > > > > On Sat, Jan 20, 2024 at 10:52 AM Dilip Kumar > > > wrote: > > > > > > > > On Fri, Jan 19, 2024 at 5:24 PM Amit Kapila > > > > wrote: > > > > > > > > > > On Wed, Jan 17, 2024 at 4:00 PM shveta malik > > > > > wrote: > > > > > > > > > > > > > > > > I had some off-list discussions with Sawada-San, Hou-San, and Shveta > > > > > on the topic of extending replication commands instead of using the > > > > > current model where we fetch the required slot information via SQL > > > > > using a database connection. I would like to summarize the discussion > > > > > and would like to know the thoughts of others on this topic. > > > > > > > > > > In the current patch, we launch the slotsync worker on physical > > > > > standby which connects to the specified database (currently we let > > > > > users specify the required dbname in primary_conninfo) on the primary. > > > > > It then fetches the required information for failover marked slots > > > > > from the primary and also does some primitive checks on the upstream > > > > > node via SQL (the additional checks are like whether the upstream node > > > > > has a specified physical slot or whether the upstream node is a > > > > > primary node or a standby node). To fetch the required information it > > > > > uses a libpqwalreciever API which is mostly apt for this purpose as it > > > > > supports SQL execution but for this patch, we don't need a replication > > > > > connection, so we extend the libpqwalreciever connect API. > > > > > > > > What sort of extension we have done to 'libpqwalreciever'? Is it > > > > something like by default this supports replication connections so we > > > > have done an extension to the API so that we can provide an option > > > > whether to create a replication connection or a normal connection? > > > > > > > > > > Yeah and in the future there could be more as well. The other function > > > added walrcv_get_dbname_from_conninfo doesn't appear to be a problem > > > either for now. > > > > > > > > Now, the concerns related to this could be that users would probably > > > > > need to change existing mechanisms/tools to update priamry_conninfo > > > > I'm concerned about this. In fact, a primary_conninfo value generated > > by pg_basebackup does not work with enable_syncslot. > > > > Right, but if we want can't we extend pg_basebackup to do that? It is > just that I am not sure that it is a good idea to extend pg_basebackup > in the first version. Okay. > > > > > > and one of the alternatives proposed is to have an additional GUC like > > > > > slot_sync_dbname. Users won't be able to drop the database this worker > > > > > is connected to aka whatever is specified in slot_sync_dbname but as > > > > > the user herself sets up the configuration it shouldn't be a big deal. > > > > > > > > Yeah for this purpose users may use template1 or so which they > > > > generally don't plan to drop. > > > > > > > > > > Using template1 has other problems like users won't be able to create > > > a new database. See [2] (point number 2.2) > > > > > > > > > > > So in case the user wants to drop that > > > > database user needs to turn off the slot syncing option and then it > > > > can be done? > > > > > > > > > > Right. > > > > If the user wants to continue using slot syncing, they need to switch > > the database to connect. 
Which requires modifying primary_conninfo and > > reloading the configuration file. Which further leads to restarting > > the physical replication. If they use synchronous replication, it > > means the application temporarily stops during that. > > > > Yes, that would be an inconvenience but the point is we don't expect > this to change often. > > > > > > > > > Then we also discussed whether extending libpqwalreceiver's connect > > > > > API is a good idea and whether we need to further extend it in the > > > > > future. As far as I can see, slotsync worker's primary requirement is > > > > > to execute SQL queries which the current API is sufficient, and don't > > > > > see something that needs any drastic change in this API. Note that > > > > > tablesync worker that executes SQL also uses these APIs, so we may > > > > > need something in the future for either of those. Then finally we need > > > > > a slotsync worker to also connect to a database to use SQL and fetch > > > > > results. > > > > > > > > While looking into the patch v64-0002 I could not exactly point out > > > > what sort of extensions are there in libpqwalreceiver.c, I just saw > > > > one extra API for fetching the dbname from connection info? > > > > > > > > > > Right, the worry was that we may need it in the future. > > > > Yes. IIUC the slotsync worker uses libpqwalreceiver to establish a > > non-replication connection and to execute SQL query. But neither of > > them are releva
Re: Add code indentation check to cirrus-ci (was Re: Add BF member koel-like indentation checks to SanityCheck CI)
On 2024-01-21 Su 22:19, Peter Smith wrote: 2024-01 Commitfest. Hi, This patch has a CF status of "Needs Review" [1], but it seems like there was some CFbot test failure last time it was run [2]. Please have a look and post an updated version if necessary. I don't think there's a consensus that we want this. It should probably be returned with feedback. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
Re: psql JSON output format
Re: Laurenz Albe > > But I do think it has positive > > value. If we produce output that could be ingested back into PG later > > with the right tool, that leaves the door open for someone to build > > the tool later even if we don't have it today. If we produce output > > that loses information, no tool built later can make up for the loss. > I am all for losing as little information as possible, but I think > that this discussion is going off on a tangent. After all, we are not > talking about a data export tool here, we are talking about psql. I've just posted the other patch where I need the JSON format: https://www.postgresql.org/message-id/flat/Za6EfXeewwLRS_fs%40msg.df7cb.de There, I need to be able to read back the query output into psql, while at the same time being human-readable so the user can sanely edit the data in an editor. The default "aligned" format is only human-readable, while CSV is mostly only machine-readable. JSON is the best option between the two, I think. What I did now in v3 of this patch is to print boolean and numeric values (ints, floats, numeric) without quotes, while adding the quotes back to json. This solves the NULL vs 'null'::json problem. > I don't see anybody complain that float8 values lose precision in > the default aligned format, or that empty strings and NULL values > look the same in aligned format. Why do the goalposts move for the > JSON output format? I don't think psql output should be considered > a form of backup. Fwiw, not quoting numbers in JSON won't have any of these problems if the JSON reader just passes the strings read through. (Which PG's JSON parser does.) > Can we get consensus that SQL NULL columns should be omitted from the > output, and the rest left as it currently is? I think that would be an interesting option for a JSON export format. The psql JSON format is more for human inspection, where omitting the columns might create confusion. (We could make it a pset parameter of the format, but I think the default should be to show NULL columns.) Christoph >From 2b575846fee609237256fe8eaf38fd45005fe8da Mon Sep 17 00:00:00 2001 From: Christoph Berg Date: Fri, 8 Sep 2023 15:59:29 +0200 Subject: [PATCH] Add JSON output format to psql Query results are formatted as an array of JSON objects. In non-expanded mode, one object per line is printed, and in expanded mode, one key-value pair. Use `\pset format json` to enable, or run `psql --json` or `psql -J`. NULL values are printed as `null`, independently from the psql null setting. Numeric values are printed in plain, booleans as true/false, and other values are printed as quoted strings. --- doc/src/sgml/ref/psql-ref.sgml | 26 - src/bin/psql/command.c | 6 +- src/bin/psql/help.c| 1 + src/bin/psql/startup.c | 6 +- src/bin/psql/tab-complete.c| 2 +- src/fe_utils/print.c | 182 - src/include/fe_utils/print.h | 6 +- src/test/regress/expected/psql.out | 79 + src/test/regress/sql/psql.sql | 28 + 9 files changed, 325 insertions(+), 11 deletions(-) diff --git a/doc/src/sgml/ref/psql-ref.sgml b/doc/src/sgml/ref/psql-ref.sgml index cc7d797159..f1f7eda082 100644 --- a/doc/src/sgml/ref/psql-ref.sgml +++ b/doc/src/sgml/ref/psql-ref.sgml @@ -274,6 +274,17 @@ EOF + + -J + --json + + + Switches to JSON output mode. This is + equivalent to \pset format json. + + + + -l --list @@ -2956,6 +2967,7 @@ lo_import 152801 asciidoc, csv, html, + json, latex, latex-longtable, troff-ms, unaligned, or wrapped. 
@@ -2972,7 +2984,7 @@ lo_import 152801 in by other programs, for example, tab-separated or comma-separated format. However, the field separator character is not treated specially if it appears in a column's value; - so CSV format may be better suited for such + the CSV or JSON formats may be better suited for such purposes. @@ -2997,6 +3009,18 @@ lo_import 152801 \pset csv_fieldsep. + json format + + JSON format + in psql + + writes output in JSON format, described in + https://www.ietf.org/rfc/rfc4627.txt";>RFC 4627. + The result is an array of JSON objects. In non-expanded mode, one + object per line is printed, while in expanded mode, each key-value + pair is a separate line. + + wrapped format is like aligned but wraps wide data values across lines to make the output fit in the target column width. The target width is determined as described under diff --git a/src/bin/psql/command.c b/src/bin/psql/command.c index 5c906e4806..824900819b 100644 --- a/src/bin/psql/comm
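As an illustration of the v3 behavior described above (layout approximate, one object per line as in non-expanded mode): SQL NULL, numbers and booleans come out unquoted, while a json-typed 'null' stays a quoted string, so the two remain distinguishable:
```
=# \pset format json
Output format is json.
=# SELECT 1 AS n, true AS b, NULL AS missing, 'null'::json AS j;
[
{ "n": 1, "b": true, "missing": null, "j": "null" }
]
```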
Re: Support TZ format code in to_timestamp()
Hi, > > Hi, this patch was marked in CF as "Needs Review" [1], but there has > > been no activity on this thread for 7+ months. > > If nothing more is planned for this thread then it will be closed > > ("Returned with feedback") at the end of this CF. > > I still think it would be a good idea, but I can't deny the lack > of other interest in it. Unless someone steps up to review, > let's close it. I agree that it would be a good idea, and again I would like to condemn the approach "since no one reviews it we are going to reject it". A friendly reminder like "hey, this patch was waiting long enough, maybe someone could take a look" would be more appropriate IMO. I remember during previous commitfests some CF managers created a list of patches that could use more attention. That was useful. I will review the patch, but probably only tomorrow. -- Best regards, Aleksander Alekseev
Re: the s_lock_stuck on perform_spin_delay
On Mon, Jan 22, 2024 at 2:22 AM Andy Fan wrote: > I used sigismember(&BlockSig, SIGQUIT) to detect if a process is doing a > quickdie, however this is bad not only because it doesn't work on > Windows, but also it has too poor performance even it impacts on > USE_ASSERT_CHECKING build only. In v8, I introduced a new global > variable quickDieInProgress to handle this. OK, I like the split between 0001 and 0002. I still think 0001 has cosmetic problems, but if some committer wants to take it forward, they can decide what to do about that; you and I going back and forth doesn't seem like the right approach to sorting that out. Whether or not 0002 is adopted might affect what we do about the cosmetics in 0001, too. 0003 seems ... unfortunate. It seems like an admission that 0001 is wrong. Surely it *isn't* right to ignore the spinlock restrictions in quickdie() in general. For example, we could self-deadlock if we try to acquire a spinlock we already hold. If the problem here is merely the call in errstart() then maybe we need to rethink that particular call. If it goes any deeper than that, maybe we've got actual bugs we need to fix. +* It's likely to check the BlockSig to know if it is doing a quickdie +* with sigismember, but it is too expensive in test, so introduce +* quickDieInProgress to avoid that. This isn't very good English -- I realize that can sometimes be hard -- but also -- I don't think it likely that a future hacker would wonder why this isn't done that way. A static variable is normal for PostgreSQL; checking the signal mask would be a completely novel approach. So I think this comment is missing the mark topically. If this patch is right at all, the comment here should focus on why disabling these checks in quickdie() is necessary and appropriate, not why it's coded to match everything else in the system. -- Robert Haas EDB: http://www.enterprisedb.com
Re: psql: Allow editing query results with \gedit
Hi po 22. 1. 2024 v 16:06 odesílatel Christoph Berg napsal: > Assuming a SELECT statement reading from a single table, it is quite an > effort to transform that statement to an UPDATE statement on that table, > perhaps to fix a typo that the user has spotted in the query result. > > First, the general syntax is not the same with the order of syntax > elements changed. Then the row in question needs to be pinned down by > the primary key, requiring cut-and-paste of the PK columns. Furthermore, > the value to be updated needs to be put into the command, with proper > quoting. If the original value spans multiple line, copy-and-pasting it > for editing is especially tedious. > > Suppose the following query where we spot a typo in the 2nd message: > > =# select id, language, message from messages where language = 'en'; > id | language | message > 1 | en | Good morning > 2 | en | Hello warld > > The query needs to be transformed into this update: > > =# update messages set message = 'Hello world' where id = 2; > > This patch automates the tedious parts by opening the query result in a > editor in JSON format, where the user can edit the data. On closing the > editor, the JSON data is read back, and the differences are sent as > UPDATE commands. New rows are INSERTed, and deleted rows are DELETEd. > > =# select id, language, message from messages where language = 'en' \gedit > > An editor opens: > [ > { "id": 1, "language": "en", "message": "Good morning" }, > { "id": 2, "language": "en", "message": "Hello warld" } > ] > > Let's fix the typo and save the file: > [ > { "id": 1, "language": "en", "message": "Good morning" }, > { "id": 2, "language": "en", "message": "Hello world" } > ] > UPDATE messages SET message = 'Hello world' WHERE id = '2'; > UPDATE 1 > > In this example, typing "WHERE id = 2" would not be too hard, but the > primary key might be a composite key, with complex non-numeric values. > This is supported as well. > > If expanded mode (\x) is enabled, \gedit will use the expanded JSON > format, best suitable for long values. > > > This patch requires the "psql JSON output format" patch. > Introduction of \gedit is interesting idea, but in this form it looks too magic a) why the data are in JSON format, that is not native for psql (minimally now) b) the implicit transformation to UPDATEs and the next evaluation can be pretty dangerous. The concept of proposed feature is interesting, but the name \gedit is too generic, maybe too less descriptive for this purpose Maybe \geditupdates can be better - but still it can be dangerous and slow (without limits) In the end I am not sure if I like it or dislike it. Looks dangerous. I can imagine possible damage when some people will see vi first time and will try to finish vi, but in this command, it will be transformed to executed UPDATEs. More generating UPDATEs without knowledge of table structure (knowledge of PK) can be issue (and possibly dangerous too), and you cannot to recognize PK from result of SELECT (Everywhere PK is not "id" and it is not one column). Regards Pavel > Christoph >
Re: remaining sql/json patches
On Mon, Jan 22, 2024 at 10:28 PM Amit Langote wrote: > > > based on v35. > > Now I only applied from 0001 to 0007. > > For {DEFAULT expression ON EMPTY} | {DEFAULT expression ON ERROR} > > restrict DEFAULT expression be either Const node or FuncExpr node. > > so these 3 SQL/JSON functions can be used in the btree expression index. > > I'm not really excited about adding these restrictions into the > transformJsonFuncExpr() path. Index or any other code that wants to > put restrictions already have those in place, no need to add them > here. Moreover, by adding these restrictions, we might end up > preventing users from doing useful things with this like specify > column references. If there are semantic issues with allowing that, > we should discuss them. > after applying v36. The following index creation and query operation works. I am not 100% sure about these cases. just want confirmation, sorry for bothering you drop table t; create table t(a jsonb, b int); insert into t select '{"hello":11}',1; insert into t select '{"hello":12}',2; CREATE INDEX t_idx2 ON t (JSON_query(a, '$.hello1' RETURNING int default b + random() on error)); CREATE INDEX t_idx3 ON t (JSON_query(a, '$.hello1' RETURNING int default random()::int on error)); SELECT JSON_query(a, '$.hello1' RETURNING int default ret_setint() on error) from t; SELECT JSON_query(a, '$.hello1' RETURNING int default sum(b) over() on error) from t; SELECT JSON_query(a, '$.hello1' RETURNING int default sum(b) on error) from t group by a; but the following cases will fail related to index and default expression. create table zz(a int, b int); CREATE INDEX zz_idx1 ON zz ( (b + random()::int)); create table (a int, b int default ret_setint()); create table (a int, b int default sum(b) over());
Re: Support TZ format code in to_timestamp()
> On 22 Jan 2024, at 03:10, Tom Lane wrote: > I still think it would be a good idea, but I can't deny the lack > of other interest in it. Unless someone steps up to review, > let's close it. Since I had this on my (ever-growing) TODO I re-prioritized today and took a look at it since I think it's something we should support. Nothing really sticks out and I was unable to poke any holes so I don't have too much more to offer than a LGTM. + while (len > 0) + { + const datetkn *tp = datebsearch(lowtoken, zoneabbrevtbl->abbrevs, + zoneabbrevtbl->numabbrevs); My immediate reaction was that we should stop at prefix lengths of two since I could only think of abbreviations of two or more. Googling and reading found that there are indeed one-letter timezones (Alpha, Bravo, etc.). Not sure if it's worth mentioning that in the comment to help other readers who aren't neck deep in timezones? + /* FALL THRU */ Tiny nitpick, it looks a bit curious that we spell it FALL THRU here and "fall through" a few cases up in the same switch. While we are quite inconsistent across the tree, consistency within a file is preferable (regardless of which). -- Daniel Gustafsson
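For reference, the kind of input the patch is meant to accept looks roughly like this (a sketch; the one-letter Zulu/UTC abbreviation is an example of the short prefixes discussed above):
```
=# SELECT to_timestamp('2024-01-22 15:30:00 EST', 'YYYY-MM-DD HH24:MI:SS TZ');
=# SELECT to_timestamp('2024-01-22 15:30:00 Z', 'YYYY-MM-DD HH24:MI:SS TZ');
```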
Re: Add tuples_skipped to pg_stat_progress_copy
On 2024-01-17 14:47, Masahiko Sawada wrote: On Wed, Jan 17, 2024 at 2:22 PM torikoshia wrote: Hi, 132de9968840c introduced SAVE_ERROR_TO option to COPY and enabled to skip malformed data, but there is no way to watch the number of skipped rows during COPY. Attached patch adds tuples_skipped to pg_stat_progress_copy, which counts the number of skipped tuples because source data is malformed. If SAVE_ERROR_TO is not specified, this column remains zero. The advantage would be that users can quickly notice and stop COPYing when there is a larger amount of skipped data than expected, for example. As described in commit log, it is expected to add more choices for SAVE_ERROR_TO like 'log' and using such options may enable us to know the number of skipped tuples during COPY, but exposed in pg_stat_progress_copy would be easier to monitor. What do you think? +1 The patch is pretty simple. Here is a comment: + (if SAVE_ERROR_TO is specified, otherwise zero). + + To be precise, this counter only advances when a value other than 'ERROR' is specified to SAVE_ERROR_TO option. Thanks for your comment and review! Updated the patch according to your comment and option name change by b725b7eec. BTW, based on this patch, I think we can add another option which specifies the maximum tolerable number of malformed rows. I remember this was discussed in [1], and feel it would be useful when loading 'dirty' data but there is a limit to how dirty it can be. Attached 0002 is WIP patch for this(I haven't added doc yet). This may be better discussed in another thread, but any comments(e.g. necessity of this option, option name) are welcome. [1] https://www.postgresql.org/message-id/752672.1699474336%40sss.pgh.pa.us -- Regards, -- Atsushi Torikoshi NTT DATA Group CorporationFrom 571ada768bdb68a31f295cbcb28f4348f253989d Mon Sep 17 00:00:00 2001 From: Atsushi Torikoshi Date: Mon, 22 Jan 2024 23:57:24 +0900 Subject: [PATCH v2 1/2] Add tuples_skipped to pg_stat_progress_copy 132de9968840c enabled COPY to skip malformed data, but there is no way to watch the number of skipped rows during COPY. This patch adds tuples_skipped to pg_stat_progress_copy, which counts the number of skipped tuple because source data is malformed. This column only advances when a value other than stop is specified to ON_ERROR. Needs catalog bump. --- doc/src/sgml/monitoring.sgml | 11 +++ src/backend/catalog/system_views.sql | 3 ++- src/backend/commands/copyfrom.c | 5 + src/include/commands/progress.h | 1 + src/test/regress/expected/rules.out | 3 ++- 5 files changed, 21 insertions(+), 2 deletions(-) diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml index 6e74138a69..cfc13b3580 100644 --- a/doc/src/sgml/monitoring.sgml +++ b/doc/src/sgml/monitoring.sgml @@ -5780,6 +5780,17 @@ FROM pg_stat_get_backend_idset() AS backendid; WHERE clause of the COPY command. + + + + tuples_skipped bigint + + + Number of tuples skipped because they contain malformed data. + This counter only advances when a value other than + stop is specified to ON_ERROR. 
+ + diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql index e43e36f5ac..6288270e2b 100644 --- a/src/backend/catalog/system_views.sql +++ b/src/backend/catalog/system_views.sql @@ -1318,7 +1318,8 @@ CREATE VIEW pg_stat_progress_copy AS S.param1 AS bytes_processed, S.param2 AS bytes_total, S.param3 AS tuples_processed, -S.param4 AS tuples_excluded +S.param4 AS tuples_excluded, +S.param7 AS tuples_skipped FROM pg_stat_get_progress_info('COPY') AS S LEFT JOIN pg_database D ON S.datid = D.oid; diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c index 173a736ad5..8ab3777664 100644 --- a/src/backend/commands/copyfrom.c +++ b/src/backend/commands/copyfrom.c @@ -650,6 +650,7 @@ CopyFrom(CopyFromState cstate) CopyMultiInsertInfo multiInsertInfo = {0}; /* pacify compiler */ int64 processed = 0; int64 excluded = 0; + int64 skipped = 0; bool has_before_insert_row_trig; bool has_instead_insert_row_trig; bool leafpart_use_multi_insert = false; @@ -1012,6 +1013,10 @@ CopyFrom(CopyFromState cstate) */ cstate->escontext->error_occurred = false; + /* Report that this tuple was skipped by the ON_ERROR clause */ + pgstat_progress_update_param(PROGRESS_COPY_TUPLES_SKIPPED, + ++skipped); + continue; } diff --git a/src/include/commands/progress.h b/src/include/commands/progress.h index a458c8c50a..73afa77a9c 100644 --- a/src/include/commands/progress.h +++ b/src/include/commands/progress.h @@ -142,6 +142,7 @@ #define PROGRESS_COPY_TUPLES_EXCLUDED 3 #define PROGRESS_COPY_COMMAND 4 #define PROGRESS_COPY_TYPE 5 +#define PROGRESS_COPY_TUPLES_SKIPPED 6 /*
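With the column in place, a long-running load can be watched from another session along these lines (a sketch using the view and column names from the patch):
```
=# SELECT relid::regclass, tuples_processed, tuples_excluded, tuples_skipped
     FROM pg_stat_progress_copy;
```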
Re: Adding facility for injection points (or probe points?) for more advanced tests
On 22/01/2024 06:38, Michael Paquier wrote: 0001~0004 have been now applied, and I'm marking the CF entry as committed. Woo-hoo! I wrote the attached patch to enable injection points in the Cirrus CI config, to run the injection tests I wrote for a GIN bug today [1]. But that led to a crash in the asan-enabled build [2]. I didn't investigate it yet. [1] https://www.postgresql.org/message-id/d8f0b068-0e6e-4b2c-8932-62507eb7e1c6%40iki.fi [2] https://cirrus-ci.com/task/5242888636858368 -- Heikki Linnakangas Neon (https://neon.tech) From d641364bccec051761a9360453162929ef77ea3d Mon Sep 17 00:00:00 2001 From: Heikki Linnakangas Date: Mon, 22 Jan 2024 17:44:20 +0200 Subject: [PATCH 1/1] Enable injection points in cirrus CI --- .cirrus.tasks.yml | 14 +++--- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/.cirrus.tasks.yml b/.cirrus.tasks.yml index e137769850d..0eacbbf3721 100644 --- a/.cirrus.tasks.yml +++ b/.cirrus.tasks.yml @@ -172,7 +172,7 @@ task: su postgres <<-EOF meson setup \ --buildtype=debug \ --Dcassert=true -Duuid=bsd -Dtcl_version=tcl86 -Ddtrace=auto \ +-Dcassert=true -Dinjection_points=true -Duuid=bsd -Dtcl_version=tcl86 -Ddtrace=auto \ -DPG_TEST_EXTRA="$PG_TEST_EXTRA" \ -Dextra_lib_dirs=/usr/local/lib -Dextra_include_dirs=/usr/local/include/ \ build @@ -327,7 +327,7 @@ task: configure_script: | su postgres <<-EOF ./configure \ ---enable-cassert --enable-debug --enable-tap-tests \ +--enable-cassert --enable-injection-points --enable-debug --enable-tap-tests \ --enable-nls \ --with-segsize-blocks=6 \ \ @@ -357,7 +357,7 @@ task: su postgres <<-EOF meson setup \ --buildtype=debug \ --Dcassert=true \ +-Dcassert=true -Dinjection_points=true \ ${LINUX_MESON_FEATURES} \ -DPG_TEST_EXTRA="$PG_TEST_EXTRA" \ build @@ -370,7 +370,7 @@ task: export CC='ccache gcc -m32' meson setup \ --buildtype=debug \ --Dcassert=true \ +-Dcassert=true -Dinjection_points=true \ ${LINUX_MESON_FEATURES} \ -Dllvm=disabled \ --pkg-config-path /usr/lib/i386-linux-gnu/pkgconfig/ \ @@ -482,7 +482,7 @@ task: --buildtype=debug \ -Dextra_include_dirs=/opt/local/include \ -Dextra_lib_dirs=/opt/local/lib \ - -Dcassert=true \ + -Dcassert=true -Dinjection_points=true \ -Duuid=e2fs -Ddtrace=auto \ -DPG_TEST_EXTRA="$PG_TEST_EXTRA" \ build @@ -556,7 +556,7 @@ task: # Use /DEBUG:FASTLINK to avoid high memory usage during linking configure_script: | vcvarsall x64 -meson setup --backend ninja --buildtype debug -Dc_link_args=/DEBUG:FASTLINK -Dcassert=true -Db_pch=true -Dextra_lib_dirs=c:\openssl\1.1\lib -Dextra_include_dirs=c:\openssl\1.1\include -DTAR=%TAR% -DPG_TEST_EXTRA="%PG_TEST_EXTRA%" build +meson setup --backend ninja --buildtype debug -Dc_link_args=/DEBUG:FASTLINK -Dcassert=true -Dinjection_points=true -Db_pch=true -Dextra_lib_dirs=c:\openssl\1.1\lib -Dextra_include_dirs=c:\openssl\1.1\include -DTAR=%TAR% -DPG_TEST_EXTRA="%PG_TEST_EXTRA%" build build_script: | vcvarsall x64 @@ -616,7 +616,7 @@ task: # disable -Dnls as the number of files it creates cause a noticable slowdown configure_script: | -%BASH% -c "meson setup -Ddebug=true -Doptimization=g -Dcassert=true -Db_pch=true -Dnls=disabled -DTAR=%TAR% build" +%BASH% -c "meson setup -Ddebug=true -Doptimization=g -Dcassert=true -Dinjection_points=true -Db_pch=true -Dnls=disabled -DTAR=%TAR% build" build_script: | %BASH% -c "ninja -C build" -- 2.39.2
Re: pg_stat_statements and "IN" conditions
> On Mon, Jan 22, 2024 at 05:33:26PM +1100, Peter Smith wrote: > 2024-01 Commitfest. > > Hi, This patch has a CF status of "Needs Review" [1], but it seems > there was a CFbot test failure last time it was run [2]. Please have a > look and post an updated version if necessary. > > == > [1] https://commitfest.postgresql.org/46/2837/ > [2] https://cirrus-ci.com/task/6688578378399744 It's the same failing pipeline Vignesh C was talking about above. I've fixed the issue in the latest patch version, but it looks like it wasn't picked up yet (from what I understand, the latest build for this CF is 8 weeks old).
Re: psql: Allow editing query results with \gedit
Pavel Stehule writes: > po 22. 1. 2024 v 16:06 odesílatel Christoph Berg napsal: >> This patch automates the tedious parts by opening the query result in a >> editor in JSON format, where the user can edit the data. On closing the >> editor, the JSON data is read back, and the differences are sent as >> UPDATE commands. New rows are INSERTed, and deleted rows are DELETEd. > Introduction of \gedit is interesting idea, but in this form it looks too > magic Yeah, I don't like it either --- it feels like something that belongs in an ETL tool not psql. The sheer size of the patch shows how far afield it is from anything that psql already does, necessitating writing tons of stuff that was not there before. The bits that try to parse the query to get necessary information seem particularly half-baked. > In the end I am not sure if I like it or dislike it. Looks dangerous. I can > imagine possible damage when some people will see vi first time and will > try to finish vi, but in this command, it will be transformed to executed > UPDATEs. Yup -- you'd certainly want some way of confirming that you actually want the changes applied. Our existing edit commands load the edited string back into the query buffer, where you can \r it if you don't want to run it. But I fear that the results of this operation would be too long for that to be workable. > More generating UPDATEs without knowledge of table structure > (knowledge of PK) can be issue (and possibly dangerous too), and you cannot > to recognize PK from result of SELECT (Everywhere PK is not "id" and it is > not one column). It does look like it's trying to find out the pkey from the system catalogs ... however, it's also accepting unique constraints without a lot of thought about the implications of NULLs. Even if you have a pkey, it's not exactly clear to me what should happen if the user changes the contents of a pkey field. That could be interpreted as either an INSERT or UPDATE, I think. Also, while I didn't read too closely, it's not clear to me how the code could reliably distinguish INSERT vs UPDATE vs DELETE cases --- surely we're not going to try to put a "diff" engine into this, and even if we did, diff is well known for producing somewhat surprising decisions about exactly which old lines match which new ones. That's part of the reason why I really don't like the idea that the deduced change commands will be summarily applied without the user even seeing them. regards, tom lane
Re: remaining sql/json patches
On 2024-Jan-18, Alvaro Herrera wrote: > > commands/explain.c (Hmm, I think this is a preexisting bug actually) > > > > 3893 18 : case T_TableFuncScan: > > 3894 18 : Assert(rte->rtekind == RTE_TABLEFUNC); > > 3895 18 : if (rte->tablefunc) > > 3896 0 : if (rte->tablefunc->functype == > > TFT_XMLTABLE) > > 3897 0 : objectname = "xmltable"; > > 3898 : else/* Must be > > TFT_JSON_TABLE */ > > 3899 0 : objectname = "json_table"; > > 3900 : else > > 3901 18 : objectname = NULL; > > 3902 18 : objecttag = "Table Function Name"; > > 3903 18 : break; > > Indeed I was completely wrong about this, and in order to gain coverage the only thing we needed was to add an EXPLAIN that uses the JSON format. I did that just now. I think your addition here works just fine. -- Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/
Re: Support TZ format code in to_timestamp()
Daniel Gustafsson writes: > On 22 Jan 2024, at 03:10, Tom Lane wrote: > + while (len > 0) > + { > + const datetkn *tp = datebsearch(lowtoken, zoneabbrevtbl->abbrevs, > + zoneabbrevtbl->numabbrevs); > My immediate reaction was that we should stop at prefix lengths of two since I > could only think of abbreviations of two or more. Googling and reading found > that there are indeed one-letter timezones (Alpha, Bravo etc..). Not sure if > it's worth mentioning that in the comment to help other readers who aren't > neck > deep in timezones? The one I usually think of is "Z" for UTC; I wasn't actually aware that there were any other single-letter abbrevs. But in any case I don't see a reason for this code to be making such assumptions. > + /* FALL THRU */ > Tiny nitpick, it looks a bit curious that we spell it FALL THRU here and "fall > through" a few cases up in the same switch. While we are quite inconsistent > across the tree, consistency within a file is preferrable (regardless of > which). Fair. I tend to shorten it, but I failed to notice that there was nearby precedent for the other way. regards, tom lane
Re: psql: Allow editing query results with \gedit
On Mon, Jan 22, 2024 at 8:06 AM Christoph Berg wrote: > Assuming a SELECT statement reading from a single table, it is quite an > effort to transform that statement to an UPDATE statement on that table, > perhaps to fix a typo that the user has spotted in the query result. > > Building off the other comments, I'd suggest trying to get rid of the intermediate JSON format and also just focus on a single row at any given time. For an update, the first argument to the metacommand could be the unique key value present in the previous result. The resultant UPDATE would just put that into the where clause, and every other column in the result would become a SET clause column with its current value filled in, ready to be edited. DELETE would be similar but without the need for a SET clause. INSERT can produce a template INSERT (cols) VALUES ... command with some smarts regarding auto-incrementing keys and placeholder values. David J.
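A rough sketch of what that interaction could look like, reusing the messages example from upthread (the metacommand name and syntax here are entirely hypothetical):
```
-- hypothetical: "2" is the key value of the row to edit
=# select id, language, message from messages where language = 'en' \gupdaterow 2
-- the editor would then open on a ready-made template such as:
UPDATE messages SET language = 'en', message = 'Hello warld' WHERE id = 2;
```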
Re: pg_stat_statements and "IN" conditions
Dmitry Dolgov <9erthali...@gmail.com> writes: >> On Mon, Jan 22, 2024 at 05:33:26PM +1100, Peter Smith wrote: >> Hi, This patch has a CF status of "Needs Review" [1], but it seems >> there was a CFbot test failure last time it was run [2]. Please have a >> look and post an updated version if necessary. >> >> == >> [1] https://commitfest.postgresql.org/46/2837/ >> [2] https://cirrus-ci.com/task/6688578378399744 > It's the same failing pipeline Vignesh C was talking above. I've fixed > the issue in the latest patch version, but looks like it wasn't picked > up yet (from what I understand, the latest build for this CF is 8 weeks > old). Please notice that at the moment, it's not being tested at all because of a patch-apply failure -- that's what the little triangular symbol means. The rest of the display concerns the test results from the last successfully-applied patch version. (Perhaps that isn't a particularly great UI design.) If you click on the triangle you find out == Applying patches on top of PostgreSQL commit ID b0f0a9432d0b6f53634a96715f2666f6d4ea25a1 === === applying patch ./v17-0001-Prevent-jumbling-of-every-element-in-ArrayExpr.patch patching file contrib/pg_stat_statements/Makefile Hunk #1 FAILED at 19. 1 out of 1 hunk FAILED -- saving rejects to file contrib/pg_stat_statements/Makefile.rej patching file contrib/pg_stat_statements/expected/merging.out patching file contrib/pg_stat_statements/meson.build ... regards, tom lane
Re: Add \syncpipeline command to pgbench
On 2024-Jan-22, Anthonin Bonnefoy wrote: > 0001 introduces a new error when the end of a pgbench script is > reached while there's still an ongoing pipeline. Pushed, backpatched to 14. I reworded the error message to be client %d aborted: end of script reached with pipeline open I hope this is OK. I debated a half a dozen alternatives ("with open pipeline", "without closing pipeline", "with unclosed pipeline" (???), "leaving pipeline open") and decided this was the least ugly. Thanks, -- Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/ really, I see PHP as like a strange amalgamation of C, Perl, Shell inflex: you know that "amalgam" means "mixture with mercury", more or less, right? i.e., "deadly poison"
Re: the s_lock_stuck on perform_spin_delay
Robert Haas writes: > On Mon, Jan 22, 2024 at 2:22 AM Andy Fan wrote: >> I used sigismember(&BlockSig, SIGQUIT) to detect if a process is doing a >> quickdie, however this is bad not only because it doesn't work on >> Windows, but also it has too poor performance even it impacts on >> USE_ASSERT_CHECKING build only. In v8, I introduced a new global >> variable quickDieInProgress to handle this. > > OK, I like the split between 0001 and 0002. I still think 0001 has > cosmetic problems, but if some committer wants to take it forward, > they can decide what to do about that; you and I going back and forth > doesn't seem like the right approach to sorting that out. Whether or > not 0002 is adopted might affect what we do about the cosmetics in > 0001, too. Replacing ASSERT_NO_SPIN_LOCK with VerifyNoSpinLocksHeld, or adding more comments to each call of VerifyNoSpinLocksHeld, is exactly the kind of suggestion that makes me happy, since it makes things both cleaner and more practical. So I'd be absolutely willing to do more stuff like this. Thanks for such suggestions! That said, I can't figure out what the remaining cosmetic problems are... Since you are saying they may be related to 0002, I guess you are talking about the naming of START_SPIN_LOCK and END_SPIN_LOCK? > > 0003 seems ... unfortunate. It seems like an admission that 0001 is > wrong. Yes, that's what I was thinking. I wasn't sure whether I should merge 0003 into 0001 directly during this discussion, and in the end I kept it separate for easier discussion. > Surely it *isn't* right to ignore the spinlock restrictions in > quickdie() in general. For example, we could self-deadlock if we try > to acquire a spinlock we already hold. If the problem here is merely > the call in errstart() then maybe we need to rethink that particular > call. If it goes any deeper than that, maybe we've got actual bugs we > need to fix. I get your point! Acquiring an already held spinlock in quickdie is unlikely to happen, but since our existing infrastructure can handle it, then there is no reason to bypass it. Since the problem here is just errstart, we can do "if (!quickDieInProgress) VerifyNoSpinLocksHeld();" in errstart only. Another place besides errstart is the CHECK_FOR_INTERRUPTS in errfinish; I think we can add the same VerifyNoSpinLocksHeld check to CHECK_FOR_INTERRUPTS. > > +* It's likely to check the BlockSig to know if it is doing a quickdie > +* with sigismember, but it is too expensive in test, so introduce > +* quickDieInProgress to avoid that. > > This isn't very good English -- I realize that can sometimes be hard > -- but also -- I don't think it likely that a future hacker would > wonder why this isn't done that way. A static variable is normal for > PostgreSQL; checking the signal mask would be a completely novel > approach. So I think this comment is missing the mark topically. I was wrong to think sigismember is a syscall; now I see it is just a function in glibc. Even though I can't find its source code, I think it should just be some bit operation based on the definition of __sigset_t. Another drawback of sigismember is that it is not available on Windows. It is still unclear to me why sigaddset in quickdie compiles on Windows (I have a sigismember version and get a linker error on Windows). This is a blocker for me before submitting the next version of the patch. > If > this patch is right at all, the comment here should focus on why > disabling these checks in quickdie() is necessary and appropriate, not > why it's coded to match everything else in the system.
I agree, and I think patch 0003 is not right at all :( -- Best Regards Andy Fan
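To make the gate Andy describes concrete, here is a minimal assert-build-only sketch. The names quickDieInProgress and VerifyNoSpinLocksHeld() are the ones used by the patches in this thread; the wrapper function and where it would be called from (errstart(), CHECK_FOR_INTERRUPTS(), ...) are hypothetical, and whether such a bypass is acceptable at all is exactly what is being debated above:

#include <stdbool.h>

extern bool quickDieInProgress;		/* set on entry to quickdie(), per the patch */
extern void VerifyNoSpinLocksHeld(void);

/* Hypothetical helper: run the no-spinlock-held check except while dying. */
static inline void
VerifyNoSpinLocksHeldUnlessDying(void)
{
#ifdef USE_ASSERT_CHECKING
	if (!quickDieInProgress)
		VerifyNoSpinLocksHeld();
#endif
}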
Re: XLog size reductions: smaller XLRec block header for PG17
On Mon, 22 Jan 2024 at 16:08, Aleksander Alekseev wrote: > > Hi, > > > I'm seeing that there has been no activity in this thread for nearly 4 > > months, I'm planning to close this in the current commitfest unless > > someone is planning to take it forward. > > I don't think that closing CF entries purely due to inactivity is a > good practice (neither something we did before) as long as there is > code, it applies, etc. There are a lot of patches and few people > working on them. Inactivity in a given thread doesn't necessarily > indicate lack of interest, more likely lack of resources. There are a lot of patches like this, and there is no clear way to find out whether someone wants to work on them or has lost interest. That is the reason I thought to send out a mail, so that the author/reviewer can reply and move the entry to the next state, such as Ready for Committer. If the author/reviewer is not planning to work on it in this commitfest but plans to work on it in the next one, we can move it to the next commitfest. I don't see a better way to identify whether a patch still has interest behind it or not. Regards, Vignesh
Re: the s_lock_stuck on perform_spin_delay
On Mon, Jan 22, 2024 at 11:58 AM Andy Fan wrote: > I get your point! Acquiring an already held spinlock in quickdie is > unlikely to happen, but since our existing infrastructure can handle it, > then there is no reason to bypass it. No, the existing infrastructure cannot handle that at all. -- Robert Haas EDB: http://www.enterprisedb.com
Re: Improving EXPLAIN's display of SubPlan nodes
Aleksander Alekseev writes: > Although something like: > ``` > + Filter: (ANY (base_tbl.a = $1) FROM SubPlan 1 (returns $1)) > + SubPlan 1 (returns $1) > ``` > ... arguably doesn't give much more information to the user comparing > to what we have now: > ``` > - Filter: (SubPlan 1) > - SubPlan 1 > ``` Yeah, I would probably not have thought to do this on my own; but we had an actual user request for it. I think arguably the main benefit is to confirm "yes, this is the sub-select you think it is". The main thing that's still missing compared to what is in the plan data structure is information about which Param is which. I think we have the subplan output Params relatively well covered through the expedient of listing them in the generated plan_name, but it's still not apparent to me how we could shoehorn subplan input Params into this (or whether it's worth doing). regards, tom lane
Re: pg_stat_statements and "IN" conditions
> On Mon, Jan 22, 2024 at 11:35:22AM -0500, Tom Lane wrote: > Dmitry Dolgov <9erthali...@gmail.com> writes: > >> On Mon, Jan 22, 2024 at 05:33:26PM +1100, Peter Smith wrote: > >> Hi, This patch has a CF status of "Needs Review" [1], but it seems > >> there was a CFbot test failure last time it was run [2]. Please have a > >> look and post an updated version if necessary. > >> > >> == > >> [1] https://commitfest.postgresql.org/46/2837/ > >> [2] https://cirrus-ci.com/task/6688578378399744 > > > It's the same failing pipeline Vignesh C was talking above. I've fixed > > the issue in the latest patch version, but looks like it wasn't picked > > up yet (from what I understand, the latest build for this CF is 8 weeks > > old). > > Please notice that at the moment, it's not being tested at all because > of a patch-apply failure -- that's what the little triangular symbol > means. The rest of the display concerns the test results from the > last successfully-applied patch version. (Perhaps that isn't a > particularly great UI design.) > > If you click on the triangle you find out > > == Applying patches on top of PostgreSQL commit ID > b0f0a9432d0b6f53634a96715f2666f6d4ea25a1 === > === applying patch > ./v17-0001-Prevent-jumbling-of-every-element-in-ArrayExpr.patch > patching file contrib/pg_stat_statements/Makefile > Hunk #1 FAILED at 19. > 1 out of 1 hunk FAILED -- saving rejects to file > contrib/pg_stat_statements/Makefile.rej > patching file contrib/pg_stat_statements/expected/merging.out > patching file contrib/pg_stat_statements/meson.build Oh, I see, thanks. Give me a moment, will fix this.
Re: Improving EXPLAIN's display of SubPlan nodes
Hi Aleksander and Tom I do confirm that I requested to get this information, in order to recover the formula to filter on. Thanks to both of you Chantal Le 22/01/2024 à 18:07, Tom Lane a écrit : Aleksander Alekseev writes: Although something like: ``` + Filter: (ANY (base_tbl.a = $1) FROM SubPlan 1 (returns $1)) + SubPlan 1 (returns $1) ``` ... arguably doesn't give much more information to the user comparing to what we have now: ``` - Filter: (SubPlan 1) - SubPlan 1 ``` Yeah, I would probably not have thought to do this on my own; but we had an actual user request for it. I think arguably the main benefit is to confirm "yes, this is the sub-select you think it is". The main thing that's still missing compared to what is in the plan data structure is information about which Param is which. I think we have the subplan output Params relatively well covered through the expedient of listing them in the generated plan_name, but it's still not apparent to me how we could shoehorn subplan input Params into this (or whether it's worth doing). regards, tom lane
Re: the s_lock_stuck on perform_spin_delay
Robert Haas writes: > On Mon, Jan 22, 2024 at 11:58 AM Andy Fan wrote: >> I get your point! Acquiring an already held spinlock in quickdie is >> unlikely to happen, but since our existing infrastructure can handle it, >> then there is no reason to bypass it. > > No, the existing infrastructure cannot handle that at all. Actually, I mean we can handle it without 0003. Am I still wrong? Without 0003, if we acquire a spinlock that we already hold, VerifyNoSpinLocksHeld in SpinLockAcquire should catch it. -- Best Regards Andy Fan
Re: Permute underscore separated components of columns before fuzzy matching
Thank you for bringing that to my attention. Is there a way to subscribe to cf-bot failures? Apparently I confused myself with my naming. I attached a patch that fixes the bug (at least at my cassert test-world run). Regards Arne On 2024-01-22 06:38, Peter Smith wrote: 2024-01 Commitfest. Hi, This patch has a CF status of "Needs Review" [1], but it seems like there were CFbot test failures last time it was run [2]. Please have a look and post an updated version if necessary. == [1] https://commitfest.postgresql.org/46/4282/ [2] https://cirrus-ci.com/github/postgresql-cfbot/postgresql/commitfest/46/4282 Kind Regards, Peter Smith.diff --git a/src/backend/parser/parse_relation.c b/src/backend/parser/parse_relation.c index 34a0ec5901..c416be2ea0 100644 --- a/src/backend/parser/parse_relation.c +++ b/src/backend/parser/parse_relation.c @@ -579,32 +579,13 @@ GetCTEForRTE(ParseState *pstate, RangeTblEntry *rte, int rtelevelsup) return NULL;/* keep compiler quiet */ } -/* - * updateFuzzyAttrMatchState - * Using Levenshtein distance, consider if column is best fuzzy match. - */ static void -updateFuzzyAttrMatchState(int fuzzy_rte_penalty, - FuzzyAttrMatchState *fuzzystate, RangeTblEntry *rte, - const char *actual, const char *match, int attnum) +updateFuzzyAttrMatchStateSingleString(int fuzzy_rte_penalty, + FuzzyAttrMatchState *fuzzystate, RangeTblEntry *rte, + const char *actual, const char *match, int attnum, int matchlen) { - int columndistance; - int matchlen; - - /* Bail before computing the Levenshtein distance if there's no hope. */ - if (fuzzy_rte_penalty > fuzzystate->distance) - return; - - /* - * Outright reject dropped columns, which can appear here with apparent - * empty actual names, per remarks within scanRTEForColumn(). - */ - if (actual[0] == '\0') - return; - /* Use Levenshtein to compute match distance. */ - matchlen = strlen(match); - columndistance = + int columndistance = varstr_levenshtein_less_equal(actual, strlen(actual), match, matchlen, 1, 1, 1, fuzzystate->distance + 1 @@ -667,6 +648,142 @@ updateFuzzyAttrMatchState(int fuzzy_rte_penalty, } } +static void putUnderscores(char* string, int start, int underscore_amount, int len) { + for (int i = 0; start + i < len && i < underscore_amount; i++) + string[start + i] = '_'; +} + +/* + * updateFuzzyAttrMatchState + * Using Levenshtein distance, consider if column is best fuzzy match. + */ +static void +updateFuzzyAttrMatchState(int fuzzy_rte_penalty, + FuzzyAttrMatchState *fuzzystate, RangeTblEntry *rte, + const char *actual, const char *match, int attnum) +{ + /* Memory segment to store the current permutation of the match string. */ + char* tmp_match; + int matchlen = strlen(match); + /* We keep track how many permutations we have already processed, to avoid long runtimes. */ + int underscore_permutations_count = 0; + /* The location the underscore we currently process within the match string. */ + int underscore_current = 1; + /* Variables to track the amount of underscores delimiting sections */ + int underscore_amount = 1; + int underscore_second_amount = 1; + + /* Bail before computing the Levenshtein distance if there's no hope. */ + if (fuzzy_rte_penalty > fuzzystate->distance) + return; + + /* + * Outright reject dropped columns, which can appear here with apparent + * empty actual names, per remarks within scanRTEForColumn(). 
+ */ + if (actual[0] == '\0') + return; + + updateFuzzyAttrMatchStateSingleString(fuzzy_rte_penalty, fuzzystate, rte, actual, match, attnum, matchlen); + /* We don't want to permute zero length strings, so check whether the string starts with an underscore. */ + if (match[0] == '_') { + while (underscore_current < matchlen - 1 && match[underscore_current] == '_') { + underscore_current++; + } + } + /* Advance to the next underscore. We do this once here to avoid pallocing, if the string does't contain an underscore at all. */ + while (underscore_current < matchlen - 1 && match[underscore_current] != '_') { + underscore_current++; + } + /* + * Check for permuting up to three sections separated by underscores. + * + * We count the number of underscores here, because we want to know whether we should consider + * permuting underscore separated sections. + */ + if (underscore_current < matchlen - 1) { + tmp_match = palloc(matchlen + 1); + tmp_match[matchlen] = '\0'; + while (underscore_permutations_count < 300 && underscore_current < matchlen - 1) { + /* + * If sections contain more than one underscore, we want to swap the sections separated by more than one instead. + * There would be no point in swapping zero length strings around. + * So we check how many consecutive underscores we have here. + */ + underscore_amount = 1; + while (underscore_current + underscore_amount < matchlen && match[underscore_current + underscore_amount] == '_') { +underscore_amount++; +
Re: psql: Allow editing query results with \gedit
On Mon, Jan 22, 2024 at 5:34 PM, David G. Johnston <david.g.johns...@gmail.com> wrote: > On Mon, Jan 22, 2024 at 8:06 AM Christoph Berg wrote: > >> Assuming a SELECT statement reading from a single table, it is quite an >> effort to transform that statement to an UPDATE statement on that table, >> perhaps to fix a typo that the user has spotted in the query result. >> >> > Building off the other comments, I'd suggest trying to get rid of the > intermediate JSON format and also just focus on a single row at any given > time. > > For an update, the first argument to the metacommand could be the unique > key value present in the previous result. The resultant UPDATE would just > put that into the where clause, and every other column in the result would > be a SET clause column with the thing being set to the current value, ready > to be edited. > > DELETE would be similar but without the need for a SET clause. > > INSERT can produce a template INSERT (cols) VALUES ... command with some > smarts regarding auto-incrementing keys and placeholder values. > > David J. > Can you imagine using it? I like psql, I like terminal applications, and my first database was FoxPro, but for data editing I am fairly sure I don't want to use psql. I can imagine enhancing the current \gexec command, because it is executed directly without any possibility to edit. I could see value in some special "edit" clause, like \gexec_edit or \gexec(edit) or \gexec edit. This would be like the current \gexec, but with the possibility to edit the result in an editor and to finish without saving. Then we could introduce SQL functions UpdateTemplate(cols text[], rowvalue), InsertTemplate, ... and you could write SELECT UpdateTemplate(ARRAY['a','b','c'], foo) FROM foo WHERE id IN (1,2) \gexec_with_edit But this still looks strange to me - like we are trying to reintroduce the need for sed or awk in SQL and psql. I would want forms like FoxPro, I would want a grid like FoxPro - but not in psql, and I would not develop it :-)
Commitfest 2024-01 third week update
Hi, Here's a quick status report after the third week:

Status summary:
status                  |  w1 |  w2 |  w3
------------------------+-----+-----+-----
Needs review:           | 238 | 213 | 181
Waiting on Author:      |  44 |  46 |  52
Ready for Committer:    |  27 |  27 |  26
Committed:              |  36 |  46 |  57
Moved to next CF:       |   1 |   3 |   4
Withdrawn:              |   2 |   4 |  12
Returned with Feedback: |   3 |  12 |  18
Rejected:               |   1 |   1 |   2
Total:                  | 352 | 352 | 352

If you have submitted a patch and it's in "Waiting on Author" state, please aim to get it to "Needs review" state soon if you can, as that's where people are most likely to be looking for things to review. I have pinged most threads that are in "Needs review" state and don't apply, don't compile warning-free, or don't pass check-world. I'll do some more of that sort of thing. I have sent a private mail through the commitfest app to patch owners who have submitted one or more patches but have not picked any patches to review. I have also sent out mails for threads that have had no activity for a long time; please respond to those mails if you are planning to continue working on the patch, and if you plan to work on it in the next commitfest instead, please move it there. If there is no response, I'm planning to return these patches; they can be added again when they are actively worked on. Regards, Vignesh
Re: XLog size reductions: smaller XLRec block header for PG17
On Mon, Jan 22, 2024 at 5:38 AM Aleksander Alekseev wrote: > I don't think that closing CF entries purely due to inactivity is a > good practice (neither something we did before) as long as there is > code, it applies, etc. There are a lot of patches and few people > working on them. Inactivity in a given thread doesn't necessarily > indicate lack of interest, more likely lack of resources. If the CF entry doesn't serve the purpose of allowing someone to find patches in need of review, it's worse than useless. Patches that aren't getting reviewed should stay in the CF until they do, or until we make a collective decision that we don't care about or want that particular patch enough to bother. But patches that are not ready for review need to get out until such time as they are. Otherwise they're just non-actionable clutter. And unfortunately we have so much of that in the CF app now that finding things that really need review is time consuming and difficult. That REALLY needs to be cleaned up. In the case of this particular patch, I think the problem is that there's no consensus on the design. There's not a ton of debate on this thread, but thread [1] linked in the original post contains a lot of vigorous debate about what the right thing to do is here and I don't believe we reached any meeting of the minds. In light of that lack of agreement, I'm honestly a bit confused why Matthias even found it worthwhile to submit this on a new thread. I think we all agree with him that there's room for improvement here, but we don't agree on what particular form that improvement should take, and as long as that agreement is lacking, I find it hard to imagine anything getting committed. The task right now is to get agreement on something, and leaving the CF entry open longer isn't going make the people who have already expressed opinions start agreeing with each other more than they do already. It looks like I never replied to https://www.postgresql.org/message-id/20221019192130.ebjbycpw6bzjry4v%40awork3.anarazel.de but, FWIW, I agree with Andres that applying the same technique to multiple fields that are stored together (DB OID, TS OID, rel #, block #) is unlikely in practice to produce many cases that regress. But the question for this thread is really more about whether we're OK with using ad-hoc bit swizzling to reduce the size of xlog records or whether we want to insist on the use of a uniform varint encoding. Heikki and Andres both seem to favor the latter. IIRC, I was initially more optimistic about ad-hoc bit swizzling being a potentially acceptable technique, but I'm not convinced enough about it to argue against two very smart committers both of whom know more about micro-optimizing performance than I do, and nobody else seems to making this argument on this thread either, so I just don't really see how this patch is ever going to go anywhere in its current form. -- Robert Haas EDB: http://www.enterprisedb.com
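For readers who haven't followed the linked thread: "uniform varint encoding" here means encoding integers in a variable number of bytes using one general-purpose scheme, instead of packing bits in field-specific ways. Below is a minimal LEB128-style sketch of the general technique, purely for illustration (it is not the encoding actually proposed for the WAL format, whose details differ):

#include <stdint.h>
#include <stddef.h>

/*
 * Encode an unsigned 64-bit value using 7 data bits per byte; the high bit
 * of each byte indicates whether another byte follows.  Returns the number
 * of bytes written (1..10).  The caller must supply a buffer of at least
 * 10 bytes.
 */
static size_t
encode_varint_u64(uint64_t value, uint8_t *buf)
{
	size_t		len = 0;

	do
	{
		uint8_t		byte = value & 0x7F;

		value >>= 7;
		if (value != 0)
			byte |= 0x80;		/* more bytes follow */
		buf[len++] = byte;
	} while (value != 0);

	return len;
}

Values under 128 take a single byte, while a full 64-bit value takes up to ten; that size-versus-decoding-cost trade-off is the crux of the disagreement described above.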
Re: the s_lock_stuck on perform_spin_delay
On Mon, Jan 22, 2024 at 12:13 PM Andy Fan wrote: > > On Mon, Jan 22, 2024 at 11:58 AM Andy Fan wrote: > >> I get your point! Acquiring an already held spinlock in quickdie is > >> unlikely to happen, but since our existing infrastructure can handle it, > >> then there is no reason to bypass it. > > > > No, the existing infrastructure cannot handle that at all. > > Actually, I mean we can handle it without 0003. Am I still wrong? > Without 0003, if we acquire a spinlock that we already hold, > VerifyNoSpinLocksHeld in SpinLockAcquire should catch it. But that's only going to run in assert-only builds. The whole point of the patch set is to tell developers that there are bugs in the code that need fixing, not to catch problems that actually occur in production. -- Robert Haas EDB: http://www.enterprisedb.com
Re: the s_lock_stuck on perform_spin_delay
Robert Haas writes: > On Mon, Jan 22, 2024 at 12:13 PM Andy Fan wrote: >> > On Mon, Jan 22, 2024 at 11:58 AM Andy Fan wrote: >> >> I get your point! Acquiring an already held spinlock in quickdie is >> >> unlikely to happen, but since our existing infrastructure can handle it, >> >> then there is no reason to bypass it. >> > >> > No, the existing infrastructure cannot handle that at all. >> >> Actually, I mean we can handle it without 0003. Am I still wrong? >> Without 0003, if we acquire a spinlock that we already hold, >> VerifyNoSpinLocksHeld in SpinLockAcquire should catch it. > But that's only going to run in assert-only builds. The whole point of > the patch set is to tell developers that there are bugs in the code > that need fixing, not to catch problems that actually occur in > production. I see. In that respect, yes. -- Best Regards Andy Fan
Re: Network failure may prevent promotion
On Thu, Jan 18, 2024 at 10:42 PM Heikki Linnakangas wrote: > Given that commit 728f86fec6 that introduced this issue was not strictly > required, perhaps we should just revert it for v16. +1 for the revert. This issue should be fixed in the upcoming minor release since it might cause unexpected delays in failover times. Regards, -- Fujii Masao
Re: Teach predtest about IS [NOT] proofs
James Coleman writes: > 0001 does the initial pure refactor. 0003 makes a lot of modifications > to what we can prove about implication and refutation. Finally, 0003 > isn't intended to be committed, but attempts to validate more > holistically that none of the changes creates any invalid proofs > beyond the mostly happy-path tests added in 0004. > I ended up not tackling changing how test_predtest tests run for now. > That's plausibly still useful, and I'd be happy to add that if you > generally agree with the direction of the patch and with that > abstraction being useful. > I added some additional verifications to the test_predtest module to > prevent additional obvious flaws. I looked through 0001 and made some additional cosmetic changes, mostly to get comments closer to the associated code; I also ran pgindent on it (see v5-0001 attached). That seems pretty committable to me at this point. I also like your 0002 additions to test_predtest.c (although why the mixture of ERROR and WARNING? ISTM they should all be WARNING, so we can press on with the test). One other thought is that maybe separating out predicate_implied_not_null_by_clause should be part of 0001? I'm less excited about the rest of v4-0002. @@ -740,6 +747,16 @@ predicate_refuted_by_recurse(Node *clause, Node *predicate, !weak)) return true; +/* + * Because weak refutation expands the allowed outcomes for B + * from "false" to "false or null", we can additionally prove + * weak refutation in the case that strong refutation is proven. + */ +if (weak && not_arg && +predicate_implied_by_recurse(predicate, not_arg, + true)) +return true; + switch (pclass) { case CLASS_AND: I don't buy this bit at all. If the prior recursive call fails to prove weak refutation in a case where strong refutation holds, how is that not a bug lower down? Moreover, in order to mask such a bug, you're doubling the time taken by failed proofs, which is an unfortunate thing --- we don't like spending a lot of time on something that fails to improve the plan. @@ -1138,32 +1155,114 @@ predicate_implied_by_simple_clause(Expr *predicate, Node *clause, Assert(list_length(op->args) == 2); rightop = lsecond(op->args); -/* - * We might never see a null Const here, but better check - * anyway - */ -if (rightop && IsA(rightop, Const) && -!((Const *) rightop)->constisnull) +if (rightop && IsA(rightop, Const)) { +Const*constexpr = (Const *) rightop; Node *leftop = linitial(op->args); -if (DatumGetBool(((Const *) rightop)->constvalue)) -{ -/* X = true implies X */ -if (equal(predicate, leftop)) -return true; -} +if (constexpr->constisnull) +return false; + +if (DatumGetBool(constexpr->constvalue)) +return equal(predicate, leftop); else -{ -/* X = false implies NOT X */ -if (is_notclause(predicate) && -equal(get_notclausearg(predicate), leftop)) -return true; -} +return is_notclause(predicate) && +equal(get_notclausearg(predicate), leftop); } } } break; I don't understand what this bit is doing ... and the fact that the patch removes all the existing comments and adds none isn't helping that. What it seems to mostly be doing is adding early "return false"s, which I'm not sure is a good thing, because it seems possible that operator_predicate_proof could apply here. +case IS_UNKNOWN: +/* + * When the clause is in the form "foo IS UNKNOWN" then + * we can prove weak implication of a predicate that + * is strict for "foo" and negated. 
This doesn't work + * for strong implication since if "foo" is "null" then + * the predicate will evaluate to "null" rather than + * "true". + */ The phrasing of this comment seems randomly inconsistent with others making similar arguments. +
Re: psql: Allow editing query results with \gedit
Pavel Stehule writes: > I would have forms like FoxPro, I would have a grid like FoxPro, but not in > psql, and I would not develop it :-) Yeah, that's something that was also bothering me, but I failed to put my finger on it. "Here's some JSON, edit it, and don't forget to keep the quoting correct" does not strike me as a user-friendly way to adjust data content. A spreadsheet-like display where you can change data within cells seems like a far better API, although I don't want to write that either. This kind of API would not readily support INSERT or DELETE cases, but TBH I think that's better anyway --- you're adding too much ambiguity in pursuit of a very secondary use-case. The stated complaint was "it's too hard to build UPDATE commands", which I can sympathize with. (BTW, I wonder how much of this already exists in pgAdmin.) regards, tom lane
Re: Permute underscore separated components of columns before fuzzy matching
Arne Roland writes: > Thank you for bringing that to my attention. Is there a way to subscribe > to cf-bot failures? I don't know of any push notification support in cfbot, but you can bookmark the page with your own active patches, and check it periodically: http://commitfest.cputube.org/arne-roland.html (For others, click on your own name in the main cfbot page's entry for one of your patches to find out how it spelled your name for this purpose.) regards, tom lane
Re: make dist using git archive
On 22.01.24 13:10, Junwang Zhao wrote: I played with this using the meson build on macOS; the packages are generated in the source root, not the build root. I'm not sure if this is by design, but I think polluting the *working directory* is not good. Yes, it's not good, but I couldn't find a way to make it work. This is part of the complications with meson I referred to. The @BUILD_ROOT@ placeholder in custom_target() is apparently always a relative path, but it doesn't know that git -C changes the current directory. Another thing I'd like to point out: should we also include the *git commit* or maybe the *git tag* in the package name, something like: git archive --format tar.gz --prefix postgresql-17devel/ HEAD -o postgresql-17devel-`git rev-parse --short HEAD`.tar.gz git archive --format tar.gz --prefix postgresql-17devel/ HEAD -o postgresql-`git describe --tags`.tar.gz I'm not sure why we would need it built-in. It can be done by hand, of course.
Re: Built-in CTYPE provider
On 18.01.24 23:03, Jeff Davis wrote: On Thu, 2024-01-18 at 13:53 +0100, Peter Eisentraut wrote: I think that would be a terrible direction to take, because it would regress the default sort order from "correct" to "useless". I don't agree that the current default is "correct". There are a lot of ways it can be wrong: * the environment variables at initdb time don't reflect what the users of the database actually want * there are so many different users using so many different applications connected to the database that no one "correct" sort order exists * libc has some implementation quirks * the version of Unicode that libc is based on is not what you expect * the version of libc is not what you expect These are arguments why the current defaults are not universally perfect, but I'd argue that they are still most often the right thing as the default. Aside from the overall message this sends about how PostgreSQL cares about locales and Unicode and such. Unicode is primarily about the semantics of characters and their relationships. The patches I propose here do a great job of that. Collation (relationships between *strings*) is a part of Unicode, but not the whole thing or even the main thing. I don't get this argument. Of course, people care about sorting and sort order. Whether you consider this part of Unicode or adjacent to it, people still want it. Maybe you don't intend for this to be the default provider? I am not proposing that this provider be the initdb-time default. ok But then who would really use it? I mean, sure, some people would, but how would you even explain, in practice, the particular niche of users or use cases? It's for users who want to respect Unicode support text from international sources in their database; but are not experts on the subject and don't know precisely what they want or understand the consequences. If and when such users do notice a problem with the sort order, they'd handle it at that time (perhaps with a COLLATE clause, or sorting in the application). Vision: * ICU offers COLLATE UNICODE, locale tailoring, case-insensitive matching, and customization with rules. It's the solution for everything from "slightly more advanced" to "very advanced". I am astonished by this. In your world, do users not want their text data sorted? Do they not care what the sort order is? You consider UCA sort order an "advanced" feature?
Re: partitioning and identity column
On 22.01.24 13:23, Ashutosh Bapat wrote: if (newdef->identity) { Assert(!is_partioning); /* * Identity is never inherited. The new column can have an * identity definition, so we always just take that one. */ def->identity = newdef->identity; } Thoughts? That code block already has Assert(!is_partition) at line 3085. I thought that Assert is enough. Ok. Maybe just rephrase that comment somehow then? There's another thing I found. The file isn't using check_stack_depth() in the function which traverse inheritance hierarchies. This isn't just a problem of the identity related function but most of the functions in that file. Do you think it's worth fixing it? I suppose the number of inheritance levels is usually not a problem for stack depth?
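For context on the stack-depth question: the usual convention in the backend is to call check_stack_depth() at the top of any function that can recurse to arbitrary depth, so it raises an error before the C stack actually overflows. A minimal sketch of the pattern (the function name and traversal are hypothetical; only the check_stack_depth() call is the point):

#include "postgres.h"
#include "miscadmin.h"

static void
walk_inheritance_tree(Oid relid)
{
	/* raise "stack depth limit exceeded" before the C stack can overflow */
	check_stack_depth();

	/* ... look up the children of relid and recurse into each one ... */
}

As noted above, whether the guard is worth adding here depends on how deep real-world inheritance hierarchies can realistically get.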
Re: Adding facility for injection points (or probe points?) for more advanced tests
On 22/01/2024 18:08, Heikki Linnakangas wrote: I wrote the attached patch to enable injection points in the Cirrus CI config, to run the injection tests I wrote for a GIN bug today [1]. But that led to a crash in the asan-enabled build [2]. I didn't investigate it yet. Pushed a fix for the crash. What do you think of enabling this in the Cirrus CI config? -- Heikki Linnakangas Neon (https://neon.tech)
Re: [17] CREATE SUBSCRIPTION ... SERVER
On Mon, 2024-01-22 at 18:41 +0530, Ashutosh Bapat wrote: > 0002 adds a prefix "regress_" to almost every object that is created > in foreign_data.sql. psql \dew outputs the owner, which in the case of a built-in FDW is the bootstrap superuser, which is not a stable name. I used the prefix to exclude the built-in FDW -- if you have a better suggestion, please let me know. (Though reading below, we might not even want a built-in FDW.) > Dummy FDW makes me nervous. The way it's written, it may grow into a > full-fledged postgres_fdw and in the process might acquire the same > concerns that postgres_fdw has today. But I will study the patches > and > discussion around it more carefully. I introduced that based on this comment[1]. I also thought it fit with your previous suggestion to make it work with postgres_fdw, but I suppose it's not required. We could just not offer the built-in FDW, and expect users to either use postgres_fdw or create their own dummy FDW. > I enhanced the postgres_fdw TAP test to use foreign table. Please see > the attached patch. It works as expected. Of course a follow-on work > will require linking the local table and its replica on the publisher > table so that push down will work on replicated tables. But the > concept at least works with your changes. Thanks for that. Thank you, I'll include it in the next patch set. > I am not sure we need a full-fledged TAP test for testing > subscription. I wouldn't object to it, but TAP tests are heavy. It > should be possible to write the same test as a SQL test by creating > two databases and switching between them. Do you think it's worth > trying that way? I'm not entirely sure what you mean here, but I am open to test simplifications if you see an opportunity. Regards, Jeff Davis > [1] https://www.postgresql.org/message-id/172273.1693403385%40sss.pgh.pa.us
Re: psql: Allow editing query results with \gedit
Re: Pavel Stehule > Introduction of \gedit is interesting idea, but in this form it looks too > magic > > a) why the data are in JSON format, that is not native for psql (minimally > now) Because we need something machine-readable. CSV would be an alternative, but that is hardly human-readable. > b) the implicit transformation to UPDATEs and the next evaluation can be > pretty dangerous. You can always run it in a transaction: begin; select * from tbl \gedit commit; Alternatively, we could try to make it not send the commands right away - either by putting them into the query buffer (that currently work only for single commands without any ';') or by opening another editor. Doable, but perhaps the transaction Just Works? > The concept of proposed feature is interesting, but the name \gedit is too > generic, maybe too less descriptive for this purpose > > Maybe \geditupdates can be better - but still it can be dangerous and slow > (without limits) An alternative I was considering was \edittable, but that would put more emphasis on "table" when it's actually more powerful than that, it works on a query result. (Similar in scope to updatable views.) > In the end I am not sure if I like it or dislike it. Looks dangerous. I can > imagine possible damage when some people will see vi first time and will > try to finish vi, but in this command, it will be transformed to executed > UPDATEs. If you close the editor with touching the file, nothing will be sent. And if you mess up the file, it will complain. I think it's unlikely that people who end up in an editor they can't operate would be able to modify JSON in a way that is still valid. Also, \e has the same problem. Please don't let "there are users who don't know what they are doing" spoil the idea. > More generating UPDATEs without knowledge of table structure > (knowledge of PK) can be issue (and possibly dangerous too), and you cannot > to recognize PK from result of SELECT (Everywhere PK is not "id" and it is > not one column). It *does* retrieve the proper PK from the table. All updates are based on the PK. Re: Tom Lane > > Introduction of \gedit is interesting idea, but in this form it looks too > > magic > > Yeah, I don't like it either --- it feels like something that belongs > in an ETL tool not psql. I tried to put it elsewhere first, starting with pspg: https://github.com/okbob/pspg/issues/200 The problem is that external programs like the pager neither have access to the query string/table name, nor the database connection. ETL would also not quite fit, this is meant for interactive use. > The sheer size of the patch shows how far > afield it is from anything that psql already does, necessitating > writing tons of stuff that was not there before. I've been working on this for several months, so it's already larger than the MVP would be. It does have features like (key=col1,col2) that could be omitted. > The bits that try > to parse the query to get necessary information seem particularly > half-baked. Yes, that's not pretty, and I'd be open for suggestions on how to improve that. I was considering: 1) this (dumb query parsing) 2) EXPLAIN the query to get the table 3) use libpq's PQftable The problem with 2+3 is that on views and partitioned tables, this would yield the base table name, not the table name used in the query. 1 turned out to be the most practical, and worked for me so far. If the parse tree would be available, using that would be much better. 
Should we perhaps add something like "explain (parse) select...", or add pg_parsetree(query) function? > Yup -- you'd certainly want some way of confirming that you actually > want the changes applied. Our existing edit commands load the edited > string back into the query buffer, where you can \r it if you don't > want to run it. But I fear that the results of this operation would > be too long for that to be workable. The results are as long as you like. The intended use case would be to change just a few rows. As said above, I was already thinking of making it user-confirmable, just the current version doesn't have it. > It does look like it's trying to find out the pkey from the system > catalogs ... however, it's also accepting unique constraints without > a lot of thought about the implications of NULLs. Right, the UNIQUE part doesn't take care of NULLs yet. Will fix that. (Probably by erroring out if any key column is NULL.) > Even if you have > a pkey, it's not exactly clear to me what should happen if the user > changes the contents of a pkey field. That could be interpreted as > either an INSERT or UPDATE, I think. A changed PK will be interpreted as DELETE + INSERT. (I shall make that more clear in the documentation.) > Also, while I didn't read too closely, it's not clear to me how the > code could reliably distinguish INSERT vs UPDATE vs DELETE cases --- > surely we're not going to try to put a "diff" engine into this, and > even if we did, diff is
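To make option 3 from Christoph's list concrete, here is a minimal libpq sketch that asks which relation a result column came from via PQftable(); the connection string, query, and view name are placeholders. As he notes above, for views and partitioned tables the relation reported this way is not necessarily the one named in the query, which is why the dumb query parsing of option 1 ended up being the most practical choice:

#include <stdio.h>
#include <libpq-fe.h>

int
main(void)
{
	PGconn	   *conn = PQconnectdb("dbname=postgres");
	PGresult   *res;

	if (PQstatus(conn) != CONNECTION_OK)
	{
		fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
		return 1;
	}

	res = PQexec(conn, "SELECT id FROM some_view");
	if (PQresultStatus(res) == PGRES_TUPLES_OK)
	{
		/*
		 * OID of the relation libpq reports for column 0, or InvalidOid if
		 * it is not a simple reference to a table column.
		 */
		Oid			reloid = PQftable(res, 0);

		printf("column 0 resolves to relation OID %u\n", reloid);
	}

	PQclear(res);
	PQfinish(conn);
	return 0;
}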
Multiple startup messages over the same connection
Hello, A question about protocol design: would it be possible to extend the protocol so it can handle multiple startup / authentication messages over a single connection? Are there any serious obstacles? (Possible issues with re-initialization of backends, I guess?) If that is possible, it could improve one important edge case: currently, when you have to talk to multiple databases on a single host, you need to open a separate connection to each of them. In some cases (multitenancy, for example), you may have thousands of databases on a host, which leads to inefficient connection utilization on clients (and on the db side too). A lot of other RDBMSes don't have this limitation. thank you, -Vladimir Churyukin
Re: make dist using git archive
On Mon Jan 22, 2024 at 1:31 AM CST, Peter Eisentraut wrote: From 4b128faca90238d0a0bb6949a8050c2501d1bd67 Mon Sep 17 00:00:00 2001 From: Peter Eisentraut Date: Sat, 20 Jan 2024 21:54:36 +0100 Subject: [PATCH v0] make dist uses git archive --- GNUmakefile.in | 34 -- meson.build| 38 ++ 2 files changed, 50 insertions(+), 22 deletions(-) diff --git a/GNUmakefile.in b/GNUmakefile.in index eba569e930e..3e04785ada2 100644 --- a/GNUmakefile.in +++ b/GNUmakefile.in @@ -87,29 +87,19 @@ update-unicode: | submake-generated-headers submake-libpgport distdir= postgresql-$(VERSION) dummy = =install= +GIT = git + dist: $(distdir).tar.gz $(distdir).tar.bz2 - rm -rf $(distdir) - -$(distdir).tar: distdir - $(TAR) chf $@ $(distdir) - -.INTERMEDIATE: $(distdir).tar - -distdir-location: - @echo $(distdir) - -distdir: - rm -rf $(distdir)* $(dummy) - for x in `cd $(top_srcdir) && find . \( -name CVS -prune \) -o \( -name .git -prune \) -o -print`; do \ - file=`expr X$$x : 'X\./\(.*\)'`; \ - if test -d "$(top_srcdir)/$$file" ; then \ - mkdir "$(distdir)/$$file" && chmod 777 "$(distdir)/$$file"; \ - else \ - ln "$(top_srcdir)/$$file" "$(distdir)/$$file" >/dev/null 2>&1 \ - || cp "$(top_srcdir)/$$file" "$(distdir)/$$file"; \ - fi || exit; \ - done - $(MAKE) -C $(distdir) distclean + +.PHONY: check-dirty-index +check-dirty-index: + $(GIT) diff-index --quiet HEAD + +$(distdir).tar.gz: check-dirty-index + $(GIT) archive --format tar.gz --prefix $(distdir)/ HEAD -o $@ + +$(distdir).tar.bz2: check-dirty-index + $(GIT) -c tar.tar.bz2.command='$(BZIP2) -c' archive --format tar.bz2 --prefix $(distdir)/ HEAD -o $@ distcheck: dist rm -rf $(dummy) diff --git a/meson.build b/meson.build index c317144b6bc..f0d870c5192 100644 --- a/meson.build +++ b/meson.build @@ -3347,6 +3347,44 @@ run_target('help', +### +# Distribution archive +### + +git = find_program('git', required: false, native: true, disabler: true) +bzip2 = find_program('bzip2', required: false, native: true, disabler: true) This doesn't need to be a disabler. git is fine as-is. See later comment. Disablers only work like you are expecting when they are used like how git is used. Once you call a method like .path(), all bets are off. +distdir = meson.project_name() + '-' + meson.project_version() + +check_dirty_index = run_target('check-dirty-index', + command: [git, 'diff-index', '--quiet', 'HEAD']) Seems like you might want to add -C here too? + +tar_gz = custom_target('tar.gz', + build_always_stale: true, + command: [git, '-C', '@SOURCE_ROOT@', 'archive', +'--format', 'tar.gz', +'--prefix', distdir + '/', +'-o', '@BUILD_ROOT@/@OUTPUT@', +'HEAD', '.'], + install: false, + output: distdir + '.tar.gz', +) + +tar_bz2 = custom_target('tar.bz2', + build_always_stale: true, + command: [git, '-C', '@SOURCE_ROOT@', '-c', 'tar.tar.bz2.command=' + bzip2.path() + ' -c', 'archive', +'--format', 'tar.bz2', +'--prefix', distdir + '/', -'-o', '@BUILD_ROOT@/@OUTPUT@', +'-o', join_paths(meson.build_root(), '@OUTPUT@'), This will generate the tarballs in the build directory. Do the same for the previous target. Tested locally. +'HEAD', '.'], + install: false, + output: distdir + '.tar.bz2', +) The bz2 target should be wrapped in an `if bzip2.found()`. It is possible for git to be found, but not bzip2. I might also define the bz2 command out of line. Also, you may want to add these programs to meson_options.txt for overriding, even though the "meson-ic" way is to use a machine file. 
+ +alias_target('pgdist', [check_dirty_index, tar_gz, tar_bz2]) Are you intending for the check_dirty_index target to prohibit the other two targets from running? Currently that is not the case. If it is what you intend, use a stamp file or something to indicate a relationship. Alternatively, inline the git diff-index into the other commands. These might also do better as external scripts. It would reduce duplication between the autotools and Meson builds. + + + ### # The End, The End, My Friend ### I am not really following why we can't use the builtin Meson dist command. The only difference from my testing is it doesn't use a --prefix argument. -- Tristan Partin Neon (https://neon.tech)
Re: Does redundant extension exist During faster COPY in PG16
Hi, On 2024-01-22 19:54:00 +0800, 何柯文(渊云) wrote: > I'm learning faster COPY of PG16. I have some questions about extension lock > improvement. > From ./src/backend/storage/buffer/bufmgr.c:1901 (ExtendBufferedRelShared) > ``` > /* > * Lock relation against concurrent extensions, unless requested not to. > * > * We use the same extension lock for all forks. That's unnecessarily > * restrictive, but currently extensions for forks don't happen often > * enough to make it worth locking more granularly. > * > * Note that another backend might have extended the relation by the time > * we get the lock. > */ > if (!(flags & EB_SKIP_EXTENSION_LOCK)) > { > LockRelationForExtension(bmr.rel, ExclusiveLock); > if (bmr.rel) > bmr.smgr = RelationGetSmgr(bmr.rel); > } > ... > smgrzeroextend(bmr.smgr, fork, first_block, extend_by, false); > ``` > During concurrent extension, when we obtain the extension lock, we use > smgrzeroextend() to extend relation files instead of searching fsm through > GetPageWithFreeSpace(). Is this approach reasonable? I think so, yes. > During concurrent extensions, one backend bulk extend successfully, meaning > that other backends waiting on extension lock have free pages to use. If > all concurrent extend backends extend the relation file after getting the > extension lock, the extension lock will be held (extention backends * > smgrzeroextend() executing time). If there's this much contention on the extension lock, there's no harm in extending more - the space will be used soon. The alternatives would be a) to search the FSM with the extension lock held, making contention worse, b) to release the extension lock again if we couldn't immediately acquire it, search the fsm, and retry if we couldn't find any free space, which would substantially increase contention. The FSM is the source of substantial contention, disabling it actually results in substantial throughput increases. Vastly increasing the number of lookups in the FSM would make that considerably worse, without a meaningful gain in density. Greetings, Andres Freund
Re: Improve WALRead() to suck data directly from WAL buffers when possible
Hi, On 2024-01-10 19:59:29 +0530, Bharath Rupireddy wrote: > + /* > + * Typically, we must not read a WAL buffer page that just got > + * initialized. Because we waited enough for the in-progress WAL > + * insertions to finish above. However, there can exist a slight > + * window after the above wait finishes in which the read > buffer page > + * can get replaced especially under high WAL generation rates. > After > + * all, we are reading from WAL buffers without any locks here. > So, > + * let's not count such a page in. > + */ > + phdr = (XLogPageHeader) page; > + if (!(phdr->xlp_magic == XLOG_PAGE_MAGIC && > + phdr->xlp_pageaddr == (ptr - (ptr % XLOG_BLCKSZ)) && > + phdr->xlp_tli == tli)) > + break; I still think that anything that requires such checks shouldn't be merged. It's completely bogus to check page contents for validity when we should have metadata telling us which range of the buffers is valid and which not. Greetings, Andres Freund
core dumps in auto_prewarm, tests succeed
Hi, I noticed that I was getting core dumps while executing the tests, without the tests failing. Backtraces are vriations of this: #0 0x00ca29cd in pg_atomic_read_u32_impl (ptr=0x7fe13497a004) at ../../../../../home/andres/src/postgresql/src/include/port/atomics/generic.h:48 #1 0x00ca2b08 in pg_atomic_read_u32 (ptr=0x7fe13497a004) at ../../../../../home/andres/src/postgresql/src/include/port/atomics.h:239 #2 0x00ca3c3d in LWLockAttemptLock (lock=0x7fe13497a000, mode=LW_EXCLUSIVE) at ../../../../../home/andres/src/postgresql/src/backend/storage/lmgr/lwlock.c:825 #3 0x00ca440c in LWLockAcquire (lock=0x7fe13497a000, mode=LW_EXCLUSIVE) at ../../../../../home/andres/src/postgresql/src/backend/storage/lmgr/lwlock.c:1264 #4 0x7fe130204ab4 in apw_detach_shmem (code=0, arg=0) at ../../../../../home/andres/src/postgresql/contrib/pg_prewarm/autoprewarm.c:788 #5 0x00c81c99 in shmem_exit (code=0) at ../../../../../home/andres/src/postgresql/src/backend/storage/ipc/ipc.c:276 #6 0x00c81a7c in proc_exit_prepare (code=0) at ../../../../../home/andres/src/postgresql/src/backend/storage/ipc/ipc.c:198 #7 0x00c819a8 in proc_exit (code=0) at ../../../../../home/andres/src/postgresql/src/backend/storage/ipc/ipc.c:111 #8 0x00bdd0b3 in BackgroundWorkerMain () at ../../../../../home/andres/src/postgresql/src/backend/postmaster/bgworker.c:841 #9 0x00be861d in do_start_bgworker (rw=0x341ff20) at ../../../../../home/andres/src/postgresql/src/backend/postmaster/postmaster.c:5756 #10 0x00be8a34 in maybe_start_bgworkers () at ../../../../../home/andres/src/postgresql/src/backend/postmaster/postmaster.c:5980 #11 0x00be4f9f in process_pm_child_exit () at ../../../../../home/andres/src/postgresql/src/backend/postmaster/postmaster.c:3039 #12 0x00be2de4 in ServerLoop () at ../../../../../home/andres/src/postgresql/src/backend/postmaster/postmaster.c:1765 #13 0x00be27bf in PostmasterMain (argc=4, argv=0x33dbba0) at ../../../../../home/andres/src/postgresql/src/backend/postmaster/postmaster.c:1475 #14 0x00aca326 in main (argc=4, argv=0x33dbba0) at ../../../../../home/andres/src/postgresql/src/backend/main/main.c:198 The most likely culprit seems to be: https://postgr.es/m/E1rQvjC-002Chd-Cd%40gemulon.postgresql.org The test encountering this is pg_prewarm/001_basic: (gdb) p DataDir $12 = 0x33ef8a0 "/srv/dev/build/postgres/m-dev-assert/testrun/pg_prewarm/001_basic/data/t_001_basic_main_data/pgdata" A secondary issue is that the tests suceed despite two segfaults. 
The problem here likely is that we don't have sufficient error handling during shutdowns: 2024-01-22 12:31:34.133 PST [3054823] LOG: background worker "logical replication launcher" (PID 3054836) exited with exit code 1 2024-01-22 12:31:34.443 PST [3054823] LOG: background worker "autoprewarm leader" (PID 3054835) was terminated by signal 11: Segmentation fault 2024-01-22 12:31:34.443 PST [3054823] LOG: terminating any other active server processes 2024-01-22 12:31:34.444 PST [3054823] LOG: abnormal database system shutdown 2024-01-22 12:31:34.469 PST [3054823] LOG: database system is shut down 2024-01-22 12:31:34.555 PST [3055133] LOG: starting PostgreSQL 17devel on x86_64-linux, compiled by gcc-14.0.0, 64-bit 2024-01-22 12:31:34.555 PST [3055133] LOG: listening on Unix socket "/tmp/p6XG0kQW9w/.s.PGSQL.60402" 2024-01-22 12:31:34.557 PST [3055148] LOG: database system was interrupted; last known up at 2024-01-22 12:31:33 PST 2024-01-22 12:31:34.557 PST [3055148] LOG: database system was not properly shut down; automatic recovery in progress 2024-01-22 12:31:34.558 PST [3055148] LOG: redo starts at 0/1531090 2024-01-22 12:31:34.559 PST [3055148] LOG: invalid record length at 0/1555F60: expected at least 24, got 0 2024-01-22 12:31:34.559 PST [3055148] LOG: redo done at 0/1555F38 system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s 2024-01-22 12:31:34.559 PST [3055146] LOG: checkpoint starting: end-of-recovery immediate wait 2024-01-22 12:31:34.570 PST [3055146] LOG: checkpoint complete: wrote 42 buffers (0.3%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.002 s, sync=0.001 s, total=0.011 s; sync files=0, longest=0.000 s, average=0.000 s; distance=147 kB, estimate=147 kB; lsn=0/1555F60, redo lsn=0/1555F60 2024-01-22 12:31:34.573 PST [3055133] LOG: database system is ready to accept connections ISTM that we shouldn't basically silently overlook shutdowns due to crashes in the tests. How to not do so is unfortunately not immediately obvious to me... Greetings, Andres Freund
Re: core dumps in auto_prewarm, tests succeed
On Mon, Jan 22, 2024 at 12:41:17PM -0800, Andres Freund wrote: > I noticed that I was getting core dumps while executing the tests, without the > tests failing. Backtraces are vriations of this: Looking, thanks for the heads-up. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: Evaluate arguments of correlated SubPlans in the referencing ExprState
Hi, On 2024-01-22 10:30:22 +1100, Peter Smith wrote: > 2024-01 Commitfest. > > Hi, This patch has a CF status of "Ready for Committer", but it is > currently failing some CFbot tests [1]. Please have a look and post an > updated version.. I think this failure is independent of this patch - by coincidence I just sent an email about the issue https://www.postgresql.org/message-id/20240122204117.swton324xcoodnyi%40awork3.anarazel.de a few minutes ago. Greetings, Andres Freund
Re: pg_stat_statements and "IN" conditions
> On Mon, Jan 22, 2024 at 06:07:27PM +0100, Dmitry Dolgov wrote: > > Please notice that at the moment, it's not being tested at all because > > of a patch-apply failure -- that's what the little triangular symbol > > means. The rest of the display concerns the test results from the > > last successfully-applied patch version. (Perhaps that isn't a > > particularly great UI design.) > > > > If you click on the triangle you find out > > > > == Applying patches on top of PostgreSQL commit ID > > b0f0a9432d0b6f53634a96715f2666f6d4ea25a1 === > > === applying patch > > ./v17-0001-Prevent-jumbling-of-every-element-in-ArrayExpr.patch > > patching file contrib/pg_stat_statements/Makefile > > Hunk #1 FAILED at 19. > > 1 out of 1 hunk FAILED -- saving rejects to file > > contrib/pg_stat_statements/Makefile.rej > > patching file contrib/pg_stat_statements/expected/merging.out > > patching file contrib/pg_stat_statements/meson.build > > Oh, I see, thanks. Give me a moment, will fix this. Here is it. >From ac8a7c93fbb72c469ca7128280f52024adb860ab Mon Sep 17 00:00:00 2001 From: Dmitrii Dolgov <9erthali...@gmail.com> Date: Mon, 22 Jan 2024 21:11:15 +0100 Subject: [PATCH v18 1/4] Prevent jumbling of every element in ArrayExpr pg_stat_statements produces multiple entries for queries like SELECT something FROM table WHERE col IN (1, 2, 3, ...) depending on the number of parameters, because every element of ArrayExpr is jumbled. In certain situations it's undesirable, especially if the list becomes too large. Make an array of Const expressions contribute only the first/last elements to the jumble hash. Allow to enable this behavior via the new pg_stat_statements parameter query_id_const_merge with the default value off. Reviewed-by: Zhihong Yu, Sergey Dudoladov, Robert Haas, Tom Lane, Michael Paquier, Sergei Kornilov, Alvaro Herrera, David Geier Tested-by: Chengxi Sun --- contrib/pg_stat_statements/Makefile | 2 +- .../pg_stat_statements/expected/merging.out | 167 ++ contrib/pg_stat_statements/meson.build| 1 + .../pg_stat_statements/pg_stat_statements.c | 62 ++- contrib/pg_stat_statements/sql/merging.sql| 58 ++ doc/src/sgml/pgstatstatements.sgml| 57 +- src/backend/nodes/gen_node_support.pl | 21 ++- src/backend/nodes/queryjumblefuncs.c | 100 ++- src/backend/postmaster/postmaster.c | 3 + src/backend/utils/misc/postgresql.conf.sample | 1 - src/include/nodes/primnodes.h | 2 +- src/include/nodes/queryjumble.h | 9 +- 12 files changed, 458 insertions(+), 25 deletions(-) create mode 100644 contrib/pg_stat_statements/expected/merging.out create mode 100644 contrib/pg_stat_statements/sql/merging.sql diff --git a/contrib/pg_stat_statements/Makefile b/contrib/pg_stat_statements/Makefile index 414a30856e4..03a62d685f3 100644 --- a/contrib/pg_stat_statements/Makefile +++ b/contrib/pg_stat_statements/Makefile @@ -19,7 +19,7 @@ LDFLAGS_SL += $(filter -lm, $(LIBS)) REGRESS_OPTS = --temp-config $(top_srcdir)/contrib/pg_stat_statements/pg_stat_statements.conf REGRESS = select dml cursors utility level_tracking planning \ - user_activity wal entry_timestamp cleanup oldextversions + user_activity wal entry_timestamp cleanup oldextversions merging # Disabled because these tests require "shared_preload_libraries=pg_stat_statements", # which typical installcheck users do not have (e.g. buildfarm clients). 
NO_INSTALLCHECK = 1 diff --git a/contrib/pg_stat_statements/expected/merging.out b/contrib/pg_stat_statements/expected/merging.out new file mode 100644 index 000..1e58283afe8 --- /dev/null +++ b/contrib/pg_stat_statements/expected/merging.out @@ -0,0 +1,167 @@ +-- +-- Const merging functionality +-- +CREATE EXTENSION pg_stat_statements; +CREATE TABLE test_merge (id int, data int); +-- IN queries +-- No merging is performed, as a baseline result +SELECT pg_stat_statements_reset() IS NOT NULL AS t; + t +--- + t +(1 row) + +SELECT * FROM test_merge WHERE id IN (1, 2, 3, 4, 5, 6, 7, 8, 9); + id | data ++-- +(0 rows) + +SELECT * FROM test_merge WHERE id IN (1, 2, 3, 4, 5, 6, 7, 8, 9, 10); + id | data ++-- +(0 rows) + +SELECT * FROM test_merge WHERE id IN (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11); + id | data ++-- +(0 rows) + +SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C"; +query| calls +-+--- + SELECT * FROM test_merge WHERE id IN ($1, $2, $3, $4, $5, $6, $7, $8, $9) | 1 + SELECT * FROM test_merge WHERE id IN ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10) | 1 + SELECT * FROM test_merge WHERE id IN ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11) | 1 + SELECT pg_stat_statements_reset() IS NOT NULL AS t
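For interactive testing, a sketch along these lines should show the effect (this assumes pg_stat_statements is in shared_preload_libraries and that the new parameter can be changed with SET; the patch may instead require setting pg_stat_statements.query_id_const_merge in postgresql.conf):

CREATE EXTENSION IF NOT EXISTS pg_stat_statements;
CREATE TABLE test_merge (id int, data int);

-- parameter introduced by this patch, default off
SET pg_stat_statements.query_id_const_merge = on;
SELECT pg_stat_statements_reset() IS NOT NULL AS t;

-- same query shape, different number of constants in the IN list
SELECT * FROM test_merge WHERE id IN (1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
SELECT * FROM test_merge WHERE id IN (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11);

-- with merging enabled, the expectation is a single entry here rather than
-- one entry per distinct number of constants, unlike the baseline above
SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";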
Re: Refactoring backend fork+exec code
Hi, On 2024-01-10 14:35:52 +0200, Heikki Linnakangas wrote: > Here's a patch that gets rid of AuxProcType. It's independent of the other > patches in this thread; if this is committed, I'll rebase the rest of the > patches over this and get rid of the new PMC_* enum. > > Three patches, actually. The first one fixes an existing comment that I > noticed to be incorrect while working on this. I'll push that soon, barring > objections. The second one gets rid of AuxProcType, and the third one > replaces IsBackgroundWorker, IsAutoVacuumLauncherProcess() and > IsAutoVacuumWorkerProcess() with checks on MyBackendType. So MyBackendType > is now the primary way to check what kind of a process the current process > is. > > 'am_walsender' would also be fairly straightforward to remove and replace > with AmWalSenderProcess(). I didn't do that because we also have > am_db_walsender and am_cascading_walsender which cannot be directly replaced > with MyBackendType. Given that, it might be best to keep am_walsender for > symmetry. > @@ -561,13 +561,13 @@ static void ShmemBackendArrayAdd(Backend *bn); > static void ShmemBackendArrayRemove(Backend *bn); > #endif /* EXEC_BACKEND > */ > > -#define StartupDataBase()StartChildProcess(StartupProcess) > -#define StartArchiver() > StartChildProcess(ArchiverProcess) > -#define StartBackgroundWriter() StartChildProcess(BgWriterProcess) > -#define StartCheckpointer() StartChildProcess(CheckpointerProcess) > -#define StartWalWriter() StartChildProcess(WalWriterProcess) > -#define StartWalReceiver() StartChildProcess(WalReceiverProcess) > -#define StartWalSummarizer() StartChildProcess(WalSummarizerProcess) > +#define StartupDataBase()StartChildProcess(B_STARTUP) > +#define StartArchiver() StartChildProcess(B_ARCHIVER) > +#define StartBackgroundWriter() StartChildProcess(B_BG_WRITER) > +#define StartCheckpointer() StartChildProcess(B_CHECKPOINTER) > +#define StartWalWriter() StartChildProcess(B_WAL_WRITER) > +#define StartWalReceiver() StartChildProcess(B_WAL_RECEIVER) > +#define StartWalSummarizer() StartChildProcess(B_WAL_SUMMARIZER) Not for this commit, but we ought to rip out these macros - all they do is to make it harder to understand the code. > @@ -5344,31 +5344,31 @@ StartChildProcess(AuxProcType type) > errno = save_errno; > switch (type) > { > - case StartupProcess: > + case B_STARTUP: > ereport(LOG, > (errmsg("could not fork startup > process: %m"))); > break; > - case ArchiverProcess: > + case B_ARCHIVER: > ereport(LOG, > (errmsg("could not fork > archiver process: %m"))); > break; > - case BgWriterProcess: > + case B_BG_WRITER: > ereport(LOG, > (errmsg("could not fork > background writer process: %m"))); > break; > - case CheckpointerProcess: > + case B_CHECKPOINTER: > ereport(LOG, > (errmsg("could not fork > checkpointer process: %m"))); > break; > - case WalWriterProcess: > + case B_WAL_WRITER: > ereport(LOG, > (errmsg("could not fork WAL > writer process: %m"))); > break; > - case WalReceiverProcess: > + case B_WAL_RECEIVER: > ereport(LOG, > (errmsg("could not fork WAL > receiver process: %m"))); > break; > - case WalSummarizerProcess: > + case B_WAL_SUMMARIZER: > ereport(LOG, > (errmsg("could not fork WAL > summarizer process: %m"))); > break; Seems we should replace this with something slightly more generic one of these days... 
> diff --git a/src/backend/utils/activity/backend_status.c > b/src/backend/utils/activity/backend_status.c > index 1a1050c8da1..92f24db4e18 100644 > --- a/src/backend/utils/activity/backend_status.c > +++ b/src/backend/utils/activity/backend_status.c > @@ -257,17 +257,16 @@ pgstat_beinit(void) > else > { > /* Must be an auxiliary process */ > - Assert(MyAuxP
Re: Optimizing nbtree ScalarArrayOp execution, allowing multi-column ordered scans, skip scan
On Fri, 19 Jan 2024 at 23:42, Peter Geoghegan wrote: > Thank you for your replies so far. > On Thu, Jan 18, 2024 at 11:39 AM Matthias van de Meent > wrote: > > I would agree with you if this was about new APIs and features, but > > here existing APIs are being repurposed without changing them. A > > maintainer of index AMs would not look at the commit title 'Enhance > > nbtree ScalarArrayOp execution' and think "oh, now I have to make sure > > my existing amsearcharray+amcanorder handling actually supports > > non-prefix arrays keys and return data in index order". > > This is getting ridiculous. > > It is quite likely that there are exactly zero affected out-of-core > index AMs. I don't count pgroonga as a counterexample (I don't think > that it actually fulfills the definition of a ). Basically, > "amcanorder" index AMs more or less promise to be compatible with > nbtree, down to having the same strategy numbers. So the idea that I'm > going to upset amsearcharray+amcanorder index AM authors is a > completely theoretical problem. The planner code evolved with nbtree, > hand-in-glove. And this is where I disagree with your (perceived) implicit argument that this should be and always stay this way. I don't mind changes in the planner for nbtree per se, but as I've mentioned before in other places, I really don't like how we handle amcanorder as if it were amisbtree. But it's not called that, so we shouldn't expect that to remain the case; and definitely not keep building on those expectations that it always is going to be the nbtree when amcanorder is set (or amsearcharray is set, or ..., or any combination of those that is currently used by btree). By keeping that expectation alive, this becomes a self-fulfilling prophecy, and I really don't like such restrictive self-fulfilling prophecies. It's nice that we have index AM feature flags, but if we only effectively allow one implementation for this set of flags by ignoring potential other users when changing behavior, then what is the API good for (apart from abstraction layering, which is nice)? /rant I'll see about the > I'm more than happy to add a reminder to the commit message about > adding a proforma listing to the compatibility section of the Postgres > 17 release notes. Can we actually agree on which theoretical index AM > types are affected first, though? Isn't it actually > amsearcharray+amcanorder+amcanmulticol external index AMs only? Do I > have that right? I think that may be right, but I could very well have built a btree-lite that doesn't have the historical baggage of having to deal with pages from before v12 (version 4) and some improvements that haven't made it to core yet. > BTW, how should we phrase this compatibility note, so that it can be > understood? It's rather awkward. Something like the following, maybe? """ Compatibility: The planner will now generate paths with array scan keys in any column for ordered indexes, rather than on only a prefix of the index columns. The planner still expects fully ordered data from those indexes. Historically, the btree AM couldn't output correctly ordered data for suffix array scan keys, which was "fixed" by prohibiting the planner from generating array scan keys without an equality prefix scan key up to that attribute. In this commit, the issue has been fixed, and the planner restriction has thus been removed, as the only in-core IndexAM that reports amcanorder now supports array scan keys on any column regardless of what prefix scan keys it has. 
""" > > > As I said to Heikki, thinking about this some more is on my todo list. > > > I mean the way that this worked had substantial revisions on HEAD > > > right before I posted v9. v9 was only to fix the bit rot that that > > > caused. > > > > Then I'll be waiting for that TODO item to be resolved. > > My TODO item is to resolve the question of whether and to what extent > these two optimizations should be combined. It's possible that I'll > decide that it isn't worth it, for whatever reason. That's fine, this decision (and any work related to it) is exactly what I was referring to with that mention of the TODO. > > > Even single digit > > > thousand of array elements should be considered huge. Plus this code > > > path is only hit with poorly written queries. > > > > Would you accept suggestions? > > You mean that you want to have a go at it yourself? Sure, go ahead. > > I cannot promise that I'll accept your suggested revisions, but if > they aren't too invasive/complicated compared to what I have now, then > I will just accept them. I maintain that this isn't really necessary, > but it might be the path of least resistance at this point. Attached 2 patches: v11.patch-a and v11.patch-b. Both are incremental on top of your earlier set, and both don't allocate additional memory in the merge operation in non-assertion builds. patch-a is a trivial and clean implementation of mergesort, which tends to reduce the total number of compare
Re: core dumps in auto_prewarm, tests succeed
On Mon, Jan 22, 2024 at 02:44:57PM -0600, Nathan Bossart wrote: > On Mon, Jan 22, 2024 at 12:41:17PM -0800, Andres Freund wrote: >> I noticed that I was getting core dumps while executing the tests, without >> the >> tests failing. Backtraces are variations of this: > > Looking, thanks for the heads-up. I think this is because the autoprewarm state was moved to a DSM segment, and DSM segments are detached before the on_shmem_exit callbacks are called during process exit. Moving apw_detach_shmem to the before_shmem_exit list seems to resolve the crashes.

diff --git a/contrib/pg_prewarm/autoprewarm.c b/contrib/pg_prewarm/autoprewarm.c
index 9ea6c2252a..88c3a04109 100644
--- a/contrib/pg_prewarm/autoprewarm.c
+++ b/contrib/pg_prewarm/autoprewarm.c
@@ -165,7 +165,7 @@ autoprewarm_main(Datum main_arg)
 		first_time = false;
 
 	/* Set on-detach hook so that our PID will be cleared on exit. */
-	on_shmem_exit(apw_detach_shmem, 0);
+	before_shmem_exit(apw_detach_shmem, 0);
 
 	/*
 	 * Store our PID in the shared memory area --- unless there's already

-- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: core dumps in auto_prewarm, tests succeed
Hi, On 2024-01-22 15:19:36 -0600, Nathan Bossart wrote: > On Mon, Jan 22, 2024 at 02:44:57PM -0600, Nathan Bossart wrote: > > On Mon, Jan 22, 2024 at 12:41:17PM -0800, Andres Freund wrote: > >> I noticed that I was getting core dumps while executing the tests, without > >> the > >> tests failing. Backtraces are variations of this: > > > > Looking, thanks for the heads-up. > > I think this is because the autoprewarm state was moved to a DSM segment, > and DSM segments are detached before the on_shmem_exit callbacks are called > during process exit. Moving apw_detach_shmem to the before_shmem_exit list > seems to resolve the crashes. That seems plausible. Would still be nice to have at least this test ensure that the shutdown code works. Perhaps just a check of the control file after shutdown, ensuring that the state is "shutdowned" vs crashed? Greetings, Andres Freund
Re: Network failure may prevent promotion
Hi, On 2024-01-19 12:28:05 +0900, Michael Paquier wrote: > On Thu, Jan 18, 2024 at 03:42:28PM +0200, Heikki Linnakangas wrote: > > Given that commit 728f86fec6 that introduced this issue was not strictly > > required, perhaps we should just revert it for v16. > > Is there a point in keeping 728f86fec6 as well on HEAD? That does not > strike me as wise to keep that in the tree for now. If it needs to be > reworked, looking at this problem from scratch would be a safer > approach. IDK, I think we'll introduce this type of bug over and over if we don't fix it properly. Greetings, Andres Freund
Re: Improving EXPLAIN's display of SubPlan nodes
I wrote: > The main thing that's still missing compared to what is in the plan > data structure is information about which Param is which. I think > we have the subplan output Params relatively well covered through > the expedient of listing them in the generated plan_name, but it's > still not apparent to me how we could shoehorn subplan input > Params into this (or whether it's worth doing). Actually ... it looks like it probably isn't worth doing, because it's already the case that we don't expose input Params as such. EXPLAIN searches for the referent of an input Param and displays that (cf find_param_referent()). Just for experimental purposes, I wrote a follow-on patch to add printout of the parParam and args list (attached, as .txt so the cfbot doesn't think it's a patch). This produces results like explain (verbose, costs off) select array(select sum(x+y) s from generate_series(1,3) y group by y order by s) from generate_series(1,3) x; QUERY PLAN --- Function Scan on pg_catalog.generate_series x Output: ARRAY(SubPlan 1 PASSING $0 := x.x) ^ added by delta patch Function Call: generate_series(1, 3) SubPlan 1 -> Sort Output: (sum((x.x + y.y))), y.y Sort Key: (sum((x.x + y.y))) -> HashAggregate Output: sum((x.x + y.y)), y.y ^^^ actual reference to $0 Group Key: y.y -> Function Scan on pg_catalog.generate_series y Output: y.y Function Call: generate_series(1, 3) (13 rows) As you can see, it's not necessary to explain what $0 is because that name isn't shown anywhere else --- the references to "x.x" in the subplan are actually uses of $0. So now I'm thinking that we do have enough detail in the present proposal, and we just need to think about whether there's some nicer way to present it than the particular spelling I used here. One idea I considered briefly is to pull the same trick with regards to output parameters --- that is, instead of adding all the "returns $n" annotations to subplans, drill down and print the subplan's relevant targetlist expression instead of "$n". On balance I think that might be more confusing not less so, though. SQL users are used to the idea that a sub-select can "see" variables from the outer query, but not at all vice-versa. I think it probably wouldn't be formally ambiguous, because ruleutils already de-duplicates table aliases across the whole tree, but it still seems likely to be confusing. Also, people are already pretty used to seeing $n to represent the outputs of InitPlans, and I've not heard many complaints suggesting that we should change that. regards, tom lane diff --git a/src/backend/utils/adt/ruleutils.c b/src/backend/utils/adt/ruleutils.c index 9ee35640ba..b3da98a2f7 100644 --- a/src/backend/utils/adt/ruleutils.c +++ b/src/backend/utils/adt/ruleutils.c @@ -8911,9 +8911,30 @@ get_rule_expr(Node *node, deparse_context *context, break; } if (subplan->useHashTable) - appendStringInfo(buf, "HASHED %s)", subplan->plan_name); + appendStringInfo(buf, "HASHED %s", subplan->plan_name); else - appendStringInfo(buf, "%s)", subplan->plan_name); + appendStringInfo(buf, "%s", subplan->plan_name); + if (subplan->parParam) + { + ListCell *lc3; + ListCell *lc4; + bool first = true; + + appendStringInfoString(buf, " PASSING "); + forboth(lc3, subplan->parParam, lc4, subplan->args) + { + int paramid = lfirst_int(lc3); + Node *arg = (Node *) lfirst(lc4); + + if (first) + first = false; + else + appendStringInfoString(buf, ", "); + appendStringInfo(buf, "$%d := ", paramid); + get_rule_expr(arg, context, showimplicit); + } + } +
Re: core dumps in auto_prewarm, tests succeed
On Mon, Jan 22, 2024 at 01:24:54PM -0800, Andres Freund wrote: > On 2024-01-22 15:19:36 -0600, Nathan Bossart wrote: >> I think this is because the autoprewarm state was moved to a DSM segment, >> and DSM segments are detached before the on_shmem_exit callbacks are called >> during process exit. Moving apw_detach_shmem to the before_shmem_exit list >> seems to resolve the crashes. > > That seems plausible. Would still be nice to have at least this test ensure > that the shutdown code works. Perhaps just a check of the control file after > shutdown, ensuring that the state is "shutdowned" vs crashed? I'll give that a try. I'll also expand the comment above the before_shmem_exit() call. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: core dumps in auto_prewarm, tests succeed
On Mon, Jan 22, 2024 at 03:38:15PM -0600, Nathan Bossart wrote: > On Mon, Jan 22, 2024 at 01:24:54PM -0800, Andres Freund wrote: >> On 2024-01-22 15:19:36 -0600, Nathan Bossart wrote: >>> I think this is because the autoprewarm state was moved to a DSM segment, >>> and DSM segments are detached before the on_shmem_exit callbacks are called >>> during process exit. Moving apw_detach_shmem to the before_shmem_exit list >>> seems to resolve the crashes. >> >> That seems plausible. Would still be nice to have at least this test ensure >> that the shutdown code works. Perhaps just a check of the control file after >> shutdown, ensuring that the state is "shutdowned" vs crashed? > > I'll give that a try. I'll also expand the comment above the > before_shmem_exit() call. Here is a patch. This might be a topic for another thread, but I do wonder whether we could put a generic pg_controldata check in node->stop that would at least make sure that the state is some sort of expected shut-down state. My first thought is that it could be a tad expensive, but... maybe it wouldn't be too bad. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com >From b339a84802b76d42f4863e52e6d91ab28873e00a Mon Sep 17 00:00:00 2001 From: Nathan Bossart Date: Mon, 22 Jan 2024 16:22:57 -0600 Subject: [PATCH v1 1/1] fix autoprewarm core dump --- contrib/pg_prewarm/autoprewarm.c | 10 -- contrib/pg_prewarm/t/001_basic.pl | 6 ++ 2 files changed, 14 insertions(+), 2 deletions(-) diff --git a/contrib/pg_prewarm/autoprewarm.c b/contrib/pg_prewarm/autoprewarm.c index 9ea6c2252a..06ee21d496 100644 --- a/contrib/pg_prewarm/autoprewarm.c +++ b/contrib/pg_prewarm/autoprewarm.c @@ -164,8 +164,14 @@ autoprewarm_main(Datum main_arg) if (apw_init_shmem()) first_time = false; - /* Set on-detach hook so that our PID will be cleared on exit. */ - on_shmem_exit(apw_detach_shmem, 0); + /* + * Set on-detach hook so that our PID will be cleared on exit. + * + * NB: Autoprewarm's state is stored in a DSM segment, and DSM segments + * are detached before calling the on_shmem_exit callbacks, so we must put + * apw_detach_shmem in the before_shmem_exit callback list. + */ + before_shmem_exit(apw_detach_shmem, 0); /* * Store our PID in the shared memory area --- unless there's already diff --git a/contrib/pg_prewarm/t/001_basic.pl b/contrib/pg_prewarm/t/001_basic.pl index bcd23a6914..825d3448ee 100644 --- a/contrib/pg_prewarm/t/001_basic.pl +++ b/contrib/pg_prewarm/t/001_basic.pl @@ -55,4 +55,10 @@ $node->wait_for_log( $node->stop; +# control file should indicate normal shut down +command_like( + [ 'pg_controldata', $node->data_dir() ], + qr/Database cluster state:\s*shut down/, + 'cluster shut down normally'); + done_testing(); -- 2.25.1
Re: psql: Allow editing query results with \gedit
Re: David G. Johnston > Building off the other comments, I'd suggest trying to get rid of the > intermediate JSOn format and also just focus on a single row at any given > time. We need *some* machine-readable format. It doesn't have to be JSON, but JSON is actually pretty nice to read - and if values are too long, or there are too many values, switch to extended mode: select * from messages \gedit (expanded) [{ "id": "1", "language": "en", "message": "This is a very long test text with little actual meaning." },{ "id": "2", "language": "en", "message": "Another one, a bit shorter." }] I tweaked the indentation in the psql JSON output patch specifically to make it readable. Restricting to a single line might make sense if it helps editing, but I don't think it does. > For an update the first argument to the metacommand could be the unique key > value present in the previous result. The resultant UPDATE would just put > that into the where clause and every other column in the result would be a > SET clause column with the thing being set the current value, ready to be > edited. Hmm, then you would still have to cut-and-paste the PK value. If that that's a multi-column non-numeric key, you are basically back to the original problem. Re: Tom Lane > Yeah, that's something that was also bothering me, but I failed to > put my finger on it. "Here's some JSON, edit it, and don't forget > to keep the quoting correct" does not strike me as a user-friendly > way to adjust data content. A spreadsheet-like display where you > can change data within cells seems like a far better API, although > I don't want to write that either. Right. I wouldn't want a form editing system in there either. But perhaps this middle ground of using a well-established format that is easy to generate and to parse (it's using the JSON parser from pgcommon) makes it fit into psql. If parsing the editor result fails, the user is asked if they want to re-edit with a parser error message, and if they go to the editor again, the cursor is placed in the line where the error is. (Also, what's wrong with having to strictly adhere to some syntax, we are talking about SQL here.) It's admittedly larger than the average \backslash command, but it does fit into psql's interactive usage. \crosstabview is perhaps a similar thing - it doesn't really fit into a simple "send query and display result" client, but since it solves an actual problem, it makes well sense to spend the extra code on it. > This kind of API would not readily support INSERT or DELETE cases, but > TBH I think that's better anyway --- you're adding too much ambiguity > in pursuit of a very secondary use-case. The stated complaint was > "it's too hard to build UPDATE commands", which I can sympathize with. I've been using the feature already for some time, and it's a real relief. In my actual use case here, I use it on my ham radio logbook: =# select start, call, qrg, name from log where cty = 'CE9' order by start; start │ call │ qrg │ name ┼┼─┼─── 2019-03-12 20:34:00+00 │ RI1ANL │7.076253 │ ∅ 2021-03-16 21:24:00+00 │ DP0GVN │2400.395 │ Felix 2022-01-15 17:19:00+00 │ DP0GVN │ 2400.01 │ Felix 2022-10-23 19:17:15+00 │ DP0GVN │ 2400.041597 │ ∅ 2023-10-01 14:05:00+00 │ 8J1RL │ 28.182575 │ ∅ 2024-01-22 21:15:15+00 │ DP1POL │ 10.138821 │ ∅ (6 Zeilen) The primary key is (start, call). 
If I now want to note that the last contact with Antarctica there was also with Felix, I'd have to transform that into update log set name = 'Felix' where start = '2024-01-22 21:15:15+00' and call = 'DP1POL'; \gedit is just so much easier. UPDATE is the core feature. If we want to say INSERT and DELETE aren't supported, but UPDATE support can go in, that'd be fine with me. > (BTW, I wonder how much of this already exists in pgAdmin.) pgadmin seems to support it. (Most other clients don't.) Obviously, I would want to do the updating using the client I also use for querying. Christoph
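For that row, the \gedit round trip might look roughly like the following sketch, extrapolating from the JSON layout shown earlier for the messages table (how NULLs and the timestamp are rendered is a guess, since that row's \gedit output isn't shown here):

=# select start, call, qrg, name from log
   where start = '2024-01-22 21:15:15+00' and call = 'DP1POL' \gedit
[{
  "start": "2024-01-22 21:15:15+00",
  "call": "DP1POL",
  "qrg": "10.138821",
  "name": null
}]

Changing "name": null to "name": "Felix" in the editor and saving would then have \gedit issue essentially the UPDATE spelled out above, keyed on the primary key (start, call).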
Re: Things I don't like about \du's "Attributes" column
On Sun, Jan 21, 2024 at 2:35 PM Pavel Luzanov wrote: > Another approach based on early suggestions. > > The Attributes column includes only the enabled logical attributes. > Regardless of whether the attribute is enabled by default or not. > > > The attribute names correspond to the keywords of the CREATE ROLE command. > The attributes are listed in the same order as in the documentation. > (I think that the LOGIN attribute should be moved to the first place, > both in the documentation and in the command.) > > I'd just flip INHERIT and LOGIN > The "Connection limit" and "Valid until" attributes are placed in separate > columns. > The "Password?" column has been added. > > Sample output. > > Patch v3: > =# \du > List of roles > Role name |Attributes > | Password? | Valid until | Connection limit > ---+---+---++-- > admin | INHERIT > | no|| -1 > alice | SUPERUSER LOGIN > | yes | infinity |5 > bob | CREATEDB INHERIT LOGIN REPLICATION BYPASSRLS > | no| 2022-01-01 00:00:00+03 | -1 > charlie | CREATEROLE INHERIT LOGIN > | yes ||0 > postgres | SUPERUSER CREATEDB CREATEROLE INHERIT LOGIN REPLICATION > BYPASSRLS | no|| -1 > (5 rows) > > > Small modification with newline separator for Attributes column: > > Patch v3 with newlines: > =# \du > List of roles > Role name | Attributes | Password? | Valid until | Connection > limit > ---+-+---++-- > postgres | SUPERUSER +| no|| > -1 >| CREATEDB +| || >| CREATEROLE +| || >| INHERIT+| || >| LOGIN +| || >| REPLICATION+| || >| BYPASSRLS | || > (5 rows) > > I'm strongly in favor of using mixed-case for the attributes. The SQL Command itself doesn't care about capitalization and it is much easier on the eyes. I'm also strongly in favor of newlines, as can be seen by the default bootstrap superuser entry putting everything on one line eats up 65 characters. List of roles Role name | Attributes | Password? | Valid until | Connection limit | Description ---+-+---+-+--+- davidj| Superuser +| no| | -1 | | CreateDB +| | | | | CreateRole +| | | | | Inherit+| | | | | Login +| | | | | Replication+| | | | | BypassRLS | | | | (1 row) As noted by Peter this patch didn't update the two affected expected output files. psql.out and, due to the system view change, rules.out. That particular change requires a documentation update to the pg_roles system view page. I'd suggest pulling out this system view change into its own patch. I will take another pass later when I get some more time. I want to re-review some of the older messages. But the tweaks I show and breaking out the view changes in to a separate patch both appeal to me right now. David J.
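For experimenting with the proposed layout before the psql changes land, something close to it can be approximated directly in SQL. This sketch uses pg_authid because the unpatched pg_roles does not reveal whether a password is set, so it is superuser-only; the labels and capitalization are only illustrative:

SELECT r.rolname AS "Role name",
       concat_ws(E'\n',
                 CASE WHEN r.rolsuper THEN 'Superuser' END,
                 CASE WHEN r.rolcreatedb THEN 'CreateDB' END,
                 CASE WHEN r.rolcreaterole THEN 'CreateRole' END,
                 CASE WHEN r.rolinherit THEN 'Inherit' END,
                 CASE WHEN r.rolcanlogin THEN 'Login' END,
                 CASE WHEN r.rolreplication THEN 'Replication' END,
                 CASE WHEN r.rolbypassrls THEN 'BypassRLS' END) AS "Attributes",
       r.rolpassword IS NOT NULL AS "Password?",
       r.rolvaliduntil AS "Valid until",
       r.rolconnlimit AS "Connection limit"
FROM pg_authid r
WHERE r.rolname !~ '^pg_'
ORDER BY 1;

In psql's default aligned output the embedded newlines show up as the "+" continuation marks seen in the examples above.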
Re: Built-in CTYPE provider
On Mon, 2024-01-22 at 19:49 +0100, Peter Eisentraut wrote: > > > I don't get this argument. Of course, people care about sorting and > sort order. Whether you consider this part of Unicode or adjacent to > it, people still want it. You said that my proposal sends a message that we somehow don't care about Unicode, and I strongly disagree. The built-in provider I'm proposing does implement Unicode semantics. Surely a database that offers UCS_BASIC (a SQL spec feature) isn't sending a message that it doesn't care about Unicode, and neither is my proposal. > > > > * ICU offers COLLATE UNICODE, locale tailoring, case-insensitive > > matching, and customization with rules. It's the solution for > > everything from "slightly more advanced" to "very advanced". > > I am astonished by this. In your world, do users not want their text > data sorted? Do they not care what the sort order is? I obviously care about Unicode and collation. I've put a lot of effort recently into contributions in this area, and I wouldn't have done that if I thought users didn't care. You've made much greater contributions and I thank you for that. The logical conclusion of your line of argument would be that libc's "C.UTF-8" locale and UCS_BASIC simply should not exist. But they do exist, and for good reason. One of those good reasons is that only *human* users care about the human-friendliness of sort order. If Postgres is just feeding the results to another system -- or an application layer that re-sorts the data anyway -- then stability, performance, and interoperability matter more than human-friendliness. (Though Unicode character semantics are still useful even when the data is not going directly to a human.) > You consider UCA > sort order an "advanced" feature? I said "slightly more advanced" compared with "basic". "Advanced" can be taken in either a positive way ("more useful") or a negative way ("complex"). I'm sorry for the misunderstanding, but my point was this: * The builtin provider is for people who are fine with code point order and no tailoring, but want Unicode character semantics, collation stability, and performance. * ICU is the right solution for anyone who wants human-friendly collation or tailoring, and is willing to put up with some collation stability risk and lower collation performance. Both have their place and the user is free to mix and match as needed, thanks to the COLLATE clause for columns and queries. Regards, Jeff Davis
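To illustrate the mix-and-match point with pieces that exist today (the exact name and syntax of the proposed builtin provider are still being discussed, and the ICU collations below are only present in builds configured with ICU), a minimal sketch:

CREATE TABLE books (
    sku   text COLLATE ucs_basic,    -- stable code-point order, no tailoring
    title text COLLATE "en-x-icu"    -- human-friendly, tailored ordering
);

-- Per-query tailoring is independent of the column default:
SELECT title FROM books ORDER BY title COLLATE "und-x-icu";

The proposed builtin provider would slot into the same COLLATE mechanism for the code-point-order case, while ICU remains the choice wherever linguistic ordering or tailoring is wanted.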