Re: Retry in pgbench
On Tue, Apr 13, 2021 at 5:51 PM Tatsuo Ishii wrote: > Currently standard pgbench scenario produces transaction serialize > errors "could not serialize access due to concurrent update" if > PostgreSQL runs in REPEATABLE READ or SERIALIZABLE level, and the > session aborts. In order to achieve meaningful results even in these > transaction isolation levels, I would like to propose an automatic > retry feature if "could not serialize access due to concurrent update" > error occurs. > > Probably just adding a switch to retry is not enough, maybe retry > method (random interval etc.) and max retry number are needed to be > added. > > I would like to hear your thoughts, See also: https://www.postgresql.org/message-id/flat/72a0d590d6ba06f242d75c2e641820ec%40postgrespro.ru
Re: Retry in pgbench
> On Tue, Apr 13, 2021 at 5:51 PM Tatsuo Ishii wrote: >> Currently standard pgbench scenario produces transaction serialize >> errors "could not serialize access due to concurrent update" if >> PostgreSQL runs in REPEATABLE READ or SERIALIZABLE level, and the >> session aborts. In order to achieve meaningful results even in these >> transaction isolation levels, I would like to propose an automatic >> retry feature if "could not serialize access due to concurrent update" >> error occurs. >> >> Probably just adding a switch to retry is not enough, maybe retry >> method (random interval etc.) and max retry number are needed to be >> added. >> >> I would like to hear your thoughts, > > See also: > > https://www.postgresql.org/message-id/flat/72a0d590d6ba06f242d75c2e641820ec%40postgrespro.ru Thanks for the pointer. It seems we need to resume the discussion. Best regards, -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese:http://www.sraoss.co.jp
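For readers unfamiliar with the failure mode being discussed, it can be reproduced by hand with two psql sessions against the standard pgbench_branches table created by "pgbench -i" (the bid value here is only illustrative). A retrying client would trap SQLSTATE 40001, roll back, and re-run the whole transaction, which is what the proposed option would automate inside pgbench:

-- Session 1
BEGIN ISOLATION LEVEL REPEATABLE READ;
UPDATE pgbench_branches SET bbalance = bbalance + 10 WHERE bid = 1;

-- Session 2 (blocks on the row lock held by session 1)
BEGIN ISOLATION LEVEL REPEATABLE READ;
UPDATE pgbench_branches SET bbalance = bbalance + 20 WHERE bid = 1;

-- Session 1
COMMIT;

-- Session 2 now fails with SQLSTATE 40001:
--   ERROR:  could not serialize access due to concurrent update
-- and must roll back and retry the whole transaction.
ROLLBACK;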
Re: TRUNCATE on foreign table
On 2021/04/13 14:22, Kohei KaiGai wrote: Let me remind the discussion at the design level. If postgres_fdw (and other FDW drivers) needs to consider whether ONLY-clause is given on the foreign tables of them, what does a foreign table represent in PostgreSQL system? My assumption is, a foreign table provides a view to external data, as if it performs like a table. TRUNCATE command eliminates all the segment files, even if a table contains multiple underlying files, never eliminate them partially. If a foreign table is equivalent to a table in SQL operation level, indeed, ONLY-clause controls which tables are picked up by the TRUNCATE command, but never controls which portion of the data shall be eliminated. So, I conclude that ExecForeignTruncate() shall eliminate the entire external data on behalf of a foreign table, regardless of ONLY-clause. I think it is more significant to clarify prior to the implementation details. How about your opinions? I'm still thinking that it's better to pass all information including ONLY clause about TRUNCATE command to FDW and leave FDW to determine how to use them. How postgres_fdw should use the information about ONLY is debatable. But for now IMO users who explicitly specify ONLY clause for foreign tables understand the structure of remote tables and want to use ONLY in TRUNCATE command issued by postgres_fdw. But my opinion might be a minority, so I'd like to hear more opinions about this from other developers. Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
Re: pgsql: Move tablespace path re-creation from the makefiles to pg_regres
On Mon, Apr 12, 2021 at 10:36:10PM -0700, Noah Misch wrote: > Christoph Berg's first message on this thread reported doing that. If > supporting server_user!=pg_regress_user is unwarranted and Christoph Berg > should stop, then already-committed code suffices. Not sure that we have ever claimed that. It is unfortunate that this has broken a case that used to work, perhaps accidentally. At the same time, Christoph has a workaround for the Debian suite, so it does not seem like there is anything to do now, anyway. > The race that commit 6c788d9 fixed is not inherent to Makefile rules. For > example, you could have instead caused test.sh to issue 'make > installcheck-parallel TABLESPACEDIR="$outputdir"/testtablespace' and have the > makefiles consult $(TABLESPACEDIR) rather than hard-code ./testtablespace. > (That said, I like how 6c788d9 consolidated Windows and non-Windows paths.) FWIW, I don't really want to split this code path across platforms again. Better to have one to rule them all. > I don't know. I never considered server_user!=pg_regress_user before this > thread, and I don't plan to use it myself. Your latest patch originated to > make that case work, and my last message was reporting that the patch works > for a rather-narrow interpretation of server_user!=pg_regress_user, failing on > variations thereof. That might be fine. Okay. So, as I am not sure there is anything that needs to be acted on here for the moment, I would just close the case. If there are more voices in favor of the SQL function using mkdir(), I would not object to using that, as it looks to work for all the cases where pg_regress is used. -- Michael
Re: [PATCH] force_parallel_mode and GUC categories
On Mon, Apr 12, 2021 at 01:40:52AM -0400, Tom Lane wrote: > Michael Paquier writes: >> However, I'd like to think that we can do better than what's proposed >> in the patch. There are a couple of things to consider here: >> - Should the parameter be renamed to reflect that it should only be >> used for testing purposes? > > -1 to that part, because it would break a bunch of buildfarm animals' > configurations. I doubt that any gain in clarity would be worth it. Okay. >> - Should we make more general the description of the developer options >> in the docs? > > Perhaps ... what did you have in mind? The first sentence of the page now says that: "The following parameters are intended for work on the PostgreSQL source code, and in some cases to assist with recovery of severely damaged databases." That does not fit force_parallel_mode well, IMO. Maybe: "The following parameters are intended for development work related to PostgreSQL. Some of them are intended for work on the PostgreSQL source code, some of them can be used to control the run-time behavior of the server, and in some cases they can be used to assist with the recovery of severely damaged databases." -- Michael
Re: Simplify backend terminate and wait logic in postgres_fdw test
On Mon, Apr 12, 2021 at 11:29:28AM +0530, Bharath Rupireddy wrote: > I changed to 5min. If at all there's any server that would take more > than 5min to remove a process from the system processes list, then it > would see a warning on timeout. Looks fine to me. Let's wait a bit first to see if Fujii-san has any objections to this cleanup as that's his code originally, from 32a9c0bd. -- Michael
Re: vacuum freeze - possible improvements
Thanks Masahiko for the response. "What is the use case where users want to freeze fewer transactions, meaning invoking anti-wraparound frequently?" My overall focus here is anti-wraparound vacuum on huge tables in emergency situations (where we have come very close to 2B transactions or are already in an outage window). In this situation we want to recover ASAP instead of having many hours of outage. The purpose of increasing "vacuum_freeze_min_age" to a high value is that the anti-wraparound vacuum will have to do less work because we are asking it to freeze fewer transactions/tuples (of course a subsequent vacuum has to do the remaining work). "So the vacuum freeze will still have to process tuples that are inserted/modified during consuming 1 billion transactions. It seems to me that it’s not fewer transactions." Yes, another thing here is that anti-wraparound vacuum also cleans dead tuples, but I am not sure what we can do to avoid that. Could there be a vacuum that only freezes the tuples? Thanks for sharing the PG14 improvements, those are nice to have. But still the anti-wraparound vacuum will have to scan all the pages (from the visibility map) even if we are freezing fewer transactions, because currently there is no way to know which block/tuple contains which transaction id. If there were a way, then it would be easier to directly freeze those tuples quickly and advance the relfrozenxid for the table. On Tue, Apr 13, 2021 at 7:52 AM Masahiko Sawada wrote: > On Mon, Apr 12, 2021 at 5:38 PM Virender Singla > wrote: > > > > Hi Postgres Community, > > > > Regarding anti wraparound vacuums (to freeze tuples), I see it has to > scan all the pages which are not frozen-all (looking at visibility map). > That means even if we want to freeze less transactions only (For ex - by > increasing parameter vacuum_freeze_min_age to 1B), still it will scan all > the pages in the visibility map and a time taking process. > > If vacuum_freeze_min_age is 1 billion, autovacuum_freeze_max_age is 2 > billion (vacuum_freeze_min_age is limited to the half of > autovacuum_freeze_max_age). So vacuum freeze will still have to > process tuples that are inserted/modified during consuming 1 billion > transactions. It seems to me that it’s not fewer transactions. What is > the use case where users want to freeze fewer transactions, meaning > invoking anti-wraparound frequently? > > > > > Can there be any improvement on this process so VACUUM knows the > tuple/pages of those transactions which need to freeze up. > > > > Benefit of such an improvement is that if we are reaching transaction id > close to 2B (and downtime), that time we can quickly recover the database > with vacuuming freeze only a few millions rows with quick lookup rather > than going all the pages from visibility map. > > Apart from this idea, in terms of speeding up vacuum, > vacuum_failsafe_age parameter, introduced to PG14[1], would also be > helpful. When the failsafe is triggered, cost-based delay is no longer > be applied, and index vacuuming is bypassed in order to finish vacuum > work and advance relfrozenxid as quickly as possible. > > Regards > > [1] > https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=1e55e7d1755cefbb44982fbacc7da461fa8684e6 > > -- > Masahiko Sawada > EDB: https://www.enterprisedb.com/ >
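As a concrete illustration of the emergency scenario above, a monitoring query along these lines (a standard pattern, not part of any proposal here) shows which tables an anti-wraparound vacuum would have to process first and how much data that involves:

SELECT c.oid::regclass AS relation,
       age(c.relfrozenxid) AS xid_age,
       pg_size_pretty(pg_table_size(c.oid)) AS table_size
FROM pg_class c
WHERE c.relkind IN ('r', 'm', 't')
ORDER BY age(c.relfrozenxid) DESC
LIMIT 10;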
Unresolved replication hang and stop problem.
Hello, After upgrading the cluster from 10.x to 13.1 we've started getting a problem described on pgsql-general: https://www.postgresql.org/message-id/8bf8785c-f47d-245c-b6af-80dc1eed40db%40unitygroup.com We've noticed a similar issue being described on this list in https://www.postgresql-archive.org/Logical-replication-CPU-bound-with-TRUNCATE-DROP-CREATE-many-tables-tt6155123.html with a fix being rolled out in 13.2. After the 13.2 release, we upgraded to it and unfortunately this did not solve the issue - the replication still stalls just as described in the original issue. Please advise how to debug and solve this issue.
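A reasonable first debugging step in this situation is to compare positions on both sides with the standard statistics views; if restart_lsn/confirmed_flush_lsn stop advancing while the walsender stays busy, the stall is on the decoding side rather than in apply:

-- On the publisher: how far each walsender has gotten.
SELECT pid, application_name, state, sent_lsn, write_lsn, flush_lsn, replay_lsn
FROM pg_stat_replication;

-- On the publisher: whether the slot's decoding position is advancing.
SELECT slot_name, active, restart_lsn, confirmed_flush_lsn
FROM pg_replication_slots;

-- On the subscriber: the apply worker's progress.
SELECT subname, received_lsn, latest_end_lsn, last_msg_receipt_time
FROM pg_stat_subscription;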
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
On Mon, Apr 12, 2021 at 10:12:46PM -0400, Bruce Momjian wrote: > On Thu, Apr 8, 2021 at 01:01:42PM -0400, Bruce Momjian wrote: > > On Thu, Apr 8, 2021 at 12:51:06PM -0400, Álvaro Herrera wrote: > > > On 2021-Apr-08, Bruce Momjian wrote: > > > > > > > pg_stat_activity.queryid is new, but I can imagine cases where you would > > > > join pg_stat_activity to pg_stat_statements to get an estimate of how > > > > long the query will take --- having one using an underscore and another > > > > one not seems odd. > > > > > > OK. So far, you have one vote for queryid (your own) and two votes for > > > query_id (mine and Julien's). And even yourself were hesitating about > > > it earlier in the thread. > > > > OK, if people are fine with pg_stat_activity.query_id not matching > > pg_stat_statements.queryid, I am fine with that. I just don't want > > someone to say it was a big mistake later. ;-) > > OK, the attached patch renames pg_stat_activity.queryid to 'query_id'. I > have not changed any of the APIs which existed before this feature was > added, and are called "queryid" or "queryId" --- it is kind of a mess. > I assume I should leave those unchanged. It will also need a catversion > bump. - uint64 st_queryid; + uint64 st_query_id; I thought we would internally keep queryid/queryId, at least for the variable names as this is the name of the saved field in PlannedStmt. -extern void pgstat_report_queryid(uint64 queryId, bool force); +extern void pgstat_report_query_id(uint64 queryId, bool force); But if we don't then it should be "uint64 query_id".
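The join Bruce alludes to would look roughly like this once the rename is in, assuming pg_stat_activity.query_id on one side and the existing pg_stat_statements.queryid on the other:

SELECT a.pid, a.query_id, s.calls, s.mean_exec_time
FROM pg_stat_activity a
JOIN pg_stat_statements s ON a.query_id = s.queryid
WHERE a.state = 'active';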
Re: Replication slot stats misgivings
On Mon, Apr 12, 2021 at 7:03 PM Masahiko Sawada wrote: > > On Mon, Apr 12, 2021 at 9:36 PM Amit Kapila wrote: > > > > On Mon, Apr 12, 2021 at 5:29 PM Masahiko Sawada > > wrote: > > > > > > On Mon, Apr 12, 2021 at 8:08 PM Amit Kapila > > > wrote: > > > > > > > > On Mon, Apr 12, 2021 at 4:34 PM Masahiko Sawada > > > > wrote: > > > > > > > > > > On Mon, Apr 12, 2021 at 6:19 PM Amit Kapila > > > > > wrote: > > > > > > > > > > > > On Mon, Apr 12, 2021 at 10:27 AM Masahiko Sawada > > > > > > wrote: > > > > > > > > > > > > > > On Sat, Apr 10, 2021 at 9:53 PM Amit Kapila > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > > > > It seems Vignesh has changed patches based on the latest set of > > > > > > > > comments so you might want to rebase. > > > > > > > > > > > > > > I've merged my patch into the v6 patch set Vignesh submitted. > > > > > > > > > > > > > > I've attached the updated version of the patches. I didn't change > > > > > > > anything in the patch that changes char[NAMEDATALEN] to NameData > > > > > > > (0001 > > > > > > > patch) and patches that add tests. > > > > > > > > > > > > > > > > > > > I think we can push 0001. What do you think? > > > > > > > > > > +1 > > > > > > > > > > > > > > > > > > In 0003 patch I reordered the > > > > > > > output parameters of pg_stat_replication_slots; showing total > > > > > > > number > > > > > > > of transactions and total bytes followed by statistics for > > > > > > > spilled and > > > > > > > streamed transactions seems appropriate to me. > > > > > > > > > > > > > > > > > > > I am not sure about this because I think we might want to add some > > > > > > info of stream/spill bytes in total_bytes description (something > > > > > > like > > > > > > stream/spill bytes are not in addition to total_bytes). > > > > > > BTW doesn't it confuse users that stream/spill bytes are not in > > > addition to total_bytes? User will need to do "total_bytes + > > > spill/stream_bytes" to know the actual total amount of data sent to > > > the decoding output plugin, is that right? > > > > > > > No, total_bytes includes the spill/stream bytes. So, the user doesn't > > need to do any calculation to compute totel_bytes sent to output > > plugin. > > The following test for the latest v8 patch seems to show different. > total_bytes is 1808 whereas spill_bytes is 1320. Am I missing > something? 
> > postgres(1:85969)=# select pg_create_logical_replication_slot('s', > 'test_decoding'); > pg_create_logical_replication_slot > > (s,0/1884468) > (1 row) > > postgres(1:85969)=# create table a (i int); > CREATE TABLE > postgres(1:85969)=# insert into a select generate_series(1, 10); > INSERT 0 10 > postgres(1:85969)=# set logical_decoding_work_mem to 64; > SET > postgres(1:85969)=# select * from pg_stat_replication_slots ; > slot_name | total_txns | total_bytes | spill_txns | spill_count | > spill_bytes | stream_txns | stream_count | stream_bytes | stats_reset > ---++-++-+-+-+--+--+- > s | 0 | 0 | 0 | 0 | > 0 | 0 |0 |0 | > (1 row) > > postgres(1:85969)=# select count(*) from > pg_logical_slot_peek_changes('s', NULL, NULL); > count > > 14 > (1 row) > > postgres(1:85969)=# select * from pg_stat_replication_slots ; > slot_name | total_txns | total_bytes | spill_txns | spill_count | > spill_bytes | stream_txns | stream_count | stream_bytes | stats_reset > ---++-++-+-+-+--+--+- > s | 2 |1808 | 1 | 202 | > 1320 | 0 |0 |0 | > (1 row) > Thanks for identifying this issue, while spilling the transactions reorder buffer changes gets released, we will not be able to get the total size for spilled transactions from reorderbuffer size. I have fixed it by including spilledbytes to totalbytes in case of spilled transactions. Attached patch has the fix for this. Thoughts? Regards, Vignesh From f22ecb366a461a5203ec07dde1450f2c269ee675 Mon Sep 17 00:00:00 2001 From: vignesh Date: Sat, 10 Apr 2021 08:14:52 +0530 Subject: [PATCH v9 1/5] Changed char datatype to NameData datatype for slotname. Changed char datatype to NameData datatype for slotname. --- src/backend/postmaster/pgstat.c | 32 +++ src/backend/replication/logical/logical.c | 13 ++--- src/backend/replication/slot.c| 7 - src/backend/utils/adt/pgstatfuncs.c | 2 +- src/include/pgstat.h | 11 +++- 5 files changed, 36 insertions(+), 29 deletions(-) diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c index f4467625f7..666ce95d08 100644 --- a/src/backend/pos
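A quick sanity check of the invariant discussed here, sketched against the view as it appears in this patch set, is to look for slots whose totals fail to cover the spilled and streamed bytes; with the fix applied this should return no rows:

SELECT slot_name, total_txns, total_bytes, spill_bytes, stream_bytes
FROM pg_stat_replication_slots
WHERE total_bytes < spill_bytes + stream_bytes;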
Re: TRUNCATE on foreign table
At Tue, 13 Apr 2021 16:17:12 +0900, Fujii Masao wrote in > > > On 2021/04/13 14:22, Kohei KaiGai wrote: > > Let me remind the discussion at the design level. > > If postgres_fdw (and other FDW drivers) needs to consider whether > > ONLY-clause is given > > on the foreign tables of them, what does a foreign table represent in > > PostgreSQL system? > > My assumption is, a foreign table provides a view to external data, as > > if it performs like a table. > > TRUNCATE command eliminates all the segment files, even if a table > > contains multiple > > underlying files, never eliminate them partially. > > If a foreign table is equivalent to a table in SQL operation level, > > indeed, ONLY-clause controls > > which tables are picked up by the TRUNCATE command, but never controls > > which portion of > > the data shall be eliminated. So, I conclude that > > ExecForeignTruncate() shall eliminate the entire > > external data on behalf of a foreign table, regardless of ONLY-clause. > > I think it is more significant to clarify prior to the implementation > > details. > > How about your opinions? > > I'm still thinking that it's better to pass all information including > ONLY clause about TRUNCATE command to FDW and leave FDW to determine > how to use them. How postgres_fdw should use the information about > ONLY > is debatable. But for now IMO users who explicitly specify ONLY > clause for > foreign tables understand the structure of remote tables and want to > use ONLY > in TRUNCATE command issued by postgres_fdw. But my opinion might be > a minority, > so I'd like to hear more opinions about this from other developers. From the syntactical point of view, my opinion on this is that the "ONLY" in "TRUNCATE ONLY" is consumed to tell the command to disregard local children, so it cannot be propagated further, whether or not the target relation has children. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: TRUNCATE on foreign table
On Tue, Apr 13, 2021 at 16:17 Fujii Masao wrote: > > On 2021/04/13 14:22, Kohei KaiGai wrote: > > Let me remind the discussion at the design level. > > > > If postgres_fdw (and other FDW drivers) needs to consider whether > > ONLY-clause is given > > on the foreign tables of them, what does a foreign table represent in > > PostgreSQL system? > > > > My assumption is, a foreign table provides a view to external data, as > > if it performs like a table. > > TRUNCATE command eliminates all the segment files, even if a table > > contains multiple > > underlying files, never eliminate them partially. > > If a foreign table is equivalent to a table in SQL operation level, > > indeed, ONLY-clause controls > > which tables are picked up by the TRUNCATE command, but never controls > > which portion of > > the data shall be eliminated. So, I conclude that > > ExecForeignTruncate() shall eliminate the entire > > external data on behalf of a foreign table, regardless of ONLY-clause. > > > > I think it is more significant to clarify prior to the implementation > > details. > > How about your opinions? > > I'm still thinking that it's better to pass all information including > ONLY clause about TRUNCATE command to FDW and leave FDW to determine > how to use them. How postgres_fdw should use the information about ONLY > is debatable. But for now IMO users who explicitly specify ONLY clause > for > foreign tables understand the structure of remote tables and want to use ONLY > in TRUNCATE command issued by postgres_fdw. But my opinion might be a minority, > so I'd like to hear more opinions about this from other developers. > Here are two points to discuss. Regarding the FDW APIs, yes, nobody can deny that someone may want to implement their own FDW module that adds special handling when its foreign table is specified with the ONLY clause, even if we usually ignore it. On the other hand, when we consider a foreign table to be an abstraction of an external data source, the current postgres_fdw behavior is, at least, not consistent. When a postgres_fdw foreign table that maps a remote parent table has a local child table, this command shows all the rows from both the local and remote tables.

postgres=# select * from f_table ;
 id |              v
----+------------------------------
  1 | remote table t_parent id=1
  2 | remote table t_parent id=2
  3 | remote table t_parent id=3
 10 | remote table t_child1 id=10
 11 | remote table t_child1 id=11
 12 | remote table t_child1 id=12
 20 | remote table t_child2 id=20
 21 | remote table t_child2 id=21
 22 | remote table t_child2 id=22
 50 | it is l_child id=50
 51 | it is l_child id=51
 52 | it is l_child id=52
 53 | it is l_child id=53
(13 rows)

If f_table is specified with "ONLY", it picks up only the parent table (f_table); however, the ONLY clause is not pushed down to the remote side.

postgres=# select * from only f_table ;
 id |              v
----+------------------------------
  1 | remote table t_parent id=1
  2 | remote table t_parent id=2
  3 | remote table t_parent id=3
 10 | remote table t_child1 id=10
 11 | remote table t_child1 id=11
 12 | remote table t_child1 id=12
 20 | remote table t_child2 id=20
 21 | remote table t_child2 id=21
 22 | remote table t_child2 id=22
(9 rows)

On the other hand, TRUNCATE ONLY f_table works as follows...
postgres=# truncate only f_table;
TRUNCATE TABLE
postgres=# select * from f_table ;
 id |              v
----+------------------------------
 10 | remote table t_child1 id=10
 11 | remote table t_child1 id=11
 12 | remote table t_child1 id=12
 20 | remote table t_child2 id=20
 21 | remote table t_child2 id=21
 22 | remote table t_child2 id=22
 50 | it is l_child id=50
 51 | it is l_child id=51
 52 | it is l_child id=52
 53 | it is l_child id=53
(10 rows)

It eliminates the rows only from the remote parent table, even though that table is just one part of the data the foreign table represents. My expectation is that the above command would show only the rows from the local child table (id=50...53). Best regards, -- HeteroDB, Inc / The PG-Strom Project KaiGai Kohei
Performance Evaluation of Result Cache by using TPC-DS
Hello, Recently, the result cache feature was committed to PostgreSQL. I tested its performance by executing TPC-DS. As a result, I found that there were some regressions in the query performance. I used the TPC-DS scale factor 100 in the evaluation. I executed all of the 99 queries in the TPC-DS, and the result cache worked in the 21 queries of them. However, some queries took too much time, so I skipped their execution. I set work_mem to 256MB, and max_parallel_workers_per_gather to 0. Evaluation results are as follows. The negative speedup ratio indicates that the execution time increased by the result cache.

 Query No | Execution time with result cache | Execution time without result cache | Speedup ratio
        7 |                            142.1 |                                142.2 |         0.03%
        8 |                            144.0 |                                142.8 |        -0.82%
       13 |                            164.6 |                                162.0 |        -1.65%
       27 |                            138.9 |                                138.7 |        -0.16%
       34 |                            135.7 |                                137.1 |         1.02%
       43 |                            209.5 |                                207.2 |        -1.10%
       48 |                            181.5 |                                170.7 |        -6.32%
       55 |                            130.6 |                                123.8 |        -5.48%
       61 |                            0.014 |                                0.037 |        62.06%
       62 |                             66.7 |                                 59.9 |       -11.36%
       68 |                            131.3 |                                127.2 |        -3.17%
       72 |                            567.0 |                                563.4 |        -0.65%
       73 |                            130.1 |                                129.8 |        -0.29%
       88 |                           1044.5 |                               1048.7 |         0.40%
       91 |                              1.2 |                                  1.1 |        -7.93%
       96 |                            132.2 |                                131.7 |        -0.37%

As you can see from these results, many queries have a negative speedup ratio, which means that there are negative impacts on the query performance. In query 62, the execution time increased by 11.36%. I guess these regressions are due to the misestimation of the cost in the planner. I attached the execution plan of query 62. The result cache is currently enabled by default. However, if this kind of performance regression is common, we have to change its default behavior. Best regards, Yuya Watari

                                                          QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=2816266.54..2816267.79 rows=100 width=110) (actual time=59613.454..59613.473 rows=100 loops=1)
   ->  Sort  (cost=2816266.54..2816267.38 rows=336 width=110) (actual time=59613.453..59613.464 rows=100 loops=1)
         Sort Key: (substr((warehouse.w_warehouse_name)::text, 1, 20)), ship_mode.sm_type, web_site.web_name
         Sort Method: top-N heapsort  Memory: 49kB
         ->  HashAggregate  (cost=2816248.24..2816252.44 rows=336 width=110) (actual time=59612.622..59612.730 rows=360 loops=1)
               Group Key: substr((warehouse.w_warehouse_name)::text, 1, 20), ship_mode.sm_type, web_site.web_name
               Batches: 1  Memory Usage: 157kB
               ->  Hash Join  (cost=2510.74..2794150.54 rows=368295 width=78) (actual time=7.597..45818.495 rows=14336926 loops=1)
                     Hash Cond: (web_sales.ws_ship_mode_sk = ship_mode.sm_ship_mode_sk)
                     ->  Hash Join  (cost=2509.29..2792056.55 rows=368356 width=36) (actual time=7.571..38820.092 rows=14337390 loops=1)
                           Hash Cond: (web_sales.ws_web_site_sk = web_site.web_site_sk)
                           ->  Hash Join  (cost=2506.75..2790916.14 rows=368430 width=33) (actual time=7.554..35314.217 rows=14338265 loops=1)
                                 Hash Cond: (web_sales.ws_warehouse_sk = warehouse.w_warehouse_sk)
                                 ->  Hash Join  (cost=2505.41..2789674.19 rows=368516 width=20) (actual time=7.544..31214.782 rows=14340028 loops=1)
                                       Hash Cond: (web_sales.ws_ship_date_sk = date_dim.d_date_sk)
                                       ->  Seq Scan on web_sales  (cost=0.00..2598153.08 rows=72001808 width=20) (actual time=0.026..17405.391 rows=72001237 loops=1)
                                       ->  Hash  (cost=2500.73..2500.73 rows=374 width=4) (actual time=7.505..7.506 rows=365 loops=1)
                                             Buckets: 1024  Batches: 1  Memory Usage: 21kB
                                             ->  Seq Scan on date_dim  (cost=0.00..2500.73 rows=374 width=4) (actual time=3.599..7.462 rows=365 loops=1)
                                                   Filter: ((d_month_seq >= 1212) AND (d_month_seq <= 1223))
                                                   Rows Removed by Filter: 72684
                                 ->  Hash  (cost=1.15..1.15 rows=15 width=21) (actual time=0.007..0.007 rows=15 loops=1)
                                       Buckets: 1024  Batches: 1  Memory Usage: 9kB
                                       ->  Seq Scan on warehouse  (cost=0.00..1.15 rows=15 width=21) (actual time=0.003..0.004 rows=15 loops=1)
                           ->  Hash  (cost=2.24..2.24 rows=24 width=11) (actual time=0.01
Re: Performance Evaluation of Result Cache by using TPC-DS
On Tue, 13 Apr 2021 at 21:29, Yuya Watari wrote: > I used the TPC-DS scale factor 100 in the evaluation. I executed all > of the 99 queries in the TPC-DS, and the result cache worked in the 21 > queries of them. However, some queries took too much time, so I > skipped their execution. I set work_mem to 256MB, and > max_parallel_workers_per_gather to 0. Many thanks for testing this. > As you can see from these results, many queries have a negative > speedup ratio, which means that there are negative impacts on the > query performance. In query 62, the execution time increased by > 11.36%. I guess these regressions are due to the misestimation of the > cost in the planner. I attached the execution plan of query 62. Can you share if these times were to run EXPLAIN ANALYZE or if they were just the queries being executed normally? The times in the two files you attached do look very similar to the times in your table, so I suspect either TIMING ON is not that high an overhead on your machine, or the results are that of EXPLAIN ANALYZE. It would be really great if you could show the EXPLAIN (ANALYZE, TIMING OFF) for query 62. There's a chance that the slowdown comes from the additional EXPLAIN ANALYZE timing overhead with the Result Cache version. > The result cache is currently enabled by default. However, if this > kind of performance regression is common, we have to change its > default behavior. Yes, the feedback we get during the beta period will help drive that decision or if the costing needs to be adjusted. David
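To reproduce the kind of comparison being requested, a self-contained sketch is below; it assumes the GUC added with the feature is named enable_resultcache, and whether the planner actually chooses a Result Cache node depends on the data and costing:

CREATE TEMP TABLE lookup_tbl AS
  SELECT g AS id, repeat('x', 100) AS padding FROM generate_series(1, 1000000) g;
CREATE INDEX ON lookup_tbl (id);
CREATE TEMP TABLE probe_tbl AS
  SELECT (g % 100) + 1 AS id FROM generate_series(1, 100000) g;
ANALYZE lookup_tbl, probe_tbl;

SET enable_resultcache = on;
EXPLAIN (ANALYZE, TIMING OFF)
  SELECT count(l.padding) FROM probe_tbl p JOIN lookup_tbl l ON l.id = p.id;

SET enable_resultcache = off;
EXPLAIN (ANALYZE, TIMING OFF)
  SELECT count(l.padding) FROM probe_tbl p JOIN lookup_tbl l ON l.id = p.id;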
Re: Old Postgresql version on i7-1165g7
Yura Sokolov писал 2021-04-09 16:28: Good day, hackers. I've got HP ProBook 640g8 with i7-1165g7. I've installed Ubuntu 20.04 LTS on it and started to play with PostgreSQL sources. Occasinally I found I'm not able to `make check` old Postgresql versions. At least 9.6 and 10. They are failed at the initdb stage in the call to postgresql. Raw postgresql version 9.6.8 and 10.0 fails in boostrap stage: running bootstrap script ... 2021-04-09 12:33:26.424 MSK [161121] FATAL: could not find tuple for opclass 1 2021-04-09 12:33:26.424 MSK [161121] PANIC: cannot abort transaction 1, it was already committed Aborted (core dumped) child process exited with exit code 134 Our modified custom version 9.6 fails inside of libc __strncmp_avx2 during post-bootstrap with segmentation fault: Program terminated with signal SIGSEGV, Segmentation fault. #0 __strncmp_avx2 () #1 0x557168a7eeda in nameeq #2 0x557168b4c4a0 in FunctionCall2Coll #3 0x557168659555 in heapgettup_pagemode #4 0x55716865a617 in heap_getnext #5 0x557168678cf1 in systable_getnext #6 0x557168b5651c in GetDatabaseTuple #7 0x557168b574a4 in InitPostgres #8 0x5571689dcb7d in PostgresMain #9 0x5571688844d5 in main I've bisected between REL_11_0 and "Rename pg_rewind's copy_file_range()" and found 372728b0d49552641f0ea83d9d2e08817de038fa Replace our traditional initial-catalog-data format with a better design. https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=372728b0d49552641f0ea83d9d2e08817de038fa This is first commit where `make check` doesn't fail during initdb on my machine. Therefore 02f3e558f21c0fbec9f94d5de9ad34f321eb0e57 is the last one where `make check` fails. I've tried with gcc9, gcc10 and clang10. I've configured either without parameters or with `CFLAGS=-O0 ./configure --enable-debug`. Thing doesn't happen on Intel CPU of 10th series (i7-10510U and i9-10900K). Unfortunately, I have no fellows or colleagues with Intel CPU 11 series, therefore I couldn't tell if this bug of 11 series or bug of concrete CPU installed in the notebook. It will be great if some with i7-11* could try to make check and report if it also fails or not. BTW, problem remains in Debian stable (10.4) inside docker on same machine. With regards, Yura Sokolov PostgresPro
Re: Core dump happens when executing SQL CREATE VIEW v1(c1) AS (SELECT ('4' COLLATE "C")::INT FROM generate_series(1, 10));
After reading the code and the patch, I think the patch is good. If the type is non-collatable, we do not add a CollateExpr node as a 'parent' node to the coerced node. From: Tom Lane Sent: April 13, 2021 0:59 To: Yulin PEI Cc: pgsql-hackers@lists.postgresql.org Subject: Re: Core dump happens when executing SQL CREATE VIEW v1(c1) AS (SELECT ('4' COLLATE "C")::INT FROM generate_series(1, 10)); Yulin PEI writes: > I found it could cause a crash when executing sql statement: `CREATE VIEW > v1(c1) AS (SELECT ('4' COLLATE "C")::INT FROM generate_series(1, 10)); ` in > postgres 13.2 release. Nice catch. I don't think the code in DefineVirtualRelation is wrong: exprCollation shouldn't report any collation for an expression of a non-collatable type. Rather the problem is with an old kluge in coerce_type(), which will push a type coercion underneath a CollateExpr ... without any mind for the possibility that the coercion result isn't collatable. So the right fix is more or less the attached. regards, tom lane
Re: Core dump happens when executing SQL CREATE VIEW v1(c1) AS (SELECT ('4' COLLATE "C")::INT FROM generate_series(1, 10));
I think it is better to add this test case to the regression tests. From: Tom Lane Sent: April 13, 2021 0:59 To: Yulin PEI Cc: pgsql-hackers@lists.postgresql.org Subject: Re: Core dump happens when executing SQL CREATE VIEW v1(c1) AS (SELECT ('4' COLLATE "C")::INT FROM generate_series(1, 10)); Yulin PEI writes: > I found it could cause a crash when executing sql statement: `CREATE VIEW > v1(c1) AS (SELECT ('4' COLLATE "C")::INT FROM generate_series(1, 10)); ` in > postgres 13.2 release. Nice catch. I don't think the code in DefineVirtualRelation is wrong: exprCollation shouldn't report any collation for an expression of a non-collatable type. Rather the problem is with an old kluge in coerce_type(), which will push a type coercion underneath a CollateExpr ... without any mind for the possibility that the coercion result isn't collatable. So the right fix is more or less the attached. regards, tom lane 0001-add-regress.patch Description: 0001-add-regress.patch
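A regression-style rendering of the case might look like the following; assuming the fix simply stops attaching the inapplicable collation to the non-collatable integer result, these statements should now succeed instead of crashing the backend:

CREATE VIEW collate_cast_v(c1) AS
  (SELECT ('4' COLLATE "C")::INT FROM generate_series(1, 10));
SELECT * FROM collate_cast_v LIMIT 1;
DROP VIEW collate_cast_v;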
Re: vacuum freeze - possible improvements
On Tue, 13 Apr 2021 at 19:48, Virender Singla wrote: > Yes another thing here is anti wraparound vacuum also cleans dead tuples but > i am not sure what we can do to avoid that. > There can be vacuum to only freeze the tulpes? You might want to have a look at [1], which was just pushed for PG14. In particular: > When the failsafe triggers, VACUUM takes extraordinary measures to > finish as quickly as possible so that relfrozenxid and/or relminmxid can > be advanced. VACUUM will stop applying any cost-based delay that may be > in effect. VACUUM will also bypass any further index vacuuming and heap > vacuuming -- it only completes whatever remaining pruning and freezing > is required. Bypassing index/heap vacuuming is enabled by commit > 8523492d, which made it possible to dynamically trigger the mechanism > already used within VACUUM when it is run with INDEX_CLEANUP off. David [1] https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=1e55e7d1755cefbb44982fbacc7da461fa8684e6
Re: Old Postgresql version on i7-1165g7
On Fri, Apr 09, 2021 at 04:28:25PM +0300, Yura Sokolov wrote: > Good day, hackers. > > I've got HP ProBook 640g8 with i7-1165g7. I've installed Ubuntu 20.04 LTS on > it > and started to play with PostgreSQL sources. > > Occasinally I found I'm not able to `make check` old Postgresql versions. Do you mean that HEAD works consistently, but v9.6 and v10 sometimes work but sometimes fail? > #5 0x557168678cf1 in systable_getnext > #6 0x557168b5651c in GetDatabaseTuple > #7 0x557168b574a4 in InitPostgres > #8 0x5571689dcb7d in PostgresMain > #9 0x5571688844d5 in main > > I've bisected between REL_11_0 and "Rename pg_rewind's copy_file_range()" > and > found 372728b0d49552641f0ea83d9d2e08817de038fa > > Replace our traditional initial-catalog-data format with a better > > design. > > https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=372728b0d49552641f0ea83d9d2e08817de038fa > > This is first commit where `make check` doesn't fail during initdb on my > machine. Therefore 02f3e558f21c0fbec9f94d5de9ad34f321eb0e57 is the last one > where > `make check` fails. This doesn't make much sense or help much, since 372728b doesn't actually change the catalogs, or any .c file. > I've tried with gcc9, gcc10 and clang10. > I've configured either without parameters or with `CFLAGS=-O0 ./configure > --enable-debug`. You used make clean too, right? I would also use --enable-cassert, since it might catch problems you'd otherwise miss. If that doesn't expose anything, maybe try to #define USE_VALGRIND in src/include/pg_config_manual.h, and run with valgrind --trace-children=yes -- Justin
Re: TRUNCATE on foreign table
On Tue, Apr 13, 2021 at 2:37 PM Kohei KaiGai wrote: > Here are two points to discuss. > > Regarding to the FDW-APIs, yes, nobody can deny someone want to implement > their own FDW module that adds special handling when its foreign table > is specified > with ONLY-clause, even if we usually ignore. > > > On the other hand, when we consider a foreign table is an abstraction > of an external > data source, at least, the current postgres_fdw's behavior is not consistent. > > When a foreign table by postgres_fdw that maps a remote parent table, > has a local > child table, > > This command shows all the rows from both of local and remote. > > postgres=# select * from f_table ; > id | v > +- > 1 | remote table t_parent id=1 > 2 | remote table t_parent id=2 > 3 | remote table t_parent id=3 > 10 | remote table t_child1 id=10 > 11 | remote table t_child1 id=11 > 12 | remote table t_child1 id=12 > 20 | remote table t_child2 id=20 > 21 | remote table t_child2 id=21 > 22 | remote table t_child2 id=22 > 50 | it is l_child id=50 > 51 | it is l_child id=51 > 52 | it is l_child id=52 > 53 | it is l_child id=53 > (13 rows) > > If f_table is specified with "ONLY", it picks up only the parent table > (f_table), > however, ONLY-clause is not push down to the remote side. > > postgres=# select * from only f_table ; > id | v > +- > 1 | remote table t_parent id=1 > 2 | remote table t_parent id=2 > 3 | remote table t_parent id=3 > 10 | remote table t_child1 id=10 > 11 | remote table t_child1 id=11 > 12 | remote table t_child1 id=12 > 20 | remote table t_child2 id=20 > 21 | remote table t_child2 id=21 > 22 | remote table t_child2 id=22 > (9 rows) > > On the other hands, TRUNCATE ONLY f_table works as follows... > > postgres=# truncate only f_table; > TRUNCATE TABLE > postgres=# select * from f_table ; > id | v > +- > 10 | remote table t_child1 id=10 > 11 | remote table t_child1 id=11 > 12 | remote table t_child1 id=12 > 20 | remote table t_child2 id=20 > 21 | remote table t_child2 id=21 > 22 | remote table t_child2 id=22 > 50 | it is l_child id=50 > 51 | it is l_child id=51 > 52 | it is l_child id=52 > 53 | it is l_child id=53 > (10 rows) > > It eliminates the rows only from the remote parent table although it > is a part of the foreign table. > > My expectation at the above command shows rows from the local child > table (id=50...53). Yeah, ONLY clause is not pushed to the remote server in case of SELECT commands. This is also true for DELETE and UPDATE commands on foreign tables. I'm not sure if it wasn't thought necessary or if there is an issue to push it or I may be missing something here. I think we can start a separate thread to see other hackers' opinions on this. I'm not sure whether all the clauses that are possible for SELECT/UPDATE/DELETE/INSERT with local tables are pushed to the remote server by postgres_fdw. Well, now foreign TRUNCATE pushes the ONLY clause to the remote server which is inconsistent when compared to SELECT/UPDATE/DELETE commands. If we were to keep it consistent across all foreign commands that ONLY clause is not pushed to remote server, then we can restrict for TRUNCATE too and even if "TRUNCATE ONLY foreign_tbl" is specified, just pass "TRUNCATE foreign_tbl" to remote server. Having said that, I don't see any real problem in pushing the ONLY clause, at least in case of TRUNCATE. With Regards, Bharath Rupireddy. EnterpriseDB: http://www.enterprisedb.com
Re: [PATCH] force_parallel_mode and GUC categories
On Tue, Apr 13, 2021 at 04:34:23PM +0900, Michael Paquier wrote: > On Mon, Apr 12, 2021 at 01:40:52AM -0400, Tom Lane wrote: > >> - Should we make more general the description of the developer options > >> in the docs? > > > > Perhaps ... what did you have in mind? > > The first sentence of the page now says that: > "The following parameters are intended for work on the PostgreSQL > source code, and in some cases to assist with recovery of severely > damaged databases." > > That does not stick with force_parallel_mode IMO. Maybe: Good point. -- Justin >From b5355f3384df1aab87e4cb89afcf4d9703c14bdb Mon Sep 17 00:00:00 2001 From: Justin Pryzby Date: Sat, 3 Apr 2021 19:06:37 -0500 Subject: [PATCH v3 1/3] track_activity_query_size is STATS_COLLECTOR category Not Resource Usage / Memory, as since 995fb7420 --- src/backend/utils/misc/guc.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c index d0a51b507d..e65d62e71d 100644 --- a/src/backend/utils/misc/guc.c +++ b/src/backend/utils/misc/guc.c @@ -3506,7 +3506,7 @@ static struct config_int ConfigureNamesInt[] = }, { - {"track_activity_query_size", PGC_POSTMASTER, RESOURCES_MEM, + {"track_activity_query_size", PGC_POSTMASTER, STATS_COLLECTOR, gettext_noop("Sets the size reserved for pg_stat_activity.query, in bytes."), NULL, GUC_UNIT_BYTE -- 2.17.0 >From b491b4520f157fb1328925f500f15fe2ef8e457c Mon Sep 17 00:00:00 2001 From: Justin Pryzby Date: Sat, 3 Apr 2021 19:17:03 -0500 Subject: [PATCH v3 2/3] track_commit_timestamp is REPLICATION_SENDING If I'm not wrong, this was missed at 4bd8ed31b --- src/backend/utils/misc/guc.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c index e65d62e71d..a3c2ebbc53 100644 --- a/src/backend/utils/misc/guc.c +++ b/src/backend/utils/misc/guc.c @@ -1188,7 +1188,7 @@ static struct config_bool ConfigureNamesBool[] = check_bonjour, NULL, NULL }, { - {"track_commit_timestamp", PGC_POSTMASTER, REPLICATION, + {"track_commit_timestamp", PGC_POSTMASTER, REPLICATION_SENDING, gettext_noop("Collects transaction commit time."), NULL }, -- 2.17.0 >From e9b556f2bc75dc8350759e4b32e588325086218e Mon Sep 17 00:00:00 2001 From: Justin Pryzby Date: Sat, 3 Apr 2021 19:24:50 -0500 Subject: [PATCH v3 3/3] Change force_parallel_mode to a DEVELOPER GUC, and remove it from sample config.. ..to discourage users from changing this option in hopes that it'll make their queries faster, but without reading the documentation or understanding what it does. --- doc/src/sgml/config.sgml | 96 ++- src/backend/utils/misc/guc.c | 2 +- src/backend/utils/misc/postgresql.conf.sample | 1 - 3 files changed, 50 insertions(+), 49 deletions(-) diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml index f749fe9ce7..a7350c6ddc 100644 --- a/doc/src/sgml/config.sgml +++ b/doc/src/sgml/config.sgml @@ -5912,51 +5912,6 @@ SELECT * FROM parent WHERE key = 2400; - - force_parallel_mode (enum) - - force_parallel_mode configuration parameter - - - - -Allows the use of parallel queries for testing purposes even in cases -where no performance benefit is expected. -The allowed values of force_parallel_mode are -off (use parallel mode only when it is expected to improve -performance), on (force parallel query for all queries -for which it is thought to be safe), and regress (like -on, but with additional behavior changes as explained -below). 
- - - -More specifically, setting this value to on will add -a Gather node to the top of any query plan for which this -appears to be safe, so that the query runs inside of a parallel worker. -Even when a parallel worker is not available or cannot be used, -operations such as starting a subtransaction that would be prohibited -in a parallel query context will be prohibited unless the planner -believes that this will cause the query to fail. If failures or -unexpected results occur when this option is set, some functions used -by the query may need to be marked PARALLEL UNSAFE -(or, possibly, PARALLEL RESTRICTED). - - - -Setting this value to regress has all of the same effects -as setting it to on plus some additional effects that are -intended to facilitate automated regression testing. Normally, -messages from a parallel worker include a context line indicating that, -but a setting of regress suppresses this line so that the -output is the same as in non-parallel execution. Also, -the Gather nodes added to plans by this setting are hidden -in EXPLAIN output so that the outp
Re: vacuum freeze - possible improvements
On Tue, Apr 13, 2021 at 1:51 PM Virender Singla wrote: > > Thanks Masahiko for the response. > > "What is > the use case where users want to freeze fewer transactions, meaning > invoking anti-wraparound frequently?" > > My overall focus here is anti wraparound vacuum on huge tables in emergency > situations (where we reached very close to 2B transactions or already in > outage window). In this situation we want to recover ASAP instead of having > many hours of outage.The Purpose of increasing "vacuum_freeze_min_age" to > high value is that anti wraparound vacuum will have to do less work because > we are asking less transactions/tuples to freeze (Of Course subsequent vacuum > has to do the remaining work). I think I understood your proposal. For example, if we insert 500GB tuples during the first 1 billion transactions and then insert more 500GB tuples into another 500GB blocks during the next 1 billion transactions, vacuum freeze scans 1TB whereas we scans only 500GB that are modified by the first insertions if we’re able to freeze directly tuples that are older than the cut-off. Is that right? > > "So the vacuum freeze will still have to > process tuples that are inserted/modified during consuming 1 billion > transactions. It seems to me that it’s not fewer transactions." > > Yes another thing here is anti wraparound vacuum also cleans dead tuples but > i am not sure what we can do to avoid that. > There can be vacuum to only freeze the tulpes? I think it's a good idea to skip all work except for freezing tuples in emergency cases. Thanks to vacuum_failsafe_age we can avoid index vacuuming, index cleanup, and heap vacuuming. > > Thanks for sharing PG14 improvements, those are nice to have. But still the > anti wraparound vacuum will have to scan all the pages (from visibility map) > even if we are freezing fewer transactions because currently there is no way > to know what block/tuple contains which transaction id. Yes, that feature is to speed up vacuum by dynamically disabling both cost-based delay and some cleanup work whereas your idea is to do that by speeding up heap scan. > If there is a way then it would be easier to directly freeze those tuples > quickly and advance the relfrozenxid for the table. Maybe we can track the oldest xid per page in a map like visiblity map or integrate it with visibility map. We need to freeze only pages that are all-visible and whose oldest xid is older than the cut-off xid. I think we need to track both xid and multi xid. Regards, -- Masahiko Sawada EDB: https://www.enterprisedb.com/
Monitoring stats docs inconsistency
Hi, Few of the statistics description in monitoring_stats.sgml doc is not consistent. Made all the descriptions consistent by including punctuation marks at the end of each description. Thoughts? Regards, Vignesh From b74179aec11eb1f2439ef43e1830531c2cde78a2 Mon Sep 17 00:00:00 2001 From: vignesh Date: Mon, 12 Apr 2021 20:49:52 +0530 Subject: [PATCH] Monitoring stats docs inconsistency. Few of the statistics description in monitoring_stats.sgml doc is not consistent. Made all the description consistent by including fullstop. --- doc/src/sgml/monitoring.sgml | 354 +-- 1 file changed, 177 insertions(+), 177 deletions(-) diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml index 8287587f61..c3339c6753 100644 --- a/doc/src/sgml/monitoring.sgml +++ b/doc/src/sgml/monitoring.sgml @@ -695,7 +695,7 @@ postgres 27093 0.0 0.0 30096 2752 ?Ss 11:34 0:00 postgres: ser datid oid - OID of the database this backend is connected to + OID of the database this backend is connected to. @@ -704,7 +704,7 @@ postgres 27093 0.0 0.0 30096 2752 ?Ss 11:34 0:00 postgres: ser datname name - Name of the database this backend is connected to + Name of the database this backend is connected to. @@ -713,7 +713,7 @@ postgres 27093 0.0 0.0 30096 2752 ?Ss 11:34 0:00 postgres: ser pid integer - Process ID of this backend + Process ID of this backend. @@ -733,7 +733,7 @@ postgres 27093 0.0 0.0 30096 2752 ?Ss 11:34 0:00 postgres: ser usesysid oid - OID of the user logged into this backend + OID of the user logged into this backend. @@ -742,7 +742,7 @@ postgres 27093 0.0 0.0 30096 2752 ?Ss 11:34 0:00 postgres: ser usename name - Name of the user logged into this backend + Name of the user logged into this backend. @@ -752,7 +752,7 @@ postgres 27093 0.0 0.0 30096 2752 ?Ss 11:34 0:00 postgres: ser Name of the application that is connected - to this backend + to this backend. @@ -819,7 +819,7 @@ postgres 27093 0.0 0.0 30096 2752 ?Ss 11:34 0:00 postgres: ser Time when the currently active query was started, or if state is not active, when the last query - was started + was started. @@ -828,7 +828,7 @@ postgres 27093 0.0 0.0 30096 2752 ?Ss 11:34 0:00 postgres: ser state_change timestamp with time zone - Time when the state was last changed + Time when the state was last changed. @@ -2302,7 +2302,7 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i pid integer - Process ID of a WAL sender process + Process ID of a WAL sender process. @@ -2311,7 +2311,7 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i usesysid oid - OID of the user logged into this WAL sender process + OID of the user logged into this WAL sender process. @@ -2320,7 +2320,7 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i usename name - Name of the user logged into this WAL sender process + Name of the user logged into this WAL sender process. @@ -2330,7 +2330,7 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i Name of the application that is connected - to this WAL sender + to this WAL sender. @@ -2362,7 +2362,7 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i TCP port number that the client is using for communication - with this WAL sender, or -1 if a Unix socket is used + with this WAL sender, or -1 if a Unix socket is used. 
@@ -2372,7 +2372,7 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i Time when this process was started, i.e., when the - client connected to this WAL sender + client connected to this WAL sender. @@ -2430,7 +2430,7 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i sent_lsn pg_lsn - Last write-ahead log location sent on this connection + Last write-ahead log location sent on this connection. @@ -2440,7 +2440,7 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i Last write-ahead log location wri
Re: Identify missing publications from publisher while create/alter subscription.
On Thu, Apr 8, 2021 at 12:13 PM Bharath Rupireddy wrote: > > On Wed, Apr 7, 2021 at 10:37 PM vignesh C wrote: > > > I think, we can also have validate_publication option allowed for > > > ALTER SUBSCRIPTION SET PUBLICATION and REFRESH PUBLICATION commands > > > with the same behaviour i.e. error out when specified publications > > > don't exist in the publisher. Thoughts? > > > > Sorry for the delayed reply, I was working on a few other projects so > > I was not able to reply quickly. > > Since we are getting the opinion that if we make the check > > publications by default it might affect the existing users, I'm fine > > with having an option validate_option to check if the publication > > exists in the publisher, that way there is no change for the existing > > user. I have made a patch in similar lines, attached patch has the > > changes for the same. > > Thoughts? > > Here are some comments on v3 patch: > Thanks for the comments > 1) Please mention what's the default value of the option > + > +validate_publication > (boolean) > + > + > + Specifies whether the subscriber must verify if the specified > + publications are present in the publisher. By default, the > subscriber > + will not check if the publications are present in the publisher. > + > + > + > Modified. > 2) How about > + Specifies whether the subscriber must verify the > publications that are > + being subscribed to are present in the publisher. By default, > the subscriber > instead of > + Specifies whether the subscriber must verify if the specified > + publications are present in the publisher. By default, the > subscriber > Slightly reworded and modified. > 3) I think we can make below common code into a single function with > flags to differentiate processing for both, something like: > StringInfoData *get_publist_str(List *publicaitons, bool use_quotes, > bool is_fetch_table_list); > check_publications: > +/* Convert the publications which does not exist into a string. */ > +initStringInfo(&nonExistentPublications); > +foreach(lc, publicationsCopy) > +{ > and get_appended_publications_query: > foreach(lc, publications) > > With the new function that only prepares comma separated list of > publications, you can get rid of get_appended_publications_query and > just append the returned list to the query. > fetch_table_list: get_publist_str(publications, true, true); > check_publications: for select query preparation > get_publist_str(publications, true, false); and for error string > preparation get_publist_str(publications, false, false); > > And also let the new function get_publist_str allocate the string and > just mention as a note in the function comment that the callers should > pfree the returned string. > I felt the existing code looks better, if we have a common function, we will have to lot of if conditions as both the functions is not same to same, they operate on different data types and do the preparation appropriately. Like fetch_table_list get nspname & relname and converts it to RangeVar and adds to the list other function prepares a text and deletes the entries present from the list. So I did not fix this. Thoughts? 
> 4) We could do following, > ereport(ERROR, > (errcode(ERRCODE_TOO_MANY_ARGUMENTS), > errmsg_plural("publication %s does not exist in the > publisher", >"publications %s do not exist in the publisher", > list_length(publicationsCopy), > nonExistentPublications.data))); > instead of > +ereport(ERROR, > +(errmsg("publication(s) %s does not exist in the publisher", > +nonExistentPublications.data))); > > if (list_member(cte->ctecolnames, > makeString(cte->search_clause->search_seq_column))) > Modified. > 5) I think it's better with > + * Check the specified publication(s) is(are) present in the publisher. > instead of > + * Verify that the specified publication(s) exists in the publisher. > Modified. > 6) Instead of such a longer variable name "nonExistentPublications" > how about just "pubnames" and add a comment there saying "going to > error out with the list of non-existent publications" with that the > variable and that part of code's context is clear. > Modified. > 7) You can just do > publications = list_copy(publications); > instead of using another variable publicationsCopy > publicationsCopy = list_copy(publications); publications is an input list to this function, I did not want this function to change this list. I felt existing is fine. Thoughts? > 8) If you have done StringInfoData *cmd = makeStringInfo();, then no > need of initStringInfo(cmd); Modified. Attached v4 patch has the fixes for the comments. Regards, Vignesh From ffce0a164730bf872149d04d
Re: [PATCH] Identify LWLocks in tracepoints
On Tue, 13 Apr 2021 at 11:06, Andres Freund wrote: > > Each backend can have different tranche IDs (right?) > > No, they have to be the same in each. Note how the tranche ID is part of > struct LWLock. Which is why LWLockNewTrancheId() has to acquire a lock > etc. Ah. I misunderstood that at some point. That makes it potentially more sensible to skip reporting tranche names. Not great, because it's much less convenient to work with trace data full of internal ordinals that must be re-mapped in post-processing. But I'm generally OK with deferring runtime costs to tooling rather than the db itself so long as doing so is moderately practical. In this case, I think we could likely get away with removing the tranche names from the tracepoints if we instead emit a trace event on each dynamic tranche registration that reports the tranche id -> name mapping. It still sucks for tools, since they have to scrape up the static tranche registrations from somewhere else, but ... it'd be tolerable. > > The kernel is packed with extremely useful trace events, and for very > > good reasons. Some on very hot paths. > > IIRC those aren't really comparable - the kernel actually does modify > the executable code to replace the tracepoints with nops. Same with userspace static trace markers (USDTs). A followup mail will contain a testcase and samples to demonstrate this. > Although I still don't really buy that static tracepoints are the best > way to measure this kind of thing, given the delay introducing them and > the cost of having them around. I think I pointed out > https://postgr.es/m/20200813004233.hdsdfvufqrbdwzgr%40alap3.anarazel.de > before. Yeah. Semaphores are something hot enough that I'd hesitate to touch them. > > LWLock lock-ordering deadlocks > > This seems unrelated to tracepoints to me. If I can observe which locks are acquired in which order by each proc, I can then detect excessive waits and report the stack of held locks of both procs and their order of acquisition. Since LWLocks shmem state doesn't AFAICS track any information on the lock holder(s) I don't see a way to do this in-process. It's not vital, it's just one of the use cases I have in mind. I suspect that any case where such deadlocks are possible represents a misuse of LWLocks anyway. > > and there's no way to know what a given non-built-in tranche ID means > > for any given backend without accessing backend-specific in-memory > > state. Including for non-user-accessible backends like bgworkers and > > auxprocs, where it's not possible to just query the state from a view > > directly. > > The only per-backend part is that some backends might not know the > tranche name for dynamically registered tranches where the > LWLockRegisterTranche() hasn't been executed in a backend. Which should > pretty much never be an aux process or such. And even for bgworkers it > seems like a pretty rare thing, because those need to be started by > something... > > It might be worth proposing a shared hashtable with tranch names and > jut reserving enough space for ~hundred entries... Yeah, that'd probably work and be cheap enough not to really matter. Might even save us a chunk of memory by not turning CoW pages into private mappings for each backend during registration. > > And you can always build without `--enable-dtrace` and ... just not care. > > Practically speaking, distributions enable it, which then incurs the > cost for everyone. Yep. 
That's part of why I was so surprised to notice the GetLWTrancheName() function call in LWLock tracepoints. Nearly anywhere else it wouldn't matter at all, but LWLocks are hot enough that it just might matter for the no-wait fastpath.
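As a sketch of that idea: the probe below is an invention (nothing like it exists in probes.d today), but LWLockRegisterTranche() is the existing registration hook, so the id -> name mapping could be announced exactly once per dynamic tranche:

void
LWLockRegisterTranche(int tranche_id, const char *tranche_name)
{
	/* ... existing code that records the name in this backend ... */

	/*
	 * Hypothetical: emit the id -> name mapping as its own trace event, so
	 * tools can build a lookup table instead of every lock event carrying
	 * the name (or an unmapped ordinal).
	 */
	TRACE_POSTGRESQL_LWLOCK_TRANCHE_REGISTER(tranche_id, tranche_name);
}

Built-in tranche names would still have to be scraped from the headers or dumped once at startup, as noted above.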
CTE push down
Hi. Currently PostgreSQL supports CTE push down for SELECT statements, but it is implemented as turning each CTE reference into a subquery. When a CTE is referenced multiple times, we have a choice - to materialize the CTE (and disable quals distribution to the CTE query) or inline it (and so run the CTE query multiple times, which can be inefficient, for example, when the CTE references foreign tables). I was looking into whether it is possible to collect the quals referencing a CTE, combine them into an OR qual and add them to the CTE query. So far I am considering the following changes.

1) Modify SS_process_ctes() to add a list of RestrictInfo* to PlannerInfo - one NULL RestrictInfo pointer per CTE (let's call this list cte_restrictinfos for now).

2) In distribute_restrictinfo_to_rels(), when we get a rel of RTE_CTE kind and are sure that we can safely push down the restrictinfo, preserve the restrictinfo in cte_restrictinfos, converting multiple restrictions to "OR" RestrictInfos (see the sketch after this message).

3) At the end of subquery_planner() (after inheritance_planner() or grouping_planner()) we can check whether cte_restrictinfos contains some non-null RestrictInfo pointers and recreate the plan for the corresponding CTEs, distributing quals to relations inside the CTE queries.

For now I'm not sure how to handle vars mapping when we push restrictinfos to the level of the cte root or when we push them down to the cte plan, but properly mapping vars seems to be doable. Is there something else I miss? Is somebody working on an alternative solution, or does anybody see issues in such an approach? -- Best regards, Alexander Pyhalov, Postgres Professional
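To make step (2) a bit more concrete, here is a minimal sketch of the OR-combining part. It assumes the proposed cte_restrictinfos list (one entry per CTE, NULL until a pushable qual has been seen) and a cteindex identifying the CTE, neither of which exists today, and for brevity it stores bare clauses rather than RestrictInfos:

/*
 * In distribute_restrictinfo_to_rels(), once the qual has been judged safe
 * to push into the CTE (that check is not shown):
 */
ListCell   *cell = list_nth_cell(root->cte_restrictinfos, cteindex);

if (lfirst(cell) == NULL)
	lfirst(cell) = restrictinfo->clause;	/* first pushable qual */
else
	lfirst(cell) = make_orclause(list_make2(lfirst(cell),
											restrictinfo->clause));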
potential deadlock in parallel hashjoin grow-buckets-barrier and blocking nodes?
Hi, Whilst trying to debug a deadlock in some tpc-ds query I noticed something that could cause problems in the hashjoin implementation and potentially cause deadlocks (if my analysis is right). Whilst building the inner hash table, the grow barriers are attached the whole time (the PHJ_BUILD_HASHING_INNER phase). Usually this is not a problem, however if one of the nodes blocks somewhere further down in the plan whilst trying to fill the inner hash table whilst the others are trying to e.g. extend the number of buckets using ExecParallelHashIncreaseNumBuckets, they would all wait until the blocked process comes back to the hashjoin node and also joins the effort. Wouldn't this give potential deadlock situations? Or why would a worker that is hashing the inner be required to come back and join the effort in growing the hashbuckets? With very skewed workloads (one node providing all data) I was at least able to have e.g. 3 out of 4 workers waiting in ExecParallelHashIncreaseNumBuckets, whilst one was in the execprocnode(outernode). I tried to detach and reattach the barrier but this proved to be a bad idea :) Regards, Luc
Re: [PATCH] Identify LWLocks in tracepoints
On Tue, 13 Apr 2021 at 21:05, Craig Ringer wrote: > On Tue, 13 Apr 2021 at 11:06, Andres Freund wrote: > > IIRC those aren't really comparable - the kernel actually does modify > > the executable code to replace the tracepoints with nops. > > Same with userspace static trace markers (USDTs). > > A followup mail will contain a testcase and samples to demonstrate this. Demo follows, with source attached too. gcc 10.2 compiling with -O2, using dtrace and from systemtap 4.4 . Trivial empty function definition: __attribute__((noinline)) void no_args(void) { SDT_NOOP_NO_ARGS(); } Disassembly when SDT_NOOP_NO_ARGS is defined as #define SDT_NOOP_NO_ARGS() is: : retq When built with a probes.d definition processed by the dtrace script instead, the disassembly becomes: : nop retq So ... yup, it's a nop. Now, if we introduce semaphores that changes. __attribute__((noinline)) void no_args(void) { if (SDT_NOOP_NO_ARGS_ENABLED()) SDT_NOOP_NO_ARGS(); } disassembles to: : cmpw $0x0,0x2ec4(%rip)# jne retq nopl 0x0(%rax,%rax,1) nop retq so the semaphore test is actually quite harmful and wasteful in this case. That's not surprising since this SDT is a simple marker point. But what if we supply arguments to it? It turns out that the disassembly is the same if args are passed, whether locals or globals, including globals assigned based on program input that can't be determined at compile time. Still just a nop. If I pass a function call as an argument expression to a probe, e.g. __attribute__((noinline)) static int compute_probe_argument(void) { return 100; } void with_computed_arg(void) { SDT_NOOP_WITH_COMPUTED_ARG(compute_probe_argument()); } then the disassembly with SDTs is: : callq nop retq so the function call isn't elided even if it's unused. That's somewhat expected. The same will be true if the arguments to a probe require pointer chasing or non-trivial marshalling. If a semaphore guard is added this becomes: : cmpw $0x0,0x2e2e(%rip)# jne retq nopl 0x0(%rax,%rax,1) callq nop retq so now the call to compute_probe_argument() is skipped unless the probe is enabled, but the function is longer and requires a test and jump. If I dummy up a function that does some pointer chasing, without semaphores I get : mov(%rdi),%rax mov(%rax),%rax mov(%rax),%rax nop retq so the arguments are marshalled then ignored. with semaphores I get: : cmpw $0x0,0x2d90(%rip)# jne retq nopl 0x0(%rax,%rax,1) mov(%rdi),%rax mov(%rax),%rax mov(%rax),%rax nop retq so again the probe's argument marshalling is inline in the function body, but at the end, and skipped over. Findings: * A probe without arguments or with simple arguments is just a 'nop' instruction * Probes that require function calls, pointer chasing, other expression evaluation etc may impose a fixed cost to collect up arguments even if the probe is disabled. * SDT semaphores can avoid that cost but add a branch, so should probably be avoided unless preparing probe arguments is likely to be expensive. Hideous but effective demo code attached. 
provider sdt_noop { probe no_args(); probe with_args(int arg1, int arg2, int arg3); probe with_global_arg(int arg1); probe with_volatile_arg(int arg1); probe with_many_args(int arg1, int arg2, int arg3, int64_t arg4, int64_t arg5, int64_t arg6, int64_t arg7, int64_t arg8); probe with_computed_arg(int arg1); probe with_pointer_chasing(int arg1); }; Makefile Description: Binary data #include #ifdef USE_SDT #include "sdt_noop_probes_enabled.h" #else #include "sdt_noop_probes_disabled.h" #endif void no_args(void); int with_args(void); int with_global_arg(void); int with_volatile_arg(void); void with_many_args(void); void with_computed_arg(void); void with_pointer_chasing(int arg); __attribute__((noinline)) void no_args(void) { #ifdef USE_SDT_SEMAPHORES if (SDT_NOOP_NO_ARGS_ENABLED()) #endif SDT_NOOP_NO_ARGS(); } __attribute__((noinline)) int with_args(void) { int arg1 = 0; int arg2 = 1; int arg3 = 2; #ifdef USE_SDT_SEMAPHORES if (SDT_NOOP_WITH_ARGS_ENABLED()) #endif SDT_NOOP_WITH_ARGS(arg1, arg2, arg3); return arg1 + arg2 + arg3; } int some_global; __attribute__((noinline)) int with_global_arg(void) { #ifdef USE_SDT_SEMAPHORES if (SDT_NOOP_WITH_GLOBAL_ARG_ENABLED()) #endif SDT_NOOP_WITH_GLOBAL_ARG(some_global); return some_global; } __attribute__((noinline)) int with_volatile_arg(void) { volatile int arg1; arg1 = 42; #ifdef USE_SDT_SEMAPHORES if (SDT_NOOP_WITH_VOLATILE_ARG_ENABLED()) #endif SDT_NOOP_WITH_VOLATILE_ARG(arg1); return arg1; } __att
RE: Truncate in synchronous logical replication failed
On Monday, April 12, 2021 3:58 PM Amit Kapila wrote: > On Mon, Apr 12, 2021 at 10:03 AM osumi.takami...@fujitsu.com > wrote: > > but if we take a measure to fix the doc, we have to be careful for the > > description, because when we remove the primary keys of 'test' tables on the > scenario in [1], we don't have this issue. > > It means TRUNCATE in synchronous logical replication is not always > blocked. > > > > The problem happens only when we try to fetch IDENTITY_KEY attributes > because pgoutput uses RelationGetIndexAttrBitmap() to get that information > which locks the required indexes. Now, because TRUNCATE has already > acquired an exclusive lock on the index, it seems to create a sort of deadlock > where the actual Truncate operation waits for logical replication of > operation to > complete and logical replication waits for actual Truncate operation to > finish. > > Do we really need to use RelationGetIndexAttrBitmap() to build IDENTITY_KEY > attributes? During decoding, we don't even lock the main relation, we just > scan > the system table and build that information using a historic snapshot. Can't > we > do something similar here? I think we can build the IDENTITY_KEY attributes with NoLock instead of calling RelationGetIndexAttrBitmap(). When we trace back the caller side of logicalrep_write_attrs(), doing the thing equivalent to RelationGetIndexAttrBitmap() for INDEX_ATTR_BITMAP_IDENTITY_KEY impacts only pgoutput_truncate. OTOH, I can't find codes similar to RelationGetIndexAttrBitmap() in pgoutput_* functions and in the file of relcache.c. Therefore, I'd like to discuss how to address the hang. My first idea is to extract some parts of RelationGetIndexAttrBitmap() only for INDEX_ATTR_BITMAP_IDENTITY_KEY and implement those either in a logicalrep_write_attrs() or as a new function. RelationGetIndexAttrBitmap() has 'restart' label for goto statement in order to ensure to return up-to-date attribute bitmaps, so I prefer having a new function when we choose this direction. Having that goto in logicalrep_write_attrs() makes it a little bit messy, I felt. The other direction might be to extend RelationGetIndexAttrBitmap's function definition to accept lockmode to give NoLock from logicalrep_write_attrs(). But, this change impacts on other several callers so is not as good as the first direction above, I think. If someone has any better idea, please let me know. Best Regards, Takamichi Osumi
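A very rough sketch of what such a new function could look like is below. It assumes rd_replidindex has already been populated (RelationGetIndexList() does that), opens the index with no heavyweight lock, and deliberately omits error handling, the 'restart' logic and the historic-snapshot questions raised above; the function name is made up.

static Bitmapset *
build_identity_key_bitmap(Relation relation)
{
	Bitmapset  *idattrs = NULL;
	Relation	indexDesc;
	int			i;

	if (!OidIsValid(relation->rd_replidindex))
		return NULL;

	/*
	 * Open the replica identity index without taking a heavyweight lock,
	 * mirroring how decoding treats the main relation.
	 */
	indexDesc = RelationIdGetRelation(relation->rd_replidindex);

	for (i = 0; i < IndexRelationGetNumberOfKeyAttributes(indexDesc); i++)
	{
		int			attrnum = indexDesc->rd_index->indkey.values[i];

		if (attrnum != 0)
			idattrs = bms_add_member(idattrs,
									 attrnum - FirstLowInvalidHeapAttributeNumber);
	}

	RelationClose(indexDesc);

	return idattrs;
}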
Re: vacuum freeze - possible improvements
exactly my point, want to scan only 500GB data instead of 1TB. That can be handy for vacuum freeze at a dangerous stage (reaching towards 2B). "Maybe we can track the oldest xid per page in a map like visiblity map or integrate it with visibility map. We need to freeze only pages that are all-visible and whose oldest xid is older than the cut-off xid. I think we need to track both xid and multi xid." Yes I thought of that (keep track of olderst xid per page instead of per tuple), only thing here is every time there is some modification on the page, that oldest xid needs to be recalculated for respective page. Still that makes sense with kind of BRIN type structure to keep the xid per page. With Binary Tree Index structure, new transaction/tuple will fit right side (as that would be news transaction until 2B) and then other side leaf blocks can be removed with every vacuum freeze. On Tue, Apr 13, 2021 at 6:02 PM Masahiko Sawada wrote: > On Tue, Apr 13, 2021 at 1:51 PM Virender Singla > wrote: > > > > Thanks Masahiko for the response. > > > > "What is > > the use case where users want to freeze fewer transactions, meaning > > invoking anti-wraparound frequently?" > > > > My overall focus here is anti wraparound vacuum on huge tables in > emergency situations (where we reached very close to 2B transactions or > already in outage window). In this situation we want to recover ASAP > instead of having many hours of outage.The Purpose of increasing > "vacuum_freeze_min_age" to high value is that anti wraparound vacuum will > have to do less work because we are asking less transactions/tuples to > freeze (Of Course subsequent vacuum has to do the remaining work). > > I think I understood your proposal. For example, if we insert 500GB > tuples during the first 1 billion transactions and then insert more > 500GB tuples into another 500GB blocks during the next 1 billion > transactions, vacuum freeze scans 1TB whereas we scans only 500GB that > are modified by the first insertions if we’re able to freeze directly > tuples that are older than the cut-off. Is that right? > > > > > "So the vacuum freeze will still have to > > process tuples that are inserted/modified during consuming 1 billion > > transactions. It seems to me that it’s not fewer transactions." > > > > Yes another thing here is anti wraparound vacuum also cleans dead tuples > but i am not sure what we can do to avoid that. > > There can be vacuum to only freeze the tulpes? > > I think it's a good idea to skip all work except for freezing tuples > in emergency cases. Thanks to vacuum_failsafe_age we can avoid index > vacuuming, index cleanup, and heap vacuuming. > > > > > Thanks for sharing PG14 improvements, those are nice to have. But still > the anti wraparound vacuum will have to scan all the pages (from visibility > map) even if we are freezing fewer transactions because currently there is > no way to know what block/tuple contains which transaction id. > > Yes, that feature is to speed up vacuum by dynamically disabling both > cost-based delay and some cleanup work whereas your idea is to do that > by speeding up heap scan. > > > If there is a way then it would be easier to directly freeze those > tuples quickly and advance the relfrozenxid for the table. > > Maybe we can track the oldest xid per page in a map like visiblity map > or integrate it with visibility map. We need to freeze only pages that > are all-visible and whose oldest xid is older than the cut-off xid. I > think we need to track both xid and multi xid. 
> > Regards, > > > -- > Masahiko Sawada > EDB: https://www.enterprisedb.com/ >
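For what it's worth, one entry of the "oldest xid per page" map discussed above could be as small as a pair of values per heap page. Purely as an illustration (nothing like this exists, and the name is made up):

typedef struct PageFreezeMapEntry
{
	TransactionId	oldest_xid;		/* oldest unfrozen xmin/xmax on the page */
	MultiXactId		oldest_mxid;	/* oldest multixact id on the page */
} PageFreezeMapEntry;

Keeping such a map up to date on every page modification, as noted above, is the hard part.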
Re: Have I found an interval arithmetic bug?
On Mon, Apr 12, 2021 at 4:22 PM Bruce Momjian wrote: > On Mon, Apr 12, 2021 at 03:09:48PM -0700, Bryn Llewellyn wrote: > > I showed you all this example a long time ago: > > > > select ( > > ' > > 3.853467 years > > '::interval > > )::text as i; > > > > This behavior is the same in the env. of Bruce’s patch as in unpatched > PG 13.2. This is the result. > > > > 3 years 10 mons > > > > Notice that "3.853467 years" is "3 years" plus "10.241604 months". This > explains the "10 mons" in the result. But the 0.241604 months remainder did > not spill down into days. > > > > Can anybody defend this quirk? An extension of this example with a real > number of month in the user input is correspondingly yet more quirky. The > rules can be written down. But they’re too tortuos to allow an ordinary > mortal confidently to design code that relies on them. > > > > (I was unable to find any rule statement that lets the user predict this > in the doc. But maybe that’s because of my feeble searching skills.) > > > > If there is no defense (and I cannot imagine one) might Bruce’s patch > normalize this too to follow this rule: > > > > — convert 'y years m months' to the real number y*12 + m. > > > > — record truc( y*12 + m) in the "months" field of the internal > representation > > > > — flow the remainder down to days (but no further) > > > > After all, you've bitten the bullet now and changed the behavior. This > means that the semantics of some extant applications will change. So... in > for a penny, in for a pound? > > The docs now say: > > Field values can have fractional parts; for example, '1.5 > weeks' or '01:02:03.45'. The fractional > --> parts are used to compute appropriate values for the next lower-order > internal fields (months, days, seconds). > > meaning fractional years flows to the next lower internal unit, months, > and no further. Fractional months would flow to days. The idea of not > flowing past the next lower-order internal field is that the > approximations between units are not precise enough to flow accurately. > > With my patch, the output is now: > > SELECT INTERVAL '3 years 10.241604 months'; > interval > > 3 years 10 mons 7 days > > It used to flow to seconds. > > -- > Bruce Momjian https://momjian.us > EDB https://enterprisedb.com > > If only the physical world exists, free will is an illusion. > > Based on the results of more samples, I restore +1 to Bruce's latest patch. Cheers
Re: [PATCH] force_parallel_mode and GUC categories
Michael Paquier writes: > On Mon, Apr 12, 2021 at 01:40:52AM -0400, Tom Lane wrote: >> Perhaps ... what did you have in mind? > The first sentence of the page now says that: > "The following parameters are intended for work on the PostgreSQL > source code, and in some cases to assist with recovery of severely > damaged databases." > That does not stick with force_parallel_mode IMO. Maybe: > "The following parameters are intended for development work related to > PostgreSQL. Some of them work on the PostgreSQL source code, some of > them can be used to control the run-time behavior of the server, and > in some cases they can be used to assist with the recovery of severely > damaged databases." I think that's overly wordy. Maybe The following parameters are intended for developer testing, and should never be enabled for production work. However, some of them can be used to assist with the recovery of severely damaged databases. regards, tom lane
Re: TRUNCATE on foreign table
2021年4月13日(火) 21:03 Bharath Rupireddy : > > On Tue, Apr 13, 2021 at 2:37 PM Kohei KaiGai wrote: > > Here are two points to discuss. > > > > Regarding to the FDW-APIs, yes, nobody can deny someone want to implement > > their own FDW module that adds special handling when its foreign table > > is specified > > with ONLY-clause, even if we usually ignore. > > > > > > On the other hand, when we consider a foreign table is an abstraction > > of an external > > data source, at least, the current postgres_fdw's behavior is not > > consistent. > > > > When a foreign table by postgres_fdw that maps a remote parent table, > > has a local > > child table, > > > > This command shows all the rows from both of local and remote. > > > > postgres=# select * from f_table ; > > id | v > > +- > > 1 | remote table t_parent id=1 > > 2 | remote table t_parent id=2 > > 3 | remote table t_parent id=3 > > 10 | remote table t_child1 id=10 > > 11 | remote table t_child1 id=11 > > 12 | remote table t_child1 id=12 > > 20 | remote table t_child2 id=20 > > 21 | remote table t_child2 id=21 > > 22 | remote table t_child2 id=22 > > 50 | it is l_child id=50 > > 51 | it is l_child id=51 > > 52 | it is l_child id=52 > > 53 | it is l_child id=53 > > (13 rows) > > > > If f_table is specified with "ONLY", it picks up only the parent table > > (f_table), > > however, ONLY-clause is not push down to the remote side. > > > > postgres=# select * from only f_table ; > > id | v > > +- > > 1 | remote table t_parent id=1 > > 2 | remote table t_parent id=2 > > 3 | remote table t_parent id=3 > > 10 | remote table t_child1 id=10 > > 11 | remote table t_child1 id=11 > > 12 | remote table t_child1 id=12 > > 20 | remote table t_child2 id=20 > > 21 | remote table t_child2 id=21 > > 22 | remote table t_child2 id=22 > > (9 rows) > > > > On the other hands, TRUNCATE ONLY f_table works as follows... > > > > postgres=# truncate only f_table; > > TRUNCATE TABLE > > postgres=# select * from f_table ; > > id | v > > +- > > 10 | remote table t_child1 id=10 > > 11 | remote table t_child1 id=11 > > 12 | remote table t_child1 id=12 > > 20 | remote table t_child2 id=20 > > 21 | remote table t_child2 id=21 > > 22 | remote table t_child2 id=22 > > 50 | it is l_child id=50 > > 51 | it is l_child id=51 > > 52 | it is l_child id=52 > > 53 | it is l_child id=53 > > (10 rows) > > > > It eliminates the rows only from the remote parent table although it > > is a part of the foreign table. > > > > My expectation at the above command shows rows from the local child > > table (id=50...53). > > Yeah, ONLY clause is not pushed to the remote server in case of SELECT > commands. This is also true for DELETE and UPDATE commands on foreign > tables. I'm not sure if it wasn't thought necessary or if there is an > issue to push it or I may be missing something here. I think we can > start a separate thread to see other hackers' opinions on this. > > I'm not sure whether all the clauses that are possible for > SELECT/UPDATE/DELETE/INSERT with local tables are pushed to the remote > server by postgres_fdw. > > Well, now foreign TRUNCATE pushes the ONLY clause to the remote server > which is inconsistent when compared to SELECT/UPDATE/DELETE commands. > If we were to keep it consistent across all foreign commands that > ONLY clause is not pushed to remote server, then we can restrict for > TRUNCATE too and even if "TRUNCATE ONLY foreign_tbl" is specified, > just pass "TRUNCATE foreign_tbl" to remote server. 
Having said that, I > don't see any real problem in pushing the ONLY clause, at least in > case of TRUNCATE. > If the ONLY clause were pushed down to the remote query of postgres_fdw, what would the foreign table represent in the local system? In my understanding, a local foreign table by postgres_fdw is a representation of the entire tree of the remote parent table and its children. Thus, we have assumed that a DML command fetches rows from the remote parent table without an ONLY clause, once PostgreSQL has picked up the foreign table as a scan target. I think we don't need to adjust the definition of the role of a foreign table, even if it represents non-RDBMS data sources. If a foreign table by postgres_fdw supported a special table option indicating that an ONLY clause should be added when the remote query uses remote tables, it would be suitable to add the ONLY clause to the remote TRUNCATE command as well, not only to SELECT/INSERT/UPDATE/DELETE (see the sketch below). In other words, if a foreign table represents only a remote parent table, it is suitable to truncate only the remote parent table. Best regards, -- HeteroDB, Inc / The PG-Strom Project KaiGai Kohei
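To make the knob concrete: whichever policy is chosen ultimately comes down to whether postgres_fdw's deparser emits ONLY in the remote statement. A rough sketch, not the actual postgres_fdw code — buf, rel, behavior, restart_seqs and the two boolean flags are assumed context, and deparseRelation() is the existing helper that emits the qualified remote table name:

appendStringInfoString(buf, "TRUNCATE ");
if (only_specified && pushdown_only)	/* the policy being debated */
	appendStringInfoString(buf, "ONLY ");
deparseRelation(buf, rel);
appendStringInfo(buf, " %s IDENTITY",
				 restart_seqs ? "RESTART" : "CONTINUE");
if (behavior == DROP_CASCADE)
	appendStringInfoString(buf, " CASCADE");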
Re: Identify missing publications from publisher while create/alter subscription.
On Tue, Apr 13, 2021 at 6:22 PM vignesh C wrote: > > 2) How about > > + Specifies whether the subscriber must verify the > > publications that are > > + being subscribed to are present in the publisher. By default, > > the subscriber > > instead of > > + Specifies whether the subscriber must verify if the specified > > + publications are present in the publisher. By default, the > > subscriber > > > > Slightly reworded and modified. + + When true, the command will try to verify if the specified + publications that are subscribed is present in the publisher. + The default is false. + "publications that are subscribed" is not right as the subscriber is not yet subscribed, it is "trying to subscribing", and it's not that the command "will try to verify", it actually verifies. So you can modify as follows: + + When true, the command verifies if all the specified publications that are being subscribed to are present in the publisher and throws an error if any of the publication doesn't exist. The default is false. + > > 3) I think we can make below common code into a single function with > > flags to differentiate processing for both, something like: > > StringInfoData *get_publist_str(List *publicaitons, bool use_quotes, > > bool is_fetch_table_list); > > check_publications: > > +/* Convert the publications which does not exist into a string. */ > > +initStringInfo(&nonExistentPublications); > > +foreach(lc, publicationsCopy) > > +{ > > and get_appended_publications_query: > > foreach(lc, publications) > > > > With the new function that only prepares comma separated list of > > publications, you can get rid of get_appended_publications_query and > > just append the returned list to the query. > > fetch_table_list: get_publist_str(publications, true, true); > > check_publications: for select query preparation > > get_publist_str(publications, true, false); and for error string > > preparation get_publist_str(publications, false, false); > > > > And also let the new function get_publist_str allocate the string and > > just mention as a note in the function comment that the callers should > > pfree the returned string. > > > > I felt the existing code looks better, if we have a common function, > we will have to lot of if conditions as both the functions is not same > to same, they operate on different data types and do the preparation > appropriately. Like fetch_table_list get nspname & relname and > converts it to RangeVar and adds to the list other function prepares a > text and deletes the entries present from the list. So I did not fix > this. Thoughts? 
I was actually thinking we could move the following duplicate code into a function: foreach(lc, publicationsCopy) { char *pubname = strVal(lfirst(lc)); if (first) first = false; else appendStringInfoString(&pubnames, ", "); appendStringInfoString(&pubnames, "\""); appendStringInfoString(&pubnames, pubname); appendStringInfoString(&pubnames, "\""); } and foreach(lc, publications) { char *pubname = strVal(lfirst(lc)); if (first) first = false; else appendStringInfoString(cmd, ", "); appendStringInfoString(cmd, quote_literal_cstr(pubname)); } that function can be: static void get_publications_str(List *publications, StringInfo dest, bool quote_literal) { ListCell *lc; boolfirst = true; Assert(list_length(publications) > 0); foreach(lc, publications) { char *pubname = strVal(lfirst(lc)); if (first) first = false; else appendStringInfoString(dest, ", "); if (quote_literal) appendStringInfoString(pubnames, quote_literal_cstr(pubname)); else { appendStringInfoString(&dest, "\""); appendStringInfoString(&dest, pubname); appendStringInfoString(&dest, "\""); } } } This way, we can get rid of get_appended_publications_query and use the above function to return the appended list of publications. We need to just pass quote_literal as true while preparing the publist string for publication query and append it to the query outside the function. While preparing publist str for error, pass quote_literal as false. Thoughts? > > 7) You can just do > > publications = list_copy(publications); > > instead of using another variable publicationsCopy > > publicationsCopy = list_copy(publications); > > publications is an input list to this function, I did not want this > function to change this list. I felt existing is fine. Thoughts? Okay. Typo - it's not "subcription" +# Create subcription for a publication which does not exist. I think we can remove e
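For reference, here is a cleaned-up version of that helper (the stray pubnames/&dest references in the sketch above look like artifacts of the mail formatting); it simply renders the publication list into dest, quoting each name either as a literal or with double quotes:

static void
get_publications_str(List *publications, StringInfo dest, bool quote_literal)
{
	ListCell   *lc;
	bool		first = true;

	Assert(list_length(publications) > 0);

	foreach(lc, publications)
	{
		char	   *pubname = strVal(lfirst(lc));

		if (first)
			first = false;
		else
			appendStringInfoString(dest, ", ");

		if (quote_literal)
			appendStringInfoString(dest, quote_literal_cstr(pubname));
		else
			appendStringInfo(dest, "\"%s\"", pubname);
	}
}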
More sepgsql weirdness
On a system with selinux and sepgsql configured, search path resolution appears to fail if sepgsql is in enforcing mode, but selinux is in permissive mode (which, as I understand it, should cause sepgsql to behave as if it's in permissive mode anyway - and does for other operations). Regardless of whether my understanding of the interaction of the two permissive modes is correct, I don't believe the following should happen: mls=# SELECT current_user; current_user -- postgres (1 row) mls=# SHOW search_path; search_path - "$user", public (1 row) mls=# \dn+ public List of schemas Name | Owner | Access privileges | Description +--+--+ public | postgres | postgres=UC/postgres+| standard public schema | | =UC/postgres | (1 row) mls=# CREATE TABLE tb_users(uid int primary key, name text, mail text, address text, salt text, phash text); ERROR: no schema has been selected to create in LINE 1: CREATE TABLE tb_users(uid int primary key, name text, mail t... ^ mls=# CREATE TABLE public.tb_users(uid int primary key, name text, mail text, address text, salt text, phash text); CREATE TABLE mls=# drop table tb_users; ERROR: table "tb_users" does not exist mls=# drop table public.tb_users; DROP TABLE This is on head, pulled yesterday. -- Dave Page Blog: http://pgsnake.blogspot.com Twitter: @pgsnake EDB: http://www.enterprisedb.com
Re: Truncate in synchronous logical replication failed
> On 12 Apr 2021, at 08:58, Amit Kapila wrote: > > On Mon, Apr 12, 2021 at 10:03 AM osumi.takami...@fujitsu.com > wrote: >> >>> I checked the PG-DOC, found it says that “Replication of TRUNCATE >>> commands is supported”[1], so maybe TRUNCATE is not supported in >>> synchronous logical replication? >>> >>> If my understanding is right, maybe PG-DOC can be modified like this. Any >>> thought? >>> Replication of TRUNCATE commands is supported >>> -> >>> Replication of TRUNCATE commands is supported in asynchronous mode >> I'm not sure if this becomes the final solution, >> > > I think unless the solution is not possible or extremely complicated > going via this route doesn't seem advisable. > >> but if we take a measure to fix the doc, we have to be careful for the >> description, >> because when we remove the primary keys of 'test' tables on the scenario in >> [1], we don't have this issue. >> It means TRUNCATE in synchronous logical replication is not always blocked. >> > > The problem happens only when we try to fetch IDENTITY_KEY attributes > because pgoutput uses RelationGetIndexAttrBitmap() to get that > information which locks the required indexes. Now, because TRUNCATE > has already acquired an exclusive lock on the index, it seems to > create a sort of deadlock where the actual Truncate operation waits > for logical replication of operation to complete and logical > replication waits for actual Truncate operation to finish. > > Do we really need to use RelationGetIndexAttrBitmap() to build > IDENTITY_KEY attributes? During decoding, we don't even lock the main > relation, we just scan the system table and build that information > using a historic snapshot. Can't we do something similar here? > > Adding Petr J. and Peter E. to know their views as this seems to be an > old problem (since the decoding of Truncate operation is introduced). We used RelationGetIndexAttrBitmap because it already existed, no other reason. I am not sure what exact locking we need but I would have guessed at least AccessShareLock would be needed. -- Petr
Re: Old Postgresql version on i7-1165g7
Justin Pryzby writes: > On Fri, Apr 09, 2021 at 04:28:25PM +0300, Yura Sokolov wrote: >> Occasinally I found I'm not able to `make check` old Postgresql versions. >> I've bisected between REL_11_0 and "Rename pg_rewind's copy_file_range()" >> and >> found 372728b0d49552641f0ea83d9d2e08817de038fa >>> Replace our traditional initial-catalog-data format with a better >>> design. >> This is first commit where `make check` doesn't fail during initdb on my >> machine. > This doesn't make much sense or help much, since 372728b doesn't actually > change the catalogs, or any .c file. It could make sense if some part of the toolchain that was previously used to generate postgres.bki doesn't work right on that machine. Overall though I'd have thought that 372728b would increase not decrease our toolchain footprint. It also seems unlikely that a recent Ubuntu release would contain toolchain bugs that we hadn't already heard about. > You used make clean too, right ? Really, when bisecting, you need to use "make distclean" or even "git clean -dfx" between steps, or you may get bogus results, because our makefiles aren't that great about tracking dependencies, especially when you move backwards in the history. So perhaps a more plausible theory is that this bisection result is wrong because you weren't careful enough. regards, tom lane
Re: [PATCH] Identify LWLocks in tracepoints
On Tue, 13 Apr 2021 at 21:40, Craig Ringer wrote: > Findings: > > * A probe without arguments or with simple arguments is just a 'nop' > instruction > * Probes that require function calls, pointer chasing, other > expression evaluation etc may impose a fixed cost to collect up > arguments even if the probe is disabled. > * SDT semaphores can avoid that cost but add a branch, so should > probably be avoided unless preparing probe arguments is likely to be > expensive. Back to the topic directly at hand. Attached a disassembly of what LWLockAcquire looks like now on my current build of git master @ 5fe83adad9efd5e3929f0465b44e786dc23c7b55 . This is an --enable-debug --enable-cassert --enable-dtrace build with -Og -ggdb3. The three tracepoints in it are all of the form: movzwl 0x0(%r13),%edi call 0x801c49 nop so it's clear we're doing redundant calls to GetLWTrancheName(), as expected. Not ideal. Now if I patch it to add the _ENABLED() guards on all the tracepoints, the probes look like this: 0x00803176 <+200>: cmpw $0x0,0x462da8(%rip)# 0xc65f26 0x0080317e <+208>: jne0x80328b other interleaved code ... 0x0080328b <+477>: movzwl 0x0(%r13),%edi 0x00803290 <+482>: call 0x801c49 0x00803295 <+487>: nop 0x00803296 <+488>: jmp0x803184 so we avoid the GetLWTrancheName() call at the cost of a test and possible branch, and a small expansion in total function size. Without the semaphores, LWLockAcquire is 463 bytes. With them, it's 524 bytes, which is nothing to sneeze at for code that doesn't do anything 99.999% of the time, but we avoid a bunch of GetLWTrancheName() calls. If I instead replace T_NAME() with NULL for all tracepoints in LWLockAcquire, the disassembly shows that the tracepoints now become a simple 0x00803176 <+200>: nop which is pretty hard to be concerned about. So at the very least we should be calling GetLWTrancheName() once at the start of the function if built with dtrace support and remembering the value, instead of calling it for each tracepoint. And omitting the tranche name looks like it might be sensible for the LWLock code. In most places it won't matter, but LWLocks are hot enough that it possibly might. A simple pg_regress run hits LWLockAcquire 25 million times, so that's 75 million calls to GetLWTrancheName().
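Roughly what I have in mind, as a sketch rather than a finished patch: fetch the tranche name at most once per call and guard the probes, so nothing beyond a test-and-branch happens while no probe is attached. The _ENABLED() macros come from the dtrace-generated probes.h, and the dummy probes.h defines them to 0, so as far as I can tell this compiles the same either way:

const char *tranche_name = NULL;

/* Fetch the name once, and only if some probe is actually attached. */
if (TRACE_POSTGRESQL_LWLOCK_ACQUIRE_ENABLED() ||
	TRACE_POSTGRESQL_LWLOCK_WAIT_START_ENABLED())
	tranche_name = T_NAME(lock);

if (TRACE_POSTGRESQL_LWLOCK_ACQUIRE_ENABLED())
	TRACE_POSTGRESQL_LWLOCK_ACQUIRE(tranche_name, mode);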
Re: Extensions not dumped when --schema is used
On Tue, Apr 13, 2021 at 02:43:11PM +0900, Michael Paquier wrote: > On Sun, Apr 04, 2021 at 03:08:02PM -0700, Noah Misch wrote: > > I noticed the patch's behavior for relations that are members of non-dumped > > extensions and are also registered using pg_extension_config_dump(). It > > depends on the schema: > > > > - If extschema='public', "pg_dump -e plpgsql" makes no mention of the > > relations. > > This one is expected to me. The caller of pg_dump is not specifying > the extension that should be dumped, hence it looks logic to me to not > dump the tables marked as pg_extension_config_dump() part as an > extension not listed. Agreed. > > - If extschema='public', "pg_dump -e plpgsql --schema=public" includes > > commands to dump the relation data. This surprised me. (The > > --schema=public argument causes selectDumpableNamespace() to set > > nsinfo->dobj.dump=DUMP_COMPONENT_ALL instead of DUMP_COMPONENT_ACL.) > > This one would be expected to me. Per the discussion of upthread, we > want --schema and --extension to be two separate and exclusive > switches. So, once the caller specifies --schema we should dump the > contents of the schema, even if its extension is not listed with > --extension. I may disagree with this later, but I'm setting it aside for the moment. > Anyway, the behavior to select if a schema can be dumped > or not is not really related to this new code, right? And "public" is > a mixed beast, being treated as a system object and a user object by > selectDumpableNamespace(). Correct. > > - If extschema is not any sort of built-in schema, "pg_dump -e plpgsql" > > includes commands to dump the relation data. This surprised me. > > Hmm. But you are right that this one is inconsistent with the first > case where the extension is not listed. I would have said that as the > extension is not directly specified and that the schema is not passed > down either then we should not dump it at all, but this combination > actually does so. Maybe we should add an extra logic into > selectDumpableNamespace(), close to the end of it, to decide if, > depending on the contents of the extensions to include, we should dump > its associated schema or not? Another solution would be to make use > of schema_include_oids to filter out the schemas we don't want, but > that would mean that --extension gets priority over --schema or > --table but we did ot want that per the original discussion. No, neither of those solutions apply. "pg_dump -e plpgsql" selects all schemas. That is consistent with its documentation; plain "pg_dump" has long selected all schemas, and the documentation for "-e" does not claim that "-e" changes the selection of non-extension objects. We're not going to solve the problem by making selectDumpableNamespace() select some additional aspect of schema foo, because it's already selecting every available aspect. Like you say, we're also not going to solve the problem by removing some existing aspect of schema foo from selection, because that would remove dump material unrelated to any extension. This isn't a problem of selecting schemas for inclusion in the dump. This is a problem of associating extensions with their pg_extension_config_dump() relations and omitting those extension-member relations when "-e" causes omission of the extension.
Re: TRUNCATE on foreign table
On 2021/04/13 23:25, Kohei KaiGai wrote: 2021年4月13日(火) 21:03 Bharath Rupireddy : Yeah, ONLY clause is not pushed to the remote server in case of SELECT commands. This is also true for DELETE and UPDATE commands on foreign tables. This sounds reasonable reason why ONLY should be ignored in TRUNCATE on foreign tables, for now. If there is the existing rule about how to treat ONLY clause for foreign tables, basically TRUNCATE should follow that at this stage. Maybe we can change the rule, but it's an item for v15 or later? I'm not sure if it wasn't thought necessary or if there is an issue to push it or I may be missing something here. I could not find the past discussion about foreign tables and ONLY clause. I guess that ONLY is ignored in SELECT on foreign tables case because ONLY is interpreted outside the executor and it's not easy to change the executor so that ONLY is passed to FDW. Maybe.. I think we can start a separate thread to see other hackers' opinions on this. I'm not sure whether all the clauses that are possible for SELECT/UPDATE/DELETE/INSERT with local tables are pushed to the remote server by postgres_fdw. Well, now foreign TRUNCATE pushes the ONLY clause to the remote server which is inconsistent when compared to SELECT/UPDATE/DELETE commands. If we were to keep it consistent across all foreign commands that ONLY clause is not pushed to remote server, then we can restrict for TRUNCATE too and even if "TRUNCATE ONLY foreign_tbl" is specified, just pass "TRUNCATE foreign_tbl" to remote server. Having said that, I don't see any real problem in pushing the ONLY clause, at least in case of TRUNCATE. If ONLY-clause would be pushed down to the remote query of postgres_fdw, what does the foreign-table represent in the local system? In my understanding, a local foreign table by postgres_fdw is a representation of entire tree of the remote parent table and its children. If so, I'm still wondering why CASCADE/RESTRICT (i.e., DropBehavior) needs to be passed to FDW. IOW, if a foreign table is an abstraction of an external data source, ISTM that postgres_fdw should always issue TRUNCATE with CASCADE. Why do we need to allow RESTRICT to be specified for a foreign table even though it's an abstraction of an external data source? Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
Re: Proposal: Save user's original authenticated identity for logging
On Wed, 2021-04-07 at 10:20 +0900, Michael Paquier wrote: > Anyway, using a FATAL in this code path is fine by me at the end, so I > have applied the patch. Let's see now what the buildfarm thinks about > it. Looks like the farm has gone green, after some test fixups. Thanks for all the reviews! --Jacob
Re: ALTER TABLE .. DETACH PARTITION CONCURRENTLY
On 2021-Apr-13, Amit Langote wrote: > Actually it occurred to me this morning that CLOBBER_CACHE_ALWAYS is > what exposed this problem on this animal (not sure if other such > animals did too though). With CLOBBER_CACHE_ALWAYS, a PartitionDesc > will be built afresh on most uses. In this particular case, the RI > query executed by the insert has to build a new one (for d4_primary), > correctly excluding the detach-pending partition (d4_primary1) per the > snapshot with which it is run. In normal builds, it would reuse the > one built by an earlier query in the transaction, which does include > the detach-pending partition, thus allowing the insert trying to > insert a row referencing that partition to succeed. There is a > provision in RelationGetPartitionDesc() to rebuild if any > detach-pending partitions in the existing copy of PartitionDesc are > not to be seen by the current query, but a bug mentioned in my earlier > reply prevents that from kicking in. Right -- that explanation makes perfect sense: the problem stems from the fact that the cached copy of the partition descriptor is not valid depending on the visibility of detached partitions for the operation that requests the descriptor. I think your patch is a critical part to a complete solution, but one thing is missing: we don't actually know that the detached partitions we see now are the same detached partitions we saw a moment ago. After all, a partitioned table can have several partitions in the process of being detached; so if you just go with the boolean "does it have any detached or not" bit, you could end up with a descriptor that doesn't include/ignore *all* detached partitions, just the older one(s). I think you could fix this aspect easily by decreeing that you can only have only one partition-being-detached at one point. So if you try to DETACH CONCURRENTLY and there's another one in that state, raise an error. Maybe for simplicity we should do that anyway. But I think there's another hidden assumption in your patch, which is that the descriptor is rebuilt every now and then *anyway* because the flag for detached flips between parser and executor, and because we send invalidation messages for each detach. I don't think we would ever change things that would break this flipping (it sounds like planner and executor necessarily have to be doing things differently all the time), but it seems fragile as heck. I would feel much safer if we just avoided caching the wrong thing ... or perhaps keep a separate cache entry (one descriptor including detached, another one not), to avoid pointless rebuilds. -- Álvaro Herrera Valdivia, Chile
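To be explicit about the "separate cache entry" idea, the shape would be something like the following — the second field name is an invention, not what the current code does:

/* in RelationData (rel.h); the second field is hypothetical */
PartitionDesc rd_partdesc;				/* includes detach-pending partitions */
PartitionDesc rd_partdesc_nodetached;	/* same, but with detach-pending
										 * partitions omitted */

RelationGetPartitionDesc() would then hand back one or the other depending on whether the caller's snapshot should see the detached partition, instead of reusing a descriptor that was built under different visibility rules.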
Proposal for working open source with PostgreSQL
Hello Sir/Madam, I'm Nandni Mehla, a sophomore currently pursuing B.Tech in IT from Indira Gandhi Delhi Technical University for Women, Delhi. I've recently started working on open source and I think I will be a positive addition to your organization for working on projects using C and SQL, as I have experience in these, and I am willing to learn more from you. I am attaching my proposal in this email for your reference, please guide me through this. Regards. Proposal Link: https://docs.google.com/document/d/1H84WmzZbMERPrjsnXbvoQ7W2AaKsM8eJU02SNw7vQBk/edit?usp=sharing
Re: More sepgsql weirdness
On Tue, Apr 13, 2021 at 10:33 AM Dave Page wrote: > On a system with selinux and sepgsql configured, search path resolution > appears to fail if sepgsql is in enforcing mode, but selinux is in permissive > mode (which, as I understand it, should cause sepgsql to behave as if it's in > permissive mode anyway - and does for other operations). Regardless of > whether my understanding of the interaction of the two permissive modes is > correct, I don't believe the following should happen: I agree that this sounds like something which shouldn't happen if the system is in permissive mode, but I think the behavior itself is deliberate. See OAT_NAMESPACE_SEARCH and commit e965e6344cfaff0708a032721b56f61eea777bc5. -- Robert Haas EDB: http://www.enterprisedb.com
Re: pg_upgrade check for invalid role-specific default config
Tom wrote: > I do find it interesting that we now have two reports of somebody > doing "ALTER ROLE SET role = something". In the older thread, > I was skeptical that that had any real use-case, so I wonder if > Charlie has a rationale for having done that. Unfortunately I haven't heard back from the original developer who set up this role configuration, but if I do then I will share their intentions. In any case the invalid configuration had been removed from every other role except one (certainly by mistake), which led to me rediscovering this issue. I tested the above patch with the invalid data locally and it avoids the restore error that we ran into previously. Also it requires no intervention to progress with pg_upgrade, unlike my initial idea of adding a check, so it is definitely simpler from a user perspective. Thank you for taking a deep look into this and finding a better solution. Best regards, Charlie Hornsby
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
On 2021-Apr-12, Bruce Momjian wrote: > OK, the attached patch renames pg_stat_activity.queryid to 'query_id'. I > have not changed any of the APIs which existed before this feature was > added, and are called "queryid" or "queryId" --- it is kind of a mess. > I assume I should leave those unchanged. It will also need a catversion > bump. I think it is fine actually. These names appear in structs Query and PlannedStmt, and every single member of those already uses camelCase naming. Changing those to use "query_id" would look out of place. You did change the one in PgBackendStatus to st_query_id, which also matches the naming style in that struct, so that looks fine also. So I'm -1 on Julien's first proposed change, and +1 on his second proposed change (the name of the first argument of pgstat_report_query_id should be query_id). -- Álvaro Herrera Valdivia, Chile
Re: Binary search in ScalarArrayOpExpr for OR'd constant arrays
On Mon, Apr 12, 2021 at 10:07 PM James Coleman wrote: > > On Mon, Apr 12, 2021 at 7:49 PM David Rowley wrote: > > > > On Tue, 13 Apr 2021 at 11:42, Tom Lane wrote: > > > > > > David Rowley writes: > > > > I realised when working on something unrelated last night that we can > > > > also do hash lookups for NOT IN too. > > > > > > ... and still get the behavior right for nulls? > > > > Yeah, it will. There are already some special cases for NULLs in the > > IN version. Those would need to be adapted for NOT IN. > > I hadn't thought about using the negator operator directly that way > when I initially wrote the patch. > > But also I didn't think a whole lot about the NOT IN case at all -- > and there's no mention of such that I see in this thread or the > precursor thread. It's pretty obvious that it wasn't part of my > immediate need, but obviously it'd be nice to have the consistency. > > All that to say this: my vote would be to put it into PG15 also. ...and here's a draft patch. I can take this to a new thread if you'd prefer; the one here already got committed, on the other hand this is pretty strongly linked to this discussion, so I figured it made sense to post it here. James From 08c37c5b228a0a7e9546a481a789eb1c384fcfc7 Mon Sep 17 00:00:00 2001 From: jcoleman Date: Tue, 13 Apr 2021 13:36:38 -0400 Subject: [PATCH v1] Add HashedScalarArrayOp support for NOT IN --- src/backend/optimizer/prep/prepqual.c | 1 + src/backend/optimizer/util/clauses.c | 4 +- src/backend/parser/parse_oper.c | 1 + src/include/nodes/primnodes.h | 1 + src/test/regress/expected/expressions.out | 90 +++ src/test/regress/sql/expressions.sql | 31 6 files changed, 126 insertions(+), 2 deletions(-) diff --git a/src/backend/optimizer/prep/prepqual.c b/src/backend/optimizer/prep/prepqual.c index 42c3e4dc04..2c29993b13 100644 --- a/src/backend/optimizer/prep/prepqual.c +++ b/src/backend/optimizer/prep/prepqual.c @@ -129,6 +129,7 @@ negate_clause(Node *node) newopexpr->opfuncid = InvalidOid; newopexpr->hashfuncid = InvalidOid; newopexpr->useOr = !saopexpr->useOr; + newopexpr->isNegator = true; newopexpr->inputcollid = saopexpr->inputcollid; newopexpr->args = saopexpr->args; newopexpr->location = saopexpr->location; diff --git a/src/backend/optimizer/util/clauses.c b/src/backend/optimizer/util/clauses.c index 526997327c..99e688426e 100644 --- a/src/backend/optimizer/util/clauses.c +++ b/src/backend/optimizer/util/clauses.c @@ -2137,8 +2137,8 @@ convert_saop_to_hashed_saop_walker(Node *node, void *context) Oid lefthashfunc; Oid righthashfunc; - if (saop->useOr && arrayarg && IsA(arrayarg, Const) && - !((Const *) arrayarg)->constisnull && + if ((saop->useOr || (!saop->useOr && saop->isNegator)) && arrayarg && + IsA(arrayarg, Const) && !((Const *) arrayarg)->constisnull && get_op_hash_functions(saop->opno, &lefthashfunc, &righthashfunc) && lefthashfunc == righthashfunc) { diff --git a/src/backend/parser/parse_oper.c b/src/backend/parser/parse_oper.c index c379d72fce..222f15719a 100644 --- a/src/backend/parser/parse_oper.c +++ b/src/backend/parser/parse_oper.c @@ -896,6 +896,7 @@ make_scalar_array_op(ParseState *pstate, List *opname, result->opfuncid = opform->oprcode; result->hashfuncid = InvalidOid; result->useOr = useOr; + result->isNegator = strcmp("<>", NameStr(opform->oprname)) == 0; /* inputcollid will be set by parse_collate.c */ result->args = args; result->location = location; diff --git a/src/include/nodes/primnodes.h b/src/include/nodes/primnodes.h index 9ae851d847..819856395e 100644 --- 
a/src/include/nodes/primnodes.h +++ b/src/include/nodes/primnodes.h @@ -592,6 +592,7 @@ typedef struct ScalarArrayOpExpr Oid opfuncid; /* PG_PROC OID of comparison function */ Oid hashfuncid; /* PG_PROC OID of hash func or InvalidOid */ bool useOr; /* true for ANY, false for ALL */ + bool isNegator; /* true if NOT has been applied to opno */ Oid inputcollid; /* OID of collation that operator should use */ List *args; /* the scalar and array operands */ int location; /* token location, or -1 if unknown */ diff --git a/src/test/regress/expected/expressions.out b/src/test/regress/expected/expressions.out index 5944dfd5e1..2e88f1ca19 100644 --- a/src/test/regress/expected/expressions.out +++ b/src/test/regress/expected/expressions.out @@ -216,6 +216,61 @@ select return_text_input('a') in ('a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', ' t (1 row) +-- NOT IN +select return_int_input(1) not in (10, 9, 2, 8, 3, 7, 4, 6, 5, 1); + ?column? +-- + f +(1 row) + +select return_int_input(1) not in (10, 9, 2, 8, 3, 7, 4, 6, 5, 0); + ?column? +-- + t +(1 row) + +select return_int_input(1) not in (10, 9, 2, 8, 3, 7, 4, 6, 5, 2, null); + ?column? +-- + +(1 row) + +select return_int_input(1) not in (10, 9, 2, 8, 3, 7, 4, 6, 5, 1, null); + ?col
Re: Have I found an interval arithmetic bug?
> On 12-Apr-2021, at 17:25, Bruce Momjian wrote: > > On Mon, Apr 12, 2021 at 05:20:43PM -0700, Bryn Llewellyn wrote: >> I’d argue that the fact that this: >> >> ('0.3 months'::interval) + ('0.7 months'::interval) >> >> Is reported as '30 days' and not '1 month' is yet another >> bug—precisely because of what I said in my previous email (sorry >> that I forked the thread) where I referred to the fact that, in the >> right test, adding 1 month gets a different answer than adding 30 >> days. > > Flowing _up_ is what these functions do: > > \df *justify* > List of functions > Schema | Name | Result data type | Argument data types > | Type > > +--+--+-+-- >pg_catalog | justify_days | interval | interval > | func >pg_catalog | justify_hours| interval | interval > | func >pg_catalog | justify_interval | interval | interval > | func > >> Yet another convincing reason to get rid of this flow down >> business altogether. > > We can certainly get rid of all downflow, which in the current patch is > only when fractional internal units are specified. > >> If some application wants to model flow-down, then it can do so with >> trivial programming and full control over its own definition of the >> rules. “Yes please!” re Bruce’s “We can certainly get rid of all downflow, which in the current patch is only when fractional internal units are specified.” Notice that a user who creates interval values explicitly only by using “make_interval()” will see no behavior change. There’s another case of up-flow. When you subtract one timestamp value from another, and when they’re far enough apart, then the (internal representation of the) resulting interval value has both a seconds component and a days component. (But never, in my tests, a months component.) I assume that the implementation first gets the difference between the two timestamp values in seconds using (the moral equivalent of) “extract epoch”. And then, if this is greater than 24*60*60, it implements up-flow using the “rule-of-24”—never mind that this means that if you add the answer back to the timestamp value that you subtracted, then you might not get the timestamp value from which you subtracted. (This happens around a DST change and has been discussed earlier in the thread.) The purpose of these three “justify” functions is dubious. I think that it’s best to think of the 3-component interval vector like an [x, y, z] vector in 3d geometric space, where the three coordinates are not mutually convertible because each has unique semantics. This point has been rehearsed many times in this long thread. Having said this, it isn’t hard to understand the rules that the functions implement. And, most importantly, their use is voluntary. They are, though, no more than shipped and documented wrappers for a few special cases. A user could so easily write their own function like this: 1. Compute the values of the three components of the internal representation of the passed-in interval value using the “extract” feature and some simple arithmetic. 2. Derive the [minutes, days, seconds] values of a new representation using whatever rules you feel for. 3. Use these new values to create the return interval value. For example, I might follow a discipline to use interval values that have only one of the three components of the internal representation non-zero. It’s easy to use the “domain” feature for this. (I can use these in any context where I can use the shipped interval.) 
I could then write a function to convert a “pure seconds” value of my_interval to a “pure years” value. And I could implement my own rules: — Makes sense only for a large number of seconds that comes out to at least five years (else assert failure). — Converts seconds to years using the rule that 1 year is, on average, 365.25*24*60*60 seconds, and then truncates it. There’s no shipped function that does this, and this makes me suspect that I’d prefer to roll my own for any serious purpose. The existence of the three “justify” functions is, therefore, harmless.
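For anyone following along, the three-component "interval vector" being discussed is literally how the value is stored (see Interval in datatype/timestamp.h); the fields are kept separate precisely because they do not convert into one another exactly:

typedef struct Interval
{
	TimeOffset	time;	/* all time units finer than a day, in microseconds */
	int32		day;	/* days, kept apart because a day is not always 24 hours */
	int32		month;	/* months and years, kept apart because a month is not
						 * always 30 days */
} Interval;

The justify_* functions and the spill-down rules debated above are just different policies for moving quantities between these three fields.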
Re: ALTER TABLE .. DETACH PARTITION CONCURRENTLY
On Sat, Apr 10, 2021 at 01:42:26PM -0500, Justin Pryzby wrote: > On Sun, Mar 21, 2021 at 03:01:15PM -0300, Alvaro Herrera wrote: > > > But note that it doesn't check if an existing constraint "implies" the new > > > constraint - maybe it should. > > > > Hm, I'm not sure I want to do that, because that means that if I later > > have to attach the partition again with the same partition bounds, then > > I might have to incur a scan to recheck the constraint. I think we want > > to make the new constraint be as tight as possible ... > > If it *implies* the partition constraint, then it's at least as tight (and > maybe tighter), yes ? > > I think you're concerned with the case that someone has a partition with > "tight" bounds like (a>=200 and a<300) and a check constraint that's "less > tight" like (a>=100 AND a<400). In that case, the loose check constraint > doesn't imply the tighter partition constraint, so your patch would add a > non-redundant constraint. > > I'm interested in the case that someone has a check constraint that almost but > not exactly matches the partition constraint, like (a<300 AND a>=200). In > that > case, your patch adds a redundant constraint. > > I wrote a patch which seems to effect my preferred behavior - please check. On Sat, Apr 10, 2021 at 02:13:26PM -0500, Justin Pryzby wrote: > I suppose the docs should be updated for the exact behavior, maybe even > without > this patch: > > |A CHECK constraint > |that duplicates the partition constraint is added to the partition. I added this as an Opened Item, since it affects user-visible behavior: whether or not a redundant, non-equal constraint is added. -- Justin
Re: [PATCH] Identify LWLocks in tracepoints
On Mon, Apr 12, 2021 at 11:06 PM Andres Freund wrote: > No, they have to be the same in each. Note how the tranche ID is part of > struct LWLock. Which is why LWLockNewTrancheId() has to acquire a lock > etc. More precisely, if a tranche ID is defined in multiple backends, it needs to be defined the same way in all of them. But it is possible to have an extension loaded into some backends and not others and have it define a tranche ID that other backends know nothing about. Another point to note is that, originally, I had an idea that each tranche of lwlocks was situation in a single array somewhere in memory. Perhaps that was an array of something else, like buffer descriptors, and the lwlocks were just one element of the struct, or maybe it was an array specifically of LWLocks, but one way or the other, there was definitely one array that had all the LWLocks from that tranche in it. So before the commit in question -- 3761fe3c20bb040b15f0e8da58d824631da00caa -- T_ID() used to compute an offset for a lock within the tranche that was supposed to uniquely identify the lock. However, the whole idea of an array per tranche turns out to be broken by design. Consider parallel query. You could, perhaps, arrange for all the LWLocks that a particular query needs to be in one tranche. And that's all fine. But what if there are multiple parallel contexts in existence at the same time? I think right now that may be impossible as a practical matter, since for example an SQL function that is called by a parallel query is supposed to run any SQL statements inside of it without parallelism. But, that's basically a policy decision. There's nothing in the parallel context machinery itself which prevents multiple parallel contexts from being active at the same time. And if that happens, then you'd have multiple arrays with the same tranche ID, so how do you identify the locks then? The pre-3761fe3c20bb040b15f0e8da58d824631da00caa data structure doesn't work because it has only one place to store an array base, but having multiple places to store an array base doesn't fix it either because now you've just given the same identifier to multiple locks. You could maybe fix it by putting a limit on how many parallel contexts can be open at the same time, and then having N copies of each parallelism-related tranche. But that seems ugly and messy and a burden on extension authors and not really what anybody wants. You could try to identify locks by pointer addresses, but that's got security hazards and the addreses aren't portable across all the backends involved in the parallel query because of how DSM works, so it's not really that helpful in terms of matching stuff up. You could identify every lock by a tranche ID + an array offset + a "tranche instance ID". But where would you store the tranche instance ID to make it readily accessible, other than in the lock itself? Andres wasn't thrilled about using even 2 bytes to identify the LWLock, so he'll probably like having more bytes in there for that kind of thing even less. And to be honest I wouldn't blame him. We only need 12 bytes to implement the lock itself -- we can't justify having more than a couple of additional bytes for debugging purposes. On a broader level, I agree that being able to find out what the system is doing is really important. But I'm also not entirely convinced that having really fine-grained information here to distinguish between one lock and another is the way to get there. 
Personally, I've never run into a problem where I really needed to know anything more than the tranche name. Like, I've seen problems where, for example, we can see that there's a lot of contention on SubtransSLRULock, or that there are problems with WALInsertLock. But I can't really see why I'd need to know which WALInsertLock was experiencing contention. If we were speaking of buffer content locks, I suppose I can imagine wanting more details, but it's not really the buffer number I'd want to know. I'd want to know the database OID, the relfilenode, the fork number, and the block number. You can argue that we should just expose the buffer number and let the user sort out the rest with dtrace/systemtap magic, but that makes it useless in practice to an awful lot of people, including me. -- Robert Haas EDB: http://www.enterprisedb.com
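As a concrete illustration of the "tranche name is usually enough" point: this is the level of detail the existing SQL-visible instrumentation already gives, and it is often all that is needed to spot a hot tranche.

SELECT wait_event, count(*) AS backends
FROM pg_stat_activity
WHERE wait_event_type = 'LWLock'
GROUP BY wait_event
ORDER BY backends DESC;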
Re: pg_upgrade check for invalid role-specific default config
Charlie Hornsby writes: > I tested the above patch with the invalid data locally and it avoids > the restore error that we ran into previously. Also it requires no > intervention to progress with pg_upgrade unlike my initial idea of > adding an check, so it is definitely simpler from a user perspective. Thanks for testing! I've pushed this, so it will be in the May minor releases. regards, tom lane
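For context, the settings involved here are the per-role, per-database defaults recorded in pg_db_role_setting; a sketch of how such an entry is created and inspected (role, database, and parameter names are made up):

ALTER ROLE some_role IN DATABASE some_db SET work_mem = '64MB';

SELECT d.datname, r.rolname, s.setconfig
FROM pg_db_role_setting s
LEFT JOIN pg_database d ON d.oid = s.setdatabase
LEFT JOIN pg_roles r ON r.oid = s.setrole;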
Re: New IndexAM API controlling index vacuum strategies
On Mon, Apr 12, 2021 at 11:05 PM Masahiko Sawada wrote: > I realized that when the failsafe is triggered, we don't bypass heap > truncation that is performed before updating relfrozenxid. I think > it's better to bypass it too. What do you think? I agree. Bypassing heap truncation is exactly the kind of thing that risks adding significant, unpredictable delay at a time when we need to advance relfrozenxid as quickly as possible. I pushed a trivial commit that makes the failsafe bypass heap truncation as well just now. Thanks -- Peter Geoghegan
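For context, the failsafe being discussed kicks in once a table's relfrozenxid age crosses a threshold GUC; assuming vacuum_failsafe_age is the setting added by this work, something like the following shows how close tables are to that point:

SHOW vacuum_failsafe_age;

SELECT relname, age(relfrozenxid) AS xid_age
FROM pg_class
WHERE relkind = 'r'
ORDER BY age(relfrozenxid) DESC
LIMIT 10;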
Re: [PATCH] Identify LWLocks in tracepoints
Hi, On 2021-04-13 14:25:23 -0400, Robert Haas wrote: > On Mon, Apr 12, 2021 at 11:06 PM Andres Freund wrote: > You could identify every lock by a tranche ID + an array offset + a > "tranche instance ID". But where would you store the tranche instance > ID to make it readily accessible, other than in the lock itself? > Andres wasn't thrilled about using even 2 bytes to identify the > LWLock, so he'll probably like having more bytes in there for that > kind of thing even less. I still don't like the two bytes, fwiw ;). Especially because it's 4 bytes due to padding right now. I'd like to move the LWLock->waiters list to outside of the lwlock itself - at most TotalProcs LWLocks can be waited for, so we don't need millions of empty proclist_heads. That way we could also remove the proclist indirection - which shows up a fair bit in contended workloads. And if we had a separate "lwlocks being waited for" structure, we could also add more information to it if we wanted to... The difficulty of course is having space to indicate which of these "waiting for" lists are being used - there's not enough space in ->state right now to represent that. Two possibile approaches: - We could make it work if we restricted MAX_BACKENDS to be 2**14 - but while I personally think that's a sane upper limit, I already had a hard time getting consensus to lower the limit to 2^18-1. - Use a 64bit integer. Then we can easily fit MAX_BACKENDS lockers, as well as an offset to one of MAX_BACKENDS "wait lists" into LWLock. It's not so much that I want to lower the overall memory usage (although it doesn't hurt). It's more about being able to fit more data into one cacheline together with the lwlock. E.g. being able to fit more into BufferDesc would be very useful. A secondary benefit of such an approach would be that it it makes it a lot easier to add efficient adaptive spinning on contended locks. I did experiment with that, and there's some considerable potential for performance benefits there. But for it to scale well we need something similar to "mcs locks", to avoid causing too much contention. And that pretty much requires some separate space to store wait information anyway. With an 8 bytes state we probably could also stash the tranche inside that... > On a broader level, I agree that being able to find out what the > system is doing is really important. But I'm also not entirely > convinced that having really fine-grained information here to > distinguish between one lock and another is the way to get there. > Personally, I've never run into a problem where I really needed to > know anything more than the tranche name. I think it's quite useful for relatively simple things like analyzing the total amount of time spent in individual locks, without incuring much overhead when not doing so (for which you need to identify individual locks, otherwise your end - start time is going to be meaningless). And, slightly more advanced, for analyzing what the stack was when the lock was released - which then allows you to see what work you're blocked on, something I found hard to figure out otherwise. I found that that's mostly quite doable with dynamic probes though. > Like, I've seen problems for example we can see that there's a lot of > contention on SubtransSLRULock, or there's problems with > WALInsertLock. But I can't really see why I'd need to know which > WALInsertLock was experiencing contention. Well, but you might want to know what the task blocking you was doing. 
What to optimize might differ depending on whether the other task is e.g. a log switch (which acquires all insert locks) or WAL writes by VACUUM. > If we were speaking of buffer content locks, I suppose I can imagine > wanting more details, but it's not really the buffer number I'd want > to know. I'd want to know the database OID, the relfilenode, the fork > number, and the block number. You can argue that we should just expose > the buffer number and let the user sort out the rest with > dtrace/systemtap magic, but that makes it useless in practice to an > awful lot of people, including me. I have wondered if we ought to put some utilities for that in contrib or such. It's a lot easier to address something new with a decent starting point... Greetings, Andres Freund
Re: Retry in pgbench
Hi, On Tue, 13 Apr 2021 16:12:59 +0900 (JST) Tatsuo Ishii wrote: > [...] > [...] > [...] > > Thanks for the pointer. It seems we need to resume the discussion. By the way, I've been playing with the idea of failing gracefully and retrying indefinitely (or until the given -T expires) on SQL errors AND connection issues. It would be useful to test replicating clusters with a (switch|fail)over procedure. Regards,
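To make the failure mode concrete: this is roughly the built-in pgbench transaction, run explicitly at REPEATABLE READ (simplified, using the standard pgbench table names). With several concurrent clients, some of these UPDATEs fail with "could not serialize access due to concurrent update", and those are the transactions a retry feature would re-run.

BEGIN ISOLATION LEVEL REPEATABLE READ;
UPDATE pgbench_accounts SET abalance = abalance + 10 WHERE aid = 1;
SELECT abalance FROM pgbench_accounts WHERE aid = 1;
UPDATE pgbench_tellers SET tbalance = tbalance + 10 WHERE tid = 1;
UPDATE pgbench_branches SET bbalance = bbalance + 10 WHERE bid = 1;
INSERT INTO pgbench_history (tid, bid, aid, delta, mtime)
    VALUES (1, 1, 1, 10, CURRENT_TIMESTAMP);
COMMIT;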
Converting contrib SQL functions to new style
Attached are some draft patches to convert almost all of the contrib modules' SQL functions to use SQL-standard function bodies. The point of this is to remove the residual search_path security hazards that we couldn't fix in commits 7eeb1d986 et al. Since a SQL-style function body is fully parsed at creation time, its object references are not subject to capture by the run-time search path. Possibly there are small performance benefits too, though I've not tried to measure that. I've not touched the documentation yet. I suppose that we can tone down the warnings added by 7eeb1d986 quite a bit, maybe replacing them with just "be sure to use version x.y or later". However I think we may still need an assumption that earthdistance and cube are in the same schema --- any comments on that? I'd like to propose squeezing these changes into v14, even though we're past feature freeze. Reason one is that this is less a new feature than a security fix; reason two is that this provides some non-artificial test coverage for the SQL-function-body feature. BTW, there still remain a couple of old-style SQL functions in contrib/adminpack and contrib/lo. AFAICS those are unconditionally secure, so I didn't bother with them. Thoughts? regards, tom lane diff --git a/contrib/citext/Makefile b/contrib/citext/Makefile index a7de52928d..d879f4eb0b 100644 --- a/contrib/citext/Makefile +++ b/contrib/citext/Makefile @@ -4,6 +4,7 @@ MODULES = citext EXTENSION = citext DATA = citext--1.4.sql \ + citext--1.6--1.7.sql \ citext--1.5--1.6.sql \ citext--1.4--1.5.sql \ citext--1.3--1.4.sql \ diff --git a/contrib/citext/citext--1.6--1.7.sql b/contrib/citext/citext--1.6--1.7.sql new file mode 100644 index 00..51a7edf2bc --- /dev/null +++ b/contrib/citext/citext--1.6--1.7.sql @@ -0,0 +1,65 @@ +/* contrib/citext/citext--1.6--1.7.sql */ + +-- complain if script is sourced in psql, rather than via ALTER EXTENSION +\echo Use "ALTER EXTENSION citext UPDATE TO '1.7'" to load this file. \quit + +-- +-- Matching citext in string comparison functions. +-- XXX TODO Ideally these would be implemented in C. 
+-- + +CREATE OR REPLACE FUNCTION regexp_match( citext, citext ) RETURNS TEXT[] +LANGUAGE SQL IMMUTABLE STRICT PARALLEL SAFE +RETURN pg_catalog.regexp_match( $1::pg_catalog.text, $2::pg_catalog.text, 'i' ); + +CREATE OR REPLACE FUNCTION regexp_match( citext, citext, text ) RETURNS TEXT[] +LANGUAGE SQL IMMUTABLE STRICT PARALLEL SAFE +RETURN pg_catalog.regexp_match( $1::pg_catalog.text, $2::pg_catalog.text, CASE WHEN pg_catalog.strpos($3, 'c') = 0 THEN $3 || 'i' ELSE $3 END ); + +CREATE OR REPLACE FUNCTION regexp_matches( citext, citext ) RETURNS SETOF TEXT[] +LANGUAGE SQL IMMUTABLE STRICT PARALLEL SAFE ROWS 1 +RETURN pg_catalog.regexp_matches( $1::pg_catalog.text, $2::pg_catalog.text, 'i' ); + +CREATE OR REPLACE FUNCTION regexp_matches( citext, citext, text ) RETURNS SETOF TEXT[] +LANGUAGE SQL IMMUTABLE STRICT PARALLEL SAFE ROWS 10 +RETURN pg_catalog.regexp_matches( $1::pg_catalog.text, $2::pg_catalog.text, CASE WHEN pg_catalog.strpos($3, 'c') = 0 THEN $3 || 'i' ELSE $3 END ); + +CREATE OR REPLACE FUNCTION regexp_replace( citext, citext, text ) returns TEXT +LANGUAGE SQL IMMUTABLE STRICT PARALLEL SAFE +RETURN pg_catalog.regexp_replace( $1::pg_catalog.text, $2::pg_catalog.text, $3, 'i'); + +CREATE OR REPLACE FUNCTION regexp_replace( citext, citext, text, text ) returns TEXT +LANGUAGE SQL IMMUTABLE STRICT PARALLEL SAFE +RETURN pg_catalog.regexp_replace( $1::pg_catalog.text, $2::pg_catalog.text, $3, CASE WHEN pg_catalog.strpos($4, 'c') = 0 THEN $4 || 'i' ELSE $4 END); + +CREATE OR REPLACE FUNCTION regexp_split_to_array( citext, citext ) RETURNS TEXT[] +LANGUAGE SQL IMMUTABLE STRICT PARALLEL SAFE +RETURN pg_catalog.regexp_split_to_array( $1::pg_catalog.text, $2::pg_catalog.text, 'i' ); + +CREATE OR REPLACE FUNCTION regexp_split_to_array( citext, citext, text ) RETURNS TEXT[] +LANGUAGE SQL IMMUTABLE STRICT PARALLEL SAFE +RETURN pg_catalog.regexp_split_to_array( $1::pg_catalog.text, $2::pg_catalog.text, CASE WHEN pg_catalog.strpos($3, 'c') = 0 THEN $3 || 'i' ELSE $3 END ); + +CREATE OR REPLACE FUNCTION regexp_split_to_table( citext, citext ) RETURNS SETOF TEXT +LANGUAGE SQL IMMUTABLE STRICT PARALLEL SAFE +RETURN pg_catalog.regexp_split_to_table( $1::pg_catalog.text, $2::pg_catalog.text, 'i' ); + +CREATE OR REPLACE FUNCTION regexp_split_to_table( citext, citext, text ) RETURNS SETOF TEXT +LANGUAGE SQL IMMUTABLE STRICT PARALLEL SAFE +RETURN pg_catalog.regexp_split_to_table( $1::pg_catalog.text, $2::pg_catalog.text, CASE WHEN pg_catalog.strpos($3, 'c') = 0 THEN $3 || 'i' ELSE $3 END ); + +CREATE OR REPLACE FUNCTION strpos( citext, citext ) RETURNS INT +LANGUAGE SQL IMMUTABLE STRICT PARALLEL SAFE +RETURN pg_catalog.strpos( pg_catalog.lower( $1::pg_catalog.text ), pg_catalog.lower( $2::pg_catalog.text ) ); + +CREATE OR REPLACE FUNCTION replace( citext, citext, citext ) RETURNS TEXT +LANGUAGE SQL IMMUTABLE
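To spell out the difference the conversion relies on, here is a minimal contrast using a throwaway function (not taken from the contrib patches): the old-style body is a string that is parsed at call time under the caller's search_path, while a SQL-standard body is parsed once at CREATE time, so its object references cannot be captured later.

-- old style: body is a string, resolved at run time
CREATE FUNCTION add_one_old(int) RETURNS int
LANGUAGE SQL IMMUTABLE STRICT
AS 'SELECT $1 + 1';

-- new style: body is parsed and its references resolved at creation
CREATE FUNCTION add_one_new(int) RETURNS int
LANGUAGE SQL IMMUTABLE STRICT
RETURN $1 + 1;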
Re: Uninitialized scalar variable (UNINIT) (src/backend/statistics/extended_stats.c)
On 4/12/21 7:04 PM, Tom Lane wrote: > Ranier Vilela writes: >> Em seg., 12 de abr. de 2021 às 03:04, Tom Lane escreveu: >>> It would be wrong, though, or at least not have the same effect. > >> I think that you speak about fill pointers with 0 is not the same as fill >> pointers with NULL. > > No, I mean that InvalidBlockNumber isn't 0. > >> I was confused here, does the patch follow the pattern and fix the problem >> or not? > > Your patch seems fine. Justin's proposed improvement isn't. > Pushed. > (I'm not real sure whether there's any *actual* bug here --- would we > really be looking at either ctid or tableoid of this temporary tuple? > But it's probably best to ensure that they're valid anyway.)> Yeah, the tuple is only built so that we can pass it to the various selectivity estimators. I don't think anything will be actually looking at those fields, but initializing them seems easy enough. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: New IndexAM API controlling index vacuum strategies
On Wed, Apr 14, 2021 at 4:59 AM Peter Geoghegan wrote: > > On Mon, Apr 12, 2021 at 11:05 PM Masahiko Sawada > wrote: > > I realized that when the failsafe is triggered, we don't bypass heap > > truncation that is performed before updating relfrozenxid. I think > > it's better to bypass it too. What do you think? > > I agree. Bypassing heap truncation is exactly the kind of thing that > risks adding significant, unpredictable delay at a time when we need > to advance relfrozenxid as quickly as possible. > > I pushed a trivial commit that makes the failsafe bypass heap > truncation as well just now. Great, thanks! Regards, -- Masahiko Sawada EDB: https://www.enterprisedb.com/
Re: Extensions not dumped when --schema is used
On Tue, Apr 13, 2021 at 08:00:34AM -0700, Noah Misch wrote: > On Tue, Apr 13, 2021 at 02:43:11PM +0900, Michael Paquier wrote: >>> - If extschema='public', "pg_dump -e plpgsql --schema=public" includes >>> commands to dump the relation data. This surprised me. (The >>> --schema=public argument causes selectDumpableNamespace() to set >>> nsinfo->dobj.dump=DUMP_COMPONENT_ALL instead of DUMP_COMPONENT_ACL.) >> >> This one would be expected to me. Per the discussion of upthread, we >> want --schema and --extension to be two separate and exclusive >> switches. So, once the caller specifies --schema we should dump the >> contents of the schema, even if its extension is not listed with >> --extension. > > I may disagree with this later, but I'm setting it aside for the moment. > > This isn't a problem of selecting schemas for inclusion in the dump. This is > a problem of associating extensions with their pg_extension_config_dump() > relations and omitting those extension-member relations when "-e" causes > omission of the extension. At code level, the decision to dump the data of any extension's dumpable table is done in processExtensionTables(). I have to admit that your feeling here keeps the code simpler than what I have been thinking if we apply an extra filtering based on the list of extensions in this code path. So I can see the value in your argument to not dump at all the data of an extension's dumpable table as long as its extension is not listed, and this, even if its schema is explicitly listed. So I got down to make the behavior more consistent with the patch attached. This passes your case. It is worth noting that if a table is part of a schema created by an extension, but that the table is not dependent on the extension, we would still dump its data if using --schema with this table's schema while the extension is not part of the list from --extension. In the attached, that's just the extra test with without_extension_implicit_schema. (By the way, good catch with the duplicated --no-sync in the new tests.) What do you think? -- Michael diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c index d0ea489614..391947340f 100644 --- a/src/bin/pg_dump/pg_dump.c +++ b/src/bin/pg_dump/pg_dump.c @@ -18271,7 +18271,8 @@ processExtensionTables(Archive *fout, ExtensionInfo extinfo[], * Note that we create TableDataInfo objects even in schemaOnly mode, ie, * user data in a configuration table is treated like schema data. This * seems appropriate since system data in a config table would get - * reloaded by CREATE EXTENSION. + * reloaded by CREATE EXTENSION. If the extension is not listed in the + * list of extensions to be included, none of its data is dumped. */ for (i = 0; i < numExtensions; i++) { @@ -18283,6 +18284,15 @@ processExtensionTables(Archive *fout, ExtensionInfo extinfo[], int nconfigitems = 0; int nconditionitems = 0; + /* + * Check if this extension is listed as to include in the dump. If + * not, any table data associated with it is discarded. 
+ */ + if (extension_include_oids.head != NULL && + !simple_oid_list_member(&extension_include_oids, + curext->dobj.catId.oid)) + continue; + if (strlen(extconfig) != 0 || strlen(extcondition) != 0) { int j; diff --git a/src/test/modules/test_pg_dump/t/001_base.pl b/src/test/modules/test_pg_dump/t/001_base.pl index ef98c08493..51450f1b3a 100644 --- a/src/test/modules/test_pg_dump/t/001_base.pl +++ b/src/test/modules/test_pg_dump/t/001_base.pl @@ -208,6 +208,30 @@ my %pgdump_runs = ( 'pg_dump', '--no-sync', "--file=$tempdir/without_extension.sql", '--extension=plpgsql', 'postgres', ], + }, + + # plgsql in the list of extensions blocks the dump of extension + # test_pg_dump. + without_extension_explicit_schema => { + dump_cmd => [ + 'pg_dump', + '--no-sync', + "--file=$tempdir/without_extension_explicit_schema.sql", + '--extension=plpgsql', + '--schema=public', + 'postgres', + ], + }, + + without_extension_implicit_schema => { + dump_cmd => [ + 'pg_dump', + '--no-sync', + "--file=$tempdir/without_extension_implicit_schema.sql", + '--extension=plpgsql', + '--schema=regress_pg_dump_schema', + 'postgres', + ], },); ### @@ -632,6 +656,8 @@ my %tests = ( pg_dumpall_globals => 1, section_data => 1, section_pre_data => 1, + # Excludes this schema as extension is not listed. + without_extension_explicit_schema => 1, }, }, @@ -646,6 +672,8 @@ my %tests = ( pg_dumpall_globals => 1, section_data => 1, section_pre_data => 1, + # Excludes this schema as extension is not listed. + without_extension_explicit_schema => 1, }, }, @@ -662,6 +690,8 @@ my %tests = ( %full_runs, schema_only => 1, section_pre_data => 1, + # Excludes the extension and keeps the schema's data. + without_extension_impl
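For readers less familiar with the mechanism in question: an extension marks its user-configuration tables with pg_extension_config_dump() in its install script, and those are the tables whose data pg_dump is deciding to include or skip above. A minimal sketch, with a hypothetical table name (the function can only be called from a script run by CREATE/ALTER EXTENSION):

CREATE TABLE my_ext_config (key text PRIMARY KEY, value text);
-- the second argument is an optional WHERE condition selecting which
-- rows count as user data; an empty string means all rows
SELECT pg_extension_config_dump('my_ext_config', '');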
Re: Replication slot stats misgivings
On Tue, Apr 13, 2021 at 10:46 AM Masahiko Sawada wrote: > > On Mon, Apr 12, 2021 at 9:16 PM vignesh C wrote: > > > > On Mon, Apr 12, 2021 at 4:46 PM Amit Kapila wrote: > > > > > > On Sat, Apr 10, 2021 at 6:51 PM vignesh C wrote: > > > > > > > > > > Thanks, 0001 and 0002 look good to me. I have a minor comment for 0002. > > > > > > > > > +total_bytesbigint > > > + > > > + > > > +Amount of decoded transactions data sent to the decoding output plugin > > > +while decoding the changes from WAL for this slot. This can be used to > > > +gauge the total amount of data sent during logical decoding. > > > > > > Can we slightly extend it to say something like: Note that this > > > includes the bytes streamed and or spilled. Similarly, we can extend > > > it for total_txns. > > > > > > > Thanks for the comments, the comments are fixed in the v8 patch attached. > > Thoughts? > > Here are review comments on new TAP tests: Thanks for the comments. > +# Create replication slots. > +$node->safe_psql('postgres', > + "SELECT 'init' FROM > pg_create_logical_replication_slot('regression_slot1', > 'test_decoding')"); > +$node->safe_psql('postgres', > + "SELECT 'init' FROM > pg_create_logical_replication_slot('regression_slot2', > 'test_decoding')"); > +$node->safe_psql('postgres', > + "SELECT 'init' FROM > pg_create_logical_replication_slot('regression_slot3', > 'test_decoding')"); > +$node->safe_psql('postgres', > + "SELECT 'init' FROM > pg_create_logical_replication_slot('regression_slot4', > 'test_decoding')"); > > and > > + > +$node->safe_psql('postgres', > + "SELECT data FROM > pg_logical_slot_get_changes('regression_slot1', NULL, NULL, > 'include-xids', '0', 'skip-empty-xacts', '1')"); > +$node->safe_psql('postgres', > + "SELECT data FROM > pg_logical_slot_get_changes('regression_slot2', NULL, NULL, > 'include-xids', '0', 'skip-empty-xacts', '1')"); > +$node->safe_psql('postgres', > +"SELECT data FROM > pg_logical_slot_get_changes('regression_slot3', NULL, NULL, > 'include-xids', '0', 'skip-empty-xacts', '1')"); > +$node->safe_psql('postgres', > + "SELECT data FROM > pg_logical_slot_get_changes('regression_slot4', NULL, NULL, > 'include-xids', '0', 'skip-empty-xacts', '1')"); > > I think we can do those similar queries in a single psql connection > like follows: > > # Create replication slots. > $node->safe_psql('postgres', >qq[ > SELECT pg_create_logical_replication_slot('regression_slot1', 'test_decoding'); > SELECT pg_create_logical_replication_slot('regression_slot2', 'test_decoding'); > SELECT pg_create_logical_replication_slot('regression_slot3', 'test_decoding'); > SELECT pg_create_logical_replication_slot('regression_slot4', 'test_decoding'); > ]); > > and > > $node->safe_psql('postgres', >qq[ > SELECT data FROM pg_logical_slot_get_changes('regression_slot1', NULL, > NULL, 'include-xids', '0', 'skip-empty-xacts', '1'); > SELECT data FROM pg_logical_slot_get_changes('regression_slot2', NULL, > NULL, 'include-xids', '0', 'skip-empty-xacts', '1'); > SELECT data FROM pg_logical_slot_get_changes('regression_slot3', NULL, > NULL, 'include-xids', '0', 'skip-empty-xacts', '1'); > SELECT data FROM pg_logical_slot_get_changes('regression_slot4', NULL, > NULL, 'include-xids', '0', 'skip-empty-xacts', '1'); > ]); > Modified. > --- > +# Wait for the statistics to be updated. 
> +my $slot1_stat_check_query = > + "SELECT count(1) = 1 FROM pg_stat_replication_slots WHERE slot_name > = 'regression_slot1' AND total_txns > 0 AND total_bytes > 0;"; > +my $slot2_stat_check_query = > + "SELECT count(1) = 1 FROM pg_stat_replication_slots WHERE slot_name > = 'regression_slot2' AND total_txns > 0 AND total_bytes > 0;"; > +my $slot3_stat_check_query = > + "SELECT count(1) = 1 FROM pg_stat_replication_slots WHERE slot_name > = 'regression_slot3' AND total_txns > 0 AND total_bytes > 0;"; > +my $slot4_stat_check_query = > + "SELECT count(1) = 1 FROM pg_stat_replication_slots WHERE slot_name > = 'regression_slot4' AND total_txns > 0 AND total_bytes > 0;"; > + > +# Verify that the statistics have been updated. > +$node->poll_query_until('postgres', $slot1_stat_check_query) > + or die "Timed out while waiting for statistics to be updated"; > +$node->poll_query_until('postgres', $slot2_stat_check_query) > + or die "Timed out while waiting for statistics to be updated"; > +$node->poll_query_until('postgres', $slot3_stat_check_query) > + or die "Timed out while waiting for statistics to be updated"; > +$node->poll_query_until('postgres', $slot4_stat_check_query) > + or die "Timed out while waiting for statistics to be updated"; > > We can simplify the above code to something like: > > $node->poll_query_until( >'postgres', qq[ > SELECT count(slot_name) >= 4 > FROM pg_stat_replication_slots > WHERE slot_name ~ 'regression_slot' > AND total_txns > 0 > AND total_bytes > 0; > ]) or die "Timed out while waiting for
Re: Converting contrib SQL functions to new style
On Tue, Apr 13, 2021 at 06:26:34PM -0400, Tom Lane wrote: > Attached are some draft patches to convert almost all of the > contrib modules' SQL functions to use SQL-standard function bodies. > The point of this is to remove the residual search_path security > hazards that we couldn't fix in commits 7eeb1d986 et al. Since > a SQL-style function body is fully parsed at creation time, > its object references are not subject to capture by the run-time > search path. Are there any inexact matches in those function/operator calls? Will that matter more or less than it does today? > However I think we may still need an assumption that earthdistance > and cube are in the same schema --- any comments on that? That part doesn't change, indeed. > I'd like to propose squeezing these changes into v14, even though > we're past feature freeze. Reason one is that this is less a > new feature than a security fix; reason two is that this provides > some non-artificial test coverage for the SQL-function-body feature. Dogfooding like this is good. What about the SQL-language functions that initdb creates?
Re: Replication slot stats misgivings
On Tue, Apr 13, 2021 at 5:07 PM vignesh C wrote: > > On Mon, Apr 12, 2021 at 7:03 PM Masahiko Sawada wrote: > > > > On Mon, Apr 12, 2021 at 9:36 PM Amit Kapila wrote: > > > > > > On Mon, Apr 12, 2021 at 5:29 PM Masahiko Sawada > > > wrote: > > > > > > > > On Mon, Apr 12, 2021 at 8:08 PM Amit Kapila > > > > wrote: > > > > > > > > > > On Mon, Apr 12, 2021 at 4:34 PM Masahiko Sawada > > > > > wrote: > > > > > > > > > > > > On Mon, Apr 12, 2021 at 6:19 PM Amit Kapila > > > > > > wrote: > > > > > > > > > > > > > > On Mon, Apr 12, 2021 at 10:27 AM Masahiko Sawada > > > > > > > wrote: > > > > > > > > > > > > > > > > On Sat, Apr 10, 2021 at 9:53 PM Amit Kapila > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > > > > > > > It seems Vignesh has changed patches based on the latest set > > > > > > > > > of > > > > > > > > > comments so you might want to rebase. > > > > > > > > > > > > > > > > I've merged my patch into the v6 patch set Vignesh submitted. > > > > > > > > > > > > > > > > I've attached the updated version of the patches. I didn't > > > > > > > > change > > > > > > > > anything in the patch that changes char[NAMEDATALEN] to > > > > > > > > NameData (0001 > > > > > > > > patch) and patches that add tests. > > > > > > > > > > > > > > > > > > > > > > I think we can push 0001. What do you think? > > > > > > > > > > > > +1 > > > > > > > > > > > > > > > > > > > > > In 0003 patch I reordered the > > > > > > > > output parameters of pg_stat_replication_slots; showing total > > > > > > > > number > > > > > > > > of transactions and total bytes followed by statistics for > > > > > > > > spilled and > > > > > > > > streamed transactions seems appropriate to me. > > > > > > > > > > > > > > > > > > > > > > I am not sure about this because I think we might want to add some > > > > > > > info of stream/spill bytes in total_bytes description (something > > > > > > > like > > > > > > > stream/spill bytes are not in addition to total_bytes). > > > > > > > > BTW doesn't it confuse users that stream/spill bytes are not in > > > > addition to total_bytes? User will need to do "total_bytes + > > > > spill/stream_bytes" to know the actual total amount of data sent to > > > > the decoding output plugin, is that right? > > > > > > > > > > No, total_bytes includes the spill/stream bytes. So, the user doesn't > > > need to do any calculation to compute totel_bytes sent to output > > > plugin. > > > > The following test for the latest v8 patch seems to show different. > > total_bytes is 1808 whereas spill_bytes is 1320. Am I missing > > something? 
> > > > postgres(1:85969)=# select pg_create_logical_replication_slot('s', > > 'test_decoding'); > > pg_create_logical_replication_slot > > > > (s,0/1884468) > > (1 row) > > > > postgres(1:85969)=# create table a (i int); > > CREATE TABLE > > postgres(1:85969)=# insert into a select generate_series(1, 10); > > INSERT 0 10 > > postgres(1:85969)=# set logical_decoding_work_mem to 64; > > SET > > postgres(1:85969)=# select * from pg_stat_replication_slots ; > > slot_name | total_txns | total_bytes | spill_txns | spill_count | > > spill_bytes | stream_txns | stream_count | stream_bytes | stats_reset > > ---++-++-+-+-+--+--+- > > s | 0 | 0 | 0 | 0 | > > 0 | 0 |0 |0 | > > (1 row) > > > > postgres(1:85969)=# select count(*) from > > pg_logical_slot_peek_changes('s', NULL, NULL); > > count > > > > 14 > > (1 row) > > > > postgres(1:85969)=# select * from pg_stat_replication_slots ; > > slot_name | total_txns | total_bytes | spill_txns | spill_count | > > spill_bytes | stream_txns | stream_count | stream_bytes | stats_reset > > ---++-++-+-+-+--+--+- > > s | 2 |1808 | 1 | 202 | > > 1320 | 0 |0 |0 | > > (1 row) > > > > Thanks for identifying this issue, while spilling the transactions > reorder buffer changes gets released, we will not be able to get the > total size for spilled transactions from reorderbuffer size. I have > fixed it by including spilledbytes to totalbytes in case of spilled > transactions. Attached patch has the fix for this. > Thoughts? I've not looked at the patches yet but as Amit mentioned before[1], it's better to move 0002 patch to after 0004. That is, 0001 patch changes data type to NameData, 0002 patch adds total_txn and total_bytes, and 0003 patch adds regression tests. 0004 patch will be the patch using HTAB (was 0002 patch) and get reviewed after pushing 0001, 0002, and 0003 patches. 0005 patch adds more regression tests for the problem 0004 patch addresses. Regards, [1] https
Re: [PATCH] Identify LWLocks in tracepoints
On Wed, 14 Apr 2021 at 04:46, Andres Freund wrote: > > On 2021-04-13 14:25:23 -0400, Robert Haas wrote: > > On Mon, Apr 12, 2021 at 11:06 PM Andres Freund wrote: > > You could identify every lock by a tranche ID + an array offset + a > > "tranche instance ID". But where would you store the tranche instance > > ID to make it readily accessible, other than in the lock itself? > > Andres wasn't thrilled about using even 2 bytes to identify the > > LWLock, so he'll probably like having more bytes in there for that > > kind of thing even less. > > I still don't like the two bytes, fwiw ;). Especially because it's 4 > bytes due to padding right now. Aha, did I hear you say "there are two free bytes for me to shove something marginally useful and irrelevant into"? (*grin*) > I'd like to move the LWLock->waiters list to outside of the lwlock > itself - at most TotalProcs LWLocks can be waited for, so we don't need > millions of empty proclist_heads. That way we could also remove the > proclist indirection - which shows up a fair bit in contended workloads. > > And if we had a separate "lwlocks being waited for" structure, we could > also add more information to it if we wanted to... Having the ability to observe LWLock waiters would be nice. But you're right to constantly emphasise that LWLocks need to be very slim. We don't want to turn them into near-heavyweight locks by saddling them with overhead that's not strictly necessary. A simple pg_regress run (with cassert, admittedly) takes 25,000,000 LWLocks and does 24,000 LWLock waits and 130,000 condvar waits. All in less than a minute. OTOH, once someone's waiting we don't care nearly as much about bookkeeping cost, it's the un-contested fast paths that're most critical. > - We could make it work if we restricted MAX_BACKENDS to be 2**14 - but > while I personally think that's a sane upper limit, I already had a > hard time getting consensus to lower the limit to 2^18-1. 16384 backends is totally fine in sane real world deployments. But it'll probably upset marketing people when OtherDatabaseVendor starts shouting that they support 14 million connections, and postgres only has 16k. Sigh. The real answer here in the long term probably needs to be decoupling of executors from connection state inside postgres. But while we're on that topic, how about we convert the entire codebase to Rust while riding on a flying rainbow unicorn? We're stuck with the 1:1 connection to executor mapping for the foreseeable future. > - Use a 64bit integer. Then we can easily fit MAX_BACKENDS lockers, as > well as an offset to one of MAX_BACKENDS "wait lists" into LWLock. You know much more than me about the possible impacts of that on layout and caching, but I gather that it's probably undesirable to make LWLocks any bigger. > I think it's quite useful for relatively simple things like analyzing > the total amount of time spent in individual locks, without incuring > much overhead when not doing so (for which you need to identify > individual locks, otherwise your end - start time is going to be > meaningless). Yep. That's why the removal of the lock offset is a bit frustrating, because you can't identify an LWLock instance-wide by LWLock* due to the possibility of different per-backend DSM base-address mappings. Well, and ASLR on EXEC_BACKEND systems, but who cares about those? The removal was for good reasons though. And it only affects LWLocks in DSM, for everything else the LWLock* is good enough. 
If identifying LWLocks in DSM ever matters enough to bother to solve that problem, we can emit trace events on DSM mapping attach in each backend, and trace tools can do the work to track which LWLocks are in DSM and convert their addresses to a reference base address. Pg shouldn't have to pay the price for that unless it's something a lot of people need. > And, slightly more advanced, for analyzing what the stack > was when the lock was released - which then allows you to see what work > you're blocked on, something I found hard to figure out otherwise. > > I found that that's mostly quite doable with dynamic probes though. Yeah, it is. That's part of why my patchset here doesn't try to do a lot to LWLock tracepoints - I didn't think it was necessary to add a lot. The LWLock code is fairly stable, not usually something you have to worry about in production unless you're debugging badly behaved extensions, and usually somewhat probe-able with DWARF based dynamic probes. However, the way the wait-loop and fast-path are in the same function is a serious pain for dynamic probing; you can't tell the difference between a fast-path acquire and an acquire after a wait without using probes on function+offset or probing by source line. Both those are fine for dev work but useless in tool/library scripts. I almost wonder if we should test out moving the LWLock wait-loops out of LWLockAcquire(), LWLockAcquireOrWait() and LWLockWaitForVar() anyway, so the hot parts of the fun
Re: Replication slot stats misgivings
On Wed, Apr 14, 2021 at 7:52 AM Masahiko Sawada wrote: > > On Tue, Apr 13, 2021 at 5:07 PM vignesh C wrote: > > > > On Mon, Apr 12, 2021 at 7:03 PM Masahiko Sawada > > wrote: > > > > > > On Mon, Apr 12, 2021 at 9:36 PM Amit Kapila > > > wrote: > > > > > > > > On Mon, Apr 12, 2021 at 5:29 PM Masahiko Sawada > > > > wrote: > > > > > > > > > > On Mon, Apr 12, 2021 at 8:08 PM Amit Kapila > > > > > wrote: > > > > > > > > > > > > On Mon, Apr 12, 2021 at 4:34 PM Masahiko Sawada > > > > > > wrote: > > > > > > > > > > > > > > On Mon, Apr 12, 2021 at 6:19 PM Amit Kapila > > > > > > > wrote: > > > > > > > > > > > > > > > > On Mon, Apr 12, 2021 at 10:27 AM Masahiko Sawada > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > On Sat, Apr 10, 2021 at 9:53 PM Amit Kapila > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > It seems Vignesh has changed patches based on the latest > > > > > > > > > > set of > > > > > > > > > > comments so you might want to rebase. > > > > > > > > > > > > > > > > > > I've merged my patch into the v6 patch set Vignesh submitted. > > > > > > > > > > > > > > > > > > I've attached the updated version of the patches. I didn't > > > > > > > > > change > > > > > > > > > anything in the patch that changes char[NAMEDATALEN] to > > > > > > > > > NameData (0001 > > > > > > > > > patch) and patches that add tests. > > > > > > > > > > > > > > > > > > > > > > > > > I think we can push 0001. What do you think? > > > > > > > > > > > > > > +1 > > > > > > > > > > > > > > > > > > > > > > > > In 0003 patch I reordered the > > > > > > > > > output parameters of pg_stat_replication_slots; showing total > > > > > > > > > number > > > > > > > > > of transactions and total bytes followed by statistics for > > > > > > > > > spilled and > > > > > > > > > streamed transactions seems appropriate to me. > > > > > > > > > > > > > > > > > > > > > > > > > I am not sure about this because I think we might want to add > > > > > > > > some > > > > > > > > info of stream/spill bytes in total_bytes description > > > > > > > > (something like > > > > > > > > stream/spill bytes are not in addition to total_bytes). > > > > > > > > > > BTW doesn't it confuse users that stream/spill bytes are not in > > > > > addition to total_bytes? User will need to do "total_bytes + > > > > > spill/stream_bytes" to know the actual total amount of data sent to > > > > > the decoding output plugin, is that right? > > > > > > > > > > > > > No, total_bytes includes the spill/stream bytes. So, the user doesn't > > > > need to do any calculation to compute totel_bytes sent to output > > > > plugin. > > > > > > The following test for the latest v8 patch seems to show different. > > > total_bytes is 1808 whereas spill_bytes is 1320. Am I missing > > > something? 
> > > > > > postgres(1:85969)=# select pg_create_logical_replication_slot('s', > > > 'test_decoding'); > > > pg_create_logical_replication_slot > > > > > > (s,0/1884468) > > > (1 row) > > > > > > postgres(1:85969)=# create table a (i int); > > > CREATE TABLE > > > postgres(1:85969)=# insert into a select generate_series(1, 10); > > > INSERT 0 10 > > > postgres(1:85969)=# set logical_decoding_work_mem to 64; > > > SET > > > postgres(1:85969)=# select * from pg_stat_replication_slots ; > > > slot_name | total_txns | total_bytes | spill_txns | spill_count | > > > spill_bytes | stream_txns | stream_count | stream_bytes | stats_reset > > > ---++-++-+-+-+--+--+- > > > s | 0 | 0 | 0 | 0 | > > > 0 | 0 |0 |0 | > > > (1 row) > > > > > > postgres(1:85969)=# select count(*) from > > > pg_logical_slot_peek_changes('s', NULL, NULL); > > > count > > > > > > 14 > > > (1 row) > > > > > > postgres(1:85969)=# select * from pg_stat_replication_slots ; > > > slot_name | total_txns | total_bytes | spill_txns | spill_count | > > > spill_bytes | stream_txns | stream_count | stream_bytes | stats_reset > > > ---++-++-+-+-+--+--+- > > > s | 2 |1808 | 1 | 202 | > > > 1320 | 0 |0 |0 | > > > (1 row) > > > > > > > Thanks for identifying this issue, while spilling the transactions > > reorder buffer changes gets released, we will not be able to get the > > total size for spilled transactions from reorderbuffer size. I have > > fixed it by including spilledbytes to totalbytes in case of spilled > > transactions. Attached patch has the fix for this. > > Thoughts? > > I've not looked at the patches yet but as Amit mentioned before[1], > it's better to move 0002 patch to after 0004. That is, 0001 patch > changes
Re: Truncate in synchronous logical replication failed
On Tue, 13 Apr 2021 at 21:54, osumi.takami...@fujitsu.com wrote: > On Monday, April 12, 2021 3:58 PM Amit Kapila wrote: >> On Mon, Apr 12, 2021 at 10:03 AM osumi.takami...@fujitsu.com >> wrote: >> > but if we take a measure to fix the doc, we have to be careful for the >> > description, because when we remove the primary keys of 'test' tables on >> > the >> scenario in [1], we don't have this issue. >> > It means TRUNCATE in synchronous logical replication is not always >> blocked. >> > >> >> The problem happens only when we try to fetch IDENTITY_KEY attributes >> because pgoutput uses RelationGetIndexAttrBitmap() to get that information >> which locks the required indexes. Now, because TRUNCATE has already >> acquired an exclusive lock on the index, it seems to create a sort of >> deadlock >> where the actual Truncate operation waits for logical replication of >> operation to >> complete and logical replication waits for actual Truncate operation to >> finish. >> >> Do we really need to use RelationGetIndexAttrBitmap() to build IDENTITY_KEY >> attributes? During decoding, we don't even lock the main relation, we just >> scan >> the system table and build that information using a historic snapshot. Can't >> we >> do something similar here? > I think we can build the IDENTITY_KEY attributes with NoLock > instead of calling RelationGetIndexAttrBitmap(). > > When we trace back the caller side of logicalrep_write_attrs(), > doing the thing equivalent to RelationGetIndexAttrBitmap() > for INDEX_ATTR_BITMAP_IDENTITY_KEY impacts only pgoutput_truncate. > > OTOH, I can't find codes similar to RelationGetIndexAttrBitmap() > in pgoutput_* functions and in the file of relcache.c. > Therefore, I'd like to discuss how to address the hang. > > My first idea is to extract some parts of RelationGetIndexAttrBitmap() > only for INDEX_ATTR_BITMAP_IDENTITY_KEY and implement those > either in a logicalrep_write_attrs() or as a new function. > RelationGetIndexAttrBitmap() has 'restart' label for goto statement > in order to ensure to return up-to-date attribute bitmaps, so > I prefer having a new function when we choose this direction. > Having that goto in logicalrep_write_attrs() makes it a little bit messy, I > felt. > > The other direction might be to extend RelationGetIndexAttrBitmap's function > definition > to accept lockmode to give NoLock from logicalrep_write_attrs(). > But, this change impacts on other several callers so is not as good as the > first direction above, I think. > > If someone has any better idea, please let me know. > I think the first idea is better than the second. OTOH, can we release the locks before SyncRepWaitForLSN(), since it has already been flushed to the local WAL files. -- Regards, Japin Li. ChengDu WenWu Information Technology Co.,Ltd.
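For anyone trying to reproduce the hang, here is a rough sketch of the setup as I understand it from the thread; all names are made up, the subscriber must also be listed as a synchronous standby, and the table needs a primary key (replica identity).

-- on the publisher
CREATE TABLE test (id int PRIMARY KEY);
CREATE PUBLICATION pub FOR TABLE test;
ALTER SYSTEM SET synchronous_standby_names = 'sub';  -- subscription name doubles as application_name
SELECT pg_reload_conf();

-- on the subscriber
-- CREATE TABLE test (id int PRIMARY KEY);
-- CREATE SUBSCRIPTION sub CONNECTION 'host=... dbname=...' PUBLICATION pub;

-- back on the publisher, this is the statement reported to hang
TRUNCATE test;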
Re: [PATCH] Identify LWLocks in tracepoints
On Wed, 14 Apr 2021 at 02:25, Robert Haas wrote: > So before the commit in question -- > 3761fe3c20bb040b15f0e8da58d824631da00caa -- T_ID() used to compute an > offset for a lock within the tranche that was supposed to uniquely > identify the lock. However, the whole idea of an array per tranche > turns out to be broken by design. Yeah, I understand that. I'd really love it if a committer could add an explanatory comment or two in the area though. I'm happy to draft a comment patch if anyone agrees my suggestion is sensible. The key things I needed to know when studying the code were: * A LWLock* is always part of a tranche, but locks within a given tranche are not necessarily co-located in memory, cross referenced or associated in any way. * A LWLock tranche does not track how many LWLocks are in the tranche or where they are in memory. It only groups up LWLocks into categories and maps the tranche ID to a name. * Not all LWLocks are part of the main LWLock array; others can be embedded in shmem structs elsewhere, including in DSM segments. * LWLocks in DSM segments may not have the same address between different backends, because the DSM base address can vary, so a LWLock* cannot be reliably compared between backends unless you know it's in the main LWLock array or in static shmem. Having that info in lwlock.c near the tranche management code or the tranche and main lwlock arrays would've been very handy. > You could try to identify locks by pointer addresses, but that's got > security hazards and the addreses aren't portable across all the > backends involved in the parallel query because of how DSM works, so > it's not really that helpful in terms of matching stuff up. What I'm doing now is identifying them by LWLock* across backends. I keep track of DSM segment mappings in each backend inside the trace script and I relocate LWLock* pointers known to be inside DSM segments relative to a dummy base address so they're equal across backends. It's a real pain, but it works. The main downside is that the trace script has to observe the DSM attach; if it's started once a backend already has the DSM segment attached, it has no idea the LWLock is in a DSM segment or how to remap it. But that's not a serious issue. > On a broader level, I agree that being able to find out what the > system is doing is really important. But I'm also not entirely > convinced that having really fine-grained information here to > distinguish between one lock and another is the way to get there. At the start of this thread I would've disagreed with you. But yeah, you and Andres are right, because the costs outweigh the benefits here. I'm actually inclined to revise the patch I sent in order to *remove* the LWLock name from the tracepoint argument. At least for the fast-path tracepoints on start-of-acquire and end-of-acquire. I think it's probably OK to report it in the lock wait tracepoints, which is where it's most important to have anyway. So the tracepoint will always report the LWLock* and tranche ID, but it won't report the tranche name for the fast-path. I'll add trace events for tranche ID registration, so trace tools can either remember the tranche ID->name mappings from there, or capture them from lock wait events and remember them. Reasonable? That way we retain the most important trace functionality, but we reduce the overheads.
Re: [PATCH] Identify LWLocks in tracepoints
On Wed, 14 Apr 2021 at 10:41, Craig Ringer wrote: > On Wed, 14 Apr 2021 at 02:25, Robert Haas wrote: > > You could try to identify locks by pointer addresses, but that's got > > security hazards and the addreses aren't portable across all the > > backends involved in the parallel query because of how DSM works, so > > it's not really that helpful in terms of matching stuff up. > > What I'm doing now is identifying them by LWLock* across backends. I > keep track of DSM segment mappings in each backend inside the trace > script and I relocate LWLock* pointers known to be inside DSM segments > relative to a dummy base address so they're equal across backends. BTW, one of the reasons I did this was to try to identify BDR and pglogical code that blocks or sleeps while holding a LWLock. I got stuck on that for other reasons, so it didn't go anywhere, but those issues are now resolved so I should probably return to it at some point. It'd be a nice thing to be able to run on postgres itself too.
Re: Replication slot stats misgivings
On Wed, Apr 14, 2021 at 8:04 AM vignesh C wrote: > > On Wed, Apr 14, 2021 at 7:52 AM Masahiko Sawada wrote: > > > > I've not looked at the patches yet but as Amit mentioned before[1], > > it's better to move 0002 patch to after 0004. That is, 0001 patch > > changes data type to NameData, 0002 patch adds total_txn and > > total_bytes, and 0003 patch adds regression tests. 0004 patch will be > > the patch using HTAB (was 0002 patch) and get reviewed after pushing > > 0001, 0002, and 0003 patches. 0005 patch adds more regression tests > > for the problem 0004 patch addresses. > > I will make the change for this and post a patch for this. > Currently we have kept total_txns and total_bytes at the beginning of > pg_stat_replication_slots, I did not see any conclusion on this. I > preferred it to be at the beginning. > Thoughts? > I prefer those two fields after spill and stream fields. I have mentioned the same in one of the emails above. -- With Regards, Amit Kapila.
Re: ModifyTable overheads in generic plans
On Wed, Apr 7, 2021 at 5:18 PM Amit Langote wrote: > On Wed, Apr 7, 2021 at 8:24 AM Tom Lane wrote: > > I also could not get excited about postponing initialization of RETURNING > > or WITH CHECK OPTIONS expressions. I grant that that can be helpful > > when those features are used, but I doubt that RETURNING is used that > > heavily, and WITH CHECK OPTIONS is surely next door to nonexistent > > in performance-critical queries. If the feature isn't used, the cost > > of the existing code is about zero. So I couldn't see that it was worth > > the amount of code thrashing and risk of new bugs involved. > > Okay. > > > The bit you > > noted about EXPLAIN missing a subplan is pretty scary in this connection; > > I'm not at all sure that that's just cosmetic. > > Yeah, this and... I looked into this and can't see why this isn't just cosmetic as far as ModifyTable is concerned. "EXPLAIN missing a subplan" here just means that ModifyTableState.PlanState.subPlan is not set. Besides ExplainNode(), only ExecReScan() looks at PlanState.subPlan, and that does not seem relevant to ModifyTable, because it doesn't support rescanning. I don't see any such problems with creating RETURNING projections on-demand either. > > (Having said that, I'm wondering if there are bugs in these cases for > > cross-partition updates that target a previously-not-used partition. > > So we might have things to fix anyway.) > > ...this would need to be looked at a bit more closely, which I'll try > to do sometime later this week. Given the above, I can't think of any undiscovered problems related to WCO and RETURNING expression states in the cases where cross-partition updates target partitions that need to be initialized by ExecInitPartitionInfo(). Here is the result for the test case in updatable_views.sql modified to use partitioning and cross-partition updates: CREATE TABLE base_tbl (a int) partition by range (a); CREATE TABLE base_tbl1 PARTITION OF base_tbl FOR VALUES FROM (1) TO (6); CREATE TABLE base_tbl2 PARTITION OF base_tbl FOR VALUES FROM (6) TO (11); CREATE TABLE base_tbl3 PARTITION OF base_tbl FOR VALUES FROM (11) TO (15); CREATE TABLE ref_tbl (a int PRIMARY KEY); INSERT INTO ref_tbl SELECT * FROM generate_series(1,10); CREATE VIEW rw_view1 AS SELECT * FROM base_tbl b WHERE EXISTS(SELECT 1 FROM ref_tbl r WHERE r.a = b.a) WITH CHECK OPTION; INSERT INTO rw_view1 VALUES (1); INSERT 0 1 INSERT INTO rw_view1 VALUES (11); ERROR: new row violates check option for view "rw_view1" DETAIL: Failing row contains (11). -- Both are cross-partition updates where the target relation is -- lazily initialized in ExecInitPartitionInfo(), along with the WCO -- qual ExprState UPDATE rw_view1 SET a = a + 5 WHERE a = 1; UPDATE 1 UPDATE rw_view1 SET a = a + 5 WHERE a = 6; ERROR: new row violates check option for view "rw_view1" DETAIL: Failing row contains (11). EXPLAIN (costs off) INSERT INTO rw_view1 VALUES (5); QUERY PLAN -- Insert on base_tbl b -> Result (2 rows) EXPLAIN (costs off) UPDATE rw_view1 SET a = a + 5 WHERE a = 1; QUERY PLAN Update on base_tbl b Update on base_tbl1 b_1 -> Nested Loop -> Index Scan using ref_tbl_pkey on ref_tbl r Index Cond: (a = 1) -> Seq Scan on base_tbl1 b_1 Filter: (a = 1) (7 rows) EXPLAIN (costs off) UPDATE rw_view1 SET a = a + 5 WHERE a = 6; QUERY PLAN Update on base_tbl b Update on base_tbl2 b_1 -> Nested Loop -> Index Scan using ref_tbl_pkey on ref_tbl r Index Cond: (a = 6) -> Seq Scan on base_tbl2 b_1 Filter: (a = 6) (7 rows) Patch attached. 
I tested the performance benefit of doing this by modifying the update query used in earlier benchmarks to have a RETURNING * clause, getting the following TPS numbers:

-Mprepared (plan_cache_mode=force_generic_plan)

nparts   10cols   20cols   40cols

HEAD
64        10909     9067     7171
128        6903     5624     4161
256        3748     3056     2219
1024        953      738      427

Patched
64        13817    13395    12754
128        9271     9102     8279
256        5345     5207     5083
1024       1463     1443     1389

Also, I don't see much impact of checking if (node->returningLists) in the per-result-rel initialization loop in the common cases where there's no RETURNING. -- Amit Langote EDB: http://www.enterprisedb.com 0001-Initialize-WITH-CHECK-OPTIONS-and-RETURNING-expressi.patch Description: Binary data
Re: Converting contrib SQL functions to new style
Noah Misch writes: > On Tue, Apr 13, 2021 at 06:26:34PM -0400, Tom Lane wrote: >> Attached are some draft patches to convert almost all of the >> contrib modules' SQL functions to use SQL-standard function bodies. >> The point of this is to remove the residual search_path security >> hazards that we couldn't fix in commits 7eeb1d986 et al. Since >> a SQL-style function body is fully parsed at creation time, >> its object references are not subject to capture by the run-time >> search path. > Are there any inexact matches in those function/operator calls? Will that > matter more or less than it does today? I can't claim to have looked closely for inexact matches. It should matter less than today, since there's a hazard only during creation (with a somewhat-controlled search path) and not during use. But that doesn't automatically eliminate the issue. >> I'd like to propose squeezing these changes into v14, even though >> we're past feature freeze. Reason one is that this is less a >> new feature than a security fix; reason two is that this provides >> some non-artificial test coverage for the SQL-function-body feature. > Dogfooding like this is good. What about the SQL-language functions that > initdb creates? Hadn't thought about those, but converting them seems like a good idea. regards, tom lane
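On the initdb side, a quick catalog query lists the built-in SQL-language functions that would be candidates for the same conversion:

SELECT p.oid::regprocedure
FROM pg_proc p
JOIN pg_language l ON l.oid = p.prolang
WHERE l.lanname = 'sql'
ORDER BY p.oid;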
Re: TRUNCATE on foreign table
On Tue, Apr 13, 2021 at 8:30 PM Fujii Masao wrote: > On 2021/04/13 23:25, Kohei KaiGai wrote: > > 2021年4月13日(火) 21:03 Bharath Rupireddy > > : > >> Yeah, ONLY clause is not pushed to the remote server in case of SELECT > >> commands. This is also true for DELETE and UPDATE commands on foreign > >> tables. > > This sounds reasonable reason why ONLY should be ignored in TRUNCATE on > foreign tables, for now. If there is the existing rule about how to treat > ONLY clause for foreign tables, basically TRUNCATE should follow that at this > stage. Maybe we can change the rule, but it's an item for v15 or later? > > >> I'm not sure if it wasn't thought necessary or if there is an > >> issue to push it or I may be missing something here. > > I could not find the past discussion about foreign tables and ONLY clause. > I guess that ONLY is ignored in SELECT on foreign tables case because ONLY > is interpreted outside the executor and it's not easy to change the executor > so that ONLY is passed to FDW. Maybe.. > > > >> I think we can > >> start a separate thread to see other hackers' opinions on this. > >> > >> I'm not sure whether all the clauses that are possible for > >> SELECT/UPDATE/DELETE/INSERT with local tables are pushed to the remote > >> server by postgres_fdw. > >> > >> Well, now foreign TRUNCATE pushes the ONLY clause to the remote server > >> which is inconsistent when compared to SELECT/UPDATE/DELETE commands. > >> If we were to keep it consistent across all foreign commands that > >> ONLY clause is not pushed to remote server, then we can restrict for > >> TRUNCATE too and even if "TRUNCATE ONLY foreign_tbl" is specified, > >> just pass "TRUNCATE foreign_tbl" to remote server. Having said that, I > >> don't see any real problem in pushing the ONLY clause, at least in > >> case of TRUNCATE. > >> > > If ONLY-clause would be pushed down to the remote query of postgres_fdw, > > what does the foreign-table represent in the local system? > > > > In my understanding, a local foreign table by postgres_fdw is a > > representation of > > entire tree of the remote parent table and its children. > > If so, I'm still wondering why CASCADE/RESTRICT (i.e., DropBehavior) needs to > be passed to FDW. IOW, if a foreign table is an abstraction of an external > data source, ISTM that postgres_fdw should always issue TRUNCATE with > CASCADE. Why do we need to allow RESTRICT to be specified for a foreign table > even though it's an abstraction of an external data source? IMHO, we can push all the TRUNCATE options (ONLY, RESTRICTED, CASCADE, RESTART/CONTINUE IDENTITY), because it doesn't have any major challenge(implementation wise) unlike pushing some clauses in SELECT/UPDATE/DELETE and we already do this on the master. It doesn't look good and may confuse users, if we push some options and restrict others. We should have an explicit note in the documentation saying we push all these options to the remote server. We can leave it to the user to write TRUNCATE for foreign tables with the appropriate options. If somebody complains about a problem that they will face with this behavior, we can revisit. This is my opinion, others may disagree. With Regards, Bharath Rupireddy. EnterpriseDB: http://www.enterprisedb.com
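For concreteness, this is the kind of statement whose options are at issue (the foreign table name is hypothetical); whether each of ONLY, RESTART/CONTINUE IDENTITY, and CASCADE/RESTRICT should be forwarded verbatim by postgres_fdw is exactly what is being debated:

TRUNCATE ONLY ft_remote_parent RESTART IDENTITY CASCADE;
-- under the behavior described above, postgres_fdw would issue an
-- equivalent TRUNCATE, options included, against the remote table
-- that ft_remote_parent points at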
Re: TRUNCATE on foreign table
2021年4月14日(水) 0:00 Fujii Masao : > > On 2021/04/13 23:25, Kohei KaiGai wrote: > > 2021年4月13日(火) 21:03 Bharath Rupireddy > > : > >> Yeah, ONLY clause is not pushed to the remote server in case of SELECT > >> commands. This is also true for DELETE and UPDATE commands on foreign > >> tables. > > This sounds reasonable reason why ONLY should be ignored in TRUNCATE on > foreign tables, for now. If there is the existing rule about how to treat > ONLY clause for foreign tables, basically TRUNCATE should follow that at this > stage. Maybe we can change the rule, but it's an item for v15 or later? > > > >> I'm not sure if it wasn't thought necessary or if there is an > >> issue to push it or I may be missing something here. > > I could not find the past discussion about foreign tables and ONLY clause. > I guess that ONLY is ignored in SELECT on foreign tables case because ONLY > is interpreted outside the executor and it's not easy to change the executor > so that ONLY is passed to FDW. Maybe.. > > > >> I think we can > >> start a separate thread to see other hackers' opinions on this. > >> > >> I'm not sure whether all the clauses that are possible for > >> SELECT/UPDATE/DELETE/INSERT with local tables are pushed to the remote > >> server by postgres_fdw. > >> > >> Well, now foreign TRUNCATE pushes the ONLY clause to the remote server > >> which is inconsistent when compared to SELECT/UPDATE/DELETE commands. > >> If we were to keep it consistent across all foreign commands that > >> ONLY clause is not pushed to remote server, then we can restrict for > >> TRUNCATE too and even if "TRUNCATE ONLY foreign_tbl" is specified, > >> just pass "TRUNCATE foreign_tbl" to remote server. Having said that, I > >> don't see any real problem in pushing the ONLY clause, at least in > >> case of TRUNCATE. > >> > > If ONLY-clause would be pushed down to the remote query of postgres_fdw, > > what does the foreign-table represent in the local system? > > > > In my understanding, a local foreign table by postgres_fdw is a > > representation of > > entire tree of the remote parent table and its children. > > If so, I'm still wondering why CASCADE/RESTRICT (i.e., DropBehavior) needs to > be passed to FDW. IOW, if a foreign table is an abstraction of an external > data source, ISTM that postgres_fdw should always issue TRUNCATE with > CASCADE. Why do we need to allow RESTRICT to be specified for a foreign table > even though it's an abstraction of an external data source? > Please assume the internal heap data is managed by PostgreSQL core, and external data source is managed by postgres_fdw (or other FDW driver). TRUNCATE command requires these object managers to eliminate the data on behalf of the foreign tables picked up. Even though the object manager tries to eliminate the managed data, it may be restricted by some reason; FK restrictions in case of PostgreSQL internal data. In this case, CASCADE/RESTRICT option suggests the object manager how to handle the target data. The ONLY clause controls whoes data shall be eliminated. On the other hand, CASCADE/RESTRICT and CONTINUE/RESTART controls how data shall be eliminated. It is a primitive difference. Best regards, -- HeteroDB, Inc / The PG-Strom Project KaiGai Kohei
Re: TRUNCATE on foreign table
At Wed, 14 Apr 2021 13:17:55 +0900, Kohei KaiGai wrote in > 2021年4月14日(水) 0:00 Fujii Masao : > > > > On 2021/04/13 23:25, Kohei KaiGai wrote: > > > 2021年4月13日(火) 21:03 Bharath Rupireddy > > > : > > >> Yeah, ONLY clause is not pushed to the remote server in case of SELECT > > >> commands. This is also true for DELETE and UPDATE commands on foreign > > >> tables. > > > > This sounds reasonable reason why ONLY should be ignored in TRUNCATE on > > foreign tables, for now. If there is the existing rule about how to treat > > ONLY clause for foreign tables, basically TRUNCATE should follow that at > > this > > stage. Maybe we can change the rule, but it's an item for v15 or later? > > > > > > >> I'm not sure if it wasn't thought necessary or if there is an > > >> issue to push it or I may be missing something here. > > > > I could not find the past discussion about foreign tables and ONLY clause. > > I guess that ONLY is ignored in SELECT on foreign tables case because ONLY > > is interpreted outside the executor and it's not easy to change the executor > > so that ONLY is passed to FDW. Maybe.. > > > > > > >> I think we can > > >> start a separate thread to see other hackers' opinions on this. > > >> > > >> I'm not sure whether all the clauses that are possible for > > >> SELECT/UPDATE/DELETE/INSERT with local tables are pushed to the remote > > >> server by postgres_fdw. > > >> > > >> Well, now foreign TRUNCATE pushes the ONLY clause to the remote server > > >> which is inconsistent when compared to SELECT/UPDATE/DELETE commands. > > >> If we were to keep it consistent across all foreign commands that > > >> ONLY clause is not pushed to remote server, then we can restrict for > > >> TRUNCATE too and even if "TRUNCATE ONLY foreign_tbl" is specified, > > >> just pass "TRUNCATE foreign_tbl" to remote server. Having said that, I > > >> don't see any real problem in pushing the ONLY clause, at least in > > >> case of TRUNCATE. > > >> > > > If ONLY-clause would be pushed down to the remote query of postgres_fdw, > > > what does the foreign-table represent in the local system? > > > > > > In my understanding, a local foreign table by postgres_fdw is a > > > representation of > > > entire tree of the remote parent table and its children. > > > > If so, I'm still wondering why CASCADE/RESTRICT (i.e., DropBehavior) needs > > to > > be passed to FDW. IOW, if a foreign table is an abstraction of an external > > data source, ISTM that postgres_fdw should always issue TRUNCATE with > > CASCADE. Why do we need to allow RESTRICT to be specified for a foreign > > table > > even though it's an abstraction of an external data source? > > > Please assume the internal heap data is managed by PostgreSQL core, and > external data source is managed by postgres_fdw (or other FDW driver). > TRUNCATE command requires these object managers to eliminate the data > on behalf of the foreign tables picked up. > > Even though the object manager tries to eliminate the managed data, it may be > restricted by some reason; FK restrictions in case of PostgreSQL internal > data. > In this case, CASCADE/RESTRICT option suggests the object manager how > to handle the target data. > > The ONLY clause controls whoes data shall be eliminated. > On the other hand, CASCADE/RESTRICT and CONTINUE/RESTART controls > how data shall be eliminated. It is a primitive difference. I object to unconditionally push ONLY to remote. 
As KaiGai-san said, it works in an apparently wrong way when a user wants to truncate only the specified foreign table in an inheritance tree, and there is no way to avoid that behavior.

I also don't think it is right to push down CASCADE/RESTRICT. Those options are about propagating truncation to the *local* tables that reference the *foreign* table, not to the remote tables that reference the original table on the remote side. If a user wants to allow that remote behavior, it should be specified by a foreign table option. (That is bothersome when someone wants to specify the behavior on the fly.)

alter foreign table ft1 options (add truncate_cascade 'true');

Also, CONTINUE/RESTART IDENTITY should not work, since foreign tables don't have an identity sequence. That said, we might be able to push these options down, since they affect only the target table.

I would accept that behavior if the command were "TRUNCATE FOREIGN TABLE", which explicitly targets a foreign table. But I'm not sure it is possible to add such syntax in a reasonable way.

regards. -- Kyotaro Horiguchi NTT Open Source Software Center
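As a sketch of how the per-table option suggested above might be used: truncate_cascade is only a proposal in this mail, not an existing postgres_fdw option, and ft1 is a placeholder name.

    -- opt in, per foreign table, to cascading truncation on the remote side
    ALTER FOREIGN TABLE ft1 OPTIONS (ADD truncate_cascade 'true');

    -- a plain TRUNCATE on the foreign table would then be deparsed with
    -- CASCADE appended for the remote server (assumed behavior)
    TRUNCATE ft1;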
Re: [PATCH] force_parallel_mode and GUC categories
On Tue, Apr 13, 2021 at 10:12:35AM -0400, Tom Lane wrote: > The following parameters are intended for developer testing, and > should never be enabled for production work. However, some of > them can be used to assist with the recovery of severely > damaged databases. Okay, that's fine by me. -- Michael signature.asc Description: PGP signature
Re: Performance Evaluation of Result Cache by using TPC-DS
Hello David,

Thank you for your reply.

> Can you share if these times were to run EXPLAIN ANALYZE or if they > were just the queries being executed normally?

These times were from running EXPLAIN ANALYZE. I executed each query twice, and the **average** execution time was shown in the table in my last e-mail. Therefore, the results in that table are not simply equal to those in the attached file. I'm sorry for the insufficient explanation.

> It would be really great if you could show the EXPLAIN (ANALYZE, > TIMING OFF) for query 62. There's a chance that the slowdown comes > from the additional EXPLAIN ANALYZE timing overhead with the Result > Cache version.

I ran query 62 with "EXPLAIN (ANALYZE, TIMING OFF)" and also normally (without EXPLAIN). I attached these execution results to this e-mail. This time, I executed each query only once (not twice). The results are as follows.

Method                        | Execution time with result cache (s) | Execution time without result cache (s) | Speedup ratio
EXPLAIN (ANALYZE, TIMING ON)  | 67.161                               | 59.615                                  | -12.66%
EXPLAIN (ANALYZE, TIMING OFF) | 66.142                               | 60.660                                  | -9.04%
Normal                        | 66.611                               | 60.955                                  | -9.28%

Although there is variation in the execution times, the speedup ratio is around -10%. So, the result cache has a roughly 10% regression in query 62. The overhead of EXPLAIN ANALYZE and TIMING ON does not seem to be high.

Best regards, Yuya Watari

On Tue, Apr 13, 2021 at 7:13 PM David Rowley wrote: > > On Tue, 13 Apr 2021 at 21:29, Yuya Watari wrote: > > I used the TPC-DS scale factor 100 in the evaluation. I executed all > > of the 99 queries in the TPC-DS, and the result cache worked in the 21 > > queries of them. However, some queries took too much time, so I > > skipped their execution. I set work_mem to 256MB, and > > max_parallel_workers_per_gather to 0. > > Many thanks for testing this. > > > As you can see from these results, many queries have a negative > > speedup ratio, which means that there are negative impacts on the > > query performance. In query 62, the execution time increased by > > 11.36%. I guess these regressions are due to the misestimation of the > > cost in the planner. I attached the execution plan of query 62. > > Can you share if these times were to run EXPLAIN ANALYZE or if they > were just the queries being executed normally? > > The times in the two files you attached do look very similar to the > times in your table, so I suspect either TIMING ON is not that high an > overhead on your machine, or the results are that of EXPLAIN ANALYZE. > > It would be really great if you could show the EXPLAIN (ANALYZE, > TIMING OFF) for query 62. There's a chance that the slowdown comes > from the additional EXPLAIN ANALYZE timing overhead with the Result > Cache version. > > > The result cache is currently enabled by default. However, if this > > kind of performance regression is common, we have to change its > > default behavior. > > Yes, the feedback we get during the beta period will help drive that > decision or if the costing needs to be adjusted.
>
> David

SET
                                        QUERY PLAN
 Limit  (cost=2816266.54..2816267.79 rows=100 width=110) (actual rows=100 loops=1)
   ->  Sort  (cost=2816266.54..2816267.38 rows=336 width=110) (actual rows=100 loops=1)
         Sort Key: (substr((warehouse.w_warehouse_name)::text, 1, 20)), ship_mode.sm_type, web_site.web_name
         Sort Method: top-N heapsort  Memory: 49kB
         ->  HashAggregate  (cost=2816248.24..2816252.44 rows=336 width=110) (actual rows=360 loops=1)
               Group Key: substr((warehouse.w_warehouse_name)::text, 1, 20), ship_mode.sm_type, web_site.web_name
               Batches: 1  Memory Usage: 157kB
               ->  Hash Join  (cost=2510.74..2794150.54 rows=368295 width=78) (actual rows=14336926 loops=1)
                     Hash Cond: (web_sales.ws_ship_mode_sk = ship_mode.sm_ship_mode_sk)
                     ->  Hash Join  (cost=2509.29..2792056.55 rows=368356 width=36) (actual rows=14337390 loops=1)
                           Hash Cond: (web_sales.ws_web_site_sk = web_site.web_site_sk)
                           ->  Hash Join  (cost=2506.75..2790916.14 rows=368430 width=33) (actual rows=14338265 loops=1)
                                 Hash Cond: (web_sales.ws_warehouse_sk = warehouse.w_warehouse_sk)
                                 ->  Hash Join  (cost=2505.41..2789674.19 rows=368516 width=20) (actual rows=14340028 loops=1)
                                       Hash Cond: (web_sales.ws_ship_date_sk = date_dim.d_date_sk)
                                       ->  Seq Scan on web_sales  (cost=0.00..2598153.08 rows=72001808 width=20) (actual rows=72001237 loops=1)
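For anyone trying to reproduce the comparison, the runs boil down to something like the following. The GUC name enable_resultcache is assumed from the Result Cache patch under discussion, the session settings are the ones reported above, and the full text of TPC-DS query 62 is omitted here:

    SET work_mem = '256MB';
    SET max_parallel_workers_per_gather = 0;

    -- with Result Cache (the default in this build)
    EXPLAIN (ANALYZE, TIMING OFF)
    SELECT ...;   -- TPC-DS query 62 goes here

    -- same query with Result Cache disabled, for the speedup-ratio column
    SET enable_resultcache = off;
    EXPLAIN (ANALYZE, TIMING OFF)
    SELECT ...;   -- TPC-DS query 62 goes here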
Re: Simplify backend terminate and wait logic in postgres_fdw test
On Tue, Apr 13, 2021 at 04:39:58PM +0900, Michael Paquier wrote: > Looks fine to me. Let's wait a bit first to see if Fujii-san has any > objections to this cleanup as that's his code originally, from > 32a9c0bd. And hearing nothing, done. The tests of postgres_fdw are getting much faster for me now, from basically 6s to 4s (actually that's roughly 1.8s of gain as pg_wait_until_termination waits at least 100ms, twice), so that's a nice gain. -- Michael signature.asc Description: PGP signature
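For reference, the simplification relies on the optional timeout argument that pg_terminate_backend() gained in v14, which waits for the backend to exit (via pg_wait_until_termination underneath) instead of an explicit polling loop in the test. A sketch, with a placeholder application_name and timeout:

    -- terminate the remote backend and wait up to 60 seconds for it to exit
    SELECT pg_terminate_backend(pid, 60000)
      FROM pg_stat_activity
     WHERE application_name = 'fdw_retry_check';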
jsonb subscripting assignment performance
Hi,

Commit 676887a3 added support for jsonb subscripting. Many thanks for working on this; I really like the improved syntax.

I was also hoping for some performance benefits, but my testing shows that

jsonb_value['existing_key'] = new_value;

takes just as long as

jsonb_value := jsonb_set(jsonb_value, ARRAY['existing_key'], new_value);

which is a bit surprising to me. Shouldn't subscripting be a lot faster, since it could modify the existing data structure in place? What am I missing here?

I came to think of this new functionality when trying to optimize some PL/pgSQL code where the bottleneck turned out to be lots of calls to jsonb_set() for large jsonb objects.

Here is the output from the attached bench:

n=1
00:00:00.002628 jsonb := jsonb_set(jsonb, ARRAY[existing key], value);
00:00:00.002778 jsonb := jsonb_set(jsonb, ARRAY[new key], value);
00:00:00.002332 jsonb[existing key] := value;
00:00:00.002794 jsonb[new key] := value;

n=10
00:00:00.042843 jsonb := jsonb_set(jsonb, ARRAY[existing key], value);
00:00:00.046515 jsonb := jsonb_set(jsonb, ARRAY[new key], value);
00:00:00.044974 jsonb[existing key] := value;
00:00:00.075429 jsonb[new key] := value;

n=100
00:00:00.420808 jsonb := jsonb_set(jsonb, ARRAY[existing key], value);
00:00:00.449622 jsonb := jsonb_set(jsonb, ARRAY[new key], value);
00:00:00.31834 jsonb[existing key] := value;
00:00:00.527904 jsonb[new key] := value;

Many thanks for clarifying.

Best regards, Joel

jsonb-bench.sql Description: Binary data
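For context, the attached benchmark (not reproduced here) boils down to PL/pgSQL loops of roughly this shape; the key name, value, and iteration count below are chosen for illustration only, not taken from the attachment:

    DO $$
    DECLARE
      j  jsonb := '{"existing_key": 1}';
      t0 timestamptz;
    BEGIN
      t0 := clock_timestamp();
      FOR i IN 1..100000 LOOP
        j := jsonb_set(j, ARRAY['existing_key'], to_jsonb(i));  -- pre-14 style
      END LOOP;
      RAISE NOTICE 'jsonb_set:    %', clock_timestamp() - t0;

      t0 := clock_timestamp();
      FOR i IN 1..100000 LOOP
        j['existing_key'] := to_jsonb(i);  -- subscripting assignment (commit 676887a3)
      END LOOP;
      RAISE NOTICE 'subscripting: %', clock_timestamp() - t0;
    END;
    $$;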
Re: Replication slot stats misgivings
On Tue, Apr 13, 2021 at 1:37 PM vignesh C wrote: > > On Mon, Apr 12, 2021 at 7:03 PM Masahiko Sawada wrote: > > > > > > The following test for the latest v8 patch seems to show different. > > total_bytes is 1808 whereas spill_bytes is 1320. Am I missing > > something? > > > > postgres(1:85969)=# select pg_create_logical_replication_slot('s', > > 'test_decoding'); > > pg_create_logical_replication_slot > > > > (s,0/1884468) > > (1 row) > > > > postgres(1:85969)=# create table a (i int); > > CREATE TABLE > > postgres(1:85969)=# insert into a select generate_series(1, 10); > > INSERT 0 10 > > postgres(1:85969)=# set logical_decoding_work_mem to 64; > > SET > > postgres(1:85969)=# select * from pg_stat_replication_slots ; > > slot_name | total_txns | total_bytes | spill_txns | spill_count | > > spill_bytes | stream_txns | stream_count | stream_bytes | stats_reset > > ---++-++-+-+-+--+--+- > > s | 0 | 0 | 0 | 0 | > > 0 | 0 |0 |0 | > > (1 row) > > > > postgres(1:85969)=# select count(*) from > > pg_logical_slot_peek_changes('s', NULL, NULL); > > count > > > > 14 > > (1 row) > > > > postgres(1:85969)=# select * from pg_stat_replication_slots ; > > slot_name | total_txns | total_bytes | spill_txns | spill_count | > > spill_bytes | stream_txns | stream_count | stream_bytes | stats_reset > > ---++-++-+-+-+--+--+- > > s | 2 |1808 | 1 | 202 | > > 1320 | 0 |0 |0 | > > (1 row) > > > > Thanks for identifying this issue, while spilling the transactions > reorder buffer changes gets released, we will not be able to get the > total size for spilled transactions from reorderbuffer size. I have > fixed it by including spilledbytes to totalbytes in case of spilled > transactions. Attached patch has the fix for this. > Thoughts? > I am not sure if that is the best way to fix it because sometimes we clear the serialized flag in which case it might not give the correct answer. Another way to fix it could be that before we try to restore a new set of changes, we update totalBytes counter. See, the attached patch atop your v6-0002-* patch. -- With Regards, Amit Kapila. fix_spilled_stats_1.patch Description: Binary data
Re: [PATCH] force_parallel_mode and GUC categories
On Tue, Apr 13, 2021 at 07:31:39AM -0500, Justin Pryzby wrote: > Good point. Thanks. I have used the wording that Tom has proposed upthread, added one GUC_NOT_IN_SAMPLE that you forgot, and applied the force_parallel_mode patch. -- Michael signature.asc Description: PGP signature