Re: Schema variables - new implementation for Postgres 15
> [0001-schema-variables-20211219-2.patch]
> [0002-schema-variables-20211219-2.patch]

Hi Pavel,

You said earlier
> 1. The schema variables were renamed to session variable

But I still see:

$ grep -Eic 'schema variable' postgres.html
15

(postgres.html from 'make postgres.html')

So that rename doesn't seem finished.

Erik
postgres_fdw: incomplete subabort cleanup of connections used in async execution
Hi,

While working on [1], I noticed $SUBJECT: postgres_fdw resets the per-connection states of connections, which store async requests sent to remote servers in async_capable mode, during post-abort (pgfdw_xact_callback()), but it fails to do so during post-subabort (pgfdw_subxact_callback()). This causes a crash when re-executing a query that was aborted in a subtransaction:

postgres=# create table t (a text, b text);
postgres=# create or replace function slow_data (name text, duration float)
           returns setof t as $$
           begin
             perform pg_sleep(duration);
             return query select name, generate_series(1, 1000)::text;
           end; $$ language plpgsql;
postgres=# create view v1 as select * from slow_data('foo', 2.5);
postgres=# create view v2 as select * from slow_data('bar', 5.0);
postgres=# create extension postgres_fdw;
postgres=# create server loopback1 foreign data wrapper postgres_fdw
           options (dbname 'postgres', async_capable 'true');
postgres=# create server loopback2 foreign data wrapper postgres_fdw
           options (dbname 'postgres', async_capable 'true');
postgres=# create user mapping for current_user server loopback1;
postgres=# create user mapping for current_user server loopback2;
postgres=# create foreign table list_p1 (a text, b text)
           server loopback1 options (table_name 'v1');
postgres=# create foreign table list_p2 (a text, b text)
           server loopback2 options (table_name 'v2');
postgres=# create table list_pt (a text, b text) partition by list (a);
postgres=# alter table list_pt attach partition list_p1 for values in ('foo');
postgres=# alter table list_pt attach partition list_p2 for values in ('bar');
postgres=# begin;
BEGIN
postgres=*# savepoint s1;
SAVEPOINT
postgres=*# select count(*) from list_pt;
^CCancel request sent
ERROR:  canceling statement due to user request
postgres=!# rollback to savepoint s1;
ROLLBACK
postgres=*# select count(*) from list_pt;
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.

When canceling the SELECT, an async request sent to a remote server using a connection is canceled, but it's stored in the per-connection state of the connection even after the failed subtransaction for the reason above, so when re-executing the SELECT, postgres_fdw processes the invalid async request to re-use the connection in GetConnection(), causing a segmentation fault. This would be my oversight in commit 27e1f1456. :-(

To fix, I modified pgfdw_abort_cleanup() to reset the per-connection state in the post-subabort case as well. Also, I modified the initialization so that it's done only if necessary, to save cycles, and improved a comment on the initialization a bit. Attached is a patch for that.

Best regards,
Etsuro Fujita

fix-postgres-fdw-subabort-cleanup.patch
Description: Binary data
sqlsmith: ERROR: XX000: bogus varno: 2
I reduced the problematic query to this.

SELECT 1 FROM pg_rewrite
WHERE pg_get_function_arg_default(ev_class, 1)
  !~~ pg_get_expr(ev_qual, ev_class, false);

#0  pg_re_throw () at elog.c:1800
#1  0x563f5d027932 in errfinish () at elog.c:593
#2  0x563f5cb874ee in resolve_special_varno (node=0x563f5dd0f7e0, context=0x7ffcf0daf250, callback=0x563f5cfca270 , callback_arg=0x0) at ruleutils.c:7319
#3  0x563f5cfca044 in get_variable () at ruleutils.c:7086
#4  0x563f5cfc7c58 in get_rule_expr () at ruleutils.c:8363
#5  0x563f5cfc97a6 in get_oper_expr (context=0x7ffcf0daf250, expr=0x563f5dd0f6f0) at ruleutils.c:9626
#6  get_rule_expr () at ruleutils.c:8472
#7  0x563f5cfcdc37 in deparse_expression_pretty (expr=expr@entry=0x563f5dd0f6f0, dpcontext=0x563f5dd10488, forceprefix=forceprefix@entry=false, showimplicit=showimplicit@entry=false, prettyFlags=prettyFlags@entry=2, startIndent=0) at ruleutils.c:3558
#8  0x563f5cfce661 in pg_get_expr_worker (expr=, relid=12104, relname=0x563f5dd10130 "pg_settings", prettyFlags=2) at ruleutils.c:2645
#9  0x563f5cd6540b in ExecInterpExpr () at execExprInterp.c:1272
#10 0x563f5cd73c5f in ExecEvalExprSwitchContext (isNull=0x7ffcf0daf3a7, econtext=0x563f5dd08a00, state=0x563f5dd0a270) at ../../../src/include/executor/executor.h:339
#11 ExecQual (econtext=0x563f5dd08a00, state=0x563f5dd0a270) at ../../../src/include/executor/executor.h:408
#12 ExecScan (node=0x563f5dd09328, accessMtd=0x563f5cd9e790 , recheckMtd=0x563f5cd9e780 ) at execScan.c:227
#13 0x563f5cd69f73 in ExecProcNode (node=0x563f5dd09328) at ../../../src/include/executor/executor.h:257
#14 ExecutePlan (execute_once=, dest=0x563f5dd18a80, direction=, numberTuples=0, sendTuples=, operation=CMD_SELECT, use_parallel_mode=, planstate=0x563f5dd09328, estate=0x563f5dd08790) at execMain.c:1600
#15 standard_ExecutorRun () at execMain.c:410
#16 0x563f5cf0460f in PortalRunSelect () at pquery.c:924
#17 0x563f5cf05bf1 in PortalRun () at pquery.c:768
#18 0x563f5cf019b2 in exec_simple_query () at postgres.c:1215
#19 0x563f5cf0370a in PostgresMain () at postgres.c:4498
#20 0x563f5ce6e479 in BackendRun (port=, port=) at postmaster.c:4594
#21 BackendStartup (port=) at postmaster.c:4322
#22 ServerLoop () at postmaster.c:1802
#23 0x563f5ce6f47c in PostmasterMain () at postmaster.c:1474
#24 0x563f5cb9a0c0 in main (argc=5, argv=0x563f5dc653f0) at main.c:198

While reducing the query, I got a related error:

SELECT 1 FROM pg_rewrite
WHERE pg_get_function_arg_default(ev_class, 1)
  !~~ pg_get_expr(ev_qual, 0, false);

ERROR:  XX000: bogus varlevelsup: 0 offset 0
LOCATION:  get_variable, ruleutils.c:7003

Both errors are reproducible back to at least v10.
Re: Getting rid of regression test input/ and output/ files
I wrote:
> This led me to wonder why we couldn't get rid of that entire
> mechanism in favor of some less-painful way of getting that
> information into the scripts.  If we had the desired values in
> psql variables, we could do what we need easily, for example ...

Here's some fleshed-out patches for this.

0001 adds the \getenv command to psql; now with documentation and a simple regression test.

0002 tweaks pg_regress to export the needed values as environment variables, and modifies the test scripts to use those variables. (For ease of review, this patch modifies the scripts in-place, and then 0003 will move them.)  A few comments on this:

* I didn't see any value in exporting @testtablespace@ as a separate variable; we might as well let the test script know how to construct that path name.

* I concluded that the right way to handle the concatenation issue is *not* to rely on SQL literal concatenation, but to use psql's \set command to concatenate parts of a string.  In particular this gives us a clean way to handle quoting/escaping rules in the places where a pathname has to be embedded in some larger string, such as a function body.  The golden rule for that seems to be "use one \set per level of quoting".  I believe this code is now fairly proof against situations that would completely break the existing way of doing things, such as pathnames with quotes or backslashes in them.  (It's hard to test the embedded-quote case, because that breaks the Makefiles too, but I did get through the regression tests with a path including a backslash.)

* There are a couple of places where the existing tests involve substituting a path name into expected query output or error messages.  This technique cannot handle that, but we have plenty of prior art for dealing with such cases.
I changed file_fdw to use a filter function to hide the pathnames in EXPLAIN output, and tweaked create_function_0 to show only an edited version of an error message (this is based on a similar case in infinite_recurse.sql).

0003 simply "git mv"'s the scripts and output files into place as normal not-requiring-editing files.  Be careful to "make clean" before applying this, else you may have conflicts with the target files already being present.  Also, while you can run the tests between 0003 and 0004, don't do "make clean" in this state or the hacky EXTRA_CLEAN rules in dblink and file_fdw will remove files you want.

0004 finally removes the no-longer-needed infrastructure in pg_regress and the makefiles.  (BTW, as far as I can find, the MSVC scripts have no provisions for cleaning these generated files?)

There's some refactoring that could be done afterwards, for example there seems little reason for dblink's paths.sql to continue to exist as a separate script.  But it seemed best for this patch series to convert the scripts as mechanically as possible.

I'm fairly pleased with how this came out.  I think these scripts will be *much* easier to maintain in this form.  Updating the output/*.source files was always a major pain in the rear, since you couldn't just copy results/ files to them.

Comments?

			regards, tom lane

diff --git a/doc/src/sgml/ref/psql-ref.sgml b/doc/src/sgml/ref/psql-ref.sgml
index 48248f750e..ae38d3dcc3 100644
--- a/doc/src/sgml/ref/psql-ref.sgml
+++ b/doc/src/sgml/ref/psql-ref.sgml
@@ -2237,6 +2237,28 @@ Tue Oct 26 21:40:57 CEST 1999
+
+        \getenv psql_var env_var
+
+
+
+        Gets the value of the environment
+        variable env_var
+        and assigns it to the psql
+        variable psql_var.
+        If env_var is
+        not defined in the psql process's
+        environment, psql_var
+        is not changed.
+        Example:
+
+=> \getenv home HOME
+=> \echo :home
+/home/postgres
+
+
+
+
+
+        \gexec

diff --git a/src/bin/psql/command.c b/src/bin/psql/command.c
index ccd7b48108..fb3bab9494 100644
--- a/src/bin/psql/command.c
+++ b/src/bin/psql/command.c
@@ -98,6 +98,8 @@ static backslashResult process_command_g_options(char *first_option,
 							bool active_branch, const char *cmd);
 static backslashResult exec_command_gdesc(PsqlScanState scan_state, bool active_branch);
+static backslashResult exec_command_getenv(PsqlScanState scan_state, bool active_branch,
+							const char *cmd);
 static backslashResult exec_command_gexec(PsqlScanState scan_state, bool active_branch);
 static backslashResult exec_command_gset(PsqlScanState scan_state, bool active_branch);
 static backslashResult exec_command_help(PsqlScanState scan_state, bool active_branch);
@@ -348,6 +350,8 @@ exec_command(const char *cmd,
 		status = exec_command_g(scan_state, active_branch, cmd);
 	else if (strcmp(cmd, "gdesc") == 0)
 		status = exec_command_gdesc(scan_state, active_branch);
+	else if (strcmp(cmd, "getenv") == 0)
+		status = exec_command_getenv(scan_state, active_branch, cmd);
 	else if (strcmp(cmd, "gexe
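[Editor's note: to make the "use one \set per level of quoting" rule above concrete, here is a sketch of the intended usage; the variable and environment-variable names are illustrative, not necessarily the ones the patch series ends up using.]

=> \getenv abs_srcdir ABS_SRCDIR
=> \set filename :abs_srcdir '/data/onek.data'
=> COPY onek FROM :'filename';

The first \set concatenates the pieces without quoting; the :'filename' interpolation then single-quotes and escapes the value, so a path containing quotes or backslashes still yields a valid SQL literal.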
Re: sqlsmith: ERROR: XX000: bogus varno: 2
Justin Pryzby writes:
> I reduced the problematic query to this.
> SELECT 1 FROM pg_rewrite WHERE
> pg_get_function_arg_default(ev_class, 1) !~~
> pg_get_expr(ev_qual, ev_class, false);

Or more simply,

regression=# select pg_get_expr(ev_qual, ev_class, false)
from pg_rewrite where rulename = 'pg_settings_u';
ERROR:  bogus varno: 2

I don't see anything particularly surprising here.  pg_get_expr is only able to cope with expression trees over a single relation, but ON UPDATE rules can refer to both OLD and NEW relations.  Maybe we could make the error message more friendly, but there's not much else to be done, I think.

			regards, tom lane
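[Editor's note: for contrast, a sketch of the case pg_get_expr is designed for — an expression tree over a single relation, such as a column default, which can only reference its own table. The table name here is illustrative.]

postgres=# create table t1 (a int default 42);
CREATE TABLE
postgres=# select pg_get_expr(adbin, adrelid) from pg_attrdef
           where adrelid = 't1'::regclass;
 pg_get_expr
-------------
 42
(1 row)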
Re: Getting rid of regression test input/ and output/ files
> 0001 adds the \getenv command to psql; now with documentation
> and a simple regression test.

+1. Wish I had added this years ago when I had a need for it.

> 0002 tweaks pg_regress to export the needed values as environment
> variables, and modifies the test scripts to use those variables.
> (For ease of review, this patch modifies the scripts in-place,
> and then 0003 will move them.)  A few comments on this:
>
> * I didn't see any value in exporting @testtablespace@ as a separate
> variable; we might as well let the test script know how to construct
> that path name.
>
> * I concluded that the right way to handle the concatenation issue
> is *not* to rely on SQL literal concatenation, but to use psql's
> \set command to concatenate parts of a string.  In particular this

+1 to that, much better than the multi-line thing.

I have a nitpick about the \getenv FOO FOO lines. It's a new function to everyone, and to anyone who hasn't seen the documentation it won't be immediately obvious which one is the ENV var and which one is the local var. Lowercasing the local var would be a way to reinforce which is which to the reader. It would also be consistent with var naming in the rest of the script.

> 0004 finally removes the no-longer-needed infrastructure in

+1. Deleted code is debugged code.
Re: Getting rid of regression test input/ and output/ files
Corey Huinker writes:
> I have a nitpick about the \getenv FOO FOO lines.
> It's a new function to everyone, and to anyone who hasn't seen the
> documentation it won't be immediately obvious which one is the ENV var and
> which one is the local var. Lowercasing the local var would be a way to
> reinforce which is which to the reader. It would also be consistent with
> var naming in the rest of the script.

Reasonable idea.  Another thing I was wondering about was whether to attach PG_ prefixes to the environment variable names, since those are in a more-or-less global namespace.  If we do that, then a different method for distinguishing the psql variables is to not prefix them.

			regards, tom lane
Re: row filtering for logical replication
On Sat, Dec 18, 2021 at 1:33 PM Amit Kapila wrote:
>
> On Fri, Dec 17, 2021 at 5:29 PM Greg Nancarrow wrote:
> >
> > On Fri, Dec 17, 2021 at 7:20 PM Ajin Cherian wrote:
> > >
> > > On Fri, Dec 17, 2021 at 5:46 PM Greg Nancarrow wrote:
> > > >
> > > > So using the v47 patch-set, I still find that the UPDATE above results
> > > > in publication of an INSERT of (2,1), rather than an UPDATE of (1,1) to (2,1).
> > > > This is according to the 2nd UPDATE rule below, from patch 0003.
> > > >
> > > > + * old-row (no match)    new-row (no match)  -> (drop change)
> > > > + * old-row (no match)    new row (match)     -> INSERT
> > > > + * old-row (match)       new-row (no match)  -> DELETE
> > > > + * old-row (match)       new row (match)     -> UPDATE
> > > >
> > > > This is because the old row (1,1) doesn't match the UPDATE filter
> > > > "(a>1)", but the new row (2,1) does.
> > > > This functionality doesn't seem right to me. I don't think it can be
> > > > assumed that (1,1) was never published (and thus requires an INSERT
> > > > rather than UPDATE) based on these checks, because in this example,
> > > > (1,1) was previously published via a different operation - INSERT (and
> > > > using a different filter too).
> > > > I think the fundamental problem here is that these UPDATE rules assume
> > > > that the old (current) row was previously UPDATEd (and published, or
> > > > not published, according to the filter applicable to UPDATE), but this
> > > > is not necessarily the case.
> > > > Or am I missing something?
> > >
> > > But it need not be correct in assuming that the old-row was part of a
> > > previous INSERT either (and published, or not published according to
> > > the filter applicable to an INSERT).
> > > For example, change the sequence of inserts and updates prior to the
> > > last update:
> > >
> > > truncate tbl1 ;
> > > insert into tbl1 values (1,5);  ==> not replicated since insert and ! (b < 2);
> > > update tbl1 set b = 1;  ==> not replicated since update and ! (a > 1)
> > > update tbl1 set a = 2;  ==> replicated and update converted to insert since (a > 1)
> > >
> > > In this case, the last update "update tbl1 set a = 2;" is updating a
> > > row that was previously updated and not inserted and not replicated to
> > > the subscriber.
> > > How does the replication logic differentiate between these two cases,
> > > and decide if the update was previously published or not?
> > > I think it's futile for the publisher side to try and figure out the
> > > history of published rows. In fact, if this level of logic is required
> > > then it is best implemented on the subscriber side, which then defeats
> > > the purpose of a publication filter.
> > >
> >
> > I think it's a concern, for such a basic example with only one row,
> > getting unpredictable (and even wrong) replication results, depending
> > upon the order of operations.
> >
>
> I am not sure how we can deduce that. The results are based on current
> and new values of row which is what I think we are expecting here.
>
> > Doesn't this problem result from allowing different WHERE clauses for
> > different pubactions for the same table?
> > My current thoughts are that this shouldn't be allowed, and also WHERE
> > clauses for INSERTs should, like UPDATE and DELETE, be restricted to
> > using only columns covered by the replica identity or primary key.
> >
>
> Hmm, even if we do that one could have removed the insert row filter
> by the time we are evaluating the update. So, we will get the same
> result. I think the behavior in your example is as we expect as per
> the specs defined by the patch and I don't see any problem, in this
> case, w.r.t replication results. Let us see what others think on this?
>

I think currently there could be a problem with user perceptions. IMO a user would be mostly interested in predictability and getting results that are intuitive. So, even if all strange results can (after careful examination) be after-the-fact explained away as being "correct" according to a spec, I don't think that is going to make any difference. e.g. regardless of correctness, even if it just "appeared" to give unexpected results then a user may just decide that row-filtering is not worth their confusion...

Perhaps there is a slightly dumbed-down RF design that can still be useful, but which can give much more comfort to the user because the replica will be more like what they were expecting?

--
Kind Regards,
Peter Smith.
Fujitsu Australia
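[Editor's note: the UPDATE transformation rules quoted in this exchange (from patch 0003) can be sketched as a small decision function — a model of the described behavior, not the actual C implementation.]

```python
def transform_update(old_matches, new_matches):
    """Model of patch 0003's row-filter rules for UPDATE.

    old_matches/new_matches: whether the old/new tuple passes the
    publication's row filter.  Returns the action sent to the
    subscriber, or None if the change is dropped.
    """
    if not old_matches and not new_matches:
        return None        # drop change
    if not old_matches and new_matches:
        return "INSERT"    # row enters the filtered set
    if old_matches and not new_matches:
        return "DELETE"    # row leaves the filtered set
    return "UPDATE"        # row stays in the filtered set

# Greg's example: old row (1,1) fails the filter "(a > 1)" but the new
# row (2,1) passes it, so the UPDATE is published as an INSERT.
assert transform_update(old_matches=False, new_matches=True) == "INSERT"
```

This is exactly the table quoted above; the debate in the thread is not about the mechanics but about whether deciding from the old/new tuples alone can ever be consistent with what was previously published.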
Re: Getting rid of regression test input/ and output/ files
On Sun, Dec 19, 2021 at 5:48 PM Tom Lane wrote:
> Corey Huinker writes:
> > I have a nitpick about the \getenv FOO FOO lines.
> > It's a new function to everyone, and to anyone who hasn't seen the
> > documentation it won't be immediately obvious which one is the ENV var and
> > which one is the local var. Lowercasing the local var would be a way to
> > reinforce which is which to the reader. It would also be consistent with
> > var naming in the rest of the script.
>
> Reasonable idea.  Another thing I was wondering about was whether
> to attach PG_ prefixes to the environment variable names, since
> those are in a more-or-less global namespace.  If we do that,
> then a different method for distinguishing the psql variables
> is to not prefix them.

+1 to that as well.

Which brings up a tangential question: is there value in having something that brings in one or more env vars as psql vars directly? I'm thinking something like:

\importenv pattern [prefix]

(alternate names: \getenv_multi, \getenv_pattern, \getenvs, etc.) which could be used like

\importenv PG* env_

which would import PGFOO and PGBAR as env_PGFOO and env_PGBAR, awkward names but leaving no doubt about where a previously unreferenced variable came from. I don't *think* we need it for this specific case, but since the subject of env vars has come up I thought I'd throw it out there.
Re: Getting rid of regression test input/ and output/ files
Corey Huinker writes:
> Which brings up a tangential question, is there value in having something
> that brings in one or more env vars as psql vars directly. I'm thinking
> something like:
> \importenv pattern [prefix]

Meh ... considering we've gone this long without any getenv capability in psql at all, that seems pretty premature.

			regards, tom lane
PublicationActions - use bit flags.
For some reason the current HEAD PublicationActions is a struct of booleans representing combinations of the 4 different "publication actions". I felt it is more natural to implement boolean flag combinations using a bitmask instead of a struct of bools. IMO using the bitmask also simplifies assignment and checking of said flags.

PSA a small patch for this. Thoughts?

--
Kind Regards,
Peter Smith.
Fujitsu Australia

v1-0001-PublicationActions-use-bit-flags.patch
Description: Binary data
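[Editor's note: a sketch of the bitmask idea in Python; the flag names are hypothetical — the actual patch is C and may choose different identifiers.]

```python
from enum import IntFlag

class PubAction(IntFlag):
    """Hypothetical flags modeling the proposed bitmask."""
    INSERT = 1 << 0
    UPDATE = 1 << 1
    DELETE = 1 << 2
    TRUNCATE = 1 << 3

# Assigning a combination of actions is a single OR expression ...
pubactions = PubAction.INSERT | PubAction.UPDATE

# ... and testing is a single AND, rather than reading and writing
# four separate struct members.
assert pubactions & PubAction.INSERT
assert not (pubactions & PubAction.DELETE)
```

The same shape in C would be an enum of `1 << n` values held in a single integer field, which is the simplification the patch proposes.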
Re: row filtering for logical replication
On Sat, Dec 18, 2021 at 1:33 PM Amit Kapila wrote:
>
> > I think it's a concern, for such a basic example with only one row,
> > getting unpredictable (and even wrong) replication results, depending
> > upon the order of operations.
> >
>
> I am not sure how we can deduce that. The results are based on current
> and new values of row which is what I think we are expecting here.
>

In the two simple cases presented, the publisher ends up with the same single row (2,1) in both cases, but in one of the cases the subscriber ends up with an extra row (1,1) that the publisher doesn't have. So, in using a "filter", a new row has been published that the publisher doesn't have. I'm not so sure a user would be expecting that. Not to mention that if (1,1) is subsequently INSERTed on the publisher side, it will result in a duplicate key error on the subscriber.

> > Doesn't this problem result from allowing different WHERE clauses for
> > different pubactions for the same table?
> > My current thoughts are that this shouldn't be allowed, and also WHERE
> > clauses for INSERTs should, like UPDATE and DELETE, be restricted to
> > using only columns covered by the replica identity or primary key.
> >
>
> Hmm, even if we do that one could have removed the insert row filter
> by the time we are evaluating the update. So, we will get the same
> result. I think the behavior in your example is as we expect as per
> the specs defined by the patch and I don't see any problem, in this
> case, w.r.t replication results. Let us see what others think on this?
>

Here I'm talking about the typical use-case of setting the row-filtering WHERE clause up-front and not changing it thereafter. I think that dynamically changing filters after INSERT/UPDATE/DELETE operations is not the typical use-case, and IMHO it's another thing entirely (it could result in all kinds of unpredictable, random results).

Personally I think it would make more sense to:
1) Disallow different WHERE clauses on the same table, for different pubactions.
2) If only INSERTs are being published, allow any column in the WHERE clause, otherwise (as for UPDATE and DELETE) restrict the referenced columns to be part of the replica identity or primary key.

Regards,
Greg Nancarrow
Fujitsu Australia
RE: row filtering for logical replication
> -----Original Message-----
> From: Amit Kapila
> Sent: Saturday, December 18, 2021 10:33 AM
>
> On Fri, Dec 17, 2021 at 5:29 PM Greg Nancarrow wrote:
> >
> > On Fri, Dec 17, 2021 at 7:20 PM Ajin Cherian wrote:
> > >
> > > On Fri, Dec 17, 2021 at 5:46 PM Greg Nancarrow wrote:
> > > >
> > > > So using the v47 patch-set, I still find that the UPDATE above results in
> > > > publication of an INSERT of (2,1), rather than an UPDATE of (1,1) to (2,1).
> > > > This is according to the 2nd UPDATE rule below, from patch 0003.
> > > >
> > > > + * old-row (no match)    new-row (no match)  -> (drop change)
> > > > + * old-row (no match)    new row (match)     -> INSERT
> > > > + * old-row (match)       new-row (no match)  -> DELETE
> > > > + * old-row (match)       new row (match)     -> UPDATE
> > > >
> > > > This is because the old row (1,1) doesn't match the UPDATE filter "(a>1)",
> > > > but the new row (2,1) does.
> > > > This functionality doesn't seem right to me. I don't think it can be
> > > > assumed that (1,1) was never published (and thus requires an INSERT rather
> > > > than UPDATE) based on these checks, because in this example, (1,1) was
> > > > previously published via a different operation - INSERT (and using a
> > > > different filter too).
> > > > I think the fundamental problem here is that these UPDATE rules assume
> > > > that the old (current) row was previously UPDATEd (and published, or not
> > > > published, according to the filter applicable to UPDATE), but this is not
> > > > necessarily the case.
> > > > Or am I missing something?
> > >
> > > But it need not be correct in assuming that the old-row was part of
> > > a previous INSERT either (and published, or not published according
> > > to the filter applicable to an INSERT).
> > > For example, change the sequence of inserts and updates prior to the
> > > last update:
> > >
> > > truncate tbl1 ;
> > > insert into tbl1 values (1,5);  ==> not replicated since insert and ! (b < 2);
> > > update tbl1 set b = 1;  ==> not replicated since update and ! (a > 1)
> > > update tbl1 set a = 2;  ==> replicated and update converted to insert since (a > 1)
> > >
> > > In this case, the last update "update tbl1 set a = 2;" is updating
> > > a row that was previously updated and not inserted and not
> > > replicated to the subscriber.
> > > How does the replication logic differentiate between these two
> > > cases, and decide if the update was previously published or not?
> > > I think it's futile for the publisher side to try and figure out the
> > > history of published rows. In fact, if this level of logic is
> > > required then it is best implemented on the subscriber side, which
> > > then defeats the purpose of a publication filter.
> > >
> >
> > I think it's a concern, for such a basic example with only one row,
> > getting unpredictable (and even wrong) replication results, depending
> > upon the order of operations.
> >
>
> I am not sure how we can deduce that. The results are based on current and
> new values of row which is what I think we are expecting here.
>
> > Doesn't this problem result from allowing different WHERE clauses for
> > different pubactions for the same table?
> > My current thoughts are that this shouldn't be allowed, and also WHERE
> > clauses for INSERTs should, like UPDATE and DELETE, be restricted to
> > using only columns covered by the replica identity or primary key.
> >
>
> Hmm, even if we do that one could have removed the insert row filter by the
> time we are evaluating the update. So, we will get the same result. I think the
> behavior in your example is as we expect as per the specs defined by the patch
> and I don't see any problem, in this case, w.r.t replication results. Let us see
> what others think on this?

I think it might not be hard to predict the current behavior. Users only need to be aware that:

1) pubactions and row filters on different publications are combined with 'OR'.
2) For UPDATE, we execute the filter for both the OLD and NEW tuple and would change the operation type accordingly.

For the example mentioned:

create table tbl1 (a int primary key, b int);
create publication A for table tbl1 where (b<2) with (publish='insert');
create publication B for table tbl1 where (a>1) with (publish='update');

If we follow rules 1) and 2), I feel we are able to predict the following condition:

WHERE (action = 'insert' AND b < 2) OR (action = 'update' AND a > 1)

So, it seems acceptable to me. Personally, I think the current design gives users more flexibility to handle some complex scenarios. If users want a simple setting for a publication, they can also set the same row filter for the same table in different publications. To avoid confusion, I think we can document these rules clearly.

BTW, from IBM's documentation, I think IBM also supports this kind of complex condition [1].

[1] https://www.ibm.com/docs/en/idr/11.4.0?topic=rows-log-record-variables

Best regards,
Hou zj
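[Editor's note: rules 1) and 2) above can be sketched as a small model; `row_publishable` and `publications` are illustrative names, not pgoutput's actual structures.]

```python
def row_publishable(action, row, publications):
    """Rule 1): pubactions and row filters from different publications
    are combined with OR -- a change is published if ANY publication
    both publishes this action and passes its row filter."""
    return any(action in actions and filt(row)
               for actions, filt in publications)

# The example from the thread:
#   create publication A for table tbl1 where (b<2) with (publish='insert');
#   create publication B for table tbl1 where (a>1) with (publish='update');
pubs = [
    ({"insert"}, lambda r: r["b"] < 2),
    ({"update"}, lambda r: r["a"] > 1),
]

# Equivalent to:
#   (action = 'insert' AND b < 2) OR (action = 'update' AND a > 1)
assert row_publishable("insert", {"a": 1, "b": 1}, pubs)
assert not row_publishable("insert", {"a": 1, "b": 5}, pubs)
assert row_publishable("update", {"a": 2, "b": 5}, pubs)
```

For UPDATE, rule 2) would evaluate this predicate for both the OLD and the NEW tuple and pick INSERT/DELETE/UPDATE/drop from the combination, per the table quoted earlier in the thread.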
Re: parallel vacuum comments
On Sat, Dec 18, 2021 at 3:38 PM Amit Kapila wrote:
>
> On Fri, Dec 17, 2021 at 10:51 AM Masahiko Sawada wrote:
> >
> > I've attached updated patches. The first patch just moves common
> > functions for index bulk-deletion and cleanup to vacuum.c. And the
> > second patch moves parallel vacuum code to vacuumparallel.c. The
> > comments I got so far are incorporated into these patches. Please
> > review them.
> >
>
> Thanks, it is helpful for the purpose of review.
>
> Few comments:
> =============
> 1.
> + * dead_items stores TIDs whose index tuples are deleted by index vacuuming.
> + * Each TID points to an LP_DEAD line pointer from a heap page that has been
> + * processed by lazy_scan_prune. Also needed by lazy_vacuum_heap_rel, which
> + * marks the same LP_DEAD line pointers as LP_UNUSED during second heap pass.
>  */
> - LVDeadItems *dead_items; /* TIDs whose index tuples we'll delete */
> + VacDeadItems *dead_items; /* TIDs whose index tuples we'll delete */
>
> Isn't it better to keep these comments atop the structure VacDeadItems
> declaration?

I think the LP_DEAD and LP_UNUSED stuff is specific to heap. Given moving VacDeadItems to vacuum.c, I thought it's better to keep it as generic TID storage.

> 2. What is the reason for not moving
> lazy_vacuum_one_index/lazy_cleanup_one_index to vacuum.c so that they
> can be called from vacuumlazy.c and vacuumparallel.c? Without this
> refactoring patch, I think both leader and workers set the same error
> context phase (VACUUM_ERRCB_PHASE_VACUUM_INDEX) during index
> vacuuming? Is it because you want a separate context phase for a
> parallel vacuum?

Since the phases defined as VacErrPhase, like VACUUM_ERRCB_PHASE_SCAN_HEAP and VACUUM_ERRCB_PHASE_VACUUM_HEAP etc., and the error callback function, vacuum_error_callback(), are specific to heap, I thought it'd not be a good idea to move lazy_vacuum/cleanup_one_index() so that both vacuumlazy.c and vacuumparallel.c can use the phases and error callback function.

> The other thing related to this is that if we have to
> do the way you have it here then we don't need pg_rusage_init() in
> functions lazy_vacuum_one_index/lazy_cleanup_one_index.

Right. It should be removed.

> 3.
> @@ -3713,7 +3152,7 @@ update_index_statistics(LVRelState *vacrel)
>  int nindexes = vacrel->nindexes;
>  IndexBulkDeleteResult **indstats = vacrel->indstats;
>
> - Assert(!IsInParallelMode());
> + Assert(!ParallelVacuumIsActive(vacrel));
>
> I think we can retain the older Assert. If we do that then I think we
> don't need to define ParallelVacuumIsActive in vacuumlazy.c.

Right, will fix in the next version patch.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/
RE: [PATCH] pg_stat_toast
Dear Gunnar,

> postgres=# CREATE TABLE test (i int, lz4 text COMPRESSION lz4, std text);
> postgres=# INSERT INTO test SELECT
> i, repeat(md5(i::text),100), repeat(md5(i::text),100) FROM
> generate_series(0,10) x(i);
> postgres=# SELECT * FROM pg_stat_toast WHERE schemaname = 'public';
> -[ RECORD 1 ]--------+----------
> schemaname           | public
> reloid               | 16829
> attnum               | 2
> relname              | test
> attname              | lz4
> externalizations     | 0
> compressions         | 11
> compressionsuccesses | 11
> compressionsizesum   | 6299710
> originalsizesum      | 320403204
> -[ RECORD 2 ]--------+----------
> schemaname           | public
> reloid               | 16829
> attnum               | 3
> relname              | test
> attname              | std
> externalizations     | 0
> compressions         | 11
> compressionsuccesses | 11
> compressionsizesum   | 8198819
> originalsizesum      | 320403204

I'm not sure about TOAST, but compression methods are now configurable:
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=bbe0a81db69bd10bd166907c3701492a29aca294

How about adding a new attribute "method" to pg_stat_toast? ToastAttrInfo's attr->tai_compression represents how the data is compressed, so I think it's easy to add. Or is it not needed because pg_attribute already has that information?

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
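[Editor's note: on the question of whether pg_attribute already carries this — since PostgreSQL 14, pg_attribute.attcompression records each column's currently-configured compression method ('l' for lz4, 'p' for pglz, empty for the default), e.g.:]

postgres=# SELECT attname, attcompression
           FROM pg_attribute
           WHERE attrelid = 'test'::regclass AND attnum > 0;

Note, though, that attcompression reflects only the current setting; previously-stored values may have been compressed with an earlier method, which could be one argument for tracking the method per datum in pg_stat_toast as well.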
RE: row filtering for logical replication
On Wednesday, December 8, 2021 2:29 PM Amit Kapila wrote: > > On Mon, Dec 6, 2021 at 6:04 PM Euler Taveira wrote: > > > > On Mon, Dec 6, 2021, at 3:35 AM, Dilip Kumar wrote: > > > > On Mon, Dec 6, 2021 at 6:49 AM Euler Taveira wrote: > > > > > > On Fri, Dec 3, 2021, at 8:12 PM, Euler Taveira wrote: > > > > > > PS> I will update the commit message in the next version. I barely > > > changed the > > > documentation to reflect the current behavior. I probably missed some > changes > > > but I will fix in the next version. > > > > > > I realized that I forgot to mention a few things about the UPDATE > > > behavior. > > > Regardless of 0003, we need to define which tuple will be used to > > > evaluate the > > > row filter for UPDATEs. We already discussed it circa [1]. This current > > > version > > > chooses *new* tuple. Is it the best choice? > > > > But with 0003, we are using both the tuple for evaluating the row > > filter, so instead of fixing 0001, why we don't just merge 0003 with > > 0001? I mean eventually, 0003 is doing what is the agreed behavior, > > i.e. if just OLD is matching the filter then convert the UPDATE to > > DELETE OTOH if only new is matching the filter then convert the UPDATE > > to INSERT. Do you think that even we merge 0001 and 0003 then also > > there is an open issue regarding which row to select for the filter? > > > > Maybe I was not clear. IIUC we are still discussing 0003 and I would like to > > propose a different default based on the conclusion I came up. If we merged > > 0003, that's fine; this change will be useless. If we don't or it is > > optional, > > it still has its merit. > > > > Do we want to pay the overhead to evaluating both tuple for UPDATEs? I'm > > still > > processing if it is worth it. If you think that in general the row filter > > contains the primary key and it is rare to change it, it will waste cycles > > evaluating the same expression twice. It seems this behavior could be > > controlled by a parameter. 
> I think the first thing we should do in this regard is to evaluate the performance for both cases (when we apply a filter to both tuples vs. to one of the tuples). In case the performance difference is unacceptable, I think it would be better to still compare both tuples as default to avoid data inconsistency issues, and have an option to allow comparing one of the tuples.

I did some performance tests to see whether the 0003 patch has much overhead. For this, I compared applying the first two patches against applying the first three patches, in four cases:
1) only old rows match the filter;
2) only new rows match the filter;
3) both old rows and new rows match the filter;
4) neither old rows nor new rows match the filter.

The 0003 patch checks both old rows and new rows; without the 0003 patch, only one of them (old or new) is checked. We want to know whether it takes more time when the old rows are also checked. I ran the tests in asynchronous mode and compared the SQL execution time. I also tried some complex filters, to see if the difference would become more obvious. The result and the script are attached.

I didn't see a big difference between the results with and without the 0003 patch in any of the cases, so I think the 0003 patch doesn't have much overhead.

Regards, Tang

perf.sh Description: perf.sh
Re: row filtering for logical replication
On Mon, Dec 20, 2021 at 6:07 AM Greg Nancarrow wrote:
> On Sat, Dec 18, 2021 at 1:33 PM Amit Kapila wrote:
> > > I think it's a concern, for such a basic example with only one row, getting unpredictable (and even wrong) replication results, depending upon the order of operations.
> > I am not sure how we can deduce that. The results are based on the current and new values of the row, which is what I think we are expecting here.
> In the two simple cases presented, the publisher ends up with the same single row (2,1) in both cases, but in one of the cases the subscriber ends up with an extra row (1,1) that the publisher doesn't have. So, in using a "filter", a row has been published that the publisher doesn't have. I'm not so sure a user would be expecting that. Not to mention that if (1,1) is subsequently INSERTed on the publisher side, it will result in a duplicate key error on the subscriber.

Personally, I feel users need to be careful in defining publications and subscriptions; otherwise, there are various ways "duplicate key error ..." kinds of issues can arise. Say, you have different publications which publish the same table, and then you have different subscriptions on the subscriber which subscribe to those publications.

> > > Doesn't this problem result from allowing different WHERE clauses for different pubactions for the same table? My current thoughts are that this shouldn't be allowed, and also WHERE clauses for INSERTs should, like UPDATE and DELETE, be restricted to using only columns covered by the replica identity or primary key.
> > Hmm, even if we do that, one could have removed the insert row filter by the time we are evaluating the update. So, we will get the same result. I think the behavior in your example is as we expect as per the specs defined by the patch, and I don't see any problem, in this case, w.r.t. replication results.
Let us see what others think on this?

> Here I'm talking about the typical use-case of setting the row-filtering WHERE clause up-front and not changing it thereafter. I think that dynamically changing filters after INSERT/UPDATE/DELETE operations is not the typical use-case, and IMHO it's another thing entirely (could result in all kinds of unpredictable, random results).

Yeah, that's what I also wanted to say: users need to carefully define publications/subscriptions, because otherwise an up-front definition can also lead to unpredictable results, as shared in the explanation above. I feel Hou-San's latest email [1] explains the current rules very well, and maybe we should document them in some way to avoid confusion.

> Personally I think it would make more sense to:
> 1) Disallow different WHERE clauses on the same table, for different pubactions.
> 2) If only INSERTs are being published, allow any column in the WHERE clause, otherwise (as for UPDATE and DELETE) restrict the referenced columns to be part of the replica identity or primary key.

We can restrict in some way like you are saying, or we can even restrict such that we "disallow specifying row filters unless pubactions have all the DML operations, and allow the row filter to have only columns that are part of the replica identity or primary key". I feel it is better to provide flexibility as the current patch does and document it to make users aware of the kinds of problems that can arise with wrong usage.

[1] - https://www.postgresql.org/message-id/OS0PR01MB57168F4384D50656A4FC2DC5947B9%40OS0PR01MB5716.jpnprd01.prod.outlook.com

-- With Regards, Amit Kapila.
Re: row filtering for logical replication
On Mon, Dec 20, 2021 at 12:51 PM houzj.f...@fujitsu.com wrote:
> I think it might not be hard to predict the current behavior. Users only need to be aware that:
> 1) pubactions and row filters on different publications are combined with 'OR'.
> 2) For UPDATE, we execute the filter for both the OLD and NEW tuple and change the operation type accordingly.
>
> For the example mentioned:
> create table tbl1 (a int primary key, b int);
> create publication A for table tbl1 where (b<2) with(publish='insert');
> create publication B for table tbl1 where (a>1) with(publish='update');
>
> If we follow rules 1) and 2), I feel we are able to predict the following condition:
> --
> WHERE (action = 'insert' AND b < 2) OR (action = 'update' AND a > 1)
> --
>
> So, it seems acceptable to me.
>
> Personally, I think the current design gives users more flexibility to handle some complex scenarios. If users want a simple setting for publications, they can also set the same row filter for the same table in the different publications. To avoid confusion, I think we can document these rules clearly.
>
> BTW, from IBM's documentation, I think IBM also supports this kind of complex condition [1].
> [1] https://www.ibm.com/docs/en/idr/11.4.0?topic=rows-log-record-variables

Yes, I agree with this. It's better to give users more flexibility, while warning them about the consequences, rather than restricting them with constraints. We could explain this in the documentation so that users can better predict the effect of having pubaction-specific filters.

regards, Ajin Cherian Fujitsu Australia
RE: Confused comment about drop replica identity index
On Tue, Dec 16, 2021 at 10:27 AM, Michael Paquier wrote: > On Tue, Dec 16, 2021 at 06:40 AM, Michael Paquier wrote: > > Would you like to write a patch to address all that? > > OK, I will push it soon.

Here is a patch to correct the wrong comment about REPLICA_IDENTITY_INDEX and improve the docs.

Regards, Wang wei

Correct-wrong-comment-about-REPLICA_IDENTITY_INDEX.patch Description: Correct-wrong-comment-about-REPLICA_IDENTITY_INDEX.patch
Re: parallel vacuum comments
On Mon, Dec 20, 2021 at 8:33 AM Masahiko Sawada wrote: > > On Sat, Dec 18, 2021 at 3:38 PM Amit Kapila wrote: > > > > Few comments: > > = > > 1. > > + * dead_items stores TIDs whose index tuples are deleted by index > > vacuuming. > > + * Each TID points to an LP_DEAD line pointer from a heap page that has > > been > > + * processed by lazy_scan_prune. Also needed by lazy_vacuum_heap_rel, > > which > > + * marks the same LP_DEAD line pointers as LP_UNUSED during second heap > > pass. > > */ > > - LVDeadItems *dead_items; /* TIDs whose index tuples we'll delete */ > > + VacDeadItems *dead_items; /* TIDs whose index tuples we'll delete */ > > > > Isn't it better to keep these comments atop the structure VacDeadItems > > declaration? > > I think LP_DEAD and LP_UNUSED stuff are specific to heap. Given moving > VacDeadItems to vacuum.c, I thought it's better to keep it as generic > TID storage. > Okay, that makes sense. > > > > 2. What is the reason for not moving > > lazy_vacuum_one_index/lazy_cleanup_one_index to vacuum.c so that they > > can be called from vacuumlazy.c and vacuumparallel.c? Without this > > refactoring patch, I think both leader and workers set the same error > > context phase (VACUUM_ERRCB_PHASE_VACUUM_INDEX) during index > > vacuuming? Is it because you want a separate context phase for a > > parallel vacuum? > > Since the phases defined as VacErrPhase like > VACUUM_ERRCB_PHASE_SCAN_HEAP and VACUUM_ERRCB_PHASE_VACUUM_HEAP etc. > and error callback function, vacuum_error_callback(), are specific to > heap, I thought it'd not be a good idea to move > lazy_vacuum/cleanup_one_index() so that both vacuumlazy.c and > vacuumparallel.c can use the phases and error callback function. > How about exposing it via heapam.h? We have already exposed a few things via heapam.h (see /* in heap/vacuumlazy.c */). 
In the current proposal, we would need to have separate callbacks and phases for index vacuuming so that they can be used by both vacuumlazy.c and vacuumparallel.c, which might not be a good idea. -- With Regards, Amit Kapila.
Re: Getting rid of regression test input/ and output/ files
On Sun, Dec 19, 2021 at 7:00 PM Tom Lane wrote: > Corey Huinker writes: > > Which brings up a tangential question, is there value in having something > > that brings in one or more env vars as psql vars directly. I'm thinking > > something like: > > > \importenv pattern [prefix] > > Meh ... considering we've gone this long without any getenv capability > in psql at all, that seems pretty premature. > > regards, tom lane > Fair enough. Patches didn't apply with `git apply` but did fine with `patch -p1`, from there it passes make check-world.
Re: row filtering for logical replication
On Thu, Dec 2, 2021 at 7:40 PM vignesh C wrote:
> ...
> Thanks for the updated patch, a few comments:
> 1) Both the testpub5a and testpub5c publications are the same; one of them can be removed
> +SET client_min_messages = 'ERROR';
> +CREATE PUBLICATION testpub5a FOR TABLE testpub_rf_tbl1 WHERE (a > 1) WITH (publish="insert");
> +CREATE PUBLICATION testpub5b FOR TABLE testpub_rf_tbl1;
> +CREATE PUBLICATION testpub5c FOR TABLE testpub_rf_tbl1 WHERE (a > 3) WITH (publish="insert");
> +RESET client_min_messages;
> +\d+ testpub_rf_tbl1
> +DROP PUBLICATION testpub5a, testpub5b, testpub5c;
>
> testpub5b will be covered in the earlier existing case above:
> ALTER PUBLICATION testpib_ins_trunct ADD TABLE pub_test.testpub_nopk, testpub_tbl1;
>
> \d+ pub_test.testpub_nopk
> \d+ testpub_tbl1
>
> I felt the test related to testpub5b is also not required

Skipped. Strictly speaking, you may be correct to say this code path is already tested elsewhere. But this test case was meant for \d+, so I wanted it to be "self-contained" and easy to observe it displaying tables both with and without filters at the same time.
> 3) testpub7 can be renamed to testpub6 to maintain the continuity > since the previous testpub6 did not succeed: > +CREATE OPERATOR =#> (PROCEDURE = testpub_rf_func, LEFTARG = integer, > RIGHTARG = integer); > +CREATE PUBLICATION testpub6 FOR TABLE testpub_rf_tbl3 WHERE (e =#> 27); > +-- fail - WHERE not allowed in DROP > +ALTER PUBLICATION testpub5 DROP TABLE testpub_rf_tbl3 WHERE (e < 27); > +-- fail - cannot ALTER SET table which is a member of a pre-existing schema > +SET client_min_messages = 'ERROR'; > +CREATE PUBLICATION testpub7 FOR ALL TABLES IN SCHEMA testpub_rf_myschema1; > +ALTER PUBLICATION testpub7 SET ALL TABLES IN SCHEMA > testpub_rf_myschema1, TABLE testpub_rf_myschema1.testpub_rf_tb16; > +RESET client_min_messages; > Fixed in v48 [1] > 4) Did this test intend to include where clause in testpub_rf_tb16, if > so it can be added: > +-- fail - cannot ALTER SET table which is a member of a pre-existing schema > +SET client_min_messages = 'ERROR'; > +CREATE PUBLICATION testpub7 FOR ALL TABLES IN SCHEMA testpub_rf_myschema1; > +ALTER PUBLICATION testpub7 SET ALL TABLES IN SCHEMA > testpub_rf_myschema1, TABLE testpub_rf_myschema1.testpub_rf_tb16; > +RESET client_min_messages; > Fixed in v48 [1] -- [1] https://www.postgresql.org/message-id/CAHut%2BPuHz1oFM7oaiHeqxMQqd0L70bV_hT7u_mDf3b8As5kwig%40mail.gmail.com Kind Regards, Peter Smith. Fujitsu Australia
Re: simplifying foreign key/RI checks
> I wasn't able to make many inroads into how we might be able to get rid of the DETACH-related partition descriptor hacks, item (3), though I made some progress on items (1) and (2).
>
> For (1), the attached 0001 patch adds a new isolation suite fk-snapshot.spec to exercise snapshot behaviors in the cases where we no longer go through SPI. It helped find some problems with the snapshot handling in the earlier versions of the patch, mainly with partitioned PK tables. It also contains a test along the lines of the example you showed upthread, which shows that the partition descriptor hack requiring ActiveSnapshot to be set produces wrong results. The patch includes the buggy output for that test case, marked as such in a comment above the test.
>
> In the updated 0002, I fixed things such that the snapshot-setting required by the partition descriptor hack is independent of the snapshot-setting of the RI query, so that it no longer causes the PK index scan to return rows that the RI query mustn't see. That fixes the visibility bug illustrated in your example and, as mentioned, is also exercised in the new test suite.
>
> I also moved find_leaf_pk_rel() into execPartition.c with a new name and a new set of parameters.
>
> -- Amit Langote EDB: http://www.enterprisedb.com

Sorry for the delay. This patch no longer applies; it has some conflicts with d6f96ed94e73052f99a2e545ed17a8b2fdc1fb8a
Re: PublicationActions - use bit flags.
On Mon, Dec 20, 2021 at 11:19 AM Peter Smith wrote:
> For some reason the current HEAD PublicationActions is a struct of booleans representing combinations of the 4 different "publication actions".
>
> I felt it is more natural to implement boolean flag combinations using a bitmask instead of a struct of bools. IMO using the bitmask also simplifies assignment and checking of said flags.
>
> PSA a small patch for this.
>
> Thoughts?

+1

I think the bit flags are a more natural fit, and the patch also removes the unnecessary use of a palloc'd tiny struct to return the PublicationActions (which also currently isn't explicitly pfree'd).

Regards, Greg Nancarrow Fujitsu Australia
Re: row filtering for logical replication
On Wed, Nov 24, 2021 at 3:22 PM vignesh C wrote:
> On Tue, Nov 23, 2021 at 4:58 PM Ajin Cherian wrote:
>
> 3) Should we include the row filter condition in the pg_publication_tables view, like in describe publication (\dRp+), since the prqual is not easily readable in the pg_publication_rel table?

How about exposing a pubdef (or publicationdef) column via pg_publication_tables? In it, we would display the publication definition. This is similar to what we do for indexes via the pg_indexes view:

postgres=# select * from pg_indexes where tablename like '%t1%';
 schemaname | tablename | indexname | tablespace |                              indexdef
------------+-----------+-----------+------------+--------------------------------------------------------------------
 public     | t1        | idx_t1    |            | CREATE INDEX idx_t1 ON public.t1 USING btree (c1) WHERE (c1 < 10)
(1 row)

The one advantage I see with this is that we will avoid adding additional columns for the other patches like "column filter". Also, it might be convenient for users. What do you think?

-- With Regards, Amit Kapila.
Re: row filtering for logical replication
On Mon, Dec 20, 2021 at 4:13 PM Amit Kapila wrote:
> On Wed, Nov 24, 2021 at 3:22 PM vignesh C wrote:
> > On Tue, Nov 23, 2021 at 4:58 PM Ajin Cherian wrote:
> >
> > 3) Should we include the row filter condition in the pg_publication_tables view, like in describe publication (\dRp+), since the prqual is not easily readable in the pg_publication_rel table?
>
> How about exposing a pubdef (or publicationdef) column via pg_publication_tables? In it, we would display the publication definition. This is similar to what we do for indexes via the pg_indexes view:
>
> postgres=# select * from pg_indexes where tablename like '%t1%';
>  schemaname | tablename | indexname | tablespace |                              indexdef
> ------------+-----------+-----------+------------+--------------------------------------------------------------------
>  public     | t1        | idx_t1    |            | CREATE INDEX idx_t1 ON public.t1 USING btree (c1) WHERE (c1 < 10)
> (1 row)
>
> The one advantage I see with this is that we will avoid adding additional columns for the other patches like "column filter". Also, it might be convenient for users. What do you think?

I think it is a good idea, particularly since there are already some precedents. OTOH, maybe there is no immediate requirement for this feature because there are already alternative ways to conveniently display the filters (e.g. psql \d+ and \dRp+). Currently, there is no pg_get_pubdef function (analogous to the index's pg_get_indexdef), so that would need to be written from scratch. So I feel this is a good feature, but it could be implemented as an independent patch in another thread.

-- Kind Regards, Peter Smith. Fujitsu Australia
Re: simplifying foreign key/RI checks
On Mon, Dec 20, 2021 at 2:00 PM Corey Huinker wrote: > Sorry for the delay. This patch no longer applies, it has some conflict with > d6f96ed94e73052f99a2e545ed17a8b2fdc1fb8a Thanks Corey for the heads up. Rebased with some cosmetic adjustments. -- Amit Langote EDB: http://www.enterprisedb.com v11-0001-Add-isolation-tests-for-snapshot-behavior-in-ri_.patch Description: Binary data v11-0002-Avoid-using-SPI-for-some-RI-checks.patch Description: Binary data
Re: In-place persistence change of a relation
Hello, Jakub.

At Fri, 17 Dec 2021 09:10:30 +, Jakub Wartak wrote in
> the patch didn't apply cleanly (it's from March; some hunks were failing), so I've fixed it, and the combined git format-patch is attached. It did conflict with the following:

Thanks for looking at this. Also thanks to Justin for finding the silly use-after-free bug. (Now I see the regression test fails and I'm not sure how I didn't find this before.)

> b0483263dda - Add support for SET ACCESS METHOD in ALTER TABLE
> 7b565843a94 - Add call to object access hook at the end of table rewrite in ALTER TABLE
> 9ce346eabf3 - Report progress of startup operations that take a long time.
> f10f0ae420 - Replace RelationOpenSmgr() with RelationGetSmgr().
>
> I'm especially worried if I didn't screw up something/forgot something related to the last one (rd->rd_smgr changes), but I'm getting "All 210 tests passed".

About the last one, all rel->rd_smgr accesses need to be replaced with RelationGetSmgr(). On the other hand, we can simply remove the RelationOpenSmgr() calls, since the target smgr relation is guaranteed to be loaded by RelationGetSmgr(). The fix you made for RelationCreate/DropInitFork is correct and the changes you made would work, but I prefer that the code not be too permissive about unknown (or unexpected) states.

> Basic demonstration of this patch (with wal_level=minimal):
> create unlogged table t6 (id bigint, t text);
> -- produces 110GB table, takes ~5mins
> insert into t6 select nextval('s1'), repeat('A', 1000) from generate_series(1, 1);
> alter table t6 set logged;
> on baseline SET LOGGED takes: ~7min10s
> on patched SET LOGGED takes: 25s
>
> So basically one can - thanks to this patch - use his application (performing classic INSERTs/UPDATEs/DELETEs, so without the need to rewrite to use COPY), perform a literal batch upload, and then just switch the tables to LOGGED.

This result is significant.
That operation finally requires WAL writes, but I was not sure how much gain FPIs (or bulk WAL logging) give in comparison to operational WAL.

> Some more intensive testing also looks good, assuming a table prepared to put pressure on WAL:
> create unlogged table t_unlogged (id bigint, t text) partition by hash (id);
> create unlogged table t_unlogged_h0 partition of t_unlogged FOR VALUES WITH (modulus 4, remainder 0);
> [..]
> create unlogged table t_unlogged_h3 partition of t_unlogged FOR VALUES WITH (modulus 4, remainder 3);
>
> The workload would still be pretty heavy on LWLock/BufferContent, WALInsert, and Lock/extend.
> t_logged.sql = insert into t_logged select nextval('s1'), repeat('A', 1000) from generate_series(1, 1000); # according to pg_wal_stats.wal_bytes generates ~1MB of WAL
> t_unlogged.sql = insert into t_unlogged select nextval('s1'), repeat('A', 1000) from generate_series(1, 1000); # according to pg_wal_stats.wal_bytes generates ~3kB of WAL
>
> so using: pgbench -f .sql -T 30 -P 1 -c 32 -j 3 t
> with synchronous_commit=on (default):
> with t_logged.sql: tps = 229 (lat avg = 138ms)
> with t_unlogged.sql: tps = 283 (lat avg = 112ms) # almost all on LWLock/WALWrite
> with synchronous_commit=off:
> with t_logged.sql: tps = 413 (lat avg = 77ms)
> with t_unlogged.sql: tps = 782 (lat avg = 40ms)
> Afterwards, switching the unlogged ~16GB partitions takes 5s per partition.
>
> As the thread didn't get a lot of traction, I've registered it in the current commitfest https://commitfest.postgresql.org/36/3461/ with you as the author and in 'Ready for review' state.
>
> I think it behaves as an almost-finished one, and, after reading all those discussions about this feature going back over a 10-year+ span, and a lot of failed effort towards wal_level=noWAL, I think it would be nice to finally start getting some of it into core.

Thanks for taking the performance benchmark. I didn't register it, for a few reasons:

1.
I'm not sure that we want to have the new mark files.

2. Aside from possible bugs, I'm not confident that the crash-safety of this patch is actually water-tight. At least we need tests for some failure cases.

3. As mentioned in transam/README, failure in removing smgr mark files leads to an immediate shutdown. I'm not sure this behavior is acceptable.

4. Including the reasons above, this is not fully functional. For example, if we execute the following commands on the primary, the replica doesn't work correctly. (Boom!)

=# CREATE UNLOGGED TABLE t (a int);
=# ALTER TABLE t SET LOGGED;

The following fixes are done in the attached v8.

- Rebased. Referring to Jakub and Justin's work, I replaced direct access to ->rd_smgr with RelationGetSmgr() and removed calls to RelationOpenSmgr(). I still separate the "ALTER TABLE ALL IN TABLESPACE SET LOGGED/UNLOGGED"
relcache not invalidated when ADD PRIMARY KEY USING INDEX
Hi,

When reviewing some replica identity related patches, I found that when adding a primary key using an existing unique index on NOT NULL columns, the target table's relcache won't be invalidated. This causes an error when REPLICA IDENTITY is DEFAULT and we UPDATE/DELETE a published table, because we refer to RelationData::rd_pkindex to check whether the UPDATE or DELETE can be safely executed in this case.

--- reproduction steps
CREATE TABLE test(a int not null, b int);
CREATE UNIQUE INDEX a ON test(a);
CREATE PUBLICATION PUB for TABLE test;

UPDATE test SET a = 2;
ERROR:  cannot update table "test" because it does not have a replica identity and publishes updates
HINT:  To enable updating the table, set REPLICA IDENTITY using ALTER TABLE.

alter table test add primary key using index a;

UPDATE test SET a = 2;
ERROR:  cannot update table "test" because it does not have a replica identity and publishes updates
HINT:  To enable updating the table, set REPLICA IDENTITY using ALTER TABLE.
---

I think the bug exists in HEAD ~ PG11, after the commit (f66e8bf) which removed relhaspkey from pg_class. In PG10, adding a primary key would always update relhaspkey in pg_class, which invalidated the relcache, so it was OK.

I tried to write a patch to fix this by invalidating the relcache after marking the primary key in index_constraint_create().

Best regards, Hou zj

0001-Invalid-relcache-when-ADD-PRIMARY-KEY-USING-INDEX.patch Description: 0001-Invalid-relcache-when-ADD-PRIMARY-KEY-USING-INDEX.patch
Re: In-place persistence change of a relation
At Mon, 20 Dec 2021 15:28:23 +0900 (JST), Kyotaro Horiguchi wrote in
> 4. Including the reasons above, this is not fully functional.
>    For example, if we execute the following commands on the primary,
>    the replica doesn't work correctly. (Boom!)
>
>    =# CREATE UNLOGGED TABLE t (a int);
>    =# ALTER TABLE t SET LOGGED;
>
> - The issue 4 above is not fixed (yet).

Not only for that case: RelationChangePersistence() needs to send a truncate record before the FPIs. If the primary crashes in the middle of the operation, the table content will vanish along with the persistence change. That is the correct behavior.

regards.

-- Kyotaro Horiguchi NTT Open Source Software Center

>From b28163fd7b3527e69f5b76f252891f800d7ac98c Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi
Date: Wed, 11 Nov 2020 21:51:11 +0900
Subject: [PATCH v9 1/2] In-place table persistence change

Even though ALTER TABLE SET LOGGED/UNLOGGED does not require data rewriting, currently it runs a heap rewrite, which causes a large amount of file I/O. This patch makes the command run without the heap rewrite. In addition, SET LOGGED while wal_level > minimal emits WAL using XLOG_FPI instead of a massive number of HEAP_INSERTs, which should be smaller. Also, this allows for the cleanup of files left behind by the crash of the transaction that created them.
--- src/backend/access/rmgrdesc/smgrdesc.c | 52 +++ src/backend/access/transam/README | 8 + src/backend/access/transam/xlog.c | 17 + src/backend/catalog/storage.c | 593 +++-- src/backend/commands/tablecmds.c | 256 +-- src/backend/replication/basebackup.c | 3 +- src/backend/storage/buffer/bufmgr.c| 88 src/backend/storage/file/fd.c | 4 +- src/backend/storage/file/reinit.c | 344 ++ src/backend/storage/smgr/md.c | 93 +++- src/backend/storage/smgr/smgr.c| 32 ++ src/backend/storage/sync/sync.c| 20 +- src/bin/pg_rewind/parsexlog.c | 24 + src/common/relpath.c | 47 +- src/include/catalog/storage.h | 2 + src/include/catalog/storage_xlog.h | 42 +- src/include/common/relpath.h | 9 +- src/include/storage/bufmgr.h | 2 + src/include/storage/fd.h | 1 + src/include/storage/md.h | 8 +- src/include/storage/reinit.h | 10 +- src/include/storage/smgr.h | 17 + 22 files changed, 1465 insertions(+), 207 deletions(-) diff --git a/src/backend/access/rmgrdesc/smgrdesc.c b/src/backend/access/rmgrdesc/smgrdesc.c index 773d57..d251f22207 100644 --- a/src/backend/access/rmgrdesc/smgrdesc.c +++ b/src/backend/access/rmgrdesc/smgrdesc.c @@ -40,6 +40,49 @@ smgr_desc(StringInfo buf, XLogReaderState *record) xlrec->blkno, xlrec->flags); pfree(path); } + else if (info == XLOG_SMGR_UNLINK) + { + xl_smgr_unlink *xlrec = (xl_smgr_unlink *) rec; + char *path = relpathperm(xlrec->rnode, xlrec->forkNum); + + appendStringInfoString(buf, path); + pfree(path); + } + else if (info == XLOG_SMGR_MARK) + { + xl_smgr_mark *xlrec = (xl_smgr_mark *) rec; + char *path = GetRelationPath(xlrec->rnode.dbNode, + xlrec->rnode.spcNode, + xlrec->rnode.relNode, + InvalidBackendId, + xlrec->forkNum, xlrec->mark); + char *action; + + switch (xlrec->action) + { + case XLOG_SMGR_MARK_CREATE: +action = "CREATE"; +break; + case XLOG_SMGR_MARK_UNLINK: +action = "DELETE"; +break; + default: +action = ""; +break; + } + + appendStringInfo(buf, "%s %s", action, path); + pfree(path); + } + else if (info == XLOG_SMGR_BUFPERSISTENCE) + { + 
xl_smgr_bufpersistence *xlrec = (xl_smgr_bufpersistence *) rec; + char *path = relpathperm(xlrec->rnode, MAIN_FORKNUM); + + appendStringInfoString(buf, path); + appendStringInfo(buf, " persistence %d", xlrec->persistence); + pfree(path); + } } const char * @@ -55,6 +98,15 @@ smgr_identify(uint8 info) case XLOG_SMGR_TRUNCATE: id = "TRUNCATE"; break; + case XLOG_SMGR_UNLINK: + id = "UNLINK"; + break; + case XLOG_SMGR_MARK: + id = "MARK"; + break; + case XLOG_SMGR_BUFPERSISTENCE: + id = "BUFPERSISTENCE"; + break; } return id; diff --git a/src/backend/access/transam/README b/src/backend/access/transam/README index 1edc8180c1..b344bbe511 100644 --- a/src/backend/access/transam/README +++ b/src/backend/access/transam/README @@ -724,6 +724,14 @@ we must panic and abort recovery. The DBA will have to manually clean up then restart recovery. This is part of the reason for not writing a WAL entry until we've successfully done the original action. +The Smgr MARK files + + +An smgr mark file is created when a new relation file is created to +mark the relfilenode needs to be cleaned up at recovery time. In +contrast to the four actions above, failure to remove smgr mark files +will lead to data loss,
RE: In-place persistence change of a relation
Hi Kyotaro,

I'm glad you are still into this.

> I didn't register for some reasons.

Right now in v8 there's a typo in ./src/backend/catalog/storage.c:

storage.c: In function 'RelationDropInitFork':
storage.c:385:44: error: expected statement before ')' token
    pending->unlink_forknum != INIT_FORKNUM))  <-- here, one ')' too many

> 1. I'm not sure that we want to have the new mark files.

I can't help with such a design decision, but if there are doubts, maybe add checks of the return codes around:
a) pg_fsync() and fsync_parent_path() (??) inside mdcreatemark()
b) mdunlinkmark() inside mdunlinkmark()
and PANIC if something goes wrong?

> 2. Aside from possible bugs, I'm not confident that the crash-safety of this patch is actually water-tight. At least we need tests for some failure cases.
>
> 3. As mentioned in transam/README, failure in removing smgr mark files leads to an immediate shutdown. I'm not sure this behavior is acceptable.

Doesn't that happen for most of the stuff already? There's even a data_sync_retry GUC.

> 4. Including the reasons above, this is not fully functional.
>    For example, if we execute the following commands on the primary,
>    the replica doesn't work correctly. (Boom!)
>
>    =# CREATE UNLOGGED TABLE t (a int);
>    =# ALTER TABLE t SET LOGGED;
>
> The following fixes are done in the attached v8.
>
> - Rebased. Referring to Jakub and Justin's work, I replaced direct access to ->rd_smgr with RelationGetSmgr() and removed calls to RelationOpenSmgr(). I still separate the "ALTER TABLE ALL IN TABLESPACE SET LOGGED/UNLOGGED" statement part.
>
> - Fixed RelationCreate/DropInitFork's behavior for non-target relations. (From Jakub's work.)
>
> - Fixed wording of some comments.
>
> - As revisited, I found a bug around recovery. If the logged-ness of a relation gets flipped repeatedly in a transaction, duplicate pending-delete entries are accumulated during recovery and work in a wrong way. smgr_redo now adds up to one entry per action.
> - The issue 4 above is not fixed (yet).

Thanks again. If you have a list of crash-test ideas, maybe I'll have some minutes to try to figure them out. Is there any go-to list of stuff to be checked to add confidence to this patch (as per point #2)?

BTW, quick feedback regarding that ALTER patch (there were 4 unlogged tables):

# ALTER TABLE ALL IN TABLESPACE tbs1 set logged;
WARNING:  unrecognized node type: 349

-J.