Re: Schema variables - new implementation for Postgres 15
> [0001-schema-variables-20211219-2.patch]
> [0002-schema-variables-20211219-2.patch]

Hi Pavel,

You said earlier
> 1. The schema variables were renamed to session variable

But I still see:

$ grep -Eic 'schema variable' postgres.html
15

(postgres.html from 'make postgres.html')

So that rename doesn't seem finished.

Erik
postgres_fdw: incomplete subabort cleanup of connections used in async execution
Hi,

While working on [1], I noticed $SUBJECT: postgres_fdw resets the per-connection states of connections, which store async requests sent to remote servers in async_capable mode, during post-abort (pgfdw_xact_callback()), but it fails to do so during post-subabort (pgfdw_subxact_callback()). This causes a crash when re-executing a query that was aborted in a subtransaction:

postgres=# create table t (a text, b text);
postgres=# create or replace function slow_data (name text, duration float)
           returns setof t as $$
           begin
             perform pg_sleep(duration);
             return query select name, generate_series(1, 1000)::text;
           end; $$ language plpgsql;
postgres=# create view v1 as select * from slow_data('foo', 2.5);
postgres=# create view v2 as select * from slow_data('bar', 5.0);
postgres=# create extension postgres_fdw;
postgres=# create server loopback1 foreign data wrapper postgres_fdw
           options (dbname 'postgres', async_capable 'true');
postgres=# create server loopback2 foreign data wrapper postgres_fdw
           options (dbname 'postgres', async_capable 'true');
postgres=# create user mapping for current_user server loopback1;
postgres=# create user mapping for current_user server loopback2;
postgres=# create foreign table list_p1 (a text, b text)
           server loopback1 options (table_name 'v1');
postgres=# create foreign table list_p2 (a text, b text)
           server loopback2 options (table_name 'v2');
postgres=# create table list_pt (a text, b text) partition by list (a);
postgres=# alter table list_pt attach partition list_p1 for values in ('foo');
postgres=# alter table list_pt attach partition list_p2 for values in ('bar');
postgres=# begin;
BEGIN
postgres=*# savepoint s1;
SAVEPOINT
postgres=*# select count(*) from list_pt;
^CCancel request sent
ERROR:  canceling statement due to user request
postgres=!# rollback to savepoint s1;
ROLLBACK
postgres=*# select count(*) from list_pt;
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.

When canceling the SELECT, an async request sent to a remote server using a connection is canceled, but it's stored in the per-connection state of the connection even after the failed subtransaction for the reason above, so when re-executing the SELECT, postgres_fdw processes the invalid async request to re-use the connection in GetConnection(), causing a segmentation fault. This would be my oversight in commit 27e1f1456. :-(

To fix, I modified pgfdw_abort_cleanup() to reset the per-connection state in the post-subabort case as well. Also, I modified the initialization so that it's done only if necessary, to save cycles, and improved a comment on the initialization a bit. Attached is a patch for that.

Best regards,
Etsuro Fujita

fix-postgres-fdw-subabort-cleanup.patch
Description: Binary data
sqlsmith: ERROR: XX000: bogus varno: 2
I reduced the problematic query to this.

SELECT 1 FROM pg_rewrite
WHERE pg_get_function_arg_default(ev_class, 1)
  !~~ pg_get_expr(ev_qual, ev_class, false);

#0  pg_re_throw () at elog.c:1800
#1  0x563f5d027932 in errfinish () at elog.c:593
#2  0x563f5cb874ee in resolve_special_varno (node=0x563f5dd0f7e0, context=0x7ffcf0daf250, callback=0x563f5cfca270 , callback_arg=0x0) at ruleutils.c:7319
#3  0x563f5cfca044 in get_variable () at ruleutils.c:7086
#4  0x563f5cfc7c58 in get_rule_expr () at ruleutils.c:8363
#5  0x563f5cfc97a6 in get_oper_expr (context=0x7ffcf0daf250, expr=0x563f5dd0f6f0) at ruleutils.c:9626
#6  get_rule_expr () at ruleutils.c:8472
#7  0x563f5cfcdc37 in deparse_expression_pretty (expr=expr@entry=0x563f5dd0f6f0, dpcontext=0x563f5dd10488, forceprefix=forceprefix@entry=false, showimplicit=showimplicit@entry=false, prettyFlags=prettyFlags@entry=2, startIndent=0) at ruleutils.c:3558
#8  0x563f5cfce661 in pg_get_expr_worker (expr=, relid=12104, relname=0x563f5dd10130 "pg_settings", prettyFlags=2) at ruleutils.c:2645
#9  0x563f5cd6540b in ExecInterpExpr () at execExprInterp.c:1272
#10 0x563f5cd73c5f in ExecEvalExprSwitchContext (isNull=0x7ffcf0daf3a7, econtext=0x563f5dd08a00, state=0x563f5dd0a270) at ../../../src/include/executor/executor.h:339
#11 ExecQual (econtext=0x563f5dd08a00, state=0x563f5dd0a270) at ../../../src/include/executor/executor.h:408
#12 ExecScan (node=0x563f5dd09328, accessMtd=0x563f5cd9e790 , recheckMtd=0x563f5cd9e780 ) at execScan.c:227
#13 0x563f5cd69f73 in ExecProcNode (node=0x563f5dd09328) at ../../../src/include/executor/executor.h:257
#14 ExecutePlan (execute_once=, dest=0x563f5dd18a80, direction=, numberTuples=0, sendTuples=, operation=CMD_SELECT, use_parallel_mode=, planstate=0x563f5dd09328, estate=0x563f5dd08790) at execMain.c:1600
#15 standard_ExecutorRun () at execMain.c:410
#16 0x563f5cf0460f in PortalRunSelect () at pquery.c:924
#17 0x563f5cf05bf1 in PortalRun () at pquery.c:768
#18 0x563f5cf019b2 in exec_simple_query () at postgres.c:1215
#19 0x563f5cf0370a in PostgresMain () at postgres.c:4498
#20 0x563f5ce6e479 in BackendRun (port=, port=) at postmaster.c:4594
#21 BackendStartup (port=) at postmaster.c:4322
#22 ServerLoop () at postmaster.c:1802
#23 0x563f5ce6f47c in PostmasterMain () at postmaster.c:1474
#24 0x563f5cb9a0c0 in main (argc=5, argv=0x563f5dc653f0) at main.c:198

While reducing the query, I got a related error:

SELECT 1 FROM pg_rewrite
WHERE pg_get_function_arg_default(ev_class, 1)
  !~~ pg_get_expr(ev_qual, 0, false);

ERROR:  XX000: bogus varlevelsup: 0 offset 0
LOCATION:  get_variable, ruleutils.c:7003

Both errors are reproducible back to at least v10.
Re: Getting rid of regression test input/ and output/ files
I wrote:
> This led me to wonder why we couldn't get rid of that entire
> mechanism in favor of some less-painful way of getting that
> information into the scripts.  If we had the desired values in
> psql variables, we could do what we need easily, for example ...

Here's some fleshed-out patches for this.

0001 adds the \getenv command to psql; now with documentation and a simple regression test.

0002 tweaks pg_regress to export the needed values as environment variables, and modifies the test scripts to use those variables. (For ease of review, this patch modifies the scripts in-place, and then 0003 will move them.)  A few comments on this:

* I didn't see any value in exporting @testtablespace@ as a separate variable; we might as well let the test script know how to construct that path name.

* I concluded that the right way to handle the concatenation issue is *not* to rely on SQL literal concatenation, but to use psql's \set command to concatenate parts of a string.  In particular this gives us a clean way to handle quoting/escaping rules in the places where a pathname has to be embedded in some larger string, such as a function body.  The golden rule for that seems to be "use one \set per level of quoting".  I believe this code is now fairly proof against situations that would completely break the existing way of doing things, such as pathnames with quotes or backslashes in them.  (It's hard to test the embedded-quote case, because that breaks the Makefiles too, but I did get through the regression tests with a path including a backslash.)

* There are a couple of places where the existing tests involve substituting a path name into expected query output or error messages.  This technique cannot handle that, but we have plenty of prior art for dealing with such cases.
I changed file_fdw to use a filter function to hide the pathnames in EXPLAIN output, and tweaked create_function_0 to show only an edited version of an error message (this is based on a similar case in infinite_recurse.sql).

0003 simply "git mv"'s the scripts and output files into place as normal not-requiring-editing files.  Be careful to "make clean" before applying this, else you may have conflicts with the target files already being present.  Also, while you can run the tests between 0003 and 0004, don't do "make clean" in this state or the hacky EXTRA_CLEAN rules in dblink and file_fdw will remove files you want.

0004 finally removes the no-longer-needed infrastructure in pg_regress and the makefiles.  (BTW, as far as I can find, the MSVC scripts have no provisions for cleaning these generated files?)

There's some refactoring that could be done afterwards, for example there seems little reason for dblink's paths.sql to continue to exist as a separate script.  But it seemed best for this patch series to convert the scripts as mechanically as possible.

I'm fairly pleased with how this came out.  I think these scripts will be *much* easier to maintain in this form.  Updating the output/*.source files was always a major pain in the rear, since you couldn't just copy results/ files to them.

Comments?

			regards, tom lane

diff --git a/doc/src/sgml/ref/psql-ref.sgml b/doc/src/sgml/ref/psql-ref.sgml
index 48248f750e..ae38d3dcc3 100644
--- a/doc/src/sgml/ref/psql-ref.sgml
+++ b/doc/src/sgml/ref/psql-ref.sgml
@@ -2237,6 +2237,28 @@ Tue Oct 26 21:40:57 CEST 1999
+
+        \getenv psql_var env_var
+
+
+
+        Gets the value of the environment
+        variable env_var
+        and assigns it to the psql
+        variable psql_var.
+        If env_var is
+        not defined in the psql process's
+        environment, psql_var
+        is not changed.
+        Example:
+
+=> \getenv home HOME
+=> \echo :home
+/home/postgres
+
+
+
+
+
+        \gexec

diff --git a/src/bin/psql/command.c b/src/bin/psql/command.c
index ccd7b48108..fb3bab9494 100644
--- a/src/bin/psql/command.c
+++ b/src/bin/psql/command.c
@@ -98,6 +98,8 @@ static backslashResult process_command_g_options(char *first_option,
 							bool active_branch, const char *cmd);
 static backslashResult exec_command_gdesc(PsqlScanState scan_state, bool active_branch);
+static backslashResult exec_command_getenv(PsqlScanState scan_state, bool active_branch,
+							const char *cmd);
 static backslashResult exec_command_gexec(PsqlScanState scan_state, bool active_branch);
 static backslashResult exec_command_gset(PsqlScanState scan_state, bool active_branch);
 static backslashResult exec_command_help(PsqlScanState scan_state, bool active_branch);
@@ -348,6 +350,8 @@ exec_command(const char *cmd,
 		status = exec_command_g(scan_state, active_branch, cmd);
 	else if (strcmp(cmd, "gdesc") == 0)
 		status = exec_command_gdesc(scan_state, active_branch);
+	else if (strcmp(cmd, "getenv") == 0)
+		status = exec_command_getenv(scan_state, active_branch, cmd);
 	else if (strcmp(cmd, "gexe
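[Editor's note: to make the "use one \set per level of quoting" rule above concrete, here is a sketch of the intended usage; the variable and environment-variable names are illustrative, not necessarily the ones the patch series ends up using.]

=> \getenv abs_srcdir ABS_SRCDIR
=> \set filename :abs_srcdir '/data/onek.data'
=> COPY onek FROM :'filename';

The first \set concatenates the pieces without quoting; the :'filename' interpolation then single-quotes and escapes the value, so a path containing quotes or backslashes still yields a valid SQL literal.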
Re: sqlsmith: ERROR: XX000: bogus varno: 2
Justin Pryzby writes:
> I reduced the problematic query to this.
> SELECT 1 FROM pg_rewrite WHERE
> pg_get_function_arg_default(ev_class, 1) !~~
> pg_get_expr(ev_qual, ev_class, false);

Or more simply,

regression=# select pg_get_expr(ev_qual, ev_class, false)
from pg_rewrite where rulename = 'pg_settings_u';
ERROR:  bogus varno: 2

I don't see anything particularly surprising here.  pg_get_expr is only able to cope with expression trees over a single relation, but ON UPDATE rules can refer to both OLD and NEW relations.  Maybe we could make the error message more friendly, but there's not much else to be done, I think.

			regards, tom lane
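[Editor's note: for contrast, a sketch of the case pg_get_expr is designed for — an expression tree over a single relation, such as a column default, which can only reference its own table. The table name here is illustrative.]

postgres=# create table t1 (a int default 42);
CREATE TABLE
postgres=# select pg_get_expr(adbin, adrelid) from pg_attrdef
           where adrelid = 't1'::regclass;
 pg_get_expr
-------------
 42
(1 row)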
Re: Getting rid of regression test input/ and output/ files
> 0001 adds the \getenv command to psql; now with documentation
> and a simple regression test.

+1. Wish I had added this years ago when I had a need for it.

> 0002 tweaks pg_regress to export the needed values as environment
> variables, and modifies the test scripts to use those variables.
> (For ease of review, this patch modifies the scripts in-place,
> and then 0003 will move them.)  A few comments on this:
>
> * I didn't see any value in exporting @testtablespace@ as a separate
> variable; we might as well let the test script know how to construct
> that path name.
>
> * I concluded that the right way to handle the concatenation issue
> is *not* to rely on SQL literal concatenation, but to use psql's
> \set command to concatenate parts of a string.  In particular this

+1 to that, much better than the multi-line thing.

I have a nitpick about the \getenv FOO FOO lines. It's a new function to everyone, and to anyone who hasn't seen the documentation it won't be immediately obvious which one is the ENV var and which one is the local var. Lowercasing the local var would be a way to reinforce which is which to the reader. It would also be consistent with var naming in the rest of the script.

> 0004 finally removes the no-longer-needed infrastructure in

+1. Deleted code is debugged code.
Re: Getting rid of regression test input/ and output/ files
Corey Huinker writes:
> I have a nitpick about the \getenv FOO FOO lines.
> It's a new function to everyone, and to anyone who hasn't seen the
> documentation it won't be immediately obvious which one is the ENV var and
> which one is the local var. Lowercasing the local var would be a way to
> reinforce which is which to the reader. It would also be consistent with
> var naming in the rest of the script.

Reasonable idea.  Another thing I was wondering about was whether to attach PG_ prefixes to the environment variable names, since those are in a more-or-less global namespace.  If we do that, then a different method for distinguishing the psql variables is to not prefix them.

			regards, tom lane
Re: row filtering for logical replication
On Sat, Dec 18, 2021 at 1:33 PM Amit Kapila wrote:
>
> On Fri, Dec 17, 2021 at 5:29 PM Greg Nancarrow wrote:
> >
> > On Fri, Dec 17, 2021 at 7:20 PM Ajin Cherian wrote:
> > >
> > > On Fri, Dec 17, 2021 at 5:46 PM Greg Nancarrow wrote:
> > > >
> > > > So using the v47 patch-set, I still find that the UPDATE above results
> > > > in publication of an INSERT of (2,1), rather than an UPDATE of (1,1) to (2,1).
> > > > This is according to the 2nd UPDATE rule below, from patch 0003.
> > > >
> > > > + * old-row (no match)    new-row (no match)  -> (drop change)
> > > > + * old-row (no match)    new row (match)     -> INSERT
> > > > + * old-row (match)       new-row (no match)  -> DELETE
> > > > + * old-row (match)       new row (match)     -> UPDATE
> > > >
> > > > This is because the old row (1,1) doesn't match the UPDATE filter
> > > > "(a>1)", but the new row (2,1) does.
> > > > This functionality doesn't seem right to me. I don't think it can be
> > > > assumed that (1,1) was never published (and thus requires an INSERT
> > > > rather than UPDATE) based on these checks, because in this example,
> > > > (1,1) was previously published via a different operation - INSERT (and
> > > > using a different filter too).
> > > > I think the fundamental problem here is that these UPDATE rules assume
> > > > that the old (current) row was previously UPDATEd (and published, or
> > > > not published, according to the filter applicable to UPDATE), but this
> > > > is not necessarily the case.
> > > > Or am I missing something?
> > >
> > > But it need not be correct in assuming that the old-row was part of a
> > > previous INSERT either (and published, or not published according to
> > > the filter applicable to an INSERT).
> > > For example, change the sequence of inserts and updates prior to the
> > > last update:
> > >
> > > truncate tbl1 ;
> > > insert into tbl1 values (1,5);  ==> not replicated since insert and ! (b < 2);
> > > update tbl1 set b = 1;  ==> not replicated since update and ! (a > 1)
> > > update tbl1 set a = 2;  ==> replicated and update converted to insert since (a > 1)
> > >
> > > In this case, the last update "update tbl1 set a = 2;" is updating a
> > > row that was previously updated and not inserted and not replicated to
> > > the subscriber.
> > > How does the replication logic differentiate between these two cases,
> > > and decide if the update was previously published or not?
> > > I think it's futile for the publisher side to try and figure out the
> > > history of published rows. In fact, if this level of logic is required
> > > then it is best implemented on the subscriber side, which then defeats
> > > the purpose of a publication filter.
> > >
> >
> > I think it's a concern, for such a basic example with only one row,
> > getting unpredictable (and even wrong) replication results, depending
> > upon the order of operations.
> >
>
> I am not sure how we can deduce that. The results are based on current
> and new values of row which is what I think we are expecting here.
>
> > Doesn't this problem result from allowing different WHERE clauses for
> > different pubactions for the same table?
> > My current thoughts are that this shouldn't be allowed, and also WHERE
> > clauses for INSERTs should, like UPDATE and DELETE, be restricted to
> > using only columns covered by the replica identity or primary key.
> >
>
> Hmm, even if we do that one could have removed the insert row filter
> by the time we are evaluating the update. So, we will get the same
> result. I think the behavior in your example is as we expect as per
> the specs defined by the patch and I don't see any problem, in this
> case, w.r.t replication results. Let us see what others think on this?
>

I think currently there could be a problem with user perceptions. IMO a user would be mostly interested in predictability and getting results that are intuitive. So, even if all strange results can (after careful examination) be after-the-fact explained away as being "correct" according to a spec, I don't think that is going to make any difference. e.g. regardless of correctness, even if it just "appeared" to give unexpected results then a user may just decide that row-filtering is not worth their confusion...

Perhaps there is a slightly dumbed-down RF design that can still be useful, but which can give much more comfort to the user because the replica will be more like what they were expecting?

--
Kind Regards,
Peter Smith.
Fujitsu Australia
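[Editor's note: the UPDATE transformation rules quoted in this exchange (from patch 0003) can be sketched as a small decision function — a model of the described behavior, not the actual C implementation.]

```python
def transform_update(old_matches, new_matches):
    """Model of patch 0003's row-filter rules for UPDATE.

    old_matches/new_matches: whether the old/new tuple passes the
    publication's row filter.  Returns the action sent to the
    subscriber, or None if the change is dropped.
    """
    if not old_matches and not new_matches:
        return None        # drop change
    if not old_matches and new_matches:
        return "INSERT"    # row enters the filtered set
    if old_matches and not new_matches:
        return "DELETE"    # row leaves the filtered set
    return "UPDATE"        # row stays in the filtered set

# Greg's example: old row (1,1) fails the filter "(a > 1)" but the new
# row (2,1) passes it, so the UPDATE is published as an INSERT.
assert transform_update(old_matches=False, new_matches=True) == "INSERT"
```

This is exactly the table quoted above; the debate in the thread is not about the mechanics but about whether deciding from the old/new tuples alone can ever be consistent with what was previously published.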
Re: Getting rid of regression test input/ and output/ files
On Sun, Dec 19, 2021 at 5:48 PM Tom Lane wrote:
> Corey Huinker writes:
> > I have a nitpick about the \getenv FOO FOO lines.
> > It's a new function to everyone, and to anyone who hasn't seen the
> > documentation it won't be immediately obvious which one is the ENV var and
> > which one is the local var. Lowercasing the local var would be a way to
> > reinforce which is which to the reader. It would also be consistent with
> > var naming in the rest of the script.
>
> Reasonable idea.  Another thing I was wondering about was whether
> to attach PG_ prefixes to the environment variable names, since
> those are in a more-or-less global namespace.  If we do that,
> then a different method for distinguishing the psql variables
> is to not prefix them.

+1 to that as well.

Which brings up a tangential question: is there value in having something that brings in one or more env vars as psql vars directly? I'm thinking something like:

\importenv pattern [prefix]

(alternate names: \getenv_multi, \getenv_pattern, \getenvs, etc.) which could be used like

\importenv PG* env_

which would import PGFOO and PGBAR as env_PGFOO and env_PGBAR, awkward names but leaving no doubt about where a previously unreferenced variable came from. I don't *think* we need it for this specific case, but since the subject of env vars has come up I thought I'd throw it out there.
Re: Getting rid of regression test input/ and output/ files
Corey Huinker writes:
> Which brings up a tangential question, is there value in having something
> that brings in one or more env vars as psql vars directly. I'm thinking
> something like:
> \importenv pattern [prefix]

Meh ... considering we've gone this long without any getenv capability in psql at all, that seems pretty premature.

			regards, tom lane
PublicationActions - use bit flags.
For some reason the current HEAD PublicationActions is a struct of booleans representing combinations of the 4 different "publication actions". I felt it is more natural to implement boolean flag combinations using a bitmask instead of a struct of bools. IMO using the bitmask also simplifies assignment and checking of said flags.

PSA a small patch for this. Thoughts?

--
Kind Regards,
Peter Smith.
Fujitsu Australia

v1-0001-PublicationActions-use-bit-flags.patch
Description: Binary data
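[Editor's note: a sketch of the bitmask idea in Python; the flag names are hypothetical — the actual patch is C and may choose different identifiers.]

```python
from enum import IntFlag

class PubAction(IntFlag):
    """Hypothetical flags modeling the proposed bitmask."""
    INSERT = 1 << 0
    UPDATE = 1 << 1
    DELETE = 1 << 2
    TRUNCATE = 1 << 3

# Assigning a combination of actions is a single OR expression ...
pubactions = PubAction.INSERT | PubAction.UPDATE

# ... and testing is a single AND, rather than reading and writing
# four separate struct members.
assert pubactions & PubAction.INSERT
assert not (pubactions & PubAction.DELETE)
```

The same shape in C would be an enum of `1 << n` values held in a single integer field, which is the simplification the patch proposes.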
Re: row filtering for logical replication
On Sat, Dec 18, 2021 at 1:33 PM Amit Kapila wrote:
>
> > I think it's a concern, for such a basic example with only one row,
> > getting unpredictable (and even wrong) replication results, depending
> > upon the order of operations.
> >
>
> I am not sure how we can deduce that. The results are based on current
> and new values of row which is what I think we are expecting here.
>

In the two simple cases presented, the publisher ends up with the same single row (2,1) in both cases, but in one of the cases the subscriber ends up with an extra row (1,1) that the publisher doesn't have. So, in using a "filter", a new row has been published that the publisher doesn't have. I'm not so sure a user would be expecting that. Not to mention that if (1,1) is subsequently INSERTed on the publisher side, it will result in a duplicate key error on the subscriber.

> > Doesn't this problem result from allowing different WHERE clauses for
> > different pubactions for the same table?
> > My current thoughts are that this shouldn't be allowed, and also WHERE
> > clauses for INSERTs should, like UPDATE and DELETE, be restricted to
> > using only columns covered by the replica identity or primary key.
> >
>
> Hmm, even if we do that one could have removed the insert row filter
> by the time we are evaluating the update. So, we will get the same
> result. I think the behavior in your example is as we expect as per
> the specs defined by the patch and I don't see any problem, in this
> case, w.r.t replication results. Let us see what others think on this?
>

Here I'm talking about the typical use-case of setting the row-filtering WHERE clause up-front and not changing it thereafter. I think that dynamically changing filters after INSERT/UPDATE/DELETE operations is not the typical use-case, and IMHO it's another thing entirely (it could result in all kinds of unpredictable, random results).

Personally I think it would make more sense to:
1) Disallow different WHERE clauses on the same table, for different pubactions.
2) If only INSERTs are being published, allow any column in the WHERE clause, otherwise (as for UPDATE and DELETE) restrict the referenced columns to be part of the replica identity or primary key.

Regards,
Greg Nancarrow
Fujitsu Australia
RE: row filtering for logical replication
> -----Original Message-----
> From: Amit Kapila
> Sent: Saturday, December 18, 2021 10:33 AM
>
> On Fri, Dec 17, 2021 at 5:29 PM Greg Nancarrow wrote:
> >
> > On Fri, Dec 17, 2021 at 7:20 PM Ajin Cherian wrote:
> > >
> > > On Fri, Dec 17, 2021 at 5:46 PM Greg Nancarrow wrote:
> > > >
> > > > So using the v47 patch-set, I still find that the UPDATE above results in
> > > > publication of an INSERT of (2,1), rather than an UPDATE of (1,1) to (2,1).
> > > > This is according to the 2nd UPDATE rule below, from patch 0003.
> > > >
> > > > + * old-row (no match)    new-row (no match)  -> (drop change)
> > > > + * old-row (no match)    new row (match)     -> INSERT
> > > > + * old-row (match)       new-row (no match)  -> DELETE
> > > > + * old-row (match)       new row (match)     -> UPDATE
> > > >
> > > > This is because the old row (1,1) doesn't match the UPDATE filter "(a>1)",
> > > > but the new row (2,1) does.
> > > > This functionality doesn't seem right to me. I don't think it can be
> > > > assumed that (1,1) was never published (and thus requires an INSERT rather
> > > > than UPDATE) based on these checks, because in this example, (1,1) was
> > > > previously published via a different operation - INSERT (and using a
> > > > different filter too).
> > > > I think the fundamental problem here is that these UPDATE rules assume
> > > > that the old (current) row was previously UPDATEd (and published, or not
> > > > published, according to the filter applicable to UPDATE), but this is not
> > > > necessarily the case.
> > > > Or am I missing something?
> > >
> > > But it need not be correct in assuming that the old-row was part of
> > > a previous INSERT either (and published, or not published according
> > > to the filter applicable to an INSERT).
> > > For example, change the sequence of inserts and updates prior to the
> > > last update:
> > >
> > > truncate tbl1 ;
> > > insert into tbl1 values (1,5);  ==> not replicated since insert and ! (b < 2);
> > > update tbl1 set b = 1;  ==> not replicated since update and ! (a > 1)
> > > update tbl1 set a = 2;  ==> replicated and update converted to insert since (a > 1)
> > >
> > > In this case, the last update "update tbl1 set a = 2;" is updating
> > > a row that was previously updated and not inserted and not
> > > replicated to the subscriber.
> > > How does the replication logic differentiate between these two
> > > cases, and decide if the update was previously published or not?
> > > I think it's futile for the publisher side to try and figure out the
> > > history of published rows. In fact, if this level of logic is
> > > required then it is best implemented on the subscriber side, which
> > > then defeats the purpose of a publication filter.
> > >
> >
> > I think it's a concern, for such a basic example with only one row,
> > getting unpredictable (and even wrong) replication results, depending
> > upon the order of operations.
> >
>
> I am not sure how we can deduce that. The results are based on current and
> new values of row which is what I think we are expecting here.
>
> > Doesn't this problem result from allowing different WHERE clauses for
> > different pubactions for the same table?
> > My current thoughts are that this shouldn't be allowed, and also WHERE
> > clauses for INSERTs should, like UPDATE and DELETE, be restricted to
> > using only columns covered by the replica identity or primary key.
> >
>
> Hmm, even if we do that one could have removed the insert row filter by the
> time we are evaluating the update. So, we will get the same result. I think the
> behavior in your example is as we expect as per the specs defined by the patch
> and I don't see any problem, in this case, w.r.t replication results. Let us see
> what others think on this?

I think it might not be hard to predict the current behavior. Users only need to be aware that:

1) pubactions and row filters on different publications are combined with 'OR'.
2) For UPDATE, we execute the filter for both the OLD and NEW tuple and would change the operation type accordingly.

For the example mentioned:

create table tbl1 (a int primary key, b int);
create publication A for table tbl1 where (b<2) with (publish='insert');
create publication B for table tbl1 where (a>1) with (publish='update');

If we follow rules 1) and 2), I feel we are able to predict the following condition:

WHERE (action = 'insert' AND b < 2) OR (action = 'update' AND a > 1)

So, it seems acceptable to me. Personally, I think the current design gives users more flexibility to handle some complex scenarios. If users want a simple setting for a publication, they can also set the same row filter for the same table in different publications. To avoid confusion, I think we can document these rules clearly.

BTW, from IBM's documentation, I think IBM also supports this kind of complex condition [1].

[1] https://www.ibm.com/docs/en/idr/11.4.0?topic=rows-log-record-variables

Best regards,
Hou zj
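[Editor's note: rules 1) and 2) above can be sketched as a small model; `row_publishable` and `publications` are illustrative names, not pgoutput's actual structures.]

```python
def row_publishable(action, row, publications):
    """Rule 1): pubactions and row filters from different publications
    are combined with OR -- a change is published if ANY publication
    both publishes this action and passes its row filter."""
    return any(action in actions and filt(row)
               for actions, filt in publications)

# The example from the thread:
#   create publication A for table tbl1 where (b<2) with (publish='insert');
#   create publication B for table tbl1 where (a>1) with (publish='update');
pubs = [
    ({"insert"}, lambda r: r["b"] < 2),
    ({"update"}, lambda r: r["a"] > 1),
]

# Equivalent to:
#   (action = 'insert' AND b < 2) OR (action = 'update' AND a > 1)
assert row_publishable("insert", {"a": 1, "b": 1}, pubs)
assert not row_publishable("insert", {"a": 1, "b": 5}, pubs)
assert row_publishable("update", {"a": 2, "b": 5}, pubs)
```

For UPDATE, rule 2) would evaluate this predicate for both the OLD and the NEW tuple and pick INSERT/DELETE/UPDATE/drop from the combination, per the table quoted earlier in the thread.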
Re: parallel vacuum comments
On Sat, Dec 18, 2021 at 3:38 PM Amit Kapila wrote:
>
> On Fri, Dec 17, 2021 at 10:51 AM Masahiko Sawada wrote:
> >
> > I've attached updated patches. The first patch just moves common
> > functions for index bulk-deletion and cleanup to vacuum.c. And the
> > second patch moves parallel vacuum code to vacuumparallel.c. The
> > comments I got so far are incorporated into these patches. Please
> > review them.
> >
>
> Thanks, it is helpful for the purpose of review.
>
> Few comments:
> =============
> 1.
> + * dead_items stores TIDs whose index tuples are deleted by index vacuuming.
> + * Each TID points to an LP_DEAD line pointer from a heap page that has been
> + * processed by lazy_scan_prune. Also needed by lazy_vacuum_heap_rel, which
> + * marks the same LP_DEAD line pointers as LP_UNUSED during second heap pass.
>  */
> - LVDeadItems *dead_items; /* TIDs whose index tuples we'll delete */
> + VacDeadItems *dead_items; /* TIDs whose index tuples we'll delete */
>
> Isn't it better to keep these comments atop the structure VacDeadItems
> declaration?

I think the LP_DEAD and LP_UNUSED stuff is specific to heap. Given moving VacDeadItems to vacuum.c, I thought it's better to keep it as generic TID storage.

> 2. What is the reason for not moving
> lazy_vacuum_one_index/lazy_cleanup_one_index to vacuum.c so that they
> can be called from vacuumlazy.c and vacuumparallel.c? Without this
> refactoring patch, I think both leader and workers set the same error
> context phase (VACUUM_ERRCB_PHASE_VACUUM_INDEX) during index
> vacuuming? Is it because you want a separate context phase for a
> parallel vacuum?

Since the phases defined as VacErrPhase, like VACUUM_ERRCB_PHASE_SCAN_HEAP and VACUUM_ERRCB_PHASE_VACUUM_HEAP etc., and the error callback function, vacuum_error_callback(), are specific to heap, I thought it'd not be a good idea to move lazy_vacuum/cleanup_one_index() so that both vacuumlazy.c and vacuumparallel.c can use the phases and error callback function.

> The other thing related to this is that if we have to
> do the way you have it here then we don't need pg_rusage_init() in
> functions lazy_vacuum_one_index/lazy_cleanup_one_index.

Right. It should be removed.

> 3.
> @@ -3713,7 +3152,7 @@ update_index_statistics(LVRelState *vacrel)
>  int nindexes = vacrel->nindexes;
>  IndexBulkDeleteResult **indstats = vacrel->indstats;
>
> - Assert(!IsInParallelMode());
> + Assert(!ParallelVacuumIsActive(vacrel));
>
> I think we can retain the older Assert. If we do that then I think we
> don't need to define ParallelVacuumIsActive in vacuumlazy.c.

Right, will fix in the next version patch.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/
RE: [PATCH] pg_stat_toast
Dear Gunnar,

> postgres=# CREATE TABLE test (i int, lz4 text COMPRESSION lz4, std text);
> postgres=# INSERT INTO test SELECT
> i, repeat(md5(i::text),100), repeat(md5(i::text),100) FROM
> generate_series(0,10) x(i);
> postgres=# SELECT * FROM pg_stat_toast WHERE schemaname = 'public';
> -[ RECORD 1 ]--------+----------
> schemaname           | public
> reloid               | 16829
> attnum               | 2
> relname              | test
> attname              | lz4
> externalizations     | 0
> compressions         | 11
> compressionsuccesses | 11
> compressionsizesum   | 6299710
> originalsizesum      | 320403204
> -[ RECORD 2 ]--------+----------
> schemaname           | public
> reloid               | 16829
> attnum               | 3
> relname              | test
> attname              | std
> externalizations     | 0
> compressions         | 11
> compressionsuccesses | 11
> compressionsizesum   | 8198819
> originalsizesum      | 320403204

I'm not sure about TOAST, but compression methods are now configurable:
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=bbe0a81db69bd10bd166907c3701492a29aca294

How about adding a new attribute "method" to pg_stat_toast? ToastAttrInfo's attr->tai_compression represents how the data is compressed, so I think it's easy to add. Or is it not needed because pg_attribute already has that information?

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
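[Editor's note: on the question of whether pg_attribute already carries this — since PostgreSQL 14, pg_attribute.attcompression records each column's currently-configured compression method ('l' for lz4, 'p' for pglz, empty for the default), e.g.:]

postgres=# SELECT attname, attcompression
           FROM pg_attribute
           WHERE attrelid = 'test'::regclass AND attnum > 0;

Note, though, that attcompression reflects only the current setting; previously-stored values may have been compressed with an earlier method, which could be one argument for tracking the method per datum in pg_stat_toast as well.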
RE: row filtering for logical replication
On Wednesday, December 8, 2021 2:29 PM Amit Kapila wrote: > > On Mon, Dec 6, 2021 at 6:04 PM Euler Taveira wrote: > > > > On Mon, Dec 6, 2021, at 3:35 AM, Dilip Kumar wrote: > > > > On Mon, Dec 6, 2021 at 6:49 AM Euler Taveira wrote: > > > > > > On Fri, Dec 3, 2021, at 8:12 PM, Euler Taveira wrote: > > > > > > PS> I will update the commit message in the next version. I barely > > > changed the > > > documentation to reflect the current behavior. I probably missed some > changes > > > but I will fix in the next version. > > > > > > I realized that I forgot to mention a few things about the UPDATE > > > behavior. > > > Regardless of 0003, we need to define which tuple will be used to > > > evaluate the > > > row filter for UPDATEs. We already discussed it circa [1]. This current > > > version > > > chooses *new* tuple. Is it the best choice? > > > > But with 0003, we are using both the tuple for evaluating the row > > filter, so instead of fixing 0001, why we don't just merge 0003 with > > 0001? I mean eventually, 0003 is doing what is the agreed behavior, > > i.e. if just OLD is matching the filter then convert the UPDATE to > > DELETE OTOH if only new is matching the filter then convert the UPDATE > > to INSERT. Do you think that even we merge 0001 and 0003 then also > > there is an open issue regarding which row to select for the filter? > > > > Maybe I was not clear. IIUC we are still discussing 0003 and I would like to > > propose a different default based on the conclusion I came up. If we merged > > 0003, that's fine; this change will be useless. If we don't or it is > > optional, > > it still has its merit. > > > > Do we want to pay the overhead to evaluating both tuple for UPDATEs? I'm > > still > > processing if it is worth it. If you think that in general the row filter > > contains the primary key and it is rare to change it, it will waste cycles > > evaluating the same expression twice. It seems this behavior could be > > controlled by a parameter. 
> I think the first thing we should do in this regard is to evaluate the performance for both cases (when we apply a filter to both tuples vs. to one of the tuples). In case the performance difference is unacceptable, I think it would be better to still compare both tuples as default to avoid data inconsistency issues, and have an option to allow comparing one of the tuples.

I did some performance tests to see whether the 0003 patch has much overhead. For this, I compared applying the first two patches against applying the first three patches, in four cases:
1) only old rows match the filter;
2) only new rows match the filter;
3) both old rows and new rows match the filter;
4) neither old rows nor new rows match the filter.

The 0003 patch checks both old rows and new rows; without the 0003 patch, only one of them (old or new) is checked. We want to know whether it takes more time when the old rows are also checked. I ran the tests in asynchronous mode and compared the SQL execution time. I also tried some complex filters, to see if the difference would become more obvious. The result and the script are attached.

I didn't see a big difference between the results with and without the 0003 patch in any of the cases, so I think the 0003 patch doesn't have much overhead.

Regards, Tang

perf.sh Description: perf.sh
Re: row filtering for logical replication
On Mon, Dec 20, 2021 at 6:07 AM Greg Nancarrow wrote:
> On Sat, Dec 18, 2021 at 1:33 PM Amit Kapila wrote:
> > > I think it's a concern, for such a basic example with only one row, getting unpredictable (and even wrong) replication results, depending upon the order of operations.
> > I am not sure how we can deduce that. The results are based on the current and new values of the row, which is what I think we are expecting here.
> In the two simple cases presented, the publisher ends up with the same single row (2,1) in both cases, but in one of the cases the subscriber ends up with an extra row (1,1) that the publisher doesn't have. So, in using a "filter", a row has been published that the publisher doesn't have. I'm not so sure a user would be expecting that. Not to mention that if (1,1) is subsequently INSERTed on the publisher side, it will result in a duplicate key error on the subscriber.

Personally, I feel users need to be careful in defining publications and subscriptions; otherwise, there are various ways "duplicate key error ..." kinds of issues can arise. Say, you have different publications which publish the same table, and then you have different subscriptions on the subscriber which subscribe to those publications.

> > > Doesn't this problem result from allowing different WHERE clauses for different pubactions for the same table? My current thoughts are that this shouldn't be allowed, and also WHERE clauses for INSERTs should, like UPDATE and DELETE, be restricted to using only columns covered by the replica identity or primary key.
> > Hmm, even if we do that, one could have removed the insert row filter by the time we are evaluating the update. So, we will get the same result. I think the behavior in your example is as we expect as per the specs defined by the patch, and I don't see any problem, in this case, w.r.t. replication results.
Let us see what others think on this?

> Here I'm talking about the typical use-case of setting the row-filtering WHERE clause up-front and not changing it thereafter. I think that dynamically changing filters after INSERT/UPDATE/DELETE operations is not the typical use-case, and IMHO it's another thing entirely (could result in all kinds of unpredictable, random results).

Yeah, that's what I also wanted to say: users need to carefully define publications/subscriptions, because otherwise an up-front definition can also lead to unpredictable results, as shared in the explanation above. I feel Hou-San's latest email [1] explains the current rules very well, and maybe we should document them in some way to avoid confusion.

> Personally I think it would make more sense to:
> 1) Disallow different WHERE clauses on the same table, for different pubactions.
> 2) If only INSERTs are being published, allow any column in the WHERE clause, otherwise (as for UPDATE and DELETE) restrict the referenced columns to be part of the replica identity or primary key.

We can restrict in some way like you are saying, or we can even restrict such that we "disallow specifying row filters unless pubactions have all the DML operations, and allow the row filter to have only columns that are part of the replica identity or primary key". I feel it is better to provide flexibility as the current patch does and document it to make users aware of the kinds of problems that can arise with wrong usage.

[1] - https://www.postgresql.org/message-id/OS0PR01MB57168F4384D50656A4FC2DC5947B9%40OS0PR01MB5716.jpnprd01.prod.outlook.com

-- With Regards, Amit Kapila.
Re: row filtering for logical replication
On Mon, Dec 20, 2021 at 12:51 PM houzj.f...@fujitsu.com wrote:
> I think it might not be hard to predict the current behavior. Users only need to be aware that:
> 1) pubactions and row filters on different publications are combined with 'OR'.
> 2) For UPDATE, we execute the filter for both the OLD and NEW tuple and change the operation type accordingly.
>
> For the example mentioned:
> create table tbl1 (a int primary key, b int);
> create publication A for table tbl1 where (b<2) with(publish='insert');
> create publication B for table tbl1 where (a>1) with(publish='update');
>
> If we follow rules 1) and 2), I feel we are able to predict the following condition:
> --
> WHERE (action = 'insert' AND b < 2) OR (action = 'update' AND a > 1)
> --
>
> So, it seems acceptable to me.
>
> Personally, I think the current design gives users more flexibility to handle some complex scenarios. If users want a simple setting for publications, they can also set the same row filter for the same table in the different publications. To avoid confusion, I think we can document these rules clearly.
>
> BTW, from IBM's documentation, I think IBM also supports this kind of complex condition [1].
> [1] https://www.ibm.com/docs/en/idr/11.4.0?topic=rows-log-record-variables

Yes, I agree with this. It's better to give users more flexibility, while warning them about the consequences, rather than restricting them with constraints. We could explain this in the documentation so that users can better predict the effect of having pubaction-specific filters.

regards, Ajin Cherian Fujitsu Australia
RE: Confused comment about drop replica identity index
On Tue, Dec 16, 2021 at 10:27 AM, Michael Paquier wrote: > On Tue, Dec 16, 2021 at 06:40 AM, Michael Paquier wrote: > > Would you like to write a patch to address all that? > > OK, I will push it soon.

Here is a patch to correct the wrong comment about REPLICA_IDENTITY_INDEX and improve the docs.

Regards, Wang wei

Correct-wrong-comment-about-REPLICA_IDENTITY_INDEX.patch Description: Correct-wrong-comment-about-REPLICA_IDENTITY_INDEX.patch
Re: parallel vacuum comments
On Mon, Dec 20, 2021 at 8:33 AM Masahiko Sawada wrote: > > On Sat, Dec 18, 2021 at 3:38 PM Amit Kapila wrote: > > > > Few comments: > > = > > 1. > > + * dead_items stores TIDs whose index tuples are deleted by index > > vacuuming. > > + * Each TID points to an LP_DEAD line pointer from a heap page that has > > been > > + * processed by lazy_scan_prune. Also needed by lazy_vacuum_heap_rel, > > which > > + * marks the same LP_DEAD line pointers as LP_UNUSED during second heap > > pass. > > */ > > - LVDeadItems *dead_items; /* TIDs whose index tuples we'll delete */ > > + VacDeadItems *dead_items; /* TIDs whose index tuples we'll delete */ > > > > Isn't it better to keep these comments atop the structure VacDeadItems > > declaration? > > I think LP_DEAD and LP_UNUSED stuff are specific to heap. Given moving > VacDeadItems to vacuum.c, I thought it's better to keep it as generic > TID storage. > Okay, that makes sense. > > > > 2. What is the reason for not moving > > lazy_vacuum_one_index/lazy_cleanup_one_index to vacuum.c so that they > > can be called from vacuumlazy.c and vacuumparallel.c? Without this > > refactoring patch, I think both leader and workers set the same error > > context phase (VACUUM_ERRCB_PHASE_VACUUM_INDEX) during index > > vacuuming? Is it because you want a separate context phase for a > > parallel vacuum? > > Since the phases defined as VacErrPhase like > VACUUM_ERRCB_PHASE_SCAN_HEAP and VACUUM_ERRCB_PHASE_VACUUM_HEAP etc. > and error callback function, vacuum_error_callback(), are specific to > heap, I thought it'd not be a good idea to move > lazy_vacuum/cleanup_one_index() so that both vacuumlazy.c and > vacuumparallel.c can use the phases and error callback function. > How about exposing it via heapam.h? We have already exposed a few things via heapam.h (see /* in heap/vacuumlazy.c */). 
In the current proposal, we would need to have separate callbacks and phases for index vacuuming so that they can be used by both vacuumlazy.c and vacuumparallel.c, which might not be a good idea. -- With Regards, Amit Kapila.
Re: Getting rid of regression test input/ and output/ files
On Sun, Dec 19, 2021 at 7:00 PM Tom Lane wrote: > Corey Huinker writes: > > Which brings up a tangential question, is there value in having something > > that brings in one or more env vars as psql vars directly. I'm thinking > > something like: > > > \importenv pattern [prefix] > > Meh ... considering we've gone this long without any getenv capability > in psql at all, that seems pretty premature. > > regards, tom lane > Fair enough. Patches didn't apply with `git apply` but did fine with `patch -p1`, from there it passes make check-world.
Re: row filtering for logical replication
On Thu, Dec 2, 2021 at 7:40 PM vignesh C wrote:
> ...
> Thanks for the updated patch, a few comments:
> 1) Both the testpub5a and testpub5c publications are the same; one of them can be removed
> +SET client_min_messages = 'ERROR';
> +CREATE PUBLICATION testpub5a FOR TABLE testpub_rf_tbl1 WHERE (a > 1) WITH (publish="insert");
> +CREATE PUBLICATION testpub5b FOR TABLE testpub_rf_tbl1;
> +CREATE PUBLICATION testpub5c FOR TABLE testpub_rf_tbl1 WHERE (a > 3) WITH (publish="insert");
> +RESET client_min_messages;
> +\d+ testpub_rf_tbl1
> +DROP PUBLICATION testpub5a, testpub5b, testpub5c;
>
> testpub5b will be covered in the earlier existing case above:
> ALTER PUBLICATION testpib_ins_trunct ADD TABLE pub_test.testpub_nopk, testpub_tbl1;
>
> \d+ pub_test.testpub_nopk
> \d+ testpub_tbl1
>
> I felt the test related to testpub5b is also not required

Skipped. Strictly speaking, you may be correct to say this code path is already tested elsewhere. But this test case was meant for \d+, so I wanted it to be "self-contained" and easy to observe it displaying tables both with and without filters at the same time.
> 3) testpub7 can be renamed to testpub6 to maintain the continuity > since the previous testpub6 did not succeed: > +CREATE OPERATOR =#> (PROCEDURE = testpub_rf_func, LEFTARG = integer, > RIGHTARG = integer); > +CREATE PUBLICATION testpub6 FOR TABLE testpub_rf_tbl3 WHERE (e =#> 27); > +-- fail - WHERE not allowed in DROP > +ALTER PUBLICATION testpub5 DROP TABLE testpub_rf_tbl3 WHERE (e < 27); > +-- fail - cannot ALTER SET table which is a member of a pre-existing schema > +SET client_min_messages = 'ERROR'; > +CREATE PUBLICATION testpub7 FOR ALL TABLES IN SCHEMA testpub_rf_myschema1; > +ALTER PUBLICATION testpub7 SET ALL TABLES IN SCHEMA > testpub_rf_myschema1, TABLE testpub_rf_myschema1.testpub_rf_tb16; > +RESET client_min_messages; > Fixed in v48 [1] > 4) Did this test intend to include where clause in testpub_rf_tb16, if > so it can be added: > +-- fail - cannot ALTER SET table which is a member of a pre-existing schema > +SET client_min_messages = 'ERROR'; > +CREATE PUBLICATION testpub7 FOR ALL TABLES IN SCHEMA testpub_rf_myschema1; > +ALTER PUBLICATION testpub7 SET ALL TABLES IN SCHEMA > testpub_rf_myschema1, TABLE testpub_rf_myschema1.testpub_rf_tb16; > +RESET client_min_messages; > Fixed in v48 [1] -- [1] https://www.postgresql.org/message-id/CAHut%2BPuHz1oFM7oaiHeqxMQqd0L70bV_hT7u_mDf3b8As5kwig%40mail.gmail.com Kind Regards, Peter Smith. Fujitsu Australia
Re: simplifying foreign key/RI checks
> I wasn't able to make many inroads into how we might be able to get rid of the DETACH-related partition descriptor hacks, item (3), though I made some progress on items (1) and (2).
>
> For (1), the attached 0001 patch adds a new isolation suite fk-snapshot.spec to exercise snapshot behaviors in the cases where we no longer go through SPI. It helped find some problems with the snapshot handling in the earlier versions of the patch, mainly with partitioned PK tables. It also contains a test along the lines of the example you showed upthread, which shows that the partition descriptor hack requiring ActiveSnapshot to be set produces wrong results. The patch includes the buggy output for that test case, marked as such in a comment above the test.
>
> In the updated 0002, I fixed things such that the snapshot-setting required by the partition descriptor hack is independent of the snapshot-setting of the RI query, so that it no longer causes the PK index scan to return rows that the RI query mustn't see. That fixes the visibility bug illustrated in your example and, as mentioned, is also exercised in the new test suite.
>
> I also moved find_leaf_pk_rel() into execPartition.c with a new name and a new set of parameters.
>
> -- Amit Langote EDB: http://www.enterprisedb.com

Sorry for the delay. This patch no longer applies; it has some conflicts with d6f96ed94e73052f99a2e545ed17a8b2fdc1fb8a
Re: PublicationActions - use bit flags.
On Mon, Dec 20, 2021 at 11:19 AM Peter Smith wrote:
> For some reason the current HEAD PublicationActions is a struct of booleans representing combinations of the 4 different "publication actions".
>
> I felt it is more natural to implement boolean flag combinations using a bitmask instead of a struct of bools. IMO using the bitmask also simplifies assignment and checking of said flags.
>
> PSA a small patch for this.
>
> Thoughts?

+1

I think the bit flags are a more natural fit, and the patch also removes the unnecessary use of a palloc'd tiny struct to return the PublicationActions (which also currently isn't explicitly pfree'd).

Regards, Greg Nancarrow Fujitsu Australia
Re: row filtering for logical replication
On Wed, Nov 24, 2021 at 3:22 PM vignesh C wrote:
> On Tue, Nov 23, 2021 at 4:58 PM Ajin Cherian wrote:
>
> 3) Should we include the row filter condition in the pg_publication_tables view, like in describe publication (\dRp+), since the prqual is not easily readable in the pg_publication_rel table?

How about exposing a pubdef (or publicationdef) column via pg_publication_tables? In it, we would display the publication definition. This is similar to what we do for indexes via the pg_indexes view:

postgres=# select * from pg_indexes where tablename like '%t1%';
 schemaname | tablename | indexname | tablespace |                              indexdef
------------+-----------+-----------+------------+--------------------------------------------------------------------
 public     | t1        | idx_t1    |            | CREATE INDEX idx_t1 ON public.t1 USING btree (c1) WHERE (c1 < 10)
(1 row)

The one advantage I see with this is that we will avoid adding additional columns for the other patches like "column filter". Also, it might be convenient for users. What do you think?

-- With Regards, Amit Kapila.
Re: row filtering for logical replication
On Mon, Dec 20, 2021 at 4:13 PM Amit Kapila wrote:
> On Wed, Nov 24, 2021 at 3:22 PM vignesh C wrote:
> > On Tue, Nov 23, 2021 at 4:58 PM Ajin Cherian wrote:
> >
> > 3) Should we include the row filter condition in the pg_publication_tables view, like in describe publication (\dRp+), since the prqual is not easily readable in the pg_publication_rel table?
>
> How about exposing a pubdef (or publicationdef) column via pg_publication_tables? In it, we would display the publication definition. This is similar to what we do for indexes via the pg_indexes view:
>
> postgres=# select * from pg_indexes where tablename like '%t1%';
>  schemaname | tablename | indexname | tablespace |                              indexdef
> ------------+-----------+-----------+------------+--------------------------------------------------------------------
>  public     | t1        | idx_t1    |            | CREATE INDEX idx_t1 ON public.t1 USING btree (c1) WHERE (c1 < 10)
> (1 row)
>
> The one advantage I see with this is that we will avoid adding additional columns for the other patches like "column filter". Also, it might be convenient for users. What do you think?

I think it is a good idea, particularly since there are already some precedents. OTOH, maybe there is no immediate requirement for this feature because there are already alternative ways to conveniently display the filters (e.g. psql \d+ and \dRp+). Currently, there is no pg_get_pubdef function (analogous to the index's pg_get_indexdef), so that would need to be written from scratch. So I feel this is a good feature, but it could be implemented as an independent patch in another thread.

-- Kind Regards, Peter Smith. Fujitsu Australia
Re: simplifying foreign key/RI checks
On Mon, Dec 20, 2021 at 2:00 PM Corey Huinker wrote: > Sorry for the delay. This patch no longer applies, it has some conflict with > d6f96ed94e73052f99a2e545ed17a8b2fdc1fb8a Thanks Corey for the heads up. Rebased with some cosmetic adjustments. -- Amit Langote EDB: http://www.enterprisedb.com v11-0001-Add-isolation-tests-for-snapshot-behavior-in-ri_.patch Description: Binary data v11-0002-Avoid-using-SPI-for-some-RI-checks.patch Description: Binary data
Re: In-place persistence change of a relation
Hello, Jakub.

At Fri, 17 Dec 2021 09:10:30 +, Jakub Wartak wrote in
> the patch didn't apply cleanly (it's from March; some hunks were failing), so I've fixed it, and the combined git format-patch is attached. It did conflict with the following:

Thanks for looking at this. Also thanks to Justin for finding the silly use-after-free bug. (Now I see the regression test fails and I'm not sure how I didn't find this before.)

> b0483263dda - Add support for SET ACCESS METHOD in ALTER TABLE
> 7b565843a94 - Add call to object access hook at the end of table rewrite in ALTER TABLE
> 9ce346eabf3 - Report progress of startup operations that take a long time.
> f10f0ae420 - Replace RelationOpenSmgr() with RelationGetSmgr().
>
> I'm especially worried if I didn't screw up something/forgot something related to the last one (rd->rd_smgr changes), but I'm getting "All 210 tests passed".

About the last one, all rel->rd_smgr accesses need to be replaced with RelationGetSmgr(). On the other hand, we can simply remove the RelationOpenSmgr() calls, since the target smgr relation is guaranteed to be loaded by RelationGetSmgr(). The fix you made for RelationCreate/DropInitFork is correct and the changes you made would work, but I prefer that the code not be too permissive about unknown (or unexpected) states.

> Basic demonstration of this patch (with wal_level=minimal):
> create unlogged table t6 (id bigint, t text);
> -- produces 110GB table, takes ~5mins
> insert into t6 select nextval('s1'), repeat('A', 1000) from generate_series(1, 1);
> alter table t6 set logged;
> on baseline SET LOGGED takes: ~7min10s
> on patched SET LOGGED takes: 25s
>
> So basically one can - thanks to this patch - use his application (performing classic INSERTs/UPDATEs/DELETEs, so without the need to rewrite to use COPY), perform a literal batch upload, and then just switch the tables to LOGGED.

This result is significant.
That operation finally requires WAL writes, but I was not sure how much gain FPIs (or bulk WAL logging) give in comparison to operational WAL.

> Some more intensive testing also looks good, assuming a table prepared to put pressure on WAL:
> create unlogged table t_unlogged (id bigint, t text) partition by hash (id);
> create unlogged table t_unlogged_h0 partition of t_unlogged FOR VALUES WITH (modulus 4, remainder 0);
> [..]
> create unlogged table t_unlogged_h3 partition of t_unlogged FOR VALUES WITH (modulus 4, remainder 3);
>
> The workload would still be pretty heavy on LWLock/BufferContent, WALInsert, and Lock/extend.
> t_logged.sql = insert into t_logged select nextval('s1'), repeat('A', 1000) from generate_series(1, 1000); # according to pg_wal_stats.wal_bytes generates ~1MB of WAL
> t_unlogged.sql = insert into t_unlogged select nextval('s1'), repeat('A', 1000) from generate_series(1, 1000); # according to pg_wal_stats.wal_bytes generates ~3kB of WAL
>
> so using: pgbench -f .sql -T 30 -P 1 -c 32 -j 3 t
> with synchronous_commit=on (default):
> with t_logged.sql: tps = 229 (lat avg = 138ms)
> with t_unlogged.sql: tps = 283 (lat avg = 112ms) # almost all on LWLock/WALWrite
> with synchronous_commit=off:
> with t_logged.sql: tps = 413 (lat avg = 77ms)
> with t_unlogged.sql: tps = 782 (lat avg = 40ms)
> Afterwards, switching the unlogged ~16GB partitions takes 5s per partition.
>
> As the thread didn't get a lot of traction, I've registered it in the current commitfest https://commitfest.postgresql.org/36/3461/ with you as the author and in 'Ready for review' state.
>
> I think it behaves as an almost-finished one, and, after reading all those discussions about this feature going back over a 10-year+ span, and a lot of failed effort towards wal_level=noWAL, I think it would be nice to finally start getting some of it into core.

Thanks for taking the performance benchmark. I didn't register it, for a few reasons:

1.
I'm not sure that we want to have the new mark files.

2. Aside from possible bugs, I'm not confident that the crash-safety of this patch is actually water-tight. At least we need tests for some failure cases.

3. As mentioned in transam/README, failure in removing smgr mark files leads to an immediate shutdown. I'm not sure this behavior is acceptable.

4. Including the reasons above, this is not fully functional. For example, if we execute the following commands on the primary, the replica doesn't work correctly. (Boom!)

=# CREATE UNLOGGED TABLE t (a int);
=# ALTER TABLE t SET LOGGED;

The following fixes are done in the attached v8.

- Rebased. Referring to Jakub and Justin's work, I replaced direct access to ->rd_smgr with RelationGetSmgr() and removed calls to RelationOpenSmgr(). I still separate the "ALTER TABLE ALL IN TABLESPACE SET LOGGED/UNLOGGED"
relcache not invalidated when ADD PRIMARY KEY USING INDEX
Hi,

When reviewing some replica identity related patches, I found that when adding a primary key using an existing unique index on NOT NULL columns, the target table's relcache won't be invalidated. This causes an error when REPLICA IDENTITY is DEFAULT and we UPDATE/DELETE a published table, because we refer to RelationData::rd_pkindex to check whether the UPDATE or DELETE can be safely executed in this case.

--- reproduction steps
CREATE TABLE test(a int not null, b int);
CREATE UNIQUE INDEX a ON test(a);
CREATE PUBLICATION PUB for TABLE test;

UPDATE test SET a = 2;
ERROR:  cannot update table "test" because it does not have a replica identity and publishes updates
HINT:  To enable updating the table, set REPLICA IDENTITY using ALTER TABLE.

alter table test add primary key using index a;

UPDATE test SET a = 2;
ERROR:  cannot update table "test" because it does not have a replica identity and publishes updates
HINT:  To enable updating the table, set REPLICA IDENTITY using ALTER TABLE.
---

I think the bug exists in HEAD ~ PG11, after the commit (f66e8bf) which removed relhaspkey from pg_class. In PG10, adding a primary key would always update relhaspkey in pg_class, which invalidated the relcache, so it was OK.

I tried to write a patch to fix this by invalidating the relcache after marking the primary key in index_constraint_create().

Best regards, Hou zj

0001-Invalid-relcache-when-ADD-PRIMARY-KEY-USING-INDEX.patch Description: 0001-Invalid-relcache-when-ADD-PRIMARY-KEY-USING-INDEX.patch
Re: In-place persistence change of a relation
At Mon, 20 Dec 2021 15:28:23 +0900 (JST), Kyotaro Horiguchi wrote in
> 4. Including the reasons above, this is not fully functional.
>    For example, if we execute the following commands on the primary,
>    the replica doesn't work correctly. (Boom!)
>
>    =# CREATE UNLOGGED TABLE t (a int);
>    =# ALTER TABLE t SET LOGGED;
>
> - The issue 4 above is not fixed (yet).

Not only for that case: RelationChangePersistence() needs to send a truncate record before the FPIs. If the primary crashes in the middle of the operation, the table content will vanish along with the persistence change. That is the correct behavior.

regards.

-- Kyotaro Horiguchi NTT Open Source Software Center

>From b28163fd7b3527e69f5b76f252891f800d7ac98c Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi
Date: Wed, 11 Nov 2020 21:51:11 +0900
Subject: [PATCH v9 1/2] In-place table persistence change

Even though ALTER TABLE SET LOGGED/UNLOGGED does not require data rewriting, currently it runs a heap rewrite, which causes a large amount of file I/O. This patch makes the command run without the heap rewrite. In addition, SET LOGGED while wal_level > minimal emits WAL using XLOG_FPI instead of a massive number of HEAP_INSERTs, which should be smaller. Also, this allows for the cleanup of files left behind by the crash of the transaction that created them.
--- src/backend/access/rmgrdesc/smgrdesc.c | 52 +++ src/backend/access/transam/README | 8 + src/backend/access/transam/xlog.c | 17 + src/backend/catalog/storage.c | 593 +++-- src/backend/commands/tablecmds.c | 256 +-- src/backend/replication/basebackup.c | 3 +- src/backend/storage/buffer/bufmgr.c| 88 src/backend/storage/file/fd.c | 4 +- src/backend/storage/file/reinit.c | 344 ++ src/backend/storage/smgr/md.c | 93 +++- src/backend/storage/smgr/smgr.c| 32 ++ src/backend/storage/sync/sync.c| 20 +- src/bin/pg_rewind/parsexlog.c | 24 + src/common/relpath.c | 47 +- src/include/catalog/storage.h | 2 + src/include/catalog/storage_xlog.h | 42 +- src/include/common/relpath.h | 9 +- src/include/storage/bufmgr.h | 2 + src/include/storage/fd.h | 1 + src/include/storage/md.h | 8 +- src/include/storage/reinit.h | 10 +- src/include/storage/smgr.h | 17 + 22 files changed, 1465 insertions(+), 207 deletions(-) diff --git a/src/backend/access/rmgrdesc/smgrdesc.c b/src/backend/access/rmgrdesc/smgrdesc.c index 773d57..d251f22207 100644 --- a/src/backend/access/rmgrdesc/smgrdesc.c +++ b/src/backend/access/rmgrdesc/smgrdesc.c @@ -40,6 +40,49 @@ smgr_desc(StringInfo buf, XLogReaderState *record) xlrec->blkno, xlrec->flags); pfree(path); } + else if (info == XLOG_SMGR_UNLINK) + { + xl_smgr_unlink *xlrec = (xl_smgr_unlink *) rec; + char *path = relpathperm(xlrec->rnode, xlrec->forkNum); + + appendStringInfoString(buf, path); + pfree(path); + } + else if (info == XLOG_SMGR_MARK) + { + xl_smgr_mark *xlrec = (xl_smgr_mark *) rec; + char *path = GetRelationPath(xlrec->rnode.dbNode, + xlrec->rnode.spcNode, + xlrec->rnode.relNode, + InvalidBackendId, + xlrec->forkNum, xlrec->mark); + char *action; + + switch (xlrec->action) + { + case XLOG_SMGR_MARK_CREATE: +action = "CREATE"; +break; + case XLOG_SMGR_MARK_UNLINK: +action = "DELETE"; +break; + default: +action = ""; +break; + } + + appendStringInfo(buf, "%s %s", action, path); + pfree(path); + } + else if (info == XLOG_SMGR_BUFPERSISTENCE) + { + 
xl_smgr_bufpersistence *xlrec = (xl_smgr_bufpersistence *) rec; + char *path = relpathperm(xlrec->rnode, MAIN_FORKNUM); + + appendStringInfoString(buf, path); + appendStringInfo(buf, " persistence %d", xlrec->persistence); + pfree(path); + } } const char * @@ -55,6 +98,15 @@ smgr_identify(uint8 info) case XLOG_SMGR_TRUNCATE: id = "TRUNCATE"; break; + case XLOG_SMGR_UNLINK: + id = "UNLINK"; + break; + case XLOG_SMGR_MARK: + id = "MARK"; + break; + case XLOG_SMGR_BUFPERSISTENCE: + id = "BUFPERSISTENCE"; + break; } return id; diff --git a/src/backend/access/transam/README b/src/backend/access/transam/README index 1edc8180c1..b344bbe511 100644 --- a/src/backend/access/transam/README +++ b/src/backend/access/transam/README @@ -724,6 +724,14 @@ we must panic and abort recovery. The DBA will have to manually clean up then restart recovery. This is part of the reason for not writing a WAL entry until we've successfully done the original action. +The Smgr MARK files + + +An smgr mark file is created when a new relation file is created to +mark the relfilenode needs to be cleaned up at recovery time. In +contrast to the four actions above, failure to remove smgr mark files +will lead to data loss,
RE: In-place persistence change of a relation
Hi Kyotaro,

I'm glad you are still into this.

> I didn't register for some reasons.

Right now in v8 there's a typo in ./src/backend/catalog/storage.c:

storage.c: In function 'RelationDropInitFork':
storage.c:385:44: error: expected statement before ')' token
    pending->unlink_forknum != INIT_FORKNUM))  <-- here, one ')' too many

> 1. I'm not sure that we want to have the new mark files.

I can't help with such a design decision, but if there are doubts, maybe add checks of the return codes around:
a) pg_fsync() and fsync_parent_path() (??) inside mdcreatemark()
b) mdunlinkmark() inside mdunlinkmark()
and PANIC if something goes wrong?

> 2. Aside from possible bugs, I'm not confident that the crash-safety of this patch is actually water-tight. At least we need tests for some failure cases.
>
> 3. As mentioned in transam/README, failure in removing smgr mark files leads to an immediate shutdown. I'm not sure this behavior is acceptable.

Doesn't that happen for most of the stuff already? There's even a data_sync_retry GUC.

> 4. Including the reasons above, this is not fully functional.
>    For example, if we execute the following commands on the primary,
>    the replica doesn't work correctly. (Boom!)
>
>    =# CREATE UNLOGGED TABLE t (a int);
>    =# ALTER TABLE t SET LOGGED;
>
> The following fixes are done in the attached v8.
>
> - Rebased. Referring to Jakub and Justin's work, I replaced direct access to ->rd_smgr with RelationGetSmgr() and removed calls to RelationOpenSmgr(). I still separate the "ALTER TABLE ALL IN TABLESPACE SET LOGGED/UNLOGGED" statement part.
>
> - Fixed RelationCreate/DropInitFork's behavior for non-target relations. (From Jakub's work.)
>
> - Fixed wording of some comments.
>
> - As revisited, I found a bug around recovery. If the logged-ness of a relation gets flipped repeatedly in a transaction, duplicate pending-delete entries are accumulated during recovery and work in a wrong way. smgr_redo now adds up to one entry per action.
> - The issue 4 above is not fixed (yet).

Thanks again. If you have a list of crash-test ideas, maybe I'll have some minutes to try to figure them out. Is there any go-to list of stuff to be checked to add confidence to this patch (as per point #2)?

BTW, quick feedback regarding that ALTER patch (there were 4 unlogged tables):

# ALTER TABLE ALL IN TABLESPACE tbs1 set logged;
WARNING:  unrecognized node type: 349

-J.