Handle infinite recursion in logical replication setup

2022-02-22 Thread vignesh C
 Hi,

In logical replication, the walsender currently sends both the data that is
generated locally and the data that was replicated from other instances.
This results in infinite recursion in a circular logical replication setup.
Consider a user trying to build a two-way replication setup, with node1
publishing data to node2 and node2 publishing data to node1, so that DML
operations can be performed on either node; this effectively acts as a
two-way multi-master replication setup.

This problem can be reproduced with the following steps:
-- Instance 1
create table t1(c1 int);
create publication pub1 for table t1;

-- Instance 2
create table t1(c1 int);
create publication pub2 for table t1;
create subscription sub1 CONNECTION 'dbname=postgres port=5432'
publication pub1;

-- Instance 1
create subscription sub2 CONNECTION 'dbname=postgres port=5433'
publication pub2;
insert into t1 values(10);

In this scenario, the walsender for publication pub1 sends the insert to the
apply worker for subscription sub1, which maps the data to the local table
and applies the change. The walsender for publication pub2 then sends that
same change back to the apply worker for subscription sub2, which applies it
again. This process repeats infinitely.

Currently we do not differentiate between locally generated data and
replicated data; both are sent, which causes the infinite recursion.

We can see that the record count increases significantly within a short time:
select count(*) from t1;
 count
-------
   400
(1 row)

If the table had a primary key constraint, we would notice that the first
insert succeeds, and when the same insert is sent back, it fails with a
constraint error:
2022-02-23 09:28:43.592 IST [14743] ERROR:  duplicate key value
violates unique constraint "t1_pkey"
2022-02-23 09:28:43.592 IST [14743] DETAIL:  Key (c1)=(10) already exists.
2022-02-23 09:28:43.592 IST [14743] CONTEXT:  processing remote data
during "INSERT" for replication target relation "public.t1" in
transaction 727 at 2022-02-23 09:28:43.406738+05:30
2022-02-23 09:28:43.593 IST [14678] LOG:  background worker "logical
replication worker" (PID 14743) exited with exit code 1
2022-02-23 09:28:48.608 IST [14745] LOG:  logical replication apply
worker for subscription "sub2" has started
2022-02-23 09:28:48.624 IST [14745] ERROR:  duplicate key value
violates unique constraint "t1_pkey"
2022-02-23 09:28:48.624 IST [14745] DETAIL:  Key (c1)=(10) already exists.
2022-02-23 09:28:48.624 IST [14745] CONTEXT:  processing remote data
during "INSERT" for replication target relation "public.t1" in
transaction 727 at 2022-02-23 09:28:43.406738+05:30
2022-02-23 09:28:48.626 IST [14678] LOG:  background worker "logical
replication worker" (PID 14745) exited with exit code 1

The same problem can occur in any circular setup of three, four, or more
nodes, for example: a) node1 publishing to node2, b) node2 publishing to
node3, c) node3 publishing back to node1.

There are two problems here for the user: a) incremental synchronization of
a table, where the walsender sends both local data and replicated data, and
b) initial table synchronization using the copy command, which also sends
both local data and replicated data.

The first problem, incremental synchronization of a table by the walsender,
can be solved as follows:
Locally generated data currently has no replication origin associated with
it, whereas data that originated from another instance does have a
replication origin. We could use this information to differentiate locally
generated data from replicated data and send only the locally generated
data. An "only_local" option could be provided when the subscription is
created, e.g.:
CREATE SUBSCRIPTION sub1 CONNECTION 'dbname=postgres port=5433'
PUBLICATION pub1 WITH (only_local = on);
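
As a quick way of seeing this distinction on a running setup, the
replication origins created by the apply workers can be inspected on the
subscriber. This is only an illustrative sketch (names and OIDs will differ):

-- Each subscription's apply worker registers a replication origin
-- (named pg_<subscription oid>); locally generated transactions carry
-- no origin, while remotely applied ones do.
select roident, roname from pg_replication_origin;
-- Replication progress tracked per origin:
select external_id, remote_lsn, local_lsn from pg_replication_origin_status;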

I have attached a basic patch for this. If the idea is accepted, I will work
further to test more scenarios, add documentation, and post an updated patch.

Now for the second problem: initial table synchronization, where the copy
command sends both local data and replicated data.

Let us consider the following scenario:
a) node1 publishing to node2, b) node2 publishing to node1. In this case
node1 will have replicated data from node2 and vice versa.

Now suppose the user wants node3 to subscribe to data from node2. The user
will have to create a subscription on node3 to get the data from node2.
During table synchronization we send the complete table data from node2 to
node3, and node2 holds both its own local data and the data replicated from
node1. Currently there is no way to differentiate between locally generated
data and replicated data in the heap, which will cause the infinite
recursion described above.

To handle this, if the user has specified the only_local option we could
throw a warning or error out while creating the subscription. For this, we
could have a column srreplicateddata in pg_subscription_rel which would
indicate whether the table has any replicated data or not.

Re: Handle infinite recursion in logical replication setup

2022-03-01 Thread vignesh C
On Tue, Mar 1, 2022 at 4:12 PM kuroda.hay...@fujitsu.com
 wrote:
>
> Hi Vignesh,
>
> > In logical replication, currently Walsender sends the data that is
> > generated locally and the data that are replicated from other
> > instances. This results in infinite recursion in circular logical
> > replication setup.
>
> Thank you for good explanation. I understand that this fix can be used
> for a bidirectional replication.

Once these issues are resolved, it can be used for bi-directional
logical replication.

> > Here there are two problems for the user: a) incremental
> > synchronization of table sending both local data and replicated data
> > by walsender b) Table synchronization of table using copy command
> > sending both local data and replicated data
>
> So you wanted to solve these two problem and currently focused on
> the first one, right? We can check one by one.

Yes.

> > For the first problem "Incremental synchronization of table by
> > Walsender" can be solved by:
> > Currently the locally generated data does not have replication origin
> > associated and the data that has originated from another instance will
> > have a replication origin associated. We could use this information to
> > differentiate locally generated data and replicated data and send only
> > the locally generated data. This "only_local" could be provided as an
> > option while subscription is created:
> > ex: CREATE SUBSCRIPTION sub1 CONNECTION 'dbname =postgres port=5433'
> > PUBLICATION pub1 with (only_local = on);
>
> Sounds good, but I cannot distinguish whether the assumption will keep.

The replication origin is created by the apply worker and is used for all
the transactions received from the walsender, so I feel the replication
origin will always be present.

> I played with your patch, but it could not be applied to current master.
> I tested from bd74c40 and I confirmed infinite loop was not appeared.

I will post an updated version for this soon.

> local_only could not be set from ALTER SUBSCRIPTION command.
> Is it expected?

I wanted to get opinions from others too, just to make sure the approach is
right. I will fix this, including the documentation, tests, etc., in later
versions.

Regards,
Vignesh




Re: Handle infinite recursion in logical replication setup

2022-03-02 Thread vignesh C
On Tue, Mar 1, 2022 at 4:12 PM kuroda.hay...@fujitsu.com
 wrote:
>
> Hi Vignesh,
>
> > In logical replication, currently Walsender sends the data that is
> > generated locally and the data that are replicated from other
> > instances. This results in infinite recursion in circular logical
> > replication setup.
>
> Thank you for good explanation. I understand that this fix can be used
> for a bidirectional replication.
>
> > Here there are two problems for the user: a) incremental
> > synchronization of table sending both local data and replicated data
> > by walsender b) Table synchronization of table using copy command
> > sending both local data and replicated data
>
> So you wanted to solve these two problem and currently focused on
> the first one, right? We can check one by one.
>
> > For the first problem "Incremental synchronization of table by
> > Walsender" can be solved by:
> > Currently the locally generated data does not have replication origin
> > associated and the data that has originated from another instance will
> > have a replication origin associated. We could use this information to
> > differentiate locally generated data and replicated data and send only
> > the locally generated data. This "only_local" could be provided as an
> > option while subscription is created:
> > ex: CREATE SUBSCRIPTION sub1 CONNECTION 'dbname =postgres port=5433'
> > PUBLICATION pub1 with (only_local = on);
>
> Sounds good, but I cannot distinguish whether the assumption will keep.
>
> I played with your patch, but it could not be applied to current master.
> I tested from bd74c40 and I confirmed infinite loop was not appeared.
Rebased the patch on top of HEAD.

> local_only could not be set from ALTER SUBSCRIPTION command.
> Is it expected?
Modified

Thanks for the comments, the attached patch has the changes for the same.

Regards,
Vignesh
From 7c67cc23584e1106fbf2011c8c6658442125e48f Mon Sep 17 00:00:00 2001
From: Vigneshwaran C 
Date: Wed, 2 Mar 2022 20:40:34 +0530
Subject: [PATCH v2] Skip replication of non local data.

Add an option only_local which will subscribe only to the locally
generated data in the publisher node. If subscriber is created with this
option, publisher will skip publishing the data that was subscribed
from other nodes. It can be created using following syntax:
ex: CREATE SUBSCRIPTION sub1 CONNECTION 'dbname =postgres port=' PUBLICATION pub1 with (only_local = on);
---
 contrib/test_decoding/test_decoding.c |  13 +++
 doc/src/sgml/ref/alter_subscription.sgml  |   3 +-
 doc/src/sgml/ref/create_subscription.sgml |  12 ++
 src/backend/catalog/pg_subscription.c |   1 +
 src/backend/catalog/system_views.sql  |   3 +-
 src/backend/commands/subscriptioncmds.c   |  29 -
 .../libpqwalreceiver/libpqwalreceiver.c   |  18 ++-
 src/backend/replication/logical/decode.c  |  36 --
 src/backend/replication/logical/logical.c |  35 ++
 src/backend/replication/logical/tablesync.c   |   2 +-
 src/backend/replication/logical/worker.c  |   2 +
 src/backend/replication/pgoutput/pgoutput.c   |  25 
 src/backend/replication/slot.c|   4 +-
 src/backend/replication/slotfuncs.c   |  18 ++-
 src/backend/replication/walreceiver.c |   2 +-
 src/backend/replication/walsender.c   |  21 +++-
 src/bin/psql/tab-complete.c   |   4 +-
 src/include/catalog/pg_proc.dat   |   6 +-
 src/include/catalog/pg_subscription.h |   3 +
 src/include/replication/logical.h |   4 +
 src/include/replication/output_plugin.h   |   7 ++
 src/include/replication/pgoutput.h|   1 +
 src/include/replication/slot.h|   5 +-
 src/include/replication/walreceiver.h |   8 +-
 src/test/regress/expected/rules.out   |   5 +-
 src/test/regress/expected/subscription.out|   4 +
 src/test/regress/sql/subscription.sql |   4 +
 src/test/subscription/t/029_circular.pl   | 108 ++
 src/tools/pgindent/typedefs.list  |   1 +
 29 files changed, 345 insertions(+), 39 deletions(-)
 create mode 100644 src/test/subscription/t/029_circular.pl

diff --git a/contrib/test_decoding/test_decoding.c b/contrib/test_decoding/test_decoding.c
index ea22649e41..58bc5dbc1c 100644
--- a/contrib/test_decoding/test_decoding.c
+++ b/contrib/test_decoding/test_decoding.c
@@ -73,6 +73,8 @@ static void pg_decode_truncate(LogicalDecodingContext *ctx,
 			   ReorderBufferChange *change);
 static bool pg_decode_filter(LogicalDecodingContext *ctx,
 			 RepOriginId origin_id);
+static bool pg_decode_filter_remotedata(LogicalDecodingContext *ctx,
+		RepOriginId origin_id);
 static void pg_decode_message(LogicalDecodingContext *ctx,
 			  ReorderBufferTXN *txn, XLogRecPtr message_lsn,
 			  bool transactional, const char *prefix,
@@ -148,6 +150,7 @@ _PG_output_plugin_init(OutputPluginCallbacks *cb)
 	cb->truncate_cb = pg_de

Re: Handle infinite recursion in logical replication setup

2022-03-06 Thread vignesh C
On Mon, Mar 7, 2022 at 10:26 AM Peter Smith  wrote:
>
> Hi Vignesh, I also have not looked at the patch yet, but I have what
> seems like a very fundamental (and possibly dumb) question...
>
> Basically, I do not understand the choice of syntax for setting things up.
>
> IMO that "only-local" option sounds very similar to the other
> PUBLICATION ("publish") options which decide the kinds of things that
> will be published. So it feels more natural for me to think of the
> publisher as being the one to decide what will be published.
>
> e.g.
>
> option 1:
> CREATE PUBLICATION p1 FOR TABLE t1;
> CREATE SUBSCRITION s1 ... FOR PUBLICATION p1 WITH (only_local = true);
>
> option 2:
> CREATE PUBLICATION p1 FOR TABLE t1 WEHRE (publish = 'only_local');
> CREATE SUBSCRITION s1 ... FOR PUBLICATION p1;
>
> ~~
>
> IIUC the patch is using option 1. My first impression was it feels
> back-to-front for the SUBSCRIPTION telling the PUBLICATION what to
> publish.
>
> So, why does the patch use syntax option 1?

I felt the advantage of keeping it on the subscription side is that a
subscriber on one node can subscribe with the only_local option on, while a
different subscriber on a different node can subscribe with the only_local
option off. This might not be possible with the option on the publisher
side. Having it on the subscriber side might give the user more flexibility.

Regards,
Vignesh




Re: Handle infinite recursion in logical replication setup

2022-03-06 Thread vignesh C
On Tue, Mar 1, 2022 at 4:12 PM kuroda.hay...@fujitsu.com
 wrote:
>
> Hi Vignesh,
>
> > In logical replication, currently Walsender sends the data that is
> > generated locally and the data that are replicated from other
> > instances. This results in infinite recursion in circular logical
> > replication setup.
>
> Thank you for good explanation. I understand that this fix can be used
> for a bidirectional replication.
>
> > Here there are two problems for the user: a) incremental
> > synchronization of table sending both local data and replicated data
> > by walsender b) Table synchronization of table using copy command
> > sending both local data and replicated data
>
> So you wanted to solve these two problem and currently focused on
> the first one, right? We can check one by one.
>
> > For the first problem "Incremental synchronization of table by
> > Walsender" can be solved by:
> > Currently the locally generated data does not have replication origin
> > associated and the data that has originated from another instance will
> > have a replication origin associated. We could use this information to
> > differentiate locally generated data and replicated data and send only
> > the locally generated data. This "only_local" could be provided as an
> > option while subscription is created:
> > ex: CREATE SUBSCRIPTION sub1 CONNECTION 'dbname =postgres port=5433'
> > PUBLICATION pub1 with (only_local = on);
>
> Sounds good, but I cannot distinguish whether the assumption will keep.
>
> I played with your patch, but it could not be applied to current master.
> I tested from bd74c40 and I confirmed infinite loop was not appeared.
>
> local_only could not be set from ALTER SUBSCRIPTION command.
> Is it expected?

I felt being able to change the only_local option might be useful for the
user while modifying the subscription, for example when setting it with a
different set of publications. Changes for this are included in the v2 patch
attached at [1].
[1] - 
https://www.postgresql.org/message-id/CALDaNm0WSo5369pr2eN1obTGBeiJU9cQdF6Ju1sC4hMQNy5BfQ%40mail.gmail.com

Regards,
Vignesh




Re: Handle infinite recursion in logical replication setup

2022-03-06 Thread vignesh C
On Mon, Mar 7, 2022 at 11:45 AM Peter Smith  wrote:
>
> On Mon, Mar 7, 2022 at 4:20 PM vignesh C  wrote:
> >
> > On Mon, Mar 7, 2022 at 10:26 AM Peter Smith  wrote:
> > >
> > > Hi Vignesh, I also have not looked at the patch yet, but I have what
> > > seems like a very fundamental (and possibly dumb) question...
> > >
> > > Basically, I do not understand the choice of syntax for setting things up.
> > >
> > > IMO that "only-local" option sounds very similar to the other
> > > PUBLICATION ("publish") options which decide the kinds of things that
> > > will be published. So it feels more natural for me to think of the
> > > publisher as being the one to decide what will be published.
> > >
> > > e.g.
> > >
> > > option 1:
> > > CREATE PUBLICATION p1 FOR TABLE t1;
> > > CREATE SUBSCRITION s1 ... FOR PUBLICATION p1 WITH (only_local = true);
> > >
> > > option 2:
> > > CREATE PUBLICATION p1 FOR TABLE t1 WEHRE (publish = 'only_local');
> > > CREATE SUBSCRITION s1 ... FOR PUBLICATION p1;
> > >
> > > ~~
> > >
> > > IIUC the patch is using option 1. My first impression was it feels
> > > back-to-front for the SUBSCRIPTION telling the PUBLICATION what to
> > > publish.
> > >
> > > So, why does the patch use syntax option 1?
> >
> > I felt the advantage with keeping it at the subscription side is that,
> > the subscriber from one node can subscribe with only_local option on
> > and a different subscriber from a different node can subscribe with
> > only_local option as off. This might not be possible with having the
> > option at publisher side. Having it at the subscriber side might give
> > more flexibility for the user.
> >
>
> OK.  Option 2 needs two publications for that scenario. IMO it's more
> intuitive this way, but maybe you wanted to avoid the extra
> publications?

Yes, I wanted to avoid the extra publication creation that you pointed out.
Option 1 can handle this scenario without creating the extra publications:
node0: CREATE PUBLICATION p1 FOR TABLE t1;
node1: CREATE SUBSCRIPTION s1 ... FOR PUBLICATION p1 WITH (only_local = on);
node2: CREATE SUBSCRIPTION s1 ... FOR PUBLICATION p1 WITH (only_local = off);

I'm OK with both approaches now that this scenario can be handled by either
option, i.e. providing only_local as an option while creating the
publication or providing it as an option while creating the subscription, as
Peter has pointed out at [1]:
option 1:
CREATE PUBLICATION p1 FOR TABLE t1;
CREATE SUBSCRIPTION s1 ... FOR PUBLICATION p1 WITH (only_local = true);

option 2:
CREATE PUBLICATION p1 FOR TABLE t1 WITH (publish = 'only_local');
CREATE SUBSCRIPTION s1 ... FOR PUBLICATION p1;

Shall we get a few opinions on this and take it in that direction?

[1] - 
https://www.postgresql.org/message-id/CAHut%2BPsAWaETh9VMymbBfMrqiE1KuqMq%2BwpBg0s7eMzwLATr%2Bw%40mail.gmail.com

Regards,
Vignesh




Re: Handle infinite recursion in logical replication setup

2022-03-07 Thread vignesh C
On Mon, Mar 7, 2022 at 1:45 PM Peter Smith  wrote:
>
> On Mon, Mar 7, 2022 at 6:17 PM vignesh C  wrote:
> >
> > On Mon, Mar 7, 2022 at 11:45 AM Peter Smith  wrote:
> > >
> > > On Mon, Mar 7, 2022 at 4:20 PM vignesh C  wrote:
> > > >
> > > > On Mon, Mar 7, 2022 at 10:26 AM Peter Smith  
> > > > wrote:
> > > > >
> > > > > Hi Vignesh, I also have not looked at the patch yet, but I have what
> > > > > seems like a very fundamental (and possibly dumb) question...
> > > > >
> > > > > Basically, I do not understand the choice of syntax for setting 
> > > > > things up.
> > > > >
> > > > > IMO that "only-local" option sounds very similar to the other
> > > > > PUBLICATION ("publish") options which decide the kinds of things that
> > > > > will be published. So it feels more natural for me to think of the
> > > > > publisher as being the one to decide what will be published.
> > > > >
> > > > > e.g.
> > > > >
> > > > > option 1:
> > > > > CREATE PUBLICATION p1 FOR TABLE t1;
> > > > > CREATE SUBSCRITION s1 ... FOR PUBLICATION p1 WITH (only_local = true);
> > > > >
> > > > > option 2:
> > > > > CREATE PUBLICATION p1 FOR TABLE t1 WEHRE (publish = 'only_local');
> > > > > CREATE SUBSCRITION s1 ... FOR PUBLICATION p1;
> > > > >
> > > > > ~~
> > > > >
> > > > > IIUC the patch is using option 1. My first impression was it feels
> > > > > back-to-front for the SUBSCRIPTION telling the PUBLICATION what to
> > > > > publish.
> > > > >
> > > > > So, why does the patch use syntax option 1?
> > > >
> > > > I felt the advantage with keeping it at the subscription side is that,
> > > > the subscriber from one node can subscribe with only_local option on
> > > > and a different subscriber from a different node can subscribe with
> > > > only_local option as off. This might not be possible with having the
> > > > option at publisher side. Having it at the subscriber side might give
> > > > more flexibility for the user.
> > > >
> > >
> > > OK.  Option 2 needs two publications for that scenario. IMO it's more
> > > intuitive this way, but maybe you wanted to avoid the extra
> > > publications?
> >
> > Yes, I wanted to avoid the extra publication creation that you pointed
> > out. Option 1 can handle this scenario without creating the extra
> > publications:
> > node0: CREATE PUBLICATION p1 FOR TABLE t1;
> > node1: CREATE SUBSCRIPTION s1 ... FOR PUBLICATION p1 with (only_local = on);
> > node2:  CREATE SUBSCRIPTION s1 ... FOR PUBLICATION p1 with (only_local = 
> > off);
> >
> > I'm ok with both the approaches, now that this scenario can be handled
> > by using both the options. i.e providing only_local option as an
> > option while creating publication or providing only_local option as an
> > option while creating subscription as Peter has pointed out at [1].
> > option 1:
> > CREATE PUBLICATION p1 FOR TABLE t1;
> > CREATE SUBSCRITION s1 ... FOR PUBLICATION p1 WITH (only_local = true);
> >
> > option 2:
> > CREATE PUBLICATION p1 FOR TABLE t1 WITH (publish = 'only_local');
> > CREATE SUBSCRITION s1 ... FOR PUBLICATION p1;
> >
> > Shall we get a few opinions on this and take it in that direction?
> >
> > [1] - 
> > https://www.postgresql.org/message-id/CAHut%2BPsAWaETh9VMymbBfMrqiE1KuqMq%2BwpBg0s7eMzwLATr%2Bw%40mail.gmail.com
> >
> > Regards,
> > Vignesh
>
> BTW here is a counter-example to your scenario from earlier.
>
> Let's say I have a publication p1 and p2 and want to subscribe to p1
> with only_local=true, and p2 with only_local = false;
>
> Using the current OPtion 1 syntax you cannot do this with a single
> subscription because the option is tied to the subscription.
> But using syntax Option 2 you may be able to do it.
>
> Option 1:
> CREATE PUBLICATION p1 FOR TABLE t1;
> CREATE PUBLICATION p2 FOR TABLE t2;
> CREATE SUBSCRIPTION s1 ... FOR PUBLICATION p1 WITH (local_only = true);
> CREATE SUBSCRIPTION s2 ... FOR PUBLICATION p1 WITH (local_only = false);
>
> Option 2:
> CREATE PUBLICATION p1 FOR TABLE t1 WITH (publish = 'local_only');
> CREATE PUBLICATION p2 FOR TABLE t2;
> CREATE SUBSCRIPTION s1 ... FOR PUBLICATION p1, p2;

I felt that having multiple publications would create duplicate entries in
the system table; Amit has also pointed this out at [1]. Additionally,
enhancing this approach to support filtering based on replication origin, as
suggested by Dilip at [2], is done on the client side, and the initial check
to handle copy_data, specified by Amit at [3], will also be done on the
client side. Based on the above I feel the existing approach is better,
though I might be missing something here.

[1] - 
https://www.postgresql.org/message-id/CAA4eK1LgCVv8u-fOsMPbGC96sWXhT3EKOBAeFW3g84otjStztw%40mail.gmail.com
[2] - 
https://www.postgresql.org/message-id/CAFiTN-tKbjHDjAFNnqRoR8u1B%2Bfs0wunGz%3D3wp0iU-sUaxZJTQ%40mail.gmail.com
[3] - 
https://www.postgresql.org/message-id/CAA4eK1%2Bco2cd8a6okgUD_pcFEHcc7mVc0k_RE2%3D6ahyv3WPRMg%40mail.gmail.com
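
To illustrate the duplicate-entry point above: with the publisher-side
syntax (Option 2 from this discussion; the publish = 'only_local' value is
only a proposal, not existing syntax), serving both kinds of subscribers
needs two publications over the same table, which shows up as two rows in
pg_publication_rel. A rough sketch:

CREATE PUBLICATION p_local FOR TABLE t1 WITH (publish = 'only_local');  -- proposed syntax
CREATE PUBLICATION p_all FOR TABLE t1;
-- Both publications now carry an entry for the same table:
SELECT p.pubname, pr.prrelid::regclass AS tablename
  FROM pg_publication_rel pr
  JOIN pg_publication p ON p.oid = pr.prpubid;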

Regards,
Vignesh




Re: Handle infinite recursion in logical replication setup

2022-03-07 Thread vignesh C
On Mon, Mar 7, 2022 at 5:51 PM Amit Kapila  wrote:
>
> On Mon, Mar 7, 2022 at 5:01 PM Ashutosh Bapat
>  wrote:
> >
> > Hi Vignesh,
> > I agree with Peter's comment that the changes to
> > FilterRemoteOriginData() should be part of FilterByOrigin()
> >
> > Further, I wonder why "onlylocal_data" is a replication slot's
> > property. A replication slot tracks the progress of replication and it
> > may be used by different receivers with different options. I could
> > start one receiver which wants only local data, say using
> > "pg_logical_slot_get_changes" and later start another receiver which
> > fetches all the data starting from where the first receiver left. This
> > option prevents such flexibility.
> >
> > As discussed earlier in the thread, local_only can be property of
> > publication or subscription, depending upon the use case, but I can't
> > see any reason that it should be tied to a replication slot.
> >
>
> I thought it should be similar to 'streaming' option of subscription
> but may be Vignesh has some other reason which makes it different.

Yes, this can be removed from the replication slot. It is a mistake I made
while modeling the code on the two-phase option. I'm working on the changes
for this and will post an updated version.
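
For comparison, the receiver-side flexibility Ashutosh describes is how
plugin options already behave when they are passed per call instead of being
stored in the slot. A small sketch using test_decoding's existing only-local
option (the slot name is illustrative):

SELECT * FROM pg_create_logical_replication_slot('regression_slot', 'test_decoding');
-- One receiver asks only for locally originated changes ...
SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL,
                                             'only-local', '1');
-- ... while the next call on the same slot can consume everything from
-- where the previous call left off, with no filter at all.
SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL);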

Regards,
Vignesh




Re: Handle infinite recursion in logical replication setup

2022-03-08 Thread vignesh C
On Mon, Mar 7, 2022 at 2:28 PM Peter Smith  wrote:
>
> Hi Vignesh,
>
> Here are some review comments for patch v2.
>
> ==
>
> 1. Question about syntax
>
> I already posted some questions about why the syntax is on the CREATE
> SUBSCRCRIBER side.
> IMO "local_only" is a publisher option, so it seemed more natural to
> me for it to be specified as a "publish" option.
>
> Ref [1] my original question + suggestion for Option 2
> Ref [2] some other examples of subscribing to multiple-publishers
>
> Anyway, +1 to see what other people think.
>

I feel we can support it on the subscriber side first and then extend it to
the publisher side, as being discussed in [1]. I have retained it as it is.

> ~~~
>
> 2. ALTER
>
> (related also to the question about syntax)
>
> If subscribing to multiple publications then ALTER is going to change
> the 'local_only' for all of them, which might not be what you want
> (??)
>

I feel we can support it on the subscriber side first and then extend it to
the publisher side, as being discussed in [1]. When it is extended to the
publisher side, this will get handled. I have retained it as it is.

> ~~~
>
> 3. subscription_parameter
>
> (related also to the question about syntax)
>
> CREATE SUBSCRIPTION subscription_name
> CONNECTION 'conninfo'
> PUBLICATION publication_name [, ...]
> [ WITH ( subscription_parameter [= value] [, ... ] ) ]
>
> ~
>
> That WITH is for *subscription* options, not the publication options.
>
> So IMO 'local_only' intuitively seems like "local" means local where
> the subscriber is.
>
> So, if the Option 1 syntax is chosen (see comment #1) then I think the
> option name maybe should change to be something more like
> 'publish_local_only' or something similar to be more clear what local
> actually means.
>

Changed it to publish_local_only

> ~~~
>
> 4. contrib/test_decoding/test_decoding.c
>
> @@ -484,6 +487,16 @@ pg_decode_filter(LogicalDecodingContext *ctx,
>   return false;
>  }
>
> +static bool
> +pg_decode_filter_remotedata(LogicalDecodingContext *ctx,
> +   RepOriginId origin_id)
> +{
> + TestDecodingData *data = ctx->output_plugin_private;
> +
> + if (data->only_local && origin_id != InvalidRepOriginId)
> + return true;
> + return false;
> +}
>
> 4a. Maybe needs function comment.

Modified

> 4b. Missing blank line following this function
>

Modified

> ~~~
>
> 5. General - please check all of the patch.
>
> There seems inconsistency with the member names, local variable names,
> parameter names etc. There are all variations of:
>
> - only_local
> - onlylocaldata
> - onlylocal_data
> - etc
>
> Please try using the same name everywhere for everything if possible.
>

I have changed it to only_local wherever possible.

> ~~~
>
> 6. src/backend/replication/logical/decode.c - FilterRemoteOriginData
>
> @@ -585,7 +594,8 @@ logicalmsg_decode(LogicalDecodingContext *ctx,
> XLogRecordBuffer *buf)
>   message = (xl_logical_message *) XLogRecGetData(r);
>
>   if (message->dbId != ctx->slot->data.database ||
> - FilterByOrigin(ctx, origin_id))
> + FilterByOrigin(ctx, origin_id) ||
> + FilterRemoteOriginData(ctx, origin_id))
>   return;
>
> I noticed that every call to FilterRemoteOriginData has an associated
> preceding call to FilterByOrigin. It might be worth just combining the
> logic into FilterByOrigin. Then none of that calling code (9 x places)
> would need to change at all.

Modified

> ~~~
>
> 7. src/backend/replication/logical/logical.c  - CreateInitDecodingContext
>
> @@ -451,6 +453,8 @@ CreateInitDecodingContext(const char *plugin,
>   */
>   ctx->twophase &= slot->data.two_phase;
>
> + ctx->onlylocal_data &= slot->data.onlylocal_data;
>
> The equivalent 'twophase' option had a big comment. Probably this new
> option should also have a similar comment?

This change is not required anymore and the comment no longer applies, so I
have not made any change for this.

> ~~~
>
> 8. src/backend/replication/logical/logical.c - filter_remotedata_cb_wrapper
>
> +bool
> +filter_remotedata_cb_wrapper(LogicalDecodingContext *ctx,
> +RepOriginId origin_id)
> +{
> + LogicalErrorCallbackState state;
> + ErrorContextCallback errcallback;
> + bool ret;
> +
> + Assert(!ctx->fast_forward);
> +
> + /* Push callback + info on the error context stack */
> + state.ctx = ctx;
> + state.callback_name = "filter_remoteorigin";
>
> There is no consistency between the function and the name:
>
> "filter_remoteorigin" versus filter_remotedata_cb.
>
> A similar inconsistency for this is elsewhere. See review comment #9

Modified it to filter_remote_origin_cb_wrapper and changed *_remotedata_* to
*_remote_origin_*.

> ~~~
>
> 9. src/backend/replication/pgoutput/pgoutput.c
>
> @@ -215,6 +217,7 @@ _PG_output_plugin_init(OutputPluginCallbacks *cb)
>   cb->commit_prepared_cb = pgoutput_commit_prepared_txn;
>   cb->rollback_prepared_cb = pgoutput_rollback_prepared_txn;
>   cb->filter_by_origin_cb = pgoutput_origin_filter;
> + cb->filter_remotedata_cb = pgoutput_remoteorigin_fil

Re: Handle infinite recursion in logical replication setup

2022-03-08 Thread vignesh C
On Mon, Mar 7, 2022 at 9:57 PM vignesh C  wrote:
>
> On Mon, Mar 7, 2022 at 5:51 PM Amit Kapila  wrote:
> >
> > On Mon, Mar 7, 2022 at 5:01 PM Ashutosh Bapat
> >  wrote:
> > >
> > > Hi Vignesh,
> > > I agree with Peter's comment that the changes to
> > > FilterRemoteOriginData() should be part of FilterByOrigin()
> > >
> > > Further, I wonder why "onlylocal_data" is a replication slot's
> > > property. A replication slot tracks the progress of replication and it
> > > may be used by different receivers with different options. I could
> > > start one receiver which wants only local data, say using
> > > "pg_logical_slot_get_changes" and later start another receiver which
> > > fetches all the data starting from where the first receiver left. This
> > > option prevents such flexibility.
> > >
> > > As discussed earlier in the thread, local_only can be property of
> > > publication or subscription, depending upon the use case, but I can't
> > > see any reason that it should be tied to a replication slot.
> > >
> >
> > I thought it should be similar to 'streaming' option of subscription
> > but may be Vignesh has some other reason which makes it different.
>
> Yes, this can be removed from the replication slot. It is my mistake
> that I have made while making the code similar to two-phase, I'm
> working on making the changes for this. I will fix and post an updated
> version for this.

I have made the changes for this; they are available in the v3 patch
attached at [1].
[1] - 
https://www.postgresql.org/message-id/CALDaNm0JcV-7iQZhyy3kehnWTy6x%3Dz%2BsX6u6Df%2B%2By8z33pz%2BBw%40mail.gmail.com

Regards,
Vignesh




Re: Handle infinite recursion in logical replication setup

2022-03-11 Thread vignesh C
On Wed, Mar 2, 2022 at 3:59 PM Amit Kapila  wrote:
>
> On Wed, Feb 23, 2022 at 11:59 AM vignesh C  wrote:
> >
> ...
> ...
> > I have attached a basic patch for this, if the idea is accepted, I
> > will work further to test more scenarios, add documentation, and test
> > and post an updated patch.
> > For the second problem, Table synchronization of table including local
> > data and replicated data using copy command.
> >
> > Let us consider the following scenario:
> > a) node1 publishing to node2 b) node2 publishing to node1. Here in
> > this case node1 will have replicated data from node2 and vice versa.
> >
> > In the above if user wants to include node3 to subscribe data from
> > node2. Users will have to create a subscription in node3 to get the
> > data from node2. During table synchronization we send the complete
> > table data from node2 to node3. Node2 will have local data from node2
> > and also replicated data from node1. Currently we don't have an option
> > to differentiate between the locally generated data and replicated
> > data in the heap which will cause infinite recursion as described
> > above.
> >
> > To handle this if user has specified only_local option, we could throw
> > a warning or error out while creating subscription in this case, we
> > could have a column srreplicateddata in pg_subscription_rel which
> > could indicate if the table has any replicated data or not:
> > postgres=# select * from pg_subscription_rel;
> >  srsubid | srrelid | srsubstate | srsublsn  | srreplicateddata
> > -+-++---+--
> >16389 |   16384 | r  | 0/14A4640 |t
> >16389 |   16385 | r  | 0/14A4690 |f
> > (1 row)
> >
> > In the above example, srreplicateddata with true indicates, tabel t1
> > whose relid is 16384 has replicated data and the other row having
> > srreplicateddata  as false indicates table t2 whose relid is 16385
> > does not have replicated data.
> > When creating a new subscription, the subscriber will connect to the
> > publisher and check if the relation has replicated data by checking
> > srreplicateddata in pg_subscription_rel table.
> > If the table has any replicated data, log a warning or error for this.
> >
>
> If you want to give the error in this case, then I think we need to
> provide an option to the user to allow copy. One possibility could be
> to extend existing copy_data option as 'false', 'true', 'force'. For
> 'false', there shouldn't be any change, for 'true', if 'only_local'
> option is also set and the new column indicates replicated data then
> give an error, for 'force', we won't give an error even if the
> conditions as mentioned for 'true' case are met, rather we will allow
> copy in this case.

When a subscription is created with publish_local_only and copy_data, it
will connect to the publisher and check whether the published tables have
also been subscribed from other nodes (by checking whether an entry is
present in pg_subscription_rel), and throw an error if one is. The attached
v4-0002-Support-force-option-for-copy_data-check-and-thro.patch has the
implementation for this.
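
For clarity, the behaviour being proposed would look roughly like this
(connection strings and object names are illustrative):

-- Errors out if the publisher's tables already contain replicated data:
CREATE SUBSCRIPTION sub3 CONNECTION 'host=node2 dbname=postgres'
    PUBLICATION pub3 WITH (publish_local_only = on, copy_data = on);
-- Overrides the check and performs the initial copy anyway:
CREATE SUBSCRIPTION sub3 CONNECTION 'host=node2 dbname=postgres'
    PUBLICATION pub3 WITH (publish_local_only = on, copy_data = force);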
Thoughts?

Regards,
Vignesh
From ef978c53b5b195b861210a2ef7f2ff876f004e94 Mon Sep 17 00:00:00 2001
From: Vigneshwaran C 
Date: Mon, 7 Mar 2022 11:19:10 +0530
Subject: [PATCH v4 1/2] Skip replication of non local data.

Add an option publish_local_only which will subscribe only to the locally
generated data in the publisher node. If subscriber is created with this
option, publisher will skip publishing the data that was subscribed
from other nodes. It can be created using following syntax:
ex: CREATE SUBSCRIPTION sub1 CONNECTION 'dbname =postgres port=' PUBLICATION pub1 with (publish_local_only = on);
---
 contrib/test_decoding/test_decoding.c |  20 +++
 doc/src/sgml/ref/alter_subscription.sgml  |   3 +-
 doc/src/sgml/ref/create_subscription.sgml |  12 ++
 src/backend/catalog/pg_subscription.c |   1 +
 src/backend/catalog/system_views.sql  |   5 +-
 src/backend/commands/subscriptioncmds.c   |  26 +++-
 .../libpqwalreceiver/libpqwalreceiver.c   |   4 +
 src/backend/replication/logical/decode.c  |  15 ++-
 src/backend/replication/logical/logical.c |  33 +
 src/backend/replication/logical/worker.c  |   2 +
 src/backend/replication/pgoutput/pgoutput.c   |  45 +++
 src/bin/pg_dump/pg_dump.c |  16 ++-
 src/bin/pg_dump/pg_dump.h |   1 +
 src/bin/psql/describe.c   |   8 +-
 src/bin/psql/tab-complete.c 

Re: Handle infinite recursion in logical replication setup

2022-03-11 Thread vignesh C
On Fri, Mar 11, 2022 at 4:28 PM kuroda.hay...@fujitsu.com
 wrote:
>
> Hi Vegnesh,
>
> While considering about second problem, I was very confusing about it.
> I'm happy if you answer my question.
>
> > To handle this if user has specified only_local option, we could throw
> > a warning or error out while creating subscription in this case, we
> > could have a column srreplicateddata in pg_subscription_rel which
> > could indicate if the table has any replicated data or not:
> > postgres=# select * from pg_subscription_rel;
> >  srsubid | srrelid | srsubstate | srsublsn  | srreplicateddata
> > -+-++---+--
> >16389 |   16384 | r  | 0/14A4640 |t
> >16389 |   16385 | r  | 0/14A4690 |f
> > (1 row)
> > In the above example, srreplicateddata with true indicates, tabel t1
> > whose relid is 16384 has replicated data and the other row having
> > srreplicateddata  as false indicates table t2 whose relid is 16385
> > does not have replicated data.
> > When creating a new subscription, the subscriber will connect to the
> > publisher and check if the relation has replicated data by checking
> > srreplicateddata in pg_subscription_rel table.
> > If the table has any replicated data, log a warning or error for this.
>
> IIUC srreplicateddata represents whether the subscribed data is not
> generated from the publisher, but another node.
> My first impression was that the name 'srreplicateddata' is not friendly
> because all subscribed data is replicated from publisher.
> Also I was not sure how value of the column was set.
> IIUC a filtering by replication origins is done in publisher node
> and subscriber node cannot know
> whether some data are really filtered or not.
> If we distinguish by subscriber option publish_local_only,
> it cannot reproduce your example because same subscriber have different 
> 'srreplicateddata'.

Let's consider an existing multi-master logical replication setup between
Node1 and Node2, created using the following steps:
a) Node1 - Publication publishing employee table - pub1
b) Node2 - Subscription subscribing from publication pub1 with
publish_local_only - sub1_pub1_node1
c) Node2 - Publication publishing employee table - pub2
d) Node1 - Subscription subscribing from publication pub2 with
publish_local_only - sub2_pub2_node2

To create a subscription in node3, we will be using the following steps:
a) Node2 - Publication publishing employee table. - pub3
b) Node3 - Subscription subscribing from publication in Node2 with
publish_local_only - sub3_pub3_node2
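
Spelled out in SQL, the steps above would look roughly like this (connection
strings are illustrative):

-- Node1:
CREATE PUBLICATION pub1 FOR TABLE employee;
-- Node2:
CREATE SUBSCRIPTION sub1_pub1_node1 CONNECTION 'host=node1 dbname=postgres'
    PUBLICATION pub1 WITH (publish_local_only = on);
CREATE PUBLICATION pub2 FOR TABLE employee;
-- Node1:
CREATE SUBSCRIPTION sub2_pub2_node2 CONNECTION 'host=node2 dbname=postgres'
    PUBLICATION pub2 WITH (publish_local_only = on);
-- Node2, for the new node:
CREATE PUBLICATION pub3 FOR TABLE employee;
-- Node3:
CREATE SUBSCRIPTION sub3_pub3_node2 CONNECTION 'host=node2 dbname=postgres'
    PUBLICATION pub3 WITH (publish_local_only = on);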

When we create the subscription on Node3, Node3 will connect to Node2 (this
check is not done on Node3) and check whether the employee table is present
in pg_subscription_rel. In our case Node2 will have the employee table
present in pg_subscription_rel (sub1_pub1_node1 subscribes to the employee
table from pub1 on Node1). As the employee table is being subscribed on
Node2 from Node1, we will throw an error like the one below:
postgres=# create subscription sub2 CONNECTION 'dbname =postgres port
= ' publication pub2 with (publish_local_only=on);
ERROR:  CREATE/ALTER SUBSCRIPTION with publish_local_only and
copy_data as true is not allowed when the publisher might have
replicated data, table:public.t1 might have replicated data in the
publisher
HINT:  Use CREATE/ALTER SUBSCRIPTION with copy_data = off or force

I was initially planning to add the srreplicateddata field, but I have
changed the approach slightly to keep the design simple. Now we just check
whether the relation is present in pg_subscription_rel and throw an error if
the copy_data and publish_local_only options are specified. The changes for
this are available at [1].

[1] - 
https://www.postgresql.org/message-id/CALDaNm0V%2B%3Db%3DCeZJNAAUO2PmSXH5QzNX3jADXb-0hGO_jVj0vA%40mail.gmail.com
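
The condition being checked can be approximated with a query like the
following, run against the publisher (a sketch only; the real check is
performed internally during CREATE SUBSCRIPTION):

-- If the table being synchronized shows up here, the publisher itself
-- subscribes to it from elsewhere, so its contents may include replicated rows.
SELECT srrelid::regclass AS tablename, srsubstate
  FROM pg_subscription_rel
 WHERE srrelid = 'public.t1'::regclass;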
Thoughts?

Regards,
Vignesh




Re: Printing backtrace of postgres processes

2022-03-11 Thread vignesh C
On Wed, Mar 9, 2022 at 9:26 PM Justin Pryzby  wrote:
>
> rebased to appease cfbot.
>
> + couple of little fixes as 0002.

Thanks for rebasing and fixing a few issues. I have taken all your changes
except for the mcxtfuncs changes, as those were not done as part of this
patch. The attached v19 patch has these changes.

Regards,
Vignesh
From baff4bd9331eff3d3d021724847c2fb89d4bcdcf Mon Sep 17 00:00:00 2001
From: Vigneshwaran C 
Date: Tue, 25 Jan 2022 08:21:22 +0530
Subject: [PATCH v19] Add function to log the backtrace of the specified
 postgres process.

This commit adds pg_log_backtrace() function that requests to log
the backtrace of the specified backend or auxiliary process except
logger and statistic collector.

Only superusers are allowed to request to log the backtrace
because allowing any users to issue this request at an unbounded rate
would cause lots of log messages and which can lead to denial of service.

On receipt of the request, at the next CHECK_FOR_INTERRUPTS(),
the target backend logs its backtrace at LOG_SERVER_ONLY level,
so that the backtrace will appear in the server log but not
be sent to the client.

Bump catalog version.

Authors: Vignesh C, Bharath Rupireddy, Justin Pryzby
Reviewers: Bharath Rupireddy, Justin Pryzby, Fujii Masao, Atsushi Torikoshi, Dilip Kumar, Robert Haas, Andres Freund, Tom lane, Craig Ringer
Discussion: https://www.postgresql.org/message-id/CALDaNm3ZzmFS-=r7oduzj7y7bgqv+n06kqyft6c3xzdoknk...@mail.gmail.com
---
 doc/src/sgml/func.sgml| 62 +++
 src/backend/catalog/system_functions.sql  |  2 +
 src/backend/postmaster/autovacuum.c   |  4 ++
 src/backend/postmaster/checkpointer.c |  4 ++
 src/backend/postmaster/interrupt.c|  4 ++
 src/backend/postmaster/pgarch.c   | 10 +++-
 src/backend/postmaster/startup.c  |  4 ++
 src/backend/postmaster/walwriter.c|  4 ++
 src/backend/storage/ipc/procarray.c   | 56 +
 src/backend/storage/ipc/procsignal.c  | 42 +
 src/backend/storage/ipc/signalfuncs.c | 73 ---
 src/backend/tcop/postgres.c   |  4 ++
 src/backend/utils/adt/mcxtfuncs.c | 40 +++--
 src/backend/utils/error/elog.c| 19 --
 src/backend/utils/init/globals.c  |  1 +
 src/include/catalog/pg_proc.dat   |  5 ++
 src/include/miscadmin.h   |  1 +
 src/include/storage/procarray.h   |  2 +
 src/include/storage/procsignal.h  |  3 +-
 src/include/utils/elog.h  |  2 +
 src/test/regress/expected/backtrace.out   | 49 +++
 src/test/regress/expected/backtrace_1.out | 55 +
 src/test/regress/parallel_schedule|  2 +-
 src/test/regress/sql/backtrace.sql| 33 ++
 24 files changed, 416 insertions(+), 65 deletions(-)
 create mode 100644 src/test/regress/expected/backtrace.out
 create mode 100644 src/test/regress/expected/backtrace_1.out
 create mode 100644 src/test/regress/sql/backtrace.sql

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 8a802fb225..4171208719 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -25355,6 +25355,30 @@ SELECT collation for ('foo' COLLATE "de_DE");

   
 
+  
+   
+
+ pg_log_backtrace
+
+pg_log_backtrace ( pid integer )
+boolean
+   
+   
+Requests to log the backtrace of the backend with the
+specified process ID.  This function can send the request to
+backends and auxiliary processes except the logger and statistics
+collector.  The backtraces will be logged at LOG
+message level. They will appear in the server log based on the log
+configuration set (See  for
+more information), but will not be sent to the client regardless of
+. A backtrace identifies
+which function a process is currently executing and it may be useful
+for developers to diagnose stuck processes and other problems. This
+function is supported only if PostgreSQL was built with the ability to
+capture backtraces, otherwise it will emit a warning.
+   
+  
+
   

 
@@ -25574,6 +25598,44 @@ LOG:  Grand total: 1651920 bytes in 201 blocks; 622360 free (88 chunks); 1029560
 because it may generate a large number of log messages.

 
+   
+pg_log_backtrace can be used to log the backtrace of
+a backend process. For example:
+
+postgres=# select pg_log_backtrace(pg_backend_pid());
+ pg_log_backtrace
+--
+ t
+(1 row)
+
+The backtrace will be logged as specified by the logging configuration.
+For example:
+
+2021-01-27 11:33:50.247 IST [111735] LOG:  current backtrace:
+postgres: postgresdba postgres [local] SELECT(set_backtrace+0x38) [0xae06c5]
+postgres: postgresdba postgres [local] SELECT(Pr

Re: Logical replication - schema change not invalidating the relation cache

2022-03-11 Thread vignesh C
On Fri, Dec 3, 2021 at 3:21 PM vignesh C  wrote:
>
> On Fri, Dec 3, 2021 at 1:13 PM Michael Paquier  wrote:
> >
> > On Thu, Aug 26, 2021 at 09:00:39PM +0530, vignesh C wrote:
> > > The previous patch was failing because of the recent test changes made
> > > by commit 201a76183e2 which unified new and get_new_node, attached
> > > patch has the changes to handle the changes accordingly.
> >
> > Please note that the CF app is complaining about this patch, so a
> > rebase is required.  I have moved it to next CF, waiting on author,
> > for now.
>
> Thanks for letting me know, I have rebased it on top of HEAD, the
> attached v2 version has the rebased changes.

The patch was not applying on top of HEAD; the attached v3 version has the
rebased changes.

Regards,
Vignesh
From ad79b95e9362239e32cc597aeac1d6e22aaa7504 Mon Sep 17 00:00:00 2001
From: Vigneshwaran C 
Date: Sat, 12 Mar 2022 13:04:55 +0530
Subject: [PATCH v3] Fix for invalidating logical replication relations when
 there is a change in schema.

When the schema gets changed, the rel sync cache invalidation was not
happening, fixed it by adding a callback for schema change.
---
 src/backend/replication/pgoutput/pgoutput.c | 44 
 src/test/subscription/t/001_rep_changes.pl  | 74 +
 2 files changed, 118 insertions(+)

diff --git a/src/backend/replication/pgoutput/pgoutput.c b/src/backend/replication/pgoutput/pgoutput.c
index ea57a0477f..2ce634a90b 100644
--- a/src/backend/replication/pgoutput/pgoutput.c
+++ b/src/backend/replication/pgoutput/pgoutput.c
@@ -176,6 +176,8 @@ static RelationSyncEntry *get_rel_sync_entry(PGOutputData *data,
 static void rel_sync_cache_relation_cb(Datum arg, Oid relid);
 static void rel_sync_cache_publication_cb(Datum arg, int cacheid,
 		  uint32 hashvalue);
+static void rel_sync_cache_namespace_cb(Datum arg, int cacheid,
+		uint32 hashvalue);
 static void set_schema_sent_in_streamed_txn(RelationSyncEntry *entry,
 			TransactionId xid);
 static bool get_schema_sent_in_streamed_txn(RelationSyncEntry *entry,
@@ -1658,6 +1660,9 @@ init_rel_sync_cache(MemoryContext cachectx)
 	CacheRegisterSyscacheCallback(PUBLICATIONNAMESPACEMAP,
   rel_sync_cache_publication_cb,
   (Datum) 0);
+	CacheRegisterSyscacheCallback(NAMESPACEOID,
+  rel_sync_cache_namespace_cb,
+  (Datum) 0);
 }
 
 /*
@@ -1989,6 +1994,45 @@ rel_sync_cache_relation_cb(Datum arg, Oid relid)
 	}
 }
 
+/*
+ * Namespace syscache invalidation callback
+ */
+static void
+rel_sync_cache_namespace_cb(Datum arg, int cacheid, uint32 hashvalue)
+{
+	HASH_SEQ_STATUS status;
+	RelationSyncEntry *entry;
+
+	/*
+	 * We can get here if the plugin was used in SQL interface as the
+	 * RelSchemaSyncCache is destroyed when the decoding finishes, but there
+	 * is no way to unregister the relcache invalidation callback.
+	 */
+	if (RelationSyncCache == NULL)
+		return;
+
+	hash_seq_init(&status, RelationSyncCache);
+	while ((entry = (RelationSyncEntry *) hash_seq_search(&status)) != NULL)
+	{
+		/*
+		 * Reset schema sent status as the relation definition may have changed.
+		 * Also free any objects that depended on the earlier definition.
+		 */
+		entry->schema_sent = false;
+		list_free(entry->streamed_txns);
+		entry->streamed_txns = NIL;
+
+		if (entry->attrmap)
+			free_attrmap(entry->attrmap);
+		entry->attrmap = NULL;
+
+		if (hash_search(RelationSyncCache,
+(void *) &entry->relid,
+HASH_REMOVE, NULL) == NULL)
+			elog(ERROR, "hash table corrupted");
+	}
+}
+
 /*
  * Publication relation/schema map syscache invalidation callback
  *
diff --git a/src/test/subscription/t/001_rep_changes.pl b/src/test/subscription/t/001_rep_changes.pl
index af0cff6a30..4b7b9e54ff 100644
--- a/src/test/subscription/t/001_rep_changes.pl
+++ b/src/test/subscription/t/001_rep_changes.pl
@@ -520,6 +520,80 @@ is($result, qq(0), 'check replication origin was dropped on subscriber');
 $node_subscriber->stop('fast');
 $node_publisher->stop('fast');
 
+# Test schema invalidation by renaming the schema
+# Create tables on publisher
+# Initialize publisher node
+my $node_publisher1 = PostgreSQL::Test::Cluster->new('publisher1');
+$node_publisher1->init(allows_streaming => 'logical');
+$node_publisher1->start;
+
+# Create subscriber node
+my $node_subscriber1 = PostgreSQL::Test::Cluster->new('subscriber1');
+$node_subscriber1->init(allows_streaming => 'logical');
+$node_subscriber1->start;
+
+my $publisher1_connstr = $node_publisher1->connstr . ' dbname=postgres';
+
+$node_publisher1->safe_psql('postgres', "CREATE SCHEMA sch1");
+$node_publisher1->safe_psql('postgres', "CREATE TABLE sch1.t1 (c1 int)");
+
+# Create tables on subscrib

Tab completion not listing schema list for create/alter publication for all tables in schema

2022-03-13 Thread vignesh C
Hi,

I noticed that the commands "CREATE PUBLICATION pub1 FOR ALL TABLES IN
SCHEMA" and "ALTER PUBLICATION pub1 ADD ALL TABLES IN SCHEMA" do not
complete with the schema list. I feel this is because of the following code
in tab-complete.c:
.
COMPLETE_WITH_QUERY_PLUS(Query_for_list_of_schemas
" AND nspname NOT LIKE E'pg_%'",
"CURRENT_SCHEMA");
.
Here "pg_%" should be "pg_%%", since these completion queries are run
through a printf-style substitution and a literal "%" must therefore be
doubled.
Attached a patch to handle this.
Thoughts?

Regards,
Vignesh
From 4321bafb2b7594f6c1af5d02f64e934fdba9c2ef Mon Sep 17 00:00:00 2001
From: Vigneshwaran C 
Date: Sun, 13 Mar 2022 22:09:56 +0530
Subject: [PATCH] Tab completion not listing schema list for create/alter
 publication.

Tab completion not listing schema list for create/alter publication.
---
 src/bin/psql/tab-complete.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c
index 6957567264..6d5c928c10 100644
--- a/src/bin/psql/tab-complete.c
+++ b/src/bin/psql/tab-complete.c
@@ -1811,7 +1811,7 @@ psql_completion(const char *text, int start, int end)
 		COMPLETE_WITH("(", "ALL TABLES IN SCHEMA", "TABLE");
 	else if (Matches("ALTER", "PUBLICATION", MatchAny, "ADD|DROP|SET", "ALL", "TABLES", "IN", "SCHEMA"))
 		COMPLETE_WITH_QUERY_PLUS(Query_for_list_of_schemas
- " AND nspname NOT LIKE E'pg_%'",
+ " AND nspname NOT LIKE E'pg_%%'",
  "CURRENT_SCHEMA");
 	/* ALTER PUBLICATION  SET ( */
 	else if (HeadMatches("ALTER", "PUBLICATION", MatchAny) && TailMatches("SET", "("))
@@ -2956,7 +2956,7 @@ psql_completion(const char *text, int start, int end)
 	 */
 	else if (Matches("CREATE", "PUBLICATION", MatchAny, "FOR", "ALL", "TABLES", "IN", "SCHEMA"))
 		COMPLETE_WITH_QUERY_PLUS(Query_for_list_of_schemas
- " AND nspname NOT LIKE E'pg_%'",
+ " AND nspname NOT LIKE E'pg_%%'",
  "CURRENT_SCHEMA");
 	else if (Matches("CREATE", "PUBLICATION", MatchAny, "FOR", "ALL", "TABLES", "IN", "SCHEMA", MatchAny) && (!ends_with(prev_wd, ',')))
 		COMPLETE_WITH("WITH (");
-- 
2.32.0



Re: Tab completion not listing schema list for create/alter publication for all tables in schema

2022-03-14 Thread vignesh C
On Mon, Mar 14, 2022 at 5:23 AM Tom Lane  wrote:
>
> vignesh C  writes:
> > Here "pg_%" should be "pg_%%".
>
> Right you are.  Patch pushed, thanks!

Thanks for pushing the patch.

Regards,
Vignesh




Re: Handle infinite recursion in logical replication setup

2022-03-15 Thread vignesh C
On Tue, Mar 15, 2022 at 7:09 AM kuroda.hay...@fujitsu.com
 wrote:
>
> Dear Vignesh,
>
> Thank you for updating your patch!
>
> > Let's consider an existing Multi master logical replication setup
> > between Node1 and Node2 that is created using the following steps:
> > a) Node1 - Publication publishing employee table - pub1
> > b) Node2 - Subscription subscribing from publication pub1 with
> > publish_local_only - sub1_pub1_node1
> > c) Node2 - Publication publishing employee table - pub2
> > d) Node1 - Subscription subscribing from publication pub2 with
> > publish_local_only - sub2_pub2_node2
> >
> > To create a subscription in node3, we will be using the following steps:
> > a) Node2 - Publication publishing employee table. - pub3
> > b) Node3 - Subscription subscribing from publication in Node2 with
> > publish_local_only - sub3_pub3_node2
> >
> > When we create a subscription in Node3, Node3 will connect to
> > Node2(this will not be done in Node3) and check if the employee table
> > is present in pg_subscription_rel, in our case Node2 will have
> > employee table present in pg_subscription_rel (sub1_pub1_node1
> > subscribing to employee table from pub1 in Node1). As employee table
> > is being subscribed in node2 from node1, we will throw an error like
> > below:
> > postgres=# create subscription sub2 CONNECTION 'dbname =postgres port
> > = ' publication pub2 with (publish_local_only=on);
> > ERROR:  CREATE/ALTER SUBSCRIPTION with publish_local_only and
> > copy_data as true is not allowed when the publisher might have
> > replicated data, table:public.t1 might have replicated data in the
> > publisher
> > HINT:  Use CREATE/ALTER SUBSCRIPTION with copy_data = off or force
>
> Thanks for kind explanation.
> I read above and your doc in 0002, and I put some comments.
>
> 1. alter_subscription.sgml
>
> ```
> -copy_data (boolean)
> +copy_data (boolean | 
> force)
> ```
>
> I thought that it should be written as enum. For example, huge_pages GUC 
> parameter
> can accept {on, off, try}, and it has been written as enum.

Modified

> 2. create_subscription.sgml
>
> ```
> -copy_data (boolean)
> +copy_data (boolean | 
> force)
> ```
>
> Same as above.

Modified

> 3. create_subscription.sgml
>
> ```
> +
> + 
> +  If the publication tables were also subscribing data in the 
> publisher
> +  from other publishers, it will affect the
> +  CREATE SUBSCRIPTION based on the value specified
> +  for publish_local_only option. Refer to the
> +   for details.
> + 
> ```
>
> I seeked docs, but the words " publication tables " have not seen.
> How about "tables in the publication"?

Modified

> 4. create_subscription.sgml - about your example
>
> In the first section, we should describe about 2-nodes case more detail
> like Amit mentioned in [1]. I thought that Option-3 can be resolved by 
> defining
> subscriptions in both nodes with publish_local_only = true and copy_data = 
> force.

I thought the existing information is enough, because we have mentioned that
node1 and node2 already have a bidirectional replication setup and that the
table data on both will be replicated and synchronized as the DML operations
happen. In option-3 we need to create the subscription with copy_data as
force on one node and copy_data as false on the other node: since both nodes
will have the same data, copying the data from just one of the nodes should
be enough.
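
A rough sketch of what that could look like (connection strings and names
are illustrative; which node uses 'force' depends on which side the initial
copy should be taken from):

-- On the node that takes the initial copy; force overrides the
-- replicated-data check on the publisher:
CREATE SUBSCRIPTION sub_from_node2 CONNECTION 'host=node2 dbname=postgres'
    PUBLICATION pub_node2 WITH (publish_local_only = on, copy_data = force);
-- On the other node, skip the copy since the data is already present:
CREATE SUBSCRIPTION sub_from_node1 CONNECTION 'host=node1 dbname=postgres'
    PUBLICATION pub_node1 WITH (publish_local_only = on, copy_data = off);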

Thanks for the comments, the attached v5 patch has the changes for the same.

Regards,
Vignesh
From 2e9b9192a1f485d7b1f3c53170626cb49a31ee1e Mon Sep 17 00:00:00 2001
From: Vigneshwaran C 
Date: Tue, 15 Mar 2022 16:47:30 +0530
Subject: [PATCH v5 1/2] Skip replication of non local data.

Add an option publish_local_only which will subscribe only to the locally
generated data in the publisher node. If subscriber is created with this
option, publisher will skip publishing the data that was subscribed
from other nodes. It can be created using following syntax:
ex: CREATE SUBSCRIPTION sub1 CONNECTION 'dbname =postgres port=' PUBLICATION pub1 with (publish_local_only = on);
---
 contrib/test_decoding/test_decoding.c |  20 +++
 doc/src/sgml/ref/alter_subscription.sgml  |   5 +-
 doc/src/sgml/ref/create_subscription.sgml |  12 ++
 src/backend/catalog/pg_subscription.c |   1 +
 src/backend/catalog/system_views.sql  |   4 +-
 src/backend/commands/subscriptioncmds.c   |  26 +++-
 .../libpqwalreceiver/libpqwalreceiver.c   |   4 +
 src/backend/replication/logical/decode.c  |  15 +-
 src/backend/replication/logical/logical.c |  33 +
 src/backend/replication/logical/worker.c  |   2 +
 src/backend/replication/pgoutput/pgoutput.c   |  45 ++
 src/bin/pg_dump/pg_dump.c |  13 +-
 src/bin/pg_dump/pg_dump.h |   1 +
 src/bin/psql/describe.c   |   8 +-
 src/bin/psql/tab-complete.c 

Re: Handle infinite recursion in logical replication setup

2022-03-15 Thread vignesh C
On Tue, Mar 15, 2022 at 9:55 AM kuroda.hay...@fujitsu.com
 wrote:
>
> Dear Vignesh,
>
> > Thanks for kind explanation.
> > I read above and your doc in 0002, and I put some comments.
>
> I forgot a comment about 0002 doc.
>
> 5. create_subscription.sgml - about your example
>
> Three possibilities were listed in the doc,
> but I was not sure about b) case.
> In the situation Node1 and Node2 have already become multi-master,
> and data has already synced at that time.
> If so, how do we realize that "there is data present only in one Node"?
> Case a) and c) seem reasonable.

Your point is valid, modified.

The changes for this are available in the v5 patch at [1].
[1] - 
https://www.postgresql.org/message-id/CALDaNm3wCf0YcvVo%2BgHMGpupk9K6WKJxCyLUvhPC2GkPKRZUWA%40mail.gmail.com

Regards,
Vignesh




Skipping schema changes in publication

2022-03-22 Thread vignesh C
Hi,

This feature adds an option to skip the changes of all tables in the
specified schemas while creating a publication.
It is helpful for use cases where the user wants to subscribe to all
the changes except for the changes present in a few schemas.
Ex:
CREATE PUBLICATION pub1 FOR ALL TABLES SKIP ALL TABLES IN SCHEMA s1,s2;
OR
ALTER PUBLICATION pub1 ADD SKIP ALL TABLES IN SCHEMA s1,s2;

A new column pnskip is added to the catalog "pg_publication_namespace" to
maintain the schemas that the user wants to skip publishing through
the publication. The output plugin (pgoutput) is modified to skip
publishing the changes if the relation is part of a skip schema
publication.
As a continuation of this, I will work on implementing skipping of tables
from "all tables in schema" publications and from "all tables"
publications.
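
As a quick sketch of the intended usage (the schema and table names are
illustrative, assuming the attached patch is applied):

CREATE SCHEMA s1;
CREATE SCHEMA s2;
CREATE TABLE s1.t1(c1 int);
CREATE TABLE s2.t2(c1 int);
-- publish everything except the tables present in schema s2
CREATE PUBLICATION pub1 FOR ALL TABLES SKIP ALL TABLES IN SCHEMA s2;
-- \dRp+ pub1 should show s2 as a skipped schema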

The attached patch has the implementation for this.
This feature is targeted for the PG16 version.
Thoughts?

Regards,
Vignesh
From 153e033a78ace66bfbe1cd37db2f3de506740bcb Mon Sep 17 00:00:00 2001
From: Vigneshwaran C 
Date: Fri, 18 Mar 2022 10:41:35 +0530
Subject: [PATCH v1] Skip publishing the tables of schema.

A new option "SKIP ALL TABLES IN SCHEMA" in Create/Alter Publication allows
one or more skip schemas to be specified, publisher will skip sending the data
of the tables present in the skip schema to the subscriber.

The new syntax allows specifying schemas. For example:
CREATE PUBLICATION pub1 FOR ALL TABLES SKIP ALL TABLES IN SCHEMA s1,s2;
OR
ALTER PUBLICATION pub1 ADD SKIP ALL TABLES IN SCHEMA s1,s2;

A new column pnskip is added to the catalog "pg_publication_namespace" to maintain
the schemas that the user wants to skip publishing through the publication.
The output plugin (pgoutput) is modified to skip publishing the changes if the
relation is part of a skip schema publication.

Updates pg_dump to identify and dump skip schema publications. Updates the \d
family of commands to display skip schema publications; the \dRp+ variant will
now display the associated skip schemas, if any.
---
 doc/src/sgml/catalogs.sgml|   9 ++
 doc/src/sgml/logical-replication.sgml |   7 +-
 doc/src/sgml/ref/alter_publication.sgml   |  28 +++-
 doc/src/sgml/ref/create_publication.sgml  |  28 +++-
 doc/src/sgml/ref/psql-ref.sgml|   5 +-
 src/backend/catalog/pg_publication.c  |  66 +++---
 src/backend/commands/publicationcmds.c| 123 +++---
 src/backend/commands/tablecmds.c  |   2 +-
 src/backend/nodes/copyfuncs.c |  14 ++
 src/backend/nodes/equalfuncs.c|  14 ++
 src/backend/parser/gram.y |  99 +-
 src/backend/replication/pgoutput/pgoutput.c   |  24 +---
 src/backend/utils/cache/relcache.c|  22 +++-
 src/bin/pg_dump/pg_dump.c |  33 -
 src/bin/pg_dump/pg_dump.h |   1 +
 src/bin/pg_dump/pg_dump_sort.c|   7 +
 src/bin/pg_dump/t/002_pg_dump.pl  |  30 +
 src/bin/psql/describe.c   |  17 +++
 src/bin/psql/tab-complete.c   |  25 ++--
 src/include/catalog/pg_publication.h  |  20 ++-
 .../catalog/pg_publication_namespace.h|   1 +
 src/include/commands/publicationcmds.h|   3 +-
 src/include/nodes/nodes.h |   1 +
 src/include/nodes/parsenodes.h|   1 +
 src/test/regress/expected/publication.out |  84 +++-
 src/test/regress/sql/publication.sql  |  41 +-
 .../t/030_rep_changes_skip_schema.pl  |  96 ++
 src/tools/pgindent/typedefs.list  |   1 +
 28 files changed, 678 insertions(+), 124 deletions(-)
 create mode 100644 src/test/subscription/t/030_rep_changes_skip_schema.pl

diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 2a8cd02664..18e3cf82aa 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -6281,6 +6281,15 @@ SCRAM-SHA-256$:&l
Reference to schema
   
  
+
+
+  
+   pnskip bool
+  
+  
+   True if the schema is skip schema
+  
+ 
 

   
diff --git a/doc/src/sgml/logical-replication.sgml b/doc/src/sgml/logical-replication.sgml
index 555fbd749c..e2a4b89226 100644
--- a/doc/src/sgml/logical-replication.sgml
+++ b/doc/src/sgml/logical-replication.sgml
@@ -599,9 +599,10 @@ CONTEXT:  processing remote data for replication origin "pg_16395" during "INSER
 
   
To add tables to a publication, the user must have ownership rights on the
-   table. To add all tables in schema to a publication, the user must be a
-   superuser. To create a publication that publishes all tables or all tables in
-   schema automatically, the user must be a superuser.
+   table. To add all tables in schema or skip all tables in schema to a
+   publication, the user must be a superuser. To create a publication that
+   publishes all tables or all tables in schema automaticall

Re: Identify missing publications from publisher while create/alter subscription.

2022-03-22 Thread vignesh C
On Tue, Mar 22, 2022 at 5:29 AM Andres Freund  wrote:
>
> On 2022-02-13 19:34:05 +0530, vignesh C wrote:
> > Thanks for the comments, the attached v14 patch has the changes for the 
> > same.
>
> The patch needs a rebase, it currently fails to apply:
> http://cfbot.cputube.org/patch_37_2957.log

The attached v15 patch is rebased on top of HEAD.

Regards,
Vignesh
From fa175c7c823dc9fbcca7676ddec944430da81022 Mon Sep 17 00:00:00 2001
From: Vigneshwaran C 
Date: Tue, 22 Mar 2022 15:09:13 +0530
Subject: [PATCH v15] Identify missing publications from publisher while
 create/alter subscription.

Creating/altering a subscription succeeds even when we specify a publication which
does not exist in the publisher. This patch checks if the specified publications
are present in the publisher and throws an error if any of the publications is
missing in the publisher.

Author: Vignesh C
Reviewed-by: Bharath Rupireddy, Japin Li, Dilip Kumar, Euler Taveira, Ashutosh Sharma
Discussion: https://www.postgresql.org/message-id/flat/20220321235957.i4jtjn4wyjucex6b%40alap3.anarazel.de#b846fd4ef657cfaa8c9890f044e4
---
 doc/src/sgml/ref/alter_subscription.sgml  |  13 ++
 doc/src/sgml/ref/create_subscription.sgml |  20 ++-
 src/backend/commands/subscriptioncmds.c   | 201 +++---
 src/bin/psql/tab-complete.c   |  15 +-
 src/test/subscription/t/007_ddl.pl|  54 ++
 5 files changed, 270 insertions(+), 33 deletions(-)

diff --git a/doc/src/sgml/ref/alter_subscription.sgml b/doc/src/sgml/ref/alter_subscription.sgml
index ac2db249cb..995c1f270d 100644
--- a/doc/src/sgml/ref/alter_subscription.sgml
+++ b/doc/src/sgml/ref/alter_subscription.sgml
@@ -172,6 +172,19 @@ ALTER SUBSCRIPTION name RENAME TO <
  
 

+
+   
+validate_publication (boolean)
+
+ 
+  When true, the command verifies if all the specified publications
+  that are being subscribed to are present in the publisher and throws
+  an error if any of the publications doesn't exist. The default is
+  false.
+ 
+
+   
+
   
 

diff --git a/doc/src/sgml/ref/create_subscription.sgml b/doc/src/sgml/ref/create_subscription.sgml
index b701752fc9..f2e7e8744d 100644
--- a/doc/src/sgml/ref/create_subscription.sgml
+++ b/doc/src/sgml/ref/create_subscription.sgml
@@ -110,12 +110,14 @@ CREATE SUBSCRIPTION subscription_nametrue.  Setting this to
   false will force the values of
-  create_slot, enabled and
-  copy_data to false.
+  create_slot, enabled,
+  copy_data and
+  validate_publication to false.
   (You cannot combine setting connect
   to false with
   setting create_slot, enabled,
-  or copy_data to true.)
+  copy_data or
+  validate_publication to true.)
  
 
  
@@ -170,6 +172,18 @@ CREATE SUBSCRIPTION subscription_name
 

+
+   
+validate_publication (boolean)
+
+ 
+  When true, the command verifies if all the specified publications
+  that are being subscribed to are present in the publisher and throws
+  an error if any of the publications doesn't exist. The default is
+  false.
+ 
+
+   
   
  
 
diff --git a/src/backend/commands/subscriptioncmds.c b/src/backend/commands/subscriptioncmds.c
index e16f04626d..6c066a1dfc 100644
--- a/src/backend/commands/subscriptioncmds.c
+++ b/src/backend/commands/subscriptioncmds.c
@@ -64,6 +64,7 @@
 #define SUBOPT_TWOPHASE_COMMIT		0x0200
 #define SUBOPT_DISABLE_ON_ERR		0x0400
 #define SUBOPT_LSN	0x0800
+#define SUBOPT_VALIDATE_PUB			0x1000
 
 /* check if the 'val' has 'bits' set */
 #define IsSet(val, bits)  (((val) & (bits)) == (bits))
@@ -86,6 +87,7 @@ typedef struct SubOpts
 	bool		streaming;
 	bool		twophase;
 	bool		disableonerr;
+	bool		validate_publication;
 	XLogRecPtr	lsn;
 } SubOpts;
 
@@ -137,6 +139,8 @@ parse_subscription_options(ParseState *pstate, List *stmt_options,
 		opts->twophase = false;
 	if (IsSet(supported_opts, SUBOPT_DISABLE_ON_ERR))
 		opts->disableonerr = false;
+	if (IsSet(supported_opts, SUBOPT_VALIDATE_PUB))
+		opts->validate_publication = false;
 
 	/* Parse options */
 	foreach(lc, stmt_options)
@@ -292,6 +296,15 @@ parse_subscription_options(ParseState *pstate, List *stmt_options,
 			opts->specified_opts |= SUBOPT_LSN;
 			opts->lsn = lsn;
 		}
+		else if (IsSet(supported_opts, SUBOPT_VALIDATE_PUB) &&
+ strcmp(defel->defname, "validate_publication") == 0)
+		{
+			if (IsSet(opts->specified_opts, SUBOPT_VALIDATE_PUB))
+errorConflictingDefElem(defel, pstate);
+
+			opts->specified_opts |= SUBOPT_VALIDATE_PUB;
+			opts->validate_publication = defGetBoolean(defel);
+		}
 		else
 			ereport(ERROR,
 	(errcode(E

Re: Printing backtrace of postgres processes

2020-12-01 Thread vignesh C
On Tue, Dec 1, 2020 at 9:31 AM Tom Lane  wrote:
>
> Andres Freund  writes:
> > It should be quite doable to emit such backtraces directly to stderr,
> > instead of using appendStringInfoString()/elog().
>
> No, please no.
>
> (1) On lots of logging setups (think syslog), anything that goes to
> stderr is just going to wind up in the bit bucket.  I realize that
> we have that issue already for memory context dumps on OOM errors,
> but that doesn't make it a good thing.
>
> (2) You couldn't really write "to stderr", only to fileno(stderr),
> creating issues about interleaving of the output with regular stderr
> output.  For instance it's quite likely that the backtrace would
> appear before stderr output that had actually been emitted earlier,
> which'd be tremendously confusing.
>
> (3) This isn't going to do anything good for my concerns about interleaved
> output from different processes, either.
>

I felt that if we do not agree on logging to stderr, then even with a
static buffer we might not be able to log, as
send_message_to_server_log calls appendStringInfo. I felt that doing
it from CHECK_FOR_INTERRUPTS may be better.

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com




Added missing copy related data structures to typedefs.list

2020-12-07 Thread vignesh C
Hi,

Added the missing copy-related data structures to typedefs.list; these
data structures were added when the copy files were split in a
recent commit. I found this while running pgindent for the parallel copy
patches.
The attached patch has the changes for the same.
Thoughts?

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com
From d656c2f0bfbf68f5ceb98a0eb205e5e77f21602f Mon Sep 17 00:00:00 2001
From: Vignesh C 
Date: Mon, 7 Dec 2020 13:48:31 +0530
Subject: [PATCH] Added missing copy related data structures.

Added the missing copy-related data structures to typedefs.list; these data
structures were added when the copy files were split in a recent commit.
---
 src/tools/pgindent/typedefs.list | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 974b138..ecf9c3e 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -421,12 +421,18 @@ ConversionLocation
 ConvertRowtypeExpr
 CookedConstraint
 CopyDest
+CopyFormatOptions
+CopyFromState
+CopyFromStateData
 CopyInsertMethod
 CopyMultiInsertBuffer
 CopyMultiInsertInfo
+CopySource
 CopyState
 CopyStateData
 CopyStmt
+CopyToState
+CopyToStateData
 Cost
 CostSelector
 Counters
-- 
1.8.3.1



Re: Printing backtrace of postgres processes

2020-12-08 Thread vignesh C
On Tue, Dec 1, 2020 at 2:15 PM vignesh C  wrote:
>
> On Tue, Dec 1, 2020 at 9:31 AM Tom Lane  wrote:
> >
> > Andres Freund  writes:
> > > It should be quite doable to emit such backtraces directly to stderr,
> > > instead of using appendStringInfoString()/elog().
> >
> > No, please no.
> >
> > (1) On lots of logging setups (think syslog), anything that goes to
> > stderr is just going to wind up in the bit bucket.  I realize that
> > we have that issue already for memory context dumps on OOM errors,
> > but that doesn't make it a good thing.
> >
> > (2) You couldn't really write "to stderr", only to fileno(stderr),
> > creating issues about interleaving of the output with regular stderr
> > output.  For instance it's quite likely that the backtrace would
> > appear before stderr output that had actually been emitted earlier,
> > which'd be tremendously confusing.
> >
> > (3) This isn't going to do anything good for my concerns about interleaved
> > output from different processes, either.
> >
>
> I felt if we are not agreeing on logging on the stderr, even using
> static buffer we might not be able to log as
> send_message_to_server_log calls appendStringInfo. I felt that doing
> it from CHECK_FOR_INTERRUPTS may be better.
>

I have implemented printing of the backtrace by handling it in
CHECK_FOR_INTERRUPTS. This patch also includes the change to allow
getting the backtrace of any particular process, based on the suggestions.
The attached patch has the implementation for the same.
Thoughts?
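
As a rough usage sketch (assuming the attached patch exposes the function
as pg_print_callstack() taking a backend PID; the exact name and signature
may differ in the patch):

-- ask the backend with PID 12345 to log its current backtrace
SELECT pg_print_callstack(12345);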

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com
From ffcd85f310dcedbe21f78ed4a7f7b281dc4d6ad8 Mon Sep 17 00:00:00 2001
From: Vignesh C 
Date: Sun, 22 Nov 2020 05:58:24 +0530
Subject: [PATCH] Print backtrace of postgres processes that are part of this
 instance.

The idea here is to implement & expose a pg_print_callstack function.
Internally, the connected backend sends a SIGUSR1 signal to the postmaster
process by setting PMSIGNAL_BACKTRACE_EMIT. The postmaster process then sends
a SIGUSR1 signal, with PROCSIG_BACKTRACE_PRINT set, to the processes that have
access to ProcSignal. As the syslogger process & the stats process don't have
access to ProcSignal, multiplexing with SIGUSR1 is not possible for these
processes, hence a SIGUSR2 signal will be sent to them instead. Once a process
receives this signal, it will log its backtrace.
---
 src/backend/postmaster/autovacuum.c   |  4 ++
 src/backend/postmaster/checkpointer.c |  5 +++
 src/backend/postmaster/interrupt.c|  5 +++
 src/backend/postmaster/pgstat.c   | 34 -
 src/backend/postmaster/postmaster.c   | 29 ++
 src/backend/postmaster/syslogger.c| 30 ++-
 src/backend/storage/ipc/procsignal.c  | 50 
 src/backend/storage/ipc/signalfuncs.c | 67 
 src/backend/tcop/postgres.c   | 38 ++
 src/backend/utils/init/globals.c  |  1 +
 src/bin/pg_ctl/t/005_backtrace.pl | 72 +++
 src/include/catalog/pg_proc.dat   |  9 -
 src/include/miscadmin.h   |  2 +
 src/include/storage/pmsignal.h|  2 +
 src/include/storage/procsignal.h  |  4 ++
 src/include/tcop/tcopprot.h   |  1 +
 16 files changed, 350 insertions(+), 3 deletions(-)
 create mode 100644 src/bin/pg_ctl/t/005_backtrace.pl

diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
index aa5b97f..597f14b 100644
--- a/src/backend/postmaster/autovacuum.c
+++ b/src/backend/postmaster/autovacuum.c
@@ -832,6 +832,10 @@ HandleAutoVacLauncherInterrupts(void)
 	if (ProcSignalBarrierPending)
 		ProcessProcSignalBarrier();
 
+	/* Process printing back trace */
+	if (PrintBacktracePending)
+		LogBackTrace();
+
 	/* Process sinval catchup interrupts that happened while sleeping */
 	ProcessCatchupInterrupt();
 }
diff --git a/src/backend/postmaster/checkpointer.c b/src/backend/postmaster/checkpointer.c
index 429c801..5f9501b 100644
--- a/src/backend/postmaster/checkpointer.c
+++ b/src/backend/postmaster/checkpointer.c
@@ -57,6 +57,7 @@
 #include "storage/shmem.h"
 #include "storage/smgr.h"
 #include "storage/spin.h"
+#include "tcop/tcopprot.h"
 #include "utils/guc.h"
 #include "utils/memutils.h"
 #include "utils/resowner.h"
@@ -547,6 +548,10 @@ HandleCheckpointerInterrupts(void)
 	if (ProcSignalBarrierPending)
 		ProcessProcSignalBarrier();
 
+	/* Process printing back trace */
+	if (PrintBacktracePending)
+		LogBackTrace();
+
 	if (ConfigReloadPending)
 	{
 		ConfigReloadPending = false;
diff --git a/src/backend/postmaster/interrupt.c b/src/backend/postmaster/interrupt.c
index ee7dbf9..d341

Re: Parallel INSERT (INTO ... SELECT ...)

2020-12-08 Thread vignesh C
On Mon, Dec 7, 2020 at 2:35 PM Greg Nancarrow  wrote:
>
> On Fri, Nov 20, 2020 at 7:44 PM Greg Nancarrow  wrote:
> >
> > Posting an updated set of patches, with some additional testing and
> > documentation updates, and including the latest version of the
> > Parallel Insert patch.
> > Any feedback appreciated, especially on the two points mentioned in
> > the previous post.
> >
>
> Posting an updated set of patches, since a minor bug was found in the
> 1st patch that was causing a postgresql-cfbot build failure.
>

Most of the code present in
v9-0001-Enable-parallel-SELECT-for-INSERT-INTO-.-SELECT.patch is
applicable to the parallel copy patch as well. The patch in this thread
handles the check for PROPARALLEL_UNSAFE; we could make it slightly more
generic by handling it as in the comments below, so that these parallel
safety checks can be used based on the value set in
max_parallel_hazard_context. There is nothing wrong with the changes;
I'm providing these comments so that this patch can be generalized for
parallel checks and the same code can also be used by parallel copy.
Few comments:
1)
+   trigtype = RI_FKey_trigger_type(trigger->tgfoid);
+   if (trigtype == RI_TRIGGER_FK)
+   {
+       context->max_hazard = PROPARALLEL_RESTRICTED;
+
+       /*
+        * As we're looking for the max parallel hazard, we don't break
+        * here; examine any further triggers ...
+        */
+   }

Can we change this to something like:
trigtype = RI_FKey_trigger_type(trigger->tgfoid);
if (trigtype == RI_TRIGGER_FK)
{
    if (max_parallel_hazard_test(PROPARALLEL_RESTRICTED, context))
        break;
}

The line below is then not required, as it will be taken care of by
max_parallel_hazard_test:
context->max_hazard = PROPARALLEL_RESTRICTED;

2)
+   /* Recursively check each partition ... */
+   pdesc = RelationGetPartitionDesc(rel);
+   for (i = 0; i < pdesc->nparts; i++)
+   {
+       if (rel_max_parallel_hazard_for_modify(pdesc->oids[i],
+                                              command_type,
+                                              context,
+                                              AccessShareLock) == PROPARALLEL_UNSAFE)
+       {
+           table_close(rel, lockmode);
+           return context->max_hazard;
+       }
+   }


Can we change this to something like:
/* Recursively check each partition ... */
pdesc = RelationGetPartitionDesc(rel);
for (i = 0; i < pdesc->nparts; i++)
{
    char max_hazard = rel_max_parallel_hazard_for_modify(pdesc->oids[i],
                                                         command_type,
                                                         context,
                                                         AccessShareLock);

    if (max_parallel_hazard_test(max_hazard, context))
    {
        table_close(rel, lockmode);
        return context->max_hazard;
    }
}

3)
Similarly for the below:
+   /*
+    * If there are any index expressions, check that they are parallel-mode
+    * safe.
+    */
+   if (index_expr_max_parallel_hazard_for_modify(rel, context) == PROPARALLEL_UNSAFE)
+   {
+       table_close(rel, lockmode);
+       return context->max_hazard;
+   }
+
+   /*
+    * If any triggers exist, check that they are parallel safe.
+    */
+   if (rel->trigdesc != NULL &&
+       trigger_max_parallel_hazard_for_modify(rel->trigdesc, context) == PROPARALLEL_UNSAFE)
+   {
+       table_close(rel, lockmode);
+       return context->max_hazard;
+   }


4) Similar change required for the below:
+               /*
+                * If the column is of a DOMAIN type, determine whether that
+                * domain has any CHECK expressions that are not parallel-mode
+                * safe.
+                */
+               if (get_typtype(att->atttypid) == TYPTYPE_DOMAIN)
+               {
+                   if (domain_max_parallel_hazard_for_modify(att->atttypid, context) == PROPARALLEL_UNSAFE)
+                   {
+                       table_close(rel, lockmode);
+                       return context->max_hazard;
+                   }
+               }

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com




Re: Parallel copy

2020-12-09 Thread vignesh C
On Mon, Dec 7, 2020 at 3:00 PM Hou, Zhijie  wrote:
>
> > Attached v11 patch has the fix for this, it also includes the changes to
> > rebase on top of head.
>
> Thanks for the explanation.
>
> I think there is still chances we can know the size.
>
> +* line_size will be set. Read the line_size again to be sure 
> if it is
> +* completed or partial block.
> +*/
> +   dataSize = pg_atomic_read_u32(&lineInfo->line_size);
> +   if (dataSize != -1)
> +   {
>
> If I am not wrong, this seems the branch that procsssing the populated block.
> I think we can check the copiedSize here, if copiedSize == 0, that means
> Datasizes is the size of the whole line and in this case we can do the 
> enlarge.
>
>

Yes, this optimization can be done; I will handle it in the next patch set.

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com




Re: Parallel INSERT (INTO ... SELECT ...)

2020-12-09 Thread vignesh C
On Wed, Dec 9, 2020 at 10:11 AM Greg Nancarrow  wrote:
>
> On Wed, Dec 9, 2020 at 1:35 AM vignesh C  wrote:
> >
> > Most of the code present in
> > v9-0001-Enable-parallel-SELECT-for-INSERT-INTO-.-SELECT.patch is
> > applicable for parallel copy patch also. The patch in this thread
> > handles the check for PROPARALLEL_UNSAFE, we could slightly make it
> > generic by handling like the comments below, that way this parallel
> > safety checks can be used based on the value set in
> > max_parallel_hazard_context. There is nothing wrong with the changes,
> > I'm providing these comments so that this patch can be generalized for
> > parallel checks and the same can also be used by parallel copy.
>
> Hi Vignesh,
>
> You are absolutely right in pointing that out, the code was taking
> short-cuts knowing that for Parallel Insert,
> "max_parallel_hazard_context.max_interesting" had been set to
> PROPARALLEL_UNSAFE, which doesn't allow that code to be generically
> re-used by other callers.
>
> I've attached a new set of patches that includes your suggested improvements.
>

Thanks for fixing and posting a new patch.
Few comments:
+   Node   *index_expr;
+
+   if (index_expr_item == NULL)        /* shouldn't happen */
+       elog(ERROR, "too few entries in indexprs list");
+
+   index_expr = (Node *) lfirst(index_expr_item);

We can change this elog to the following to maintain consistency:
if (index_expr_item == NULL)        /* shouldn't happen */
{
    context->max_hazard = PROPARALLEL_UNSAFE;
    return context->max_hazard;
}

static HeapTuple
heap_prepare_insert(Relation relation, HeapTuple tup, TransactionId xid,
CommandId cid, int options)
{
/*
* To allow parallel inserts, we need to ensure that they are safe to be
* performed in workers. We have the infrastructure to allow parallel
* inserts in general except for the cases where inserts generate a new
* CommandId (eg. inserts into a table having a foreign key column).
*/
I felt we could remove the above comment or maybe rephrase it.

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com




Re: Parallel Inserts in CREATE TABLE AS

2020-12-23 Thread vignesh C
On Tue, Dec 22, 2020 at 2:16 PM Bharath Rupireddy
 wrote:
>
> On Tue, Dec 22, 2020 at 12:32 PM Bharath Rupireddy
> Attaching v14 patch set that has above changes. Please consider this
> for further review.
>

Few comments:
In the below case, should the Create node be above Gather?
postgres=# explain  create table t7 as select * from t6;
QUERY PLAN
---
 Gather  (cost=0.00..9.17 rows=0 width=4)
   Workers Planned: 2
 ->  Create t7
   ->  Parallel Seq Scan on t6  (cost=0.00..9.17 rows=417 width=4)
(4 rows)

Can we change it to something like:
---
Create t7
 -> Gather  (cost=0.00..9.17 rows=0 width=4)
  Workers Planned: 2
  ->  Parallel Seq Scan on t6  (cost=0.00..9.17 rows=417 width=4)
(4 rows)

You could change intoclause_len = strlen(intoclausestr) to
strlen(intoclausestr) + 1 and use intoclause_len in the remaining
places. We can avoid the +1 in the other places.
+   /* Estimate space for into clause for CTAS. */
+   if (IS_CTAS(intoclause) && OidIsValid(objectid))
+   {
+   intoclausestr = nodeToString(intoclause);
+   intoclause_len = strlen(intoclausestr);
+   shm_toc_estimate_chunk(&pcxt->estimator, intoclause_len + 1);
+   shm_toc_estimate_keys(&pcxt->estimator, 1);
+   }

Can we use node->nworkers_launched == 0 in place of
node->need_to_scan_locally? That way the setting and resetting of
node->need_to_scan_locally can be removed, unless need_to_scan_locally
is needed in any of the functions that get called.
+   /* Enable leader to insert in case no parallel workers were launched. */
+   if (node->nworkers_launched == 0)
+       node->need_to_scan_locally = true;
+
+   /*
+    * By now, for parallel workers (if launched any), would have started their
+    * work i.e. insertion to target table. In case the leader is chosen to
+    * participate for parallel inserts in CTAS, then finish its share before
+    * going to wait for the parallel workers to finish.
+    */
+   if (node->need_to_scan_locally)
+   {

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com




Re: Parallel Inserts in CREATE TABLE AS

2020-12-24 Thread vignesh C
On Thu, Dec 24, 2020 at 11:29 AM Amit Kapila 
wrote:
>
> On Thu, Dec 24, 2020 at 10:25 AM vignesh C  wrote:
> >
> > On Tue, Dec 22, 2020 at 2:16 PM Bharath Rupireddy
> >  wrote:
> > >
> > > On Tue, Dec 22, 2020 at 12:32 PM Bharath Rupireddy
> > > Attaching v14 patch set that has above changes. Please consider this
> > > for further review.
> > >
> >
> > Few comments:
> > In the below case, should create be above Gather?
> > postgres=# explain  create table t7 as select * from t6;
> > QUERY PLAN
> > ---
> >  Gather  (cost=0.00..9.17 rows=0 width=4)
> >Workers Planned: 2
> >  ->  Create t7
> >->  Parallel Seq Scan on t6  (cost=0.00..9.17 rows=417 width=4)
> > (4 rows)
> >
> > Can we change it to something like:
> > ---
> > Create t7
> >  -> Gather  (cost=0.00..9.17 rows=0 width=4)
> >   Workers Planned: 2
> >   ->  Parallel Seq Scan on t6  (cost=0.00..9.17 rows=417 width=4)
> > (4 rows)
> >
>
> I think it is better to have it in a way as in the current patch
> because that reflects that we are performing insert/create below
> Gather which is the purpose of this patch. I think this is similar to
> what the Parallel Insert patch [1] has for a similar plan.
>
>
> [1] - https://commitfest.postgresql.org/31/2844/
>

Another thing I felt is that the Gather node's workers will actually do
the insert operation; the Create table will have been done earlier itself.
Should we change "Create table2" to "Insert table2", something like below:
 QUERY PLAN
---
 Gather  (cost=0.00..9.17 rows=0 width=4)
   Workers Planned: 2
 ->  Insert table2  (instead of Create table2)
   ->  Parallel Seq Scan on table1  (cost=0.00..9.17 rows=417 width=4)

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com


Re: Added missing copy related data structures to typedefs.list

2020-12-26 Thread vignesh C
On Thu, Dec 17, 2020 at 4:28 AM Bruce Momjian  wrote:
>
> On Mon, Dec  7, 2020 at 01:56:50PM +0530, vignesh C wrote:
> > Hi,
> >
> > Added missing copy related data structures to typedefs.list, these
> > data structures were added while copy files were split during the
> > recent commit. I found this while running pgindent for parallel copy
> > patches.
> > The Attached patch has the changes for the same.
> > Thoughts?
>
> Uh, we usually only update the typedefs file before we run pgindent on
> the master branch.
>

OK, thanks for the clarification. I was not sure, as in a few of the
enhancements it was included as part of the patches.

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com




Re: Parallel copy

2020-12-26 Thread vignesh C
On Wed, Dec 23, 2020 at 3:05 PM Hou, Zhijie  wrote:
>
> Hi
>
> > Yes this optimization can be done, I will handle this in the next patch
> > set.
> >
>
> I have a suggestion for the parallel safety-check.
>
> As designed, The leader does not participate in the insertion of data.
> If User use (PARALLEL 1), there is only one worker process which will do the 
> insertion.
>
> IMO, we can skip some of the safety-check in this case, becase the 
> safety-check is to limit parallel insert.
> (except temporary table or ...)
>
> So, how about checking (PARALLEL 1) separately ?
> Although it looks a bit complicated, But (PARALLEL 1) do have a good 
> performance improvement.
>

Thanks for the comments, Hou Zhijie. I will run a few tests with 1
worker and try to include this in the next patch set.
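
For reference, a hypothetical single-worker invocation of the case being
discussed (assuming the parallel copy patch's PARALLEL option syntax; the
table name and file path are illustrative only):

COPY t1 FROM '/tmp/t1.csv' WITH (FORMAT csv, PARALLEL 1);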

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com




Re: Parallel Inserts in CREATE TABLE AS

2020-12-26 Thread vignesh C
On Thu, Dec 24, 2020 at 1:07 PM Bharath Rupireddy
 wrote:
>
> On Thu, Dec 24, 2020 at 10:25 AM vignesh C  wrote:
> > You could change intoclause_len = strlen(intoclausestr) to
> > strlen(intoclausestr) + 1 and use intoclause_len in the remaining
> > places. We can avoid the +1 in the other places.
> > +   /* Estimate space for into clause for CTAS. */
> > +   if (IS_CTAS(intoclause) && OidIsValid(objectid))
> > +   {
> > +   intoclausestr = nodeToString(intoclause);
> > +   intoclause_len = strlen(intoclausestr);
> > +   shm_toc_estimate_chunk(&pcxt->estimator, intoclause_len + 
> > 1);
> > +   shm_toc_estimate_keys(&pcxt->estimator, 1);
> > +   }
>
> Done.
>
> > Can we use  node->nworkers_launched == 0 in place of
> > node->need_to_scan_locally, that way the setting and resetting of
> > node->need_to_scan_locally can be removed. Unless need_to_scan_locally
> > is needed in any of the functions that gets called.
> > +   /* Enable leader to insert in case no parallel workers were 
> > launched. */
> > +   if (node->nworkers_launched == 0)
> > +   node->need_to_scan_locally = true;
> > +
> > +   /*
> > +* By now, for parallel workers (if launched any), would have
> > started their
> > +* work i.e. insertion to target table. In case the leader is 
> > chosen to
> > +* participate for parallel inserts in CTAS, then finish its
> > share before
> > +* going to wait for the parallel workers to finish.
> > +*/
> > +   if (node->need_to_scan_locally)
> > +   {
>
> need_to_scan_locally is being set in ExecGather() even if
> nworkers_launched > 0 it can still be true, so I think we can not
> remove need_to_scan_locally in ExecParallelInsertInCTAS.
>
> Attaching v15 patch set for further review. Note that the change is
> only in 0001 patch, other patches remain unchanged from v14.
>

+-- parallel inserts must occur
+select explain_pictas(
+'create table parallel_write as select length(stringu1) from tenk1;');
+select count(*) from parallel_write;
+drop table parallel_write;

We can change comment  "parallel inserts must occur" like "parallel
insert must be selected for CTAS on normal table"

+-- parallel inserts must occur
+select explain_pictas(
+'create unlogged table parallel_write as select length(stringu1) from tenk1;');
+select count(*) from parallel_write;
+drop table parallel_write;

We can change comment "parallel inserts must occur" like "parallel
insert must be selected for CTAS on unlogged table"
Similar comment need to be handled in other places also.

+create function explain_pictas(text) returns setof text
+language plpgsql as
+$$
+declare
+ln text;
+begin
+for ln in
+execute format('explain (analyze, costs off, summary off,
timing off) %s',
+$1)
+loop
+ln := regexp_replace(ln, 'Workers Launched: \d+', 'Workers
Launched: N');
+ln := regexp_replace(ln, 'actual rows=\d+ loops=\d+', 'actual
rows=N loops=N');
+ln := regexp_replace(ln, 'Rows Removed by Filter: \d+', 'Rows
Removed by Filter: N');
+return next ln;
+end loop;
+end;
+$$;

The above function is the same as the function present in partition_prune.sql:
create function explain_parallel_append(text) returns setof text
language plpgsql as
$$
declare
ln text;
begin
for ln in
execute format('explain (analyze, costs off, summary off,
timing off) %s',
$1)
loop
ln := regexp_replace(ln, 'Workers Launched: \d+', 'Workers
Launched: N');
ln := regexp_replace(ln, 'actual rows=\d+ loops=\d+', 'actual
rows=N loops=N');
ln := regexp_replace(ln, 'Rows Removed by Filter: \d+', 'Rows
Removed by Filter: N');
return next ln;
end loop;
end;
$$;

If possible, try to make a common function and use it in both places.

+   if (intoclausestr && OidIsValid(objectid))
+   fpes->objectid = objectid;
+   else
+   fpes->objectid = InvalidOid;
Here the OidIsValid(objectid) check is not required; intoclausestr will be
set only if the OID is valid.

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com




Re: Parallel Inserts in CREATE TABLE AS

2020-12-27 Thread vignesh C
On Sun, Dec 27, 2020 at 2:28 PM Bharath Rupireddy <
bharath.rupireddyforpostg...@gmail.com> wrote:
>
> On Sat, Dec 26, 2020 at 9:20 PM vignesh C  wrote:
> > +-- parallel inserts must occur
> > +select explain_pictas(
> > +'create table parallel_write as select length(stringu1) from tenk1;');
> > +select count(*) from parallel_write;
> > +drop table parallel_write;
> >
> > We can change comment  "parallel inserts must occur" like "parallel
> > insert must be selected for CTAS on normal table"
> >
> > +-- parallel inserts must occur
> > +select explain_pictas(
> > +'create unlogged table parallel_write as select length(stringu1) from
tenk1;');
> > +select count(*) from parallel_write;
> > +drop table parallel_write;
> >
> > We can change comment "parallel inserts must occur" like "parallel
> > insert must be selected for CTAS on unlogged table"
> > Similar comment need to be handled in other places also.
>
> I think the existing comments look fine. The info like table type and
> the Query CTAS or CMV is visible by looking at the test case. What I
> wanted from the comments is whether we support parallel inserts or not
> and if not why so that it will be easy to read. I tried to keep it as
> succinctly as possible.
>

I saw a few inconsistencies in the patch:

+-- parallel inserts must occur
+select explain_pictas(
+'create table parallel_write as select length(stringu1) from tenk1;');
+  explain_pictas

+-- parallel inserts must not occur as the table is temporary
+select explain_pictas(
+'create temporary table parallel_write as select length(stringu1) from tenk1;');
+  explain_pictas

+-- parallel inserts must occur, as there is init plan that gets executed by
+-- each parallel worker
+select explain_pictas(
+'create table parallel_write as select two col1,
+(select two from (select * from tenk2) as tt limit 1) col2
+from tenk1  where tenk1.four = 3;');
+ explain_pictas

+-- the top node is Gather under which merge join happens, so parallel inserts
+-- must occur
+set enable_nestloop to off;
+set enable_mergejoin to on;

+-- parallel hash join happens under Gather node, so parallel inserts must occur
+set enable_mergejoin to off;
+set enable_hashjoin to on;
+select explain_pictas(

The test comments are detailed in a few cases, but for other similar
parallelism-selected tests they are not. I felt we could make the test
comments consistent across the file.

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com


Re: Parallel copy

2020-12-28 Thread vignesh C
On Tue, Nov 3, 2020 at 2:28 PM Amit Kapila  wrote:
>
> On Mon, Nov 2, 2020 at 12:40 PM Heikki Linnakangas  wrote:
> >
> > On 02/11/2020 08:14, Amit Kapila wrote:
> > > On Fri, Oct 30, 2020 at 10:11 PM Heikki Linnakangas  
> > > wrote:
> > >>
> > >> In this design, you don't need to keep line boundaries in shared memory,
> > >> because each worker process is responsible for finding the line
> > >> boundaries of its own block.
> > >>
> > >> There's a point of serialization here, in that the next block cannot be
> > >> processed, until the worker working on the previous block has finished
> > >> scanning the EOLs, and set the starting position on the next block,
> > >> putting it in READY state. That's not very different from your patch,
> > >> where you had a similar point of serialization because the leader
> > >> scanned the EOLs,
> > >
> > > But in the design (single producer multiple consumer) used by the
> > > patch the worker doesn't need to wait till the complete block is
> > > processed, it can start processing the lines already found. This will
> > > also allow workers to start much earlier to process the data as it
> > > doesn't need to wait for all the offsets corresponding to 64K block
> > > ready. However, in the design where each worker is processing the 64K
> > > block, it can lead to much longer waits. I think this will impact the
> > > Copy STDIN case more where in most cases (200-300 bytes tuples) we
> > > receive line-by-line from client and find the line-endings by leader.
> > > If the leader doesn't find the line-endings the workers need to wait
> > > till the leader fill the entire 64K chunk, OTOH, with current approach
> > > the worker can start as soon as leader is able to populate some
> > > minimum number of line-endings
> >
> > You can use a smaller block size.
> >
>
> Sure, but the same problem can happen if the last line in that block
> is too long and we need to peek into the next block. And then there
> could be cases where a single line could be greater than 64K.
>
> > However, the point of parallel copy is
> > to maximize bandwidth.
> >
>
> Okay, but this first-phase (finding the line boundaries) can anyway be
> not done in parallel and we have seen in some of the initial
> benchmarking that this initial phase is a small part of work
> especially when the table has indexes, constraints, etc. So, I think
> it won't matter much if this splitting is done in a single process or
> multiple processes.
>

I wrote a patch to compare the performance of the current
implementation, where the leader identifies the line boundaries, with a
design where the workers identify the line boundaries. The results are
given below.
Each entry can be read as: (parallel copy time taken in seconds with
the leader identifying the line boundaries, parallel copy time taken
in seconds with the workers identifying the line boundaries, number
of workers).
Use case 1 - 10million rows, 5.2GB data,3 indexes on integer columns:
(211.206, 632.583, 1), (165.402, 360.152, 2), (137.608, 219.623, 4),
(128.003, 206.851, 8), (114.518, 177.790, 16), (109.257, 170.058, 20),
(102.050, 158.376, 30)

Use case 2 - 10million rows, 5.2GB data,2 indexes on integer columns,
1 index on text column, csv file:
(1212.356, 1602.118, 1), (707.191, 849.105, 2), (369.620, 441.068, 4),
(221.359, 252.775, 8), (167.152, 180.207, 16), (168.804, 181.986, 20),
(172.320, 194.875, 30)

Use case 3 - 10million rows, 5.2GB data without index:
(96.317, 437.453, 1), (70.730, 240.517, 2), (64.436, 197.604, 4),
(67.186, 175.630, 8), (76.561, 156.015, 16), (81.025, 150.687, 20),
(86.578, 148.481, 30)

Use case 4 - 1 records, 9.6GB, toast data:
(147.076, 276.323, 1), (101.610, 141.893, 2), (100.703, 134.096, 4),
(112.583, 134.765, 8), (101.898, 135.789, 16), (109.258, 135.625, 20),
(109.219, 136.144, 30)

Attached is the patch that was used for this comparison. The patch is written
on top of the parallel copy patch.
The design that Amit, Andres & myself voted for, i.e. the leader
identifying the line boundaries and sharing them in shared memory, is
performing better.

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com
From dd9b6be19573b5391d01373b53e64a5c1dc305fd Mon Sep 17 00:00:00 2001
From: Vignesh C 
Date: Mon, 28 Dec 2020 15:00:48 +0530
Subject: [PATCH v12 7/7] Parallel copy based on workers identifying line
 boundary.

Parallel copy based on workers identifying line boundary.
---
 src/backend/commands/copyfromparse.c |  93 +++
 src/backend/commands/copyparallel.c  | 441 +--
 src/include/commands/copyfrom

Re: Parallel Inserts in CREATE TABLE AS

2020-12-30 Thread vignesh C
On Wed, Dec 30, 2020 at 10:47 AM Bharath Rupireddy
 wrote:
>
> On Wed, Dec 30, 2020 at 10:32 AM Dilip Kumar  wrote:
> > I have completed reviewing 0001, I don't have more comments, just one
> > question.  Soon I will review the remaining patches.
>
> Thanks.
>
> > +/* If parallel inserts are to be allowed, set a few extra information. 
> > */
> > +if (myState->is_parallel)
> > +{
> > +myState->object_id = intoRelationAddr.objectId;
> > +
> > +/*
> > + * We don't need to skip contacting FSM while inserting tuples for
> > + * parallel mode, while extending the relations, workers instead of
> > + * blocking on a page while another worker is inserting, can check 
> > the
> > + * FSM for another page that can accommodate the tuples. This 
> > results
> > + * in major benefit for parallel inserts.
> > + */
> > +myState->ti_options = 0;
> >
> > Is there any performance data for this or just theoretical analysis?
>
> I have seen that we don't get much performance with the skip fsm
> option, though I don't have the data to back it up. I'm planning to
> run performance tests after the patches 0001, 0002 and 0003 get
> reviewed. I will capture the data at that time. Hope that's fine.
>

When you run the performance tests, you can try to capture and publish the
relation size & the number of pages that get created for the base
table and the CTAS table; you can use something like SELECT relpages
FROM pg_class WHERE relname = 'tablename' and SELECT
pg_total_relation_size('tablename'), just to make sure that there is
no significant difference between the base table and the CTAS table.
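
For example (using the parallel_write table from the tests above):

SELECT relpages FROM pg_class WHERE relname = 'parallel_write';
SELECT pg_total_relation_size('parallel_write');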

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com




Re: Parallel Inserts in CREATE TABLE AS

2020-12-30 Thread vignesh C
On Wed, Dec 30, 2020 at 9:25 AM Bharath Rupireddy
 wrote:
>
> On Wed, Dec 30, 2020 at 5:22 AM Zhihong Yu  wrote:
> > w.r.t. v17-0004-Enable-CTAS-Parallel-Inserts-For-Append.patch
> >
> > + * Push the dest receiver to Gather node when it is either at the top of 
> > the
> > + * plan or under top Append node unless it does not have any projections 
> > to do.
> >
> > I think the 'unless' should be 'if'. As can be seen from the body of the 
> > method:
> >
> > +   if (!ps->ps_ProjInfo)
> > +   {
> > +   GatherState *gstate = (GatherState *) ps;
> > +
> > +   parallel = true;
>
> Thanks. Modified it in the 0004 patch. Attaching v18 patch set. Note
> that no change in 0001 to 0003 patches from v17.
>
> Please consider v18 patch set for further review.
>

Few comments:
-   /*
-* To allow parallel inserts, we need to ensure that they are safe to be
-* performed in workers. We have the infrastructure to allow parallel
-* inserts in general except for the cases where inserts generate a new
-* CommandId (eg. inserts into a table having a foreign key column).
-*/
-   if (IsParallelWorker())
-   ereport(ERROR,
-   (errcode(ERRCODE_INVALID_TRANSACTION_STATE),
-errmsg("cannot insert tuples in a
parallel worker")));

Is it possible to add a check here for whether it is a CTAS insert, as we do
not support inserts in parallel workers from other commands as of now?

+   Oid                 objectid;       /* workers to open relation/table. */
+   /* Number of tuples inserted by all the workers. */
+   pg_atomic_uint64    processed;

We can just mention relation instead of relation/table.

+select explain_pictas(
+'create table parallel_write as select length(stringu1) from tenk1;');
+  explain_pictas
+--
+ Gather (actual rows=N loops=N)
+   Workers Planned: 4
+   Workers Launched: N
+ ->  Create parallel_write
+   ->  Parallel Seq Scan on tenk1 (actual rows=N loops=N)
+(5 rows)
+
+select count(*) from parallel_write;

Can we include a selection of cmin, xmin for one of the tests to verify
that it uses the same transaction id in the parallel workers,
something like:
select distinct(cmin,xmin) from parallel_write;

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com




Re: Added missing copy related data structures to typedefs.list

2021-01-04 Thread vignesh C
On Wed, Dec 30, 2020 at 7:10 PM Amit Kapila  wrote:
>
> On Sat, Dec 26, 2020 at 9:16 PM vignesh C  wrote:
> >
> > On Thu, Dec 17, 2020 at 4:28 AM Bruce Momjian  wrote:
> > >
> > > On Mon, Dec  7, 2020 at 01:56:50PM +0530, vignesh C wrote:
> > > > Hi,
> > > >
> > > > Added missing copy related data structures to typedefs.list, these
> > > > data structures were added while copy files were split during the
> > > > recent commit. I found this while running pgindent for parallel copy
> > > > patches.
> > > > The Attached patch has the changes for the same.
> > > > Thoughts?
> > >
> > > Uh, we usually only update the typedefs file before we run pgindent on
> > > the master branch.
> > >
> >
> > Ok, Thanks for the clarification. I was not sure as in few of the
> > enhancements it was included as part of the patches.
> >
>
> Yeah, I do that while committing patches that require changes in
> typedefs. It is not a norm and I am not sure how much value it adds to
> do it separately for the missing ones unless you are making changes in
> the same file they are used and you are facing unrelated diffs due to
> those missing ones.

I found this while I was running pgindent for the parallel copy patches. I
was not sure if this change was left out intentionally or by mistake.
I'm fine with it being committed separately or together at a later point.
It is not a major problem for my patch since I know the change; I will
make the required adjustment when I make changes on top of it, if it
does not get committed. But I felt we could commit this since it is a
recent change.

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com




Re: Parallel Inserts in CREATE TABLE AS

2021-01-04 Thread vignesh C
On Mon, Jan 4, 2021 at 3:07 PM Bharath Rupireddy
 wrote:
>
> On Wed, Dec 30, 2020 at 5:28 PM vignesh C  wrote:
> > Few comments:
> > -   /*
> > -* To allow parallel inserts, we need to ensure that they are safe 
> > to be
> > -* performed in workers. We have the infrastructure to allow 
> > parallel
> > -* inserts in general except for the cases where inserts generate a 
> > new
> > -* CommandId (eg. inserts into a table having a foreign key column).
> > -*/
> > -   if (IsParallelWorker())
> > -   ereport(ERROR,
> > -   (errcode(ERRCODE_INVALID_TRANSACTION_STATE),
> > -errmsg("cannot insert tuples in a
> > parallel worker")));
> >
> > Is it possible to add a check if it is a CTAS insert here as we do not
> > support insert in parallel workers from others as of now.
>
> Currently, there's no global variable in which we can selectively skip
> this in case of parallel insertion in CTAS. How about having a
> variable in any of the worker global contexts, set that when parallel
> insertion is chosen for CTAS and use that in heap_prepare_insert() to
> skip the above error? Eventually, we can remove this restriction
> entirely in case we fully allow parallelism for INSERT INTO SELECT,
> CTAS, and COPY.
>
> Thoughts?

Yes, I felt that the leader can store the command as CTAS and the
leader/worker can use it to check and throw an error. A similar
change can be used for the parallel insert patches, and once all the
patches are committed, we can remove it eventually.

>
> > +   Oid objectid;   /* workers to
> > open relation/table.  */
> > +   /* Number of tuples inserted by all the workers. */
> > +   pg_atomic_uint64processed;
> >
> > We can just mention relation instead of relation/table.
>
> I will modify it in the next patch set.
>
> > +select explain_pictas(
> > +'create table parallel_write as select length(stringu1) from tenk1;');
> > +  explain_pictas
> > +--
> > + Gather (actual rows=N loops=N)
> > +   Workers Planned: 4
> > +   Workers Launched: N
> > + ->  Create parallel_write
> > +   ->  Parallel Seq Scan on tenk1 (actual rows=N loops=N)
> > +(5 rows)
> > +
> > +select count(*) from parallel_write;
> >
> > Can we include selection of cmin, xmin for one of the test to verify
> > that it uses the same transaction id  in the parallel workers
> > something like:
> > select distinct(cmin,xmin) from parallel_write;
>
> This is not possible since cmin and xmin are dynamic, we can not use
> them in test cases. I think it's not necessary to check whether the
> leader and workers are in the same txn or not, since we are not
> creating a new txn. All the txn state from the leader is serialized in
> SerializeTransactionState and restored in
> StartParallelWorkerTransaction.
>

I had seen in your patch that you serialize and use the same
transaction, but it would be good if you could have at least one test
case to validate that the leader and workers both use the same
transaction. To solve the problem that you are facing, where cmin and
xmin are dynamic, you can check the distinct count by using something
like below:
SELECT COUNT(*) FROM (SELECT DISTINCT cmin,xmin FROM t1) as dt;

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com




Added schema level support for publication.

2021-01-07 Thread vignesh C
Hi,

This feature adds a schema option while creating a publication. Users will
be able to specify one or more schemas while creating a publication;
when the user specifies the schema option, the data changes for the
tables present in the specified schemas will be replicated
to the subscriber. A few examples are listed below:

Create a publication that publishes all changes for all the tables
present in production schema:
CREATE PUBLICATION production_publication FOR ALL TABLES SCHEMA production;

Create a publication that publishes all changes for all the tables
present in marketing and sales schemas:
CREATE PUBLICATION sales_publication FOR ALL TABLES SCHEMA marketing, sales;

Add some schemas to the publication:
ALTER PUBLICATION sales_publication ADD SCHEMA marketing_june, sales_june;

Drop some schema from the publication:
ALTER PUBLICATION production_quarterly_publication DROP SCHEMA production_july;

Attached is a POC patch for the same. I felt this feature would be quite useful.
Thoughts?

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com
From f9b5134182229f718bbc1a9162b6043f879a6410 Mon Sep 17 00:00:00 2001
From: Vignesh C 
Date: Thu, 7 Jan 2021 11:38:17 +0530
Subject: [PATCH v1] Added schema level support for publication.

This patch adds schema-level support for publications along with FOR ALL TABLES.
Users can specify multiple schemas with the schema option. When the schema
option is specified, the tables present in the specified schemas will be selected
by the publisher for sending the data to the subscriber.
---
 doc/src/sgml/ref/alter_publication.sgml   |  32 ++
 doc/src/sgml/ref/create_publication.sgml  |  33 +-
 src/backend/catalog/pg_publication.c  |  55 +-
 src/backend/commands/publicationcmds.c| 172 +-
 src/backend/parser/gram.y |  38 ++-
 src/bin/pg_dump/pg_dump.c |  22 +++-
 src/bin/pg_dump/pg_dump.h |   1 +
 src/bin/psql/describe.c   |   6 +-
 src/include/catalog/pg_publication.h  |   9 +-
 src/include/nodes/parsenodes.h|   2 +
 src/test/regress/expected/publication.out |  36 +++
 11 files changed, 376 insertions(+), 30 deletions(-)

diff --git a/doc/src/sgml/ref/alter_publication.sgml b/doc/src/sgml/ref/alter_publication.sgml
index c2946df..c00b8da 100644
--- a/doc/src/sgml/ref/alter_publication.sgml
+++ b/doc/src/sgml/ref/alter_publication.sgml
@@ -24,6 +24,9 @@ PostgreSQL documentation
 ALTER PUBLICATION name ADD TABLE [ ONLY ] table_name [ * ] [, ...]
 ALTER PUBLICATION name SET TABLE [ ONLY ] table_name [ * ] [, ...]
 ALTER PUBLICATION name DROP TABLE [ ONLY ] table_name [ * ] [, ...]
+ALTER PUBLICATION name ADD SCHEMA schema_name [, ...]
+ALTER PUBLICATION name SET SCHEMA schema_name [, ...]
+ALTER PUBLICATION name DROP SCHEMA schema_name [, ...]
 ALTER PUBLICATION name SET ( publication_parameter [= value] [, ... ] )
 ALTER PUBLICATION name OWNER TO { new_owner | CURRENT_ROLE | CURRENT_USER | SESSION_USER }
 ALTER PUBLICATION name RENAME TO new_name
@@ -97,6 +100,15 @@ ALTER PUBLICATION name RENAME TO 
 

+schema_name
+
+ 
+  Name of an existing schema.
+ 
+
+   
+
+   
 SET ( publication_parameter [= value] [, ... ] )
 
  
@@ -141,6 +153,26 @@ ALTER PUBLICATION noinsert SET (publish = 'update, delete');
 
 ALTER PUBLICATION mypublication ADD TABLE users, departments;
 
+
+  
+   Add some schemas to the publication:
+
+ALTER PUBLICATION sales_publication ADD SCHEMA marketing_june, sales_june;
+
+  
+
+  
+   Drop some schema from the publication:
+
+ALTER PUBLICATION production_quarterly_publication DROP SCHEMA production_july;
+
+  
+
+  
+   Set schema to the publication:
+
+ALTER PUBLICATION production_publication SET SCHEMA production_july;
+
  
 
  
diff --git a/doc/src/sgml/ref/create_publication.sgml b/doc/src/sgml/ref/create_publication.sgml
index ff82fbc..c941643 100644
--- a/doc/src/sgml/ref/create_publication.sgml
+++ b/doc/src/sgml/ref/create_publication.sgml
@@ -22,8 +22,8 @@ PostgreSQL documentation
  
 
 CREATE PUBLICATION name
-[ FOR TABLE [ ONLY ] table_name [ * ] [, ...]
-  | FOR ALL TABLES ]
+[ FOR TABLE [ ONLY ] table_name [ * ] [, ... ]
+  | FOR ALL TABLES [ SCHEMA schema_name [, ... ] ] ]
 [ WITH ( publication_parameter [= value] [, ... ] ) ]
 
  
@@ -100,6 +100,19 @@ CREATE PUBLICATION name

 

+SCHEMA
+
+ 
+  Specifies the list of schema that should be added to the publication.
+  If SCHEMA is specified, then the tables present in the
+  specified schema list is selected and added to the publication.  The rest
+  of the tables will be skipped. This option should be specified
+  with FOR ALL TABLES option.
+ 
+
+   
+
+   
 WITH ( publication_parameter [= value] [, ... ] )
 
  
@@ -222,6 +235,22 @@ CREATE PUBLICATION alltables FOR ALL TABLES;
 
 CREATE PUBLICATION insert_only

Re: Parallel INSERT (INTO ... SELECT ...)

2021-01-08 Thread vignesh C
On Fri, Dec 11, 2020 at 4:30 PM Greg Nancarrow  wrote:
>
> Posting an updated set of patches to address recent feedback:
>
> - Removed conditional-locking code used in parallel-safety checking
> code (Tsunakawa-san feedback). It turns out that for the problem test
> case, no parallel-safety checking should be occurring that locks
> relations because those inserts are specifying VALUES, not an
> underlying SELECT, so the parallel-safety checking code was updated to
> bail out early if no underlying SELECT is specified for the INSERT. No
> change to the test code was required.
> - Added comment to better explain the reason for treating "INSERT ...
> ON CONFLICT ... DO UPDATE" as parallel-unsafe (Dilip)
> - Added assertion to heap_prepare_insert() (Amit)
> - Updated error handling for NULL index_expr_item case (Vignesh)

Thanks Greg for fixing and posting a new patch.
Few comments:
+-- Test INSERT with underlying query.
+-- (should create plan with parallel SELECT, Gather parent node)
+--
+explain(costs off) insert into para_insert_p1 select unique1,
stringu1 from tenk1;
+   QUERY PLAN
+
+ Insert on para_insert_p1
+   ->  Gather
+ Workers Planned: 4
+ ->  Parallel Seq Scan on tenk1
+(4 rows)
+
+insert into para_insert_p1 select unique1, stringu1 from tenk1;
+select count(*), sum(unique1) from para_insert_p1;
+ count |   sum
+---+--
+ 1 | 49995000
+(1 row)
+

For one of the tests you can validate that the same transaction has
been used by all the parallel workers; you could use something like
below to validate it:
SELECT COUNT(*) FROM (SELECT DISTINCT cmin,xmin FROM  para_insert_p1) as dt;

Few includes are not required:
 #include "executor/nodeGather.h"
+#include "executor/nodeModifyTable.h"
 #include "executor/nodeSubplan.h"
 #include "executor/tqueue.h"
 #include "miscadmin.h"
@@ -60,6 +61,7 @@ ExecInitGather(Gather *node, EState *estate, int eflags)
GatherState *gatherstate;
Plan   *outerNode;
TupleDesc   tupDesc;
+   Index   varno;

This include is not required in nodeModifyTable.c

+#include "catalog/index.h"
+#include "catalog/indexing.h"

@@ -43,7 +49,11 @@
 #include "parser/parse_agg.h"
 #include "parser/parse_coerce.h"
 #include "parser/parse_func.h"
+#include "parser/parsetree.h"
+#include "partitioning/partdesc.h"
+#include "rewrite/rewriteHandler.h"
 #include "rewrite/rewriteManip.h"
+#include "storage/lmgr.h"
 #include "tcop/tcopprot.h"

The includes indexing.h, rewriteHandler.h & lmgr.h are not required in clauses.c.

There are a few typos:
+table and populate it can use a parallel plan. Another
exeption is the command
+INSERT INTO ... SELECT ... which can use a
parallel plan for
+the underlying SELECT part of the query.

exeption should be exception

+ /*
+ * For the number of workers to use for a parallel
+ * INSERT/UPDATE/DELETE, it seems resonable to use the same number
+ * of workers as estimated for the underlying query.
+ */
+ parallelModifyWorkers = path->parallel_workers;
resonable should be reasonable

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com




Re: Added schema level support for publication.

2021-01-09 Thread vignesh C
On Fri, Jan 8, 2021 at 4:32 PM Amit Kapila  wrote:
>
> On Thu, Jan 7, 2021 at 10:03 PM vignesh C  wrote:
> >
> > This feature adds schema option while creating publication. Users will
> > be able to specify one or more schemas while creating publication,
> > when the user specifies schema option, then the data changes for the
> > tables present in the schema specified by the user will be replicated
> > to the subscriber. Few examples have been listed below:
> >
> > Create a publication that publishes all changes for all the tables
> > present in production schema:
> > CREATE PUBLICATION production_publication FOR ALL TABLES SCHEMA production;
> >
> > Create a publication that publishes all changes for all the tables
> > present in marketing and sales schemas:
> > CREATE PUBLICATION sales_publication FOR ALL TABLES SCHEMA marketing, sales;
> >
> > Add some schemas to the publication:
> > ALTER PUBLICATION sales_publication ADD SCHEMA marketing_june, sales_june;
> >
> > Drop some schema from the publication:
> > ALTER PUBLICATION production_quarterly_publication DROP SCHEMA 
> > production_july;
> >
> > Attached is a POC patch for the same. I felt this feature would be quite 
> > useful.
> >
>
> What do we do if the user Drops the schema? Do we automatically remove
> it from the publication?
>
I have not handled this scenario yet; I will handle it and add tests
in the next patch.

> I see some use of such a feature but you haven't described the use
> case or how did you arrive at the conclusion that it would be quite
> useful?
>
Currently there are a couple of options, "FOR ALL TABLES" and "FOR
TABLE", when a user creates a publication: 1) subscribe to the changes
of all the tables, or 2) subscribe to a few tables. There is no option
for users to subscribe to the relations present in a schema. The user
has to manually identify the list of tables present in the schema and
specify them using the "FOR TABLE" option. Similarly, if a user wants
to subscribe to several schemas, the user has to do this for each
required schema, which is a tedious process. This feature lets the
user take care of this internally using the schema option.
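
For example, a rough sketch of the difference, assuming a schema named
production that contains tables t1 and t2 (the proposed syntax is still
under discussion):

-- today: every table in the schema has to be listed explicitly
CREATE PUBLICATION production_publication FOR TABLE production.t1, production.t2;
-- with the proposed option, one clause covers the whole schema
CREATE PUBLICATION production_publication FOR ALL TABLES SCHEMA production;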

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com




Re: Added schema level support for publication.

2021-01-10 Thread vignesh C
Thanks for your comments, Bharath; please find my responses below.

On Sat, Jan 9, 2021 at 8:08 PM Bharath Rupireddy
 wrote:
> I think this feature can be useful, in case a user has a lot of tables
> to publish inside a schema. Having said that, I wonder if this feature
> mandates users to create the same schema with same
> permissions/authorizations manually on the subscriber, because logical
> replication doesn't propagate any ddl's so are the schema or schema
> changes? Or is it that the list of tables from the publisher can go
> into a different schema on the subscriber?
>

DDL will not be propagated to the subscriber. Users have to create
the schema and tables on the subscriber themselves. There is no change
in permissions/authorizations handling; it will be the same as the
existing behavior for relations.
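
For instance, a minimal sketch of the subscriber-side setup that has to
be done manually before subscribing (the schema, table, and connection
details are only illustrative):

-- on the subscriber
CREATE SCHEMA production;
CREATE TABLE production.t1 (c1 int);
CREATE SUBSCRIPTION production_sub
    CONNECTION 'host=localhost port=5432 dbname=postgres'
    PUBLICATION production_publication;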

> Since the schema can have other objects such as data types, functions,
> operators, I'm sure with your feature, non-table objects will be
> skipped.
>

Yes, only table data will be sent to subscribers; non-table objects
will be skipped.

> As Amit pointed out earlier, the behaviour when schema dropped, I
> think we should also consider when schema is altered, say altered to a
> different name, maybe we should change that in the publication too.
>

I agree that when the schema is renamed, the new schema name should be
reflected in the publication.

> In general, what happens if we have some temporary tables or foreign
> tables inside the schema, will they be allowed to send the data to
> subscribers?
>

Temporary tables & foreign tables will not be added to the publications.

> And, with this feature, since there can be many huge tables inside a
> schema, the initial table sync phase of the replication can take a
> while.
>

Yes, the initial table sync is required, so it can take a while in
that case.

> Say a user has created a publication for a schema with hundreds of
> tables in it, at some point later, can he stop replicating a single or
> some tables from that schema?
>

There is no provision for this currently.

> IMO, it's better to have the syntax - CREATE PUBLICATION
> production_publication FOR ALL TABLES IN SCHEMA production - just
> added IN between for all tables and schema.
>

I'm OK with the proposed syntax, but I would like others' opinions too
before making the change.

> Say a user has a schema with 121 tables in it, and wants to replicate
> only 120 or 199 or even lesser tables out of it, so can we have some
> skip option to the new syntax, something like below?
> CREATE PUBLICATION production_publication FOR ALL TABLES SCHEMA
> production WITH skip = marketing, accounts, sales;  --> meaning is,
> replicate all the tables in the schema production except marketing,
> accounts, sales tables.
>

Yes, this is a good use case; I will include this change.

Thanks for the comments; I will address them and post an updated patch.

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com




Re: Added schema level support for publication.

2021-01-10 Thread vignesh C
On Sat, Jan 9, 2021 at 8:14 PM Bharath Rupireddy
 wrote:
>
> One more point - if the publication is created for a schema with no or
> some initial tables, will all the future tables that may get added to
> the schema will be replicated too?
>

I agree; when a relation is added to the schema, it should be added to
the publication as well.
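
As a sketch of the expected behaviour (the names are illustrative, and
this assumes the publication was created with the proposed FOR ALL
TABLES SCHEMA syntax):

CREATE PUBLICATION production_publication FOR ALL TABLES SCHEMA production;
-- later, a new relation appears in the schema
CREATE TABLE production.t_new (id int);
-- with the behaviour agreed above, changes to production.t_new would be
-- published automatically, without any ALTER PUBLICATION step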

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com




Re: Libpq support to connect to standby server as priority

2021-02-11 Thread vignesh C
On Fri, Feb 12, 2021 at 7:07 AM Greg Nancarrow  wrote:
>
> On Wed, Feb 10, 2021 at 5:09 PM vignesh C  wrote:
> >
> > Modified.
> > These comments are handled in v22 patch posted in my earlier mail.
> >
>
> Thanks, just one minor thing I missed in doc/src/sgml/libpq.sgml.
>
> +The support of read-write transactions is determined by the
> value of the
> +default_transaction_read_only and
> +in_hot_standby configuration parameters, that is
> +reported by the server (if supported) upon successful connection.  If
>
>
> should be:
>
> +The support of read-write transactions is determined by the
> values of the
> +default_transaction_read_only and
> +in_hot_standby configuration parameters, that are
> +reported by the server (if supported) upon successful connection.  If
>
>
> (i.e. "value" -> "values" and "is" -> "are")

Thanks for the comments, this is handled in the v23 patch attached.
Thoughts?

Regards,
Vignesh
From 45d5680adae88a5c6f9d81a5077601f036e487c5 Mon Sep 17 00:00:00 2001
From: Vignesh C 
Date: Mon, 8 Feb 2021 11:23:31 +0530
Subject: [PATCH v23] Enhance the libpq "target_session_attrs" connection
 parameter.

Enhance the connection parameter "target_session_attrs" to support new values:
read-only/primary/standby/prefer-standby.
Add a new "read-only" target_session_attrs option value, to support connecting
to a read-only server if available from the list of hosts (otherwise the
connection attempt fails).
Add a new "primary" target_session_attrs option value, to support connecting to
a server which is not in hot-standby mode, if available from the list of hosts
(otherwise the connection attempt fails).
Add a new "standby" target_session_attrs option value, to support connecting to
a server which is in hot-standby mode, if available from the list of hosts
(otherwise the connection attempt fails).
Add a new "prefer-standby" target_session_attrs option value, to support
connecting to a server which is in hot-standby mode, if available from the list
of hosts (otherwise connect to a server which is not in hot-standby mode).

Discussion: https://www.postgresql.org/message-id/flat/caf3+xm+8-ztokav9ghij3wfgentq97qcjxqt+rbfq6f7onz...@mail.gmail.com
---
 doc/src/sgml/high-availability.sgml   |  16 +-
 doc/src/sgml/libpq.sgml   |  73 +-
 doc/src/sgml/protocol.sgml|   5 +-
 src/backend/utils/misc/guc.c  |   3 +-
 src/interfaces/libpq/fe-connect.c | 432 ++
 src/interfaces/libpq/fe-exec.c|  18 +-
 src/interfaces/libpq/libpq-fe.h   |   3 +-
 src/interfaces/libpq/libpq-int.h  |  52 +++-
 src/test/recovery/t/001_stream_rep.pl |  79 ++-
 src/tools/pgindent/typedefs.list  |   2 +
 10 files changed, 602 insertions(+), 81 deletions(-)

diff --git a/doc/src/sgml/high-availability.sgml b/doc/src/sgml/high-availability.sgml
index f49f5c0..2bbd52c 100644
--- a/doc/src/sgml/high-availability.sgml
+++ b/doc/src/sgml/high-availability.sgml
@@ -1700,14 +1700,14 @@ synchronous_standby_names = 'ANY 2 (s1, s2, s3)'

 

-During hot standby, the parameter transaction_read_only is always
-true and may not be changed.  But as long as no attempt is made to modify
-the database, connections during hot standby will act much like any other
-database connection.  If failover or switchover occurs, the database will
-switch to normal processing mode.  Sessions will remain connected while the
-server changes mode.  Once hot standby finishes, it will be possible to
-initiate read-write transactions (even from a session begun during
-hot standby).
+During hot standby, the parameter in_hot_standby and
+transaction_read_only are always true and may not be
+changed.  But as long as no attempt is made to modify the database,
+connections during hot standby will act much like any other database
+connection.  If failover or switchover occurs, the database will switch to
+normal processing mode.  Sessions will remain connected while the server
+changes mode.  Once hot standby finishes, it will be possible to initiate
+read-write transactions (even from a session begun during hot standby).

 

diff --git a/doc/src/sgml/libpq.sgml b/doc/src/sgml/libpq.sgml
index b7a8245..08a1b8e 100644
--- a/doc/src/sgml/libpq.sgml
+++ b/doc/src/sgml/libpq.sgml
@@ -1836,18 +1836,66 @@ postgresql://%2Fvar%2Flib%2Fpostgresql/dbname
   target_session_attrs
   

-If this parameter is set to read-write, only a
-connection in which read-write transactions are accepted by default
-is considered acceptable.  The query
-SHOW transaction_read_only will be sent upon any
-succe

Re: logical replication seems broken

2021-02-14 Thread vignesh C
On Sat, Feb 13, 2021 at 5:58 PM Erik Rijkers  wrote:
>
> > On 02/13/2021 11:49 AM Amit Kapila  wrote:
> >
> > On Fri, Feb 12, 2021 at 10:00 PM  wrote:
> > >
> > > > On 02/12/2021 1:51 PM Amit Kapila  wrote:
> > > >
> > > > On Fri, Feb 12, 2021 at 6:04 PM Erik Rijkers  wrote:
> > > > >
> > > > > I am seeing errors in replication in a test program that I've been 
> > > > > running for years with very little change (since 2017, really [1]).
> > >
> > > Hi,
> > >
> > > Here is a test program.  Careful, it deletes stuff.  And it will need 
> > > some changes:
> > >
> >
> > Thanks for sharing the test. I think I have found the problem.
> > Actually, it was an existing code problem exposed by the commit
> > ce0fdbfe97. In pgoutput_begin_txn(), we were sometimes sending the
> > prepare_write ('w') message but then the actual message was not being
> > sent. This was the case when we didn't found the origin of a txn. This
> > can happen after that commit because we have now started using origins
> > for tablesync workers as well and those origins are removed once the
> > tablesync workers are finished. We might want to change the behavior
> > related to the origin messages as indicated in the comments but for
> > now, fixing the existing code.
> >
> > Can you please test if the attached fixes the problem at your end as well?
>
> > [fix_origin_message_1.patch]
>
> I compiled just now a binary from HEAD, and a binary from HEAD+patch
>
> HEAD is still broken; your patch rescues it, so yes, fixed.
>
> Maybe a test (check or check-world) should be added to run a second replica?  
> (Assuming that would have caught this bug)
>

+1 for the idea of having a test for this; I have written one.
Thanks for the fix, Amit. I could reproduce the issue without your fix
and verified that the issue is fixed with the patch you shared.
I have attached a patch with the test. Thoughts?

Regards,
Vignesh
From eff13970ae34bcdc8590a9602eddce5eb6200195 Mon Sep 17 00:00:00 2001
From: vignesh 
Date: Mon, 15 Feb 2021 11:41:55 +0530
Subject: [PATCH v1] Test for verifying data is replicated in cascaded setup.

Test for verifying data is replicated in cascaded setup.
---
 src/test/subscription/t/100_bugs.pl | 65 +
 1 file changed, 65 insertions(+)

diff --git a/src/test/subscription/t/100_bugs.pl b/src/test/subscription/t/100_bugs.pl
index d1e407a..afb2d08 100644
--- a/src/test/subscription/t/100_bugs.pl
+++ b/src/test/subscription/t/100_bugs.pl
@@ -153,3 +153,68 @@ is($node_twoways->safe_psql('d2', "SELECT count(f) FROM t"),
 	$rows * 2, "2x$rows rows in t");
 is($node_twoways->safe_psql('d2', "SELECT count(f) FROM t2"),
 	$rows * 2, "2x$rows rows in t2");
+
+# Verify table data is synced with cascaded replication setup.
+my $node_pub = get_new_node('testpublisher1');
+$node_pub->init(allows_streaming => 'logical');
+$node_pub->start;
+
+my $node_pub_sub = get_new_node('testpublisher_subscriber');
+$node_pub_sub->init(allows_streaming => 'logical');
+$node_pub_sub->start;
+
+my $node_sub = get_new_node('testsubscriber1');
+$node_sub->init(allows_streaming => 'logical');
+$node_sub->start;
+
+# Create the tables in all nodes.
+$node_pub->safe_psql('postgres', "CREATE TABLE tab1 (a int)");
+$node_pub_sub->safe_psql('postgres', "CREATE TABLE tab1 (a int)");
+$node_sub->safe_psql('postgres', "CREATE TABLE tab1 (a int)");
+
+# Create a cascaded replication setup like:
+# N1 - Create publication testpub1.
+# N2 - Create publication testpub2 and also include subscriber which subscribes
+#  to testpub1.
+# N3 - Create subscription testsub2 subscribes to testpub2.
+$node_pub->safe_psql('postgres',
+	"CREATE PUBLICATION testpub1 FOR TABLE tab1");
+
+$node_pub_sub->safe_psql('postgres',
+	"CREATE PUBLICATION testpub2 FOR TABLE tab1");
+
+my $publisher1_connstr = $node_pub->connstr . ' dbname=postgres';
+my $publisher2_connstr = $node_pub_sub->connstr . ' dbname=postgres';
+
+$node_sub->safe_psql('postgres',
+	"CREATE SUBSCRIPTION testsub2 CONNECTION '$publisher2_connstr' PUBLICATION testpub2"
+);
+
+$node_pub_sub->safe_psql('postgres',
+	"CREATE SUBSCRIPTION testsub1 CONNECTION '$publisher1_connstr' PUBLICATION testpub1"
+);
+
+$node_pub->safe_psql('postgres',
+	"INSERT INTO tab1 values(generate_series(1,10))");
+
+# Verify that the data is cascaded from testpub1 to testsub1 and further from
+# testpub2 (which had testsub1) to testsub2.
+$node_pub->wait_for_catchup('testsub1');
+$node_pub_sub->wait_for_catchup('testsub2');
+
+# Drop subscriptions as we don't need them anymore
+$node_pub_sub->safe_psql('postgres', "DROP SUBSCRIPTION testsub1");
+$node_sub->safe_psql('postgres', "DROP SUBSCRIPTION testsub2");
+
+# Drop publications as we don't need them anymore
+$node_pub->safe_psql('postgres', "DROP PUBLICATION testpub1");
+$node_pub_sub->safe_psql('postgres', "DROP PUBLICATION testpub2");
+
+# Clean up the tables on both publisher and subscriber as we don't need them
+$node_pub->safe_psql('postgres', "DROP TABLE 

Re: logical replication seems broken

2021-02-15 Thread vignesh C
On Mon, Feb 15, 2021 at 5:02 PM Amit Kapila  wrote:
>
> On Mon, Feb 15, 2021 at 11:53 AM vignesh C  wrote:
> >
> > On Sat, Feb 13, 2021 at 5:58 PM Erik Rijkers  wrote:
> > >
> > >
> > > I compiled just now a binary from HEAD, and a binary from HEAD+patch
> > >
> > > HEAD is still broken; your patch rescues it, so yes, fixed.
> > >
> > > Maybe a test (check or check-world) should be added to run a second 
> > > replica?  (Assuming that would have caught this bug)
> > >
> >
> > +1 for the idea of having a test for this. I have written a test for this.
> > Thanks for the fix Amit, I could reproduce the issue without your fix
> > and verified that the issue gets fixed with the patch you shared.
> > Attached a patch for the same. Thoughts?
> >
>
> I have slightly modified the comments in the test case to make things
> clear. I am planning not to backpatch this because there is no way in
> the core code to hit this prior to commit ce0fdbfe97 and we haven't
> received any complaints so far. What do you think?

The changes look fine to me.

Regards,
Vignesh




Re: logical replication seems broken

2021-02-15 Thread vignesh C
On Mon, Feb 15, 2021 at 6:14 PM  wrote:
>
>
> > On 2021.02.15. 12:31 Amit Kapila  wrote:
> > On Mon, Feb 15, 2021 at 11:53 AM vignesh C  wrote:
> > > On Sat, Feb 13, 2021 at 5:58 PM Erik Rijkers  wrote:
> > > > I compiled just now a binary from HEAD, and a binary from HEAD+patch
> > > > HEAD is still broken; your patch rescues it, so yes, fixed.
> > > > Maybe a test (check or check-world) should be added to run a second 
> > > > replica?  (Assuming that would have caught this bug)
> > > >
> > > +1 for the idea of having a test for this. I have written a test for this.
> > > Thanks for the fix Amit, I could reproduce the issue without your fix
> > > and verified that the issue gets fixed with the patch you shared.
> > > Attached a patch for the same. Thoughts?
> > >
> >
> > I have slightly modified the comments in the test case to make things
> > clear. I am planning not to backpatch this because there is no way in
> > the core code to hit this prior to commit ce0fdbfe97 and we haven't
> > received any complaints so far. What do you think?
>
> My tests indeed run OK with this.
>
> (I haven't tested whether the newly added test actually tests for the problem 
> that was there - I suppose one of you did that)
>

I could re-create the scenario that you had faced with this test. The
test case is a simplified version of your script: I have removed the
use of pgbench, reduced the number of tables used, and simulated the
same problem with a replication setup similar to the one you had used.

Regards,
Vignesh




Re: repeated decoding of prepared transactions

2021-02-25 Thread vignesh C
On Wed, Feb 24, 2021 at 5:06 PM Ajin Cherian  wrote:
>
> On Wed, Feb 24, 2021 at 4:48 PM Ajin Cherian  wrote:
>
> > I plan to split this into two patches next. But do review and let me
> > know if you have any comments.
>
> Attaching an updated patch-set with the changes for
> snapshot_was_exported_at_lsn separated out from the changes for the
> APIs pg_create_logical_replication_slot() and
> pg_logical_slot_get_changes(). Along with a rebase that takes in a few
> more commits since my last patch.

One observation while verifying the patch: I noticed that most of the
ReplicationSlotPersistentData structure members are displayed in
pg_replication_slots, but snapshot_was_exported_at_lsn is not. Is this
intentional? If it is not intentional, we could include
snapshot_was_exported_at_lsn in pg_replication_slots.
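
For instance, a quick way to eyeball which fields the view currently
exposes (just a sketch; only a few of the existing columns are listed):

SELECT slot_name, slot_type, restart_lsn, confirmed_flush_lsn
FROM pg_replication_slots;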

Regards,
Vignesh




Re: repeated decoding of prepared transactions

2021-02-26 Thread vignesh C
On Fri, Feb 26, 2021 at 4:13 PM Ajin Cherian  wrote:
>
> On Fri, Feb 26, 2021 at 7:47 PM Ajin Cherian  wrote:
>
> > I've updated snapshot_was_exported_at_  member to pg_replication_slots as 
> > well.
> > Do have a look and let me know if there are any comments.
>
> Update with both patches.

Thanks for fixing this and providing an updated patch. The patch
applies, and make check and make check-world pass. I could see the fix
working as expected.

Few minor comments:
+   snapshot_was_exported_at pg_lsn
+  
+  
+   The address (LSN) at which the logical
+   slot found a consistent point at the time of slot creation.
+   NULL for physical slots.
+  
+ 


We had some discussion earlier on the naming of
snapshot_was_exported_at. Can we change snapshot_was_exported_at to
snapshot_exported_lsn? I felt that if we include the LSN in the name,
the user will be able to interpret it easily, and it will also be
consistent with the other columns in the pg_replication_slots view.


 L.restart_lsn,
 L.confirmed_flush_lsn,
+   L.snapshot_was_exported_at,
 L.wal_status,
 L.safe_wal_size

It looks like there is an indentation issue here.

Regards,
Vignesh




Re: repeated decoding of prepared transactions

2021-02-27 Thread vignesh C
On Sat, Feb 27, 2021 at 8:29 AM Amit Kapila  wrote:
>
> On Fri, Feb 26, 2021 at 7:26 PM vignesh C  wrote:
> >
> > On Fri, Feb 26, 2021 at 4:13 PM Ajin Cherian  wrote:
> > >
> > > On Fri, Feb 26, 2021 at 7:47 PM Ajin Cherian  wrote:
> > >
> > > > I've updated snapshot_was_exported_at_  member to pg_replication_slots 
> > > > as well.
> > > > Do have a look and let me know if there are any comments.
> > >
> > > Update with both patches.
> >
> > Thanks for fixing and providing an updated patch. Patch applies, make
> > check and make check-world passes. I could see the issue working fine.
> >
> > Few minor comments:
> > +   snapshot_was_exported_at 
> > pg_lsn
> > +  
> > +  
> > +   The address (LSN) at which the logical
> > +   slot found a consistent point at the time of slot creation.
> > +   NULL for physical slots.
> > +  
> > + 
> >
> >
> > I had seen earlier also we had some discussion on naming
> > snapshot_was_exported_at. Can we change snapshot_was_exported_at to
> > snapshot_exported_lsn, I felt if we can include the lsn in the name,
> > the user will be able to interpret easily and also it will be similar
> > to other columns in pg_replication_slots view.
> >
>
> I have recommended above to change this name to initial_consistency_at
> because there are times when we don't export snapshot and we still set
> this like when creating slots with CRS_NOEXPORT_SNAPSHOT or when
> creating via SQL APIs.  I am not sure why Ajin neither changed the
> name nor responded to that comment. What is your opinion?

initial_consistency_at looks good to me. That is more understandable.

Regards,
Vignesh




Re: repeated decoding of prepared transactions

2021-02-27 Thread vignesh C
On Sat, Feb 27, 2021 at 5:36 PM Amit Kapila  wrote:
>
> On Sat, Feb 27, 2021 at 11:38 AM Amit Kapila  wrote:
> >
> > On Fri, Feb 26, 2021 at 4:13 PM Ajin Cherian  wrote:
> > >
> > > On Fri, Feb 26, 2021 at 7:47 PM Ajin Cherian  wrote:
> > >
> > > > I've updated snapshot_was_exported_at_  member to pg_replication_slots 
> > > > as well.
> > > > Do have a look and let me know if there are any comments.
> > >
> > > Update with both patches.
> > >
> >
> > Thanks, I have made some minor changes to the first patch and now it
> > looks good to me. The changes are as below:
> > 1. Removed the changes related to exposing this new parameter via view
> > as mentioned in my previous email.
> > 2. Changed the variable name initial_consistent_point.
> > 3. Ran pgindent, minor changes in comments, and modified the commit message.
> >
> > Let me know what you think about these changes.
> >
>
> In the attached, I have just bumped SNAPBUILD_VERSION  as we are
> adding a new member in the SnapBuild structure.
>

Few minor comments:

git am v6-0001-Avoid-repeated-decoding-of-prepared-transactions-.patch
Applying: Avoid repeated decoding of prepared transactions after the restart.
/home/vignesh/postgres/.git/rebase-apply/patch:286: trailing whitespace.
#define SNAPBUILD_VERSION 4
warning: 1 line adds whitespace errors.

There is one whitespace error.

In commit a271a1b50e, we allowed decoding at prepare time and the prepare
was decoded again if there is a restart after decoding it. It was done
that way because we can't distinguish between the cases where we have not
decoded the prepare because it was prior to consistent snapshot or we have
decoded it earlier but restarted. To distinguish between these two cases,
we have introduced an initial_consisten_point at the slot level which is
an LSN at which we found a consistent point at the time of slot creation.

One minor typo in the commit message: initial_consisten_point should
be initial_consistent_point.

Regards,
Vignesh




Re: repeated decoding of prepared transactions

2021-02-28 Thread vignesh C
On Mon, Mar 1, 2021 at 7:23 AM Ajin Cherian  wrote:
>
> On Sat, Feb 27, 2021 at 11:06 PM Amit Kapila  wrote:
>
> > Few comments on 0002 patch:
> > =
> > 1.
> > +
> > + /*
> > + * Disable two-phase here, it will be set in the core if it was
> > + * enabled whole creating the slot.
> > + */
> > + ctx->twophase = false;
> >
> > Typo, /whole/while. I think we don't need to initialize this variable
> > here at all.
> >
> > 2.
> > + /* If twophase is set on the slot at create time, then
> > + * make sure the field in the context is also updated
> > + */
> > + if (MyReplicationSlot->data.twophase)
> > + {
> > + ctx->twophase = true;
> > + }
> > +
> >
> > For multi-line comments, the first line of comment should be empty.
> > Also, I think this is not the right place because the WALSender path
> > needs to set it separately. I guess you can set it in
> > CreateInitDecodingContext/CreateDecodingContext by doing something
> > like
> >
> > ctx->twophase &= MyReplicationSlot->data.twophase
>
> Updated accordingly.
>
> >
> > 3. I think we can support this option at the protocol level in a
> > separate patch where we need to allow it via replication commands (say
> > when we support it in CreateSubscription). Right now, there is nothing
> > to test all the code you have added in repl_gram.y.
> >
>
> Removed that.
>
>
> > 4. I think we can expose this new option via pg_replication_slots.
> >
>
> Done. Added,
>

v7-0002-Add-option-to-enable-two-phase-commits-in-pg_crea.patch adds
twophase to pg_create_logical_replication_slot; I feel this option
should also be documented in doc/src/sgml/func.sgml.
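
For reference, a usage sketch of the new option as I understand it from
the patch (the parameter name and position follow v7-0002 and may still
change):

-- create a logical slot with two-phase decoding enabled
SELECT * FROM pg_create_logical_replication_slot('twophase_slot', 'test_decoding', false, true);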

Regards,
Vignesh




Re: repeated decoding of prepared transactions

2021-03-01 Thread vignesh C
On Tue, Mar 2, 2021 at 6:37 AM Ajin Cherian  wrote:
>
> On Mon, Mar 1, 2021 at 8:08 PM Amit Kapila  wrote:
>
> > Few minor comments on 0002 patch
> > =
> > 1.
> >   ctx->streaming &= enable_streaming;
> > - ctx->twophase &= enable_twophase;
> > +
> >  }
> >
> > Spurious line addition.
>
> Deleted.
>
> >
> > 2.
> > -  proallargtypes =>
> > '{name,name,text,oid,bool,bool,int4,xid,xid,pg_lsn,pg_lsn,text,int8}',
> > -  proargmodes => '{o,o,o,o,o,o,o,o,o,o,o,o,o}',
> > -  proargnames =>
> > '{slot_name,plugin,slot_type,datoid,temporary,active,active_pid,xmin,catalog_xmin,restart_lsn,confirmed_flush_lsn,wal_status,safe_wal_size}',
> > +  proallargtypes =>
> > '{name,name,text,oid,bool,bool,int4,xid,xid,pg_lsn,pg_lsn,text,int8,bool}',
> > +  proargmodes => '{o,o,o,o,o,o,o,o,o,o,o,o,o,o}',
> > +  proargnames =>
> > '{slot_name,plugin,slot_type,datoid,temporary,active,active_pid,xmin,catalog_xmin,restart_lsn,confirmed_flush_lsn,wal_status,safe_wal_size,twophase}',
> >prosrc => 'pg_get_replication_slots' },
> >  { oid => '3786', descr => 'set up a logical replication slot',
> >proname => 'pg_create_logical_replication_slot', provolatile => 'v',
> > -  proparallel => 'u', prorettype => 'record', proargtypes => 'name name 
> > bool',
> > -  proallargtypes => '{name,name,bool,name,pg_lsn}',
> > -  proargmodes => '{i,i,i,o,o}',
> > -  proargnames => '{slot_name,plugin,temporary,slot_name,lsn}',
> > +  proparallel => 'u', prorettype => 'record', proargtypes => 'name
> > name bool bool',
> > +  proallargtypes => '{name,name,bool,bool,name,pg_lsn}',
> > +  proargmodes => '{i,i,i,i,o,o}',
> > +  proargnames => '{slot_name,plugin,temporary,twophase,slot_name,lsn}',
> >
> > I think it is better to use two_phase here and at other places as well
> > to be consistent with similar parameters.
>
> Updated as requested.
> >
> > 3.
> > --- a/src/backend/catalog/system_views.sql
> > +++ b/src/backend/catalog/system_views.sql
> > @@ -894,7 +894,8 @@ CREATE VIEW pg_replication_slots AS
> >  L.restart_lsn,
> >  L.confirmed_flush_lsn,
> >  L.wal_status,
> > -L.safe_wal_size
> > +L.safe_wal_size,
> > + L.twophase
> >  FROM pg_get_replication_slots() AS L
> >
> > Indentation issue. Here, you need you spaces instead of tabs.
>
> Updated.
> >
> > 4.
> > @@ -533,6 +533,12 @@ CreateDecodingContext(XLogRecPtr start_lsn,
> >
> >   ctx->reorder->output_rewrites = ctx->options.receive_rewrites;
> >
> > + /*
> > + * If twophase is set on the slot at create time, then
> > + * make sure the field in the context is also updated.
> > + */
> > + ctx->twophase &= MyReplicationSlot->data.twophase;
> > +
> >
> > Why didn't you made similar change in CreateInitDecodingContext when I
> > already suggested the same in my previous email? If we don't make that
> > change then during slot initialization two_phase will always be true
> > even though user passed in as false. It looks inconsistent and even
> > though there is no direct problem due to that but it could be cause of
> > possible problem in future.
>
> Updated.
>

I have a minor comment regarding the below:
+ 
+  
+   two_phase bool
+  
+  
+  True if two-phase commits are enabled on this slot.
+  
+ 

Can we change it to something like:
True if the slot is enabled for decoding prepared transaction
information. Refer to [link] for more information. (The link should
point to where more detailed information about two-phase is available
for pg_create_logical_replication_slot.)

Also, there is a small indentation issue in that line; I think there
should be one space before "True if".
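
Once the column is exposed, users would presumably check it with
something like the following (assuming the column keeps the name
two_phase):

SELECT slot_name, two_phase FROM pg_replication_slots;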

Regards,
Vignesh




Re: repeated decoding of prepared transactions

2021-03-01 Thread vignesh C
On Tue, Mar 2, 2021 at 9:33 AM Amit Kapila  wrote:
>
> On Tue, Mar 2, 2021 at 8:20 AM vignesh C  wrote:
> >
> >
> > I have a minor comment regarding the below:
> > + 
> > +  
> > +   two_phase bool
> > +  
> > +  
> > +  True if two-phase commits are enabled on this slot.
> > +  
> > + 
> >
> > Can we change something like:
> > True if the slot is enabled for decoding prepared transaction
> > information. Refer link for more information.(link should point where
> > more detailed information is available for two-phase in
> > pg_create_logical_replication_slot).
> >
> > Also there is one small indentation in that line, I think there should
> > be one space before "True if".
> >
>
> Okay, fixed these but I added a slightly different description. I have
> also added the parameter description for
> pg_create_logical_replication_slot in docs and changed the comments at
> various places in the code. Apart from that ran pgindent. The patch
> looks good to me now. Let me know what do you think?

The patch applies cleanly, and make check and make check-world pass. I
did not find any other issues. The patch looks good to me.

Regards,
Vignesh




Buildfarm failure in crake

2021-03-02 Thread vignesh C
Hi,

I noticed there is a buildfarm failure on crake; it fails with the
following error:
Mar 02 21:22:56 ./src/test/recovery/t/001_stream_rep.pl: Variable
declared in conditional statement at line 88, column 2.  Declare
variables outside of the condition.
([Variables::ProhibitConditionalDeclarations] Severity: 5)

I felt the variable declaration and the assignment need to be split,
since the assignment is inside a conditional statement. I have attached
a patch that fixes the problem.
Thoughts?

Regards,
Vignesh
diff --git a/src/test/recovery/t/001_stream_rep.pl b/src/test/recovery/t/001_stream_rep.pl
index 07a9912..cf5211d 100644
--- a/src/test/recovery/t/001_stream_rep.pl
+++ b/src/test/recovery/t/001_stream_rep.pl
@@ -84,8 +84,9 @@ sub test_target_session_attrs
 	my $node2_host = $node2->host;
 	my $node2_port = $node2->port;
 	my $node2_name = $node2->name;
+	my $target_name;
 
-	my $target_name = $target_node->name if (defined $target_node);
+	$target_name = $target_node->name if (defined $target_node);
 
 	# Build connection string for connection attempt.
 	my $connstr = "host=$node1_host,$node2_host ";


Re: Libpq support to connect to standby server as priority

2021-03-02 Thread vignesh C
On Wed, Mar 3, 2021 at 7:37 AM Tom Lane  wrote:
>
> Greg Nancarrow  writes:
> > I've marked this as "Ready for Committer".
>
> I've pushed this after whacking it around a fair amount.  A lot of
> that was cosmetic, but one thing that wasn't is that I got rid of the
> proposed "which_primary_host" variable.  I thought the logic around
> that was way too messy and probably buggy.  Even if it worked exactly
> as intended, I'm dubious that the design intention was good.  I think
> it makes more sense just to go through the whole server list again
> without the restriction to standby servers.  In particular, that will
> give saner results if the servers' status isn't holding still.
>

Buildfarm machines crake and conchuela have failed after this commit.
I checked the failures; crake is failing because of:
Mar 02 21:22:56 ./src/test/recovery/t/001_stream_rep.pl: Variable declared
in conditional statement at line 88, column 2.  Declare variables outside
of the condition.  ([Variables::ProhibitConditionalDeclarations] Severity:
5)
I have analyzed this and posted a patch at [1], which might fix the
problem.

Conchuela is failing because of:
ok 17 - connect to node standby_1 if mode "standby" and standby_1,primary
listed
ack Broken pipe: write( 13, 'SHOW port;' ) at
/usr/local/lib/perl5/site_perl/IPC/Run/IO.pm line 549.
### Stopping node "primary" using mode immediate
# Running: pg_ctl -D
/home/pgbf/buildroot/HEAD/pgsql.build/src/test/recovery/tmp_check/t_001_stream_rep_primary_data/pgdata
-m immediate stop
waiting for server to shut down... done

I could not find the exact reason for this failure; I'm checking
further into why it is failing.
Thoughts?

[1] -
https://www.postgresql.org/message-id/CALDaNm3L%3DROeb%3D4rKf0XMN0CqrEnn6T%3D-44m4fsDAhcw-%40mail.gmail.com



Regards,
Vignesh


Re: [HACKERS] logical decoding of two-phase transactions

2021-03-05 Thread vignesh C
On Fri, Mar 5, 2021 at 12:21 PM Ajin Cherian  wrote:
>
> On Thu, Mar 4, 2021 at 9:53 PM Peter Smith  wrote:
>
> > [05a] Now syncing the psf file at prepare time
>
> The patch v46-0008 does not handle spooling of streaming prepare if
> the Subscription is configured for both two-phase and streaming.
> I feel that it would be best if we don't support both two-phase and
> streaming together in a subscription in this release.
> Probably a future release could handle this. So, changing the patch to
> not allow streaming and two-phase together.
> This new patch v49 has the following changes.
>
> * Don't support creating a subscription with both streaming and
> two-phase enabled.
> * Don't support altering a subscription enabling streaming if it was
> created with two-phase enabled.
> * Remove stream_prepare callback as a "required" callback, make it an
> optional callback and remove all code related to stream_prepare in the
> pgoutput plugin as well as in worker.c
>
> Also fixed
> * Don't support the alter of subscription setting two-phase. Toggling
> of two-phase mode using the alter command on the subscription can
> cause transactions to be missed and result in an inconsistent replica.
>

Thanks for the updated patch.
Few minor comments:

I'm not sure if we plan to change this workaround; if we are not
planning to change it, we can reword the comments suitably. We
generally don't use "workaround" in our comments.
+   /*
+* Workaround Part 1 of 2:
+*
+* Make sure every tablesync has reached at least SYNCDONE state
+* before letting the apply worker proceed.
+*/
+   elog(DEBUG1,
+"apply_handle_begin_prepare, end_lsn = %X/%X,
final_lsn = %X/%X, lstate_lsn = %X/%X",
+LSN_FORMAT_ARGS(begin_data.end_lsn),
+LSN_FORMAT_ARGS(begin_data.final_lsn),
+LSN_FORMAT_ARGS(MyLogicalRepWorker->relstate_lsn));
+

We should include two_phase in tab completion (in the psql_completion()
function in tab-complete.c):
postgres=# create subscription sub1 connection 'port=5441
dbname=postgres' publication  pub1 with (
CONNECT COPY_DATA   CREATE_SLOT ENABLED
 SLOT_NAME   SYNCHRONOUS_COMMIT

+
+ 
+  It is not allowed to combine streaming set to
+  true and two_phase set to
+  true.
+ 
+
+
+   
+   
+two_phase (boolean)
+
+ 
+  Specifies whether two-phase commit is enabled for this subscription.
+  The default is false.
+ 
+
+ 
+  When two-phase commit is enabled then the decoded
transactions are sent
+  to the subscriber on the PREPARE TRANSACTION. By default,
the transaction
+  preapred on publisher is decoded as normal transaction at commit.
+ 
+
+ 
+  It is not allowed to combine two_phase set to
+  true and streaming set to
+  true.
+ 

It is not allowed to combine streaming set to true and two_phase set to true.
Should this be:
streaming option is not supported along with two_phase option.

Similarly here too:
It is not allowed to combine two_phase set to true and streaming set to true.
Should this be:
two_phase option is not supported along with streaming option.

A few indentation issues are present; we can run pgindent:
+extern void logicalrep_write_prepare(StringInfo out, ReorderBufferTXN *txn,
+
  XLogRecPtr prepare_lsn);
+extern void logicalrep_read_prepare(StringInfo in,
+
 LogicalRepPreparedTxnData *prepare_data);
+extern void logicalrep_write_commit_prepared(StringInfo out,
ReorderBufferTXN* txn,
+
  XLogRecPtr commit_lsn);

ReorderBufferTXN* txn should be ReorderBufferTXN *txn (the pointer star
should be attached to the variable name)

Line exceeds 80 chars:
+   /*
+* Now that we replayed the psf it is no longer
needed. Just delete it.
+*/
+   prepare_spoolfile_delete(psfpath);

There is a typo: preapred should be prepared.
+ 
+  When two-phase commit is enabled then the decoded
transactions are sent
+  to the subscriber on the PREPARE TRANSACTION. By default,
the transaction
+  preapred on publisher is decoded as normal transaction at commit.
+ 

Regards,
Vignesh




Re: [HACKERS] logical decoding of two-phase transactions

2021-03-07 Thread vignesh C
On Mon, Mar 8, 2021 at 7:17 AM Peter Smith  wrote:
>
> Please find attached the latest patch set v52*
>
Few comments:

+logicalrep_read_begin_prepare(StringInfo in,
LogicalRepBeginPrepareData *begin_data)
+{
+   /* read fields */
+   begin_data->final_lsn = pq_getmsgint64(in);
+   if (begin_data->final_lsn == InvalidXLogRecPtr)
+   elog(ERROR, "final_lsn not set in begin message");
+   begin_data->end_lsn = pq_getmsgint64(in);
+   if (begin_data->end_lsn == InvalidXLogRecPtr)
+   elog(ERROR, "end_lsn not set in begin message");
+   begin_data->committime = pq_getmsgint64(in);
+   begin_data->xid = pq_getmsgint(in, 4);
+
+   /* read gid (copy it into a pre-allocated buffer) */
+   strcpy(begin_data->gid, pq_getmsgstring(in));
+}
In logicalrep_read_begin_prepare we validate final_lsn & end_lsn. But
this validation is not done in logicalrep_read_commit_prepared and
logicalrep_read_rollback_prepared. Should we keep it consistent?

@@ -170,5 +237,4 @@ extern void
logicalrep_write_stream_abort(StringInfo out, TransactionId xid,

   TransactionId subxid);
 extern void logicalrep_read_stream_abort(StringInfo in, TransactionId *xid,

  TransactionId *subxid);
-
 #endif /* LOGICAL_PROTO_H */
This change is not required.

@@ -242,15 +244,16 @@ create_replication_slot:
$$ = (Node *) cmd;
}
/* CREATE_REPLICATION_SLOT slot TEMPORARY
LOGICAL plugin */
-   | K_CREATE_REPLICATION_SLOT IDENT
opt_temporary K_LOGICAL IDENT create_slot_opt_list
+   | K_CREATE_REPLICATION_SLOT IDENT
opt_temporary opt_two_phase K_LOGICAL IDENT create_slot_opt_list
{
CreateReplicationSlotCmd *cmd;
cmd =
makeNode(CreateReplicationSlotCmd);
cmd->kind = REPLICATION_KIND_LOGICAL;
cmd->slotname = $2;
cmd->temporary = $3;
-   cmd->plugin = $5;
-   cmd->options = $6;
+   cmd->two_phase = $4;
+   cmd->plugin = $6;
+   cmd->options = $7;
$$ = (Node *) cmd;
}
Should we document two_phase in the below section:
CREATE_REPLICATION_SLOT slot_name [ TEMPORARY ] { PHYSICAL [
RESERVE_WAL ] | LOGICAL output_plugin [ EXPORT_SNAPSHOT |
NOEXPORT_SNAPSHOT | USE_SNAPSHOT ] }
Create a physical or logical replication slot. See Section 27.2.6 for
more about replication slots.


+   while (AnyTablesyncInProgress())
+   {
+   process_syncing_tables(begin_data.final_lsn);
+
+   /* This latch is to prevent 100% CPU looping. */
+   (void) WaitLatch(MyLatch,
+WL_LATCH_SET
| WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
+1000L,
WAIT_EVENT_LOGICAL_SYNC_STATE_CHANGE);
+   ResetLatch(MyLatch);
+   }
Should we have CHECK_FOR_INTERRUPTS inside the while loop?

+   if (begin_data.final_lsn < BiggestTablesyncLSN())
+   {
+   charpsfpath[MAXPGPATH];
+
+   /*
+* Create the spoolfile.
+*/
+   prepare_spoolfile_name(psfpath, sizeof(psfpath),
+
MyLogicalRepWorker->subid, begin_data.gid);
+   prepare_spoolfile_create(psfpath);
We can make this a single-line comment.

+   if (!found)
+   {
+   elog(DEBUG1, "Not found file \"%s\". Create it.", path);
+   psf_cur.vfd = PathNameOpenFile(path, O_RDWR | O_CREAT
| O_TRUNC | PG_BINARY);
+   if (psf_cur.vfd < 0)
+   {
+   ereport(ERROR,
+   (errcode_for_file_access(),
+errmsg("could not create file
\"%s\": %m", path)));
+   }
+   memcpy(psf_cur.name, path, sizeof(psf_cur.name));
+   psf_cur.cur_offset = 0;
+   hentry->allow_delete = true;
+   }
+   else
+   {
+   /*
+* Open the file and seek to the beginning because we
always want to
+* create/overwrite this file.
+*/
+   elog(DEBUG1, "Found file \"%s\". Overwrite it.", path);
+   psf_cur.vfd = PathNameOpenFile(path, O_RDWR | O_CREAT
| O_TRUNC | PG_BINARY);
+   if (psf_cur.vfd < 0)
+  

Re: [HACKERS] logical decoding of two-phase transactions

2021-03-08 Thread vignesh C
On Mon, Mar 8, 2021 at 11:30 AM Ajin Cherian  wrote:
>
> On Fri, Mar 5, 2021 at 9:25 PM vignesh C  wrote:
>
>
> Created new patch v53:

Thanks for the updated patch.
I noticed one issue: the publisher does not stop normally in the
following case:
# Publisher steps
psql -d postgres -c "CREATE TABLE do_write(id serial primary key);"
psql -d postgres -c "INSERT INTO do_write VALUES(generate_series(1,10));"
psql -d postgres -c "CREATE PUBLICATION mypub FOR TABLE do_write;"

# Subscriber steps
psql -d postgres -p  -c "CREATE TABLE do_write(id serial primary key);"
psql -d postgres -p  -c "INSERT INTO do_write VALUES(1);" # to
cause a PK violation
psql -d postgres -p  -c "CREATE SUBSCRIPTION mysub CONNECTION
'host=localhost port=5432 dbname=postgres' PUBLICATION mypub WITH
(two_phase = true);"

# prepare & commit prepared at publisher
psql -d postgres -c \
"begin; insert into do_write values (100); prepare transaction 'test1';"
psql -d postgres -c "commit prepared 'test1';"

Stop publisher:
./pg_ctl -D publisher stop
waiting for server to shut
down...
failed
pg_ctl: server does not shut down

This is because the following process does not exit:
postgres: walsender vignesh 127.0.0.1(41550) START_REPLICATION

It continuously loops at the below:
#0  0x7f1c520d3bca in __libc_pread64 (fd=6, buf=0x555b1b3f7870,
count=8192, offset=0) at ../sysdeps/unix/sysv/linux/pread64.c:29
#1  0x555b1a8f6d20 in WALRead (state=0x555b1b3f1ce0,
buf=0x555b1b3f7870 "\n\321\002", startptr=16777216, count=8192, tli=1,
errinfo=0x7ffe693b78c0) at xlogreader.c:1116
#2  0x555b1ac8ce10 in logical_read_xlog_page
(state=0x555b1b3f1ce0, targetPagePtr=16777216, reqLen=8192,
targetRecPtr=23049936, cur_page=0x555b1b3f7870 "\n\321\002")
at walsender.c:837
#3  0x555b1a8f6040 in ReadPageInternal (state=0x555b1b3f1ce0,
pageptr=23044096, reqLen=5864) at xlogreader.c:608
#4  0x555b1a8f5849 in XLogReadRecord (state=0x555b1b3f1ce0,
errormsg=0x7ffe693b79c0) at xlogreader.c:329
#5  0x555b1ac8ff4a in XLogSendLogical () at walsender.c:2846
#6  0x555b1ac8f1e5 in WalSndLoop (send_data=0x555b1ac8ff0e
) at walsender.c:2289
#7  0x555b1ac8db2a in StartLogicalReplication (cmd=0x555b1b3b78b8)
at walsender.c:1206
#8  0x555b1ac8e4dd in exec_replication_command (
cmd_string=0x555b1b331670 "START_REPLICATION SLOT \"mysub\"
LOGICAL 0/0 (proto_version '2', two_phase 'on', publication_names
'\"mypub\"')") at walsender.c:1646
#9  0x555b1ad04460 in PostgresMain (argc=1, argv=0x7ffe693b7cc0,
dbname=0x555b1b35cc58 "postgres", username=0x555b1b35cc38 "vignesh")
at postgres.c:4323

I feel the publisher should be able to shut down in this case.
Thoughts?

Regards,
Vignesh




Re: [HACKERS] logical decoding of two-phase transactions

2021-03-08 Thread vignesh C
On Mon, Mar 8, 2021 at 6:25 PM Amit Kapila  wrote:
>
> On Mon, Mar 8, 2021 at 4:20 PM vignesh C  wrote:
> >
> > On Mon, Mar 8, 2021 at 11:30 AM Ajin Cherian  wrote:
> > >
> > > On Fri, Mar 5, 2021 at 9:25 PM vignesh C  wrote:
> > >
> > >
> > > Created new patch v53:
> >
> > Thanks for the updated patch.
> > I had noticed one issue, publisher does not get stopped normally in
> > the following case:
> > # Publisher steps
> > psql -d postgres -c "CREATE TABLE do_write(id serial primary key);"
> > psql -d postgres -c "INSERT INTO do_write VALUES(generate_series(1,10));"
> > psql -d postgres -c "CREATE PUBLICATION mypub FOR TABLE do_write;"
> >
> > # Subscriber steps
> > psql -d postgres -p  -c "CREATE TABLE do_write(id serial primary key);"
> > psql -d postgres -p  -c "INSERT INTO do_write VALUES(1);" # to
> > cause a PK violation
> > psql -d postgres -p  -c "CREATE SUBSCRIPTION mysub CONNECTION
> > 'host=localhost port=5432 dbname=postgres' PUBLICATION mypub WITH
> > (two_phase = true);"
> >
> > # prepare & commit prepared at publisher
> > psql -d postgres -c \
> > "begin; insert into do_write values (100); prepare transaction 'test1';"
> > psql -d postgres -c "commit prepared 'test1';"
> >
> > Stop publisher:
> > ./pg_ctl -D publisher stop
> > waiting for server to shut
> > down...
> > failed
> > pg_ctl: server does not shut down
> >
> > This is because the following process does not exit:
> > postgres: walsender vignesh 127.0.0.1(41550) START_REPLICATION
> >
> > It continuously loops at the below:
> >
>
> What happens if you don't set the two_phase option? If that also leads
> to the same error then can you please also check this case on the
> HEAD?

It succeeds without the two_phase option.
I analyzed this issue further; see the details below:
We have the below code in the WalSndDone function, which handles the
walsender exit:
if (WalSndCaughtUp && sentPtr == replicatedPtr &&
!pq_is_send_pending())
{
QueryCompletion qc;

/* Inform the standby that XLOG streaming is done */
SetQueryCompletion(&qc, CMDTAG_COPY, 0);
EndCommand(&qc, DestRemote, false);
pq_flush();

proc_exit(0);
}

But with the two_phase option, replicatedPtr and sentPtr never become
equal:
(gdb) p /x replicatedPtr
$8 = 0x15faa70
(gdb) p /x sentPtr
$10 = 0x15fac50

Whereas without the two_phase option, replicatedPtr and sentPtr become
equal and the walsender exits:
(gdb) p /x sentPtr
$7 = 0x15fae10
(gdb) p /x replicatedPtr
$8 = 0x15fae10

I think that with the two_phase option, replicatedPtr and sentPtr never
become equal, which causes this process to hang.

Regards,
Vignesh




Re: [HACKERS] logical decoding of two-phase transactions

2021-03-08 Thread vignesh C
On Tue, Mar 9, 2021 at 9:14 AM Peter Smith  wrote:
>
> On Mon, Mar 8, 2021 at 4:58 PM vignesh C  wrote:
> >
> > LOGICAL_REP_MSG_TYPE = 'Y',
> > +   LOGICAL_REP_MSG_BEGIN_PREPARE = 'b',
> > +   LOGICAL_REP_MSG_PREPARE = 'P',
> > +   LOGICAL_REP_MSG_COMMIT_PREPARED = 'K',
> > +   LOGICAL_REP_MSG_ROLLBACK_PREPARED = 'r',
> > LOGICAL_REP_MSG_STREAM_START = 'S',
> > LOGICAL_REP_MSG_STREAM_END = 'E',
> > LOGICAL_REP_MSG_STREAM_COMMIT = 'c',
> > -   LOGICAL_REP_MSG_STREAM_ABORT = 'A'
> > +   LOGICAL_REP_MSG_STREAM_ABORT = 'A',
> > +   LOGICAL_REP_MSG_STREAM_PREPARE = 'p'
> >  } LogicalRepMsgType;
> > As we start adding more and more features, we will have to start
> > adding more message types, using meaningful characters might become
> > difficult. Should we start using numeric instead for the new feature
> > getting added?
>
> This may or may not become a problem sometime in the future, but I
> think the feedback is not really specific to the current patch set so
> I am skipping it at this time.
>
> If you want, maybe create it as a separate thread, Is it OK?

I was thinking of changing the newly added message types to something
like below:
> LOGICAL_REP_MSG_TYPE = 'Y',
> +   LOGICAL_REP_MSG_BEGIN_PREPARE = 1,
> +   LOGICAL_REP_MSG_PREPARE = 2,
> +   LOGICAL_REP_MSG_COMMIT_PREPARED = 3,
> +   LOGICAL_REP_MSG_ROLLBACK_PREPARED = 4,
> LOGICAL_REP_MSG_STREAM_START = 'S',
> LOGICAL_REP_MSG_STREAM_END = 'E',
> LOGICAL_REP_MSG_STREAM_COMMIT = 'c',
> -   LOGICAL_REP_MSG_STREAM_ABORT = 'A'
> +   LOGICAL_REP_MSG_STREAM_ABORT = 'A',
> +   LOGICAL_REP_MSG_STREAM_PREPARE = 5
>  } LogicalRepMsgType;

Changing these values at a later time may become difficult, as it can
break backward compatibility. But if you feel the existing values are
better, we can keep them as they are and revisit this when we add more
message types.

Regards,
Vignesh




Re: [HACKERS] logical decoding of two-phase transactions

2021-03-08 Thread vignesh C
On Tue, Mar 9, 2021 at 11:01 AM Amit Kapila  wrote:
>
> On Mon, Mar 8, 2021 at 8:09 PM vignesh C  wrote:
> >
> > On Mon, Mar 8, 2021 at 6:25 PM Amit Kapila  wrote:
> > >
> >
> > I think in case of two_phase option, replicatedPtr and sentPtr never
> > becomes the same which causes this process to hang.
> >
>
> The reason is that because on subscriber you have created a situation
> (PK violation) where it is not able to proceed with initial tablesync
> and then the apply worker is waiting for tablesync to complete, so it
> is not able to process new messages. I think as soon as you remove the
> duplicate row from the table it will be able to proceed.
>
> Now, we can see a similar situation even in HEAD without 2PC though it
> is a bit tricky to reproduce. Basically, when the tablesync worker is
> in SUBREL_STATE_CATCHUP state and it has a lot of WAL to process then
> the apply worker is just waiting for it to finish applying all the WAL
> and won't process any message. So at that time, if you try to stop the
> publisher you will see the same behavior. I have simulated a lot of
> WAL processing by manually debugging the tablesync and not proceeding
> for some time. You can also try by adding sleep after the tablesync
> worker has set the state as SUBREL_STATE_CATCHUP.
>
> So, I feel this is just an expected behavior and users need to
> manually fix the situation where tablesync worker is not able to
> proceed due to PK violation. Does this make sense?
>

Thanks for the detailed explanation. This behavior looks similar to
the issue you described, so we can ignore it, as it does not seem to be
caused by this patch. I also noticed that if we resolve the PK
violation by deleting the record that causes it, the server is able to
stop immediately without any issue.
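
Concretely, in the repro above, removing the conflicting row on the
subscriber lets the tablesync worker proceed, after which the publisher
shuts down cleanly (the table and value are taken from the earlier
steps):

-- on the subscriber
DELETE FROM do_write WHERE id = 1;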

Regards,
Vignesh




Re: [HACKERS] logical decoding of two-phase transactions

2021-03-09 Thread vignesh C
On Tue, Mar 9, 2021 at 10:46 AM Peter Smith  wrote:
>
> Please find attached the latest patch set v54*
>
> Differences from v53* are:
>
> * Rebased to HEAD @ today
>
> * Addresses some recent feedback issues for patch 0001
>
> Feedback from Amit @ 7/March [ak]
> - (36) Fixed. Comment about the psf replay.
> - (37) Fixed. prepare_spoolfile_create, check file already exists (on
> disk) instead of just checking HTAB.
> - (38) Fixed. Added comment about potential overwrite of existing file.
>
> Feedback from Vignesh @ 8/March [vc]
> - (45) Fixed. Changed some comment to be single-line comments (e.g. if
> they only apply to a single following stmt)
> - (46) Fixed. prepare_spoolfile_create, refactored slightly to make
> more use of common code in if/else
> - (47) Skipped. This was feedback suggesting using ints instead of
> character values for message type enum.
>

Thanks for the updated patch.
Few comments:

+# Setup logical replication
+my $publisher_connstr = $node_publisher->connstr . ' dbname=postgres';
+$node_publisher->safe_psql('postgres',
+   "CREATE PUBLICATION tap_pub");
+$node_publisher->safe_psql('postgres',
+   "ALTER PUBLICATION tap_pub ADD TABLE tab_full");

This can be changed to:
$node_publisher->safe_psql('postgres',
"CREATE PUBLICATION tap_pub FOR TABLE tab_full");

We can make similar changes in:
+# node_A (pub) -> node_B (sub)
+my $node_A_connstr = $node_A->connstr . ' dbname=postgres';
+$node_A->safe_psql('postgres',
+   "CREATE PUBLICATION tap_pub_A");
+$node_A->safe_psql('postgres',
+   "ALTER PUBLICATION tap_pub_A ADD TABLE tab_full");
+my $appname_B = 'tap_sub_B';
+$node_B->safe_psql('postgres', "
+   CREATE SUBSCRIPTION tap_sub_B
+   CONNECTION '$node_A_connstr application_name=$appname_B'
+   PUBLICATION tap_pub_A");
+
+# node_B (pub) -> node_C (sub)
+my $node_B_connstr = $node_B->connstr . ' dbname=postgres';
+$node_B->safe_psql('postgres',
+   "CREATE PUBLICATION tap_pub_B");
+$node_B->safe_psql('postgres',
+   "ALTER PUBLICATION tap_pub_B ADD TABLE tab_full");

+# rollback post the restart
+$node_publisher->safe_psql('postgres',
+   "ROLLBACK PREPARED 'test_prepared_tab';");
+$node_publisher->poll_query_until('postgres', $caughtup_query)
+   or die "Timed out while waiting for subscriber to catch up";
+
+# check inserts are rolled back
+$result = $node_subscriber->safe_psql('postgres',
+   "SELECT count(*) FROM tab_full where a IN (12,13);");
+is($result, qq(0), 'Rows inserted via 2PC are visible on the subscriber');

"Rows inserted via 2PC are visible on the subscriber"
should be something like:
"Rows rolled back are not on the subscriber"

git diff --check
src/backend/replication/logical/worker.c:3704: trailing whitespace.

Regards,
Vignesh




Do we support upgrade of logical replication?

2021-03-10 Thread vignesh C
Hi,

I was reviewing the logical decoding of two-phase transactions
feature, and while reviewing it I checked whether there is any impact
on publisher/subscriber upgrades.

I checked the existing pg_upgrade behaviour with logical replication. I
set up a logical replication instance with a publisher and a
subscriber, with the subscriber subscribed to a table. Then I tried
upgrading the publisher and the subscriber individually. After the
upgrade I noticed the following:

1) Pg_logical/mappings files were not copied in the upgraded data folder:
--
Pg_logical contents in old data folder:
publisher/pg_logical/replorigin_checkpoint
publisher/pg_logical/mappings:
map-32cb-4df-0_1767088-225-225
publisher/pg_logical/snapshots:
0-1643650.snap

New upgraded data folder:
publisher1/pg_logical/replorigin_checkpoint
publisher1/pg_logical/mappings:
publisher1/pg_logical/snapshots:

2) Replication slots were not copied:
select * from pg_replication_slots;
slot_name | plugin | slot_type | datoid | database | temporary |
active | active_pid | xmin | catalog_xmin | restart_lsn |
confirmed_flush_lsn | wal_status | safe_wal_size | t
wo_phase
---++---++--+---+++--+--+-+-++---+--
-
(0 rows)

3) The subscriber is in disabled mode in the upgraded data:
select * from pg_subscription;
  oid  | subdbid | subname | subowner | subenabled | subbinary |
substream | subtwophase |   subconninfo|
subslotname | subsynccommit | subpublicati
ons
---+-+-+--++---+---+-+--+-+---+-

16404 |   16401 | mysub   |   10 | f  | f | f
   | f   | host=localhost port=5432 dbname=postgres | mysub
   | off   | {mypub}
(1 row)

4) The pg_subscription_rel contents also were not copied:
select * from pg_subscription_rel;
 srsubid | srrelid | srsubstate | srsublsn
-+-++--
(0 rows)

5) During logical decoding, the decoded changes may be serialized to
files based on the logical_decoding_work_mem configuration. These files
were not copied during the upgrade either.

Do we support upgrading a logical replication setup? If it is
supported, could someone point me to the documentation on how to
upgrade logical replication?
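
For what it's worth, if one wanted to try resuming replication manually
after such an upgrade, the subscription from observation 3 would at
least need to be re-enabled, for example:

-- on the upgraded subscriber
ALTER SUBSCRIPTION mysub ENABLE;

although, given the missing replication slots and origin state noted
above, that alone is unlikely to be sufficient.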

Regards,
Vignesh




Re: [HACKERS] logical decoding of two-phase transactions

2021-03-11 Thread vignesh C
On Thu, Mar 11, 2021 at 7:20 AM Peter Smith  wrote:
>
> On Thu, Mar 11, 2021 at 12:46 PM Peter Smith  wrote:
> >
> > Please find attached the latest patch set v57*
> >
> > Differences from v56* are:
> >
> > * Rebased to HEAD @ today
> >
> > * Addresses the following feedback issues:
> >
> > (24) [vc-0305] Done. Ran pgindent for all patch 0001 source files.
> >
> > (49) [ak-0308] Fixed. In apply_handle_begion_prepare, don't set
> > in_remote_transaction if psf spooling
> >
> > (50) [ak-0308] Fixed. In apply_handle_prepare, assert
> > !in_remote_transaction if psf spooling.
> >
> > (52) [vc-0309] Done. Patch 0002. Simplify the way test 020 creates the
> > publication.
> >
> > (53) [vc-0309] Done. Patch 0002. Simplify the way test 022 creates the
> > publication.
> >
> > -
> > [vc-0305] 
> > https://www.postgresql.org/message-id/CALDaNm1rRG2EUus%2BmFrqRzEshZwJZtxja0rn_n3qXGAygODfOA%40mail.gmail.com
> > [vc-0309] 
> > https://www.postgresql.org/message-id/CALDaNm0QuncAis5OqtjzOxAPTZRn545JLqfjFEJwyRjUH-XvEw%40mail.gmail.com
> > [ak-0308] 
> > https://www.postgresql.org/message-id/CAA4eK1%2BoSUU77T92FueDJWsp%3DFjTroNaNC-K45Dgdr7f18aBFA%40mail.gmail.com
> >
> > Kind Regards,
> > Peter Smith.
> > Fujitsu Australia
>
> Oops. I posted the wrong patch set in my previous email.
>
> Here are the correct ones for v57*.

Thanks for the updated patch; a few comments:
--- a/src/backend/commands/subscriptioncmds.c
+++ b/src/backend/commands/subscriptioncmds.c
@@ -67,7 +67,8 @@ parse_subscription_options(List *options,
   char **synchronous_commit,
   bool *refresh,
   bool *binary_given,
bool *binary,
-  bool
*streaming_given, bool *streaming)
+  bool
*streaming_given, bool *streaming,
+  bool
*twophase_given, bool *twophase)


I felt twophase_given can be a local variable; it need not be added as
a function parameter, since it is not used outside the function.

The corresponding changes can be done here too:
@@ -358,6 +402,8 @@ CreateSubscription(CreateSubscriptionStmt *stmt,
bool isTopLevel)
boolcopy_data;
boolstreaming;
boolstreaming_given;
+   booltwophase;
+   booltwophase_given;
char   *synchronous_commit;
char   *conninfo;
char   *slotname;
@@ -382,7 +428,8 @@ CreateSubscription(CreateSubscriptionStmt *stmt,
bool isTopLevel)
   &synchronous_commit,
   NULL,
 /* no "refresh" */

&binary_given, &binary,
-
&streaming_given, &streaming);
+
&streaming_given, &streaming,
+
&twophase_given, &twophase);
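
In other words, something like this inside parse_subscription_options
(only a sketch to illustrate the idea, not the actual patch):

	/* Keep the "given" flag local; callers never inspect it. */
	bool		twophase_given = false;

	/* ... and in the options loop, mirroring the streaming case: */
	else if (strcmp(defel->defname, "two_phase") == 0)
	{
		if (twophase_given)
			ereport(ERROR,
					(errcode(ERRCODE_SYNTAX_ERROR),
					 errmsg("conflicting or redundant options")));
		twophase_given = true;
		*twophase = defGetBoolean(defel);
	}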


--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -2930,6 +2930,7 @@ maybe_reread_subscription(void)
strcmp(newsub->slotname, MySubscription->slotname) != 0 ||
newsub->binary != MySubscription->binary ||
newsub->stream != MySubscription->stream ||
+   newsub->twophase != MySubscription->twophase ||
!equal(newsub->publications, MySubscription->publications))
I think this is not possible, as we don't allow changing the two_phase
option; should this be an assert?
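
Concretely, what I have in mind is something like this (just a sketch,
relying on the fact that ALTER SUBSCRIPTION does not allow changing
two_phase):

	Assert(newsub->twophase == MySubscription->twophase);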

@@ -252,6 +254,16 @@ parse_output_parameters(List *options, uint32
*protocol_version,

*enable_streaming = defGetBoolean(defel);
}
+   else if (strcmp(defel->defname, "two_phase") == 0)
+   {
+   if (twophase_given)
+   ereport(ERROR,
+   (errcode(ERRCODE_SYNTAX_ERROR),
+errmsg("conflicting
or redundant options")));
+   twophase_given = true;
+
+   *enable_twophase = defGetBoolean(defel);
+   }

We have the following check in parse_subscription_options:
if (twophase && *twophase_given && *twophase)
{
if (streaming && *streaming_given && *streaming)
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("%s and %s are mutually exclusive options",
"two_phase = true", "streaming = true")));
}
Should we have a similar check in parse_output_parameters?
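
For reference, a rough sketch of what such a check in
parse_output_parameters could look like, placed after the option loop
(streaming_given is assumed to be the corresponding local flag; this is
only an illustration, not the actual patch):

	if (twophase_given && *enable_twophase &&
		streaming_given && *enable_streaming)
		ereport(ERROR,
				(errcode(ERRCODE_SYNTAX_ERROR),
				 errmsg("%s and %s are mutually exclusive options",
						"two_phase = true", "streaming = true")));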

Regards,
Vignesh




Re: [HACKERS] logical decoding of two-phase transactions

2021-03-12 Thread vignesh C
On Fri, Mar 12, 2021 at 2:29 PM Peter Smith  wrote:
>
> On Fri, Mar 12, 2021 at 4:07 PM vignesh C  wrote:
>
> Hi Vignesh,
>
> Thanks for the review comments.
>
> But can you please resend it with each feedback enumerated as 1. 2.
> 3., or have some other clear separation for each comment.
>
> (Because everything is mushed together I am not 100% sure if your
> comment text applies to the code above or below it)

1) I felt twophase_given can be a local variable; it need not be added
as a function parameter, since it is not used outside the function.
--- a/src/backend/commands/subscriptioncmds.c
+++ b/src/backend/commands/subscriptioncmds.c
@@ -67,7 +67,8 @@ parse_subscription_options(List *options,
   char **synchronous_commit,
   bool *refresh,
   bool *binary_given,
bool *binary,
-  bool
*streaming_given, bool *streaming)
+  bool
*streaming_given, bool *streaming,
+  bool
*twophase_given, bool *twophase)

The corresponding changes should be done here too:
@@ -358,6 +402,8 @@ CreateSubscription(CreateSubscriptionStmt *stmt,
bool isTopLevel)
boolcopy_data;
boolstreaming;
boolstreaming_given;
+   booltwophase;
+   booltwophase_given;
char   *synchronous_commit;
char   *conninfo;
char   *slotname;
@@ -382,7 +428,8 @@ CreateSubscription(CreateSubscriptionStmt *stmt,
bool isTopLevel)
   &synchronous_commit,
   NULL,
 /* no "refresh" */

&binary_given, &binary,
-
&streaming_given, &streaming);
+
&streaming_given, &streaming,
+
&twophase_given, &twophase);

2) I think this is not possible, as we don't allow changing the
two_phase option; should this be an assert?
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -2930,6 +2930,7 @@ maybe_reread_subscription(void)
strcmp(newsub->slotname, MySubscription->slotname) != 0 ||
newsub->binary != MySubscription->binary ||
newsub->stream != MySubscription->stream ||
+   newsub->twophase != MySubscription->twophase ||
!equal(newsub->publications, MySubscription->publications))

3) We have the following check in parse_subscription_options:
if (twophase && *twophase_given && *twophase)
{
if (streaming && *streaming_given && *streaming)
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("%s and %s are mutually exclusive options",
"two_phase = true", "streaming = true")));
}

Should we have a similar check in parse_output_parameters?
@@ -252,6 +254,16 @@ parse_output_parameters(List *options, uint32
*protocol_version,

*enable_streaming = defGetBoolean(defel);
}
+   else if (strcmp(defel->defname, "two_phase") == 0)
+   {
+   if (twophase_given)
+   ereport(ERROR,
+   (errcode(ERRCODE_SYNTAX_ERROR),
+errmsg("conflicting
or redundant options")));
+   twophase_given = true;
+
+   *enable_twophase = defGetBoolean(defel);
+   }

Regards,
Vignesh




Re: locking [user] catalog tables vs 2pc vs logical rep

2021-03-15 Thread vignesh C
On Tue, Feb 23, 2021 at 3:59 AM Andres Freund  wrote:
>
> Hi,
>
> The 2pc decoding added in
>
> commit a271a1b50e9bec07e2ef3a05e38e7285113e4ce6
> Author: Amit Kapila 
> Date:   2021-01-04 08:34:50 +0530
>
> Allow decoding at prepare time in ReorderBuffer.
>
> has a deadlock danger when used in a way that takes advantage of
> separate decoding of the 2PC PREPARE.
>
>
> I assume the goal of decoding the 2PC PREPARE is so one can wait for the
> PREPARE to have logically replicated, before doing the COMMIT PREPARED.
>
>
> However, currently it's pretty easy to get into a state where logical
> decoding cannot progress until the 2PC transaction has
> committed/aborted. Which essentially would lead to undetected deadlocks.
>
> The problem is that catalog tables accessed during logical decoding need
> to get locked (otherwise e.g. a table rewrite could happen
> concurrently). But if the prepared transaction itself holds a lock on a
> catalog table, logical decoding will block on that lock - which won't be
> released until replication progresses. A deadlock.
>
> A trivial example:
>
> SELECT pg_create_logical_replication_slot('test', 'test_decoding');
> CREATE TABLE foo(id serial primary key);
> BEGIN;
> LOCK pg_class;
> INSERT INTO foo DEFAULT VALUES;
> PREPARE TRANSACTION 'foo';
>
> -- hangs waiting for pg_class to be unlocked
> SELECT pg_logical_slot_get_changes('test', NULL, NULL, 'two-phase-commit', 
> '1');
>
>
> Now, more realistic versions of this scenario would probably lock a
> 'user catalog table' containing replication metadata instead of
> pg_class, but ...
>
>
> At first this seems to be a significant issue. But on the other hand, if
> you were to shut the cluster down in this situation (or disconnect all
> sessions), you have broken cluster on your hand - without logical
> decoding being involved. As it turns out, we need to read pg_class to
> log in...  And I can't remember this being reported to be a problem?
>
>
> Perhaps all that we need to do is to disallow 2PC prepare if [user]
> catalog tables have been locked exclusively? Similar to how we're
> disallowing preparing tables with temp table access.
>

I also felt we should not allow preparing a transaction that has locked
system tables, as it prevents creating a new session after a restart
and also causes a deadlock during logical decoding of the prepared
transaction.
I have made a patch to make PREPARE TRANSACTION fail in this scenario;
the patch is attached.
Thoughts?
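
To illustrate the idea (this is only a sketch of the approach, not the
attached patch; the XACT_FLAGS_ACCESSEDCATALOGTABLE flag name is
hypothetical, and user catalog tables would need a similar check):

	/* e.g. in LOCK TABLE processing, remember that a catalog was locked */
	if (IsCatalogRelationOid(reloid))
		MyXactFlags |= XACT_FLAGS_ACCESSEDCATALOGTABLE;

	/* and in PrepareTransaction(), next to the temp-object check */
	if (MyXactFlags & XACT_FLAGS_ACCESSEDCATALOGTABLE)
		ereport(ERROR,
				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
				 errmsg("cannot PREPARE a transaction that has locked system tables or user catalog tables")));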

Regards,
Vignesh
From 93014a12a6a7c576cdd11028eaa0e7fa2afa7205 Mon Sep 17 00:00:00 2001
From: vignesh 
Date: Mon, 15 Mar 2021 17:21:04 +0530
Subject: [PATCH v1] Fail prepared transaction if transaction has locked system
 tables/user catalog tables.

Don't allow PREPARE TRANSACTION if we've locked a system table or a user
catalog table in this transaction. Allowing such prepared transactions
results in a hang during logical decoding of the prepared transaction,
since logical decoding itself needs to lock the system table.
---
 doc/src/sgml/ref/prepare_transaction.sgml |   8 ++
 src/backend/access/transam/xact.c |  14 +++
 src/backend/commands/lockcmds.c   |  17 +++
 src/include/access/xact.h |   7 ++
 src/test/regress/expected/prepared_xacts.out  | 102 
 .../regress/expected/prepared_xacts_1.out | 112 ++
 src/test/regress/sql/prepared_xacts.sql   |  75 
 7 files changed, 335 insertions(+)

diff --git a/doc/src/sgml/ref/prepare_transaction.sgml b/doc/src/sgml/ref/prepare_transaction.sgml
index f4f6118ac3..ee8e4072f5 100644
--- a/doc/src/sgml/ref/prepare_transaction.sgml
+++ b/doc/src/sgml/ref/prepare_transaction.sgml
@@ -106,6 +106,14 @@ PREPARE TRANSACTION transaction_id
tied to the current session to be useful in a transaction to be prepared.
   
 
+  
+   It is not currently allowed to PREPARE a transaction that
+   has locked system table(s) or user defined catalog table(s). These features
+   are too tightly tied to logical decoding of
+   PREPARE TRANSACTION and creation of new login session to
+   be useful in a transaction to be prepared.
+  
+
   
If the transaction modified any run-time parameters with SET
(without the LOCAL option),
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index 6395a9b240..d5d54a9e3b 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -2448,6 +2448,20 @@ PrepareTransaction(void)
 (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
  errmsg("cannot PREPARE a transaction that has operated on temporary objects")));
 
+	/*
+	 * Don't allow PREPARE TRANSACTION if we've locked a system table or an user
+	 * catalog table in this transaction. Allowing the prepared transactions
+	 * when it locked system tables will result in deadlock while logical
+	 * decoding of prepared transactions as logical decoding of prepared
+	 * 

Re: subscriptionCheck failures

2021-03-16 Thread vignesh C
On Tue, Mar 16, 2021 at 12:29 PM Amit Kapila  wrote:
>
> On Tue, Mar 16, 2021 at 9:00 AM Amit Kapila  wrote:
> >
> > On Mon, Mar 15, 2021 at 6:00 PM Thomas Munro  wrote:
> > >
> > > Hi,
> > >
> > > This seems to be a new low frequency failure, I didn't see it mentioned 
> > > already:
> > >
> >
> > Thanks for reporting, I'll look into it.
> >
>
> By looking at the logs [1] in the buildfarm, I think I know what is
> going on here. After Create Subscription, the tablesync worker is
> launched and tries to create the slot for doing the initial copy but
> before it could finish creating the slot, we issued the Drop
> Subscription which first stops the tablesync worker and then tried to
> drop its slot. Now, it is quite possible that by the time Drop
> Subscription tries to drop the tablesync slot, it is not yet created.
> We treat this condition okay and just Logs the message. I don't think
> this is an issue because anyway generally such a slot created on the
> server will be dropped before we persist it but the test was checking
> the existence of slots on server before it gets dropped. I think we
> can avoid such a situation by preventing cancel/die interrupts while
> creating tablesync slot.
>
> This is a timing issue, so I have reproduced it via debugger and
> tested that the attached patch fixes it.
>

Thanks for the patch.
I was able to reproduce the issue using a debugger by making it wait at
CreateReplicationSlot. After applying the patch, the issue is resolved.

A few minor comments:
1) subscrition should be subscription in the below change:
+ * Prevent cancel/die interrupts while creating slot here because it is
+ * possible that before the server finishes this command a concurrent drop
+ * subscrition happens which would complete without removing this slot
+ * leading to a dangling slot on the server.
  */

2) "finishes this command a concurrent drop" should be "finishes this
command, a concurrent drop" in the below change:
+ * Prevent cancel/die interrupts while creating slot here because it is
+ * possible that before the server finishes this command a concurrent drop
+ * subscrition happens which would complete without removing this slot
+ * leading to a dangling slot on the server.
  */

Regards,
Vignesh




Re: [HACKERS] logical decoding of two-phase transactions

2021-03-16 Thread vignesh C
On Mon, Mar 15, 2021 at 6:14 PM Ajin Cherian  wrote:
>
> On Mon, Mar 15, 2021 at 2:04 PM Amit Kapila  wrote:
>>
>> I think something on these lines should be much
>> easier than the spool-file implementation unless we see any problem
>> with this idea.
>>
>
> Here's a new patch-set that implements this new solution proposed by Amit.

Thanks for the updated patch.
A few comments:
1) These are no longer needed, as the corresponding structures have been
removed by the new changes.
@@ -1959,6 +1962,8 @@ ProtocolVersion
 PrsStorage
 PruneState
 PruneStepResult
+PsfFile
+PsfHashEntry

2) "Binary mode and streaming and two_phase" should be "Binary mode,
streaming and two_phase" in the below code:
@@ -6097,13 +6097,15 @@ describeSubscriptions(const char *pattern, bool verbose)

if (verbose)
{
-   /* Binary mode and streaming are only supported in v14
and higher */
+   /* Binary mode and streaming and two_phase are only
supported in v14 and higher */
if (pset.sversion >= 14)
appendPQExpBuffer(&buf,

3) There are still some references to the psf spoolfile; these should be
removed. Also check whether the assert should be <= or ==.
+   /*
+* Normally, prepare_lsn == remote_final_lsn, but if this
prepare message
+* was dispatched via the psf spoolfile replay then the remote_final_lsn
+* is set to commit lsn instead. Hence the <= instead of == check below.
+*/
+   Assert(prepare_data.prepare_lsn <= remote_final_lsn);

4) Similarly in the below code:
+   /*
+* It is possible that we haven't received prepare because it occurred
+* before walsender reached a consistent point in which case we need to
+* skip rollback prepared.
+*
+* And we also skip the FinishPreparedTransaction if we're using the
+* Prepare Spoolfile (using_psf) because in that case there is
no matching
+* PrepareTransactionBlock done yet.
+*/
+   if (LookupGXact(rollback_data.gid, rollback_data.prepare_end_lsn,
+   rollback_data.preparetime))
+   {

5) Should this debugging block still be present:
+#if 1
+   /* This is just debugging, for confirmation the update worked. */
+   {
+   Subscription *new_s;
+
+   StartTransactionCommand();
+   new_s = GetSubscription(MySubscription->oid, false);
+   CommitTransactionCommand();
+   }
+#endif

Regards,
Vignesh




Re: [HACKERS] logical decoding of two-phase transactions

2021-03-16 Thread vignesh C
On Mon, Mar 15, 2021 at 6:14 PM Ajin Cherian  wrote:
>
> On Mon, Mar 15, 2021 at 2:04 PM Amit Kapila  wrote:
>>
>> I think something on these lines should be much
>> easier than the spool-file implementation unless we see any problem
>> with this idea.
>>
>
> Here's a new patch-set that implements this new solution proposed by Amit.

Another couple of comments:
1) In the below code, should the Assert be changed to something like the
following check:
if (!HeapTupleIsValid(tup))
elog(ERROR, "cache lookup failed for subscription %u", MySubscription->oid);

+   rel = table_open(SubscriptionRelationId, RowExclusiveLock);
+   tup = SearchSysCacheCopy1(SUBSCRIPTIONOID,
ObjectIdGetDatum(MySubscription->oid));
+   Assert(HeapTupleIsValid(tup));

2) The table_states_not_ready global variable is used immediately after
the call to FetchTableStates; we could make FetchTableStates return the
value (or take it as an output argument) so that these global variables
can be removed.
+static List *table_states_not_ready = NIL;
+static List *table_states_all = NIL;
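
To illustrate the suggestion (the signature below is assumed purely for
illustration; the function in the patch looks different):

	/* Hand the lists back to the caller instead of using file-level globals. */
	static void
	FetchTableStates(List **table_states_all, List **table_states_not_ready)
	{
		*table_states_all = NIL;
		*table_states_not_ready = NIL;

		/* ... scan pg_subscription_rel and append to both lists ... */
	}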

Regards,
Vignesh




Re: Parallel copy

2020-10-27 Thread vignesh C
On Wed, Oct 21, 2020 at 4:20 PM Bharath Rupireddy
 wrote:
>
> On Wed, Oct 21, 2020 at 3:18 PM Bharath Rupireddy
>  wrote:
> >
> > 17. Remove extra lines after #define IsHeaderLine()
> > (cstate->header_line && cstate->cur_lineno == 1) in copy.h
> >
>
>  I missed one comment:
>
>  18. I think we need to treat the number of parallel workers as an
> integer similar to the parallel option in vacuum.
>
> postgres=# copy t1 from stdin with(parallel '1');  < - we
> should not allow this.
> Enter data to be copied followed by a newline.
>
> postgres=# vacuum (parallel '1') t1;
> ERROR:  parallel requires an integer value
>

I have made the behavior the same as vacuum's.
This is addressed in the v9 patch shared at [1].
[1] - 
https://www.postgresql.org/message-id/caldanm1caonkfdn6k72dsirpgqngvwxql7tjeihz58opnp9...@mail.gmail.com

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com




Re: Parallel copy

2020-10-27 Thread vignesh C
Thanks Heikki for reviewing and providing your comments. Please find
my thoughts below.

On Fri, Oct 23, 2020 at 2:01 PM Heikki Linnakangas  wrote:
>
> I had a brief look at at this patch. Important work! A couple of first
> impressions:
>
> 1. The split between patches
> 0002-Framework-for-leader-worker-in-parallel-copy.patch and
> 0003-Allow-copy-from-command-to-process-data-from-file.patch is quite
> artificial. All the stuff introduced in the first is unused until the
> second patch is applied. The first patch introduces a forward
> declaration for ParallelCopyData(), but the function only comes in the
> second patch. The comments in the first patch talk about
> LINE_LEADER_POPULATING and LINE_LEADER_POPULATED, but the enum only
> comes in the second patch. I think these have to merged into one. If you
> want to split it somehow, I'd suggest having a separate patch just to
> move CopyStateData from copy.c to copy.h. The subsequent patch would
> then be easier to read as you could see more easily what's being added
> to CopyStateData. Actually I think it would be better to have a new
> header file, copy_internal.h, to hold CopyStateData and the other
> structs, and keep copy.h as it is.
>

I have merged the 0002 & 0003 patches, and moved a few things, such as
the creation of copy_internal.h and the move of CopyStateData from
copy.c into copy_internal.h, into the 0001 patch.

> 2. This desperately needs some kind of a high-level overview of how it
> works. What is a leader, what is a worker? Which process does each step
> of COPY processing, like reading from the file/socket, splitting the
> input into lines, handling escapes, calling input functions, and
> updating the heap and indexes? What data structures are used for the
> communication? How does is the work synchronized between the processes?
> There are comments on those individual aspects scattered in the patch,
> but if you're not already familiar with it, you don't know where to
> start. There's some of that in the commit message, but it needs to be
> somewhere in the source code, maybe in a long comment at the top of
> copyparallel.c.
>

Added it in copyparallel.c

> 3. I'm surprised there's a separate ParallelCopyLineBoundary struct for
> every input line. Doesn't that incur a lot of synchronization overhead?
> I haven't done any testing, this is just my gut feeling, but I assumed
> you'd work in batches of, say, 100 or 1000 lines each.
>

Data read from the file is stored in DSM, which is of size 64k * 1024.
The leader parses the data and identifies each line boundary: which
data block the line starts in, the starting offset within that block,
and the line size. This information is kept in
ParallelCopyLineBoundary. As you said, each worker processes a batch of
WORKER_CHUNK_COUNT (64) lines at a time. Performance test results for
parallel copy are available at [1]. This is addressed in the v9 patch
shared at [2].

[1] 
https://www.postgresql.org/message-id/CALj2ACWeQVd-xoQZHGT01_33St4xPoZQibWz46o7jW1PE3XOqQ%40mail.gmail.com
[2] - 
https://www.postgresql.org/message-id/caldanm1caonkfdn6k72dsirpgqngvwxql7tjeihz58opnp9...@mail.gmail.com
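
Roughly, the per-line bookkeeping described above looks like this (a
simplified sketch; the field names are illustrative and differ from the
actual ParallelCopyLineBoundary in the patch):

	typedef struct LineBoundarySketch
	{
		uint32		first_block;	/* DSM data block the line starts in */
		uint32		start_offset;	/* offset of the line within that block */
		uint32		line_size;		/* size of the line in bytes */
		pg_atomic_uint32 line_state;	/* populating / populated / processed */
	} LineBoundarySketch;

	/* Workers claim WORKER_CHUNK_COUNT (64) such entries at a time. */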

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com




Re: Parallel copy

2020-10-27 Thread vignesh C
On Wed, Oct 21, 2020 at 3:50 PM Amit Kapila  wrote:
>
> On Wed, Oct 21, 2020 at 3:19 PM Bharath Rupireddy
>  wrote:
> >
> >
> > 9. Instead of calling CopyStringToSharedMemory() for each string
> > variable, can't we just create a linked list of all the strings that
> > need to be copied into shm and call CopyStringToSharedMemory() only
> > once? We could avoid 5 function calls?
> >
>
> If we want to avoid different function calls then can't we just store
> all these strings in a local structure and use it? That might improve
> the other parts of code as well where we are using these as individual
> parameters.
>

I have made one structure, SerializedListToStrCState, to store all the
variables. The rest of the common variables are copied directly from &
into cstate.

> > 10. Similar to above comment: can we fill all the required
> > cstate->variables inside the function CopyNodeFromSharedMemory() and
> > call it only once? In each worker we could save overhead of 5 function
> > calls.
> >
>
> Yeah, that makes sense.
>

I feel keeping it this way makes the code more readable, and this is
not in a performance-intensive tight loop. I'm retaining the change as
is unless we feel this will have an impact.

This is addressed in the v9 patch shared at [1].
[1] - 
https://www.postgresql.org/message-id/caldanm1caonkfdn6k72dsirpgqngvwxql7tjeihz58opnp9...@mail.gmail.com

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com




Re: Parallel copy

2020-10-27 Thread vignesh C
Thanks Ashutosh for reviewing and providing your comments.

On Fri, Oct 23, 2020 at 5:43 PM Ashutosh Sharma  wrote:
>
> Hi Vignesh,
>
> Thanks for the updated patches. Here are some more comments that I can
> find after reviewing your latest patches:
>
> +/*
> + * This structure helps in storing the common data from CopyStateData that 
> are
> + * required by the workers. This information will then be allocated and 
> stored
> + * into the DSM for the worker to retrieve and copy it to CopyStateData.
> + */
> +typedef struct SerializedParallelCopyState
> +{
> +   /* low-level state data */
> +   CopyDestcopy_dest;  /* type of copy source/destination */
> +   int file_encoding;  /* file or remote side's character encoding */
> +   boolneed_transcoding;   /* file encoding diff from server? */
> +   boolencoding_embeds_ascii;  /* ASCII can be non-first byte? */
> +
> ...
> ...
> +
> +   /* Working state for COPY FROM */
> +   AttrNumber  num_defaults;
> +   Oid relid;
> +} SerializedParallelCopyState;
>
> Can the above structure not be part of the CopyStateData structure? I
> am just asking this question because all the fields present in the
> above structure are also present in the CopyStateData structure. So,
> including it in the CopyStateData structure will reduce the code
> duplication and will also make CopyStateData a bit shorter.
>

I have removed the common members from the structure; now there are no
common members between CopyStateData & the new structure. I'm copying
to/from CopyStateData directly in the new patch.

> --
>
> +   pcxt = BeginParallelCopy(cstate->nworkers, cstate, stmt->attlist,
> +relid);
>
> Do we need to pass cstate->nworkers and relid to BeginParallelCopy()
> function when we are already passing cstate structure, using which
> both of these information can be retrieved ?
>

nworkers need not be passed, as you suggested, but relid needs to be
passed since we will be setting it in pcdata. I have modified nworkers
as suggested.

> --
>
> +/* DSM keys for parallel copy.  */
> +#define PARALLEL_COPY_KEY_SHARED_INFO  1
> +#define PARALLEL_COPY_KEY_CSTATE   2
> +#define PARALLEL_COPY_WAL_USAGE3
> +#define PARALLEL_COPY_BUFFER_USAGE 4
>
> DSM key names do not appear to be consistent. For shared info and
> cstate structures, the key name is prefixed with "PARALLEL_COPY_KEY",
> but for WalUsage and BufferUsage structures, it is prefixed with
> "PARALLEL_COPY". I think it would be better to make them consistent.
>

Modified as suggested

> --
>
> if (resultRelInfo->ri_TrigDesc != NULL &&
> (resultRelInfo->ri_TrigDesc->trig_insert_before_row ||
>  resultRelInfo->ri_TrigDesc->trig_insert_instead_row))
> {
> /*
>  * Can't support multi-inserts when there are any BEFORE/INSTEAD OF
>  * triggers on the table. Such triggers might query the table we're
>  * inserting into and act differently if the tuples that have already
>  * been processed and prepared for insertion are not there.
>  */
> insertMethod = CIM_SINGLE;
> }
> else if (proute != NULL && resultRelInfo->ri_TrigDesc != NULL &&
>  resultRelInfo->ri_TrigDesc->trig_insert_new_table)
> {
> /*
>  * For partitioned tables we can't support multi-inserts when there
>  * are any statement level insert triggers. It might be possible to
>  * allow partitioned tables with such triggers in the future, but for
>  * now, CopyMultiInsertInfoFlush expects that any before row insert
>  * and statement level insert triggers are on the same relation.
>  */
> insertMethod = CIM_SINGLE;
> }
> else if (resultRelInfo->ri_FdwRoutine != NULL ||
>  cstate->volatile_defexprs)
> {
> ...
> ...
>
> I think, if possible, all these if-else checks in CopyFrom() can be
> moved to a single function which can probably be named as
> IdentifyCopyInsertMethod() and this function can be called in
> IsParallelCopyAllowed(). This will ensure that in case of Parallel
> Copy when the leader has performed all these checks, the worker won't
> do it again. I also feel that it will make the code look a bit
> cleaner.
>

In the recently posted patch we have changed this to simplify the check
for parallel copy, so it is no longer an exact match. I feel this
comment is not applicable to the latest patch.

> --
>
> +void
> +ParallelCopyMain(dsm_segment *seg, shm_toc *toc)
> +{
> ...
> ...
> +   InstrEndParallelQuery(&bufferusage[ParallelWorkerNumber],
> + &walusage[ParallelWorkerNumber]);
> +
> +   MemoryContextSwitchTo(oldcontext);
> +   pfree(cstate);
> +   return;
> +}
>
> It seems like you also need to delete the memory context
> (cstate->copycontext) here.
>

Added it.

> --
>
> +void
> +ExecBeforeStmtTrigger(CopyState cstate)
> +{
> +   ESta

Re: Parallel copy

2020-10-27 Thread vignesh C
On Fri, Oct 23, 2020 at 6:58 PM Ashutosh Sharma  wrote:
>
> >
> > I think, if possible, all these if-else checks in CopyFrom() can be
> > moved to a single function which can probably be named as
> > IdentifyCopyInsertMethod() and this function can be called in
> > IsParallelCopyAllowed(). This will ensure that in case of Parallel
> > Copy when the leader has performed all these checks, the worker won't
> > do it again. I also feel that it will make the code look a bit
> > cleaner.
> >
>
> Just rewriting above comment to make it a bit more clear:
>
> I think, if possible, all these if-else checks in CopyFrom() should be
> moved to a separate function which can probably be named as
> IdentifyCopyInsertMethod() and this function called from
> IsParallelCopyAllowed() and CopyFrom() functions. It will only be
> called from CopyFrom() when IsParallelCopy() returns false. This will
> ensure that in case of Parallel Copy if the leader has performed all
> these checks, the worker won't do it again. I also feel that having a
> separate function containing all these checks will make the code look
> a bit cleaner.
>

In the recently posted patch we have changed this to simplify the check
for parallel copy, so it is no longer an exact match. I feel this
comment is not applicable to the latest patch.

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com




Log message for GSS connection is missing once connection authorization is successful.

2020-10-27 Thread vignesh C
Hi,

The log message for a GSS connection is missing once connection
authorization is successful, whereas we have a similar log message for
SSL connections. Such a message helps the user identify from the
logfile how the connection was established. I'm not sure whether this
log message was intentionally left out for GSS for some reason.
If the above analysis looks correct, please find attached a patch that
adds a log message for GSS connections.

Thoughts?

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com
From 95c906b9eaf1493ad10ac65d6cf7b27a7dd6acb9 Mon Sep 17 00:00:00 2001
From: Vignesh C 
Date: Wed, 28 Oct 2020 08:19:06 +0530
Subject: [PATCH v1] Log message for GSS connection is missing once connection
 authorization is successful.

The log message for a GSS connection is missing once connection
authorization is successful, whereas we have a similar log message for
SSL connections. This message will help the user identify from the
logfile how the connection was established.
---
 src/backend/utils/init/postinit.c | 29 +
 1 file changed, 29 insertions(+)

diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c
index d4ab4c7..0fd38b7 100644
--- a/src/backend/utils/init/postinit.c
+++ b/src/backend/utils/init/postinit.c
@@ -267,6 +267,21 @@ PerformAuthentication(Port *port)
   be_tls_get_compression(port) ? _("on") : _("off";
 			else
 #endif
+#ifdef ENABLE_GSS
+			if (be_gssapi_get_enc(port))
+ereport(LOG,
+		(port->application_name != NULL
+		 ? errmsg("replication connection authorized: user=%s application_name=%s GSS enabled (gssapi autorization=%s, principal=%s)",
+  port->user_name,
+  port->application_name,
+  be_gssapi_get_auth(port) ? _("on") : _("off"),
+  be_gssapi_get_princ(port))
+		 : errmsg("replication connection authorized: user=%s GSS enabled (gssapi autorization=%s, principal=%s)",
+  port->user_name,
+  be_gssapi_get_auth(port) ? _("on") : _("off"),
+  be_gssapi_get_princ(port;
+			else
+#endif
 ereport(LOG,
 		(port->application_name != NULL
 		 ? errmsg("replication connection authorized: user=%s application_name=%s",
@@ -295,6 +310,20 @@ PerformAuthentication(Port *port)
   be_tls_get_compression(port) ? _("on") : _("off";
 			else
 #endif
+#ifdef ENABLE_GSS
+			if (be_gssapi_get_enc(port))
+ereport(LOG,
+		(port->application_name != NULL
+		 ? errmsg("connection authorized: user=%s database=%s application_name=%s GSS enabled (gssapi autorization=%s, principal=%s)",
+  port->user_name, port->database_name, port->application_name,
+  be_gssapi_get_auth(port) ? _("on") : _("off"),
+  be_gssapi_get_princ(port))
+		 : errmsg("connection authorized: user=%s database=%s GSS enabled (gssapi autorization=%s, principal=%s)",
+  port->user_name, port->database_name,
+  be_gssapi_get_auth(port) ? _("on") : _("off"),
+  be_gssapi_get_princ(port;
+			else
+#endif
 ereport(LOG,
 		(port->application_name != NULL
 		 ? errmsg("connection authorized: user=%s database=%s application_name=%s",
-- 
1.8.3.1



Re: Log message for GSS connection is missing once connection authorization is successful.

2020-10-28 Thread vignesh C
Thanks Stephen for your comments.

On Wed, Oct 28, 2020 at 9:44 PM Stephen Frost  wrote:
>
> Greetings,
>
> * vignesh C (vignes...@gmail.com) wrote:
> > Log message for GSS connection is missing once connection
> > authorization is successful. We have similar log messages for SSL
> > connections once the connection authorization is successful. This
> > message will help the user to identify the connection that was
> > selected from the logfile. I'm not sure if this log message was
> > intentionally left out due to some reason for GSS.
> > If the above analysis looks correct, then please find a patch that
> > adds log for gss connections.
> >
> > Thoughts?
>
> I agree with logging the principal and if GSS encryption is being used
> or not as part of the connection authorized message.  Not logging the
> principal isn't great and has been something I've wanted to fix for a
> while, so glad to see someone else is thinking about this.
>
> > From 95c906b9eaf1493ad10ac65d6cf7b27a7dd6acb9 Mon Sep 17 00:00:00 2001
> > From: Vignesh C 
> > Date: Wed, 28 Oct 2020 08:19:06 +0530
> > Subject: [PATCH v1] Log message for GSS connection is missing once 
> > connection
> >  authorization is successful.
> >
> > Log message for GSS connection is missing once connection authorization is
> > successful. We have similar log message for SSL connections once the 
> > connection
> > authorization is successful. This message will help the user to identify the
> > connection that was selected from the logfile.
> > ---
> >  src/backend/utils/init/postinit.c | 29 +
> >  1 file changed, 29 insertions(+)
> >
> > diff --git a/src/backend/utils/init/postinit.c 
> > b/src/backend/utils/init/postinit.c
> > index d4ab4c7..0fd38b7 100644
> > --- a/src/backend/utils/init/postinit.c
> > +++ b/src/backend/utils/init/postinit.c
> > @@ -267,6 +267,21 @@ PerformAuthentication(Port *port)
> > 
> > be_tls_get_compression(port) ? _("on") : _("off";
> >   else
> >  #endif
> > +#ifdef ENABLE_GSS
> > + if (be_gssapi_get_enc(port))
>
> This is checking if GSS *encryption* is being used.
>
> > + ereport(LOG,
> > + (port->application_name != 
> > NULL
> > +  ? errmsg("replication 
> > connection authorized: user=%s application_name=%s GSS enabled (gssapi 
> > autorization=%s, principal=%s)",
> > +   
> > port->user_name,
> > +   
> > port->application_name,
> > +   
> > be_gssapi_get_auth(port) ? _("on") : _("off"),
> > +   
> > be_gssapi_get_princ(port))
> > +  : errmsg("replication 
> > connection authorized: user=%s GSS enabled (gssapi autorization=%s, 
> > principal=%s)",
> > +   
> > port->user_name,
> > +   
> > be_gssapi_get_auth(port) ? _("on") : _("off"),
> > +   
> > be_gssapi_get_princ(port;
>
> This is checking if GSS *authentication* was used.
>
> You can certainly have GSS authentication used without encryption, and
> you can (though I'm not sure how useful it really is) have GSS
> encryption with 'trust' authentication, so we should really break this
> out into their own sets of checks, which would look something like:
>
> if (be_gssapi_get_auth(port) || be_gssapi_get_princ(port))
> connection authorized: GSS %s (principal=%s)
>
> With the first %s being: (authentication || encrypted || authenticated and 
> encrypted)
>
> Or something along those lines, I would think.
>
> I don't think 'enabled' is a good term to use here.
>

I have made a v2 patch based on the changes you have suggested. The
patch for the same is attached.

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com
From b067cf823750f200102be0a0cad9a26a08e29a92 Mon Sep 17 00:00:00 2001
From: Vignesh C 
Date: Wed, 28 Oct 2020 08:19:06 +0530
Subject: [PATCH v2] Log message for G

Re: Log message for GSS connection is missing once connection authorization is successful.

2020-10-28 Thread vignesh C
Thanks Bharath for your comments.

On Wed, Oct 28, 2020 at 9:48 AM Bharath Rupireddy
 wrote:
>
> On Wed, Oct 28, 2020 at 8:29 AM vignesh C  wrote:
> >
> > Log message for GSS connection is missing once connection
> > authorization is successful. We have similar log messages for SSL
> > connections once the connection authorization is successful. This
> > message will help the user to identify the connection that was
> > selected from the logfile. I'm not sure if this log message was
> > intentionally left out due to some reason for GSS.
> > If the above analysis looks correct, then please find a patch that
> > adds log for gss connections.
> >
> > Thoughts?
> >
>
> +1 for the idea. This is useful in knowing whether or not the user is
> authenticated using GSS APIs.
>
> Here are few comments on the patch:
>
> 1. How about using(like below) #ifdef, #elif ... #endif directives
> instead of #ifdef, #endif, #ifdef, #endif?
>
> #ifdef USE_SSL
>blah,blah,blah...
> #elif defined(ENABLE_GSS)
>blah,blah,blah...
> #else
>blah,blah,blah...
> #endif
>

I preferred to keep it the way it is in the patch, to maintain a style
similar to what is used in other places like fe-connect.c.

> 2. I think we must use be_gssapi_get_auth(port) instead of
> be_gssapi_get_enc(port) in the if condition, because we log for gss
> authentications irrespective of encoding is enabled or not. Put it
> another way, maybe gss authentications are possible without
> encoding[1]. We can have the information whether the encryption is
> enabled or not in the log message, be_gssapi_get_enc(port) ? _("on") :
> _("off"),.
> #ifdef ENABLE_GSS
> if (be_gssapi_get_enc(port))
> ereport(LOG,
>
> We do not need be_gssapi_get_auth(port) ? _("on") : _("off") this in
> the log message, only in the if condition we need this check.
>
> [1] By looking at the below code it seems that gss authentication
> without encryption is possible.
> #ifdef ENABLE_GSS
> port->gss->auth = true;
> if (port->gss->enc)
> status = pg_GSS_checkauth(port);
> else
> {
> sendAuthRequest(port, AUTH_REQ_GSS, NULL, 0);
> status = pg_GSS_recvauth(port);
> }

Stephen also shared his thoughts on the above changes; I have provided
an updated patch for the same in the previous mail. Please have a look
and let me know if you have any comments.

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com




Re: Log message for GSS connection is missing once connection authorization is successful.

2020-10-30 Thread vignesh C
On Thu, Oct 29, 2020 at 7:26 PM Stephen Frost  wrote:
>
> Greetings,
>
> * vignesh C (vignes...@gmail.com) wrote:
> > I have made a v2 patch based on the changes you have suggested. The
> > patch for the same is attached.
>
> > From b067cf823750f200102be0a0cad9a26a08e29a92 Mon Sep 17 00:00:00 2001
> > From: Vignesh C 
> > Date: Wed, 28 Oct 2020 08:19:06 +0530
> > Subject: [PATCH v2] Log message for GSS connection is missing once 
> > connection
> >  authorization is successful.
> >
> > Log message for GSS connection is missing once connection authorization is
> > successful. We have similar log message for SSL connections once the 
> > connection
> > authorization is successful. This message will help the user to identify the
> > connection that was selected from the logfile.
>
> Just to be clear- it's not that the message is 'missing', it's just not
> providing the (certainly useful) information about how the connection
> was authorized.  The commit message should make it clear that what we're
> doing here is improving the connection authorization message for GSS
> authenticated or encrypted connections.
>

I have updated the commit message accordingly.

> > diff --git a/src/backend/utils/init/postinit.c 
> > b/src/backend/utils/init/postinit.c
> > index d4ab4c7..7980e92 100644
> > --- a/src/backend/utils/init/postinit.c
> > +++ b/src/backend/utils/init/postinit.c
> > @@ -267,6 +267,21 @@ PerformAuthentication(Port *port)
> > 
> > be_tls_get_compression(port) ? _("on") : _("off";
> >   else
> >  #endif
> > +#ifdef ENABLE_GSS
> > + if (be_gssapi_get_auth(port) || 
> > be_gssapi_get_princ(port))
> > + ereport(LOG,
> > + (port->application_name != 
> > NULL
> > +  ? errmsg("replication 
> > connection authorized: user=%s application_name=%s GSS %s (principal=%s)",
> > +   
> > port->user_name,
> > +   
> > port->application_name,
> > +   
> > be_gssapi_get_auth(port) || be_gssapi_get_enc(port) ? _("on") : _("off"),
> > +   
> > be_gssapi_get_princ(port))
> > +  : errmsg("replication 
> > connection authorized: user=%s GSS %s (principal=%s)",
> > +   
> > port->user_name,
> > +   
> > be_gssapi_get_auth(port) || be_gssapi_get_enc(port) ? _("on") : _("off"),
> > +   
> > be_gssapi_get_princ(port;
> > + else
> > +#endif
>
> No, this isn't what I was suggesting.  "on" and "off" really isn't
> communicating the details about the GSS-using connection.  What I
> suggested before was something like:
>
> errmsg("replication connection authorized: user=%s application_name=%s GSS %s 
> (principal=%s)",
> port->user_name,
> port->application_name,
> (be_gssapi_get_auth(port) && be_gssapi_get_enc(port)) ?  
> "authenticated and encrypted" : be_gssapi_get_auth(port) ?  "authenticated" : 
> "encrypted",
> be_gssapi_get_princ(port))
>
> Though I'll admit that perhaps there's something better which could be
> done here- but just 'on/off' certainly isn't that.  Another option might
> be:
>
> errmsg("replication connection authorized: user=%s application_name=%s GSS 
> authenticated: %s, encrypted: %s, principal: %s",
> port->user_name,
> port->application_name,
> be_gssapi_get_auth(port) ? "yes" : "no",
> be_gssapi_get_enc(port) ? "yes" : "no",
> be_gssapi_get_princ(port))
>

I like the above method that you suggested; I have changed the patch
accordingly.

> Also, it would be good to see if there's a way to add to the tests we
> have for GSSAPI authentication/encryption to show that we hit each of
> the possible cases and check that we get the correc

Re: Log message for GSS connection is missing once connection authorization is successful.

2020-10-30 Thread vignesh C
Thanks for the comments Bharath.

On Thu, Oct 29, 2020 at 12:15 PM Bharath Rupireddy
 wrote:
> 1. Instead of just "on/off" after GSS %s in the log message, wouldn't it be 
> informative if we have authenticated and/or encrypted as suggested by Stephen?
>
> So the log message would look like this:
>
> if(be_gssapi_get_auth(port))
> replication connection authorized: user=bob application_name=foo GSS 
> authenticated (principal=bar)
>
> if(be_gssapi_get_enc(port))
> replication connection authorized: user=bob application_name=foo GSS 
> encrypted (principal=bar)
>
> if(be_gssapi_get_auth(port) && be_gssapi_get_enc(port))
> replication connection authorized: user=bob application_name=foo GSS 
> authenticated and encrypted (principal=bar)
>
> +#ifdef ENABLE_GSS
> +if (be_gssapi_get_auth(port) || be_gssapi_get_princ(port))
> +ereport(LOG,
> +(port->application_name != NULL
> + ? errmsg("replication connection authorized: 
> user=%s application_name=%s GSS %s (principal=%s)",
> +  port->user_name,
> +  port->application_name,
> +  be_gssapi_get_auth(port) || 
> be_gssapi_get_enc(port) ? _("on") : _("off"),
> +  be_gssapi_get_princ(port))
> + : errmsg("replication connection authorized: 
> user=%s GSS %s (principal=%s)",
> +  port->user_name,
> +  be_gssapi_get_auth(port) || 
> be_gssapi_get_enc(port) ? _("on") : _("off"),
> +  be_gssapi_get_princ(port;
> +else
>

This is handled in the v3 patch posted at [1].

> 2. I think the log message preparation looks a bit clumsy with ternary 
> operators and duplicate log message texts(I know that we already do this for 
> SSL). Can we have the log message prepared using StringInfoData data 
> structure/APIs and use just a single ereport? This way, that part of the code 
> looks cleaner.
>
> Here's what I'm visualizing:
>
> if (Log_connections)
> {
> StringInfoData msg;
>
> if (am_walsender)
> append("replication connection authorized: user=%s");
> else
> append("connection authorized: user=%s database=%s");
>
> if (port->application_name)
> append("application_name=%s");
>
> #ifdef USE_SSL
> if (port->ssl_in_use)
> append("SSL enabled (protocol=%s, cipher=%s, bits=%d, compression=%s");
> #elif defined(ENABLE_GSS)
> blah,blah,blah
> #endif
>
> ereport (LOG, msg.data);
> }

This is handled in the v3 patch posted.

>
> 3. +if (be_gssapi_get_auth(port) || be_gssapi_get_princ(port))
>
> If be_gssapi_get_auth(port) returns false, I think there's no way that 
> be_gssapi_get_princ(port) would return a non null value, see the comment. The 
> function be_gssapi_get_princ() returns NULL if the auth is false, so the 
> check if ( be_gssapi_get_princ(port)) would suffice.
>
> gss_name_tname;/* GSSAPI client name */
> char   *princ;/* GSSAPI Principal used for auth, NULL if
>  * GSSAPI auth was not used */
> boolauth;/* GSSAPI Authentication used */
> boolenc;/* GSSAPI encryption in use */
>

This is handled in the v3 patch posted.


Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com




Re: Log message for GSS connection is missing once connection authorization is successful.

2020-10-31 Thread vignesh C
Thanks for the comments Bharath.
On Sat, Oct 31, 2020 at 10:18 AM Bharath Rupireddy
 wrote:
>
> I took a look at v3 patch. Here are some comments.
>
> 1. Why are the input strings(not the newly added GSS log message
> string) to test_access() function are in some places double-quoted and
> in some places single quoted?
>
> 'succeeds with mapping with default gssencmode and host hba',
> 'connection authorized: user=test1 database=postgres
> application_name=001_auth.pl GSS \(authenticated=yes, encrypted=yes,
> principal=test1\@EXAMPLE.COM\)'
> );
> "succeeds with GSS-encrypted access required with host hba",
> 'connection authorized: user=test1 database=postgres
> application_name=001_auth.pl GSS \(authenticated=yes, encrypted=yes,
> principal=test1\@EXAMPLE.COM\)'
> );
>
> And also for
>
> test_access(
> $node,
> 'test1',<<< single quotes
>
> test_access(
> $node,
> "test1",   <<< double quotes
>
> Looks like we use double quoted strings in perl if we have any
> variables inside the string to be replaced by the interpreter or else
> single quoted strings are fine[1]. If this is true, can we make it
> uniform across this file at least?

I have made this uniform across this file.

>
> 2. Instead of using hardcoded values for application_name and
> principal, can we use variables? For application_name we can directly
> use a single variable and use it. I think principal name is a formed
> value, can we use that formed variable?
>
>  application_name=001_auth.pl GSS \(authenticated=yes, encrypted=yes,
> principal=test1\@EXAMPLE.COM\)'
>

Used variables for this.

> 3. Why are we using escape character before ( and @, IIUC, to not let
> interpreter replace it with any value. If this is correct, it doesn't
> make sense here as we are using single quoted strings. The perl
> interpreter replaces the variables only when strings are used in
> double quotes[1].
>
> +'connection authorized: user=test1 database=postgres
> application_name=001_auth.pl GSS \(authenticated=yes, encrypted=yes,
> principal=test1\@EXAMPLE.COM\)'
> +);
>
> I ran the keroberos tests on my dev machine. make check of 001_auth.pl
> is passing.
>

I have changed this to use double quotes now, since it also involves
variable interpolation. Removed the escape sequences, which are not
required.

The v4 patch attached has the fixes for this.

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com
From aafd8eb3844169c46ede5af7a4e2baa0767712b9 Mon Sep 17 00:00:00 2001
From: Vignesh C 
Date: Fri, 30 Oct 2020 17:58:45 +0530
Subject: [PATCH v4] Improving the connection authorization message for GSS
 authenticated/encrypted connections.

Added log message to include GSS authentication, encryption & principal
information. This message will help the user to know if GSS authentication or
encryption was used and which GSS principal was used.
---
 src/backend/utils/init/postinit.c |  80 +-
 src/test/kerberos/t/001_auth.pl   | 118 +++---
 2 files changed, 112 insertions(+), 86 deletions(-)

diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c
index d4ab4c7..34d3045 100644
--- a/src/backend/utils/init/postinit.c
+++ b/src/backend/utils/init/postinit.c
@@ -246,62 +246,38 @@ PerformAuthentication(Port *port)
 
 	if (Log_connections)
 	{
+		StringInfoData logmsg;
+		initStringInfo(&logmsg);
 		if (am_walsender)
-		{
+			appendStringInfo(&logmsg, "replication ");
+
+		appendStringInfo(&logmsg, "connection authorized: user=%s",
+		 port->user_name);
+		if (!am_walsender)
+			appendStringInfo(&logmsg, " database=%s", port->database_name);
+
+		if (port->application_name != NULL)
+			appendStringInfo(&logmsg, " application_name=%s",
+			 port->application_name);
+
 #ifdef USE_SSL
-			if (port->ssl_in_use)
-ereport(LOG,
-		(port->application_name != NULL
-		 ? errmsg("replication connection authorized: user=%s application_name=%s SSL enabled (protocol=%s, cipher=%s, bits=%d, compression=%s)",
-  port->user_name,
-  port->application_name,
-  be_tls_get_version(port),
-  be_tls_get_cipher(port),
-  be_tls_get_cipher_bits(port),
-  be_tls_get_compression(port) ? _("on") : _("off"))
-		 : errmsg("replication connection authorized: user=%s SSL enabled (protocol=%s, cipher=%s, bits=%d, compression=%s)",
-  port->user_name,
-  be_tls_get_version(port),
-  be_tls_get_cipher(port),
-  be_tls_get_cipher_bits(port),
-  be_tls_get_compression(port) ? _("o

Re: Split copy.c

2020-11-02 Thread vignesh C
On Mon, Nov 2, 2020 at 2:33 PM Heikki Linnakangas  wrote:
>
> While looking at the parallel copy patches, it started to annoy me how
> large copy.c is. It confuses my little head. (Ok, it's annoyed me many
> times in the past, but I haven't done anything about it.)
>

+1 for having copy from & copy to functionality in separate files.

This is present in both copyfrom.c & copyto.c; can it be removed from
one place & moved to a common header file?
static const char BinarySignature[11] = "PGCOPY\n\377\r\n\0";

CopyDest was split into the following two enums:
typedef enum CopySource
{
COPY_FILE, /* from file (or a piped program) */
COPY_OLD_FE, /* from frontend (2.0 protocol) */
COPY_NEW_FE, /* from frontend (3.0 protocol) */
COPY_CALLBACK /* from callback function */
} CopySource;

typedef enum CopyDest
{
COPY_FILE, /* to file (or a piped program) */
COPY_OLD_FE, /* to frontend (2.0 protocol) */
COPY_NEW_FE, /* to frontend (3.0 protocol) */
} CopyDest;

Should we have just one enum, or are both required? If both are
required, we could consider naming like COPY_TO_FILE and
COPY_FROM_FILE; it would make things clearer.
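
For illustration, the renamed enums could look something like this (just
a sketch of the naming idea, not the actual patch):

	typedef enum CopySource
	{
		COPY_FROM_FILE,		/* from file (or a piped program) */
		COPY_FROM_OLD_FE,	/* from frontend (2.0 protocol) */
		COPY_FROM_NEW_FE,	/* from frontend (3.0 protocol) */
		COPY_FROM_CALLBACK	/* from callback function */
	} CopySource;

	typedef enum CopyDest
	{
		COPY_TO_FILE,		/* to file (or a piped program) */
		COPY_TO_OLD_FE,		/* to frontend (2.0 protocol) */
		COPY_TO_NEW_FE		/* to frontend (3.0 protocol) */
	} CopyDest;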

There is one warning while applying the v2 patch:
Applying: Split copy.c into copyto.c and copyfrom.c.
/home/vignesh/postgres/postgres/.git/rebase-apply/patch:909: trailing
whitespace.
warning: 1 line adds whitespace errors.

There is one compilation error; maybe this Assert is not required:
copyto.c: In function ‘BeginCopyTo’:
copyto.c:477:11: error: ‘is_from’ undeclared (first use in this function)
   Assert(!is_from);

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com




Re: Parallel INSERT (INTO ... SELECT ...)

2020-11-02 Thread vignesh C
>
> See attached patches.
>

Thanks for providing the patches.
I have reviewed
v6-0001-Enable-parallel-SELECT-for-INSERT-INTO-.-SELECT.patch; please
find my comments below:
-> commandType is not used; we can remove it.
+ * Prepare for entering parallel mode by assigning a FullTransactionId, to
be
+ * included in the transaction state that is serialized in the parallel
DSM.
+ */
+void PrepareParallelModeForModify(CmdType commandType)
+{
+   Assert(!IsInParallelMode());
+
+   (void)GetCurrentTransactionId();
+}

-> As we support insertion of data from the workers, this comment "but as
of now, only the leader backend writes into a completely new table.  In the
future, we can extend it to allow workers to write into the table" must be
updated accordingly:
+* modify any data using a CTE, or if this is a cursor operation,
or if
+* GUCs are set to values that don't permit parallelism, or if
+* parallel-unsafe functions are present in the query tree.
 *
-* (Note that we do allow CREATE TABLE AS, SELECT INTO, and CREATE
+* (Note that we do allow CREATE TABLE AS, INSERT, SELECT INTO, and
CREATE
 * MATERIALIZED VIEW to use parallel plans, but as of now, only the
leader
 * backend writes into a completely new table.  In the future, we
can
 * extend it to allow workers to write into the table.  However, to
allow

-> Also, should we specify insert as "insert into select"?

-> We could include a small writeup of the design, maybe in the commit
message. It would be useful for review.

-> I felt the below two assignment statements could be moved into the
else branch:
glob->maxParallelHazard = max_parallel_hazard(parse);
glob->parallelModeOK = (glob->maxParallelHazard !=
PROPARALLEL_UNSAFE);
+
+   /*
+* Additional parallel-mode safety checks are required in
order to
+* allow an underlying parallel query to be used for a
+* table-modification command that is supported in
parallel-mode.
+*/
+   if (glob->parallelModeOK &&
+   IsModifySupportedInParallelMode(parse->commandType))
+   {
+   glob->maxParallelHazard =
MaxParallelHazardForModify(parse, &glob->maxParallelHazard);
+   glob->parallelModeOK = (glob->maxParallelHazard !=
PROPARALLEL_UNSAFE);
+   }

something like:
/*
* Additional parallel-mode safety checks are required in order to
* allow an underlying parallel query to be used for a
* table-modification command that is supported in parallel-mode.
*/
if (glob->parallelModeOK &&
IsModifySupportedInParallelMode(parse->commandType))
glob->maxParallelHazard = MaxParallelHazardForModify(parse,
&glob->maxParallelHazard);
else
/* all the cheap tests pass, so scan the query tree */
glob->maxParallelHazard = max_parallel_hazard(parse);
glob->parallelModeOK = (glob->maxParallelHazard != PROPARALLEL_UNSAFE);

-> Comments need slight adjustment; maybe you could run pgindent on the
modified code.
+   /*
+* Supported table-modification commands may require
additional steps
+* prior to entering parallel mode, such as assigning a
FullTransactionId.
+*/

-> In the below, max_parallel_hazard_test will return true for
PROPARALLEL_RESTRICTED as well; is the break intentional in that case?
In the RI_TRIGGER_FK case below, we continue for PROPARALLEL_RESTRICTED.
+   if
(max_parallel_hazard_test(func_parallel(trigger->tgfoid), context))
+   break;
+
+   /*
+* If the trigger type is RI_TRIGGER_FK, this indicates a
FK exists in
+* the relation, and this would result in creation of new
CommandIds
+* on insert/update/delete and this isn't supported in a
parallel
+* worker (but is safe in the parallel leader).
+*/
+   trigtype = RI_FKey_trigger_type(trigger->tgfoid);
+   if (trigtype == RI_TRIGGER_FK)
+   {
+   context->max_hazard = PROPARALLEL_RESTRICTED;
+   /*
+* As we're looking for the max parallel hazard, we
don't break
+* here; examine any further triggers ...
+*/
+   }

-> Should we switch to non-parallel mode in this case, instead of
throwing an error?
+   val = SysCacheGetAttr(CONSTROID, tup,
+   Anum_pg_constraint_conbin,
&isnull);
+   if (isnull)
+   elog(ERROR, "null conbin for constraint
%u", con->oid);
+   conbin = TextDatumGetCString(val);

-> We could include a few tests for this in the regression suite.

-> We might need some documentation updates, e.g. in
parallel-query.html, parallel-plans.html, etc.

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com

Re: Log message for GSS connection is missing once connection authorization is successful.

2020-11-02 Thread vignesh C
On Sun, Nov 1, 2020 at 3:34 AM Euler Taveira
 wrote:
>
> On Sat, 31 Oct 2020 at 00:34, Bharath Rupireddy 
>  wrote:
>>
>> On Fri, Oct 30, 2020 at 6:35 PM Euler Taveira
>>  wrote:
>> >
>> > + appendStringInfo(&logmsg, "replication ");
>> > +
>> > + appendStringInfo(&logmsg, "connection authorized: user=%s",
>> > + port->user_name);
>> > + if (!am_walsender)
>> > + appendStringInfo(&logmsg, " database=%s", port->database_name);
>> > +
>> > + if (port->application_name != NULL)
>> > + appendStringInfo(&logmsg, " application_name=%s",
>> > + port->application_name);
>> > +
>> >
>> > Your approach breaks localization. You should use multiple errmsg.
>> >
>>
>> IIUC, isn't it enough calling a single errmsg() inside ereport() with
>> the prepared logmsg.data (which is a string)? The errmsg() function
>> will do the required translation of the logmsg.data. Why do we need
>> multiple errmsg() calls? Could you please elaborate a bit on how the
>> way currently it is done in the patch breaks localization?
>>
>
> No. The strings are specified in the appendStringInfo, hence you should add 
> _()
> around the string to be translated. There is nothing to be translated if you
> specify only the format identifier. You can always test if gettext extracts 
> the
> string to be translated by executing 'make update-po' (after specifying
> --enable-nls in the configure).  Search for your string in one of the 
> generated
> files (po/LL.po.new).
>
> You shouldn't split messages like that because not all languages have the same
> order as English. Having said that you risk providing a nonsense translation
> because someone decided to translate pieces of a sentence separately.
>
> +   appendStringInfo(&logmsg, "replication ");
> +
> +   appendStringInfo(&logmsg, "connection authorized: user=%s",
> +port->user_name);
>
> This hunk will break translation. In Portuguese, the adjective "replication" 
> is
> translated after the noun "connection". If you decided to keep this code as 
> is,
> the printed message won't follow the grammar rules. You will have "replicação
> conexão autorizada" instead of "conexão de replicação autorizada". The former
> isn't grammatically correct. Avoid splitting sentences that are translated.
>

Thanks for the explanation. I have attached a v5 patch with changes so
that the translation should not have any problem.
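
For illustration only (a sketch of the general translation-safe shape, not
necessarily the exact code in v5): each translatable sentence is kept whole
inside a single errmsg() format string, and complete variants are chosen
between, instead of concatenating translated fragments:

	/* Sketch: whole sentences per errmsg(); no string concatenation. */
	if (am_walsender)
		ereport(LOG,
				(errmsg("replication connection authorized: user=%s",
						port->user_name)));
	else
		ereport(LOG,
				(errmsg("connection authorized: user=%s database=%s",
						port->user_name, port->database_name)));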

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com
From cc3c4975dbc77991088daf6e47d72a9c3c98ba5e Mon Sep 17 00:00:00 2001
From: Vignesh C 
Date: Fri, 30 Oct 2020 17:58:45 +0530
Subject: [PATCH v5] Improving the connection authorization message for GSS
 authenticated/encrypted connections.

Added log message to include GSS authentication, encryption & principal
information. This message will help the user to know if GSS authentication or
encryption was used and which GSS principal was used.
---
 src/backend/utils/init/postinit.c |  81 ++
 src/test/kerberos/t/001_auth.pl   | 118 +++---
 2 files changed, 113 insertions(+), 86 deletions(-)

diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c
index d4ab4c7..0e73598 100644
--- a/src/backend/utils/init/postinit.c
+++ b/src/backend/utils/init/postinit.c
@@ -246,62 +246,39 @@ PerformAuthentication(Port *port)
 
 	if (Log_connections)
 	{
+		StringInfoData logmsg;
+		initStringInfo(&logmsg);
 		if (am_walsender)
-		{
-#ifdef USE_SSL
-			if (port->ssl_in_use)
-ereport(LOG,
-		(port->application_name != NULL
-		 ? errmsg("replication connection authorized: user=%s application_name=%s SSL enabled (protocol=%s, cipher=%s, bits=%d, compression=%s)",
-  port->user_name,
-  port->application_name,
-  be_tls_get_version(port),
-  be_tls_get_cipher(port),
-  be_tls_get_cipher_bits(port),
-  be_tls_get_compression(port) ? _("on") : _("off"))
-		 : errmsg("replication connection authorized: user=%s SSL enabled (protocol=%s, cipher=%s, bits=%d, compression=%s)",
-  port->user_name,
-  be_tls_get_version(port),
-  be_tls_get_cipher(port),
-  be_tls_get_cipher_bits(port),
-  be_tls_get_compression(port) ? _("on") : _("off"))));
-			else
-#endif
-ereport(LOG,
-		(port->application_name != NULL
-		 ? errmsg("replication connection authorized: user=%s application_name=%s",
-

Re: Log message for GSS connection is missing once connection authorization is successful.

2020-11-06 Thread vignesh C
On Thu, Nov 5, 2020 at 9:50 AM Bharath Rupireddy
 wrote:
>
> On Thu, Nov 5, 2020 at 7:55 AM Euler Taveira  
> wrote:
> >
> > No. Don't worry with translations during the development. Make sure to 
> > follow
> > the instructions provided here [1]. Translations are coordinated in a 
> > different
> > mailing list: pgsql-translators [2]. There is a different repository [3] for
> > handling PO files and the updated files are merged by Peter Eisentraut just
> > before each minor/major release. We usually start to update translations 
> > after
> > feature freeze.
> >
>
> Thanks.
>
> On Tue, Nov 3, 2020 at 12:49 PM vignesh C  wrote:
> >
> > Thanks for the explanation, I have attached a v5 patch with the
> > changes where the translation should not have any problem.
> >
>
> I have taken a further look at the V5 patch:
>
> 1. We wait 10sec until the syslogger process logs the expected message, what 
> happens if someone intentionally made the syslogger process to wait for a 
> longer duration?  Will the new tests fail?
> # might need to retry if logging collector process is slow...
> my $max_attempts = 10 * 10;
> my $first_logfile;
> for (my $attempts = 0; $attempts < $max_attempts; $attempts++)
> {
> $first_logfile = slurp_file($node->data_dir . '/' . $lfname);
> last if $first_logfile =~ m/$expect_log_msg /;
> usleep(100_000);
> }
>

Yes, the test will fail if it takes more than max_attempts, because
there is a like() check immediately after the loop:
like($first_logfile, qr/\Q$expect_log_msg\E/,
 'found expected log file content');
I have also increased the wait to 180 seconds, as in other tests, to
avoid failures on very slow systems.

> 2. I intentionally altered(for testing purpose only) the expected log message 
> input given to test_access(), expecting the tests to fail, but the test 
> succeeded. Am I missing something here? Is it that the syslogger process not 
> logging the message at all or within the 10sec waiting? Do we need to 
> increase the wait duration? Do we need to do something to fail the test when 
> we don't see the expected log message in test_access()?
>
> "cXNnnection authorized: user=..
> "connecTEion authorized: user=
> "connection auTThorized:.
>

Thanks for testing this; I had missed that case. The expression matching
was not correct. Attached is a v6 patch which includes the fix for this.

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com
From 0d190bbe888127aed0c8db2e061cb8cdc8b99ee5 Mon Sep 17 00:00:00 2001
From: Vignesh C 
Date: Fri, 30 Oct 2020 17:58:45 +0530
Subject: [PATCH v6] Improving the connection authorization message for GSS
 authenticated/encrypted connections.

Added log message to include GSS authentication, encryption & principal
information. This message will help the user to know if GSS authentication or
encryption was used and which GSS principal was used.
---
 src/backend/utils/init/postinit.c |  81 ++
 src/test/kerberos/t/001_auth.pl   | 117 +++---
 2 files changed, 112 insertions(+), 86 deletions(-)

diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c
index d4ab4c7..0e73598 100644
--- a/src/backend/utils/init/postinit.c
+++ b/src/backend/utils/init/postinit.c
@@ -246,62 +246,39 @@ PerformAuthentication(Port *port)
 
 	if (Log_connections)
 	{
+		StringInfoData logmsg;
+		initStringInfo(&logmsg);
 		if (am_walsender)
-		{
-#ifdef USE_SSL
-			if (port->ssl_in_use)
-ereport(LOG,
-		(port->application_name != NULL
-		 ? errmsg("replication connection authorized: user=%s application_name=%s SSL enabled (protocol=%s, cipher=%s, bits=%d, compression=%s)",
-  port->user_name,
-  port->application_name,
-  be_tls_get_version(port),
-  be_tls_get_cipher(port),
-  be_tls_get_cipher_bits(port),
-  be_tls_get_compression(port) ? _("on") : _("off"))
-		 : errmsg("replication connection authorized: user=%s SSL enabled (protocol=%s, cipher=%s, bits=%d, compression=%s)",
-  port->user_name,
-  be_tls_get_version(port),
-  be_tls_get_cipher(port),
-  be_tls_get_cipher_bits(port),
-  be_tls_get_compression(port) ? _("on") : _("off"))));
-			else
-#endif
-ereport(LOG,
-		(port->application_name != NULL
-		 ? errmsg("replication connection authorized: user=%s application_name=%s",
-  port->user_name,
-  port->application_name)
-		 : errmsg("replication connection authorized:

Re: Parallel copy

2020-11-07 Thread vignesh C
On Thu, Nov 5, 2020 at 6:33 PM Hou, Zhijie  wrote:
>
> Hi
>
> >
> > my $bytes = $ARGV[0];
> > for(my $i = 0; $i < $bytes; $i+=8){
> >  print "longdata";
> > }
> > print "\n";
> > 
> >
> > postgres=# copy longdata from program 'perl /tmp/longdata.pl 1'
> > with (parallel 2);
> >
> > This gets stuck forever (or at least I didn't have the patience to wait
> > it finish). Both worker processes are consuming 100% of CPU.
>
> I had a look over this problem.
>
> the ParallelCopyDataBlock has size limit:
> uint8   skip_bytes;
> chardata[DATA_BLOCK_SIZE];  /* data read from file */
>
> It seems the input line is so long that the leader process run out of the 
> Shared memory among parallel copy workers.
> And the leader process keep waiting free block.
>
> For the worker process, it wait util line_state becomes LINE_LEADER_POPULATED,
> But leader process won't set the line_state unless it read the whole line.
>
> So it stuck forever.
> May be we should reconsider about this situation.
>
> The stack is as follows:
>
> Leader stack:
> #3  0x0075f7a1 in WaitLatch (latch=, 
> wakeEvents=wakeEvents@entry=41, timeout=timeout@entry=1, 
> wait_event_info=wait_event_info@entry=150994945) at latch.c:411
> #4  0x005a9245 in WaitGetFreeCopyBlock 
> (pcshared_info=pcshared_info@entry=0x7f26d2ed3580) at copyparallel.c:1546
> #5  0x005a98ce in SetRawBufForLoad (cstate=cstate@entry=0x2978a88, 
> line_size=67108864, copy_buf_len=copy_buf_len@entry=65536, 
> raw_buf_ptr=raw_buf_ptr@entry=65536,
> copy_raw_buf=copy_raw_buf@entry=0x7fff4cdc0e18) at copyparallel.c:1572
> #6  0x005a1963 in CopyReadLineText (cstate=cstate@entry=0x2978a88) at 
> copy.c:4058
> #7  0x005a4e76 in CopyReadLine (cstate=cstate@entry=0x2978a88) at 
> copy.c:3863
>
> Worker stack:
> #0  GetLinePosition (cstate=cstate@entry=0x29e1f28) at copyparallel.c:1474
> #1  0x005a8aa4 in CacheLineInfo (cstate=cstate@entry=0x29e1f28, 
> buff_count=buff_count@entry=0) at copyparallel.c:711
> #2  0x005a8e46 in GetWorkerLine (cstate=cstate@entry=0x29e1f28) at 
> copyparallel.c:885
> #3  0x005a4f2e in NextCopyFromRawFields 
> (cstate=cstate@entry=0x29e1f28, fields=fields@entry=0x7fff4cdc0b48, 
> nfields=nfields@entry=0x7fff4cdc0b44) at copy.c:3615
> #4  0x005a50af in NextCopyFrom (cstate=cstate@entry=0x29e1f28, 
> econtext=econtext@entry=0x2a358d8, values=0x2a42068, nulls=0x2a42070) at 
> copy.c:3696
> #5  0x005a5b90 in CopyFrom (cstate=cstate@entry=0x29e1f28) at 
> copy.c:2985
>

Thanks for providing your thoughts. I have analyzed this issue and am
working on a fix; I will post a patch for it shortly.

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com




Re: Parallel copy

2020-11-10 Thread vignesh C
ointed by me earlier. There are two things to solve there (a) the
> lower-level code (like heap_* APIs, CommandCounterIncrement, xact.c
> APIs, etc.) have checks which doesn't allow any writes, we need to see
> which of those we can open now (or do some additional work to prevent
> from those checks) after some of the work done for parallel-writes in
> PG-13[1][2], and (b) in which all cases we can parallel-writes
> (parallel copy) is allowed, for example need to identify whether table
> or one of its partitions has any constraint/expression which is
> parallel-unsafe.
>

I have worked on a patch for the parallel safety checks. It checks
whether a parallel copy can be performed; parallel copy cannot be
performed if: a) the relation is a temporary table b) the relation is a
foreign table c) the relation has index expressions that are not
parallel safe d) the relation has triggers other than BEFORE STATEMENT
triggers e) the relation has check constraints that are not parallel
safe f) the relation is partitioned and any partition falls into one of
the above categories. This patch contains those checks and will be used
by the parallel copy implementation.
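
To illustrate the shape of those relation-level checks, a much-simplified
sketch follows. The function name is invented, and the real patch folds
this into the max-parallel-hazard machinery (walking index expressions,
check constraints and partitions) rather than simply rejecting triggers
outright:

#include "postgres.h"
#include "catalog/pg_class.h"	/* RELKIND_*, RELPERSISTENCE_* */
#include "utils/rel.h"

/* Illustrative only: conservative per-relation checks for parallel copy. */
static bool
copy_rel_is_parallel_safe(Relation rel)
{
	/* (a) temporary tables are visible only to the backend that created them */
	if (rel->rd_rel->relpersistence == RELPERSISTENCE_TEMP)
		return false;

	/* (b) foreign tables go through an FDW, treated as unsafe in this sketch */
	if (rel->rd_rel->relkind == RELKIND_FOREIGN_TABLE)
		return false;

	/* (d) conservatively reject any triggers in this sketch */
	if (rel->trigdesc != NULL && rel->trigdesc->numtriggers > 0)
		return false;

	/*
	 * (c), (e), (f): index expressions, check constraints and each partition
	 * would additionally be checked for parallel safety; omitted here.
	 */
	return true;
}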
Thoughts?

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com
From 18b5efabc5cf82870c8b5d015f78f4b7d3fe18ef Mon Sep 17 00:00:00 2001
From: Vignesh C 
Date: Tue, 10 Nov 2020 18:24:09 +0530
Subject: [PATCH v10 2/6] Check if parallel copy can be performed.

Checks whether a parallel copy can be performed. Parallel copy cannot be
performed if: a) the relation is a temporary table b) the relation is a
foreign table c) the relation has index expressions that are not parallel
safe d) the relation has triggers other than BEFORE STATEMENT triggers
e) the relation has check constraints that are not parallel safe f) the
relation is partitioned and any partition falls into one of the above
categories. This patch contains those checks and will be used by the
parallel copy implementation patch.
---
 src/backend/access/heap/heapam.c |  11 -
 src/backend/access/transam/xact.c|  26 ++-
 src/backend/commands/Makefile|   1 +
 src/backend/commands/copyparallel.c  | 103 +
 src/backend/optimizer/util/clauses.c | 405 +++
 src/include/access/xact.h|   1 +
 src/include/commands/copy.h  |   2 +
 src/include/optimizer/clauses.h  |   1 +
 8 files changed, 534 insertions(+), 16 deletions(-)
 create mode 100644 src/backend/commands/copyparallel.c

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 1585861..1602525 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2043,17 +2043,6 @@ static HeapTuple
 heap_prepare_insert(Relation relation, HeapTuple tup, TransactionId xid,
 	CommandId cid, int options)
 {
-	/*
-	 * To allow parallel inserts, we need to ensure that they are safe to be
-	 * performed in workers. We have the infrastructure to allow parallel
-	 * inserts in general except for the cases where inserts generate a new
-	 * CommandId (eg. inserts into a table having a foreign key column).
-	 */
-	if (IsParallelWorker())
-		ereport(ERROR,
-(errcode(ERRCODE_INVALID_TRANSACTION_STATE),
- errmsg("cannot insert tuples in a parallel worker")));
-
 	tup->t_data->t_infomask &= ~(HEAP_XACT_MASK);
 	tup->t_data->t_infomask2 &= ~(HEAP2_XACT_MASK);
 	tup->t_data->t_infomask |= HEAP_XMAX_INVALID;
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index af6afce..d6d449f 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -764,18 +764,34 @@ GetCurrentCommandId(bool used)
 	if (used)
 	{
 		/*
-		 * Forbid setting currentCommandIdUsed in a parallel worker, because
-		 * we have no provision for communicating this back to the leader.  We
-		 * could relax this restriction when currentCommandIdUsed was already
-		 * true at the start of the parallel operation.
+		 * If in a parallel worker, only allow setting currentCommandIdUsed
+		 * if currentCommandIdUsed was already true at the start of the
+		 * parallel operation (by way of SetCurrentCommandIdUsed()), otherwise
+		 * forbid setting currentCommandIdUsed because we have no provision
+		 * for communicating this back to the leader.
 		 */
-		Assert(!IsParallelWorker());
+		Assert(!(IsParallelWorker() && !currentCommandIdUsed));
 		currentCommandIdUsed = true;
 	}
 	return currentCommandId;
 }
 
 /*
+ *	SetCurrentCommandIdUsedForWorker
+ *
+ * For a parallel worker, record that the currentCommandId has been used.
+ * This must only be called at the start of a parallel operation.
+ */
+void
+SetCurrentCommandIdUsedForWorker(void)
+{
+	Assert(IsParallelWorker() && !currentCommandIdUsed &&
+		   (currentCommandId != InvalidCommandId));
+
+	currentCommandIdUsed = true;
+}
+
+/*
  *	SetParallelStartTimestamps
  *
  * In

Re: Parallel copy

2020-11-11 Thread vignesh C
On Tue, Nov 10, 2020 at 7:27 PM Amit Kapila  wrote:
>
> On Tue, Nov 10, 2020 at 7:12 PM vignesh C  wrote:
> >
> > On Tue, Nov 3, 2020 at 2:28 PM Amit Kapila  wrote:
> > >
> >
> > I have worked to provide a patch for the parallel safety checks. It
> > checks if parallely copy can be performed, Parallel copy cannot be
> > performed for the following a) If relation is temporary table b) If
> > relation is foreign table c) If relation has non parallel safe index
> > expressions d) If relation has triggers present whose type is of non
> > before statement trigger type e) If relation has check constraint
> > which are not parallel safe f) If relation has partition and any
> > partition has the above type. This patch has the checks for it. This
> > patch will be used by parallel copy implementation.
> >
>
> How did you ensure that this is sufficient? For parallel-insert's
> patch we have enabled parallel-mode for Inserts and ran the tests with
> force_parallel_mode to see if we are not missing anything. Also, it
> seems there are many common things here w.r.t parallel-insert patch,
> is it possible to prepare this atop that patch or do you have any
> reason to keep this separate?
>

I have done similar testing for copy too: I set force_parallel_mode to
regress, hardcoded the code to pick parallel workers for the copy
operation, and ran make installcheck-world to verify. Many checks in
this patch are common to both patches, but I was not sure how to handle
that, as both projects are in progress and are being updated based on
the reviewers' opinions. How should we handle this?
Thoughts?

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com




Re: Parallel copy

2020-11-18 Thread vignesh C
On Thu, Oct 29, 2020 at 2:20 PM Heikki Linnakangas  wrote:
>
> On 27/10/2020 15:36, vignesh C wrote:
> > Attached v9 patches have the fixes for the above comments.
>
> I did some testing:
>
> /tmp/longdata.pl:
> 
> #!/usr/bin/perl
> #
> # Generate three rows:
> # foo
> # longdatalongdatalongdata...
> # bar
> #
> # The length of the middle row is given as command line arg.
> #
>
> my $bytes = $ARGV[0];
>
> print "foo\n";
> for(my $i = 0; $i < $bytes; $i+=8){
>  print "longdata";
> }
> print "\n";
> print "bar\n";
> 
>
> postgres=# copy longdata from program 'perl /tmp/longdata.pl 1'
> with (parallel 2);
>
> This gets stuck forever (or at least I didn't have the patience to wait
> it finish). Both worker processes are consuming 100% of CPU.
>

Thanks for identifying this issue; it is fixed in the v10 patch posted
at [1].
[1]
https://www.postgresql.org/message-id/CALDaNm05FnA-ePvYV_t2%2BWE_tXJymbfPwnm%2Bkc9y1iMkR%2BNbUg%40mail.gmail.com

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com


Re: Parallel copy

2020-11-18 Thread vignesh C
On Wed, Oct 28, 2020 at 5:36 PM Hou, Zhijie 
wrote:
>
> Hi
>
> I found some issue in v9-0002
>
> 1.
> +
> +   elog(DEBUG1, "[Worker] Processing - line position:%d, block:%d,
unprocessed lines:%d, offset:%d, line size:%d",
> +write_pos, lineInfo->first_block,
> +
 pg_atomic_read_u32(&data_blk_ptr->unprocessed_line_parts),
> +offset, pg_atomic_read_u32(&lineInfo->line_size));
> +
>
> write_pos or other variable to be printed here are type of uint32, I
think it'better to use '%u' in elog msg.
>

Modified it.

> 2.
> +* line_size will be set. Read the line_size again to be
sure if it is
> +* completed or partial block.
> +*/
> +   dataSize = pg_atomic_read_u32(&lineInfo->line_size);
> +   if (dataSize)
>
> It use dataSize( type int ) to get uint32 which seems a little dangerous.
> Is it better to define dataSize uint32 here?
>

Modified it.
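
In other words, for the above two points the corrected pattern looks
roughly like this (sketch only, reusing the variable names from the patch
under review):

	/* Match the variable to the atomic's width and print uint32 with %u. */
	uint32		dataSize = pg_atomic_read_u32(&lineInfo->line_size);

	elog(DEBUG1, "[Worker] line position:%u, line size:%u",
		 write_pos, dataSize);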

> 3.
> Since function with  'Cstate' in name has been changed to 'CState'
> I think we can change function PopulateCommonCstateInfo as well.
>

Modified it.

> 4.
> +   if (pcdata->worker_line_buf_count)
>
> I think some check like the above can be 'if (xxx > 0)', which seems
easier to understand.

Modified it.

Thanks for the comments; these issues are fixed in the v10 patch posted at [1].
[1]
https://www.postgresql.org/message-id/CALDaNm05FnA-ePvYV_t2%2BWE_tXJymbfPwnm%2Bkc9y1iMkR%2BNbUg%40mail.gmail.com

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com


Re: Parallel copy

2020-11-18 Thread vignesh C
On Thu, Oct 29, 2020 at 2:26 PM Daniel Westermann (DWE)
 wrote:
>
> On 27/10/2020 15:36, vignesh C wrote:
> >> Attached v9 patches have the fixes for the above comments.
>
> >I did some testing:
>
> I did some testing as well and have a cosmetic remark:
>
> postgres=# copy t1 from '/var/tmp/aa.txt' with (parallel 10);
> ERROR:  value 10 out of bounds for option "parallel"
> DETAIL:  Valid values are between "1" and "1024".
> postgres=# copy t1 from '/var/tmp/aa.txt' with (parallel 1000);
> ERROR:  parallel requires an integer value
> postgres=#
>
> Wouldn't it make more sense to only have one error message? The first one 
> seems to be the better message.
>

I had seen similar behavior in other places too:
postgres=# vacuum (parallel 10) t1;
ERROR:  parallel vacuum degree must be between 0 and 1024
LINE 1: vacuum (parallel 10) t1;
^
postgres=# vacuum (parallel 1000) t1;
ERROR:  parallel requires an integer value

I'm not sure if we should fix this.

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com




Re: Parallel copy

2020-11-18 Thread vignesh C
On Fri, Nov 13, 2020 at 2:25 PM Amit Kapila  wrote:
>
> On Wed, Nov 11, 2020 at 10:42 PM vignesh C  wrote:
> >
> > On Tue, Nov 10, 2020 at 7:27 PM Amit Kapila  wrote:
> > >
> > > On Tue, Nov 10, 2020 at 7:12 PM vignesh C  wrote:
> > > >
> > > > On Tue, Nov 3, 2020 at 2:28 PM Amit Kapila  
> > > > wrote:
> > > > >
> > > >
> > > > I have worked to provide a patch for the parallel safety checks. It
> > > > checks if parallely copy can be performed, Parallel copy cannot be
> > > > performed for the following a) If relation is temporary table b) If
> > > > relation is foreign table c) If relation has non parallel safe index
> > > > expressions d) If relation has triggers present whose type is of non
> > > > before statement trigger type e) If relation has check constraint
> > > > which are not parallel safe f) If relation has partition and any
> > > > partition has the above type. This patch has the checks for it. This
> > > > patch will be used by parallel copy implementation.
> > > >
> > >
> > > How did you ensure that this is sufficient? For parallel-insert's
> > > patch we have enabled parallel-mode for Inserts and ran the tests with
> > > force_parallel_mode to see if we are not missing anything. Also, it
> > > seems there are many common things here w.r.t parallel-insert patch,
> > > is it possible to prepare this atop that patch or do you have any
> > > reason to keep this separate?
> > >
> >
> > I have done similar testing for copy too, I had set force_parallel
> > mode to regress, hardcoded in the code to pick parallel workers for
> > copy operation and ran make installcheck-world to verify. Many checks
> > in this patch are common between both patches, but I was not sure how
> > to handle it as both the projects are in-progress and are being
> > updated based on the reviewer's opinion. How to handle this?
> > Thoughts?
> >
>
> I have not studied the differences in detail but if it is possible to
> prepare it on top of that patch then there shouldn't be a problem. To
> avoid confusion if you want you can always either post the latest
> version of that patch with your patch or point to it.
>

I have kept this as a separate patch for now. I will look into whether I
can use Greg's changes as-is; if required, I will provide a few review
comments on top of Greg's patch so that it is usable for parallel copy
too, and later post a separate patch with the changes on top of it.
Until then I will retain this as a separate patch.

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com




Re: Parallel copy

2020-11-18 Thread vignesh C
On Sat, Oct 31, 2020 at 2:07 AM Tomas Vondra 
wrote:
>
> Hi,
>
> I've done a bit more testing today, and I think the parsing is busted in
> some way. Consider this:
>
>  test=# create extension random;
>  CREATE EXTENSION
>
>  test=# create table t (a text);
>  CREATE TABLE
>
>  test=# insert into t select random_string(random_int(10, 256*1024))
from generate_series(1,1);
>  INSERT 0 1
>
>  test=# copy t to '/mnt/data/t.csv';
>  COPY 1
>
>  test=# truncate t;
>  TRUNCATE TABLE
>
>  test=# copy t from '/mnt/data/t.csv';
>  COPY 1
>
>  test=# truncate t;
>  TRUNCATE TABLE
>
>  test=# copy t from '/mnt/data/t.csv' with (parallel 2);
>  ERROR:  invalid byte sequence for encoding "UTF8": 0x00
>  CONTEXT:  COPY t, line 485: "m&\nh%_a"%r]>qtCl:Q5ltvF~;2oS6@HB
>F>og,bD$Lw'nZY\tYl#BH\t{(j~ryoZ08"SGU~.}8CcTRk1\ts$@U3szCC+U1U3i@P..."
>  parallel worker
>
>
> The functions come from an extension I use to generate random data, I've
> pushed it to github [1]. The random_string() generates a random string
> with ASCII characters, symbols and a couple special characters (\r\n\t).
> The intent was to try loading data where a fields may span multiple 64kB
> blocks and may contain newlines etc.
>
> The non-parallel copy works fine, the parallel one fails. I haven't
> investigated the details, but I guess it gets confused about where a
> string starts/end, or something like that.
>

Thanks for identifying this issue; it is fixed in the v10 patch posted
at [1].
[1]
https://www.postgresql.org/message-id/CALDaNm05FnA-ePvYV_t2%2BWE_tXJymbfPwnm%2Bkc9y1iMkR%2BNbUg%40mail.gmail.com


Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com


Re: Parallel copy

2020-11-18 Thread vignesh C
On Sat, Nov 7, 2020 at 7:01 PM vignesh C  wrote:
>
> On Thu, Nov 5, 2020 at 6:33 PM Hou, Zhijie 
wrote:
> >
> > Hi
> >
> > >
> > > my $bytes = $ARGV[0];
> > > for(my $i = 0; $i < $bytes; $i+=8){
> > >  print "longdata";
> > > }
> > > print "\n";
> > > 
> > >
> > > postgres=# copy longdata from program 'perl /tmp/longdata.pl
1'
> > > with (parallel 2);
> > >
> > > This gets stuck forever (or at least I didn't have the patience to
wait
> > > it finish). Both worker processes are consuming 100% of CPU.
> >
> > I had a look over this problem.
> >
> > the ParallelCopyDataBlock has size limit:
> > uint8   skip_bytes;
> > chardata[DATA_BLOCK_SIZE];  /* data read from file
*/
> >
> > It seems the input line is so long that the leader process run out of
the Shared memory among parallel copy workers.
> > And the leader process keep waiting free block.
> >
> > For the worker process, it wait util line_state becomes
LINE_LEADER_POPULATED,
> > But leader process won't set the line_state unless it read the whole
line.
> >
> > So it stuck forever.
> > May be we should reconsider about this situation.
> >
> > The stack is as follows:
> >
> > Leader stack:
> > #3  0x0075f7a1 in WaitLatch (latch=,
wakeEvents=wakeEvents@entry=41, timeout=timeout@entry=1,
wait_event_info=wait_event_info@entry=150994945) at latch.c:411
> > #4  0x005a9245 in WaitGetFreeCopyBlock
(pcshared_info=pcshared_info@entry=0x7f26d2ed3580) at copyparallel.c:1546
> > #5  0x005a98ce in SetRawBufForLoad (cstate=cstate@entry=0x2978a88,
line_size=67108864, copy_buf_len=copy_buf_len@entry=65536,
raw_buf_ptr=raw_buf_ptr@entry=65536,
> > copy_raw_buf=copy_raw_buf@entry=0x7fff4cdc0e18) at
copyparallel.c:1572
> > #6  0x005a1963 in CopyReadLineText (cstate=cstate@entry=0x2978a88)
at copy.c:4058
> > #7  0x005a4e76 in CopyReadLine (cstate=cstate@entry=0x2978a88)
at copy.c:3863
> >
> > Worker stack:
> > #0  GetLinePosition (cstate=cstate@entry=0x29e1f28) at
copyparallel.c:1474
> > #1  0x005a8aa4 in CacheLineInfo (cstate=cstate@entry=0x29e1f28,
buff_count=buff_count@entry=0) at copyparallel.c:711
> > #2  0x005a8e46 in GetWorkerLine (cstate=cstate@entry=0x29e1f28)
at copyparallel.c:885
> > #3  0x005a4f2e in NextCopyFromRawFields 
> > (cstate=cstate@entry=0x29e1f28,
fields=fields@entry=0x7fff4cdc0b48, nfields=nfields@entry=0x7fff4cdc0b44)
at copy.c:3615
> > #4  0x005a50af in NextCopyFrom (cstate=cstate@entry=0x29e1f28,
econtext=econtext@entry=0x2a358d8, values=0x2a42068, nulls=0x2a42070) at
copy.c:3696
> > #5  0x005a5b90 in CopyFrom (cstate=cstate@entry=0x29e1f28) at
copy.c:2985
> >
>
> Thanks for providing your thoughts. I have analyzed this issue and I'm
> working on the fix for this, I will be posting a patch for this
> shortly.
>

I have fixed this and provided a patch at [1].
[1]
https://www.postgresql.org/message-id/CALDaNm05FnA-ePvYV_t2%2BWE_tXJymbfPwnm%2Bkc9y1iMkR%2BNbUg%40mail.gmail.com


Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com


Printing backtrace of postgres processes

2020-11-21 Thread vignesh C
Hi,

I would like to propose a way to get the call stack of a postgres
process by connecting to the server. This helps in diagnosing problems
in a customer environment, for example a hung process or a long-running
process.
The idea is to implement and expose a pg_print_callstack function.
Internally, the connected backend sends a SIGUSR1 signal to the
postmaster process after setting PMSIGNAL_BACKTRACE_EMIT. The postmaster
then sends SIGUSR1 with PROCSIG_BACKTRACE_PRINT set to each process that
has access to ProcSignal. As the syslogger process & stats process don't
have access to ProcSignal, multiplexing with SIGUSR1 is not possible for
them, so SIGUSR2 is sent to those processes instead. Once a process
receives the signal, it logs its backtrace.
Attached is a WIP patch for the same.
Thoughts?
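
The LogBackTrace() helper used by the patch is not visible in the excerpt
below; purely for illustration, a minimal version could be built on glibc's
backtrace()/backtrace_symbols() along the following lines. This is a sketch,
not the attached patch's code, and calling it directly from a signal handler
(as this WIP version does) is one of the concerns raised later in the thread:

#include "postgres.h"
#include <execinfo.h>
#include <stdlib.h>

/* Illustrative sketch of a backtrace logger. */
static void
LogBackTrace(void)
{
	void	   *frames[100];
	int			nframes = backtrace(frames, lengthof(frames));
	char	  **symbols = backtrace_symbols(frames, nframes);

	if (symbols == NULL)
		return;

	for (int i = 0; i < nframes; i++)
		elog(LOG, "%s", symbols[i]);

	free(symbols);
}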

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com
From c1006110bdeac2135d1c8e9220f65d50cd49ab63 Mon Sep 17 00:00:00 2001
From: Vignesh C 
Date: Sun, 22 Nov 2020 05:58:24 +0530
Subject: [PATCH] Print backtrace of postgres process that are part of this
 instance.

The idea here is to implement & expose a pg_print_callstack function.
Internally, the connected backend sends a SIGUSR1 signal to the postmaster
process after setting PMSIGNAL_BACKTRACE_EMIT. The postmaster sends SIGUSR1
with PROCSIG_BACKTRACE_PRINT set to each process that has access to
ProcSignal. As the syslogger process & stats process don't have access to
ProcSignal, multiplexing with SIGUSR1 is not possible for these processes,
hence SIGUSR2 is sent to them instead. Once the process receives this signal
it logs the backtrace of the process.
---
 src/backend/postmaster/pgstat.c   | 19 ++-
 src/backend/postmaster/postmaster.c   | 20 
 src/backend/postmaster/syslogger.c| 16 +++-
 src/backend/storage/ipc/procsignal.c  | 28 
 src/backend/storage/ipc/signalfuncs.c | 22 ++
 src/backend/tcop/postgres.c   | 31 +++
 src/include/catalog/pg_proc.dat   |  6 +-
 src/include/storage/pmsignal.h|  2 ++
 src/include/storage/procsignal.h  |  2 ++
 src/include/tcop/tcopprot.h   |  2 ++
 10 files changed, 145 insertions(+), 3 deletions(-)

diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index e76e627..bd38264 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -62,6 +62,7 @@
 #include "storage/pg_shmem.h"
 #include "storage/procsignal.h"
 #include "storage/sinvaladt.h"
+#include "tcop/tcopprot.h"
 #include "utils/ascii.h"
 #include "utils/guc.h"
 #include "utils/memutils.h"
@@ -372,6 +373,8 @@ static void pgstat_recv_checksum_failure(PgStat_MsgChecksumFailure *msg, int len
 static void pgstat_recv_replslot(PgStat_MsgReplSlot *msg, int len);
 static void pgstat_recv_tempfile(PgStat_MsgTempFile *msg, int len);
 
+static void sigUsr2Handler(SIGNAL_ARGS);
+
 /* 
  * Public functions called from postmaster follow
  * 
@@ -4666,7 +4669,7 @@ PgstatCollectorMain(int argc, char *argv[])
 	pqsignal(SIGALRM, SIG_IGN);
 	pqsignal(SIGPIPE, SIG_IGN);
 	pqsignal(SIGUSR1, SIG_IGN);
-	pqsignal(SIGUSR2, SIG_IGN);
+	pqsignal(SIGUSR2, sigUsr2Handler);
 	/* Reset some signals that are accepted by postmaster but not here */
 	pqsignal(SIGCHLD, SIG_DFL);
 	PG_SETMASK(&UnBlockSig);
@@ -7242,3 +7245,17 @@ pgstat_count_slru_truncate(int slru_idx)
 {
 	slru_entry(slru_idx)->m_truncate += 1;
 }
+
+/*
+ * sigUsr2Handler
+ *
+ * handle SIGUSR2 signal to print call stack of pgstat process.
+ */
+static void
+sigUsr2Handler(SIGNAL_ARGS)
+{
+	int			save_errno = errno;
+
+	LogBackTrace();
+	errno = save_errno;
+}
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index b7799ed..dd7c930 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -5149,6 +5149,26 @@ sigusr1_handler(SIGNAL_ARGS)
 		StartWorkerNeeded = true;
 	}
 
+	/* Process backtrace emit signal. */
+	if (CheckPostmasterSignal(PMSIGNAL_BACKTRACE_EMIT))
+	{
+		EmitProcSignalPrintCallStack();
+
+		/*
+		 * The pgstat and syslogger processes do not have access to ProcSignal,
+		 * so multiplexing with SIGUSR1 is not possible for them; send SIGUSR2
+		 * to those processes instead.
+		 */
+		if (PgStatPID)
+			kill(PgStatPID, SIGUSR2);
+
+		if (SysLoggerPID)
+			kill(SysLoggerPID, SIGUSR2);
+
+		/* Print call stack for postmaster process. */
+		LogBackTrace();
+	}
+
 	/*
 	 * RECOVERY_STARTED and BEGIN_HOT_STAN

Re: Printing backtrace of postgres processes

2020-11-29 Thread vignesh C
On Sun, Nov 22, 2020 at 11:55 AM Tom Lane  wrote:
>
> vignesh C  writes:
> > The idea here is to implement & expose pg_print_callstack function,
> > internally what this function does is, the connected backend will send
> > SIGUSR1 signal by setting PMSIGNAL_BACKTRACE_EMIT to the postmaster
> > process. Postmaster process will send a SIGUSR1 signal to the process
> > by setting PROCSIG_BACKTRACE_PRINT if the process has access to
> > ProcSignal. As syslogger process & Stats process don't have access to
> > ProcSignal, multiplexing with SIGUSR1 is not possible for these
> > processes, hence SIGUSR2 signal will be sent for these processes. Once
> > the process receives this signal it will log the backtrace of the
> > process.
>
> Surely this is *utterly* unsafe.  You can't do that sort of stuff in
> a signal handler.
>
> It might be all right to set a flag that would cause the next
> CHECK_FOR_INTERRUPTS to print a backtrace, but I'm not sure
> how useful that really is.
>
> The proposed postmaster.c addition seems quite useless, as there
> is exactly one stack trace it could ever log.
>
> I would like to see some discussion of the security implications
> of such a feature, as well.  ("There aren't any" is the wrong
> answer.)

Hi Hackers,

Any thoughts on the security implications of this feature?

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com




Re: Added schema level support for publication.

2021-01-12 Thread vignesh C
On Mon, Jan 11, 2021 at 11:45 AM Bharath Rupireddy
 wrote:
>
> On Sun, Jan 10, 2021 at 11:21 PM vignesh C  wrote:
> > On Sat, Jan 9, 2021 at 8:08 PM Bharath Rupireddy
> >  wrote:
> > > I think this feature can be useful, in case a user has a lot of tables
> > > to publish inside a schema. Having said that, I wonder if this feature
> > > mandates users to create the same schema with same
> > > permissions/authorizations manually on the subscriber, because logical
> > > replication doesn't propagate any ddl's so are the schema or schema
> > > changes? Or is it that the list of tables from the publisher can go
> > > into a different schema on the subscriber?
> > >
> >
> > DDL's will not be propagated to the subscriber. Users have to create
> > the schema & tables in the subscriber. No change in
> > Permissions/authorizations handling, it will be the same as the
> > existing behavior for relations.
>
> Looks like the existing behaviour already requires users to create the
> schema on the subscriber when publishing the tables from that schema.
> Otherwise, an error is thrown on the subscriber [1].
>
> [1] on publisher:
> CREATE SCHEMA myschema;
> CREATE TABLE myschema.t1(a1 int, b1 int);
> INSERT INTO myschema.t1_myschema SELECT i, i+10 FROM generate_series(1,10) i;
> CREATE PUBLICATION testpub FOR TABLE myschema.t1;
>
> on subscriber:
> postgres=# CREATE SUBSCRIPTION testsub CONNECTION 'host=localhost
> dbname=postgres user=bharath port=5432' PUBLICATION testpub;
> ERROR:  schema "myschema" does not exist
> CREATE SCHEMA myschema;
> CREATE TABLE myschema.t1(a1 int, b1 int);
> postgres=# CREATE SUBSCRIPTION testsub CONNECTION 'host=localhost
> dbname=postgres user=bharath port=5432' PUBLICATION testpub;
> NOTICE:  created replication slot "testsub" on publisher
> CREATE SUBSCRIPTION
>

Yes, this feature will have the same behavior: DDL creation should be
taken care of by the DBA, similar to how it is handled today, perhaps by
using pg_dump or SQL scripts/statements.

> > > Since the schema can have other objects such as data types, functions,
> > > operators, I'm sure with your feature, non-table objects will be
> > > skipped.
> > >
> >
> > Yes, only table data will be sent to subscribers, non-table objects
> > will be skipped.
>
> Looks like the existing CREATE PUBLICATION FOR ALL TABLES, which is
> for all the tables in the database, does this i.e. skips non-table
> objects and temporary tables, foreign tables and so on. So, your
> feature also can behave the same way, but within the scope of the
> given schema/s.
>

Yes, it will support only normal tables. Non-table objects, foreign
tables & temporary tables will not be supported.

> > > As Amit pointed out earlier, the behaviour when schema dropped, I
> > > think we should also consider when schema is altered, say altered to a
> > > different name, maybe we should change that in the publication too.
> > >
> >
> > I agree that when schema is altered the renamed schema should be
> > reflected in the publication.
>
> I think, it's not only making sure that the publisher side has the new
> altered schema, but also the subscriber needs those alters. Having
> said that, since these alters come under DDL changes and in logical
> replication we don't publish the scheme changes to the subscriber, we
> may not need to anything extra for informing the schema alters to the
> subscriber from the publisher, the users might have to do the same
> schema alter on the subscriber and then a ALTER SUBSCRIPTION testsub
> REFRESH PUBLICATION;  should work for them? If this understanding is
> correct, then we should document this.
>

Yes, schema alterations will be reflected in the publication; the
corresponding change needs to be made by the user on the subscriber
side. Once the user runs ALTER SUBSCRIPTION testsub REFRESH PUBLICATION,
the altered schema changes will be reflected on the subscriber. I will
update the documentation to note that users need to take care of the
subscription refresh.

> > > In general, what happens if we have some temporary tables or foreign
> > > tables inside the schema, will they be allowed to send the data to
> > > subscribers?
> > >
> >
> > Temporary tables & foreign tables will not be added to the publications.
>
> Yes the existing logical replication framework doesn't allow
> replication of temporary, unlogged, foreign tables and other non-table
> relations such as materialized views, indexes etc [1]. The CREATE
> PUBLICATION statement either fails in check_pu

Re: Added schema level support for publication.

2021-01-15 Thread vignesh C
Thanks, Rahila, for your comments; please find my thoughts below.

On Tue, Jan 12, 2021 at 5:16 PM Rahila Syed  wrote:
>
> Hi Vignesh,
>
> I had a look at the patch, please consider following comments.
>
> On Thu, Jan 7, 2021 at 10:03 PM vignesh C  wrote:
>>
>> Hi,
>>
>> This feature adds schema option while creating publication. Users will
>> be able to specify one or more schemas while creating publication,
>> when the user specifies schema option, then the data changes for the
>> tables present in the schema specified by the user will be replicated
>> to the subscriber. Few examples have been listed below:
>>
>> Create a publication that publishes all changes for all the tables
>> present in production schema:
>> CREATE PUBLICATION production_publication FOR ALL TABLES SCHEMA
production;
>>
> Should it be FOR TABLES IN SCHEMA instead of FOR ALL TABLES SCHEMA?
>

For adding tables into publication we have syntax like:
CREATE PUBLICATION mypub FOR TABLE tbl1, tbl2;
For all tables we have syntax like:
CREATE PUBLICATION mypub FOR ALL TABLES;

Initial syntax that I proposed was:
CREATE PUBLICATION production_publication *FOR ALL TABLES SCHEMA*
production;

I feel the below syntax is better, as it is consistent with others:
CREATE PUBLICATION mypub *FOR SCHEMA* sch1, sch2;

>>
>> Create a publication that publishes all changes for all the tables
>> present in marketing and sales schemas:
>> CREATE PUBLICATION sales_publication FOR ALL TABLES SCHEMA marketing,
sales;
>>
>> Add some schemas to the publication:
>> ALTER PUBLICATION sales_publication ADD SCHEMA marketing_june,
sales_june;
>>
> As per current implementation this command fails even if one of the
schemas does not
> exist. I think this is counterintuitive, it should throw a warning and
continue adding the rest.
>

We have similar behavior when adding a non-existent table while creating
a publication:
CREATE PUBLICATION mypub3 FOR TABLE non_existent_table;
ERROR:  relation "non_existent_table" does not exist
I feel we can keep the behavior similar to maintain consistency.

>>
>> Drop some schema from the publication:
>> ALTER PUBLICATION production_quarterly_publication DROP SCHEMA
production_july;
>>
> Same for drop schema, if one of these schemas does not exist in
publication,
> the entire DROP operation is aborted.

We have similar behavior when dropping a non-existent table while
altering a publication:
alter publication mypub5 drop table test1,testx;
ERROR:  relation "testx" does not exist
I feel we can keep the behavior similar to maintain consistency.

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com


Re: Printing backtrace of postgres processes

2021-01-16 Thread vignesh C
On Sat, Jan 16, 2021 at 1:40 AM Andres Freund  wrote:
>
> On 2021-01-15 09:53:05 +0100, Peter Eisentraut wrote:
> > On 2020-12-08 10:38, vignesh C wrote:
> > > I have implemented printing of backtrace based on handling it in
> > > CHECK_FOR_INTERRUPTS. This patch also includes the change to allow
> > > getting backtrace of any particular process based on the suggestions.
> > > Attached patch has the implementation for the same.
> > > Thoughts?
> >
> > Are we willing to use up a signal for this?
>
> Why is a full signal needed? Seems the procsignal infrastructure should
> suffice?

Most processes have access to ProcSignal, and for those the
callstack-print signal is handled through ProcSignal. The pgstat and
syslogger processes do not have access to ProcSignal, so multiplexing
with SIGUSR1 is not possible for them; I therefore handled printing the
callstack for pgstat and syslogger using the SIGUSR2 signal.
This is because shared memory is detached before the pgstat and
syslogger processes are started, via the following:
/* Drop our connection to postmaster's shared memory, as well */
dsm_detach_all();
PGSharedMemoryDetach();
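
For the processes that do have ProcSignal access, the
flag-plus-CHECK_FOR_INTERRUPTS shape suggested earlier in the thread would
look roughly like the sketch below (the function and flag names are
illustrative, not the patch's):

#include "postgres.h"
#include <signal.h>
#include "miscadmin.h"
#include "storage/latch.h"

/* The signal-handler side only sets a flag and the process latch. */
static volatile sig_atomic_t PrintBacktracePending = false;

/* Would be called from procsignal_sigusr1_handler() for the backtrace signal. */
static void
HandlePrintBacktraceInterrupt(void)
{
	PrintBacktracePending = true;
	InterruptPending = true;
	SetLatch(MyLatch);
}

/* Would be called from ProcessInterrupts(), i.e. the next CHECK_FOR_INTERRUPTS(). */
static void
ProcessPrintBacktraceRequest(void)
{
	if (!PrintBacktracePending)
		return;

	PrintBacktracePending = false;
	LogBackTrace();				/* emit the backtrace outside the signal handler */
}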

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com




Re: Printing backtrace of postgres processes

2021-01-17 Thread vignesh C
On Sat, Jan 16, 2021 at 11:10 PM Andres Freund  wrote:
>
> Hi,
>
> On Sat, Jan 16, 2021, at 09:34, vignesh C wrote:
> > On Sat, Jan 16, 2021 at 1:40 AM Andres Freund  wrote:
> > >
> > > On 2021-01-15 09:53:05 +0100, Peter Eisentraut wrote:
> > > > On 2020-12-08 10:38, vignesh C wrote:
> > > > > I have implemented printing of backtrace based on handling it in
> > > > > CHECK_FOR_INTERRUPTS. This patch also includes the change to allow
> > > > > getting backtrace of any particular process based on the suggestions.
> > > > > Attached patch has the implementation for the same.
> > > > > Thoughts?
> > > >
> > > > Are we willing to use up a signal for this?
> > >
> > > Why is a full signal needed? Seems the procsignal infrastructure should
> > > suffice?
> >
> > Most of the processes have access to ProcSignal, for these processes
> > printing of callstack signal was handled by using ProcSignal. Pgstat
> > process & syslogger process do not have access to ProcSignal,
> > multiplexing with SIGUSR1 is not possible for these processes. So I
> > handled the printing of callstack for pgstat process & syslogger using
> > the SIGUSR2 signal.
> > This is because shared memory is detached before pgstat & syslogger
> > process is started by using the below:
> > /* Drop our connection to postmaster's shared memory, as well */
> > dsm_detach_all();
> > PGSharedMemoryDetach();
>
> Sure. But why is it important enough to support those that we are willing to 
> dedicate a signal to the task? Their backtraces aren't often interesting, so 
> I think we should just ignore them here.

Thanks for your comments, Andres. I will skip backtrace support for the
processes that do not have access to ProcSignal. I will make the changes
and post a patch for this soon.

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com




Re: Added schema level support for publication.

2021-01-18 Thread vignesh C
On Sat, Jan 9, 2021 at 5:21 PM vignesh C  wrote:
>
> On Fri, Jan 8, 2021 at 4:32 PM Amit Kapila  wrote:
> >
> > On Thu, Jan 7, 2021 at 10:03 PM vignesh C  wrote:
> > >
> > > This feature adds schema option while creating publication. Users will
> > > be able to specify one or more schemas while creating publication,
> > > when the user specifies schema option, then the data changes for the
> > > tables present in the schema specified by the user will be replicated
> > > to the subscriber. Few examples have been listed below:
> > >
> > > Create a publication that publishes all changes for all the tables
> > > present in production schema:
> > > CREATE PUBLICATION production_publication FOR ALL TABLES SCHEMA 
> > > production;
> > >
> > > Create a publication that publishes all changes for all the tables
> > > present in marketing and sales schemas:
> > > CREATE PUBLICATION sales_publication FOR ALL TABLES SCHEMA marketing, 
> > > sales;
> > >
> > > Add some schemas to the publication:
> > > ALTER PUBLICATION sales_publication ADD SCHEMA marketing_june, sales_june;
> > >
> > > Drop some schema from the publication:
> > > ALTER PUBLICATION production_quarterly_publication DROP SCHEMA 
> > > production_july;
> > >
> > > Attached is a POC patch for the same. I felt this feature would be quite 
> > > useful.
> > >
> >
> > What do we do if the user Drops the schema? Do we automatically remove
> > it from the publication?
> >
> I have not yet handled this scenario yet, I will handle this and
> adding of tests in the next patch.
>

I have handled the above scenario (dropping a schema now automatically
removes the schema entry from the publication schema relation) and added
tests in the attached v2 patch.
Thoughts?


Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com
From d14e47368091d81f22ab11b94ba0716e0d918471 Mon Sep 17 00:00:00 2001
From: Vignesh C 
Date: Mon, 18 Jan 2021 18:08:51 +0530
Subject: [PATCH v2]  Added schema level support for publication.

This patch adds schema level support for publication. Users can specify
multiple schemas with the schema option. When the schema option is specified,
the tables present in the specified schemas are selected by the publisher for
sending data to the subscriber.
---
 doc/src/sgml/ref/alter_publication.sgml  |  45 +++-
 doc/src/sgml/ref/create_publication.sgml |  31 ++-
 src/backend/catalog/Makefile |   4 +-
 src/backend/catalog/aclchk.c |   2 +
 src/backend/catalog/dependency.c |   9 +
 src/backend/catalog/objectaddress.c  | 138 +
 src/backend/catalog/pg_publication.c | 134 +++-
 src/backend/commands/alter.c |   1 +
 src/backend/commands/event_trigger.c |   4 +
 src/backend/commands/publicationcmds.c   | 266 +++-
 src/backend/commands/seclabel.c  |   1 +
 src/backend/commands/tablecmds.c |   1 +
 src/backend/parser/gram.y|  76 ---
 src/backend/utils/cache/syscache.c   |  23 +++
 src/bin/pg_dump/common.c |   3 +
 src/bin/pg_dump/pg_backup_archiver.c |   3 +-
 src/bin/pg_dump/pg_dump.c| 155 +-
 src/bin/pg_dump/pg_dump.h|  17 ++
 src/bin/pg_dump/pg_dump_sort.c   |   7 +
 src/bin/psql/describe.c  | 110 +-
 src/include/catalog/dependency.h |   1 +
 src/include/catalog/pg_publication.h |  16 +-
 src/include/catalog/pg_publication_schema.h  |  49 +
 src/include/commands/publicationcmds.h   |   1 +
 src/include/nodes/parsenodes.h   |   3 +
 src/include/utils/syscache.h |   2 +
 src/test/regress/expected/object_address.out |   6 +-
 src/test/regress/expected/publication.out| 298 ++-
 src/test/regress/expected/sanity_check.out   |   1 +
 src/test/regress/sql/object_address.sql  |   3 +
 src/test/regress/sql/publication.sql |  87 +++-
 31 files changed, 1401 insertions(+), 96 deletions(-)
 create mode 100644 src/include/catalog/pg_publication_schema.h

diff --git a/doc/src/sgml/ref/alter_publication.sgml b/doc/src/sgml/ref/alter_publication.sgml
index faa114b..d6199af 100644
--- a/doc/src/sgml/ref/alter_publication.sgml
+++ b/doc/src/sgml/ref/alter_publication.sgml
@@ -24,6 +24,9 @@ PostgreSQL documentation
 ALTER PUBLICATION name ADD TABLE [ ONLY ] table_name [ * ] [, ...]
 ALTER PUBLICATION name SET TABLE [ ONLY ] table_name [ * ] [, ...]
 ALTER PUBLICATION name DROP TABLE [ ONLY ] table_name [ * ] [, ...]
+ALTER PUBLICATION name ADD SCHEMA schema_name [, ...]
+ALTER PUBL
