Re: Design of pg_stat_subscription_workers vs pgstats

2022-02-20 Thread Amit Kapila
On Sat, Feb 19, 2022 at 10:07 PM Andres Freund  wrote:
>
>
> I also still think that _worker shouldn't be part of any of the naming
> here.
>

Okay, the other options that come to mind for this are:
pg_subscription_replication_error, pg_subscription_lreplication_error,
or pg_subscription_lrep_error. We can use similar naming at the other
place (view) if required.

-- 
With Regards,
Amit Kapila.




Re: Design of pg_stat_subscription_workers vs pgstats

2022-02-20 Thread Amit Kapila
On Sat, Feb 19, 2022 at 10:35 PM David G. Johnston
 wrote:
>
> On Sat, Feb 19, 2022 at 9:37 AM Andres Freund  wrote:
>>
>> IMO the type of information you'd want for apply failures is substantially
>>
>> different enough from worker failures that I don't really see the temptation
>> to put them in the same table.
>>
>
> It's an error message and a transaction LSN in both cases right now, along 
> with knowledge of whether said transaction only affects a single table (relid 
> is not null) or not (relid is null).  Do you have a concrete idea in mind 
> that would make this separation need more obvious?
>

I would also like to mention that in some cases a sync worker also
behaves like the apply worker (after the initial sync, till it catches
up with the apply worker) and tries to stream and apply changes similar
to the apply worker. Errors during that phase will be no different from
the apply worker's. One idea to make the separation more obvious is to
introduce a 'worker_type' column, similar to backend_type in
pg_stat_activity, which will tell whether it is an apply worker or a
table sync worker.
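
In the meantime, that distinction can be approximated from the existing
pg_stat_subscription view, since relid is NULL for the apply worker and
non-NULL for a table sync worker. A minimal sketch (worker_type here is
just a derived label, not the proposed column):

SELECT subid, subname, pid,
       CASE WHEN relid IS NULL THEN 'apply'
            ELSE 'table sync' END AS worker_type
FROM pg_stat_subscription;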

-- 
With Regards,
Amit Kapila.




Re: Design of pg_stat_subscription_workers vs pgstats

2022-02-20 Thread Amit Kapila
On Mon, Feb 21, 2022 at 11:04 AM Andres Freund  wrote:
>
> On 2022-02-21 12:56:46 +0900, Masahiko Sawada wrote:
>
> > To take a precedent, we used to store accumulative statistics such as
> > spill_txns to pg_stat_replication, but then for the same reason we moved
> > those statistics to the new stats view, pg_stat_replication_slot. New
> > subscription statistics that we're introducing are cumulative statistics
> > whereas pg_stat_subscription is a dynamic statistics view.
>
> I'm happy to have cumulative subscriber stats somewhere in pgstats. But it
> shouldn't be split by worker or relation, and it shouldn't contain
> non-cumulative error information.
>

Fair enough. Then, how about keeping the following information:

* subid (subscription id)
* subname (subscription name)
* sync_error_count/sync_failure_count (number of times table sync failed)
* apply_error_count/apply_failure_count (number of times apply failed)
* sync_success_count (number of table syncs successfully finished)
* apply_commit_count (number of transactions applied successfully)
* apply_rollback_count (number of transactions explicitly rolled back)
* stats_reset (Time at which these statistics were last reset)

The view name could be pg_stat_subscription_lrep,
pg_stat_logical_replication, or something on those lines.
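
To make the shape concrete, querying such a view might look like the
following (the view name and exact column list are still under
discussion, so this is only a hypothetical sketch):

SELECT subid, subname,
       sync_error_count, apply_error_count, sync_success_count,
       apply_commit_count, apply_rollback_count, stats_reset
FROM pg_stat_subscription_lrep;  -- hypothetical name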

-- 
With Regards,
Amit Kapila.




Re: Design of pg_stat_subscription_workers vs pgstats

2022-02-21 Thread Amit Kapila
On Mon, Feb 21, 2022 at 1:18 PM Andres Freund  wrote:
>
> On 2022-02-21 12:39:31 +0530, Amit Kapila wrote:
> > Fair enough. Then, how about keeping the following information:
>
> Mostly sounds good.
>
>
> > * subid (subscription id)
> > * subname (subscription name)
>
> Coming from catalog, via join, I assume?
>

The subname would be from pg_subscription catalog similar to what we
are doing now for pg_stat_subscription_workers.

>
> > * sync_error_count/sync_failure_count (number of times table sync failed)
> > * apply_error_count/apply_failure_count (number of times apply failed)
>
> Yep.
>
>
> > * sync_success_count (number of table syncs successfully finished)
>
> This one I'm not quite convinced by. You can't rely on precise counts with
> pgstats and we should be able to get a better idea from somewhere more
> permanent which relations succeeded?  But it also doesn't do much harm, so ...
>

We can get precise information from pg_subscription_rel (rels that are
in the ready/finished-copy state) but OTOH, during a refresh, some of
the rels could have been dropped, or if a user creates/refreshes a
publication with copy_data = false, then we won't get information
about how many table syncs succeeded. I have also kept this to make
the sync information look consistent, considering we have
sync_failure_count.
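
For reference, the precise per-relation information mentioned above
could be pulled with a query like this (assuming 'r' = ready and
'f' = finished copy in pg_subscription_rel.srsubstate):

SELECT srsubid, count(*) AS synced_rels
FROM pg_subscription_rel
WHERE srsubstate IN ('r', 'f')
GROUP BY srsubid;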

>
> > * apply_commit_count (number of transactions applied successfully)
> > * apply_rollback_count (number of transactions explicitly rolled back)
>
> What does "explicit" mean here?
>

It is for the Rollback Prepared case and probably for streaming of
in-progress transactions that eventually get rolled back.

>
> > * stats_reset (Time at which these statistics were last reset)
> >
> > The view name could be pg_stat_subscription_lrep,
> > pg_stat_logical_replication, or something on those lines.
>
> pg_stat_subscription_stats :)
>

Having *stat* two times in the name sounds slightly odd to me but let
us see what others think. One more option could be
pg_stat_subscription_replication.

-- 
With Regards,
Amit Kapila.




Re: pg_stat_get_replication_slot and pg_stat_get_subscription_worker incorrectly marked as proretset

2022-02-21 Thread Amit Kapila
On Mon, Feb 21, 2022 at 11:21 AM Masahiko Sawada  wrote:
>
> >
> > I am aware about the discussions on the parent view for the first
> > case and its design issues, but it does not change the fact that we'd
> > better address the second case on HEAD IMO.
> >
> > Thoughts?
>
> Agreed.
>

+1. How about the attached?

-- 
With Regards,
Amit Kapila.


fix_pgproc_get_repl_slot_1.patch
Description: Binary data


Re: pg_stat_get_replication_slot and pg_stat_get_subscription_worker incorrectly marked as proretset

2022-02-21 Thread Amit Kapila
On Mon, Feb 21, 2022 at 4:56 PM Michael Paquier  wrote:
>
> On Mon, Feb 21, 2022 at 04:19:59PM +0530, Amit Kapila wrote:
> > On Mon, Feb 21, 2022 at 11:21 AM Masahiko Sawada  
> > wrote:
> >> Agreed.
> >>
> >
> > +1. How about attached?
>
> That's the same thing as what I sent upthread, so that's correct to
> me, except that I have fixed both functions :)
>

Sorry, I hadn't looked at your patch.

> You are not touching pg_stat_get_subscription_worker() because the
> plan is to revert it from HEAD?  I have not followed the other
> discussion closely.
>

It is still under discussion. I'll take the necessary action along
with other things related to that view based on the conclusion of that
thread.

-- 
With Regards,
Amit Kapila.




Re: Design of pg_stat_subscription_workers vs pgstats

2022-02-21 Thread Amit Kapila
On Mon, Feb 21, 2022 at 9:37 PM David G. Johnston
 wrote:
>
> On Mon, Feb 21, 2022 at 2:19 AM Amit Kapila  wrote:
>>
>> On Mon, Feb 21, 2022 at 1:18 PM Andres Freund  wrote:
>>
>> > > The view name could be pg_stat_subscription_lrep,
>> > > pg_stat_logical_replication, or something on those lines.
>> >
>> > pg_stat_subscription_stats :)
>> >
>>
>> Having *stat* two times in the name sounds slightly odd to me but let
>> us see what others think. One more option could be
>> pg_stat_subscription_replication.
>>
>
> Agreed.
>
> pg_stat_subscription_activity
>
> We already have pg_stat_activity (which may be an argument against the 
> suggestion...)
>

I don't know if that can be an argument against it, but one can
imagine that we record other subscription changes (like a change of
publications, etc.). I personally feel it may be better to add
'_replication' in some way, like pg_stat_sub_replication_activity, but
I am fine either way.

--
With Regards,
Amit Kapila.




Re: Design of pg_stat_subscription_workers vs pgstats

2022-02-22 Thread Amit Kapila
On Tue, Feb 22, 2022 at 11:15 AM Masahiko Sawada  wrote:
>
> I've attached a patch that changes pg_stat_subscription_workers view.
> It removes non-cumulative values such as error details such as
> error-XID and the error message from the view, and consequently the
> view now has only cumulative statistics counters: apply_error_count
> and sync_error_count. Since the new view name is under discussion I
> temporarily chose pg_stat_subscription_activity.
>

A few comments:
=
1.
--- a/src/backend/catalog/system_functions.sql
+++ b/src/backend/catalog/system_functions.sql
@@ -637,11 +637,9 @@ REVOKE EXECUTE ON FUNCTION
pg_stat_reset_single_table_counters(oid) FROM public;

 REVOKE EXECUTE ON FUNCTION
pg_stat_reset_single_function_counters(oid) FROM public;

-REVOKE EXECUTE ON FUNCTION pg_stat_reset_replication_slot(text) FROM public;
-
-REVOKE EXECUTE ON FUNCTION pg_stat_reset_subscription_worker(oid) FROM public;
+REVOKE EXECUTE ON FUNCTION
pg_stat_reset_single_subscription_counters(oid) FROM public;

-REVOKE EXECUTE ON FUNCTION pg_stat_reset_subscription_worker(oid,
oid) FROM public;
+REVOKE EXECUTE ON FUNCTION pg_stat_reset_replication_slot(text) FROM public;

Is there a need to change anything about
pg_stat_reset_replication_slot() in this patch?

2. Do we still need to use LATERAL in the view's query?

3. Can we send error stats in pgstat_report_stat() as that will be
called via the proc_exit() path? We can set the phase (apply/sync) in
apply_error_callback_arg and then use that to send the appropriate
message. I think this will obviate the need for try..catch.

-- 
With Regards,
Amit Kapila.




Re: row filtering for logical replication

2022-02-22 Thread Amit Kapila
On Tue, Feb 22, 2022 at 4:47 AM Euler Taveira  wrote:
>
> On Thu, Feb 17, 2022, at 3:36 AM, Amit Kapila wrote:
>
> As there is a new version, I would like to wait for a few more days
> before committing. I am planning to commit this early next week (by
> Tuesday) unless others or I see any more things that can be improved.
>
> Amit, I don't have additional comments or suggestions. Let's move on. Next
> topic. :-)
>

Pushed!

-- 
With Regards,
Amit Kapila.




Re: Design of pg_stat_subscription_workers vs pgstats

2022-02-22 Thread Amit Kapila
On Tue, Feb 22, 2022 at 8:02 PM Masahiko Sawada  wrote:
>
> On Tue, Feb 22, 2022 at 6:53 PM Amit Kapila  wrote:
> >
>
> > 3. Can we send error stats in pgstat_report_stat() as that will be
> > called via the proc_exit() path? We can set the phase (apply/sync) in
> > apply_error_callback_arg and then use that to send the appropriate
> > message. I think this will obviate the need for try..catch.
>
> If we use pgstat_report_stat() to send subscription stats messages,
> all processes end up going through that path. It might not bring
> overhead in practice but I'd like to avoid it.
>

I am not sure about the overhead, but I see another problem if we use
that approach. In the exit path, logicalrep_worker_onexit() will get
called before pgstat_report_stat(), and that will clear
MyLogicalRepWorker->subid, so we won't know the id for which to send
stats. So, the way the patch is doing it seems reasonable to me unless
someone has better ideas.

-- 
With Regards,
Amit Kapila.




Re: pg_stat_get_replication_slot and pg_stat_get_subscription_worker incorrectly marked as proretset

2022-02-22 Thread Amit Kapila
On Tue, Feb 22, 2022 at 8:15 AM Michael Paquier  wrote:
>
> On Mon, Feb 21, 2022 at 05:00:43PM +0530, Amit Kapila wrote:
> > On Mon, Feb 21, 2022 at 4:56 PM Michael Paquier  wrote:
> >
> > It is still under discussion. I'll take the necessary action along
> > with other things related to that view based on the conclusion on that
> > thread.
>
> Sounds good to me.  The patch you are proposing upthread is OK for
> me.
>

Thanks, so you are okay with me pushing that patch just to HEAD. We
don't want to backpatch this to 14 as this is a catalog change and
won't cause any user-visible issue, is that correct?

-- 
With Regards,
Amit Kapila.




Re: Failed transaction statistics to measure the logical replication progress

2022-02-22 Thread Amit Kapila
On Tue, Feb 22, 2022 at 6:45 AM tanghy.f...@fujitsu.com
 wrote:
>
> I found a problem when using it. When a replication worker exits, the
> transaction stats should be sent to the stats collector if they were not
> sent before because it didn't reach PGSTAT_STAT_INTERVAL. But I saw that
> the stats weren't updated as expected.
>
> I looked into it and found that the replication worker would send the
> transaction stats (if any) before it exits. But it got invalid subid in
> pgstat_send_subworker_xact_stats(), which led to the following result:
>
> postgres=# select pg_stat_get_subscription_worker(0, null);
>  pg_stat_get_subscription_worker
> ---------------------------------
>  (0,,2,0,00,"",)
> (1 row)
>
> I think that's because subid has already been cleaned when trying to send the
> stats. I printed the value of before_shmem_exit_list, the functions in this 
> list
> would be called in shmem_exit() when the worker exits.
> logicalrep_worker_onexit() would clean up the worker info (including subid), 
> and
> pgstat_shutdown_hook() would send stats if any.  logicalrep_worker_onexit() 
> was
> called before calling pgstat_shutdown_hook().
>

Yeah, I think that is a problem, and maybe we can think of solving it
by sending the stats via logicalrep_worker_onexit() before subid is
cleared, but I am not sure that is a good idea. I feel we need to go
back to the idea of v21 for sending stats instead of using
pgstat_report_stat(). I think the one thing we could improve is to
avoid trying to send the stats before receiving each message via
walrcv_receive() and rather try to send them before we wait
(WaitLatchOrSocket). Trying after each message doesn't seem to be
required and could lead to some overhead as well. What do you think?

-- 
With Regards,
Amit Kapila.




Re: Logical replication timeout problem

2022-02-23 Thread Amit Kapila
On Tue, Feb 22, 2022 at 9:17 AM wangw.f...@fujitsu.com
 wrote:
>
> On Fri, Feb 18, 2022 at 10:51 AM Ajin Cherian  wrote:
> > Some comments:
> Thanks for your review.
>
> >  I see you only track skipped Inserts/Updates and Deletes. What about
> > DDL operations that are skipped, what about truncate.
> > What about changes made to unpublished tables? I wonder if you could
> > create a test script that only did DDL operations
> > and truncates, would this timeout happen?
> According to your suggestion, I tested with DDL and truncate.
> While testing, I ran only 20,000 DDLs and 10,000 truncations in one
> transaction.
> If I set wal_sender_timeout and wal_receiver_timeout to 30s, it will time out.
> And if I use the default values, it will not time out.
> IMHO there should not be long transactions that only contain DDL and
> truncation. I'm not quite sure, do we need to handle this kind of use case?
>

I think it is better to handle such cases too, as well as changes
related to unpublished tables. BTW, it seems Kuroda-San has also given
some comments [1] which I am not sure are addressed.

I think instead of keeping the skipping threshold w.r.t.
wal_sender_timeout, we can use some conservative number like 1 or
so which we are sure won't impact performance and won't lead to
timeouts.

*
+ /*
+ * skipped_changes_count is reset when processing changes that do not need to
+ * be skipped.
+ */
+ skipped_changes_count = 0

When the skipped_changes_count is reset, the sendTime should also be
reset. Can we reset it whenever the UpdateProgress function is called
with send_keep_alive as false?

[1] - 
https://www.postgresql.org/message-id/TYAPR01MB5866BD2248EF82FF432FE599F52D9%40TYAPR01MB5866.jpnprd01.prod.outlook.com

-- 
With Regards,
Amit Kapila.




Re: logical decoding and replication of sequences

2022-02-23 Thread Amit Kapila
On Wed, Feb 23, 2022 at 4:54 AM Tomas Vondra
 wrote:
>
> 7) This updates psql to do roughly the same thing as for tables, so \dRp
> now lists sequences added either directly or through a schema, so you might
> get a footer like this:
>
>   \dRp+ testpub_mix
>   ...
>   Tables:
>   "public.testpub_tbl1"
>   Tables from schemas:
>   "pub_test"
>   Sequences:
>   "public.testpub_seq1"
>   Sequences from schemas:
>   "pub_test"
>
> Maybe it's a bit too verbose, though. It also adds "All sequences" and
> "Sequences" columns into the publication description, but I don't think
> that can be done much differently.
>
> FWIW I had to switch the describeOneTableDetails() chunk dealing with
> sequences from printQuery() to printTable() in order to handle dynamic
> footers.
>
> There's also a change in \dn+ because a schema may be included in one
> publication as "FOR ALL SEQUENCES IN SCHEMA" and in another publication
> with "FOR ALL TABLES IN SCHEMA". So I modified the output to
>
>   \dn+ pub_test1
>   ...
>   Publications:
>   "testpub_schemas" (sequences)
>   "testpub_schemas" (tables)
>
> But maybe it'd be better to aggregate this into a single line like
>
>   \dn+ pub_test1
>   ...
>   Publications:
>   "testpub_schemas" (tables, sequences)
>
> Opinions?
>

I think the second one (aggregated) might be slightly better as that
will lead to fewer lines when there are multiple such publications,
but it should be okay if you and others prefer the first.

> 8) There's a shortcoming in the current grammar, because you can specify
> either
>
>   CREATE PUBLICATION p FOR ALL TABLES;
>
> or
>
>   CREATE PUBLICATION p FOR ALL SEQUENCES;
>
> but it's not possible to do
>
>   CREATE PUBLICATION p FOR ALL TABLES AND FOR ALL SEQUENCES;
>
> which seems like a fairly reasonable thing users might want to do.
>

Isn't it better to support this with a syntax as indicated by Tom in
one of his earlier emails on this topic [1]? IIUC, it would be as
follows:

CREATE PUBLICATION p FOR ALL TABLES, ALL SEQUENCES;

> The problem is that "FOR ALL TABLES" (and same for sequences) is
> hard-coded in the grammar, not defined as PublicationObjSpec. This also
> means you can't do
>
>   ALTER PUBLICATION p ADD ALL TABLES;
>
> AFAICS there are two ways to fix this - adding the combinations into the
> definition of CreatePublicationStmt, or adding FOR ALL TABLES (and
> sequences) to PublicationObjSpec.
>

I can imagine that adding to PublicationObjSpec will look more
consistent with the existing code, but the other way may also be okay.

[1] - https://www.postgresql.org/message-id/877603.1629120678%40sss.pgh.pa.us

-- 
With Regards,
Amit Kapila.




Re: row filtering for logical replication

2022-02-23 Thread Amit Kapila
On Thu, Feb 24, 2022 at 7:43 AM Shinoda, Noriyoshi (PN Japan FSIP)
 wrote:
>
> Hi,
> Thank you for developing this great feature.
> If multiple tables are specified when creating a PUBLICATION,
> is it assumed that the WHERE clause condition can be given to only one table?
>

You can give it for multiple tables. See below for an example:

> --- operation log ---
> postgres=> CREATE TABLE data1(c1 INT PRIMARY KEY, c2 VARCHAR(10));
> CREATE TABLE
> postgres=> CREATE TABLE data2(c1 INT PRIMARY KEY, c2 VARCHAR(10));
> CREATE TABLE
> postgres=> CREATE PUBLICATION pub1 FOR TABLE data1,data2 WHERE (c1 < 1000);
> CREATE PUBLICATION

postgres=# CREATE PUBLICATION pub_data_1 FOR TABLE data1 WHERE (c1 >
10), data2 WHERE (c1 < 1000);
CREATE PUBLICATION

-- 
With Regards,
Amit Kapila.




Re: row filtering for logical replication

2022-02-23 Thread Amit Kapila
On Thu, Feb 24, 2022 at 7:57 AM Peter Smith  wrote:
>
> I noticed that there was a build-farm failure on the machine 'komodoensis' [1]
>
> #   Failed test 'check replicated rows to tab_rowfilter_toast'
> #   at t/028_row_filter.pl line 687.
> #  got: ''
> # expected: 't|1'
> # Looks like you failed 1 test of 20.
> [18:21:24] t/028_row_filter.pl 
> Dubious, test returned 1 (wstat 256, 0x100)
> Failed 1/20 subtests
>
> That failure looks intermittent because from the history you can see
> the same machine already passed multiple times in this test case.
>
> When I investigated the test case I noticed there seems to be a
> missing "catchup" ($node_publisher->wait_for_catchup($appname);), so
> sometimes if the replication happens too slowly then the expected row
> might not be found on the subscriber side.
>

Your analysis seems correct to me and it is evident from the result as
well. Reviewing the test, it seems other similar places already have
the catchup but it is missing after this update test.

> I will post a patch to fix this shortly.
>

Thanks.

-- 
With Regards,
Amit Kapila.




Re: Design of pg_stat_subscription_workers vs pgstats

2022-02-23 Thread Amit Kapila
On Thu, Feb 24, 2022 at 7:03 AM Masahiko Sawada  wrote:
>
> > ~~~
> >
> > 6.  doc/src/sgml/monitoring.sgml - why two counters?
> >
> > Please forgive this noob question...
> >
> > I see there are 2 error_count columns (one for each kind of worker)
> > but I was wondering why it is useful for users to be able to
> > distinguish if the error came from the tablesync workers or from the
> > apply workers? Do you have any example?
> >
> > Also, IIRC sometimes the tablesync might actually do a few "apply"
> > changes itself... so the distinction may become a bit fuzzy...
>
> I think that the tablesync phase and the apply phase can fail for
> different reasons. So these values would be a good indicator for users
> to check if each phase works fine.
>
> After more thoughts, I think it's better to increment sync_error_count
> also when a tablesync worker fails while applying the changes.
>

This sounds reasonable to me because even if we are applying the
changes in the tablesync worker, it is only for that particular table.
So, it seems okay to increment it under the category with the
description: "Number of times the error occurred during the initial
table synchronization".

-- 
With Regards,
Amit Kapila.




Re: speed up a logical replica setup

2022-02-23 Thread Amit Kapila
On Mon, Feb 21, 2022 at 5:41 PM Euler Taveira  wrote:
>
> Logical replication has been used for migrations with minimal downtime.
> However, if you are dealing with a big database, the amount of required
> resources (disk -- due to WAL retention) increases as the backlog (WAL)
> increases. Unless you have a generous amount of resources and can wait for a
> long period of time until the new replica catches up, creating a logical
> replica is impracticable on large databases.
>
> The general idea is to create and convert a physical replica or a base backup
> (archived WAL files available) into a logical replica. The initial data copy
> and catchup tend to be faster on a physical replica. This technique has been
> successfully used in pglogical_create_subscriber [1].
>

Sounds like a promising idea.

> A new tool called pg_subscriber does this conversion and is tightly integrated
> with Postgres.
>
> DESIGN
>
> The conversion requires 8 steps.
>
> 1. Check if the target data directory has the same system identifier as the
> source data directory.
> 2. Stop the target server if it is running as a standby server. (Modifying
> recovery parameters requires a restart.)
> 3. Create one replication slot per specified database on the source server.
> One additional replication slot is created at the end to get the consistent
> LSN (this consistent LSN will be used as (a) a stopping point for the
> recovery process and (b) a starting point for the subscriptions).
>

What is the need to create an extra slot other than the slot for each
database? Can't we use the largest LSN returned by the slots as the
recovery_target_lsn and the starting point for subscriptions?

How will these additional slots get freed or reused if, say, the
server has crashed/stopped after creating the slots but before
creating the subscriptions? Users won't even know the names of such
slots as they are internally created.
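
As a rough illustration of that alternative, the largest LSN across
the per-database logical slots could be read like this (a sketch only;
the authoritative value would be the LSN returned at slot creation
time):

SELECT max(confirmed_flush_lsn)
FROM pg_replication_slots
WHERE slot_type = 'logical';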

> 4. Write recovery parameters into the target data directory and start the
> target server (wait until the target server is promoted).
> 5. Create one publication (FOR ALL TABLES) per specified database on the
> source server.
> 6. Create one subscription per specified database on the target server (use
> the replication slot and publication created in a previous step; don't
> enable the subscriptions yet).
> 7. Set the replication progress to the consistent LSN that was obtained in a
> previous step.
> 8. Enable the subscription for each specified database on the target server.
>
> This tool does not take a base backup. It can certainly be included later.
> There is already a tool to do it: pg_basebackup.
>

The backup will include all the databases present on the source
server. Do we need to provide a way/recommendation to remove the
databases that are not required?

Can we see some numbers with various sizes of databases (clusters) to
see how this impacts the time for small to large databases as compared
to the traditional method? This might help in giving users advice on
when to use this tool.


-- 
With Regards,
Amit Kapila.




Re: Optionally automatically disable logical replication subscriptions on error

2022-02-24 Thread Amit Kapila
On Thu, Feb 24, 2022 at 1:20 PM Masahiko Sawada  wrote:
>
> Here are some comments:
>
> Why do we need SyncTableStartWrapper() and ApplyLoopWrapper()?
>

I have given this comment to move the related code to separate
functions to slightly simplify the ApplyWorkerMain() code, but if you
don't like it, we can move it back. I am not sure I like the new
function names in the patch, though.

> ---
> +   /*
> +* Log the error that caused DisableSubscriptionOnError to be called. 
> We
> +* do this immediately so that it won't be lost if some other internal
> +* error occurs in the following code.
> +*/
> +   EmitErrorReport();
> +   AbortOutOfAnyTransaction();
> +   FlushErrorState();
>
> Do we need to hold interrupts during cleanup here?
>

I think so. We do prevent interrupts via
HOLD_INTERRUPTS/RESUME_INTERRUPTS during cleanup.


-- 
With Regards,
Amit Kapila.




Re: Design of pg_stat_subscription_workers vs pgstats

2022-02-24 Thread Amit Kapila
On Thu, Feb 24, 2022 at 2:24 PM Peter Smith  wrote:
>
> 9. src/backend/postmaster/pgstat.c - pgstat_recv_subscription_purge
>
> static void
> pgstat_recv_subscription_purge(PgStat_MsgSubscriptionPurge *msg, int len)
> {
> /* Return if we don't have replication subscription statistics */
> if (subscriptionStatHash == NULL)
> return;
>
> /* Remove from hashtable if present; we don't care if it's not */
> (void) hash_search(subscriptionStatHash, (void *) &(msg->m_subid),
>HASH_REMOVE, NULL);
> }
>
> SUGGESTION
> Wouldn't the above code be simpler written like:
>
> if (subscriptionStatHash)
> {
> /* Remove from hashtable if present; we don't care if it's not */
> (void) hash_search(subscriptionStatHash, (void *) &(msg->m_subid),
>HASH_REMOVE, NULL);
> }
> ~~~
>

I think we can write it that way as well, but I would prefer the way
it is currently in the patch, as we use a similar pattern in nearby
code (e.g., pgstat_recv_resetreplslotcounter) and at other places in
the code base as well.


> 10. src/backend/replication/logical/worker.c
>
> (from my previous [Peter-v1] #9)
>
> >> + /* Report the worker failed during table synchronization */
> >> + pgstat_report_subscription_error(MyLogicalRepWorker->subid, false);
> >>
> >> and
> >>
> >> + /* Report the worker failed during the application of the change */
> >> + pgstat_report_subscription_error(MyLogicalRepWorker->subid, true);
> >>
> >>
> >> Why don't these use MySubscription->oid instead of 
> >> MyLogicalRepWorker->subid?
>
> > It's just because we used to use MyLogicalRepWorker->subid, is there
> > any particular reason why we should use MySubscription->oid here?
>
> I felt MySubscription->oid is a more natural and more direct way of
> expressing the same thing.
>
> Consider:  "the oid of the current subscription" versus "the oid of
> the subscription of the current worker". IMO the first one is simpler.
> YMMV.
>

I think we can use either, but maybe MySubscription->oid would be
slightly better here as the same is used in nearby code as well.

> Also, it is shorter :)
>
> ~~~
>
> 11. src/include/pgstat.h  - enum order
>
> (follow-on from my previous v1 review comment #10)
>
> >I assume you're concerned about binary compatibility or something. I
> >think it should not be a problem since both
> >PGSTAT_MTYPE_SUBWORKERERROR and PGSTAT_MTYPE_SUBSCRIPTIONPURGE are
> >introduced to PG15.
>
> Yes, maybe it is OK for those ones. But now in v2 there is a new
> PGSTAT_MTYPE_RESETSUBCOUNTER.
>
> Shouldn't at least that one be put at the end for the same reason?
>
> ~~~
>

I don't see a reason to put that at the end. It is better to add it
near the similar RESET enums.

>
> 13. src/test/subscription/t/026_worker_stats.pl - missing test?
>
> Shouldn't there also be some test to reset the counters to confirm
> that they really do get reset to zero?
>
> ~~~
>

I think we avoid writing tests for stats for each and every case as
they are not reliable in nature (the message can be lost). If we can
find a reliable way, then it is okay.

-- 
With Regards,
Amit Kapila.




Re: why do hash index builds use smgrextend() for new splitpoint pages

2022-02-24 Thread Amit Kapila
On Fri, Feb 25, 2022 at 4:41 AM Melanie Plageman
 wrote:
>
> I'm trying to understand why hash indexes are built primarily in shared
> buffers except when allocating a new splitpoint's worth of bucket pages
> -- which is done with smgrextend() directly in _hash_alloc_buckets().
>
> Is this just so that the value returned by smgrnblocks() includes the
> new splitpoint's worth of bucket pages?
>
> All writes of tuple data to pages in this new splitpoint will go
> through shared buffers (via hash_getnewbuf()).
>
> I asked this and got some thoughts from Robert in [1], but I still don't
> really get it.
>
> When a new page is needed during the hash index build, why can't
> _hash_expandtable() just call ReadBufferExtended() with P_NEW instead of
> _hash_getnewbuf()? Does it have to do with the BUCKET_TO_BLKNO mapping?
>

We allocate the chunk of pages (power-of-2 groups) at the time of a
split, which allows them to appear consecutively in the index. This
helps us to compute the physical block number from the bucket number
easily (the BUCKET_TO_BLKNO mapping) with some minimal control
information.

-- 
With Regards,
Amit Kapila.




Re: pg_stat_get_replication_slot and pg_stat_get_subscription_worker incorrectly marked as proretset

2022-02-24 Thread Amit Kapila
On Thu, Feb 24, 2022 at 12:03 PM Michael Paquier  wrote:
>
> On Wed, Feb 23, 2022 at 09:52:02AM +0530, Amit Kapila wrote:
> > Thanks, so you are okay with me pushing that patch just to HEAD.
>
> Yes, I am fine with that.  I am wondering about patching the second
> function though, to avoid any risk of forgetting it, but I am fine to
> leave that to your judgement.
>

The corresponding patch with other changes is not very far from being
ready to commit. So, will do it along with that.

> > We don't want to backpatch this to 14 as this is a catalog change and
> > won't cause any user-visible issue, is that correct?
>
> Yup, that's a HEAD-only cleanup, I am afraid.
>

Thanks, Done!

-- 
With Regards,
Amit Kapila.




Re: why do hash index builds use smgrextend() for new splitpoint pages

2022-02-24 Thread Amit Kapila
On Fri, Feb 25, 2022 at 8:54 AM Amit Kapila  wrote:
>
> On Fri, Feb 25, 2022 at 4:41 AM Melanie Plageman
>  wrote:
> >
> > I'm trying to understand why hash indexes are built primarily in shared
> > buffers except when allocating a new splitpoint's worth of bucket pages
> > -- which is done with smgrextend() directly in _hash_alloc_buckets().
> >
> > Is this just so that the value returned by smgrnblocks() includes the
> > new splitpoint's worth of bucket pages?
> >
> > All writes of tuple data to pages in this new splitpoint will go
> > through shared buffers (via hash_getnewbuf()).
> >
> > I asked this and got some thoughts from Robert in [1], but I still don't
> > really get it.
> >
> > When a new page is needed during the hash index build, why can't
> > _hash_expandtable() just call ReadBufferExtended() with P_NEW instead of
> > _hash_getnewbuf()? Does it have to do with the BUCKET_TO_BLKNO mapping?
> >
>
> We allocate the chunk of pages (power-of-2 groups) at the time of a
> split, which allows them to appear consecutively in the index.
>

I think allocating chunks of pages via "ReadBufferExtended() with
P_NEW" will be more time-consuming compared to what we do now.

-- 
With Regards,
Amit Kapila.




Re: Optionally automatically disable logical replication subscriptions on error

2022-02-24 Thread Amit Kapila
On Thu, Feb 24, 2022 at 6:30 PM Masahiko Sawada  wrote:
>
> On Thu, Feb 24, 2022 at 8:08 PM Amit Kapila  wrote:
> >
> > On Thu, Feb 24, 2022 at 1:20 PM Masahiko Sawada  
> > wrote:
> > >
> > > Here are some comments:
> > >
> > > Why do we need SyncTableStartWrapper() and ApplyLoopWrapper()?
> > >
> >
> > I have given this comment to move the related code to separate
> > functions to slightly simplify ApplyWorkerMain() code but if you don't
> > like we can move it back. I am not sure I like the new function names
> > in the patch though.
>
> Okay, I'm fine with moving this code but perhaps we can find a better
> function name as "Wrapper" seems slightly odd to me.
>

Agreed.

> For example,
> start_table_sync_start() and start_apply_changes() or something (it
> seems we use the snake case for static functions in worker.c).
>

I am fine with something along these lines; how about
start_table_sync() and start_apply(), respectively?

-- 
With Regards,
Amit Kapila.




Re: Design of pg_stat_subscription_workers vs pgstats

2022-02-24 Thread Amit Kapila
On Thu, Feb 24, 2022 at 9:20 PM Masahiko Sawada  wrote:
>
> >
> > 6. src/backend/postmaster/pgstat.c - function name
> >
> > +/* --
> > + * pgstat_reset_subscription_counter() -
> > + *
> > + * Tell the statistics collector to reset a single subscription
> > + * counter, or all subscription counters (when subid is InvalidOid).
> > + *
> > + * Permission checking for this function is managed through the normal
> > + * GRANT system.
> > + * --
> > + */
> > +void
> > +pgstat_reset_subscription_counter(Oid subid)
> >
> > SUGGESTION (function name)
> > "pgstat_reset_subscription_counter" -->
> > "pgstat_reset_subscription_counters" (plural)
>
> Fixed.
>

We don't use the plural form in other similar cases like
pgstat_reset_replslot_counter and pgstat_reset_slru_counter, so why do
it differently here?

-- 
With Regards,
Amit Kapila.




Re: Design of pg_stat_subscription_workers vs pgstats

2022-02-25 Thread Amit Kapila
On Thu, Feb 24, 2022 at 9:20 PM Masahiko Sawada  wrote:
>

I have reviewed the latest version and made a few changes along with
fixing some of the pending comments by Peter Smith. The changes are as
follows: (a) removed m_databaseid in PgStat_MsgSubscriptionError as
that is not required now; (b) changed the struct name
PgStat_MsgSubscriptionPurge to PgStat_MsgSubscriptionDrop to make it
similar to DropDb; (c) changed the view name to
pg_stat_subscription_stats (we can reconsider it in the future if
there is a consensus on some other name) and accordingly changed the
reset function name to pg_stat_reset_subscription_stats; (d) moved
some of the newly added subscription stats functions adjacent to the
slot ones to maintain consistency in the code; (e) changed comments at
a few places; (f) added LATERAL back to the system_views query as we
refer to pg_subscription's oid in the function call, which previously
was not clear.

Do let me know what you think of the attached?

-- 
With Regards,
Amit Kapila.


v4-0001-Reconsider-pg_stat_subscription_workers-view.patch
Description: Binary data


Re: Design of pg_stat_subscription_workers vs pgstats

2022-02-25 Thread Amit Kapila
On Fri, Feb 25, 2022 at 7:26 AM Peter Smith  wrote:
>
> Below are my review comments for the v3 patch.
>
...
> 2. doc/src/sgml/monitoring.sgml
>
> +  
> pg_stat_subscription_activitypg_stat_subscription_activity
> +  One row per subscription, showing statistics about subscription
> +  activity.
> +  See 
> +  pg_stat_subscription_activity
> for details.
>
>   
>
> Currently these stats are only about errors. These are not really
> statistics about "activity" though. Probably it is better just to
> avoid that word altogether?
>
> SUGGESTIONS
>
> e.g.1. "One row per subscription, showing statistics about
> subscription activity." --> "One row per subscription, showing
> statistics about errors."
>

I preferred this one and made another change suggested by you in the
latest version posted by me. Thanks!

-- 
With Regards,
Amit Kapila.




Re: why do hash index builds use smgrextend() for new splitpoint pages

2022-02-25 Thread Amit Kapila
On Sat, Feb 26, 2022 at 3:01 AM Melanie Plageman
 wrote:
>
> Since _hash_alloc_buckets() WAL-logs the last page of the
> splitpoint, is it safe to skip the smgrimmedsync()? What if the last
> page of the splitpoint doesn't end up having any tuples added to it
> during the index build and the redo pointer is moved past the WAL for
> this page and then later there is a crash sometime before this page
> makes it to permanent storage. Does it matter that this page is lost? If
> not, then why bother WAL-logging it?
>

I think we don't care if the page is lost before we update the
meta-page in the caller because we will try to reallocate in that
case. But we do care after meta page update (having the updated
information about this extension via different masks) in which case we
won't lose this last page because it would have registered the sync
request for it via sgmrextend before meta page update.

-- 
With Regards,
Amit Kapila.




Re: Fix typo in logicalfuncs.c - :%s/private date/Private data

2022-02-25 Thread Amit Kapila
On Fri, Feb 25, 2022 at 8:39 PM Bharath Rupireddy
 wrote:
>
> Hi,
>
> Here's a tiny patch to do $subject.
>

LGTM. I'll push this in some time.

-- 
With Regards,
Amit Kapila.




Re: Design of pg_stat_subscription_workers vs pgstats

2022-02-27 Thread Amit Kapila
On Sat, Feb 26, 2022 at 1:35 PM osumi.takami...@fujitsu.com
 wrote:
>
> On Saturday, February 26, 2022 11:51 AM Amit Kapila  
> wrote:
> > I have reviewed the latest version and made a few changes along with fixing
> > some of the pending comments by Peter Smith. The changes are as
> > follows: (a) Removed m_databaseid in PgStat_MsgSubscriptionError as that is
> > not required now; (b) changed the struct name PgStat_MsgSubscriptionPurge
> > to PgStat_MsgSubscriptionDrop to make it similar to DropDb; (c) changed the
> > view name to pg_stat_subscription_stats (we can reconsider it in the future
> > if there is a consensus on some other name) and accordingly changed the
> > reset function name to pg_stat_reset_subscription_stats; (d) moved some of
> > the newly added subscription stats functions adjacent to the slot ones to
> > maintain consistency in the code; (e) changed comments at a few places; (f) added LATERAL back to
> > system_views query as we refer pg_subscription's oid in the function call,
> > previously that was not clear.
> >
> > Do let me know what you think of the attached?
> Hi, thank you for updating the patch !
>
>
> I have a couple of comments on v4.
>
> (1)
>
> I'm not sure if I'm correct, but I'd say sync_error_count
> could come next to subname in the order of columns.
> I felt there are cases where the column order is somewhat
> related to the time/processing order (I am thinking of
> pg_stat_replication's LSN-related columns).
> If this is right, the table sync related column could be the
> first counter column within this patch.
>

I am not sure if there is such a correlation, but even if there is, it
doesn't seem to fit here completely, as sync errors can happen after
apply errors in multiple ways, e.g., via Alter Subscription ...
Refresh ...

So, I don't see the need to change the order here. What do you or others think?

>
> (2) doc/src/sgml/monitoring.sgml
>
> +Resets statistics for a single subscription shown in the
> +pg_stat_subscription_stats view to zero. If
> +the argument is NULL, reset statistics for all
> +subscriptions.
> 
>
> I felt we could improve the first sentence.
>
> From:
> Resets statistics for a single subscription shown in the..
>
> To(idea1):
> Resets statistics for a single subscription defined by the argument to zero.
>

Okay, I can use this one.

-- 
With Regards,
Amit Kapila.




Re: Design of pg_stat_subscription_workers vs pgstats

2022-02-27 Thread Amit Kapila
On Mon, Feb 28, 2022 at 8:17 AM Masahiko Sawada  wrote:
>
> On Mon, Feb 28, 2022 at 11:33 AM Amit Kapila  wrote:
> >
> > >
> > > (2) doc/src/sgml/monitoring.sgml
> > >
> > > +Resets statistics for a single subscription shown in the
> > > +pg_stat_subscription_stats view to 
> > > zero. If
> > > +the argument is NULL, reset statistics for all
> > > +subscriptions.
> > > 
> > >
> > > I felt we could improve the first sentence.
> > >
> > > From:
> > > Resets statistics for a single subscription shown in the..
> > >
> > > To(idea1):
> > > Resets statistics for a single subscription defined by the argument to 
> > > zero.
> > >
> >
> > Okay, I can use this one.
>
> Are you going to remove the part "shown in the
> pg_stat_subsctiption_stats view"? I think it's better to keep it in
> order to make it clear which statistics the function resets as we have
> pg_stat_subscription and pg_stat_subscription_stats.
>

How about the following:
"Resets statistics for a single subscription defined by the argument
shown in the pg_stat_subscription_stats view
to zero. If the argument is NULL, reset statistics
for all subscriptions."

-- 
With Regards,
Amit Kapila.




Re: Commitfest manager for 2022-03

2022-02-27 Thread Amit Kapila
On Sun, Feb 27, 2022 at 11:37 PM Euler Taveira  wrote:
>
> On Sat, Feb 26, 2022, at 9:37 PM, Justin Pryzby wrote:
>
> |https://commitfest.postgresql.org/37/2906/
> |Row filtering for logical replication
> If I'm not wrong, this is merged and should be closed?
>
> I think Amit forgot to mark it as committed. Done.
>

I was waiting for some build farm cycles to complete but it is fine to
mark it now.

-- 
With Regards,
Amit Kapila.




Re: Design of pg_stat_subscription_workers vs pgstats

2022-02-27 Thread Amit Kapila
On Mon, Feb 28, 2022 at 8:59 AM Peter Smith  wrote:
>
>
> 2. doc/src/sgml/monitoring.sgml
>
> +Resets statistics for a single subscription shown in the
> +pg_stat_subscription_stats view to zero. If
> +the argument is NULL, reset statistics for all
> +subscriptions.
> 
>
> SUGGESTED (simpler description, more similar to 
> pg_stat_reset_replication_slot)
> Reset statistics to zero for a single subscription. If the argument is
> NULL, reset statistics for all subscriptions.
>

As discussed, it is better to keep the view name in this description
as we have another view (pg_stat_subscription) as well. So, I am
planning to retain the current wording.

-- 
With Regards,
Amit Kapila.




Re: Design of pg_stat_subscription_workers vs pgstats

2022-02-27 Thread Amit Kapila
On Mon, Feb 28, 2022 at 8:49 AM osumi.takami...@fujitsu.com
 wrote:
>
> On Monday, February 28, 2022 11:34 AM Amit Kapila  
> wrote:
> > On Sat, Feb 26, 2022 at 1:35 PM osumi.takami...@fujitsu.com
> >  wrote:
> > >
> > > On Saturday, February 26, 2022 11:51 AM Amit Kapila
> >  wrote:
> > > > I have reviewed the latest version and made a few changes along with
> > > > fixing some of the pending comments by Peter Smith. The changes are
> > > > as
> > > > follows: (a) Removed m_databaseid in PgStat_MsgSubscriptionError as
> > > > that is not required now; (b) changed the struct name
> > > > PgStat_MsgSubscriptionPurge to PgStat_MsgSubscriptionDrop to make it
> > > > similar to DropDb; (c) changed the view name to
> > > > pg_stat_subscription_stats, we can reconsider it in future if there
> > > > is a consensus on some other name, accordingly changed the reset
> > > > function name to pg_stat_reset_subscription_stats; (d) moved some of
> > > > the newly added subscription stats functions adjacent to slots to
> > > > main the consistency in code; (e) changed comments at few places;
> > > > (f) added LATERAL back to system_views query as we refer
> > pg_subscription's oid in the function call, previously that was not clear.
> > > >
> > > > Do let me know what you think of the attached?
> > > Hi, thank you for updating the patch !
> > > I have a couple of comments on v4.
> > >
> > > (1)
> > >
> > > I'm not sure if I'm correct, but I'd say the sync_error_count can come
> > > next to the subname as the order of columns.
> > > I felt there's case that the column order is somewhat related to the
> > > time/processing order (I imagined pg_stat_replication's LSN related
> > > columns).
> > > If this was right, table sync related column could be the first column
> > > as a counter within this patch.
> > >
> >
> > I am not sure if there is such a correlation but even if it is there it 
> > doesn't seem
> > to fit here completely as sync errors can happen after apply errors in 
> > multiple
> > ways like via Alter Subscription ... Refresh ...
> >
> > So, I don't see the need to change the order here. What do you or others 
> > think?
> In the alter subscription case, any errors after the table sync would 
> increment
> apply_error_count.
>

Sure, but the point I was trying to explain was that there is no
certainty in the order of these errors.

-- 
With Regards,
Amit Kapila.




Re: Design of pg_stat_subscription_workers vs pgstats

2022-02-27 Thread Amit Kapila
On Mon, Feb 28, 2022 at 8:17 AM Masahiko Sawada  wrote:
>
> On Mon, Feb 28, 2022 at 11:33 AM Amit Kapila  wrote:
> >
> > >
> > > (2) doc/src/sgml/monitoring.sgml
> > >
> > > +Resets statistics for a single subscription shown in the
> > > +pg_stat_subscription_stats view to 
> > > zero. If
> > > +the argument is NULL, reset statistics for all
> > > +subscriptions.
> > > 
> > >
> > > I felt we could improve the first sentence.
> > >
> > > From:
> > > Resets statistics for a single subscription shown in the..
> > >
> > > To(idea1):
> > > Resets statistics for a single subscription defined by the argument to 
> > > zero.
> > >
> >
> > Okay, I can use this one.
>
> Are you going to remove the part "shown in the
> pg_stat_subsctiption_stats view"? I think it's better to keep it in
> order to make it clear which statistics the function resets as we have
> pg_stat_subscription and pg_stat_subscription_stats.
>

I decided to keep this part of the docs as it is and fixed a few other
minor comments raised by you and Peter. Additionally, I have bumped
the PGSTAT_FILE_FORMAT_ID. I'll push this patch tomorrow unless there
are any other major comments.

-- 
With Regards,
Amit Kapila.


v5-0001-Reconsider-pg_stat_subscription_workers-view.patch
Description: Binary data


Re: why do hash index builds use smgrextend() for new splitpoint pages

2022-02-27 Thread Amit Kapila
On Sat, Feb 26, 2022 at 9:17 PM Melanie Plageman
 wrote:
>
> On Fri, Feb 25, 2022 at 11:17 PM Amit Kapila  wrote:
> >
> > On Sat, Feb 26, 2022 at 3:01 AM Melanie Plageman
> >  wrote:
> > >
> > > Since _hash_alloc_buckets() WAL-logs the last page of the
> > > splitpoint, is it safe to skip the smgrimmedsync()? What if the last
> > > page of the splitpoint doesn't end up having any tuples added to it
> > > during the index build and the redo pointer is moved past the WAL for
> > > this page and then later there is a crash sometime before this page
> > > makes it to permanent storage. Does it matter that this page is lost? If
> > > not, then why bother WAL-logging it?
> > >
> >
> > I think we don't care if the page is lost before we update the
> > meta-page in the caller because we will try to reallocate in that
> > case. But we do care after meta page update (having the updated
> > information about this extension via different masks) in which case we
> > won't lose this last page because it would have registered the sync
> > request for it via sgmrextend before meta page update.
>
> and could it happen that during smgrextend() for the last page, a
> checkpoint starts and finishes between FileWrite() and
> register_dirty_segment(), then index build finishes, and then a crash
> occurs before another checkpoint completes the pending fsync for that
> last page?
>

Yeah, this seems to be possible, and then the problem could be that
the index's idea and smgr's idea of the EOF could be different, which
could lead to a problem when we try to get a new page via
_hash_getnewbuf(). If this theory turns out to be true, then we can
probably get an error either because of a full disk or because the
index might request a block that is beyond the EOF as determined by
RelationGetNumberOfBlocksInFork() in _hash_getnewbuf().

Can we try to reproduce this scenario with the help of a debugger to
see if we are missing something?

-- 
With Regards,
Amit Kapila.




Re: logical decoding and replication of sequences

2022-02-28 Thread Amit Kapila
On Sat, Feb 12, 2022 at 6:04 AM Tomas Vondra
 wrote:
>
> On 2/10/22 19:17, Tomas Vondra wrote:
> > I've polished & pushed the first part adding sequence decoding
> > infrastructure etc. Attached are the two remaining parts.
> >
> > I plan to wait a day or two and then push the test_decoding part. The
> > last part (for built-in replication) will need more work and maybe
> > rethinking the grammar etc.
> >
>
> I've pushed the second part, adding sequences to test_decoding.
>

The test_decoding tests have been failing randomly in the last few
days. I am not completely sure, but the failures might be related to
this work. Two of them appear to be due to the same reason:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&dt=2022-02-25%2018%3A50%3A09
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=locust&dt=2022-02-17%2015%3A17%3A07

TRAP: FailedAssertion("prev_first_lsn < cur_txn->first_lsn", File:
"reorderbuffer.c", Line: 1173, PID: 35013)
0   postgres0x00593de0 ExceptionalCondition + 160\\0

Another:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=mandrill&dt=2022-02-16%2006%3A21%3A48

--- /home/nm/farm/xlc32/HEAD/pgsql.build/contrib/test_decoding/expected/rewrite.out 2022-02-14 20:19:14.0 +
+++ /home/nm/farm/xlc32/HEAD/pgsql.build/contrib/test_decoding/results/rewrite.out 2022-02-16 07:42:18.0 +
@@ -126,6 +126,7 @@
   table public.replication_example: INSERT: id[integer]:4
somedata[integer]:3 text[character varying]:null
testcolumn1[integer]:null
   table public.replication_example: INSERT: id[integer]:5
somedata[integer]:4 text[character varying]:null
testcolumn1[integer]:2 testcolumn2[integer]:1
   COMMIT
+  sequence public.replication_example_id_seq: transactional:0
last_value: 38 log_cnt: 0 is_called:1
   BEGIN
   table public.replication_example: INSERT: id[integer]:6
somedata[integer]:5 text[character varying]:null
testcolumn1[integer]:3 testcolumn2[integer]:null
   COMMIT
@@ -133,7 +134,7 @@
   table public.replication_example: INSERT: id[integer]:7
somedata[integer]:6 text[character varying]:null
testcolumn1[integer]:4 testcolumn2[integer]:null
   table public.replication_example: INSERT: id[integer]:8
somedata[integer]:7 text[character varying]:null
testcolumn1[integer]:5 testcolumn2[integer]:null
testcolumn3[integer]:1
   COMMIT
- (15 rows)
+ (16 rows)

-- 
With Regards,
Amit Kapila.




Re: Add the replication origin name and commit-LSN to logical replication worker errcontext

2022-02-28 Thread Amit Kapila
On Mon, Feb 28, 2022 at 5:46 PM Masahiko Sawada  wrote:
>
> We've added some information such as the command and the timestamp to
> the error context message by commit abc0910e2. This patch adds further
> information to it: replication origin name and commit-LSN.
>
> This will be helpful for users to set the origin name and LSN to
> pg_replication_origin_advance().
>

+1. This will make the use of pg_replication_origin_advance() relatively easy.
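
For instance, once the errcontext reports the origin name and commit
LSN, skipping the problematic transaction becomes a single call. The
values below are taken from the example output later in this thread;
this is a sketch only, not a recommendation on the exact LSN to pass:

SELECT pg_replication_origin_advance('pg_16395', '0/14BFA88');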

-- 
With Regards,
Amit Kapila.




Re: logical decoding and replication of sequences

2022-03-01 Thread Amit Kapila
On Mon, Feb 28, 2022 at 5:16 PM Amit Kapila  wrote:
>
> On Sat, Feb 12, 2022 at 6:04 AM Tomas Vondra
>  wrote:
> >
> > On 2/10/22 19:17, Tomas Vondra wrote:
> > > I've polished & pushed the first part adding sequence decoding
> > > infrastructure etc. Attached are the two remaining parts.
> > >
> > > I plan to wait a day or two and then push the test_decoding part. The
> > > last part (for built-in replication) will need more work and maybe
> > > rethinking the grammar etc.
> > >
> >
> > I've pushed the second part, adding sequences to test_decoding.
> >
>
> The test_decoding tests have been failing randomly in the last few
> days. I am not completely sure, but the failures might be related to
> this work. Two of them appear to be due to the same reason:
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&dt=2022-02-25%2018%3A50%3A09
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=locust&dt=2022-02-17%2015%3A17%3A07
>
> TRAP: FailedAssertion("prev_first_lsn < cur_txn->first_lsn", File:
> "reorderbuffer.c", Line: 1173, PID: 35013)
> 0   postgres0x00593de0 ExceptionalCondition + 
> 160\\0
>

While reviewing the code for this, I noticed that in
sequence_decode() we don't call ReorderBufferProcessXid() to register
the first known LSN in WAL for the current xid. The similar functions
logicalmsg_decode() and heap_decode() do call
ReorderBufferProcessXid() even if they decide not to queue or send the
change. Is there a reason for not doing the same here? However, I am
not able to deduce any scenario where the lack of this would lead to
such an assertion failure. Any thoughts?

-- 
With Regards,
Amit Kapila.




Re: Design of pg_stat_subscription_workers vs pgstats

2022-03-01 Thread Amit Kapila
On Mon, Feb 28, 2022 at 1:15 PM tanghy.f...@fujitsu.com
 wrote:
>
> On Mon, Feb 28, 2022 12:32 PM Amit Kapila  wrote:
> >
> > I decided to keep this part of the docs as it is and fixed a few other
> > minor comments raised by you and Peter. Additionally, I have bumped
> > the PGSTAT_FILE_FORMAT_ID. I'll push this patch tomorrow unless there
> > are any other major comments.
> >
>
> Thanks for your patch. I have finished the review/test for this patch.
> The patch LGTM.
>

Pushed.

-- 
With Regards,
Amit Kapila.




Re: pg_stat_get_replication_slot and pg_stat_get_subscription_worker incorrectly marked as proretset

2022-03-01 Thread Amit Kapila
On Fri, Feb 25, 2022 at 8:57 AM Amit Kapila  wrote:
>
> On Thu, Feb 24, 2022 at 12:03 PM Michael Paquier  wrote:
> >
> > On Wed, Feb 23, 2022 at 09:52:02AM +0530, Amit Kapila wrote:
> > > Thanks, so you are okay with me pushing that patch just to HEAD.
> >
> > Yes, I am fine with that.  I am wondering about patching the second
> > function though, to avoid any risk of forgetting it, but I am fine to
> > leave that to your judgement.
> >
>
> The corresponding patch with other changes is not very far from being
> ready to commit. So, will do it along with that.
>

This is done as part of commit:
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=7a85073290856554416353a89799a4c04d09b74b


-- 
With Regards,
Amit Kapila.




Re: Add the replication origin name and commit-LSN to logical replication worker errcontext

2022-03-01 Thread Amit Kapila
On Mon, Feb 28, 2022 at 5:46 PM Masahiko Sawada  wrote:
>
> I've attached two patches: the first one changes
> apply_error_callback() so that it uses complete sentences with if-else
> blocks in order to have a translation work,
>

This is an improvement over what we have now, but I think this is
still not completely correct as per the message translation rules:

+ else
+     errcontext("processing remote data during \"%s\" in transaction %u at %s",
+                logicalrep_message_type(errarg->command),
+                errarg->remote_xid,
+                (errarg->ts != 0) ? timestamptz_to_str(errarg->ts) : "(not-set)");

As per the guidelines [1][2], we don't prefer to construct messages at
run-time, i.e., we can do better for the following part:
(errarg->ts != 0) ? timestamptz_to_str(errarg->ts) : "(not-set)".
I think we need to use if-else here to split it further. If you agree,
then the same needs to be dealt with in other parts of the patch as
well.

[1] - https://www.postgresql.org/docs/devel/nls-programmer.html#NLS-GUIDELINES
[2] - Do not construct sentences at run-time, like:
printf("Files were %s.\n", flag ? "copied" : "removed");
The word order within the sentence might be different in other languages.

-- 
With Regards,
Amit Kapila.




Re: Add the replication origin name and commit-LSN to logical replication worker errcontext

2022-03-01 Thread Amit Kapila
On Wed, Mar 2, 2022 at 8:25 AM Peter Smith  wrote:
>
> On Mon, Feb 28, 2022 at 11:16 PM Masahiko Sawada  
> wrote:
> >
> > The errcontext message would become like follows:
> >
> > *Before
> > ERROR:  duplicate key value violates unique constraint "test_pkey"
> > DETAIL:  Key (c)=(1) already exists.
> > CONTEXT:  processing remote data during "INSERT" for replication
> > target relation "public.test" in transaction 726 at 2022-02-28
> > 20:59:56.005909+09
> >
> > * After
> > ERROR:  duplicate key value violates unique constraint "test_pkey"
> > DETAIL:  Key (c)=(1) already exists.
> > CONTEXT:  processing remote data during "INSERT" for replication
> > target relation "public.test" in transaction 726 committed at LSN
> > 0/14BFA88 and timestamp 2022-02-28 20:58:27.964238+09 from replication
> > origin "pg_16395"
> >
> > I'm a bit concerned that the message may be too long.
>
> If you are willing to use abbreviations instead of full
> words/sentences perhaps you can shorten the long CONTEXT part like
> below?
>
> Before:
> CONTEXT:  processing remote data during "INSERT" for replication
> target relation "public.test" in transaction 726 committed at LSN
> 0/14BFA88 and timestamp 2022-02-28 20:58:27.964238+09 from
> replication origin "pg_16395"
>
> After:
> CONTEXT: processing remote data during "INSERT" for replication target
> relation "public.test" (txid 726, LSN 0/14BFA88, ts 2022-02-28
> 20:58:27.964238+09, origin "pg_16395")
>

I am wondering whether we can avoid having a timestamp in the message.
If one wants, it can be retrieved from the error log anyway.
For example, I see the below error on my machine:
2022-02-26 07:45:25.092 IST [17644] ERROR:  duplicate key value
violates unique constraint "t1_pkey"
2022-02-26 07:45:25.092 IST [17644] DETAIL:  Key (c1)=(1) already exists.
2022-02-26 07:45:25.092 IST [17644] CONTEXT:  processing remote data
during "INSERT" for replication target relation "public.t1" in
transaction 724 at 2022-02-26 07:45:09.083848+05:30

Now, here, won't the time at the start of the CONTEXT line serve our
purpose? If we can remove the timestamp, I think the message won't
appear too long. What do you think?

-- 
With Regards,
Amit Kapila.




Re: Design of pg_stat_subscription_workers vs pgstats

2022-03-01 Thread Amit Kapila
On Wed, Mar 2, 2022 at 10:39 AM Andres Freund  wrote:
>
> Working on rebasing shared memory stats over this. Feels *much* better so far.
>

Good to hear that it helps. BTW, there is another patch [1] that
extends this view. I think that patch is still not ready, but once it
is ready (which I expect to happen sometime in this CF), it would be
good if you could check whether it has any major problems with your
integration.

> While rebasing, I was wondering why pgstat_reset_subscription_counter() has
> "all subscription counters" support. We don't have a function to reset all
> function stats or such either.
>

We have a similar thing for SLRUs (pg_stat_reset_slru) and slots
(pg_stat_reset_replication_slot). For functions and tables, one can
use pg_stat_reset. Similarly, we have pg_stat_reset_shared(), which
resets stats like WAL. This matches more with slru/slots, so we
provided it via pg_stat_reset_subscription_stats.

[1] - 
https://www.postgresql.org/message-id/TYWPR01MB8362B30A904274A911C0B1CCED039%40TYWPR01MB8362.jpnprd01.prod.outlook.com

-- 
With Regards,
Amit Kapila.




Re: Add the replication origin name and commit-LSN to logical replication worker errcontext

2022-03-01 Thread Amit Kapila
On Wed, Mar 2, 2022 at 9:33 AM Masahiko Sawada  wrote:
>
> On Wed, Mar 2, 2022 at 12:21 PM Amit Kapila  wrote:
> >
> > On Wed, Mar 2, 2022 at 8:25 AM Peter Smith  wrote:
> > >
> > > On Mon, Feb 28, 2022 at 11:16 PM Masahiko Sawada  
> > > wrote:
> > > >
> > > > The errcontext message would become like follows:
> > > >
> > > > *Before
> > > > ERROR:  duplicate key value violates unique constraint "test_pkey"
> > > > DETAIL:  Key (c)=(1) already exists.
> > > > CONTEXT:  processing remote data during "INSERT" for replication
> > > > target relation "public.test" in transaction 726 at 2022-02-28
> > > > 20:59:56.005909+09
> > > >
> > > > * After
> > > > ERROR:  duplicate key value violates unique constraint "test_pkey"
> > > > DETAIL:  Key (c)=(1) already exists.
> > > > CONTEXT:  processing remote data during "INSERT" for replication
> > > > target relation "public.test" in transaction 726 committed at LSN
> > > > 0/14BFA88 and timestamp 2022-02-28 20:58:27.964238+09 from replication
> > > > origin "pg_16395"
> > > >
> > > > I'm a bit concerned that the message may be too long.
> > >
> > > If you are willing to use abbreviations instead of full
> > > words/sentences perhaps you can shorten the long CONTEXT part like
> > > below?
> > >
> > > Before:
> > > CONTEXT:  processing remote data during "INSERT" for replication
> > > target relation "public.test" in transaction 726 committed at LSN
> > > 0/14BFA88 and timestamp 2022-02-28 20:58:27.964238+09 from
> > > replication origin "pg_16395"
> > >
> > > After:
> > > CONTEXT: processing remote data during "INSERT" for replication target
> > > relation "public.test" (txid 726, LSN 0/14BFA88, ts 2022-02-28
> > > 20:58:27.964238+09, origin "pg_16395")
> > >
> >
> > I am wondering whether we can avoid having a timestamp in the message?
> > If one wants, it can be retrieved from the errors otherwise as well.
> > For example, I see the below error in my machine:
> > 2022-02-26 07:45:25.092 IST [17644] ERROR:  duplicate key value
> > violates unique constraint "t1_pkey"
> > 2022-02-26 07:45:25.092 IST [17644] DETAIL:  Key (c1)=(1) already exists.
> > 2022-02-26 07:45:25.092 IST [17644] CONTEXT:  processing remote data
> > during "INSERT" for replication target relation "public.t1" in
> > transaction 724 at 2022-02-26 07:45:09.083848+05:30
> >
> > Now, here, won't the time at the starting of CONTEXT serves our
> > purpose. If we can remove the timestamp, I think the message won't
> > appear too long. What do you think?
>
> The time of the CONTEXT log message and the time in the message would
> largely vary when the subscriber is much behind the publisher. But I
> basically agree that the timestamp in the message might not be
> important, at least for now. If we will support conflict resolution
> that resolves based on the commit timestamp in the future, we might
> want it again.
>

Possible, but let's remove it for now as it will simplify the message
and avoid the need for additional branches. What do you think?

-- 
With Regards,
Amit Kapila.




Re: PG DOCS - logical replication filtering

2022-03-02 Thread Amit Kapila
On Wed, Mar 2, 2022 at 2:37 PM Aleksander Alekseev
 wrote:
>
>
> I see that a large part of the documentation is commented and marked as TBA 
> (Column Filters, Combining Different Kinds of Filters). Could you please 
> clarify if it's a work-in-progress patch? If it's not, I believe the 
> commented part should be removed before committing.
>

I think we can remove any Column Filters related information
(placeholders); if that patch gets committed, we can always extend the
existing docs.

-- 
With Regards,
Amit Kapila.




Re: Handle infinite recursion in logical replication setup

2022-03-02 Thread Amit Kapila
 Create Subscription for pub1 on node1: Create Subscription sub1_2
Connection '' Publication pub1 WITH (only_local =
true);

Node-1:
Begin;
# Disallow truncates to be published
Alter Publication pub1 Set (publish = 'insert, update, delete');
Truncate t1;
Create Subscription sub1 Connection '' Publication
pub1 WITH (only_local = true, copy_data = 'force');
Alter Publication pub1 Set (publish = 'insert, update, delete, truncate');
Commit;

I think this will allow bi-directional replication between two nodes.
In this scheme, the user needs to manually perform some steps,
including truncating the table on one of the nodes, which she might or
might not like, but at least there will be a way to set up
bi-directional replication of the same table between two nodes, which
is not possible now.

I think one can even imagine extending this functionality so that
users don't need to perform TRUNCATE on one of the nodes. Say, in the
above case, for the tablesync phase, we make both nodes start a
transaction, create a slot on the other node (with the USE_SNAPSHOT
option), and then allow the copy from the other node. I think it will
be important to allow the copy on each node only once the slots are
created and the initial snapshot is established.

For more than two nodes, I think we can suggest either option-1 or
option-2 for the setup. But there could be other ways as well,
depending on how the user wants to do the setup.

Thoughts?

-- 
With Regards,
Amit Kapila.




Re: Design of pg_stat_subscription_workers vs pgstats

2022-03-02 Thread Amit Kapila
On Wed, Mar 2, 2022 at 1:26 PM Andres Freund  wrote:
>
> > > While rebasing, I was wondering why pgstat_reset_subscription_counter() 
> > > has
> > > "all subscription counters" support. We don't have a function to reset all
> > > function stats or such either.
> > >
> >
> > We have similar thing for srlu (pg_stat_reset_slru) and slots
> > (pg_stat_reset_replication_slot).
>
> Neither should have been added imo. We're already at 9 different reset
> functions.
>

As per [1], we have 7.

>
> And having pg_stat_reset_shared() with variable
> 'reset' systems but then also pg_stat_reset_slru() and
> pg_stat_reset_subscription_stats() is absurd.
>

I don't know. I feel that if, for some subsystem, we have a way to
reset a single counter (as for SLRUs or slots), one would prefer to
use the same function to reset all stats of that subsystem. Now, for
WAL, bgwriter, etc., we don't want to reset any specific counter, so
doing it via a shared function is okay, but not for the others.
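
For example, each of these per-subsystem reset functions accepts NULL
to reset the whole subsystem instead of a single object:

SELECT pg_stat_reset_slru(NULL);                -- all SLRU caches
SELECT pg_stat_reset_replication_slot(NULL);    -- all replication slots
SELECT pg_stat_reset_subscription_stats(NULL);  -- all subscriptions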

[1] - 
https://www.postgresql.org/docs/devel/monitoring-stats.html#MONITORING-STATS-FUNCTIONS

-- 
With Regards,
Amit Kapila.




Re: row filtering for logical replication

2022-03-02 Thread Amit Kapila
On Wed, Mar 2, 2022 at 5:42 PM Euler Taveira  wrote:
>
> On Wed, Mar 2, 2022, at 8:45 AM, Tomas Vondra wrote:
>
> While working on the column filtering patch, which touches about the
> same places, I noticed two minor gaps in testing:
>
> 1) The regression tests do perform multiple ALTER PUBLICATION commands,
> tweaking the row filter. But there are no checks the row filter was
> actually modified / stored in the catalog. It might be just thrown away
> and no one would notice.
>
> The test that row filter was modified is available in a previous section. The
> one that you modified (0001) is testing the supported objects.
>

Right. But if Tomas thinks it is good to add such checks for these
ones as well, then I don't mind.

> 153 ALTER PUBLICATION testpub5 ADD TABLE testpub_rf_tbl3 WHERE (e > 1000 AND 
> e < 2000);
> 154 \dRp+ testpub5
> 155 ALTER PUBLICATION testpub5 DROP TABLE testpub_rf_tbl2;
> 156 \dRp+ testpub5
> 157 -- remove testpub_rf_tbl1 and add testpub_rf_tbl3 again (another WHERE 
> expression)
> 158 ALTER PUBLICATION testpub5 SET TABLE testpub_rf_tbl3 WHERE (e > 300 AND e 
> < 500);
> 159 \dRp+ testpub5
>
> IIRC this test was written before adding the row filter information into the
> psql. We could add \d+ testpub_rf_tbl3 before and after the modification.
>


Agreed. We can use \d instead of \d+ as the row filter information is
shown by \d.
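
For instance, a sketch using the table from the quoted test (the \d
output would then list the publication along with its row filter):

ALTER PUBLICATION testpub5 SET TABLE testpub_rf_tbl3 WHERE (e > 300 AND e < 500);
\d testpub_rf_tbl3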

> 2) There are no pg_dump tests.
>
> WFM.
>

This is a miss on our part. I feel we can add a few such tests.

-- 
With Regards,
Amit Kapila.




Re: PG DOCS - logical replication filtering

2022-03-02 Thread Amit Kapila
On Wed, Mar 2, 2022 at 8:00 PM Peter Eisentraut
 wrote:
>
> On 02.03.22 05:47, Peter Smith wrote:
> > This patch introduces a new "Filtering" page to give a common place
> > where all kinds of logical replication filtering can be described.
> > (e.g. It is envisaged that a "Column Filters" section can be added
> > sometime in the future).
>
> The pending feature to select a subset of table columns to replicate is
> not "column filtering".  The thread might still be still called that,
> but we have changed the patch to not use that terminology.
>
> Filtering is a dynamic action based on actual values.  The row filtering
> feature does that.  The column list feature is a static DDL-time
> configuration.  It is no more filtering than specifying a list of tables
> in a publication is table filtering.
>
> So please consider organizing the documentation differently to not
> create this confusion.
>

+1. I think Row Filters can directly be a section just before
Conflicts on the logical replication page [1].

Some comments on the patch:
1. I think we can extend/add the example to have filters on more than
one table (see the sketch after these comments). It has been noticed
multiple times during development that people are not very clear on
this.
2. I think we can add an example or two for row filter actions (like
Insert, Update).
3.
 Publications can choose to limit the changes they produce to
any combination of INSERT, UPDATE,
-   DELETE, and TRUNCATE,
similar to how triggers are fired by
-   particular event types.  By default, all operation types are replicated.
+   DELETE, and TRUNCATE by using
+   operation filters.

From this, one could infer that row filters are used for TRUNCATE as
well, but that is not true. I know that the patch later specifies that
"Row filters have no effect for TRUNCATE commands.", but
the above modification is not very clear.
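
For (1), a minimal sketch with filters on two tables (hypothetical
table and column names) could be:

-- hypothetical tables, for illustration only:
CREATE TABLE t1 (a int PRIMARY KEY, b text);
CREATE TABLE t2 (c int PRIMARY KEY, d int);
CREATE PUBLICATION pub_multi FOR TABLE t1 WHERE (a > 10), t2 WHERE (c < 100);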

[1] - https://www.postgresql.org/docs/devel/logical-replication.html

-- 
With Regards,
Amit Kapila.




Re: Add the replication origin name and commit-LSN to logical replication worker errcontext

2022-03-02 Thread Amit Kapila
On Wed, Mar 2, 2022 at 1:05 PM Masahiko Sawada  wrote:
>
> On Wed, Mar 2, 2022 at 4:14 PM Amit Kapila  wrote:
>
> I've attached updated patches.
>

The first patch LGTM. Some comments on the second patch:

1.
@@ -3499,6 +3503,17 @@ ApplyWorkerMain(Datum main_arg)
  myslotname = MemoryContextStrdup(ApplyContext, syncslotname);

  pfree(syncslotname);
+
+ /*
+ * Allocate the origin name in long-lived context for error context
+ * message
+ */
+ ReplicationOriginNameForTablesync(MySubscription->oid,
+   MyLogicalRepWorker->relid,
+   originname,
+   sizeof(originname));
+ apply_error_callback_arg.origin_name = MemoryContextStrdup(ApplyContext,
+originname);

Can we assign this in LogicalRepSyncTableStart(), where we are already
forming the origin name? That will avoid this extra call to
ReplicationOriginNameForTablesync().

2.
@@ -3657,30 +3679,37 @@ apply_error_callback(void *arg)
  errcontext("processing remote data during \"%s\"",
 logicalrep_message_type(errarg->command));
  else
- errcontext("processing remote data during \"%s\" in transaction %u",
+ errcontext("processing remote data during \"%s\" in transaction %u
committed at LSN %X/%X from replication origin \"%s\"",
 logicalrep_message_type(errarg->command),
-errarg->remote_xid);
+errarg->remote_xid,
+LSN_FORMAT_ARGS(errarg->commit_lsn),
+errarg->origin_name);

Won't the origin name be set before the command? If so, it can be used
in the first message as well, and we can change the condition at the
beginning such that if the origin or command is not set, then we
return without adding additional context information.

Isn't this error message used for rollback prepared failure as well?
If so, do we need to say "... finished at LSN ..." instead of "...
committed at LSN ..."?

The other point about this message is that saying "... from
replication origin ..." sounds more like remote information, similar
to the publication, but the origin belongs to the local node. How
about slightly changing it to "processing remote data for replication
origin \"%s\" during \"%s\" in transaction ..."?

3.
+
+   The LSN of the transaction that contains the change violating the
constraint and
+   the replication origin name can be found from those outputs (LSN
0/14C0378 and
+   replication origin pg_16395 in the above case).
The transaction
+   can be skipped by calling the 
pg_replication_origin_advance() function with
-   a node_name corresponding to the subscription name,
-   and a position.  The current position of origins can be seen in the
-   
+   the node_name and the next LSN of the commit LSN
+   (i.e., 0/14C0379) from those outputs.

After node_name, can we specify the origin_name, similar to what we do
for the LSN, as that will make things clearer? Also, shall we mention
that users need to disable the subscription before performing the
replication origin advance?

I think for prepared transactions, the user might need to use it twice
because, after skipping the prepared xact, it will get an error again
during the processing of the commit prepared (no such prepare exists).
I thought of mentioning it but felt that might lead to specifying too
many details, which can confuse users as well. What do you think?

4. There are places in the patch, like apply_handle_stream_start(),
that set commit_lsn in the callback arg to InvalidXLogRecPtr, but the
callback function seems to assume that it is always a valid value.
Shouldn't we avoid appending the LSN in such cases?

-- 
With Regards,
Amit Kapila.




Re: Skipping logical replication transactions on subscriber side

2022-03-03 Thread Amit Kapila
On Tue, Mar 1, 2022 at 8:31 PM Masahiko Sawada  wrote:
>
> I’ve considered a plan for the skipping logical replication
> transaction feature toward PG15. Several ideas and patches have been
> proposed here and another related thread[1][2] for the skipping
> logical replication transaction feature as follows:
>
> A. Change pg_stat_subscription_workers (committed 7a8507329085)
> B. Add origin name and commit-LSN to logical replication worker
> errcontext (proposed[2])
> C. Store error information (e.g., the error message and commit-LSN) to
> the system catalog
> D. Introduce ALTER SUBSCRIPTION SKIP
> E. Record the skipped data somewhere: server logs or a table
>
> Given the remaining time for PG15, it’s unlikely to complete all of
> them for PG15 by the feature freeze. The most realistic plan for PG15
> in my mind is to complete B and D. With these two items, the LSN of
> the error-ed transaction is shown in the server log, and we can ask
> users to check server logs for the LSN and use it with ALTER
> SUBSCRIPTION SKIP command.
>

It makes sense to me to try to finish B and D from the above list for
PG15. I can review the patch for D in detail if others don't have an
objection to it.

Peter E., others, any opinion on this matter?

> If the community agrees with B+D, we will
> have a user-visible feature for PG15 which can be further
> extended/improved in PG16 by adding C and E.

Agreed.

>
> I've attached an updated patch for D and here is the summary:
>
> * Introduce a new command ALTER SUBSCRIPTION ... SKIP (lsn =
> '0/1234'). The user can get the commit-LSN of the transaction in
> question from the server logs thanks to B[2].
> * The user-specified LSN (say skip-LSN) is stored in the
> pg_subscription catalog.
> * The apply worker skips the whole transaction if the transaction's
> commit-LSN exactly matches to skip-LSN.
> * The skip-LSN has an effect on only the first non-empty transaction
> since the worker started to apply changes. IOW it's cleared after
> either skipping the whole transaction or successfully committing a
> non-empty transaction, preventing the skip-LSN to remain in the
> catalog. Also, since the latter case means that the user set the wrong
> skip-LSN we clear it with a warning.
>

As this will be displayed only in the server logs, being emitted by a
background apply worker, should it be LOG or WARNING?

-- 
With Regards,
Amit Kapila.




Re: row filtering for logical replication

2022-03-03 Thread Amit Kapila
On Thu, Mar 3, 2022 at 11:18 AM shiy.f...@fujitsu.com
 wrote:
>
> On Thu, Mar 3, 2022 10:40 AM Amit Kapila  wrote:
> >
> > On Wed, Mar 2, 2022 at 5:42 PM Euler Taveira  wrote:
> > >
> > > On Wed, Mar 2, 2022, at 8:45 AM, Tomas Vondra wrote:
> > >
> > > While working on the column filtering patch, which touches about the
> > > same places, I noticed two minor gaps in testing:
> > >
> > > 1) The regression tests do perform multiple ALTER PUBLICATION commands,
> > > tweaking the row filter. But there are no checks the row filter was
> > > actually modified / stored in the catalog. It might be just thrown away
> > > and no one would notice.
> > >
> > > The test that row filter was modified is available in a previous section. 
> > > The
> > > one that you modified (0001) is testing the supported objects.
> > >
> >
> > Right. But if Tomas thinks it is good to add for these ones as well
> > then I don't mind.
> >
> > > 153 ALTER PUBLICATION testpub5 ADD TABLE testpub_rf_tbl3 WHERE (e > 1000
> > AND e < 2000);
> > > 154 \dRp+ testpub5
> > > 155 ALTER PUBLICATION testpub5 DROP TABLE testpub_rf_tbl2;
> > > 156 \dRp+ testpub5
> > > 157 -- remove testpub_rf_tbl1 and add testpub_rf_tbl3 again (another WHERE
> > expression)
> > > 158 ALTER PUBLICATION testpub5 SET TABLE testpub_rf_tbl3 WHERE (e > 300
> > AND e < 500);
> > > 159 \dRp+ testpub5
> > >
> > > IIRC this test was written before adding the row filter information into 
> > > the
> > > psql. We could add \d+ testpub_rf_tbl3 before and after the modification.
> > >
> >
> >
> > Agreed. We can use \d instead of \d+ as row filter is available with \d.
> >
> > > 2) There are no pg_dump tests.
> > >
> > > WFM.
> > >
> >
> > This is a miss. I feel we can add a few more.
> >
>
> Agree that we can add some tests, attach the patch which fixes these two 
> points.
>

LGTM. I'll push this tomorrow unless Tomas or Euler feels otherwise.

-- 
With Regards,
Amit Kapila.




Re: Add the replication origin name and commit-LSN to logical replication worker errcontext

2022-03-03 Thread Amit Kapila
On Fri, Mar 4, 2022 at 6:40 AM Masahiko Sawada  wrote:
>
> Attached updated version patches.
>

The patch looks mostly good to me. A few minor comments:
1. I think we can have an Assert for errarg->origin_name in
apply_error_callback() after checking the command, as this function
assumes that it will always be set.
2. I suggest minor changes in the documentation change:
When a conflict produces an error, the replication won't proceed, and
the apply worker will emit the following kind of message to the
subscriber's server log:
+
+ERROR:  duplicate key value violates unique constraint "test_pkey"
+DETAIL:  Key (c)=(1) already exists.
+CONTEXT:  processing remote data during "INSERT" for replication
target relation "public.test" in transaction 725 committed at LSN
0/14BFA88 from replication origin "pg_16395"
+

The LSN of the transaction that contains the change violating the
constraint and the replication origin name can be found from the
server log (LSN 0/14BFA88 and replication origin
pg_16395 in the above case).  To skip the
transaction, the subscription needs to be disabled temporarily by
ALTER SUBSCRIPTION ... DISABLE first. Then, the
transaction can be skipped by calling the
pg_replication_origin_advance() function
with the node_name (i.e.,
pg_16395) and the next LSN of the commit LSN (i.e.,
LSN 0/14BFA89).

-- 
With Regards,
Amit Kapila.




Re: Add the replication origin name and commit-LSN to logical replication worker errcontext

2022-03-03 Thread Amit Kapila
On Fri, Mar 4, 2022 at 10:53 AM Masahiko Sawada  wrote:
>
> On Fri, Mar 4, 2022 at 11:27 AM osumi.takami...@fujitsu.com
>  wrote:
> >
> > (4) one confirmation
> >
> > We don't have a TAP test of pg_replication_origin_advance()
> > for v3, that utilizes this new log in a logical replication setup.
> > This is because existing tests for this function (in test_decoding) is only 
> > for permission check
> > and argument validation, and we're just changing error message itself.
> > Is this correct ?
>
> Yeah, I think it’s a good idea to add test in general but I don't
> think we should include the tests for skipping a transaction by using
> pg_replication_origin() in this patch because it's an existing way and
> upcoming ALTER SUBSCRIPTION SKIP patch includes the tests and it's
> more appropriate way. But if others also think it should do that too
> along with this update, I'm happy to add tests.
>

I also don't see a reason why this patch should add any tests related
to origin.

-- 
With Regards,
Amit Kapila.




Re: Add the replication origin name and commit-LSN to logical replication worker errcontext

2022-03-03 Thread Amit Kapila
On Fri, Mar 4, 2022 at 11:45 AM osumi.takami...@fujitsu.com
 wrote:
>
> On Friday, March 4, 2022 2:23 PM Masahiko Sawada  
> wrote:
> > I've attached updated patches.
> Hi, thank you for updating the patch.
>
> One comment on v4.
>
> In v4-0002, we introduce 'commit_lsn' in the ApplyErrorCallbackArg.
> This member is set for prepare, rollback prepared and stream_abort as well.
> The new log message format is useful when we have a prepare transaction
> that keeps failing on the subscriber and want to know the target transaction
> for the pg_replication_origin_advance(), right ?
> If this is true, I wasn't sure if the name 'commit_lsn' is the
> most accurate name for this variable. Should we adjust the name a bit ?
>

If we want to change this variable name, the other options could be
'end_lsn', or 'finish_lsn'.

-- 
With Regards,
Amit Kapila.




Re: Column Filtering in Logical Replication

2022-03-04 Thread Amit Kapila
create table child partition of parent (primary key (a, b)) default;
insert into parent values (1,1);

-- subscriber --
create table parent (a int, b int) partition by range (a);
create table child partition of parent default;
create subscription sub connection 'port=5432 dbname=postgres' publication pub;
update child set b=1 where a=1;
alter table parent add primary key (a,b);

-- publisher --
delete from parent;

Column "b" is part of replica identity in the child table, but it is
filtered, which caused an error on the subscriber side.

ERROR:  publisher did not send replica identity column expected by the
logical replication target relation "public.parent"
CONTEXT:  processing remote data during "DELETE" for replication
target relation "public.parent" in transaction 723 at 2022-03-04
15:15:39.776949+08

2.2
It is also possible that a table being attached has partitions of its own.

e.g.
-- publisher --
create table t1 (a int, b int) partition by range (a);
create publication pub for table t1(b) with(publish_via_partition_root=true);
create table t2 (a int, b int) partition by range (a);
create table t3 (a int primary key, b int);
alter table t2 attach partition t3 default;
alter table t1 attach partition t2 default;
insert into t1 values (1,1);

-- subscriber --
create table t1 (a int, b int) partition by range (a);
create table t2 (a int, b int) partition by range (a);
create table t3 (a int, b int);
alter table t2 attach partition t3 default;
alter table t1 attach partition t2 default;
create subscription sub connection 'port=5432 dbname=postgres' publication pub;
update t1 set a=1 where b=1;
alter table t1 add primary key (a);

-- publisher --
delete from t1;

Column "a" is part of replica identity in table t3, but t3's ancestor
t1 is published with column "a" filtered, which caused an error on the
subscriber side.

ERROR:  publisher did not send replica identity column expected by the
logical replication target relation "public.t1"
CONTEXT:  processing remote data during "DELETE" for replication
target relation "public.t1" in transaction 726 at 2022-03-04
14:40:29.297392+08

3.
Using "alter publication pub set(publish='...'); "

e.g.
-- publisher --
create table tbl(a int primary key, b int);
create publication pub for table tbl(b) with(publish='insert');
insert into tbl values (1,1);

-- subscriber --
create table tbl(a int, b int);
create subscription sub connection 'port=5432 dbname=postgres' publication pub;

-- publisher --
alter publication pub set(publish='insert,update');

-- subscriber --
update tbl set a=1 where b=1;
alter table tbl add primary key (b);

-- publisher --
update tbl set a=2 where a=1;

Updates are replicated, and the column "a" is part of replica
identity, but it is filtered, which caused an error on the subscriber
side.

ERROR:  publisher did not send replica identity column expected by the
logical replication target relation "public.tbl"
CONTEXT:  processing remote data during "UPDATE" for replication
target relation "public.tbl" in transaction 723 at 2022-03-04
11:56:33.905843+08

-- 
With Regards,
Amit Kapila.




Re: Add the replication origin name and commit-LSN to logical replication worker errcontext

2022-03-04 Thread Amit Kapila
On Fri, Mar 4, 2022 at 6:02 PM Euler Taveira  wrote:
>
> On Fri, Mar 4, 2022, at 2:54 AM, Amit Kapila wrote:
>
> The LSN of the transaction that contains the change violating the
> constraint and the replication origin name can be found from the
> server log (LSN 0/14BFA88 and replication origin
> pg_16395 in the above case).  To skip the
> transaction, the subscription needs to be disabled temporarily by
> ALTER SUBSCRIPTION ... DISABLE first. Then, the
> transaction can be skipped by calling the
> pg_replication_origin_advance() function
> with the node_name (i.e.,
> pg_16395) and the next LSN of the commit LSN (i.e.,
> LSN 0/14BFA89).
>
> You could also add:
>
> After that the replication can be resumed by ALTER SUBSCRIPTION ...
> ENABLE.
>
> Let's provide complete instructions.
>

+1. That sounds reasonable to me.
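
For completeness, the full documented sequence would then look
something like this (origin name and LSN taken from the example above;
the subscription name is hypothetical):

-- 'mysub' is a placeholder subscription name:
ALTER SUBSCRIPTION mysub DISABLE;
SELECT pg_replication_origin_advance('pg_16395', '0/14BFA89');
ALTER SUBSCRIPTION mysub ENABLE;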

-- 
With Regards,
Amit Kapila.




Re: row filtering for logical replication

2022-03-06 Thread Amit Kapila
On Mon, Mar 7, 2022 at 12:50 AM Tomas Vondra
 wrote:
>
> On 3/3/22 21:07, Euler Taveira wrote:
> > On Thu, Mar 3, 2022, at 7:47 AM, Amit Kapila wrote:
> >> LGTM. I'll push this tomorrow unless Tomas or Euler feels otherwise.
> > Sounds good to me.
> >
>
> +1
>

Thanks, Pushed 
(https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=ceb57afd3ce177e897cb4c5b44aa683fc0036782).

-- 
With Regards,
Amit Kapila.




Re: Handle infinite recursion in logical replication setup

2022-03-06 Thread Amit Kapila
On Mon, Mar 7, 2022 at 9:26 AM Dilip Kumar  wrote:
>
> On Wed, Feb 23, 2022 at 11:59 AM vignesh C  wrote:
> > Here there are two problems for the user: a) incremental
> > synchronization of table sending both local data and replicated data
> > by walsender b) Table synchronization of table using copy command
> > sending both local data and replicated data
> >
> > For the first problem "Incremental synchronization of table by
> > Walsender" can be solved by:
> > Currently the locally generated data does not have replication origin
> > associated and the data that has originated from another instance will
> > have a replication origin associated. We could use this information to
> > differentiate locally generated data and replicated data and send only
> > the locally generated data. This "only_local" could be provided as an
> > option while subscription is created:
> > ex: CREATE SUBSCRIPTION sub1 CONNECTION 'dbname =postgres port=5433'
> > PUBLICATION pub1 with (only_local = on);
>
> I haven't yet gone through the patch, but I have a question about the
> idea.  Suppose I want to set up a logical replication like,
> node1->node2->node3->node1.  So how would I create the subscriber at
> node1?  only_local=on or off?.  I mean on node1, I want the changes
> from node3 which are generated on node3 or which are replicated from
> node2 but I do not want changes that are replicated from node1 itself?
>  So if I set only_local=on then node1 will not get the changes
> replicated from node2, is that right? and If I set only_local=off then
> it will create the infinite loop again?  So how are we protecting
> against this case?
>

In the above topology, if you want local changes from both node3 and
node2, then I think the way to get that would be to create two
subscriptions on node1. The first one points to node2 (with
only_local=off) and the second one points to node3 (with
only_local=off). Will that address your case, or am I missing
something?

-- 
With Regards,
Amit Kapila.




Re: Add the replication origin name and commit-LSN to logical replication worker errcontext

2022-03-06 Thread Amit Kapila
On Mon, Mar 7, 2022 at 6:36 AM Masahiko Sawada  wrote:
>
> Thank you for the comment. +1.
>
> I've attached updated patches.
>

Pushed the first patch. Fixed one typo in the second patch and
slightly changed the commit message; otherwise, it looks good to me.
I'll push this tomorrow unless there are more comments.

-- 
With Regards,
Amit Kapila.


v6-0001-Add-the-additional-information-to-the-logical-rep.patch
Description: Binary data


Re: Handle infinite recursion in logical replication setup

2022-03-06 Thread Amit Kapila
On Mon, Mar 7, 2022 at 10:05 AM Amit Kapila  wrote:
>
> On Mon, Mar 7, 2022 at 9:26 AM Dilip Kumar  wrote:
> >
> > On Wed, Feb 23, 2022 at 11:59 AM vignesh C  wrote:
> > > Here there are two problems for the user: a) incremental
> > > synchronization of table sending both local data and replicated data
> > > by walsender b) Table synchronization of table using copy command
> > > sending both local data and replicated data
> > >
> > > For the first problem "Incremental synchronization of table by
> > > Walsender" can be solved by:
> > > Currently the locally generated data does not have replication origin
> > > associated and the data that has originated from another instance will
> > > have a replication origin associated. We could use this information to
> > > differentiate locally generated data and replicated data and send only
> > > the locally generated data. This "only_local" could be provided as an
> > > option while subscription is created:
> > > ex: CREATE SUBSCRIPTION sub1 CONNECTION 'dbname =postgres port=5433'
> > > PUBLICATION pub1 with (only_local = on);
> >
> > I haven't yet gone through the patch, but I have a question about the
> > idea.  Suppose I want to set up a logical replication like,
> > node1->node2->node3->node1.  So how would I create the subscriber at
> > node1?  only_local=on or off?.  I mean on node1, I want the changes
> > from node3 which are generated on node3 or which are replicated from
> > node2 but I do not want changes that are replicated from node1 itself?
> >  So if I set only_local=on then node1 will not get the changes
> > replicated from node2, is that right? and If I set only_local=off then
> > it will create the infinite loop again?  So how are we protecting
> > against this case?
> >
>
> In the above topology if you want local changes from both node3 and
> node2 then I think the way to get that would be you have to create two
> subscriptions on node1. The first one points to node2 (with
> only_local=off) and the second one points to node3 (with only_local
> =off).
>

Sorry, I intended to say 'only_local=on' in both places in my previous email.

-- 
With Regards,
Amit Kapila.




Re: Optionally automatically disable logical replication subscriptions on error

2022-03-07 Thread Amit Kapila
On Mon, Mar 7, 2022 at 4:55 AM Peter Smith  wrote:
>
> On Fri, Mar 4, 2022 at 5:55 PM Masahiko Sawada  wrote:
> >
> > ---
> > +/*
> > + * First, ensure that we log the error message so
> > that it won't be
> > + * lost if some other internal error occurs in the
> > following code.
> > + * Then, abort the current transaction and send the
> > stats message of
> > + * the table synchronization failure in an idle state.
> > + */
> > +HOLD_INTERRUPTS();
> > +EmitErrorReport();
> > +FlushErrorState();
> > +AbortOutOfAnyTransaction();
> > +RESUME_INTERRUPTS();
> > +pgstat_report_subscription_error(MySubscription->oid, 
> > false);
> > +
> > +if (MySubscription->disableonerr)
> > +{
> > +DisableSubscriptionOnError();
> > +proc_exit(0);
> > +}
> > +
> > +PG_RE_THROW();
> >
> > If the disableonerr is false, the same error is reported twice. Also,
> > the code in PG_CATCH() in both start_apply() and start_table_sync()
> > are almost the same. Can we create a common function to do post-error
> > processing?
> >
> > The worker should exit with return code 1.
> >
> > I've attached a fixup patch for changes to worker.c for your
> > reference. Feel free to adopt the changes.
>
> The way that common function is implemented required removal of the
> existing PG_RE_THROW logic, which in turn was only possible using
> special knowledge that this just happens to be the last try/catch
> block for the apply worker.
>

I think we should re-throw the error in case we have not handled it by
disabling the subscription (in which case we can exit with success
code (0)).

-- 
With Regards,
Amit Kapila.




Re: Handle infinite recursion in logical replication setup

2022-03-07 Thread Amit Kapila
On Mon, Mar 7, 2022 at 12:48 PM vignesh C  wrote:
>
> On Mon, Mar 7, 2022 at 11:45 AM Peter Smith  wrote:
> >
> > On Mon, Mar 7, 2022 at 4:20 PM vignesh C  wrote:
> > >
> > > On Mon, Mar 7, 2022 at 10:26 AM Peter Smith  wrote:
> > > >
> > > > Hi Vignesh, I also have not looked at the patch yet, but I have what
> > > > seems like a very fundamental (and possibly dumb) question...
> > > >
> > > > Basically, I do not understand the choice of syntax for setting things 
> > > > up.
> > > >
> > > > IMO that "only-local" option sounds very similar to the other
> > > > PUBLICATION ("publish") options which decide the kinds of things that
> > > > will be published. So it feels more natural for me to think of the
> > > > publisher as being the one to decide what will be published.
> > > >
> > > > e.g.
> > > >
> > > > option 1:
> > > > CREATE PUBLICATION p1 FOR TABLE t1;
> > > > CREATE SUBSCRIPTION s1 ... FOR PUBLICATION p1 WITH (only_local = true);
> > > >
> > > > option 2:
> > > > CREATE PUBLICATION p1 FOR TABLE t1 WITH (publish = 'only_local');
> > > > CREATE SUBSCRIPTION s1 ... FOR PUBLICATION p1;
> > > >
> > > > ~~
> > > >
> > > > IIUC the patch is using option 1. My first impression was it feels
> > > > back-to-front for the SUBSCRIPTION telling the PUBLICATION what to
> > > > publish.
> > > >
> > > > So, why does the patch use syntax option 1?
> > >
> > > I felt the advantage with keeping it at the subscription side is that,
> > > the subscriber from one node can subscribe with only_local option on
> > > and a different subscriber from a different node can subscribe with
> > > only_local option as off. This might not be possible with having the
> > > option at publisher side. Having it at the subscriber side might give
> > > more flexibility for the user.
> > >
> >
> > OK.  Option 2 needs two publications for that scenario. IMO it's more
> > intuitive this way, but maybe you wanted to avoid the extra
> > publications?
>
> Yes, I wanted to avoid the extra publication creation that you pointed
> out. Option 1 can handle this scenario without creating the extra
> publications:
> node0: CREATE PUBLICATION p1 FOR TABLE t1;
> node1: CREATE SUBSCRIPTION s1 ... FOR PUBLICATION p1 with (only_local = on);
> node2:  CREATE SUBSCRIPTION s1 ... FOR PUBLICATION p1 with (only_local = off);
>
> I'm ok with both the approaches, now that this scenario can be handled
> by using both the options. i.e providing only_local option as an
> option while creating publication or providing only_local option as an
> option while creating subscription as Peter has pointed out at [1].
> option 1:
> CREATE PUBLICATION p1 FOR TABLE t1;
> CREATE SUBSCRIPTION s1 ... FOR PUBLICATION p1 WITH (only_local = true);
>
> option 2:
> CREATE PUBLICATION p1 FOR TABLE t1 WITH (publish = 'only_local');
> CREATE SUBSCRIPTION s1 ... FOR PUBLICATION p1;
>
> Shall we get a few opinions on this and take it in that direction?
>

Allowing multiple publications will lead to a lot of duplicate data in
catalogs like pg_publication_rel, so I don't see how option-2 can be
beneficial.
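
To illustrate the duplication: each table in each publication is a
separate catalog row, so a query like the following (a sketch) shows
one row per (publication, table) pair:

SELECT p.pubname, pr.prrelid::regclass AS tablename
FROM pg_publication_rel pr
JOIN pg_publication p ON p.oid = pr.prpubid;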

-- 
With Regards,
Amit Kapila.




Re: Handle infinite recursion in logical replication setup

2022-03-07 Thread Amit Kapila
On Mon, Mar 7, 2022 at 1:11 PM Dilip Kumar  wrote:
>
> On Mon, Mar 7, 2022 at 10:15 AM Amit Kapila  wrote:
> > > > I haven't yet gone through the patch, but I have a question about the
> > > > idea.  Suppose I want to set up a logical replication like,
> > > > node1->node2->node3->node1.  So how would I create the subscriber at
> > > > node1?  only_local=on or off?.  I mean on node1, I want the changes
> > > > from node3 which are generated on node3 or which are replicated from
> > > > node2 but I do not want changes that are replicated from node1 itself?
> > > >  So if I set only_local=on then node1 will not get the changes
> > > > replicated from node2, is that right? and If I set only_local=off then
> > > > it will create the infinite loop again?  So how are we protecting
> > > > against this case?
> > > >
> > >
> > > In the above topology if you want local changes from both node3 and
> > > node2 then I think the way to get that would be you have to create two
> > > subscriptions on node1. The first one points to node2 (with
> > > only_local=off) and the second one points to node3 (with only_local
> > > =off).
> > >
> >
> > Sorry, I intend to say 'only_local=on' at both places in my previous email.
>
> Hmm okay, so for this topology we will have to connect node1 directly
> to node2 as well as to node3 but can not cascade the changes.  I was
> wondering can it be done without using the extra connection between
> node2 to node1?  I mean instead of making this a boolean flag that
> whether we want local change or remote change, can't we control the
> changes based on the origin id?  Such that node1 will get the local
> changes of node3 but with using the same subscription it will get
> changes from node3 which are originated from node2 but it will not
> receive the changes which are originated from node1.
>

Good point. I think we can provide that as an additional option to
give more flexibility, but we won't be able to use it for the initial
sync, where we can't differentiate between data from different
origins. Also, since origins are internally generated, we may need
some better way to expose them to users so that they can specify them
as an option (see the hypothetical sketch below). Isn't it better to
first provide some simple way, like a boolean option, so that users
have some way to replicate the same table data among different nodes
without causing an infinite loop, and then extend it as you are
suggesting, or maybe in some other ways as well?
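
Purely as an illustration of the "specific origin ids" idea, and none
of this syntax exists today, such an extension might eventually look
something like:

-- hypothetical future syntax, NOT implemented:
CREATE SUBSCRIPTION s1 CONNECTION 'dbname=postgres port=5433'
    PUBLICATION p1 WITH (skip_origins = 'node1_origin');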

-- 
With Regards,
Amit Kapila.




Re: Handle infinite recursion in logical replication setup

2022-03-07 Thread Amit Kapila
On Mon, Mar 7, 2022 at 5:01 PM Ashutosh Bapat
 wrote:
>
> Hi Vignesh,
> I agree with Peter's comment that the changes to
> FilterRemoteOriginData() should be part of FilterByOrigin()
>
> Further, I wonder why "onlylocal_data" is a replication slot's
> property. A replication slot tracks the progress of replication and it
> may be used by different receivers with different options. I could
> start one receiver which wants only local data, say using
> "pg_logical_slot_get_changes" and later start another receiver which
> fetches all the data starting from where the first receiver left. This
> option prevents such flexibility.
>
> As discussed earlier in the thread, local_only can be property of
> publication or subscription, depending upon the use case, but I can't
> see any reason that it should be tied to a replication slot.
>

I thought it should be similar to the 'streaming' option of a
subscription, but maybe Vignesh has some other reason which makes it
different.

> I have a similar question for "two_phase" but the ship has sailed and
> probably it makes some sense there which I don't know.
>

two_phase is different from some of the other subscription options
like 'streaming' in that it can be enabled only at the time of slot
and subscription creation; we can't change/specify it via
pg_logical_slot_get_changes(). This is to avoid the case where, at the
time of the commit prepared, we won't know whether the prepare for the
transaction has already been sent. For the same reason, we also need
to know the 'two_phase_at' information. A small sketch of this is
shown below.
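
For reference, a minimal sketch of the creation-time-only nature of
two_phase (hypothetical subscription/publication names; behavior as of
this discussion):

-- 'sub2pc' and 'pub' are placeholder names:
CREATE SUBSCRIPTION sub2pc CONNECTION 'dbname=postgres port=5433'
    PUBLICATION pub WITH (two_phase = true);
-- Unlike 'streaming', two_phase cannot be changed afterwards:
-- ALTER SUBSCRIPTION sub2pc SET (two_phase = off);  -- not supported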

> As for publication vs subscription, I think both are useful cases.
> 1. It will be a publication's property, if we want the node to not
> publish any data that it receives from other nodes for a given set of
> tables.
> 2. It will be the subscription's property, if we want the subscription
> to decide whether it wants to fetch the data changed on only upstream
> or other nodes as well.
>

I think it could be useful to allow it via both publication and
subscription, but I guess it is better to provide it one way
initially, just to keep things simple and give users some way to deal
with such cases. I would prefer to allow it via subscription
initially, for the reasons specified by Vignesh in his previous email
[1]. Now, if we think those are all ignorable things and it is more
important to allow this option first via publication, or we must allow
it via both publication and subscription, then it makes sense to
change it.


[1] - 
https://www.postgresql.org/message-id/CALDaNm3jkotRhKfCqu5CXOf36_yiiW_cYE5%3DbG%3Dj6N3gOWJkqw%40mail.gmail.com

-- 
With Regards,
Amit Kapila.




Re: Handle infinite recursion in logical replication setup

2022-03-07 Thread Amit Kapila
On Tue, Mar 8, 2022 at 10:31 AM Peter Smith  wrote:
>
> IIUC the new option may be implemented subscriber-side and/or
> publisher-side and/or both, and the subscriber-side option may be
> "enhanced" in future to prevent cycles. And probably there are more
> features I don't know about or that have not yet been thought of.
>
> ~~
>
> Even if the plan is only to implement just one part now and then add
> more later, I think there still should be some consideration for what
> you expect all possible future options to look like, because that may
> affect current implementation choices.
>
> The point is:
>
> - we should take care so don't accidentally end up with an option that
> turned out to be inconsistent looking on the subscriber-side /
> publisher-side.
>
> - we should try to avoid accidentally painting ourselves into a corner
> (e.g. stuck with a boolean option that cannot be enhanced later on)
>

Agreed. I think it is important to ensure we don't do something that
is not extendable in the future or that paints us into a corner. But,
as of now, with the current proposal, the main thing we should
consider is whether exposing the boolean option is okay, or whether we
should do something different so that we can extend it later to
specific origin ids.

-- 
With Regards,
Amit Kapila.




Re: Optionally automatically disable logical replication subscriptions on error

2022-03-07 Thread Amit Kapila
On Tue, Mar 8, 2022 at 9:37 AM Peter Smith  wrote:
>
> Please find below some review comments for v29.
>
> ==
>
> 1. src/backend/replication/logical/worker.c - worker_post_error_processing
>
> +/*
> + * Abort and cleanup the current transaction, then do post-error processing.
> + * This function must be called in a PG_CATCH() block.
> + */
> +static void
> +worker_post_error_processing(void)
>
> The function comment and function name are too vague/generic. I guess
> this is a hang-over from Sawada-san's proposed patch, but now since
> this is only called when disabling the subscription so both the
> comment and the function name should say that's what it is doing...
>
> e.g. rename to DisableSubscriptionOnError() or something similar.
>
> ~~~
>
> 2. src/backend/replication/logical/worker.c - worker_post_error_processing
>
> + /* Notify the subscription has been disabled */
> + ereport(LOG,
> + errmsg("logical replication subscription \"%s\" has been be disabled
> due to an error",
> +MySubscription->name));
>
>   proc_exit(0);
>  }
>
> I know this is common code, but IMO it would be better to do the
> proc_exit(0); from the caller in the PG_CATCH. Then I think the code
> will be much easier to read the throw/exit logic, rather than now
> where it is just calling some function that never returns...
>
> Alternatively, if you want the code how it is, then the function name
> should give some hint that it is never going to return - e.g.
> DisableSubscriptionOnErrorAndExit)
>

I think we are already in an error path, so maybe it is better to name
it DisableSubscriptionAndExit.

Few other comments:
=
1.
DisableSubscription()
{
..
+
+ LockSharedObject(SubscriptionRelationId, subid, 0, AccessExclusiveLock);

Why do we need AccessExclusiveLock here? The Alter/Drop Subscription
commands take AccessExclusiveLock, so AccessShareLock should be
sufficient here unless we have a reason to use an AccessExclusiveLock.
The other similar usages in this file (pg_subscription.c) also take
AccessShareLock.

2. Shall we mention this feature in conflict handling docs [1]:
Now:
To skip the transaction, the subscription needs to be disabled
temporarily by ALTER SUBSCRIPTION ... DISABLE first.

After:
To skip the transaction, the subscription needs to be disabled
temporarily by ALTER SUBSCRIPTION ... DISABLE first; alternatively,
the subscription can be created with the disable_on_error option.

Feel free to use something on the above lines, if you agree.
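
A sketch of that alternative (the option name is per this patch; the
other names are placeholders):

-- 'mysub' and 'mypub' are placeholder names:
CREATE SUBSCRIPTION mysub CONNECTION 'dbname=postgres port=5432'
    PUBLICATION mypub WITH (disable_on_error = true);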

-- 
With Regards,
Amit Kapila.




Re: Handle infinite recursion in logical replication setup

2022-03-07 Thread Amit Kapila
On Tue, Mar 8, 2022 at 12:17 PM Peter Smith  wrote:
>
> On Tue, Mar 8, 2022 at 4:21 PM Amit Kapila  wrote:
> >
> > On Tue, Mar 8, 2022 at 10:31 AM Peter Smith  wrote:
> > >
> > > IIUC the new option may be implemented subscriber-side and/or
> > > publisher-side and/or both, and the subscriber-side option may be
> > > "enhanced" in future to prevent cycles. And probably there are more
> > > features I don't know about or that have not yet been thought of.
> > >
> > > ~~
> > >
> > > Even if the plan is only to implement just one part now and then add
> > > more later, I think there still should be some consideration for what
> > > you expect all possible future options to look like, because that may
> > > affect current implementation choices.
> > >
> > > The point is:
> > >
> > > - we should take care so don't accidentally end up with an option that
> > > turned out to be inconsistent looking on the subscriber-side /
> > > publisher-side.
> > >
> > > - we should try to avoid accidentally painting ourselves into a corner
> > > (e.g. stuck with a boolean option that cannot be enhanced later on)
> > >
> >
> > Agreed. I think it is important to see we shouldn't do something which
> > is not extendable in future or paint us in the corner. But, as of now,
> > with the current proposal, the main thing we should consider is
> > whether exposing the boolean option is okay or shall we do something
> > different so that we can extend it later to specific origin ids?
> >
>
> I was wondering, assuming later there is an enhancement to detect and
> prevent cycles using origin ids, then what really is the purpose of
> the option?
>

We can't use origin ids during the initial sync, as we can't
differentiate between data from the local node and a remote node.
Also, how will we distinguish between different origins? AFAIU, all
the changes applied on any particular node use the same origin.

> I can imagine if a cycle happens should probably log some one-time
> WARNING (just to bring it to the user's attention in case it was
> caused by some accidental misconfiguration) but apart from that why
> would this need to be an option at all - i.e. is there a use-case
> where a user would NOT want to prevent a recursive error?
>

I think there will be plenty of setups where there won't be a need for
any such option, e.g., where users don't need bi-directional
replication for the same table.

-- 
With Regards,
Amit Kapila.




Re: Add the replication origin name and commit-LSN to logical replication worker errcontext

2022-03-08 Thread Amit Kapila
On Mon, Mar 7, 2022 at 10:06 AM Amit Kapila  wrote:
>
> On Mon, Mar 7, 2022 at 6:36 AM Masahiko Sawada  wrote:
> >
> > Thank you for the comment. +1.
> >
> > I've attached updated patches.
> >
>
> Pushed the first patch. Fixed one typo in the second patch and
> slightly changed the commit message, otherwise, it looks good to me.
> I'll push this tomorrow unless there are more comments.
>

Pushed.

-- 
With Regards,
Amit Kapila.




Re: Optionally automatically disable logical replication subscriptions on error

2022-03-08 Thread Amit Kapila
On Tue, Mar 8, 2022 at 1:37 PM osumi.takami...@fujitsu.com
 wrote:
>
> Kindly have a look at v30.
>

Review comments:
===
1.
+ ereport(LOG,
+ errmsg("logical replication subscription \"%s\" has been be disabled
due to an error",

Typo.
/been be/been

2. Is there a reason the patch doesn't allow workers to restart via
maybe_reread_subscription() when this new option is changed? If so,
then let's add a comment for the same. We currently seem to be
restarting the worker on any change via Alter Subscription. If we
decide to change that for this option as well, then I think we need to
accordingly update the current comment: "Exit if any parameter that
affects the remote connection was changed." to something like "Exit if
any parameter that affects the remote connection or a subscription
option was changed..."

3.
  if (fout->remoteVersion >= 15)
- appendPQExpBufferStr(query, " s.subtwophasestate\n");
+ appendPQExpBufferStr(query, " s.subtwophasestate,\n");
  else
  appendPQExpBuffer(query,
-   " '%c' AS subtwophasestate\n",
+   " '%c' AS subtwophasestate,\n",
LOGICALREP_TWOPHASE_STATE_DISABLED);

+ if (fout->remoteVersion >= 15)
+ appendPQExpBuffer(query, " s.subdisableonerr\n");
+ else
+ appendPQExpBuffer(query,
+   " false AS subdisableonerr\n");

It is better to combine these parameters into a single version check.
I see there is a similar coding pattern for 14, but I think that is
not required here.

4.
+$node_subscriber->safe_psql('postgres', qq(ALTER SUBSCRIPTION sub ENABLE));
+
+# Wait for the data to replicate.
+$node_subscriber->poll_query_until(
+ 'postgres', qq(
+SELECT COUNT(1) = 1 FROM pg_catalog.pg_subscription_rel sr
+WHERE sr.srsubstate IN ('s', 'r') AND sr.srrelid = 'tbl'::regclass));

See other scripts like t/015_stream.pl and wait for data replication
in the same way. I think there are two things to change: (a) in the
above query, use NOT IN as we do at other places; (b) use
$node_publisher->wait_for_catchup before this query. A sketch of the
polling condition is below.
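
Something like the following (a sketch matching the NOT IN style used
in other subscription tests):

SELECT count(1) = 0 FROM pg_subscription_rel
WHERE srsubstate NOT IN ('r', 's');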

-- 
With Regards,
Amit Kapila.




Re: Optionally automatically disable logical replication subscriptions on error

2022-03-08 Thread Amit Kapila
On Wed, Mar 9, 2022 at 6:29 AM Masahiko Sawada  wrote:
>
> ---
> It might have already been discussed but the worker disables the
> subscription on an error but doesn't work for a fatal. Is that
> expected or should we handle that too?
>

I am not too sure about handling FATALs with this feature because it
is mainly meant to aid in resolving conflicts due to various
constraints. It might be okay to retry in the case of a FATAL, which
is possibly due to some system resource error. OTOH, if we decide that
it would be good to disable the subscription for a FATAL error as
well, then I think we can use the PG_ENSURE_ERROR_CLEANUP construct.
What do you think?

-- 
With Regards,
Amit Kapila.




Re: Optionally automatically disable logical replication subscriptions on error

2022-03-08 Thread Amit Kapila
On Tue, Mar 8, 2022 at 6:53 PM Amit Kapila  wrote:
>
> On Tue, Mar 8, 2022 at 1:37 PM osumi.takami...@fujitsu.com
>  wrote:
> >
> > Kindly have a look at v30.
> >
>
> Review comments:
> ===
>
Few comments on test script:
===
1.
+# This tests the uniqueness violation will cause the subscription
+# to fail during initial synchronization and make it disabled.

/This tests the/This tests that the

2.
+$node_publisher->safe_psql('postgres',
+ qq(CREATE PUBLICATION pub FOR TABLE tbl));
+$node_subscriber->safe_psql(
+ 'postgres', qq(
+CREATE SUBSCRIPTION sub
+CONNECTION '$publisher_connstr'
+PUBLICATION pub WITH (disable_on_error = true)));

Please check other test scripts like t/015_stream.pl or
t/028_row_filter.pl and keep the indentation of these commands
similar. It looks odd and inconsistent with other tests. Also, we can
use double-quotes instead of qq so as to be consistent with other
scripts. Please check other similar places and make them consistent
with other test script files.

3.
+# Initial synchronization failure causes the subscription
+# to be disabled.

Here and in other places in test scripts, the comment lines seem too
short to me. Normally, we can keep it at the 80 char limit but this
appears too short.

4.
+# Delete the data from the subscriber and recreate the unique index.
+$node_subscriber->safe_psql(
+ 'postgres', q(
+DELETE FROM tbl;
+CREATE UNIQUE INDEX tbl_unique ON tbl (i)));

In other tests, we are executing single statements via safe_psql. I
don't see a problem with this but also don't see a reason to deviate
from the normal pattern.

-- 
With Regards,
Amit Kapila.




Re: Optionally automatically disable logical replication subscriptions on error

2022-03-08 Thread Amit Kapila
On Wed, Mar 9, 2022 at 11:22 AM Masahiko Sawada  wrote:
>
> On Wed, Mar 9, 2022 at 12:37 PM Amit Kapila  wrote:
> >
> > On Wed, Mar 9, 2022 at 6:29 AM Masahiko Sawada  
> > wrote:
> > >
> > > ---
> > > It might have already been discussed but the worker disables the
> > > subscription on an error but doesn't work for a fatal. Is that
> > > expected or should we handle that too?
> > >
> >
> > I am not too sure about handling FATALs with this feature because this
> > is mainly to aid in resolving conflicts due to various constraints. It
> > might be okay to retry in case of FATAL which is possibly due to some
> > system resource error. OTOH, if we see that it will be good to disable
> > for a FATAL error as well then I think we can use
> > PG_ENSURE_ERROR_CLEANUP construct. What do you think?
>
> I think that since FATAL raised by logical replication workers (e.g.,
> terminated by DDL or out of memory etc?) is normally not a repeatable
> error, it's reasonable to retry in this case.
>

Yeah, I think we can add a comment in the code for this so that future
readers know that this has been done deliberately.

-- 
With Regards,
Amit Kapila.




Re: Column Filtering in Logical Replication

2022-03-09 Thread Amit Kapila
On Mon, Mar 7, 2022 at 8:48 PM Tomas Vondra
 wrote:
>
> On 3/4/22 11:42, Amit Kapila wrote:
> > On Wed, Mar 2, 2022 at 5:43 PM Tomas Vondra
> >  wrote:
> >>
> >> Attached is an updated patch, addressing most of the issues reported so
> >> far. There are various minor tweaks, but the main changes are:
> > ...
> >>
> >> 3) checks of column filter vs. publish_via_partition_root and replica
> >> identity, following the same logic as the row-filter patch (hopefully,
> >> it touches the same places, using the same logic, ...)
> >>
> >> That means - with "publish_via_partition_root=false" it's not allowed to
> >> specify column filters on partitioned tables, only for leaf partitions.
> >>
> >> And we check column filter vs. replica identity when adding tables to
> >> publications, or whenever we change the replica identity.
> >>
> >
> > This handling is different from row filter work and I see problems
> > with it.
>
> By different, I assume you mean I tried to enforce the rules in ALTER
> PUBLICATION and other ALTER commands, instead of when modifying the
> data?
>

Yes.

> OK, I reworked this to do the same thing as the row filtering patch.
>

Thanks, I'll check this.

> > The column list validation w.r.t primary key (default replica
> > identity) is missing. The handling of column list vs. partitions has
> > multiple problems: (a) In attach partition, the patch is just checking
> > ancestors for RI validation but what if the table being attached has
> > further subpartitions; (b) I think the current locking also seems to
> > have problems because it is quite possible that while it validates the
> > ancestors here, concurrently someone changes the column list. I think
> > it won't be enough to just change the locking mode because with the
> > current patch strategy during attach, we will be first taking locks
> > for child tables of current partition and then parent tables which can
> > pose deadlock hazards.
> > The column list validation also needs to be done when we change
> > publication action.
> >
> I believe those issues should be solved by adopting the same approach as
> the row-filtering patch, right?
>

Right.

>
> > Some other miscellaneous comments:
> > =
> > *
> > In get_rel_sync_entry(), the handling for partitioned tables doesn't
> > seem to be correct. It can publish a different set of columns based on
> > the order of publications specified in the subscription.
> >
> > For example:
> > 
> > create table parent (a int, b int, c int) partition by range (a);
> > create table test_part1 (like parent);
> > alter table parent attach partition test_part1 for values from (1) to (10);
> >
> > create publication pub for table parent(a) with 
> > (PUBLISH_VIA_PARTITION_ROOT);
> > create publication pub2 for table test_part1(b);
> > ---
> >
> > Now, depending on the order of publications in the list while defining
> > subscription, the column list will change
> > 
> > create subscription sub connection 'port=1 dbname=postgres'
> > publication pub, pub2;
> >
> > For the above, column list will be: (a)
> >
> > create subscription sub connection 'port=1 dbname=postgres'
> > publication pub2, pub;
> >
> > For this one, the column list will be: (a, b)
> > 
> >
> > To avoid this, the column list should be computed based on the final
> > publish_as_relid as we are doing for the row filter.
> >
>
> Hmm, yeah. That seems like a genuine problem - it should not depend on
> the order of publications in the subscription, I guess.
>
> But is it an issue in the patch? Isn't that a pre-existing issue? AFAICS
> the problem is that we initialize publish_as_relid=relid before the loop
> over publications, and then just update it. So the first iteration
> starts with relid, but the second iteration ends with whatever value is
> set by the first iteration (e.g. the root).
>
> So with the example you posted, we start with
>
>   publish_as_relid = relid = test_part1
>
> but then if the first publication is pubviaroot=true, we update it to
> parent. And in the second iteration, we fail to find the column filter,
> because "parent" (publish_as_relid) is not part of the pub2.
>
> If we do it in the other order, we leave the publish_as_relid value as
> is (and find the filter), and then update it in the second iteration
> (and find the column filter).

Re: Optionally automatically disable logical replication subscriptions on error

2022-03-09 Thread Amit Kapila
On Wed, Mar 9, 2022 at 2:21 PM Masahiko Sawada  wrote:
>
> On Wed, Mar 9, 2022 at 4:33 PM osumi.takami...@fujitsu.com
>  wrote:
> >
> > On Tuesday, March 8, 2022 10:23 PM Amit Kapila  
> > wrote:
> > > On Tue, Mar 8, 2022 at 1:37 PM osumi.takami...@fujitsu.com
> > >  wrote:
> > > >
> >
> >
> > > 2. Is there a reason the patch doesn't allow workers to restart via
> > > maybe_reread_subscription() when this new option is changed, if so, then 
> > > let's
> > > add a comment for the same? We currently seem to be restarting the worker 
> > > on
> > > any change via Alter Subscription. If we decide to change it for this 
> > > option as
> > > well then I think we need to accordingly update the current comment: 
> > > "Exit if
> > > any parameter that affects the remote connection was changed." to 
> > > something
> > > like "Exit if any parameter that affects the remote connection or a 
> > > subscription
> > > option was changed..."
> > I thought it's ok without the change at the beginning, but I was wrong.
> > To make this new option aligned with others, I should add one check
> > for this feature. Fixed.
>
> Why do we need to restart the apply worker when disable_on_error is
> changed? It doesn't affect the remote connection at all. I think it
> can be changed without restarting like synchronous_commit option.
>

Oh right. I was wondering how we would update its value in
MySubscription after a change, but as we re-read the pg_subscription
table for the current subscription and update MySubscription, I feel
we don't need to restart the worker. I haven't tested it, but it
should work without a restart.
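
For instance, something like the following (a hedged example, assuming
the option keeps its current name) should then take effect for the
running apply worker without a restart, much like synchronous_commit:
----
ALTER SUBSCRIPTION sub SET (disable_on_error = off);
----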

-- 
With Regards,
Amit Kapila.




Re: logical decoding and replication of sequences

2022-03-09 Thread Amit Kapila
On Wed, Mar 9, 2022 at 4:14 AM Tomas Vondra
 wrote:
>
> On 3/7/22 22:11, Tomas Vondra wrote:
> >
> > I've pushed this simple fix. Not sure it'll fix the assert failures on
> > skink/locust, though. Given the lack of information it'll be difficult
> > to verify. So let's wait a bit.
> >
>
> I've done about 5000 runs of 'make check' in test_decoding, on two rpi
> machines (one armv7, one aarch64). Not a single assert failure :-(
>
> How come skink/locust hit that in just a couple runs?
>

Did it fail after you pushed the fix? I don't think so, or am I
missing something? I feel that even if it doesn't occur again, it
would have been better if we had some theory on how it occurred in the
first place, because that would make us more confident that we don't
have any related problem left.

-- 
With Regards,
Amit Kapila.




Re: Column Filtering in Logical Replication

2022-03-09 Thread Amit Kapila
On Wed, Mar 9, 2022 at 3:33 PM Amit Kapila  wrote:
>
> On Mon, Mar 7, 2022 at 8:48 PM Tomas Vondra
>  wrote:
>
> > OK, I reworked this to do the same thing as the row filtering patch.
> >
>
> Thanks, I'll check this.
>

Some assorted comments:
=
1. We don't need to send a column list for the old tuple in case of an
update (similar to delete). It is not required to apply a column
filter for those cases because we ensure that RI must be part of the
column list for updates and deletes.
2.
+ /*
+ * Check if all columns referenced in the column filter are part of
+ * the REPLICA IDENTITY index or not.

I think this comment is reversed. The rule we follow here is that
attributes that are part of RI must be there in a specified column
list. This is used at two places in the patch.
3. get_rel_sync_entry()
{
/* XXX is there a danger of memory leak here? beware */
+ oldctx = MemoryContextSwitchTo(CacheMemoryContext);
+ for (int i = 0; i < nelems; i++)
...
}

Similar to the row filter, I think we need to use
entry->cache_expr_cxt to allocate this. There are other usages of
CacheMemoryContext in this part of the code but I think those need to
be also changed and we can do that as a separate patch. If we do the
suggested change then we don't need to separately free columns.
4. I think we don't need the DDL changes in AtExecDropColumn. Instead,
we can change the dependency of columns to NORMAL during publication
commands.
5. There is a reference to check_publication_columns but that function
is removed from the patch.
6.
/*
* If we know everything is replicated and the row filter is invalid
* for update and delete, there is no point to check for other
* publications.
*/
if (pubdesc->pubactions.pubinsert && pubdesc->pubactions.pubupdate &&
pubdesc->pubactions.pubdelete && pubdesc->pubactions.pubtruncate &&
!pubdesc->rf_valid_for_update && !pubdesc->rf_valid_for_delete)
break;

/*
* If we know everything is replicated and the column filter is invalid
* for update and delete, there is no point to check for other
* publications.
*/
if (pubdesc->pubactions.pubinsert && pubdesc->pubactions.pubupdate &&
pubdesc->pubactions.pubdelete && pubdesc->pubactions.pubtruncate &&
!pubdesc->cf_valid_for_update && !pubdesc->cf_valid_for_delete)
break;

Can we combine these two checks?

I feel this patch needs a more thorough review.
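
Going back to points 1 and 2, here is a hedged sketch (with made-up
names t1/pub1) of the rule being enforced there, assuming the
row-filter-style validation at DML time:
----
CREATE TABLE t1 (id int PRIMARY KEY, payload text);
CREATE PUBLICATION pub1 FOR TABLE t1 (payload);
-- a later UPDATE/DELETE on t1 is then expected to error out on the
-- publisher, because the RI column "id" is not in the column list
UPDATE t1 SET payload = 'x';
----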

-- 
With Regards,
Amit Kapila.




Re: Optionally automatically disable logical replication subscriptions on error

2022-03-09 Thread Amit Kapila
On Wed, Mar 9, 2022 at 7:57 PM osumi.takami...@fujitsu.com
 wrote:
>
> Hi, attached v32 removed my additional code for maybe_reread_subscription.
>

Thanks, the patch looks good to me. I have made minor edits in the
attached. I am planning to commit this early next week (Monday) unless
there are any other major comments.

-- 
With Regards,
Amit Kapila.


v33-0001-Optionally-disable-subscriptions-on-error.patch
Description: Binary data


Re: Skipping logical replication transactions on subscriber side

2022-03-10 Thread Amit Kapila
On Tue, Mar 1, 2022 at 8:31 PM Masahiko Sawada  wrote:
>
> I've attached an updated patch along with two patches for cfbot tests
> since the main patch (0003) depends on the other two patches. Both
> 0001 and 0002 patches are the same ones I attached on another
> thread[2].
>

Few comments on 0003:
=
1.
+ 
+  
+   subskiplsn pg_lsn
+  
+  
+   Commit LSN of the transaction whose changes are to be skipped, if a
+   valid LSN; otherwise 0/0.
+  
+ 

Can't this be a prepared LSN or a rollback-prepared LSN? Can we say
Finish/End LSN and then add some details about which LSNs can appear there?

2. The conflict resolution explanation needs an update after the
latest commits and we should probably change the commit LSN
terminology as mentioned in the previous point.

3. The text in alter_subscription.sgml looks a bit repetitive to me
(similar to what we have in logical-replication.sgml related to
conflicts). Here also we refer to only commit LSN which needs to be
changed as mentioned in the previous two points.

4.
if (strcmp(lsn_str, "none") == 0)
+ {
+ /* Setting lsn = NONE is treated as resetting LSN */
+ lsn = InvalidXLogRecPtr;
+ }
+ else
+ {
+ /* Parse the argument as LSN */
+ lsn = DatumGetTransactionId(DirectFunctionCall1(pg_lsn_in,
+ CStringGetDatum(lsn_str)));
+
+ if (XLogRecPtrIsInvalid(lsn))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("invalid WAL location (LSN): %s", lsn_str)));

Is there a reason that we don't want to allow setting 0
(InvalidXLogRecPtr) for the skip LSN? (See the sketch at the end of
these comments.)

5.
+# The subscriber will enter an infinite error loop, so we don't want
+# to overflow the server log with error messages.
+$node_subscriber->append_conf(
+ 'postgresql.conf',
+ qq[
+wal_retrieve_retry_interval = 2s
+]);

Can we change this test to use the disable_on_error feature? I am
thinking that if the disable_on_error feature gets committed first,
maybe we can have one test file covering both it and this feature
(something like conflicts.pl).
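
For reference, the syntax under discussion in point 4 looks roughly
like this (a hedged sketch based on the quoted patch code; the LSN
value is made up and details may change before commit):
----
ALTER SUBSCRIPTION sub SKIP (lsn = '0/12345'); -- set the finish LSN to skip
ALTER SUBSCRIPTION sub SKIP (lsn = NONE);      -- reset, i.e. InvalidXLogRecPtr
----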

-- 
With Regards,
Amit Kapila.




Re: Column Filtering in Logical Replication

2022-03-10 Thread Amit Kapila
On Fri, Mar 11, 2022 at 12:44 AM Tomas Vondra
 wrote:
>
> On 3/10/22 04:09, Amit Kapila wrote:
> > On Wed, Mar 9, 2022 at 3:33 PM Amit Kapila  wrote:
> >>
> >> On Mon, Mar 7, 2022 at 8:48 PM Tomas Vondra
> >>  wrote:
> >>
> >>> OK, I reworked this to do the same thing as the row filtering patch.
> >>>
> >>
> >> Thanks, I'll check this.
> >>
> >
> > Some assorted comments:
> > =
> > 1. We don't need to send a column list for the old tuple in case of an
> > update (similar to delete). It is not required to apply a column
> > filter for those cases because we ensure that RI must be part of the
> > column list for updates and deletes.
>
> I'm not sure which part of the code does this refer to?
>

The below part:
@@ -464,11 +473,11 @@ logicalrep_write_update(StringInfo out,
TransactionId xid, Relation rel,
  pq_sendbyte(out, 'O'); /* old tuple follows */
  else
  pq_sendbyte(out, 'K'); /* old key follows */
- logicalrep_write_tuple(out, rel, oldslot, binary);
+ logicalrep_write_tuple(out, rel, oldslot, binary, columns);
  }

I think here instead of columns, the patch needs to send NULL as it is
already doing in logicalrep_write_delete.

> > 2.
> > + /*
> > + * Check if all columns referenced in the column filter are part of
> > + * the REPLICA IDENTITY index or not.
> >
> > I think this comment is reversed. The rule we follow here is that
> > attributes that are part of RI must be there in a specified column
> > list. This is used at two places in the patch.
>
> Yeah, you're right. Will fix.
>
> > 3. get_rel_sync_entry()
> > {
> > /* XXX is there a danger of memory leak here? beware */
> > + oldctx = MemoryContextSwitchTo(CacheMemoryContext);
> > + for (int i = 0; i < nelems; i++)
> > ...
> > }
> >
> > Similar to the row filter, I think we need to use
> > entry->cache_expr_cxt to allocate this. There are other usages of
> > CacheMemoryContext in this part of the code but I think those need to
> > be also changed and we can do that as a separate patch. If we do the
> > suggested change then we don't need to separately free columns.
>
> I agree a shorter-lived context would be better than CacheMemoryContext,
> but "expr" seems to indicate it's for the expression only, so maybe we
> should rename that.
>

Yeah, we can do that. How about rel_entry_cxt or something like that?
The idea is that eventually we should move a few other members of
RelSyncEntry, like attrmap, that currently use CacheMemoryContext,
under this context.

> But do we really want a memory context for every
> single entry?
>

Any other better idea?

-- 
With Regards,
Amit Kapila.




Re: Column Filtering in Logical Replication

2022-03-10 Thread Amit Kapila
On Fri, Mar 11, 2022 at 7:26 AM Tomas Vondra
 wrote:
>
> But this does not address tablesync.c :-( That still copies everything,
> because it decides to sync both rels (test_pub_part_1, test_pub_part_2),
> with it's row filter. On older releases this would fail, because we'd
> start two workers:
>

Yeah, this is because of the existing problem where we sync both rels
instead of one. We have fixed some similar existing problems earlier.
Hou-San has reported a similar case in another email [1].

>
> But I find this really weird - I think it's reasonable to expect the
> sync to produce the same result as if the data was inserted and
> replicated, and this just violates that.
>
> Shouldn't tablesync calculate a list of relations in a way that prevents
> such duplicate / overlapping syncs?
>

Yes, I think it is better to fix it separately than to fix it along
with row filter or column filter work.

>
> In any case, this sync issue looks
> entirely unrelated to the column filtering patch.
>

Right.

[1] - 
https://www.postgresql.org/message-id/OS0PR01MB5716DC2982CC735FDE388804940B9%40OS0PR01MB5716.jpnprd01.prod.outlook.com

-- 
With Regards,
Amit Kapila.




Re: Column Filtering in Logical Replication

2022-03-11 Thread Amit Kapila
On Fri, Mar 11, 2022 at 7:26 AM Tomas Vondra
 wrote:
>
> On 3/10/22 20:10, Tomas Vondra wrote:
> >
> >
> > FWIW I think the reason is pretty simple - pgoutput_row_filter_init is
> > broken. It assumes you can just do this
> >
> > rftuple = SearchSysCache2(PUBLICATIONRELMAP,
> >   ObjectIdGetDatum(entry->publish_as_relid),
> >   ObjectIdGetDatum(pub->oid));
> >
> > if (HeapTupleIsValid(rftuple))
> > {
> > /* Null indicates no filter. */
> > rfdatum = SysCacheGetAttr(PUBLICATIONRELMAP, rftuple,
> >   Anum_pg_publication_rel_prqual,
> >   &pub_no_filter);
> > }
> > else
> > {
> > pub_no_filter = true;
> > }
> >
> >
> > and pub_no_filter=true means there's no filter at all. Which is
> > nonsense, because we're using publish_as_relid here - the publication
> > may not include this particular ancestor, in which case we need to just
> > ignore this publication.
> >
> > So yeah, this needs to be reworked.
> >
>
> I spent a bit of time looking at this, and I think a minor change in
> get_rel_sync_entry() fixes this - it's enough to ensure rel_publications
> only includes publications that actually include publish_as_relid.
>

Thanks for looking into this. I think in the first patch before
calling get_partition_ancestors() we need to ensure it is a partition
(the call expects that) and pubviaroot is true. I think it would be
good if we can avoid an additional call to get_partition_ancestors()
as it could be costly. I wonder why it is not sufficient to ensure
that publish_as_relid appears after the ancestor in the ancestors list before
assigning the ancestor to publish_as_relid? This only needs to be done
in case of (if (!publish)). I have not tried this so I could be wrong.

-- 
With Regards,
Amit Kapila.




Re: logical decoding and replication of sequences

2022-03-11 Thread Amit Kapila
On Tue, Mar 8, 2022 at 11:59 PM Tomas Vondra
 wrote:
>
> On 3/7/22 22:25, Tomas Vondra wrote:
> >>
> >> Interesting. I can think of one reason that might cause this - we log
> >> the first sequence increment after a checkpoint. So if a checkpoint
> >> happens in an unfortunate place, there'll be an extra WAL record. On
> >> slow / busy machines that's quite possible, I guess.
> >>
> >
> > I've tweaked the checkpoint_interval to make checkpoints more aggressive
> > (set it to 1s), and it seems my hunch was correct - it produces failures
> > exactly like this one. The best fix probably is to just disable decoding
> > of sequences in those tests that are not aimed at testing sequence decoding.
> >
>
> I've pushed a fix for this, adding "include-sequences=0" to a couple
> test_decoding tests, which were failing with concurrent checkpoints.
>
> Unfortunately, I realized we have a similar issue in the "sequences"
> tests too :-( Imagine you do a series of sequence increments, e.g.
>
>   SELECT nextval('s') FROM generate_sequences(1,100);
>
> If there's a concurrent checkpoint, this may add an extra WAL record,
> affecting the decoded output (and also the data stored in the sequence
> relation itself). Not sure what to do about this ...
>

I am also not sure what to do about it, but maybe if we can somehow
increase checkpoint_timeout or other parameters for these tests, it
would reduce the chances of such failures. The other idea could be to
perform a checkpoint before the start of the tests to reduce the
possibility of another checkpoint occurring mid-test.
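
Something along these lines, for instance (a hedged sketch using
standard settings; the values are made up):
----
ALTER SYSTEM SET checkpoint_timeout = '1h'; -- stretch the interval for the run
SELECT pg_reload_conf();
CHECKPOINT; -- force one up front so a concurrent one is less likely mid-test
----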

-- 
With Regards,
Amit Kapila.




Re: logical decoding and replication of sequences

2022-03-11 Thread Amit Kapila
On Fri, Mar 11, 2022 at 5:04 PM Amit Kapila  wrote:
>
> On Tue, Mar 8, 2022 at 11:59 PM Tomas Vondra
>  wrote:
> >
> > On 3/7/22 22:25, Tomas Vondra wrote:
> > >>
> > >> Interesting. I can think of one reason that might cause this - we log
> > >> the first sequence increment after a checkpoint. So if a checkpoint
> > >> happens in an unfortunate place, there'll be an extra WAL record. On
> > >> slow / busy machines that's quite possible, I guess.
> > >>
> > >
> > > I've tweaked the checkpoint_interval to make checkpoints more aggressive
> > > (set it to 1s), and it seems my hunch was correct - it produces failures
> > > exactly like this one. The best fix probably is to just disable decoding
> > > of sequences in those tests that are not aimed at testing sequence 
> > > decoding.
> > >
> >
> > I've pushed a fix for this, adding "include-sequences=0" to a couple
> > test_decoding tests, which were failing with concurrent checkpoints.
> >
> > Unfortunately, I realized we have a similar issue in the "sequences"
> > tests too :-( Imagine you do a series of sequence increments, e.g.
> >
> >   SELECT nextval('s') FROM generate_sequences(1,100);
> >
> > If there's a concurrent checkpoint, this may add an extra WAL record,
> > affecting the decoded output (and also the data stored in the sequence
> > relation itself). Not sure what to do about this ...
> >
>
> I am also not sure what to do about it, but maybe if we can somehow
> increase checkpoint_timeout or other parameters for these tests, it
> would reduce the chances of such failures. The other idea could be to
> perform a checkpoint before the start of the tests to reduce the
> possibility of another checkpoint occurring mid-test.
>

One more thing I noticed while checking the commit for this feature:
the below include seems to be out of order:
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -42,6 +42,7 @@
 #include "replication/reorderbuffer.h"
 #include "replication/snapbuild.h"
 #include "storage/standby.h"
+#include "commands/sequence.h"

-- 
With Regards,
Amit Kapila.




Re: Issue with pg_stat_subscription_stats

2022-03-11 Thread Amit Kapila
On Sat, Mar 12, 2022 at 2:14 AM Melanie Plageman
 wrote:
>
> So, I noticed that pg_stat_reset_subscription_stats() wasn't working
> properly, and, upon further investigation, I'm not sure the view
> pg_stat_subscription_stats is being properly populated.
>

I have tried the below scenario based on this:
Step 1: Create some data that generates conflicts and leads to apply
failures, and then check the view:

postgres=# select * from pg_stat_subscription_stats;
 subid | subname | apply_error_count | sync_error_count | stats_reset
---+-+---+--+-
 16389 | sub1| 4 |0 |
(1 row)

Step-2: Reset the view
postgres=# select * from pg_stat_reset_subscription_stats(16389);
 pg_stat_reset_subscription_stats
--

(1 row)

Step-3: Again, check the view:
postgres=# select * from pg_stat_subscription_stats;
 subid | subname | apply_error_count | sync_error_count |   stats_reset
---+-+---+--+--
 16389 | sub1| 0 |0 | 2022-03-12
08:21:39.156971+05:30
(1 row)

The stats_reset time seems to be populated. Similarly, I have tried
passing NULL to pg_stat_reset_subscription_stats() and it works. I
think I am missing something here; can you please explain the exact
scenario/steps where you observed that this API is not working?


-- 
With Regards,
Amit Kapila.




Re: Column Filtering in Logical Replication

2022-03-11 Thread Amit Kapila
On Fri, Mar 11, 2022 at 6:20 PM Tomas Vondra
 wrote:
>
> On 3/11/22 10:52, Amit Kapila wrote:
> > On Fri, Mar 11, 2022 at 7:26 AM Tomas Vondra
> >  wrote:
> >>
> >> On 3/10/22 20:10, Tomas Vondra wrote:
> >>>
> >>>
> >>> FWIW I think the reason is pretty simple - pgoutput_row_filter_init is
> >>> broken. It assumes you can just do this
> >>>
> >>> rftuple = SearchSysCache2(PUBLICATIONRELMAP,
> >>>   ObjectIdGetDatum(entry->publish_as_relid),
> >>>   ObjectIdGetDatum(pub->oid));
> >>>
> >>> if (HeapTupleIsValid(rftuple))
> >>> {
> >>> /* Null indicates no filter. */
> >>> rfdatum = SysCacheGetAttr(PUBLICATIONRELMAP, rftuple,
> >>>   Anum_pg_publication_rel_prqual,
> >>>   &pub_no_filter);
> >>> }
> >>> else
> >>> {
> >>> pub_no_filter = true;
> >>> }
> >>>
> >>>
> >>> and pub_no_filter=true means there's no filter at all. Which is
> >>> nonsense, because we're using publish_as_relid here - the publication
> >>> may not include this particular ancestor, in which case we need to just
> >>> ignore this publication.
> >>>
> >>> So yeah, this needs to be reworked.
> >>>
> >>
> >> I spent a bit of time looking at this, and I think a minor change in
> >> get_rel_sync_entry() fixes this - it's enough to ensure rel_publications
> >> only includes publications that actually include publish_as_relid.
> >>
> >
> > Thanks for looking into this. I think in the first patch before
> > calling get_partition_ancestors() we need to ensure it is a partition
> > (the call expects that) and pubviaroot is true.
>
> Does the call really require that?
>

There may not be any harm, but I mentioned it because (a) the comment
atop get_partition_ancestors() (... it should only be called when it
is known that the relation is a partition.) indicates the same; (b)
all existing callers seem to use it only for partitions.

> Also, I'm not sure why we'd need to
> look at pubviaroot - that's already considered earlier when calculating
> publish_as_relid, here we just need to know the relationship of the two
> OIDs (if one is ancestor/child of the other).
>

I thought of avoiding the call to get_partition_ancestors() when
pubviaroot is not set. It will unnecessarily check the whole hierarchy
of partitions even when it is not required. I agree that this is not a
common code path, but still, why do it needlessly?

> > I think it would be
> > good if we can avoid an additional call to get_partition_ancestors()
> > as it could be costly.
>
> Maybe. OTOH we only should do this only very rarely anyway.
>
> > I wonder why it is not sufficient to ensure
> > that publish_as_relid appears after the ancestor in the ancestors list before
> > assigning the ancestor to publish_as_relid? This only needs to be done
> > in case of (if (!publish)). I have not tried this so I could be wrong.
> >
>
> I'm not sure what exactly are you proposing. Maybe try coding it? That's
> probably faster than trying to describe what the code might do ...
>

Okay, please find attached. I have done basic testing of this, if we
agree with this approach then this will require some more testing.

-- 
With Regards,
Amit Kapila.


v1-0001-fixup-publish_as_relid.patch
Description: Binary data


v1-0002-fixup-row-filter-publications.patch
Description: Binary data


Re: Column Filtering in Logical Replication

2022-03-14 Thread Amit Kapila
On Mon, Mar 14, 2022 at 2:37 AM Tomas Vondra
 wrote:
>
> On 3/12/22 05:30, Amit Kapila wrote:
> >> ...
> >
> > Okay, please find attached. I have done basic testing of this, if we
> > agree with this approach then this will require some more testing.
> >
>
> Thanks, the proposed changes seem like a clear improvement, so I've
> added them, with some minor tweaks (mostly to comments).
>

One minor point: did you intentionally remove
list_free(rel_publications) before resetting the list in the second
patch? The memory for rel_publications is allocated in
TopTransactionContext, so a large transaction touching many relations
will only free this at the end of the transaction, which may not be a
big deal as we don't do this every time. We free this list a few lines
down in the successful case, so this appears slightly odd to me, but I
am fine if you think it doesn't matter.


-- 
With Regards,
Amit Kapila.




Re: Optionally automatically disable logical replication subscriptions on error

2022-03-14 Thread Amit Kapila
On Thu, Mar 10, 2022 at 12:04 PM Amit Kapila  wrote:
>
> On Wed, Mar 9, 2022 at 7:57 PM osumi.takami...@fujitsu.com
>  wrote:
> >
> > Hi, attached v32 removed my additional code for maybe_reread_subscription.
> >
>
> Thanks, the patch looks good to me. I have made minor edits in the
> attached. I am planning to commit this early next week (Monday) unless
> there are any other major comments.
>

Pushed.

-- 
With Regards,
Amit Kapila.




Re: Column Filtering in Logical Replication

2022-03-14 Thread Amit Kapila
On Mon, Mar 14, 2022 at 5:42 PM Tomas Vondra
 wrote:
>
> On 3/14/22 12:12, houzj.f...@fujitsu.com wrote:
> > On Monday, March 14, 2022 5:08 AM Tomas Vondra 
> >  wrote:
>
> Anyway, the fix does not address tablesync, as explained in [1]. I'm not
> sure what to do about it - in principle, we could calculate which
> relations to sync, and then eliminate "duplicates" (i.e. relations where
> we are going to sync an ancestor).
>

As mentioned in my previous email [1], this appears to be a base code
issue (even without row filter or column filter work), so it seems
better to deal with it separately. It has been reported separately as
well [2] where we found some similar issues.


[1] - 
https://www.postgresql.org/message-id/CAA4eK1LSb-xrvGEm3ShaRA%3DMkdii2d%2B4vqh9DGPvVDA%2BD9ibYw%40mail.gmail.com
[2] - 
https://www.postgresql.org/message-id/os0pr01mb5716dc2982cc735fde38880494...@os0pr01mb5716.jpnprd01.prod.outlook.com

-- 
With Regards,
Amit Kapila.




Re: Issue with pg_stat_subscription_stats

2022-03-14 Thread Amit Kapila
On Sun, Mar 13, 2022 at 1:45 AM Andres Freund  wrote:
>
> On 2022-03-12 08:28:35 +0530, Amit Kapila wrote:
> > On Sat, Mar 12, 2022 at 2:14 AM Melanie Plageman
> >  wrote:
> > >
> > > So, I noticed that pg_stat_reset_subscription_stats() wasn't working
> > > properly, and, upon further investigation, I'm not sure the view
> > > pg_stat_subscription_stats is being properly populated.
> > >
> >
> > I have tried the below scenario based on this:
> > Step 1: Create some data that generates conflicts and leads to apply
> > failures, and then check the view:
>
> I think the problem is present when there was *no* conflict
> previously. Because nothing populates the stats entry without an error, the
> reset doesn't have anything to set the stats_reset field in, which then means
> that the stats_reset field is NULL even though stats have been reset.
>
> I'll just repeat what I've said before: Making variable numbered stats
> individiually resettable is a bad idea.
>

IIUC, we are doing this via
pg_stat_reset_single_table_counters(),
pg_stat_reset_single_function_counters(),
pg_stat_reset_replication_slot(), and pg_stat_reset_subscription_stats().
So, if we want to do something in this regard, then it is probably
better to do it for all of them.
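
For reference, all of these are invoked per object today (a hedged
sketch; the table and slot names are made up, and the subid reuses the
one from the example upthread):
----
SELECT pg_stat_reset_single_table_counters('mytab'::regclass);
SELECT pg_stat_reset_replication_slot('myslot');
SELECT pg_stat_reset_subscription_stats(16389); -- or NULL to reset all
----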

-- 
With Regards,
Amit Kapila.




Re: Column Filtering in Logical Replication

2022-03-14 Thread Amit Kapila
On Mon, Mar 14, 2022 at 4:42 PM houzj.f...@fujitsu.com
 wrote:
>
> On Monday, March 14, 2022 5:08 AM Tomas Vondra 
>  wrote:
> >
> > On 3/12/22 05:30, Amit Kapila wrote:
> > >> ...
> > >
> > > Okay, please find attached. I have done basic testing of this, if we
> > > agree with this approach then this will require some more testing.
> > >
> >
> > Thanks, the proposed changes seem like a clear improvement, so I've
> > added them, with some minor tweaks (mostly to comments).
>
> Hi,
>
> Thanks for updating the patches !
> And sorry for the row filter bug caused by my mistake.
>
> I looked at the two fixup patches. I am thinking would it be better if we
> add one testcase for these two bugs? Maybe like the attachment.
>

Your tests look good to me. We might want to add some comments for
each test, but I guess that can be done before committing. Tomas, it
seems you are planning to push these bug fixes; do let me know if you
want me to take care of them while you focus on the main patch. I
think the first patch needs to be backpatched till 13 and the second
one is for just HEAD.

-- 
With Regards,
Amit Kapila.




Re: Column Filtering in Logical Replication

2022-03-14 Thread Amit Kapila
On Mon, Mar 14, 2022 at 7:02 PM Tomas Vondra
 wrote:
>
> On 3/14/22 13:47, Amit Kapila wrote:
> > On Mon, Mar 14, 2022 at 5:42 PM Tomas Vondra
> >  wrote:
> >>
> >> On 3/14/22 12:12, houzj.f...@fujitsu.com wrote:
> >>> On Monday, March 14, 2022 5:08 AM Tomas Vondra 
> >>>  wrote:
> >>
> >> Anyway, the fix does not address tablesync, as explained in [1]. I'm not
> >> sure what to do about it - in principle, we could calculate which
> >> relations to sync, and then eliminate "duplicates" (i.e. relations where
> >> we are going to sync an ancestor).
> >>
> >
> > As mentioned in my previous email [1], this appears to be a base code
> > issue (even without row filter or column filter work), so it seems
> > better to deal with it separately. It has been reported separately as
> > well [2] where we found some similar issues.
> >
>
> Right. I don't want to be waiting for that fix either, that'd block this
> patch unnecessarily. If there are no other comments, I'll go ahead,
> polish the existing patches a bit more and get them committed. We can
> worry about this pre-existing issue later.
>

I think the first two patches are ready to go. I haven't read the
latest version in detail, but I am mindful that we want to get this in
for PG-15.

-- 
With Regards,
Amit Kapila.




Re: Skipping logical replication transactions on subscriber side

2022-03-15 Thread Amit Kapila
On Tue, Mar 15, 2022 at 11:43 AM Masahiko Sawada  wrote:
>
> I've attached an updated version patch.
>

Review:
===
1.
+++ b/doc/src/sgml/logical-replication.sgml
@@ -366,15 +366,19 @@ CONTEXT:  processing remote data for replication
origin "pg_16395" during "INSER
transaction, the subscription needs to be disabled temporarily by
ALTER SUBSCRIPTION ... DISABLE first or
alternatively, the
subscription can be used with the
disable_on_error option.
-   Then, the transaction can be skipped by calling the
+   Then, the transaction can be skipped by using
+   ALTER SUBSCRIPTION ... SKIP with the finish LSN
+   (i.e., LSN 0/14C0378). After that the replication
+   can be resumed by ALTER SUBSCRIPTION ... ENABLE.
+   Alternatively, the transaction can also be skipped by calling the

Do we really need to disable the subscription for the skip feature? I
think that is required for origin_advance. Also, we can probably say
that the finish LSN could be the prepare LSN, commit LSN, etc. (See
the sketch at the end of these comments for the documented workflow.)

2.
+ /*
+ * Quick return if it's not requested to skip this transaction. This
+ * function is called every start of applying changes and we assume that
+ * skipping the transaction is not used in many cases.
+ */
+ if (likely(XLogRecPtrIsInvalid(MySubscription->skiplsn) ||

The second part of this comment (especially ".. every start of
applying changes ..") sounds slightly odd to me. How about changing it
to: "This function is called for every remote transaction and we
assume that skipping the transaction is not used in many cases."

3.
+
+ ereport(LOG,
+ errmsg("start skipping logical replication transaction which
finished at %X/%X",
...
+ ereport(LOG,
+ (errmsg("done skipping logical replication transaction which
finished at %X/%X",

No need for 'which' in the above LOG messages. I think they will be
clear without it.

4.
+ ereport(LOG,
+ (errmsg("done skipping logical replication transaction which
finished at %X/%X",
+ LSN_FORMAT_ARGS(skip_xact_finish_lsn;
+
+ /* Stop skipping changes */
+ skip_xact_finish_lsn = InvalidXLogRecPtr;

Let's reverse the order of these statements to make them consistent
with the corresponding maybe_start_* function.

5.
+
+ if (myskiplsn != finish_lsn)
+ ereport(WARNING,
+ errmsg("skip-LSN of logical replication subscription \"%s\"
cleared", MySubscription->name),

Shouldn't this be a LOG instead of a WARNING, as this will be
displayed only in the server log by the background apply worker?

6.
@@ -1583,7 +1649,8 @@ apply_handle_insert(StringInfo s)
  TupleTableSlot *remoteslot;
  MemoryContext oldctx;

- if (handle_streamed_transaction(LOGICAL_REP_MSG_INSERT, s))
+ if (is_skipping_changes() ||

Is there a reason to keep the skip_changes check here and in other DML
operations instead of at one central place in apply_dispatch?

7.
+ /*
+ * Start a new transaction to clear the subskipxid, if not started
+ * yet. The transaction is committed below.
+ */
+ if (!IsTransactionState())

I think the second part of the comment: "The transaction is committed
below." is not required.

8.
+ XLogRecPtr subskiplsn; /* All changes which finished at this LSN are
+ * skipped */
+
 #ifdef CATALOG_VARLEN /* variable-length fields start here */
  /* Connection string to the publisher */
  text subconninfo BKI_FORCE_NOT_NULL;
@@ -109,6 +112,8 @@ typedef struct Subscription
  bool disableonerr; /* Indicates if the subscription should be
  * automatically disabled if a worker error
  * occurs */
+ XLogRecPtr skiplsn; /* All changes which finished at this LSN are
+ * skipped */

No need for 'which' in the above comments.

9.
Can we merge 029_disable_on_error into 030_skip_xact and name it
029_on_error (or 029_on_error_skip_disable or some variant of it)?
Both seem to be related features. I am slightly worried at the pace at
which the number of test files is growing in the subscription tests.
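
For context, the user-facing workflow being documented in point 1
looks roughly like this (hedged; the finish LSN is the one quoted from
the docs excerpt, and would normally come from the error context in
the server log):
----
ALTER SUBSCRIPTION mysub DISABLE;
ALTER SUBSCRIPTION mysub SKIP (lsn = '0/14C0378');
ALTER SUBSCRIPTION mysub ENABLE;
----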

-- 
With Regards,
Amit Kapila.




Re: Skipping logical replication transactions on subscriber side

2022-03-15 Thread Amit Kapila
On Wed, Mar 16, 2022 at 6:03 AM Masahiko Sawada  wrote:
>
> On Tue, Mar 15, 2022 at 7:18 PM Amit Kapila  wrote:
> >
> > On Tue, Mar 15, 2022 at 11:43 AM Masahiko Sawada  
> > wrote:
> > >
> >
> > 6.
> > @@ -1583,7 +1649,8 @@ apply_handle_insert(StringInfo s)
> >   TupleTableSlot *remoteslot;
> >   MemoryContext oldctx;
> >
> > - if (handle_streamed_transaction(LOGICAL_REP_MSG_INSERT, s))
> > + if (is_skipping_changes() ||
> >
> > Is there a reason to keep the skip_changes check here and in other DML
> > operations instead of at one central place in apply_dispatch?
>
> Since we already have the check of applying the change on the spot at
> the beginning of the handlers I feel it's better to add
> is_skipping_changes() to the check than add a new if statement to
> apply_dispatch, but do you prefer to check it in one central place in
> apply_dispatch?
>

I think either way is fine. I just wanted to know the reason, your
current change looks okay to me.

Some questions/comments
==
1. IIRC, earlier we thought of allowing the use of this option (SKIP)
only for superusers (as it can lead to inconsistent data if not used
carefully), but I don't see that check in the latest patch. What is
the reason for that?

2.
+ /*
+ * Update the subskiplsn of the tuple to InvalidXLogRecPtr.

I think we can change the above part of the comment to "Clear subskiplsn."

3.
+ * Since we already have

Isn't it better to say here: Since we have already ...?

-- 
With Regards,
Amit Kapila.




Re: Skipping logical replication transactions on subscriber side

2022-03-15 Thread Amit Kapila
On Tue, Mar 15, 2022 at 7:30 PM osumi.takami...@fujitsu.com
 wrote:
>
> On Tuesday, March 15, 2022 3:13 PM Masahiko Sawada  
> wrote:
> > I've attached an updated version patch.
>
> A couple of minor comments on v14.
>
> (1) apply_handle_commit_internal
>
>
> +   if (is_skipping_changes())
> +   {
> +   stop_skipping_changes();
> +
> +   /*
> +* Start a new transaction to clear the subskipxid, if not 
> started
> +* yet. The transaction is committed below.
> +*/
> +   if (!IsTransactionState())
> +   StartTransactionCommand();
> +   }
> +
>
> I suppose we can move this condition check and stop_skipping_changes() call
> to the inside of the block we enter when IsTransactionState() returns true.
>
> As the comment of apply_handle_commit_internal() mentions,
> it's the helper function for apply_handle_commit() and
> apply_handle_stream_commit().
>
> Then, I couldn't think of a case where both callers don't open
> a transaction before the call of apply_handle_commit_internal().
> For applying spooled messages, we call begin_replication_step as well.
>
> I may be missing something, but the time when we receive a COMMIT message
> without an open transaction would be the case of empty transactions
> in which the subscription (and its subscription worker) is not interested.
>

I think when we skip non-streamed transactions, we don't start a
transaction. So, if we do what you are suggesting, we will fail to
clear the skip_lsn after skipping the transaction.

-- 
With Regards,
Amit Kapila.




Re: Skipping logical replication transactions on subscriber side

2022-03-15 Thread Amit Kapila
On Wed, Mar 16, 2022 at 7:58 AM Amit Kapila  wrote:
>
> On Wed, Mar 16, 2022 at 6:03 AM Masahiko Sawada  wrote:
> >
> > On Tue, Mar 15, 2022 at 7:18 PM Amit Kapila  wrote:
> > >
> > > On Tue, Mar 15, 2022 at 11:43 AM Masahiko Sawada  
> > > wrote:
> > > >
> > >
> > > 6.
> > > @@ -1583,7 +1649,8 @@ apply_handle_insert(StringInfo s)
> > >   TupleTableSlot *remoteslot;
> > >   MemoryContext oldctx;
> > >
> > > - if (handle_streamed_transaction(LOGICAL_REP_MSG_INSERT, s))
> > > + if (is_skipping_changes() ||
> > >
> > > Is there a reason to keep the skip_changes check here and in other DML
> > > operations instead of at one central place in apply_dispatch?
> >
> > Since we already have the check of applying the change on the spot at
> > the beginning of the handlers I feel it's better to add
> > is_skipping_changes() to the check than add a new if statement to
> > apply_dispatch, but do you prefer to check it in one central place in
> > apply_dispatch?
> >
>
> I think either way is fine. I just wanted to know the reason, your
> current change looks okay to me.
>

I feel it is better to at least add a comment suggesting that we skip
only data modification changes here, because the other part of the
check (handle_streamed_transaction) is present in the other message
handlers as well. It will make it easier to add a similar check in
future message handlers.

-- 
With Regards,
Amit Kapila.




Re: Skipping logical replication transactions on subscriber side

2022-03-15 Thread Amit Kapila
On Wed, Mar 16, 2022 at 7:58 AM Amit Kapila  wrote:
>
> On Wed, Mar 16, 2022 at 6:03 AM Masahiko Sawada  wrote:
> >
> > On Tue, Mar 15, 2022 at 7:18 PM Amit Kapila  wrote:
> > >
> > > On Tue, Mar 15, 2022 at 11:43 AM Masahiko Sawada  
> > > wrote:
> > > >
> > >
> > > 6.
> > > @@ -1583,7 +1649,8 @@ apply_handle_insert(StringInfo s)
> > >   TupleTableSlot *remoteslot;
> > >   MemoryContext oldctx;
> > >
> > > - if (handle_streamed_transaction(LOGICAL_REP_MSG_INSERT, s))
> > > + if (is_skipping_changes() ||
> > >
> > > Is there a reason to keep the skip_changes check here and in other DML
> > > operations instead of at one central place in apply_dispatch?
> >
> > Since we already have the check of applying the change on the spot at
> > the beginning of the handlers I feel it's better to add
> > is_skipping_changes() to the check than add a new if statement to
> > apply_dispatch, but do you prefer to check it in one central place in
> > apply_dispatch?
> >
>
> I think either way is fine. I just wanted to know the reason, your
> current change looks okay to me.
>
> Some questions/comments
> ==
>

Some cosmetic suggestions:
==
1.
+# Create subscriptions. Both subscription sets disable_on_error to on
+# so that they get disabled when a conflict occurs.
+$node_subscriber->safe_psql(
+ 'postgres',
+ qq[
+CREATE SUBSCRIPTION $subname CONNECTION '$publisher_connstr'
PUBLICATION tap_pub WITH (streaming = on, two_phase = on,
disable_on_error = on);
+]);

I don't understand what you mean by 'Both subscription ...' in the
above comments.

2.
+ # Check the log indicating that successfully skipped the transaction,

How about slightly rephrasing this to: "Check the log to ensure that
the transaction is skipped"?

-- 
With Regards,
Amit Kapila.




Re: Tablesync early exit

2022-03-15 Thread Amit Kapila
On Mon, Aug 30, 2021 at 8:50 AM Peter Smith  wrote:
>
> Patch v2 is the same; it only needed re-basing to the latest HEAD.
>

Why do you think it is correct to exit before trying to receive any
message? How will we know whether the apply worker has processed any
message? At the beginning of the function LogicalRepApplyLoop(),
last_received is the LSN where the copy has finished in the case of a
tablesync worker. I think we need to receive messages before we can
tell whether we have synced with the apply worker.

-- 
With Regards,
Amit Kapila.




Re: Tablesync early exit

2022-03-15 Thread Amit Kapila
On Wed, Mar 16, 2022 at 1:16 AM Greg Stark  wrote:
>
> This patch has been through five CFs without any review. It's a very
> short patch and I'm assuming the only issue is that it's not clear
> whether it's a good idea or not and there are few developers familiar
> with this area of code?
>
> Amit, it looks like you were the one who asked for it to be split off
> from the logical decoding of 2PC patch in [1]. Can you summarize what
> questions remain here? Should we just commit this or is there any
> issue that needs to be debated?
>

Looking closely at this, I am not sure whether this is a good idea or
not. Responded accordingly.

-- 
With Regards,
Amit Kapila.



