Re: Proposal: Conflict log history table for Logical Replication

shveta malik Mon, 01 Dec 2025 01:28:20 -0800

On Mon, Dec 1, 2025 at 2:04 PM Dilip Kumar <[email protected]> wrote:
>
> On Mon, Dec 1, 2025 at 1:57 PM shveta malik <[email protected]> wrote:
> >
> > On Thu, Nov 13, 2025 at 9:17 PM Dilip Kumar <[email protected]> wrote:
> > >
> > > On Thu, Nov 13, 2025 at 2:39 PM shveta malik <[email protected]> 
> > > wrote:
> > > >
> > > > > Few observations related to publication.
> > > > > ------------------------------
> > >
> > > Thanks Shveta, for testing and sharing your thoughts.  IMHO for
> > > conflict log tables it should be good enough if we restrict it when
> > > ALL TABLE options are used, I don't think we need to put extra effort
> > > to completely restrict it even if users want to explicitly list it
> > > into the publication.
> > >
> > > > >
> > > > > (In the below comments, clt/CLT implies Conflict Log Table)
> > > > >
> > > > > 1)
> > > > > 'select pg_relation_is_publishable(clt)' returns true for 
> > > > > conflict-log table.
> > >
> > > This function is used while publishing every single change and I don't
> > > think we want to add a cost to check each subscription to identify
> > > whether the table is listed as CLT.
> > >
> > > > > 2)
> > > > > '\d+ clt'   shows all-tables publication name. I feel we should not
> > > > > show that for clt.
> > >
> > > I think we should fix this.
> > >
> > > > > 3)
> > > > > I am able to create a publication for clt table, should it be allowed?
> > >
> > > I believe we should not do any specific handling to restrict this but
> > > I am open for the opinions.
> > >
> > > > > create subscription sub1 connection '...' publication pub1
> > > > > WITH(conflict_log_table='clt');
> > > > > create publication pub3 for table clt;
> > > > >
> > > > > 4)
> > > > > Is there a reason we have not made '!IsConflictHistoryRelid' check as
> > > > > part of is_publishable_class() itself? If we do so, other code-logics
> > > > > will also get clt as non-publishable always (and will solve a few of
> > > > > the above issues I think). IIUC, there is no place where we want to
> > > > > mark CLT as publishable or is there any?
> > >
> > > IMHO the main reason is performance.
> > >
> > > > > 5) Also, I feel we can add some documentation now to help others to
> > > > > understand/review the patch better without going through the long
> > > > > thread.
> > >
> > > Make sense, I will do that in the next version.
> > >
> > > > >
> > > > > Few observations related to conflict-logging:
> > > > > ------------------------------
> > > > > 1)
> > > > > I found that for the conflicts which ultimately result in Error, we do
> > > > > not insert any conflict-record in clt.
> > > > >
> > > > > a)
> > > > > Example: insert_exists, update_Exists
> > > > > create table tab1 (i int primary key, j int);
> > > > > sub: insert into tab1 values(30,10);
> > > > > pub: insert into tab1 values(30,10);
> > > > > ERROR:  conflict detected on relation "public.tab1": 
> > > > > conflict=insert_exists
> > > > > No record in clt.
> > > > >
> > > > > sub:
> > > > > <some pre-data needed>
> > > > > update tab1 set i=40 where i = 30;
> > > > > pub: update tab1 set i=40 where i = 20;
> > > > > ERROR:  conflict detected on relation "public.tab1": 
> > > > > conflict=update_exists
> > > > > No record in clt.
> > >
> > > Yeah that interesting need to put thought on how to commit this record
> > > when an outer transaction is aborted as we do not have autonomous
> > > transactions which are generally used for this kind of logging.  But
> > > we can explore more options like inserting into conflict log tables
> > > outside the outer transaction.
> > >
> > > > > b)
> > > > > Another question related to this is, since these conflicts (which
> > > > > results in error) keep on happening until user resolves these or skips
> > > > > these or 'disable_on_error' is set. Then are we going to insert these
> > > > > multiple times? We do count these in 'confl_insert_exists' and
> > > > > 'confl_update_exists' everytime, so it makes sense to log those each
> > > > > time in clt as well. Thoughts?
> > >
> > > I think it make sense to insert every time we see the conflict, but it
> > > would be good to have opinion from others as well.
> >
> > Since there is a concern that multiple rows for
> > multiple_unique_conflicts can cause data-bloat, it made me rethink
> > that this is actually more prone to causing data-bloat if it is not
> > resolved on time, as it seems a far more frequent scenario. So shall
> > we keep inserting the record or insert it once and avoid inserting it
> > again based on lsn?  Thoughts?
>
> I agree, this is the real problem related to bloat so maybe we can see
> if the same tuple exists we can avoid inserting it again, although I
> haven't put thought on how to we distinguish between the new conflict
> on the same row vs the same conflict being inserted multiple times due
> to worker restart.
>


If there is consensus on this approach, IMO, it appears safe to rely
on 'remote_origin' and 'remote_commit_lsn' as the comparison keys for
the given 'conflict_type' before we insert a new record.

thanks
Shveta

Re: Proposal: Conflict log history table for Logical Replication

Reply via email to