Re: Proposal: Conflict log history table for Logical Replication

Dilip Kumar Mon, 01 Dec 2025 00:34:47 -0800

On Mon, Dec 1, 2025 at 1:57 PM shveta malik <[email protected]> wrote:
>
> On Thu, Nov 13, 2025 at 9:17 PM Dilip Kumar <[email protected]> wrote:
> >
> > On Thu, Nov 13, 2025 at 2:39 PM shveta malik <[email protected]> wrote:
> > >
> > > > Few observations related to publication.
> > > > ------------------------------
> >
> > Thanks Shveta, for testing and sharing your thoughts.  IMHO for
> > conflict log tables it should be good enough if we restrict it when
> > ALL TABLE options are used, I don't think we need to put extra effort
> > to completely restrict it even if users want to explicitly list it
> > into the publication.
> >
> > > >
> > > > (In the below comments, clt/CLT implies Conflict Log Table)
> > > >
> > > > 1)
> > > > 'select pg_relation_is_publishable(clt)' returns true for conflict-log 
> > > > table.
> >
> > This function is used while publishing every single change and I don't
> > think we want to add a cost to check each subscription to identify
> > whether the table is listed as CLT.
> >
> > > > 2)
> > > > '\d+ clt'   shows all-tables publication name. I feel we should not
> > > > show that for clt.
> >
> > I think we should fix this.
> >
> > > > 3)
> > > > I am able to create a publication for clt table, should it be allowed?
> >
> > I believe we should not do any specific handling to restrict this but
> > I am open for the opinions.
> >
> > > > create subscription sub1 connection '...' publication pub1
> > > > WITH(conflict_log_table='clt');
> > > > create publication pub3 for table clt;
> > > >
> > > > 4)
> > > > Is there a reason we have not made '!IsConflictHistoryRelid' check as
> > > > part of is_publishable_class() itself? If we do so, other code-logics
> > > > will also get clt as non-publishable always (and will solve a few of
> > > > the above issues I think). IIUC, there is no place where we want to
> > > > mark CLT as publishable or is there any?
> >
> > IMHO the main reason is performance.
> >
> > > > 5) Also, I feel we can add some documentation now to help others to
> > > > understand/review the patch better without going through the long
> > > > thread.
> >
> > Make sense, I will do that in the next version.
> >
> > > >
> > > > Few observations related to conflict-logging:
> > > > ------------------------------
> > > > 1)
> > > > I found that for the conflicts which ultimately result in Error, we do
> > > > not insert any conflict-record in clt.
> > > >
> > > > a)
> > > > Example: insert_exists, update_Exists
> > > > create table tab1 (i int primary key, j int);
> > > > sub: insert into tab1 values(30,10);
> > > > pub: insert into tab1 values(30,10);
> > > > ERROR:  conflict detected on relation "public.tab1": 
> > > > conflict=insert_exists
> > > > No record in clt.
> > > >
> > > > sub:
> > > > <some pre-data needed>
> > > > update tab1 set i=40 where i = 30;
> > > > pub: update tab1 set i=40 where i = 20;
> > > > ERROR:  conflict detected on relation "public.tab1": 
> > > > conflict=update_exists
> > > > No record in clt.
> >
> > Yeah that interesting need to put thought on how to commit this record
> > when an outer transaction is aborted as we do not have autonomous
> > transactions which are generally used for this kind of logging.  But
> > we can explore more options like inserting into conflict log tables
> > outside the outer transaction.
> >
> > > > b)
> > > > Another question related to this is, since these conflicts (which
> > > > results in error) keep on happening until user resolves these or skips
> > > > these or 'disable_on_error' is set. Then are we going to insert these
> > > > multiple times? We do count these in 'confl_insert_exists' and
> > > > 'confl_update_exists' everytime, so it makes sense to log those each
> > > > time in clt as well. Thoughts?
> >
> > I think it make sense to insert every time we see the conflict, but it
> > would be good to have opinion from others as well.
>
> Since there is a concern that multiple rows for
> multiple_unique_conflicts can cause data-bloat, it made me rethink
> that this is actually more prone to causing data-bloat if it is not
> resolved on time, as it seems a far more frequent scenario. So shall
> we keep inserting the record or insert it once and avoid inserting it
> again based on lsn?  Thoughts?


I agree, this is the real problem related to bloat so maybe we can see
if the same tuple exists we can avoid inserting it again, although I
haven't put thought on how to we distinguish between the new conflict
on the same row vs the same conflict being inserted multiple times due
to worker restart.

-- 
Regards,
Dilip Kumar
Google

Re: Proposal: Conflict log history table for Logical Replication

Reply via email to