Re: Proposal: Conflict log history table for Logical Replication

Amit Kapila Fri, 28 Nov 2025 04:20:45 -0800

On Tue, Nov 18, 2025 at 3:40 PM shveta malik <[email protected]> wrote:
>
> On Thu, Nov 13, 2025 at 9:17 PM Dilip Kumar <[email protected]> wrote:
> >
> > On Thu, Nov 13, 2025 at 2:39 PM shveta malik <[email protected]> wrote:
> > >
> > > 3)
> > > We also need to think how we are going to display the info in case of
> > > multiple_unique_conflicts as there could be multiple local and remote
> > > tuples conflicting for one single operation. Example:
> > >
> > > create table conf_tab (a int primary key, b int unique, c int unique);
> > >
> > > sub: insert into conf_tab values (2,2,2), (3,3,3), (4,4,4);
> > >
> > > pub: insert into conf_tab values (2,3,4);
> > >
> > > ERROR:  conflict detected on relation "public.conf_tab":
> > > conflict=multiple_unique_conflicts
> > > DETAIL:  Key already exists in unique index "conf_tab_pkey", modified
> > > locally in transaction 874 at 2025-11-12 14:35:13.452143+05:30.
> > > Key (a)=(2); existing local row (2, 2, 2); remote row (2, 3, 4).
> > > Key already exists in unique index "conf_tab_b_key", modified locally
> > > in transaction 874 at 2025-11-12 14:35:13.452143+05:30.
> > > Key (b)=(3); existing local row (3, 3, 3); remote row (2, 3, 4).
> > > Key already exists in unique index "conf_tab_c_key", modified locally
> > > in transaction 874 at 2025-11-12 14:35:13.452143+05:30.
> > > Key (c)=(4); existing local row (4, 4, 4); remote row (2, 3, 4).
> > > CONTEXT:  processing remote data for replication origin "pg_16392"
> > > during message type "INSERT" for replication target relation
> > > "public.conf_tab" in transaction 781, finished at 0/017FDDA0
> > >
> > > Currently in clt, we have singular terms such as 'key_tuple',
> > > 'local_tuple', 'remote_tuple'.  Shall we have multiple rows inserted?
> > > But it does not look reasonable to have multiple rows inserted for a
> > > single conflict raised. I will think more about this.
> >
> > Currently I am inserting multiple records in the conflict history
> > table, the same as each tuple is logged, but couldn't find any better
> > way for this.
> >


The biggest drawback of this approach is data bloat. The incoming data
row will be stored multiple times.

> > Another option is to use an array of tuples instead of a
> > single tuple but not sure this might make things more complicated to
> > process by any external tool.
>
> It’s arguable and hard to say what the correct behaviour should be.
> I’m slightly leaning toward having a single row per conflict.
>

Yeah, it is better to either have a single row per conflict or have
two tables conflict_history and conflict_history_details to avoid data
bloat as pointed above. For example, two-table approach could be:

1. The Header Table (Incoming Data)
This stores the data that tried to be applied.
SQL
CREATE TABLE conflict_header (
    conflict_id     SERIAL PRIMARY KEY,
    source_tx_id    VARCHAR(100),    -- Transaction ID from source
    table_name      VARCHAR(100),
    operation       CHAR(1),         -- 'I' for Insert
    incoming_data   JSONB,           -- Store the incoming row as JSON
...
);

2. The Detail Table (Existing Conflicting Data)
This stores the actual rows currently in the database that caused the
violations.
CREATE TABLE conflict_details (
    detail_id       SERIAL PRIMARY KEY,
    conflict_id     INT REFERENCES conflict_header(conflict_id),
    constraint_name/key_tuple VARCHAR(100),
    conflicting_row_data JSONB       -- The existing row in the DB
that blocked the insert
);

Please don't consider these exact columns; you can use something on
the lines of what is proposed in the patch. This is just to show how
the conflict data can be rearranged. Now, one argument against this is
that users need to use JOIN to query data but still better than
bloating the table. The idea to store in a single table could be
changed to have columns like violated_constraints TEXT[],      --
e.g., ['uk_email', 'uk_phone'], error_details   JSONB  -- e.g.,
[{"const": "uk_email", "val": "[email protected]"}, ...]. If we want to store
multiple conflicting tuples in a single column, we need to ensure it
is queryable via a JSONB column. The point in favour of a single JSONB
column to combine multiple conflicting tuples is that we need this
combination only for one kind of conflict.

Both the approaches have their pros and cons. I feel we should dig a
bit deeper for both by laying out details for each method and see what
others think.

-- 
With Regards,
Amit Kapila.

Re: Proposal: Conflict log history table for Logical Replication

Reply via email to