On Fri, Nov 28, 2025 at 5:50 PM Amit Kapila <[email protected]> wrote:
>
> On Tue, Nov 18, 2025 at 3:40 PM shveta malik <[email protected]> wrote:
> >
> > On Thu, Nov 13, 2025 at 9:17 PM Dilip Kumar <[email protected]> wrote:
> > >
> > > On Thu, Nov 13, 2025 at 2:39 PM shveta malik <[email protected]> 
> > > wrote:
> > > >
> > > > 3)
> > > > We also need to think how we are going to display the info in case of
> > > > multiple_unique_conflicts as there could be multiple local and remote
> > > > tuples conflicting for one single operation. Example:
> > > >
> > > > create table conf_tab (a int primary key, b int unique, c int unique);
> > > >
> > > > sub: insert into conf_tab values (2,2,2), (3,3,3), (4,4,4);
> > > >
> > > > pub: insert into conf_tab values (2,3,4);
> > > >
> > > > ERROR:  conflict detected on relation "public.conf_tab":
> > > > conflict=multiple_unique_conflicts
> > > > DETAIL:  Key already exists in unique index "conf_tab_pkey", modified
> > > > locally in transaction 874 at 2025-11-12 14:35:13.452143+05:30.
> > > > Key (a)=(2); existing local row (2, 2, 2); remote row (2, 3, 4).
> > > > Key already exists in unique index "conf_tab_b_key", modified locally
> > > > in transaction 874 at 2025-11-12 14:35:13.452143+05:30.
> > > > Key (b)=(3); existing local row (3, 3, 3); remote row (2, 3, 4).
> > > > Key already exists in unique index "conf_tab_c_key", modified locally
> > > > in transaction 874 at 2025-11-12 14:35:13.452143+05:30.
> > > > Key (c)=(4); existing local row (4, 4, 4); remote row (2, 3, 4).
> > > > CONTEXT:  processing remote data for replication origin "pg_16392"
> > > > during message type "INSERT" for replication target relation
> > > > "public.conf_tab" in transaction 781, finished at 0/017FDDA0
> > > >
> > > > Currently in clt, we have singular terms such as 'key_tuple',
> > > > 'local_tuple', 'remote_tuple'.  Shall we have multiple rows inserted?
> > > > But it does not look reasonable to have multiple rows inserted for a
> > > > single conflict raised. I will think more about this.
> > >
> > > Currently I am inserting multiple records in the conflict history
> > > table, the same as each tuple is logged, but couldn't find any better
> > > way for this.
> > >
>
> The biggest drawback of this approach is data bloat. The incoming data
> row will be stored multiple times.
>
> > > Another option is to use an array of tuples instead of a
> > > single tuple but not sure this might make things more complicated to
> > > process by any external tool.
> >
> > It’s arguable and hard to say what the correct behaviour should be.
> > I’m slightly leaning toward having a single row per conflict.
> >
>
> Yeah, it is better to either have a single row per conflict or have
> two tables, conflict_history and conflict_history_details, to avoid
> the data bloat pointed out above. For example, the two-table approach
> could be:
>
> 1. The Header Table (Incoming Data)
> This stores the incoming data that the apply worker tried to apply.
> CREATE TABLE conflict_header (
>     conflict_id     SERIAL PRIMARY KEY,
>     source_tx_id    VARCHAR(100),    -- Transaction ID from source
>     table_name      VARCHAR(100),
>     operation       CHAR(1),         -- 'I' for Insert
>     incoming_data   JSONB,           -- Store the incoming row as JSON
> ...
> );
>
> 2. The Detail Table (Existing Conflicting Data)
> This stores the actual rows currently in the database that caused the
> violations.
> CREATE TABLE conflict_details (
>     detail_id       SERIAL PRIMARY KEY,
>     conflict_id     INT REFERENCES conflict_header(conflict_id),
>     constraint_name VARCHAR(100), -- or key_tuple
>     conflicting_row_data JSONB       -- The existing row in the DB
> that blocked the insert
> );
>
> Please don't take these exact columns literally; you can use something
> along the lines of what is proposed in the patch. This is just to show
> how the conflict data can be rearranged. One argument against this is
> that users need a JOIN to query the data, but that is still better
> than bloating the table. Alternatively, the single-table idea could be
> changed to use columns like:
>
>     violated_constraints TEXT[],  -- e.g., ['uk_email', 'uk_phone']
>     error_details JSONB  -- e.g., [{"const": "uk_email", "val": "[email protected]"}, ...]
>
> If we want to store multiple conflicting tuples in a single column, we
> need to ensure it is queryable via a JSONB column. The point in favour
> of a single JSONB column combining multiple conflicting tuples is that
> we need this combination only for one kind of conflict.
>
> Both the approaches have their pros and cons. I feel we should dig a
> bit deeper for both by laying out details for each method and see what
> others think.

The specific scenario we are discussing is when a single row from the
publisher causes conflicts across multiple unique keys, with each of
those violations conflicting with a different local row on the
subscriber.  That scenario is very rare.  IMHO, such a low-frequency
case does not justify overcomplicating the design with an array field
or a multi-level table.

Consider the infrequency of the root causes:
- How often does a table have more than three or four unique keys?
- How frequently would each of those keys conflict with a different
local row on the subscriber?

If resolving this occasional, synthetic conflict requires inserting
two or three rows instead of a single one, that seems an acceptable
trade-off given how rarely it can occur.  Anyway, this is just my
opinion, and I am open to hearing what others think.
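
For illustration, assuming hypothetical column names (the patch may use
different ones), this is what the multi-row representation of the
three-key example earlier in the thread would look like, and how an
external tool could group it back into a single logical conflict:

```sql
-- Hypothetical single-table layout: one row per violated unique key,
-- all sharing a generated identifier for the failed apply operation.
--
-- conflict_id | index_name     | key     | local_tuple | remote_tuple
-- ------------+----------------+---------+-------------+--------------
--         101 | conf_tab_pkey  | (a)=(2) | (2, 2, 2)   | (2, 3, 4)
--         101 | conf_tab_b_key | (b)=(3) | (3, 3, 3)   | (2, 3, 4)
--         101 | conf_tab_c_key | (c)=(4) | (4, 4, 4)   | (2, 3, 4)

-- An external tool can still treat these as one conflict:
SELECT conflict_id,
       array_agg(index_name) AS violated_indexes,
       min(remote_tuple)     AS remote_tuple  -- identical on every row
FROM   conflict_history
GROUP BY conflict_id;
```

The remote tuple is duplicated across the two or three rows, but only
for this one rare conflict type.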

-- 
Regards,
Dilip Kumar
Google

