On Thu, Jul 11, 2024 at 7:47 AM Zhijie Hou (Fujitsu) <houzj.f...@fujitsu.com> wrote: > > On Wednesday, July 10, 2024 5:39 PM shveta malik <shveta.ma...@gmail.com> > wrote: > > > > On Wed, Jul 3, 2024 at 8:31 AM Zhijie Hou (Fujitsu) <houzj.f...@fujitsu.com> > > wrote: > > > > > > On Wednesday, June 26, 2024 10:58 AM Zhijie Hou (Fujitsu) > > <houzj.f...@fujitsu.com> wrote: > > > > > > > > > > Hi, > > > > > > As suggested by Sawada-san in another thread[1]. > > > > > > I am attaching the V4 patch set which tracks the delete_differ > > > conflict in logical replication. > > > > > > delete_differ means that the replicated DELETE is deleting a row that > > > was modified by a different origin. > > > > > > > Thanks for the patch. I am still in process of review but please find few > > comments: > > Thanks for the comments! > > > 1) When I try to *insert* primary/unique key on pub, which already exists on > > sub, conflict gets detected. But when I try to *update* primary/unique key > > to a > > value on pub which already exists on sub, conflict is not detected. I get > > the > > error: > > > > 2024-07-10 14:21:09.976 IST [647678] ERROR: duplicate key value violates > > unique constraint "t1_pkey" > > 2024-07-10 14:21:09.976 IST [647678] DETAIL: Key (pk)=(4) already exists. > > Yes, I think the detection of this conflict is not added with the > intention to control the size of the patch in the first version. > > > > > This is because such conflict detection needs detection of constraint > > violation > > using the *new value* rather than *existing* value during UPDATE. INSERT > > conflict detection takes care of this case i.e. the columns of incoming row > > are > > considered as new values and it tries to see if all unique indexes are okay > > to > > digest such new values (all incoming columns) but update's logic is > > different. > > It searches based on oldTuple *only* and thus above detection is missing. > > I think the logic is the same if we want to detect the unique violation > for UDPATE, we need to check if the new value of the UPDATE violates any > unique constraints same as the detection of insert_exists (e.g. check > the conflict around ExecInsertIndexTuples()) > > > > > Shall we support such detection? If not, is it worth docuementing? > > I am personally OK to support this detection.
+1. I think it should not be a complex or too big change. > And > I think it's already documented that we only detect unique violation for > insert which mean update conflict is not detected. > > > 2) > > Another case which might confuse user: > > > > CREATE TABLE t1 (pk integer primary key, val1 integer, val2 integer); > > > > On PUB: insert into t1 values(1,10,10); insert into t1 values(2,20,20); > > > > On SUB: update t1 set pk=3 where pk=2; > > > > Data on PUB: {1,10,10}, {2,20,20} > > Data on SUB: {1,10,10}, {3,20,20} > > > > Now on PUB: update t1 set val1=200 where val1=20; > > > > On Sub, I get this: > > 2024-07-10 14:44:00.160 IST [648287] LOG: conflict update_missing detected > > on relation "public.t1" > > 2024-07-10 14:44:00.160 IST [648287] DETAIL: Did not find the row to be > > updated. > > 2024-07-10 14:44:00.160 IST [648287] CONTEXT: processing remote data for > > replication origin "pg_16389" during message type "UPDATE" for replication > > target relation "public.t1" in transaction 760, finished at 0/156D658 > > > > To user, it could be quite confusing, as val1=20 exists on sub but still he > > gets > > update_missing conflict and the 'DETAIL' is not sufficient to give the > > clarity. I > > think on HEAD as well (have not tested), we will get same behavior i.e. > > update > > will be ignored as we make search based on RI (pk in this case). So we are > > not > > worsening the situation, but now since we are detecting conflict, is it > > possible > > to give better details in 'DETAIL' section indicating what is actually > > missing? > > I think It's doable to report the row value that cannot be found in the local > relation, but the concern is the potential risk of exposing some > sensitive data in the log. This may be OK, as we are already reporting the > key value for constraints violation, so if others also agree, we can add > the row value in the DETAIL as well. Okay, let's see what others say. JFYI, the same situation holds valid for delete_missing case. I have one concern about how we deal with conflicts. As for insert_exists, we keep on erroring out while raising conflict, until it is manually resolved: ERROR: conflict insert_exists detected But for other cases, we just log conflict and either skip or apply the operation. I LOG: conflict update_differ detected DETAIL: Updating a row that was modified by a different origin I know that it is no different than HEAD. But now since we are logging conflicts explicitly, we should call out default behavior on each conflict. I see some incomplete and scattered info in '31.5. Conflicts' section saying that: "When replicating UPDATE or DELETE operations, missing data will not produce a conflict and such operations will simply be skipped." (lets say it as pt a) Also some more info in a later section saying (pt b): :A conflict will produce an error and will stop the replication; it must be resolved manually by the user." My suggestions: 1) in point a above, shall we have: missing data or differing data (i.e. somehow reword to accommodate update_differ and delete_differ cases) 2) Now since we have a section explaining conflicts detected and logged with detect_conflict=true, shall we mention default behaviour with each? insert_exists: error will be raised until resolved manually. update_differ: update will be applied update_missing: update will be skipped delete_missing: delete will be skipped delete_differ: delete will be applied. thanks Shveta