On Wed, Feb 7, 2018 at 6:00 PM, Amit Kapila <amit.kapil...@gmail.com> wrote:
> On Wed, Feb 7, 2018 at 3:42 PM, amul sul <sula...@gmail.com> wrote:
>> On Wed, Feb 7, 2018 at 3:03 PM, Amit Khandekar <amitdkhan...@gmail.com> wrote:
>>> On 7 February 2018 at 13:53, amul sul <sula...@gmail.com> wrote:
>>>> Hi,
>>>>
>>>> If an update of partition key involves tuple movement from one partition to
>>>> another partition then there will be a separate delete on one partition and
>>>> insert on the other partition made.
>>>>
>>>> In the logical replication if an update performed on the master and standby
>>>> at the same moment, then replication worker tries to replicate delete +
>>>> insert operation on standby. While replying master changes on standby for
>>>> the delete operation worker will log "concurrent update, retrying" message
>>>> (because the update on standby has already deleted) and move forward to
>>>> reply the next insert operation. Standby update also did the same
>>>> delete+insert is as part of the update of partition key in a result there
>>>> will be two records inserted on standby.
>>>
>>> A quick thinking on how to resolve this makes me wonder if we can
>>> manage to pass some information through logical decoding that the
>>> delete is part of a partition key update. This is analogous to how we
>>> set some information locally in the tuple by setting
>>> tp.t_data->t_ctid.ip_blkid to InvalidBlockNumber.
>>>
>>
>> +1,
>>
>
> I also mentioned the same thing in the other thread [1], but I think
> that alone won't solve the dual record problem as you are seeing. I
> think we need to do something for next insert as you are suggesting.
>
>> also if worker failed to reply delete operation on standby then
>> we need to decide what will be the next step, should we skip follow
>> insert operation or error out or something else.
>>
>
> That would be tricky, do you see any simple way of doing either of those.
>

Not really. In ExecUpdate, when the delete step of a partition-key update
fails, the subsequent insert is simply skipped. But you are right that this
may be trickier than I thought: there is no guarantee that the next insert
the replication worker replays is part of the same partition-key update.
How can one tell that an insert on one relation is related to a preceding
delete on some other relation?

Regards,
Amul
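
P.S. To make the scenario concrete, here is a toy Python sketch of the dual-record problem. It is not PostgreSQL code: the list-based partitions, the function names and the skip_insert_after_failed_delete flag are all made up for illustration. The flag assumes the worker somehow knows that the insert belongs to the failed delete, which is exactly the open question above.

```python
# Toy model only: each partition is a plain list of row dicts.
# None of these names correspond to PostgreSQL functions.

def delete_row(partition, row_id):
    """Remove the row with the given id; return False if it is already gone."""
    for i, row in enumerate(partition):
        if row["id"] == row_id:
            del partition[i]
            return True
    # Stands in for the worker logging "concurrent update, retrying".
    return False


def simulate(skip_insert_after_failed_delete):
    """Replay a master-side partition-key update on a standby that has
    already moved the same row locally; return the standby's rows."""
    p1 = [{"id": 1, "key": 5}]   # row starts in partition p1
    p2 = []

    # Local update on the standby (key 5 -> 15): delete from p1, insert into p2.
    delete_row(p1, 1)
    p2.append({"id": 1, "key": 15})

    # Replay of the master's update (key 5 -> 12), decoded as delete + insert.
    deleted = delete_row(p1, 1)          # fails: the row has already moved
    if deleted or not skip_insert_after_failed_delete:
        p2.append({"id": 1, "key": 12})  # insert replayed regardless of the delete

    return p1 + p2


if __name__ == "__main__":
    print(simulate(skip_insert_after_failed_delete=False))
    print(simulate(skip_insert_after_failed_delete=True))
```

With the flag off, the standby ends up with two rows for id 1, which is the behaviour reported at the top of the thread. With the flag on, only the standby's own row survives, but that relies on pairing information the worker does not currently have.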