On Tue, Jun 4, 2024 at 9:37 AM Amit Kapila <amit.kapil...@gmail.com> wrote: > > > > > > > > > > > > Conflict Resolution > > > > ---------------- > > > > a) latest_timestamp_wins: The change with later commit timestamp > > > > wins. > > > > b) earliest_timestamp_wins: The change with earlier commit timestamp > > > > wins. > > Can you share the use case of "earliest_timestamp_wins" resolution > method? It seems after the initial update on the local node, it will > never allow remote update to succeed which sounds a bit odd. Jan has > shared this and similar concerns about this resolution method, so I > have added him to the email as well.
I do not have the exact scenario for this. But I feel, if 2 nodes are concurrently inserting different data against a primary key, then some users may have preferences that retain the row which was inserted earlier. It is no different from latest_timestamp_wins. It totally depends upon what kind of application and requirement the user may have, based on which, he may discard the later coming rows (specially for INSERT case). > > > > Conflict Types: > > > > ---------------- > > > > a) update_differ: The origin of an incoming update's key row differs > > > > from the local row i.e.; the row has already been updated locally or > > > > by different nodes. > > > > b) update_missing: The row with the same value as that incoming > > > > update's key does not exist. Remote is trying to update a row which > > > > does not exist locally. > > > > c) update_deleted: The row with the same value as that incoming > > > > update's key does not exist. The row is already deleted. This conflict > > > > type is generated only if the deleted row is still detectable i.e., it > > > > is not removed by VACUUM yet. If the row is removed by VACUUM already, > > > > it cannot detect this conflict. It will detect it as update_missing > > > > and will follow the default or configured resolver of update_missing > > > > itself. > > > > > > > > > > I don't understand the why should update_missing or update_deleted be > > > different, especially considering it's not detected reliably. And also > > > that even if we happen to find the row the associated TOAST data may > > > have already been removed. So why would this matter? > > > > Here, we are trying to tackle the case where the row is 'recently' > > deleted i.e. concurrent UPDATE and DELETE on pub and sub. User may > > want to opt for a different resolution in such a case as against the > > one where the corresponding row was not even present in the first > > place. The case where the row was deleted long back may not fall into > > this category as there are higher chances that they have been removed > > by vacuum and can be considered equivalent to the update_ missing > > case. > > > > I think to make 'update_deleted' work, we need another scan with a > different snapshot type to find the recently deleted row. I don't know > if it is a good idea to scan the index twice with different snapshots, > so for the sake of simplicity, can we consider 'updated_deleted' same > as 'update_missing'? If we think it is an important case to consider > then we can try to accomplish this once we finalize the > design/implementation of other resolution methods. I think it is important for scenarios when data is being updated and deleted concurrently. But yes, I agree that implementation may have some performance hit for this case. We can tackle this scenario at a later stage. > > > > > > > > To implement the above, subscription commands will be changed to have > > > > one more parameter 'conflict_resolution=on/off', default will be OFF. > > > > > > > > To configure global resolvers, new DDL command will be introduced: > > > > > > > > CONFLICT RESOLVER ON <conflict_type> IS <conflict_resolver> > > > > > > > > > > I very much doubt we want a single global conflict resolver, or even one > > > resolver per subscription. It seems like a very table-specific thing. > > > > +1 to make it a table-level configuration but we probably need > something at the global level as well such that by default if users > don't define anything at table-level global-level configuration will > be used. > > > > > > > > > Also, doesn't all this whole design ignore the concurrency between > > > publishers? Isn't this problematic considering the commit timestamps may > > > go backwards (for a given publisher), which means the conflict > > > resolution is not deterministic (as it depends on how exactly it > > > interleaves)? > > > > > I am not able to imagine the cases you are worried about. Can you > please be specific? Is it similar to the case I described in > yesterday's email [1]? > > [1] - > https://www.postgresql.org/message-id/CAA4eK1JTMiBOoGqkt%3DaLPLU8Rs45ihbLhXaGHsz8XC76%2BOG3%2BQ%40mail.gmail.com > thanks Shveta