On Wed, Jun 12, 2024 at 10:03 AM Dilip Kumar <dilipbal...@gmail.com> wrote:
>
> On Tue, Jun 11, 2024 at 7:44 PM Tomas Vondra
> <tomas.von...@enterprisedb.com> wrote:
>
> > > Yes, that's correct. However, many cases could benefit from the
> > > update_deleted conflict type if it can be implemented reliably. That's
> > > why we wanted to give it a try. But if we can't achieve predictable
> > > results with it, I'm fine to drop this approach and conflict_type. We
> > > can consider a better design in the future that doesn't depend on
> > > non-vacuumed entries and provides a more robust method for identifying
> > > deleted rows.
> > >
> >
> > I agree having a separate update_deleted conflict would be beneficial,
> > I'm not arguing against that - my point is actually that I think this
> > conflict type is required, and that it needs to be detected reliably.
> >
>
> When working with a distributed system, we must accept some form of
> eventual consistency model. However, it's essential to design a
> predictable and acceptable behavior. For example, if a change is a
> result of a previous operation (such as an update on node B triggered
> after observing an operation on node A), we can say that the operation
> on node A happened before the operation on node B. Conversely, if
> operations on nodes A and B are independent, we consider them
> concurrent.
>
> In distributed systems, clock skew is a known issue. To establish a
> consistency model, we need to ensure it guarantees the
> "happens-before" relationship. Consider a scenario with three nodes:
> NodeA, NodeB, and NodeC. If NodeA sends changes to NodeB, and
> subsequently NodeB makes changes, and then both NodeA's and NodeB's
> changes are sent to NodeC, the clock skew might make NodeB's changes
> appear to have occurred before NodeA's changes. However, we should
> maintain data that indicates NodeB's changes were triggered after
> NodeA's changes arrived at NodeB. This implies that logically, NodeB's
> changes happened after NodeA's changes, despite what the timestamps
> suggest.
>
> A common method to handle such cases is using vector clocks for
> conflict resolution.
>

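To make the quoted NodeA/NodeB scenario concrete, here is a minimal sketch of how a vector clock captures "happens-before" without relying on wall-clock time. The function names (tick, merge, happened_before, concurrent) are hypothetical and chosen only for illustration; nothing here reflects an existing PostgreSQL API.

```python
from typing import Dict

# A vector clock maps a node name to the number of events seen from that node.
VectorClock = Dict[str, int]


def tick(clock: VectorClock, node: str) -> VectorClock:
    """Return a copy of the clock with the node's own counter advanced."""
    new_clock = dict(clock)
    new_clock[node] = new_clock.get(node, 0) + 1
    return new_clock


def merge(local: VectorClock, remote: VectorClock) -> VectorClock:
    """Combine two clocks by taking the per-node maximum."""
    nodes = local.keys() | remote.keys()
    return {n: max(local.get(n, 0), remote.get(n, 0)) for n in nodes}


def happened_before(a: VectorClock, b: VectorClock) -> bool:
    """True if a <= b on every node and a < b on at least one node."""
    nodes = a.keys() | b.keys()
    no_greater = all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
    some_less = any(a.get(n, 0) < b.get(n, 0) for n in nodes)
    return no_greater and some_less


def concurrent(a: VectorClock, b: VectorClock) -> bool:
    """True if the clocks differ and neither one happened before the other."""
    nodes = a.keys() | b.keys()
    differ = any(a.get(n, 0) != b.get(n, 0) for n in nodes)
    return differ and not happened_before(a, b) and not happened_before(b, a)


# NodeA makes a change.
a_change = tick({}, "NodeA")
# NodeB receives NodeA's change, then makes its own change.
b_change = tick(merge({}, a_change), "NodeB")
# NodeC makes an unrelated change without having seen either of the above.
c_change = tick({}, "NodeC")

print(happened_before(a_change, b_change))  # ordering holds whatever the timestamps say
print(concurrent(a_change, c_change))
```

Note that each clock carries one counter per node that has ever contributed a change, so its size grows with the number of nodes.
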
I think the unbounded size of the vector could be a problem, since it would have to be stored for each event. However, while researching previous discussions, we noticed that this topic has come up before in the context of standbys. For recovery_min_apply_delay, we decided that clock skew is not a problem because the settings of this parameter are much larger than typical time deviations between servers, as mentioned in the docs. Similarly, for causal reads [1], there was a proposal to introduce a max_clock_skew parameter and to advise users to make sure NTP is set up correctly. We have tried to check other databases (like Ora and BDR) where CDR is implemented but didn't find anything specific to clock skew.

So, I propose to go with a GUC like max_clock_skew such that if the difference between the incoming transaction's commit time and the local time is more than max_clock_skew, we raise an ERROR. It is not clear to me that putting a bigger effort into clock skew is worth it, especially when other systems that have provided a CDR feature for decades (like Ora or BDR) have not done anything like vector clocks. It is possible that this is less of a problem w.r.t. CDR and that just detecting the anomaly in clock skew is good enough.

[1] - https://www.postgresql.org/message-id/flat/CAEepm%3D1iiEzCVLD%3DRoBgtZSyEY1CR-Et7fRc9prCZ9MuTz3pWg%40mail.gmail.com

--
With Regards,
Amit Kapila.
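
For illustration, the max_clock_skew check proposed above could look roughly like the sketch below. Only the name max_clock_skew comes from the proposal; check_clock_skew and ClockSkewError are hypothetical names. The sketch also assumes that only a commit time ahead of the local clock counts as skew, since an older commit time may simply reflect apply lag; the proposal itself does not spell that out.

```python
from datetime import datetime, timedelta, timezone


class ClockSkewError(Exception):
    """Hypothetical error standing in for the proposed ERROR."""


def check_clock_skew(remote_commit_time: datetime,
                     local_time: datetime,
                     max_clock_skew: timedelta) -> timedelta:
    """Return the observed skew, or raise if it exceeds max_clock_skew.

    Assumption: skew is how far the remote commit time lies in the
    local node's future. Commit times in the past yield zero skew.
    """
    skew = max(remote_commit_time - local_time, timedelta(0))
    if skew > max_clock_skew:
        raise ClockSkewError(
            f"remote commit time is {skew} ahead of local time, "
            f"which exceeds max_clock_skew ({max_clock_skew})"
        )
    return skew


now = datetime(2024, 6, 12, 10, 0, 0, tzinfo=timezone.utc)
limit = timedelta(seconds=5)

# A commit from 30 seconds ago is accepted; it is not ahead of the local clock.
check_clock_skew(now - timedelta(seconds=30), now, limit)

# A commit stamped 10 seconds in the future is rejected.
try:
    check_clock_skew(now + timedelta(seconds=10), now, limit)
except ClockSkewError as err:
    print(f"ERROR: {err}")
```
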