Alexey,

>> In short, the root cause of this issue is that there are configurations
>> that allow a key to be stored on primary and backup nodes with different
>> versions.

I faced the same problem during ReadRepair development.
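To make the discussion concrete, here is a minimal Java sketch (not from the original messages) of a mismatch-prone cache configuration: read-through against a 3rd-party store, or an expiry policy, combined with reads from backups. The cache name "myCache" and the stub store are illustrative assumptions, not code from the ticket.

import javax.cache.Cache;
import javax.cache.configuration.FactoryBuilder;
import javax.cache.expiry.CreatedExpiryPolicy;
import javax.cache.expiry.Duration;
import org.apache.ignite.cache.store.CacheStoreAdapter;
import org.apache.ignite.configuration.CacheConfiguration;

public class MismatchProneConfig {
    public static CacheConfiguration<Integer, Integer> create() {
        CacheConfiguration<Integer, Integer> cfg = new CacheConfiguration<>("myCache");

        cfg.setBackups(1);

        // Reads may be served from a backup copy (true is the default).
        cfg.setReadFromBackup(true);

        // Read-through from a 3rd-party store: a miss loads the value only on
        // the node that served the read, so primary and backup can end up
        // holding the same key with different versions.
        cfg.setReadThrough(true);
        cfg.setCacheStoreFactory(FactoryBuilder.factoryOf(StubStore.class));

        // An expiry policy can cause a similar mismatch, since entries are
        // cleared locally and the cleanup lag differs between nodes.
        cfg.setExpiryPolicyFactory(CreatedExpiryPolicy.factoryOf(Duration.ONE_MINUTE));

        return cfg;
    }

    // Stub 3rd-party store, for illustration only.
    public static class StubStore extends CacheStoreAdapter<Integer, Integer> {
        @Override public Integer load(Integer key) { return key; }
        @Override public void write(Cache.Entry<? extends Integer, ? extends Integer> e) { /* no-op */ }
        @Override public void delete(Object key) { /* no-op */ }
    }
}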
>> I suggest to force reads from a primary
>> node inside optimistic serializable transactions.

It looks like a proper fix (read-from-backup = ... && !read-through).

>> I would suggest to revisit the
>> read-through and TTL expiry semantics.

Do we really need these features?
- We have great, full-featured, consistent persistence; what is the point of using limited, inconsistent persistence via an external database? Can we get rid of this feature in 3.0?
- The expiry policy is expensive (it slows down the cluster), does not guarantee in-time removal, and can always be replaced by a proper design (state machine, query, eviction, in-memory cluster restart, etc.).

On Thu, Mar 5, 2020 at 12:33 PM Alexey Goncharuk <alexey.goncha...@gmail.com> wrote:

> Igniters,
>
> I have recently discovered [1] that Ignite can arrive in a state where an
> optimistic serializable transaction can never be successfully committed
> from a backup node [2].
>
> In short, the root cause of this issue is that there are configurations
> that allow a key to be stored on primary and backup nodes with different
> versions. This is a fundamental design choice that was made a while ago;
> however, I am not sure it is the right way to go. When primary and backup
> versions differ and read load balancing is enabled, the read version will
> always mismatch the primary version, and an optimistic serializable
> transaction will always fail.
>
> Here I want to discuss both a short-term mitigation plan for the issue [2]
> and longer-term changes to the replication protocol.
>
> As a short-term solution for [2], I suggest forcing reads from the primary
> node inside optimistic serializable transactions. The question is whether
> to enforce this behavior only if the cache has a 3rd-party persistence
> store or to always enforce it. Note that the version mismatch may appear
> even without a 3rd-party persistence store when an expiry policy is used;
> however, in this case, the version mismatch is time-bound to the TTL
> cleanup lag. Personally, I would always enforce primary-node reads inside
> an optimistic serializable transaction.
>
> As a long-term solution that would eliminate the possibility of version
> desync between primary and backup nodes, I suggest revisiting the
> read-through and TTL expiry semantics. It looks like quite a lot of users
> are struggling with the current implementation of read-through because a
> miss does not load the value to all partition nodes [3]. As for TTL, I
> remember that clearing entries locally was a big issue for a proper MVCC
> rebalance implementation (we ended up prohibiting TTL for MVCC caches).
> I think it may be better to make read-through and entry expiry
> partition-wide operations with the underlying cache guarantees. For
> read-through this is justified because a partition-wide operation's
> penalty is comparable with the cache store load anyway (otherwise, a
> 3rd-party store makes little sense). For entry expiration it should not
> make any difference because it happens in the background anyway.
>
> Any thoughts on the subject are very much appreciated.
>
> --AG
>
> [1]
> http://apache-ignite-developers.2346864.n4.nabble.com/Fwd-NodeOrder-in-GridCacheVersion-td46108.html
> [2] https://issues.apache.org/jira/browse/IGNITE-12739
> [3]
> http://apache-ignite-developers.2346864.n4.nabble.com/Re-Read-through-not-working-as-expected-in-case-of-Replicated-cache-td46083.html
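For completeness, a minimal sketch of the failure mode discussed above, assuming a cache named "myCache" (e.g. configured as in the earlier sketch) whose primary and backup already hold different versions for the key; none of this is code from the ticket.

import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.Ignition;
import org.apache.ignite.transactions.Transaction;
import org.apache.ignite.transactions.TransactionOptimisticException;

import static org.apache.ignite.transactions.TransactionConcurrency.OPTIMISTIC;
import static org.apache.ignite.transactions.TransactionIsolation.SERIALIZABLE;

public class StuckOptimisticTx {
    public static void main(String[] args) {
        try (Ignite ignite = Ignition.start()) {
            // Assumes a mismatch-prone cache configuration is in effect.
            IgniteCache<Integer, Integer> cache = ignite.getOrCreateCache("myCache");

            try (Transaction tx = ignite.transactions().txStart(OPTIMISTIC, SERIALIZABLE)) {
                // With read-from-backup enabled, this read may be served by a
                // backup whose entry version permanently differs from the
                // primary's.
                Integer val = cache.get(1);

                cache.put(1, val == null ? 1 : val + 1);

                // The commit-time check compares the read version with the
                // primary version; with a permanent mismatch it fails on
                // every attempt, so no number of retries can succeed.
                tx.commit();
            }
            catch (TransactionOptimisticException e) {
                // Normally a transient conflict worth retrying; in the
                // scenario from IGNITE-12739 it becomes unrecoverable.
            }
        }
    }
}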