Igniters,

I have recently discovered [1] that Ignite can end up in a state where an
optimistic serializable transaction can never be committed successfully if
its reads are served by a backup node [2].

In short, the root cause of this issue is that some configurations allow a
key to be stored on primary and backup nodes with different versions. This
is a fundamental design choice that was made a while ago; however, I am not
sure it is the right way to go. When primary and backup versions differ and
read load balancing is enabled, the version read from a backup never matches
the primary version, so an optimistic serializable transaction always fails.
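To make the failure mode concrete, here is a deliberately simplified model in
Java. It is not Ignite code, and the class and method names are hypothetical.
It assumes that (a) each node stamps a locally generated version when a
read-through miss loads a key on that node only, and (b) commit of an
optimistic serializable transaction compares the version it read with the
primary's current version.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Toy model (not Ignite code): each node stamps its own version when a
 * read-through miss loads a key locally, so primary and backup can hold the
 * same value under different versions.
 */
public class VersionMismatchDemo {

    /** A value together with the version it was stamped with. */
    record Versioned(String value, long version) {}

    /** One copy of a partition (primary or backup) with a local version source. */
    static final class Node {
        final Map<String, Versioned> store = new HashMap<>();
        private long nextVersion; // hypothetical per-node version counter

        Node(long firstVersion) {
            this.nextVersion = firstVersion;
        }

        /** Local read-through: on a miss, load from the 3rd-party store and stamp a LOCAL version. */
        Versioned readThrough(String key, Map<String, String> thirdPartyStore) {
            return store.computeIfAbsent(key, k -> new Versioned(thirdPartyStore.get(k), nextVersion++));
        }
    }

    /** Commit-time check of an optimistic serializable tx: the read version must equal the primary's. */
    static boolean validateOnPrimary(Node primary, String key, long readVersion) {
        Versioned current = primary.store.get(key);
        return current != null && current.version() == readVersion;
    }

    public static void main(String[] args) {
        Map<String, String> db = Map.of("k", "v");
        Node primary = new Node(100);
        Node backup = new Node(200);

        // Both copies miss and load the same value independently: equal values, different versions.
        primary.readThrough("k", db);
        long backupReadVersion = backup.readThrough("k", db).version();

        // Read load balancing served the read from the backup: validation can never pass.
        System.out.println(validateOnPrimary(primary, "k", backupReadVersion)); // false

        // The same transaction reading from the primary passes validation.
        long primaryReadVersion = primary.readThrough("k", db).version();
        System.out.println(validateOnPrimary(primary, "k", primaryReadVersion)); // true
    }
}
```

Retrying does not help in this model: every retry that reads from the backup
gets the same stale version back, which is why the transaction can never
commit rather than failing only occasionally.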

Here I would like to discuss both a short-term mitigation plan for [2] and
longer-term changes to the replication protocol.

As a short-term solution for [2], I suggest forcing reads from the primary
node inside optimistic serializable transactions. The question is whether to
enforce this behavior only when the cache has a 3rd-party persistence store,
or always. Note that the version mismatch may appear even without a 3rd-party
persistence store when an expiry policy is used. However, in that case the
mismatch is bounded in time by the TTL cleanup lag. Personally, I would go
with always enforcing primary-node reads inside an optimistic serializable
transaction.
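The proposed routing rule, with both options from the question above, could
be sketched as follows. The method `mayReadFromBackup` and the local enums are
hypothetical stand-ins (named after Ignite's `TransactionConcurrency` and
`TransactionIsolation`, but not those classes); `readFromBackup` corresponds
to the existing cache configuration flag of that name.

```java
/**
 * Sketch of the proposed short-term rule: never serve reads from a backup
 * inside an OPTIMISTIC + SERIALIZABLE transaction. The enums below are local
 * stand-ins, not Ignite's classes.
 */
public class PrimaryReadRule {

    enum Concurrency { OPTIMISTIC, PESSIMISTIC }

    enum Isolation { READ_COMMITTED, REPEATABLE_READ, SERIALIZABLE }

    /**
     * @param readFromBackup value of the cache's readFromBackup setting
     * @param concurrency    tx concurrency, or null for a read outside a transaction
     * @param isolation      tx isolation, or null for a read outside a transaction
     * @param hasCacheStore  whether the cache has a 3rd-party persistence store
     * @param alwaysEnforce  true = "always force primary reads";
     *                       false = "force them only when a cache store is configured"
     * @return true if the read may be served by a local backup copy
     */
    static boolean mayReadFromBackup(boolean readFromBackup,
                                     Concurrency concurrency,
                                     Isolation isolation,
                                     boolean hasCacheStore,
                                     boolean alwaysEnforce) {
        if (!readFromBackup)
            return false; // cache is already configured for primary-only reads

        boolean optimisticSerializable =
            concurrency == Concurrency.OPTIMISTIC && isolation == Isolation.SERIALIZABLE;

        if (!optimisticSerializable)
            return true; // other modes keep today's read load balancing

        // Optimistic serializable: the read version must match the primary at commit time.
        return !alwaysEnforce && !hasCacheStore;
    }

    public static void main(String[] args) {
        // Expiry policy only, no cache store: the two options differ.
        System.out.println(mayReadFromBackup(true, Concurrency.OPTIMISTIC, Isolation.SERIALIZABLE, false, true));  // false
        System.out.println(mayReadFromBackup(true, Concurrency.OPTIMISTIC, Isolation.SERIALIZABLE, false, false)); // true
        // Pessimistic transactions are unaffected.
        System.out.println(mayReadFromBackup(true, Concurrency.PESSIMISTIC, Isolation.SERIALIZABLE, true, true));  // true
    }
}
```

The second call shows the case that argues for always enforcing: with an
expiry policy but no cache store, the "store only" option still allows a
backup read that may carry a mismatched version until TTL cleanup catches up.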

As a long-term solution that would eliminate the possibility of version
desync between primary and backup nodes, I would suggest revisiting the
read-through and TTL expiry semantics. It looks like quite a lot of users are
struggling with the current implementation of read-through because a miss
does not load the value on all partition nodes [3]. As for TTL, I remember
that clearing expired entries locally was a big issue for a proper MVCC
rebalance implementation (we ended up prohibiting TTL for MVCC caches).
I think it may be better to make read-through and entry expiry partition-wide
operations with the underlying cache guarantees. For read-through this is
justified because the penalty of a partition-wide operation is comparable to
the cost of the cache store load anyway (otherwise, a 3rd-party store makes
little sense). For entry expiration it should not make any difference because
expiration happens in the background anyway.
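A minimal single-threaded sketch of what partition-wide read-through and
expiry could mean: on a miss, one version is assigned and the same (value,
version) pair is installed on every owner of the partition; expiry removes the
entry from all owners at once. All names are hypothetical, and the model
ignores concurrent loads, node failures and rebalancing, which a real protocol
change would have to handle.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Toy model (not Ignite code) of partition-wide read-through and expiry:
 * one version per load, installed on every owner of the partition.
 */
public class PartitionWideReadThrough {

    /** A value together with the version it was stamped with. */
    record Versioned(String value, long version) {}

    /** One partition with a primary (owner 0) and copies-1 backups. */
    static final class Partition {
        private final List<Map<String, Versioned>> owners = new ArrayList<>();
        private long nextVersion = 1; // single counter standing in for the primary's version source

        Partition(int copies) {
            for (int i = 0; i < copies; i++)
                owners.add(new HashMap<>());
        }

        /** Read on any owner; a miss loads from the store once and installs the result everywhere. */
        Versioned read(int owner, String key, Map<String, String> store) {
            Versioned local = owners.get(owner).get(key);
            if (local != null)
                return local;

            Versioned loaded = new Versioned(store.get(key), nextVersion++);
            for (Map<String, Versioned> copy : owners)
                copy.putIfAbsent(key, loaded); // same value and version on every owner
            return owners.get(owner).get(key);
        }

        /** Expiry removes the entry from all owners at once, not just locally. */
        void expire(String key) {
            for (Map<String, Versioned> copy : owners)
                copy.remove(key);
        }
    }

    public static void main(String[] args) {
        Map<String, String> db = Map.of("k", "v");
        Partition part = new Partition(2);

        long backupRead = part.read(1, "k", db).version();  // miss first seen on the backup
        long primaryRead = part.read(0, "k", db).version(); // hit on the primary
        System.out.println(backupRead == primaryRead);       // true

        part.expire("k");
        long afterExpiry = part.read(1, "k", db).version();
        System.out.println(afterExpiry == part.read(0, "k", db).version()); // true
    }
}
```

Under these assumptions a backup read always returns the primary's version,
so the commit-time check of an optimistic serializable transaction no longer
depends on which owner served the read.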

Any thoughts on the subject are very much appreciated.

--AG

[1] http://apache-ignite-developers.2346864.n4.nabble.com/Fwd-NodeOrder-in-GridCacheVersion-td46108.html
[2] https://issues.apache.org/jira/browse/IGNITE-12739
[3] http://apache-ignite-developers.2346864.n4.nabble.com/Re-Read-through-not-working-as-expected-in-case-of-Replicated-cache-td46083.html
