Igniters,

We recently ran into some issues with TTL when persistence is enabled; those
issues were related to persistence implementation details. However, while
adding tests to cover more cases, we found more failures which, I think,
reveal some fundamental issues with the expiry mechanism.

In short, the root cause is that we expire entries on primary and backup
nodes independently, which means:
1) Partition sizes on primary and backup nodes may differ at any given
moment, which makes the recently added partition size check on partition
map exchange fail spuriously
2) More importantly, this may lead to inconsistent values on primary and
backup nodes when an EntryProcessor is used, because the entry processor may
observe a non-null value on one node and a null value on another.
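To make item 2 concrete, here is a toy Python model of one partition held
by a primary and a backup that each run their own expiry pass. It is not
Ignite code: `Node`, `expire` and `invoke` are hypothetical names, and the
model assumes the processor is applied to each copy independently, which is
the situation described above.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class Entry:
    value: int
    expire_at: float  # absolute time after which the entry counts as expired


class Node:
    """Hypothetical toy node holding one copy of a partition (not Ignite's API)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.data: Dict[str, Entry] = {}

    def put(self, key: str, value: int, expire_at: float) -> None:
        self.data[key] = Entry(value, expire_at)

    def expire(self, now: float) -> None:
        # Local, independent expiry: each node purges entries using its own "now".
        for key in [k for k, e in self.data.items() if e.expire_at <= now]:
            del self.data[key]

    def invoke(self, key: str, proc: Callable[[Optional[int]], Optional[int]]) -> None:
        # Apply an entry-processor-like function to whatever this node currently sees.
        cur = self.data.get(key)
        new = proc(cur.value if cur else None)
        if new is None:
            self.data.pop(key, None)
        else:
            self.data[key] = Entry(new, cur.expire_at if cur else float("inf"))


def incr(v: Optional[int]) -> Optional[int]:
    # Initialise a missing entry to 1, otherwise increment it.
    return 1 if v is None else v + 1


primary, backup = Node("primary"), Node("backup")
for node in (primary, backup):
    node.put("k", 10, expire_at=100.0)

primary.expire(now=100.0)  # the primary's expiry pass has already fired
backup.expire(now=99.5)    # the backup's pass ran slightly earlier

sizes_before = (len(primary.data), len(backup.data))  # partition sizes diverge
primary.invoke("k", incr)
backup.invoke("k", incr)

print(sizes_before)                                     # (0, 1)
print(primary.data["k"].value, backup.data["k"].value)  # 1 11
```

The same processor turns one logical entry into 1 on the primary and 11 on
the backup, and the partition sizes disagree even before it runs.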

In my opinion, the second issue is critical, and we must change the expiry
mechanism so that expiry runs in a distributed mode, with the removal of an
expired entry following the cache mode semantics for a regular remove.
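A rough sketch of that direction, again as a hypothetical toy model rather
than Ignite's actual implementation: only the primary decides that an entry
has expired, and the expiry is applied as an ordinary remove replicated to
the backup. `Partition`, `expire` and `invoke` are illustrative names only.

```python
from typing import Callable, Dict, Optional, Tuple

Copy = Dict[str, Tuple[int, float]]  # key -> (value, expire_at)


class Partition:
    """Hypothetical primary/backup pair where only the primary decides expiry."""

    def __init__(self) -> None:
        self.primary: Copy = {}
        self.backup: Copy = {}

    def put(self, key: str, value: int, expire_at: float) -> None:
        # Writes go through the primary and are replicated to the backup.
        for copy in (self.primary, self.backup):
            copy[key] = (value, expire_at)

    def remove(self, key: str) -> None:
        # A regular remove, applied to both copies.
        for copy in (self.primary, self.backup):
            copy.pop(key, None)

    def expire(self, now: float) -> None:
        # Only the primary scans for expired entries; each hit becomes a
        # replicated remove, so the backup never drops entries on its own.
        for key in [k for k, (_, t) in self.primary.items() if t <= now]:
            self.remove(key)

    def invoke(self, key: str, proc: Callable[[Optional[int]], Optional[int]]) -> None:
        # The processor is still applied to each copy independently, but both
        # copies now observe the same input value.
        for copy in (self.primary, self.backup):
            cur = copy.get(key)
            new = proc(cur[0] if cur else None)
            if new is None:
                copy.pop(key, None)
            else:
                copy[key] = (new, cur[1] if cur else float("inf"))


def incr(v: Optional[int]) -> Optional[int]:
    return 1 if v is None else v + 1


part = Partition()
part.put("k", 10, expire_at=100.0)
part.expire(now=100.0)
print(len(part.primary), len(part.backup))        # 0 0
part.invoke("k", incr)
print(part.primary["k"][0], part.backup["k"][0])  # 1 1
```

The trade-off is extra replication traffic for every expired entry, in
exchange for identical partition sizes and identical processor inputs on all
copies.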

Thoughts?
