Igniters,

We recently ran into some issues with TTL when persistence is enabled. Those issues were related to persistence implementation details. However, when we added tests to cover more cases, we found more failures, which, I think, reveal some fundamental issues with the expiry mechanism.
In short, the root cause is that we expire entries on primary and backup nodes independently, which means:
1) Partition sizes may differ between nodes at any given moment, which will trigger spurious failures of the recently added partition size check on partition map exchange.
2) More importantly, this may lead to inconsistent values on primary and backup nodes when an EntryProcessor is used, because the entry processor may observe a non-null value on one node and a null value on another.

In my opinion, the second issue is critical, and we must change the expiry mechanics so that expiry runs in a distributed mode, with cache mode semantics for entry removal.

Thoughts?
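
P.S. To make the second point concrete, here is a minimal sketch of the divergence. It is plain Java and does not use any Ignite API: the `Replica` and `Entry` classes, the timestamps, and the increment processor are all hypothetical stand-ins, and the model assumes each node removes expired entries only when its own local sweep runs.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.UnaryOperator;

public class ExpiryDivergenceSketch {
    /** A value plus its absolute expiry time (hypothetical model, not an Ignite class). */
    static final class Entry {
        final int value;
        final long expireAt;

        Entry(int value, long expireAt) {
            this.value = value;
            this.expireAt = expireAt;
        }
    }

    /** One copy of a partition; each replica expires entries on its own schedule. */
    static final class Replica {
        private final Map<String, Entry> store = new HashMap<>();

        void put(String key, int value, long expireAt) {
            store.put(key, new Entry(value, expireAt));
        }

        /** Local expiry sweep: drops entries whose expiry time is at or before localNow. */
        void expire(long localNow) {
            store.values().removeIf(e -> e.expireAt <= localNow);
        }

        /** Applies the processor to the current value (null if absent) and stores the result. */
        void invoke(String key, UnaryOperator<Integer> processor, long newExpireAt) {
            Entry e = store.get(key);
            Integer updated = processor.apply(e == null ? null : e.value);
            store.put(key, new Entry(updated, newExpireAt));
        }

        Integer get(String key) {
            Entry e = store.get(key);
            return e == null ? null : e.value;
        }
    }

    /** Runs the scenario and returns {primaryValue, backupValue}. */
    static int[] run() {
        Replica primary = new Replica();
        Replica backup = new Replica();

        // Both copies start consistent: value 10, expiring at t=100.
        primary.put("k", 10, 100L);
        backup.put("k", 10, 100L);

        // Independent sweeps: the primary has already reached t=100, the backup has not.
        primary.expire(100L);
        backup.expire(99L);

        // The same processor is applied on both nodes but observes different input.
        UnaryOperator<Integer> increment = v -> v == null ? 1 : v + 1;
        primary.invoke("k", increment, 200L);
        backup.invoke("k", increment, 200L);

        return new int[] { primary.get("k"), backup.get("k") };
    }

    public static void main(String[] args) {
        int[] r = run();
        System.out.println("primary=" + r[0] + ", backup=" + r[1]);
    }
}
```

In this model the primary ends up with 1 and the backup with 11, and nothing afterwards brings the two copies back in sync. If the removal were instead an ordinary cache operation ordered against the invoke, both copies would see the same input.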