Hi, Igniters!

The current implementation of IgniteSet is fragile with respect to cluster recovery from a checkpoint. We have an issue (IGNITE-5553) that addresses the set's size() behavior, but the problem is slightly broader. The text below is my comment from the Jira issue. I encourage you to discuss it.

We can put the current set size into the set header cache entry. This will fix size(), but the iterator() implementation is broken as well. Currently, the set implementation maintains plain Java sets on every node, see CacheDataStructuresManager.setDataMap. These sets duplicate the backing-cache entries, both primary and backup. size() and iterator() calls issue distributed queries to collect/filter data from all setDataMaps. And the setDataMaps remain empty after the cluster is recovered from a checkpoint.

I see the following options to fix the issue.

#1 - Naive. Iterate over all entries of the datastructure-backing caches during the checkpoint recovery procedure, filter the set-related entries and refill the setDataMaps.
Pros: easy to implement.
Cons: unpredictable time/memory overhead.

#2 - More realistic. Avoid node-local copies of cache data. Maintain a linked list in the datastructure-backing cache: the key is a set item, the value is the next set item. The list head is stored in the set header cache entry (this is the youngest set item, that is, the most recently added one). Iterators built on top of this structure are fail-fast.
Pros: less memory overhead, no need to maintain node-local mirrors of cache data.
Cons: iterators are not fail-safe.

#3 - Option #2 modified. We can store a reference counter and a 'removed' flag along with the next item reference. This allows us to make iterators fail-safe.
Pros: iterators are fail-safe.
Cons: slightly more complicated implementation, may affect performance. I also see no way to handle active iterators when remote nodes fail.

Best regards,
Andrey.
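
P.S. To make option #2 easier to discuss, here is a minimal single-JVM sketch of the linked-list layout. Everything in it is an assumption made for illustration: the class name LinkedSetSketch is hypothetical, a plain HashMap stands in for the datastructure-backing cache, the three fields stand in for the set header cache entry, and the modification counter is just one possible way to get fail-fast iterators. It uses no Ignite APIs and ignores distribution, transactions and backups.

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.Objects;
import java.util.Optional;

// Hypothetical sketch of option #2, not Ignite code.
final class LinkedSetSketch<T> implements Iterable<T> {
    // Stand-in for the backing cache: key = set item, value = next (older) item.
    // An empty Optional marks the tail of the list.
    private final Map<T, Optional<T>> links = new HashMap<>();

    // Stand-in for the set header cache entry.
    private T head;        // youngest item, null when the set is empty
    private int size;      // lets size() answer without scanning
    private int modCount;  // assumed mechanism for fail-fast iterators

    boolean add(T item) {
        Objects.requireNonNull(item);
        if (links.containsKey(item)) return false;
        links.put(item, Optional.ofNullable(head)); // new item points to old head
        head = item;
        size++;
        modCount++;
        return true;
    }

    boolean remove(T item) {
        Optional<T> next = links.get(item);
        if (next == null) return false;
        if (item.equals(head)) {
            head = next.orElse(null);
        } else {
            // Singly linked list: find the predecessor with a linear scan.
            T prev = head;
            while (!links.get(prev).equals(Optional.of(item))) {
                prev = links.get(prev).get();
            }
            links.put(prev, next); // unlink the removed item
        }
        links.remove(item);
        size--;
        modCount++;
        return true;
    }

    int size() {
        return size;
    }

    @Override
    public Iterator<T> iterator() {
        return new Iterator<T>() {
            private T cur = head;
            private final int expected = modCount;

            @Override
            public boolean hasNext() {
                return cur != null;
            }

            @Override
            public T next() {
                // Fail fast if the set changed after this iterator was created.
                if (expected != modCount) throw new ConcurrentModificationException();
                if (cur == null) throw new NoSuchElementException();
                T res = cur;
                cur = links.get(cur).orElse(null);
                return res;
            }
        };
    }
}
```

Note that removal in this sketch needs a linear scan to find the predecessor, because each entry only knows its successor. A real implementation would probably need something better, so treat this only as a way to picture the data layout.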