Hi Andrey,

Thanks for the detailed email. I think your suggestions make sense. Ignite
cannot afford to have a distributed set that is not fail-safe. Can you
please focus only on solutions that provide consistent behavior in case of
topology changes and failures and document them in the ticket?

https://issues.apache.org/jira/browse/IGNITE-5553

D.

On Mon, Oct 30, 2017 at 3:07 AM, Andrey Kuznetsov <stku...@gmail.com> wrote:

> Hi, Igniters!
>
> The current implementation of IgniteSet is fragile with respect to cluster
> recovery from a checkpoint. We have an issue (IGNITE-5553) that addresses
> the set's size() behavior, but the problem is slightly broader. The text
> below is my comment from the Jira issue. I encourage you to discuss it.
>
> We can put the current set size into the set header cache entry. This will
> fix size(), but the iterator() implementation is broken as well.
>
> Currently, the set implementation maintains plain Java sets on every node,
> see CacheDataStructuresManager.setDataMap. These sets duplicate the
> backing-cache entries, both primary and backup. size() and iterator() calls
> issue distributed queries to collect/filter data from all setDataMaps, and
> the setDataMaps remain empty after the cluster is recovered from a
> checkpoint.
>
> I see the following options to fix the issue.
>
> #1 - Naive. Iterate over all entries of the data-structure-backing caches
> during recovery from a checkpoint, filter the set-related entries, and
> refill the setDataMaps.
> Pros: easy to implement
> Cons: unpredictable time/memory overhead.
>
> #2 - More realistic. Avoid node-local copies of cache data. Maintain a
> linked list in the data-structure-backing cache: the key is a set item, the
> value is the next set item. The list head is stored in the set header cache
> entry (this item is the youngest one). Iterators built on top of this
> structure are fail-fast.
> Pros: less memory overhead, no need to maintain node-local mirrors of cache
> data
> Cons: iterators are not fail-safe.
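
A minimal, single-JVM sketch of the layout described in option #2, added here for illustration only. It is not Ignite code: a plain HashMap stands in for the backing cache, two fields stand in for the set header cache entry, and the class name LinkedCacheSet, the END sentinel and the version counter are all assumptions. Removal is left out because a singly linked list would need a walk to find the predecessor.

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import java.util.NoSuchElementException;

/** Hypothetical sketch of option #2: a set kept as a linked list in a key-value map. */
class LinkedCacheSet<T> implements Iterable<T> {
    /** End-of-list marker; a real cache would need a serializable sentinel. */
    private static final Object END = new Object();

    /** Stand-in for the backing cache: key is a set item, value is the next (older) item. */
    private final Map<T, Object> links = new HashMap<>();

    /** Stand-ins for the set header cache entry. */
    private Object head = END; // youngest item
    private int size;          // lets size() avoid a distributed query
    private long version;      // assumption: bumped on every change, used to fail fast

    /** Links the new item in front of the current head. */
    boolean add(T item) {
        if (links.containsKey(item))
            return false;
        links.put(item, head);
        head = item;
        size++;
        version++;
        return true;
    }

    boolean contains(T item) {
        return links.containsKey(item);
    }

    int size() {
        return size;
    }

    /** Fail-fast iterator: walks from the head and throws if the set has changed. */
    @Override public Iterator<T> iterator() {
        final long expected = version;
        return new Iterator<T>() {
            private Object next = head;

            @Override public boolean hasNext() {
                return next != END;
            }

            @SuppressWarnings("unchecked")
            @Override public T next() {
                if (version != expected)
                    throw new ConcurrentModificationException();
                if (next == END)
                    throw new NoSuchElementException();
                T cur = (T) next;
                next = links.get(cur);
                return cur;
            }
        };
    }
}
```

Items come back youngest first, and any add() made after an iterator is created makes its next call to next() throw, which is the "not fail-safe" drawback listed above.
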
>
> #3 - Option #2 modified. We can store a reference counter and a 'removed'
> flag along with the next-item reference. This makes it possible to make
> iterators fail-safe.
> Pros: iterators are fail-safe
> Cons: slightly more complicated implementation, may affect performance;
> I also see no way to handle active iterators when a remote node fails.
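
One possible reading of option #3, again as a single-JVM sketch and not as Ignite code. The assumptions: a HashMap stands in for the backing cache, an iterator "pins" the item it is positioned on by bumping its reference counter, remove() only sets the flag while an item is pinned, and the last iterator to leave a flagged item unlinks it. The names RefCountedCacheSet, Node and physicalSize() are hypothetical, and null items are rejected to keep the sketch short.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.Objects;

/** Hypothetical sketch of option #3; a HashMap stands in for the backing cache. */
class RefCountedCacheSet<T> implements Iterable<T> {
    /** Cache value: next (older) item plus the 'removed' flag and reference counter. */
    private static final class Node<T> {
        T next;          // null marks the end of the list
        boolean removed; // logically deleted, possibly still pinned by an iterator
        int refCount;    // number of iterators currently positioned on this item
    }

    private final Map<T, Node<T>> links = new HashMap<>();
    private T head;   // header entry: youngest item
    private int size; // header entry: number of live items

    boolean add(T item) {
        Objects.requireNonNull(item);
        Node<T> node = links.get(item);
        if (node != null) {
            if (!node.removed) return false;
            node.removed = false; // revive an item that is still pinned
        } else {
            node = new Node<>();
            node.next = head;
            links.put(item, node);
            head = item;
        }
        size++;
        return true;
    }

    boolean remove(T item) {
        Node<T> node = links.get(item);
        if (node == null || node.removed) return false;
        node.removed = true;
        size--;
        if (node.refCount == 0) unlink(item); // nobody is positioned here
        return true;
    }

    int size() { return size; }

    /** Entries still stored, including flagged ones that are kept for iterators. */
    int physicalSize() { return links.size(); }

    /** Physically drops an entry; O(n) because the list is singly linked. */
    private void unlink(T item) {
        Node<T> node = links.remove(item);
        if (item.equals(head)) { head = node.next; return; }
        for (T cur = head; cur != null; cur = links.get(cur).next) {
            Node<T> n = links.get(cur);
            if (item.equals(n.next)) { n.next = node.next; return; }
        }
    }

    private void release(T item) {
        Node<T> node = links.get(item);
        if (--node.refCount == 0 && node.removed) unlink(item);
    }

    /** Fail-safe iterator: never throws on concurrent changes, skips flagged items. */
    @Override public Iterator<T> iterator() {
        return new Iterator<T>() {
            private T pinned;                 // item this iterator holds a reference on
            private boolean started, fetched; // fetched: 'pinned' not yet returned

            @Override public boolean hasNext() {
                if (!fetched) { advance(); fetched = true; }
                return pinned != null;
            }

            @Override public T next() {
                if (!hasNext()) throw new NoSuchElementException();
                fetched = false;
                return pinned;
            }

            private void advance() {
                T cand = started ? links.get(pinned).next : head;
                started = true;
                while (cand != null && links.get(cand).removed)
                    cand = links.get(cand).next;
                if (cand != null) links.get(cand).refCount++; // pin before releasing
                if (pinned != null) release(pinned);
                pinned = cand;
            }
        };
    }
}
```

An iterator that is abandoned halfway never releases its pin, so a flagged entry would stay in the cache indefinitely. In this sketch that is the single-JVM form of the remote-node-failure problem listed under Cons.
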
>
>
> Best regards,
>
> Andrey.
>
