Pavel, I'm a bit confused. In my understanding, the issue exists because we have local in-memory maps which are used as the main source of truth about which structures currently exist. During restart we lose all this data, even if the data structure cache(s) are persisted. Once we fix this, the issue goes away, regardless of whether we store each data structure in its own cache or everything in a single cache. Am I missing something?
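
To illustrate what "fixing this" could look like, here is a very rough sketch (SetItemKey and the surrounding types below are simplified stand-ins, not the actual Ignite internals): on node start, walk the locally stored entries of the persisted data structure cache and refill the node-local map.

    import java.util.Set;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.ConcurrentMap;
    import javax.cache.Cache;
    import org.apache.ignite.IgniteCache;
    import org.apache.ignite.cache.CachePeekMode;
    import org.apache.ignite.lang.IgniteUuid;

    class SetRecoverySketch {
        /** Simplified stand-in for the internal set item key class. */
        interface SetItemKey {
            IgniteUuid setId();
            Object item();
        }

        /** Refills the node-local per-set maps from the (persisted) backing cache. */
        static void refillSetDataMap(IgniteCache<Object, Object> dsCache,
            ConcurrentMap<IgniteUuid, Set<Object>> setDataMap) {
            // Only entries stored on this node are relevant, since setDataMap
            // mirrors the local (primary + backup) cache content.
            for (Cache.Entry<Object, Object> e : dsCache.localEntries(CachePeekMode.PRIMARY, CachePeekMode.BACKUP)) {
                if (e.getKey() instanceof SetItemKey) {
                    SetItemKey key = (SetItemKey)e.getKey();

                    setDataMap.computeIfAbsent(key.setId(), id -> ConcurrentHashMap.newKeySet()).add(key.item());
                }
            }
        }
    }
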
I also agree with Dmitriy. While the cache-per-set approach can make sense for non-collocated sets, for collocated ones it definitely doesn't. So I would fix the original issue first, and then change the architecture if it's really needed.

-Val

On Fri, Feb 9, 2018 at 10:39 AM, Dmitriy Setrakyan <dsetrak...@apache.org> wrote:
> Hi Pavel,
>
> We have 2 types of data structures, collocated and non-collocated. The difference between them is that a collocated set is generally smaller and will always end up on the same node. Users will generally have many collocated sets. On the other hand, a non-collocated set can span multiple nodes and is therefore able to store a lot more data.
>
> I can see how the cache-per-set strategy can be applied to non-collocated sets. As a matter of fact, I would be surprised if it is not implemented that way already.
>
> However, I do not see this strategy applied to collocated sets. Users can have 1000s of collocated sets or more. Are you suggesting that this will translate into 1000s of caches?
>
> D.
>
> On Fri, Feb 9, 2018 at 8:10 AM, Pavel Pereslegin <xxt...@gmail.com> wrote:
>
> > Hello, Valentin.
> >
> > Thank you for the reply.
> >
> > As mentioned in this conversation, we currently have at least two issues with IgniteSet:
> > 1. Incorrect behavior after recovery from PDS [1].
> > 2. The data in the cache is duplicated on-heap [2], which is not documented and leads to heap/GC overhead when using large sets.
> >
> > Without significant changes, it is possible to solve [1] with the workaround proposed by Andrey Kuznetsov: iterate over all entries of the datastructure-backing caches during recovery from a checkpoint, filter out the set-related entries and refill the setDataMap's.
> > As a workaround for [2] we can add a configuration option that selects which data structure to use for "local caching" (on-heap or off-heap). If we go this way, the cache data duplication will remain, and some kind of off-heap ConcurrentHashMap would have to be implemented in Ignite (it probably already exists in some form; this topic needs to be investigated properly).
> >
> > On the other hand, if we use a separate cache for each IgniteSet instance:
> > 1. It will not be necessary to maintain redundant data stored somewhere other than the cache.
> > 2. It will not be necessary to implement a workaround for recovery from PDS.
> > For the collocated mode we can, for example, enforce REPLICATED cache mode.
> >
> > Why don't you like the idea of a separate cache?
> >
> > [1] https://issues.apache.org/jira/browse/IGNITE-7565
> > [2] https://issues.apache.org/jira/browse/IGNITE-5553
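> >
> > To make the cache-per-set idea above more concrete, a rough sketch (the cache naming, configuration and helper method are made up for illustration; this is not a worked-out design):
> >
> >     // Assumed imports: org.apache.ignite.Ignite, org.apache.ignite.IgniteCache,
> >     // org.apache.ignite.cache.CacheMode, org.apache.ignite.configuration.CacheConfiguration.
> >
> >     // Hypothetical: one dedicated backing cache per IgniteSet instance.
> >     static IgniteCache<Object, Boolean> setCache(Ignite ignite, String setName, boolean collocated) {
> >         CacheConfiguration<Object, Boolean> ccfg = new CacheConfiguration<>("datastructures-set-" + setName);
> >
> >         // For collocated mode we could, for example, enforce REPLICATED mode.
> >         ccfg.setCacheMode(collocated ? CacheMode.REPLICATED : CacheMode.PARTITIONED);
> >
> >         return ignite.getOrCreateCache(ccfg);
> >     }
> >
> >     // add(x)      -> cache.put(x, Boolean.TRUE)
> >     // contains(x) -> cache.containsKey(x)
> >     // size()      -> cache.size(), no node-local maps involved
> >     // close()     -> cache.destroy(), and with PDS enabled the data survives restarts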
> >
> >
> > 2018-02-09 0:44 GMT+03:00 Valentin Kulichenko <valentin.kuliche...@gmail.com>:
> > > Pavel,
> > >
> > > I don't like the idea of creating a separate cache for each data structure, especially for collocated ones. Actually, I'm not sure I understand how that would help. It sounds like we just need to properly persist the data structures cache and then reload it on restart.
> > >
> > > -Val
> > >
> > > On Thu, Feb 8, 2018 at 6:12 AM, Pavel Pereslegin <xxt...@gmail.com> wrote:
> > >
> > >> Hello, Igniters!
> > >>
> > >> We have some issues with the current IgniteSet implementation ([1], [2], [3], [4]).
> > >>
> > >> As was already described in this conversation, the main problem is that the current IgniteSet implementation maintains plain Java sets on every node (see CacheDataStructuresManager.setDataMap). These sets duplicate backing-cache entries, both primary and backup. size() and iterator() calls issue distributed queries to collect/filter data from all setDataMap's.
> > >>
> > >> I believe we can solve the specified issues if each instance of IgniteSet has a separate internal cache that is destroyed on close.
> > >>
> > >> What do you think about such a major change? Do you have any thoughts or objections?
> > >>
> > >> [1] https://issues.apache.org/jira/browse/IGNITE-7565
> > >> [2] https://issues.apache.org/jira/browse/IGNITE-5370
> > >> [3] https://issues.apache.org/jira/browse/IGNITE-5553
> > >> [4] https://issues.apache.org/jira/browse/IGNITE-6474
> > >>
> > >>
> > >> 2017-10-31 5:53 GMT+03:00 Dmitriy Setrakyan <dsetrak...@apache.org>:
> > >> > Hi Andrey,
> > >> >
> > >> > Thanks for the detailed email. I think your suggestions do make sense. Ignite cannot afford to have a distributed set that is not fail-safe. Can you please focus only on solutions that provide consistent behavior in case of topology changes and failures and document them in the ticket?
> > >> >
> > >> > https://issues.apache.org/jira/browse/IGNITE-5553
> > >> >
> > >> > D.
> > >> >
> > >> > On Mon, Oct 30, 2017 at 3:07 AM, Andrey Kuznetsov <stku...@gmail.com> wrote:
> > >> >
> > >> >> Hi, Igniters!
> > >> >>
> > >> >> The current implementation of IgniteSet is fragile with respect to cluster recovery from a checkpoint. We have an issue (IGNITE-5553) that addresses the set's size() behavior, but the problem is slightly broader. The text below is my comment from the Jira issue. I encourage you to discuss it.
> > >> >>
> > >> >> We can put the current set size into the set header cache entry. This will fix size(), but the iterator() implementation is broken as well.
> > >> >>
> > >> >> Currently, the set implementation maintains plain Java sets on every node, see CacheDataStructuresManager.setDataMap. These sets duplicate backing-cache entries, both primary and backup. size() and iterator() calls issue distributed queries to collect/filter data from all setDataMap's. And the setDataMap's remain empty after the cluster is recovered from a checkpoint.
> > >> >>
> > >> >> Now I see the following options to fix the issue.
> > >> >>
> > >> >> #1 - Naive. Iterate over all entries of the datastructure-backing caches during recovery from a checkpoint, filter the set-related entries and refill the setDataMap's.
> > >> >> Pros: easy to implement.
> > >> >> Cons: unpredictable time/memory overhead.
> > >> >>
> > >> >> #2 - More realistic. Avoid node-local copies of the cache data. Maintain a linked list in the datastructure-backing cache: the key is a set item, the value is the next set item. The list head is stored in the set header cache entry (this set item is the youngest one). Iterators built on top of this structure are fail-fast.
> > >> >> Pros: less memory overhead, no need to maintain node-local mirrors of the cache data.
> > >> >> Cons: iterators are not fail-safe.
> > >> >>
> > >> >> #3 - Option #2 modified. We can store a reference counter and a 'removed' flag along with the next item reference. This allows making the iterators fail-safe.
> > >> >> Pros: iterators are fail-safe.
> > >> >> Cons: slightly more complicated implementation, may affect performance; also, I see no way to handle active iterators when remote nodes fail.
> > >> >>
> > >> >>
> > >> >> Best regards,
> > >> >>
> > >> >> Andrey.
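
For reference, a minimal sketch of the entry layout described in options #2 and #3 above (all class and field names here are hypothetical, not part of Ignite):

    import java.io.Serializable;

    /** Value stored per set item (options #2/#3): the entries form a linked list inside the backing cache. */
    class SetItemValue implements Serializable {
        Object nextItem;   // next (older) set item; null for the last item in the list
        int refCnt;        // option #3: reference counter, e.g. iterators currently holding this item
        boolean removed;   // option #3: item is logically removed, so concurrent iterators can skip it safely
    }

    /** Set header cache entry: keeps the list head and the size, so both survive restarts. */
    class SetHeader implements Serializable {
        Object head;       // the youngest (most recently added) item
        int size;          // maintained together with add()/remove()
    }

Iterators would then follow nextItem references starting from SetHeader.head, and size() would read SetHeader.size instead of querying node-local maps.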