Hi, a bigger B+Tree means more operations to find a value. A user may expect that a cache holding 20 entries (e.g. a dictionary) will have great get() and put() performance.
But instead (if the single global cache group became the default), such caches would take the same amount of time as a huge cache with millions of records.

On Wed, Oct 4, 2017 at 8:39 AM, Vladimir Ozerov <voze...@gridgain.com> wrote:

> I do not think that a bigger B+Tree matters much. I was talking only about
> data blocks. When you have a lot of logical caches, all of them are mixed
> in the same data blocks. As a result you typically have to perform more IO
> operations to read the same amount of data, as data block content becomes
> more "chaotic".
>
> Currently all scans go through the primary index.
>
> On Wed, Oct 4, 2017 at 12:24 AM, Denis Magda <dma...@apache.org> wrote:
>
> > Vladimir,
> >
> > Thanks for the explanation, and see inline.
> >
> > > On Oct 3, 2017, at 12:57 PM, Vladimir Ozerov <voze...@gridgain.com> wrote:
> > >
> > > Denis,
> > >
> > > This is not a "must have", nor can I call it a "feature". We have
> > > internal partition state metadata. When there are a lot of caches,
> > > there is a lot of metadata. It consumes local Java heap, causes high
> > > network traffic on rebalance, and requires Ignite to create a lot of
> > > files when persistence is enabled, which slows down checkpoints. All
> > > these problems could be resolved by a better storage architecture and
> > > by "joining" the partition maps of caches with the same affinity
> > > functions at runtime.
> > >
> > > But this is difficult, so we created "cache groups" as a kind of
> > > shortcut. It saves heap, saves network, and reduces the number of
> > > files. But it comes at a cost - now a single data page contains data
> > > from different caches. This causes a higher-than-usual miss rate (and
> > > as a result more OS calls) for random cache operations and index
> > > lookups.
> >
> > Do you mean a longer traverse of the B+tree under the "higher miss
> > rate"? Has anybody measured the impact? Personally, for me log(n1) is
> > not that different from log(n1 + n2 + n3) unless n is a big coefficient.
> > > In the future it will also cause poor compression rates when
> > > compression is implemented, and it will cause poor scan performance
> > > when efficient scans are implemented.
> >
> > How do we scan grouped caches presently? Simply by filtering out the
> > entries not belonging to the cache of interest?
> >
> > > To summarize, we *SHOULD NOT* advise users to use this feature unless
> > > they have problems with high heap usage due to partition maps, or poor
> > > checkpointing performance due to excessive fsyncs.
> >
> > Ivan R., Alex G., could you comment on the checkpointing performance? I
> > don't get why the number of open files affects it. What should matter is
> > the frequency of fsync, shouldn't it? If we have fewer files, then the
> > frequency will soar, since every cache writes into a single destination.
> >
> > Vladimir, what about the long joining process and rebalancing kick-off
> > on node failure? I heard the number of partition maps influences this,
> > and I put it on paper.
> >
> > —
> > Denis
> >
> > > On Tue, Oct 3, 2017 at 10:48 PM, Denis Magda <dma...@apache.org> wrote:
> > >
> > >> Vladimir,
> > >>
> > >> Please share more details that I can put on the paper. Presently the
> > >> feature is described as a must-have, and I struggled to find any
> > >> information on its negative impact.
> > >>
> > >> —
> > >> Denis
> > >>
> > >>> On Oct 3, 2017, at 12:46 PM, Vladimir Ozerov <voze...@gridgain.com> wrote:
> > >>>
> > >>> Denis,
> > >>>
> > >>> This feature should not be enabled by default, as it negatively
> > >>> affects read performance.
> > >>>
> > >>> On Tue, Oct 3, 2017 at 10:31 PM, Denis Magda <dma...@apache.org> wrote:
> > >>>
> > >>>> Sam,
> > >>>>
> > >>>> Is there any technical limitation that prevents us from assigning
> > >>>> caches with similar parameters to relevant groups on-the-fly?
> > >>>>
> > >>>> After finishing the doc, I'm convinced the feature should be
> > >>>> enabled by default unless there are some pitfalls not known to me.
> > >>>>
> > >>>> BTW, I decided to avoid the "logical caches" term, falling back to
> > >>>> the more vivid cache groups notion:
> > >>>> https://apacheignite.readme.io/docs/cache-groups
> > >>>>
> > >>>> —
> > >>>> Denis
> > >>>>
> > >>>>> On Oct 3, 2017, at 12:10 AM, Semyon Boikov <sboi...@gridgain.com> wrote:
> > >>>>>
> > >>>>> Hi,
> > >>>>>
> > >>>>> Regarding the question about the default cache group: by default,
> > >>>>> cache groups are not enabled; each cache is started in a separate
> > >>>>> group. A cache group is enabled only if groupName is set in
> > >>>>> CacheConfiguration.
> > >>>>>
> > >>>>> Thanks
> > >>>>>
> > >>>>> On Sat, Sep 30, 2017 at 11:55 PM, <dsetrak...@apache.org> wrote:
> > >>>>>
> > >>>>>> Why not? Obviously compression would have to be enabled per
> > >>>>>> group, not per cache.
> > >>>>>>
> > >>>>>> D.
> > >>>>>>
> > >>>>>> On Sep 29, 2017, at 10:50 PM, Vladimir Ozerov <voze...@gridgain.com> wrote:
> > >>>>>>> And it will continue hitting us in the future. For example, when
> > >>>>>>> data compression is implemented, the compression rate for
> > >>>>>>> logical caches will be poor, as it would be impossible to build
> > >>>>>>> efficient dictionaries in mixed data pages.
> > >>>>>>>
> > >>>>>>> On Sat, Sep 30, 2017 at 8:48 AM, Vladimir Ozerov <voze...@gridgain.com> wrote:
> > >>>>>>>
> > >>>>>>>> Folks,
> > >>>>>>>>
> > >>>>>>>> Honestly, to me logical caches appear to be a dirty shortcut to
> > >>>>>>>> mitigate some inefficient internal implementation. Why can't we
> > >>>>>>>> merge partition maps at runtime? This should not be a problem
> > >>>>>>>> for context-independent affinity functions (e.g.
> > >>>>>>>> RendezvousAffinityFunction). From a user perspective, the
> > >>>>>>>> logical caches feature is:
> > >>>>>>>> 1) Bad API. One cannot define a group configuration. All you
> > >>>>>>>> can do is define a group name at the cache level and hope that
> > >>>>>>>> nobody started another cache in the same group with a different
> > >>>>>>>> configuration before.
> > >>>>>>>> 2) A performance impact for scans, as you have to iterate over
> > >>>>>>>> mixed data.
> > >>>>>>>>
> > >>>>>>>> Couldn't we fix the partition map problem without cache groups?
> > >>>>>>>>
> > >>>>>>>> On Sat, Sep 30, 2017 at 2:35 AM, Denis Magda <dma...@apache.org> wrote:
> > >>>>>>>>
> > >>>>>>>>> Guys,
> > >>>>>>>>>
> > >>>>>>>>> Another question. Is this capability enabled by default? If
> > >>>>>>>>> yes, how do we decide which group a cache goes to?
> > >>>>>>>>>
> > >>>>>>>>> —
> > >>>>>>>>> Denis
> > >>>>>>>>>
> > >>>>>>>>>> On Sep 29, 2017, at 3:58 PM, Denis Magda <dma...@apache.org> wrote:
> > >>>>>>>>>>
> > >>>>>>>>>> Igniters,
> > >>>>>>>>>>
> > >>>>>>>>>> I've put on paper the feature from the subj:
> > >>>>>>>>>> https://apacheignite.readme.io/docs/logical-caches
> > >>>>>>>>>>
> > >>>>>>>>>> Sam, I will appreciate it if you read through it and confirm
> > >>>>>>>>>> I explained the topic 100% technically correctly.
> > >>>>>>>>>>
> > >>>>>>>>>> However, are there any negative impacts of having logical
> > >>>>>>>>>> caches? This page has the "Possible Impacts" section unfilled:
> > >>>>>>>>>> https://cwiki.apache.org/confluence/display/IGNITE/Logical+Caches
> > >>>>>>>>>>
> > >>>>>>>>>> —
> > >>>>>>>>>> Denis
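
For anyone following along, Semyon's point above (cache groups are off by default and are enabled only by setting groupName in CacheConfiguration) looks roughly like this. This is a minimal configuration sketch, not a definitive example; the cache names ("persons", "organizations", "dictionary") and the group name ("sharedGroup") are made up for illustration, and it assumes the public CacheConfiguration API from Ignite 2.1+:

```java
import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.CacheConfiguration;

public class CacheGroupSketch {
    public static void main(String[] args) {
        try (Ignite ignite = Ignition.start()) {
            // Two caches explicitly placed in the same group: their entries
            // share partition files and data pages, which is exactly the
            // read-performance trade-off debated in this thread.
            CacheConfiguration<Integer, String> persons =
                new CacheConfiguration<Integer, String>("persons")
                    .setGroupName("sharedGroup");
            CacheConfiguration<Integer, String> orgs =
                new CacheConfiguration<Integer, String>("organizations")
                    .setGroupName("sharedGroup");

            // No groupName set: this cache gets its own group (the default),
            // so a small cache keeps its own small B+Tree and data pages.
            CacheConfiguration<Integer, String> dictionary =
                new CacheConfiguration<>("dictionary");

            ignite.getOrCreateCache(persons);
            ignite.getOrCreateCache(orgs);
            ignite.getOrCreateCache(dictionary);
        }
    }
}
```

The grouped layout trades per-cache partition metadata and files for mixed data pages, which is why the thread recommends it only when heap usage from partition maps or checkpointing cost is the actual problem.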