Hi, a bigger B+Tree means more operations to find a value. A user may expect that a cache holding 20 entries (e.g. a dictionary) will have great get() and put() performance.
But instead (if the single global cache group became the default), such caches would take the same amount of time as a huge cache with millions of records.

On Wed, Oct 4, 2017 at 8:39 AM, Vladimir Ozerov <voze...@gridgain.com> wrote:

> I do not think that a bigger B+Tree matters much. I was talking only about
> data blocks. When you have a lot of logical caches, all of them are mixed
> in the same data blocks. As a result you typically have to perform more IO
> operations to read the same amount of data, as data block content becomes
> more "chaotic".
>
> Currently all scans go through the primary index.
>
> On Wed, Oct 4, 2017 at 12:24 AM, Denis Magda <dma...@apache.org> wrote:
>
> > Vladimir,
> >
> > Thanks for the explanation, and see inline.
> >
> > > On Oct 3, 2017, at 12:57 PM, Vladimir Ozerov <voze...@gridgain.com> wrote:
> > >
> > > Denis,
> > >
> > > This is not a "must have", nor can I call it a "feature". We have
> > > internal partition state metadata. When there are a lot of caches,
> > > there is a lot of metadata. It consumes local Java heap, causes high
> > > network traffic on rebalance, and requires Ignite to create a lot of
> > > files when persistence is enabled, which slows down checkpoints. All
> > > these problems could be resolved by a better storage architecture and
> > > by "joining" the partition maps of caches with the same affinity
> > > functions at runtime.
> > >
> > > But this is difficult, so we created "cache groups" as a kind of
> > > shortcut. It saves heap, saves network, and reduces the number of
> > > files. But it comes at a cost - now a single data page contains data
> > > from different caches. This causes a higher-than-usual miss rate (and
> > > as a result more OS calls) for random cache operations and index
> > > lookups.
> >
> > Do you mean a longer traverse of the B+tree under the "higher miss
> > rate"? Has anybody measured the impact? Personally, for me log(n1) is
> > not that different from log(n1 + n2 + n3) unless n is a big coefficient.
> > > In the future it will also cause poor compression rates when
> > > compression is implemented, and it will cause poor scan performance
> > > when efficient scans are implemented.
> >
> > How do we scan grouped caches presently? Simply by filtering out the
> > entries not belonging to the cache of interest?
> >
> > > To summarize, we *SHOULD NOT* advise users to use this feature unless
> > > they have problems with high heap usage due to partition maps, or poor
> > > checkpointing performance due to excessive fsyncs.
> >
> > Ivan R., Alex G., could you comment on the checkpointing performance? I
> > don't get why the number of open files affects it. What should matter is
> > the frequency of fsync, shouldn't it? If we have fewer files, then the
> > frequency will soar, since every cache writes into a single destination.
> >
> > Vladimir, what about the long joining process and rebalancing kick-off
> > on node failure? I heard the number of partition maps influences this,
> > and I put it on paper.
> >
> > —
> > Denis
> >
> > > On Tue, Oct 3, 2017 at 10:48 PM, Denis Magda <dma...@apache.org> wrote:
> > >
> > >> Vladimir,
> > >>
> > >> Please share more details that I can put on the paper. Presently the
> > >> feature is described as a must-have, and I struggled to find any
> > >> information on its negative impact.
> > >>
> > >> —
> > >> Denis
> > >>
> > >>> On Oct 3, 2017, at 12:46 PM, Vladimir Ozerov <voze...@gridgain.com> wrote:
> > >>>
> > >>> Denis,
> > >>>
> > >>> This feature should not be enabled by default, as it negatively
> > >>> affects read performance.
> > >>>
> > >>> On Tue, Oct 3, 2017 at 10:31 PM, Denis Magda <dma...@apache.org> wrote:
> > >>>
> > >>>> Sam,
> > >>>>
> > >>>> Is there any technical limitation that prevents us from assigning
> > >>>> caches with similar parameters to relevant groups on-the-fly?
> > >>>>
> > >>>> After finishing the doc, I'm convinced the feature should be
> > >>>> enabled by default unless there are some pitfalls not known to me.
> > >>>>
> > >>>> BTW, I decided to avoid the "logical caches" term, falling back to
> > >>>> the more vivid cache groups notion:
> > >>>> https://apacheignite.readme.io/docs/cache-groups
> > >>>>
> > >>>> —
> > >>>> Denis
> > >>>>
> > >>>>> On Oct 3, 2017, at 12:10 AM, Semyon Boikov <sboi...@gridgain.com> wrote:
> > >>>>>
> > >>>>> Hi,
> > >>>>>
> > >>>>> Regarding the question about the default cache group: by default,
> > >>>>> cache groups are not enabled; each cache is started in a separate
> > >>>>> group. A cache group is enabled only if groupName is set in
> > >>>>> CacheConfiguration.
> > >>>>>
> > >>>>> Thanks
> > >>>>>
> > >>>>> On Sat, Sep 30, 2017 at 11:55 PM, <dsetrak...@apache.org> wrote:
> > >>>>>
> > >>>>>> Why not? Obviously compression would have to be enabled per
> > >>>>>> group, not per cache.
> > >>>>>>
> > >>>>>> D.
> > >>>>>>
> > >>>>>> On Sep 29, 2017, at 10:50 PM, Vladimir Ozerov <voze...@gridgain.com> wrote:
> > >>>>>>> And it will continue hitting us in the future. For example, when
> > >>>>>>> data compression is implemented, the compression rate for
> > >>>>>>> logical caches will be poor, as it would be impossible to build
> > >>>>>>> efficient dictionaries in mixed data pages.
> > >>>>>>>
> > >>>>>>> On Sat, Sep 30, 2017 at 8:48 AM, Vladimir Ozerov <voze...@gridgain.com> wrote:
> > >>>>>>>
> > >>>>>>>> Folks,
> > >>>>>>>>
> > >>>>>>>> Honestly, to me logical caches appear to be a dirty shortcut to
> > >>>>>>>> mitigate some inefficient internal implementation. Why can't we
> > >>>>>>>> merge partition maps at runtime? This should not be a problem
> > >>>>>>>> for context-independent affinity functions (e.g.
> > >>>>>>>> RendezvousAffinityFunction). From a user perspective, the
> > >>>>>>>> logical caches feature is:
> > >>>>>>>> 1) Bad API. One cannot define a group configuration. All you
> > >>>>>>>> can do is define a group name at the cache level and hope that
> > >>>>>>>> nobody started another cache in the same group with a different
> > >>>>>>>> configuration before.
> > >>>>>>>> 2) A performance impact for scans, as you have to iterate over
> > >>>>>>>> mixed data.
> > >>>>>>>>
> > >>>>>>>> Couldn't we fix the partition map problem without cache groups?
> > >>>>>>>>
> > >>>>>>>> On Sat, Sep 30, 2017 at 2:35 AM, Denis Magda <dma...@apache.org> wrote:
> > >>>>>>>>
> > >>>>>>>>> Guys,
> > >>>>>>>>>
> > >>>>>>>>> Another question. Is this capability enabled by default? If
> > >>>>>>>>> yes, how do we decide which group a cache goes to?
> > >>>>>>>>>
> > >>>>>>>>> —
> > >>>>>>>>> Denis
> > >>>>>>>>>
> > >>>>>>>>>> On Sep 29, 2017, at 3:58 PM, Denis Magda <dma...@apache.org> wrote:
> > >>>>>>>>>>
> > >>>>>>>>>> Igniters,
> > >>>>>>>>>>
> > >>>>>>>>>> I've put on paper the feature from the subj:
> > >>>>>>>>>> https://apacheignite.readme.io/docs/logical-caches
> > >>>>>>>>>>
> > >>>>>>>>>> Sam, I will appreciate it if you read through it and confirm
> > >>>>>>>>>> I explained the topic 100% technically correctly.
> > >>>>>>>>>>
> > >>>>>>>>>> However, are there any negative impacts of having logical
> > >>>>>>>>>> caches? This page has the "Possible Impacts" section unfilled:
> > >>>>>>>>>> https://cwiki.apache.org/confluence/display/IGNITE/Logical+Caches
> > >>>>>>>>>>
> > >>>>>>>>>> —
> > >>>>>>>>>> Denis
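
For anyone following along, Semyon's point above (cache groups are off by default and are enabled only by setting groupName in CacheConfiguration) looks roughly like this. This is a minimal configuration sketch, not a definitive example; the cache names ("persons", "organizations", "dictionary") and the group name ("sharedGroup") are made up for illustration, and it assumes the public CacheConfiguration API from Ignite 2.1+:

```java
import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.CacheConfiguration;

public class CacheGroupSketch {
    public static void main(String[] args) {
        try (Ignite ignite = Ignition.start()) {
            // Two caches explicitly placed in the same group: their entries
            // share partition files and data pages, which is exactly the
            // read-performance trade-off debated in this thread.
            CacheConfiguration<Integer, String> persons =
                new CacheConfiguration<Integer, String>("persons")
                    .setGroupName("sharedGroup");
            CacheConfiguration<Integer, String> orgs =
                new CacheConfiguration<Integer, String>("organizations")
                    .setGroupName("sharedGroup");

            // No groupName set: this cache gets its own group (the default),
            // so a small cache keeps its own small B+Tree and data pages.
            CacheConfiguration<Integer, String> dictionary =
                new CacheConfiguration<>("dictionary");

            ignite.getOrCreateCache(persons);
            ignite.getOrCreateCache(orgs);
            ignite.getOrCreateCache(dictionary);
        }
    }
}
```

The grouped layout trades per-cache partition metadata and files for mixed data pages, which is why the thread recommends it only when heap usage from partition maps or checkpointing cost is the actual problem.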