Re: cache size() calculation for MVCC

Sergey Kalashnikov Wed, 25 Apr 2018 02:54:39 -0700

Guys,

The cache size counter is actually a set of per-partition counters.


2018-04-25 12:45 GMT+03:00 Vladimir Ozerov <voze...@gridgain.com>:
> Yakov,
>
> Thread-per-partition is hardly applicable for general SQL use case as user
> operates on arbitrary data sets. But in general we may track size deltas for
> partitions on transaction level. If transaction span one or several
> partitions, we may hold this data in a single long or Map. If transaction
> spans a lot of partitions, we may store this data in array.
>
> What do you think?
>
> On Wed, Apr 25, 2018 at 12:18 PM, Yakov Zhdanov <yzhda...@apache.org> wrote:
>>
>> Guys,
>>
>> How do we update counter right now?
>>
>> If we move to fair thread-per-partition we can update counter only if we
>> add new key and skip if we add or remove a version. Thoughts?
>>
>> --Yakov
>>
>> 2018-04-25 12:07 GMT+03:00 Vladimir Ozerov <voze...@gridgain.com>:
>>
>> > This is interesting question. Full-scan size may be tremendously slow
>> > operation on large data sets. On the other hand, printing total number
>> > of
>> > tuples including old and aborted versions make little to no sense as
>> > well.
>> > Looks like we need to choose lesser of two evils. What if we do the
>> > following:
>> > 1) Left default behavior as is - O(1) complexity, but includes invalid
>> > versions
>> > 2) As Sergey suggested, add new peek mode "MVCC_ALIVE_ONLY" which will
>> > perform full scan.
>> >
>> > Alternatively we may throw an "UnsupportedOperationException" from this
>> > method - why not?
>> >
>> > Thoughts?
>> >
>> > On Tue, Apr 24, 2018 at 4:28 PM, Sergey Kalashnikov
>> > <zkilling...@gmail.com
>> > >
>> > wrote:
>> >
>> > > Hi Igniters,
>> > >
>> > > I need your advice on a task at hand.
>> > >
>> > > Currently cache API size() is a constant time operation, since the
>> > > number of entries is maintained as a separate counter.
>> > > However, for MVCC-enabled cache there can be multiple versions of the
>> > > same entry.
>> > > In order to calculate the size we need to obtain a MVCC snapshot and
>> > > then iterate over data pages filtering invisible versions.
>> > > So, it is impossible to keep the same complexity guarantees.
>> > >
>> > > My current implementation internally switches to "full-scan" approach
>> > > if cache in question is a MVCC-enabled cache.
>> > > It happens unbeknown to users, which may expect lightning-fast
>> > > response as before.
>> > > Perhaps, we might add a new constant to CachePeekMode enumeration that
>> > > is passed to cache size() to make it explicit?
>> > >
>> > > The second concern is that cache size calculation is also included
>> > > into Cache Metrics API and Visor functionality.
>> > > Will it be OK for metrics and things alike to keep returning raw
>> > > unfiltered number of entries?
>> > > Is there any sense in showing raw unfiltered number of entries which
>> > > may vary greatly from invokation to invokation with just simple
>> > > updates running in background?
>> > >
>> > > Please share your thoughts.
>> > >
>> > > Thanks in advance.
>> > > --
>> > > Sergey
>> > >
>> >
>
>

Re: cache size() calculation for MVCC

Reply via email to