Guys,

The cache size counter is actually a set of per-partition counters.
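
For illustration only, here is a rough sketch of how per-partition counters could be combined with the per-transaction size deltas Vladimir proposes below. This is not Ignite's actual code: the class and method names (PartitionedSizeCounter, TxSizeDeltas, onKeyAdded, onKeyRemoved) are made up, and it assumes the transaction can tell "a new key was added" apart from "another version of an existing key was added".

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLongArray;

/** Hypothetical: one counter per partition; cache size is their sum. */
final class PartitionedSizeCounter {
    private final AtomicLongArray sizes;

    PartitionedSizeCounter(int partitions) {
        sizes = new AtomicLongArray(partitions);
    }

    /** Applies the net size change a committed transaction made to one partition. */
    void applyDelta(int partition, long delta) {
        sizes.addAndGet(partition, delta);
    }

    /** Cost grows with the number of partitions, not with the number of entries. */
    long size() {
        long total = 0;
        for (int p = 0; p < sizes.length(); p++)
            total += sizes.get(p);
        return total;
    }
}

/** Hypothetical: size deltas collected by a single transaction, keyed by partition. */
final class TxSizeDeltas {
    // A Map suits transactions touching few partitions; an array indexed by
    // partition could replace it when a transaction spans many of them.
    private final Map<Integer, Long> deltas = new HashMap<>();

    /** Called only when a key appears for the first time, not for a new version. */
    void onKeyAdded(int partition) {
        deltas.merge(partition, 1L, Long::sum);
    }

    /** Called only when a key disappears, not when a single version is removed. */
    void onKeyRemoved(int partition) {
        deltas.merge(partition, -1L, Long::sum);
    }

    /** On commit the accumulated deltas become visible in the shared counters. */
    void commit(PartitionedSizeCounter counter) {
        deltas.forEach((partition, delta) -> counter.applyDelta(partition, delta));
        deltas.clear();
    }
}
```

In this sketch a rolled-back transaction simply drops its TxSizeDeltas, so aborted versions never reach the counters. Whether the summed value matches what a particular MVCC snapshot would see is a separate question that the sketch does not address.
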

2018-04-25 12:45 GMT+03:00 Vladimir Ozerov <voze...@gridgain.com>:
> Yakov,
>
> Thread-per-partition is hardly applicable to the general SQL use case, as
> the user operates on arbitrary data sets. But in general we may track size
> deltas for partitions at the transaction level. If a transaction spans one
> or several partitions, we may hold this data in a single long or a Map. If
> a transaction spans a lot of partitions, we may store this data in an array.
>
> What do you think?
>
> On Wed, Apr 25, 2018 at 12:18 PM, Yakov Zhdanov <yzhda...@apache.org> wrote:
>>
>> Guys,
>>
>> How do we update the counter right now?
>>
>> If we move to fair thread-per-partition, we can update the counter only
>> when we add a new key, and skip the update when we add or remove a
>> version. Thoughts?
>>
>> --Yakov
>>
>> 2018-04-25 12:07 GMT+03:00 Vladimir Ozerov <voze...@gridgain.com>:
>>
>> > This is an interesting question. A full-scan size() may be a
>> > tremendously slow operation on large data sets. On the other hand,
>> > printing the total number of tuples, including old and aborted
>> > versions, makes little to no sense either.
>> > Looks like we need to choose the lesser of two evils. What if we do the
>> > following:
>> > 1) Leave the default behavior as is: O(1) complexity, but the result
>> > includes invalid versions.
>> > 2) As Sergey suggested, add a new peek mode "MVCC_ALIVE_ONLY" which
>> > will perform a full scan.
>> >
>> > Alternatively we may throw an "UnsupportedOperationException" from this
>> > method - why not?
>> >
>> > Thoughts?
>> >
>> > On Tue, Apr 24, 2018 at 4:28 PM, Sergey Kalashnikov
>> > <zkilling...@gmail.com> wrote:
>> >
>> > > Hi Igniters,
>> > >
>> > > I need your advice on a task at hand.
>> > >
>> > > Currently the cache API size() is a constant-time operation, since
>> > > the number of entries is maintained as a separate counter.
>> > > However, for an MVCC-enabled cache there can be multiple versions of
>> > > the same entry.
>> > > In order to calculate the size we need to obtain an MVCC snapshot and
>> > > then iterate over data pages, filtering out invisible versions.
>> > > So, it is impossible to keep the same complexity guarantees.
>> > >
>> > > My current implementation internally switches to the "full-scan"
>> > > approach if the cache in question is an MVCC-enabled cache.
>> > > This happens without the users' knowledge, and they may expect a
>> > > lightning-fast response as before.
>> > > Perhaps we might add a new constant to the CachePeekMode enumeration
>> > > that is passed to cache size() to make it explicit?
>> > >
>> > > The second concern is that the cache size calculation is also
>> > > included in the Cache Metrics API and Visor functionality.
>> > > Will it be OK for metrics and the like to keep returning the raw
>> > > unfiltered number of entries?
>> > > Is there any sense in showing the raw unfiltered number of entries,
>> > > which may vary greatly from invocation to invocation with just simple
>> > > updates running in the background?
>> > >
>> > > Please share your thoughts.
>> > >
>> > > Thanks in advance.
>> > > --
>> > > Sergey