I seem to be in the minority here :) Fine, let's make it as clear as possible which metric method (localCacheSize) should be called in order to retrieve a 100% progress milestone. I've left comments in the PR.
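To make the calculation under discussion concrete, here is a minimal sketch of the [processedKeys] / [localCacheSize] ratio. The class and method names are hypothetical and not part of Ignite. Both inputs are plain numbers; I assume they come from whichever metrics end up exposing the processed key count and the node-local cache size. Treating an empty local cache as "done" and clamping to 1 are my assumptions too.

```java
/** Hypothetical helper, not an Ignite class: turns two raw counters into a progress fraction. */
final class IndexRebuildProgress {
    private IndexRebuildProgress() {
        // Static helper only.
    }

    /**
     * @param processedKeys  keys already processed by the index rebuild on this node
     * @param localCacheSize node-local cache size, used as the 100% milestone
     * @return progress in the range [0, 1]
     */
    static double progress(long processedKeys, long localCacheSize) {
        if (processedKeys < 0 || localCacheSize < 0)
            throw new IllegalArgumentException("Counters must be non-negative");

        // Assumption: an empty local cache has nothing to rebuild, so report it as complete.
        if (localCacheSize == 0)
            return 1.0;

        // The cache can change while the rebuild runs, so clamp the ratio to 1.
        return Math.min(1.0, (double) processedKeys / localCacheSize);
    }

    public static void main(String[] args) {
        System.out.println(progress(250, 1_000)); // prints 0.25
    }
}
```

The helper only shows which value plays the role of the 100% milestone; it does not settle whether the division should happen on the Ignite side or in the monitoring system.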
On Tue, Aug 11, 2020 at 4:31 PM Nikolay Izhikov <nizhi...@apache.org> wrote:

> > I propose to stick with a cache-group level metric (e.g.
> > getIndexBuildProgress)
>
> +1
>
> > that returns a float from 0 to 1, which is calculated as
> > [processedKeys] / [localCacheSize].
>
> From my point of view, we shouldn’t do calculations on the Ignite side if
> we can avoid it.
> I’d rather provide two separate metrics - processedKeys and localCacheSize.
>
> On Aug 11, 2020, at 16:26, Ivan Rakov <ivan.glu...@gmail.com> wrote:
> >
> >> As a compromise, I can add JMX methods (rebuilding indexes in the process
> >> and the percentage of rebuilding) for the entire node, but I tried to find
> >> a suitable place and did not find it, tell me where to add it?
> >
> > I have checked existing JMX beans. To be honest, I struggle to find a
> > suitable place as well.
> > We have ClusterMetrics that may represent the state of a local node, but
> > this class is also used for aggregated cluster metrics. I can't propose a
> > reasonable way to merge percentages from different nodes.
> > On the other hand, total index rebuild for all caches isn't a common
> > scenario. It's either performed after manual index.bin removal or after
> > index creation, both operations are performed on cache / cache-group level.
> > Also, all other similar metrics are provided on cache-group level.
> >
> > I propose to stick with a cache-group level metric (e.g.
> > getIndexBuildProgress) that returns a float from 0 to 1, which is
> > calculated as [processedKeys] / [localCacheSize]. Even if a user handles
> > metrics through Zabbix, I anticipate that he'll perform this calculation on
> > his own in order to estimate progress. Let's help him a bit and perform it
> > on the system side.
> > If a per-group percentage metric is present, I think
> > getIndexRebuildKeyProcessed becomes redundant.
> >
> > On Tue, Aug 11, 2020 at 8:20 AM ткаленко кирилл <tkalkir...@yandex.ru>
> > wrote:
> >
> >> Hi, Ivan!
> >>
> >> What precision would be sufficient?
> >>> If the progress is very slow, I don't see issues with tracking it if the
> >>> percentage float has enough precision.
> >>
> >> I think we can add a mention of how to get the cache size.
> >>> 1. Gain an understanding that local cache size
> >>> (CacheMetricsImpl#getCacheSize) should be used as a 100% milestone (it
> >>> is mentioned neither in the javadoc nor in the JMX method description).
> >>
> >> Do you think users collect metrics by hand? I think this is done
> >> by other systems, such as Zabbix.
> >>> 2. Manually calculate the sum of all metrics and divide it by the sum of
> >>> all cache sizes.
> >>
> >> As a compromise, I can add JMX methods (rebuilding indexes in the process
> >> and the percentage of rebuilding) for the entire node, but I tried to find
> >> a suitable place and did not find it, tell me where to add it?
> >>> On the other hand, % of index rebuild progress is self-descriptive. I don't
> >>> understand why we tend to make user's life harder.
> >>
> >> 10.08.2020, 21:57, "Ivan Rakov" <ivan.glu...@gmail.com>:
> >>>> This metric can be used only for local node, to get size of cache use
> >>>> org.apache.ignite.internal.processors.cache.CacheMetricsImpl#getCacheSize.
> >>>
> >>> Got it, agree.
> >>>
> >>>> If there is a lot of data in node that can be rebuilt, percentage may
> >>>> change very rarely and may not give an estimate of how much time is left.
> >>>> If we see for example that 50_000 keys are rebuilt once a minute, and we
> >>>> have 1_000_000_000 keys, then we can have an approximate estimate. What do
> >>>> you think of that?
> >>>
> >>> If the progress is very slow, I don't see issues with tracking it if the
> >>> percentage float has enough precision.
> >>> Still, usability of the metric concerns me.
> >>> In order to estimate remaining time of index rebuild, user should:
> >>> 1. Gain an understanding that local cache size
> >>> (CacheMetricsImpl#getCacheSize) should be used as a 100% milestone (it
> >>> is mentioned neither in the javadoc nor in the JMX method description).
> >>> 2. Manually calculate the sum of all metrics and divide it by the sum of
> >>> all cache sizes.
> >>> On the other hand, % of index rebuild progress is self-descriptive. I don't
> >>> understand why we tend to make user's life harder.
> >>>
> >>> --
> >>> Best regards,
> >>> Ivan
> >>>
> >>> On Mon, Aug 10, 2020 at 8:53 PM ткаленко кирилл <tkalkir...@yandex.ru>
> >>> wrote:
> >>>
> >>>> Hi, Ivan!
> >>>>
> >>>> For this you can use
> >>>> org.apache.ignite.cache.CacheMetrics#IsIndexRebuildInProgress
> >>>>> How can a local number of processed keys help us to understand when
> >>>>> index rebuild will be finished?
> >>>>
> >>>> This metric can be used only for local node, to get size of cache use
> >>>> org.apache.ignite.internal.processors.cache.CacheMetricsImpl#getCacheSize.
> >>>>> We can't compare metric value with cache.size(). First one is node-local,
> >>>>> while cache size covers all partitions in the cluster.
> >>>>
> >>>> If there is a lot of data in node that can be rebuilt, percentage may
> >>>> change very rarely and may not give an estimate of how much time is left.
> >>>> If we see for example that 50_000 keys are rebuilt once a minute, and we
> >>>> have 1_000_000_000 keys, then we can have an approximate estimate. What do
> >>>> you think of that?
> >>>>> I find one single metric much more usable. It would be perfect if metric
> >>>>> value is represented in percentage, e.g. current progress of local node
> >>>>> index rebuild is 60%.
> >>>>
> >>>> 10.08.2020, 19:11, "Ivan Rakov" <ivan.glu...@gmail.com>:
> >>>>> Folks,
> >>>>>
> >>>>> Sorry for coming late to the party.
> >>>>> I've taken a look at this issue during review.
> >>>>>
> >>>>> How can a local number of processed keys help us to understand when
> >>>>> index rebuild will be finished?
> >>>>> We can't compare metric value with cache.size(). First one is node-local,
> >>>>> while cache size covers all partitions in the cluster.
> >>>>> Also, I don't understand why we need to keep separate metrics for all
> >>>>> caches. Of course, the metric becomes more fair, but obviously harder to
> >>>>> make conclusions on whether "the index rebuild" process is over (and the
> >>>>> cluster is ready to process queries quickly).
> >>>>>
> >>>>> I find one single metric much more usable. It would be perfect if metric
> >>>>> value is represented in percentage, e.g. current progress of local node
> >>>>> index rebuild is 60%.
> >>>>>
> >>>>> --
> >>>>> Best regards,
> >>>>> Ivan
> >>>>>
> >>>>> On Fri, Jul 24, 2020 at 1:35 PM Stanislav Lukyanov <stanlukya...@gmail.com>
> >>>>> wrote:
> >>>>>
> >>>>>> Got it. I thought that index building and index rebuilding are essentially
> >>>>>> the same,
> >>>>>> but now I see that they are different: index rebuilding cares about all
> >>>>>> indexes at once while index building cares about particular ones.
> >>>>>>
> >>>>>> Kirill's approach sounds good.
> >>>>>>
> >>>>>> Stan
> >>>>>>
> >>>>>>> On 20 Jul 2020, at 14:54, Alexey Goncharuk <alexey.goncha...@gmail.com> wrote:
> >>>>>>>
> >>>>>>> Stan,
> >>>>>>>
> >>>>>>> Currently we never build indexes one-by-one - we always use a cache data
> >>>>>>> row visitor which either updates all indexes (see IndexRebuildFullClosure)
> >>>>>>> or updates a set of all indexes that need to catch up (see
> >>>>>>> IndexRebuildPartialClosure). Given that, I do not see any need for
> >>>>>>> per-index rebuild status as this status will be updated for all outdated
> >>>>>>> indexes simultaneously.
> >>>>>>>
> >>>>>>> Kirill's approach for the total number of processed keys per cache seems
> >>>>>>> reasonable to me.
> >>>>>>>
> >>>>>>> --AG
> >>>>>>>
> >>>>>>> Fri, Jul 3, 2020 at 10:12, ткаленко кирилл <tkalkir...@yandex.ru>:
> >>>>>>>
> >>>>>>>> Hi, Stan!
> >>>>>>>>
> >>>>>>>> Perhaps it is worth clarifying what exactly I wanted to say.
> >>>>>>>> Now we have 2 processes: building and rebuilding indexes.
> >>>>>>>>
> >>>>>>>> At the moment, we have some metrics for rebuilding indexes:
> >>>>>>>> "IsIndexRebuildInProgress", "IndexBuildCountPartitionsLeft".
> >>>>>>>>
> >>>>>>>> I suggest adding another metric "Indexrebuildkeyprocessed", which will
> >>>>>>>> allow you to determine how many records are left to rebuild for cache.
> >>>>>>>>
> >>>>>>>> I think your comments are more about building an index that may need more
> >>>>>>>> metrics, but I think you should do it in a separate ticket.
> >>>>>>>>
> >>>>>>>> 03.07.2020, 03:09, "Stanislav Lukyanov" <stanlukya...@gmail.com>:
> >>>>>>>>> If multiple indexes are to be built "number of indexed keys" metric may
> >>>>>>>>> be misleading.
> >>>>>>>>>
> >>>>>>>>> As a cluster admin, I'd like to know:
> >>>>>>>>> - Are all indexes ready on a node?
> >>>>>>>>> - How many indexes are to be built?
> >>>>>>>>> - How many resources are used by the index building (how many threads
> >>>>>>>>> are used)?
> >>>>>>>>> - Which index(es?) is being built right now?
> >>>>>>>>> - How much time until the current (single) index building finishes? Here
> >>>>>>>>> "time" can be a lot of things: partitions, entries, percent of the cache,
> >>>>>>>>> minutes and hours
> >>>>>>>>> - How much time until all indexes are built?
> >>>>>>>>> - How much does it take to build each of my indexes / a single index of
> >>>>>>>>> my cache on average?
> >>>>>>>>>
> >>>>>>>>> I think we need a set of metrics and/or log messages to solve all of
> >>>>>>>>> these questions.
> >>>>>>>>> I imagine something like:
> >>>>>>>>> - numberOfIndexesToBuild
> >>>>>>>>> - a standard set of metrics on the index building thread pool (do we
> >>>>>>>>> already have it?)
> >>>>>>>>> - currentlyBuiltIndexName (assuming we only build one at a time which is
> >>>>>>>>> probably not true)
> >>>>>>>>> - for the "time" metrics I think percentage might be the best as it's
> >>>>>>>>> the easiest to understand; we may add multiple metrics though.
> >>>>>>>>> - For "time per each index" I'd add detailed log messages stating how
> >>>>>>>>> long it took to build a particular index
> >>>>>>>>>
> >>>>>>>>> Thanks,
> >>>>>>>>> Stan
> >>>>>>>>>
> >>>>>>>>>> On 26 Jun 2020, at 12:49, ткаленко кирилл <tkalkir...@yandex.ru> wrote:
> >>>>>>>>>>
> >>>>>>>>>> Hi, Igniters.
> >>>>>>>>>>
> >>>>>>>>>> I would like to know if it is possible to estimate how long the index
> >>>>>>>>>> rebuild will take?
> >>>>>>>>>>
> >>>>>>>>>> At the moment, I have found the following metrics [1] and [2] and
> >>>>>>>>>> since the rebuild is based on caches, I think it would be useful to know
> >>>>>>>>>> how many records are processed in indexing. This way we can estimate how
> >>>>>>>>>> long we have to wait for the index to be rebuilt by subtracting [3] and how
> >>>>>>>>>> many records are indexed.
> >>>>>>>>>>
> >>>>>>>>>> I think we should add this metric [4].
> >>>>>>>>>>
> >>>>>>>>>> Comments, suggestions?
> >>>>>>>>>>
> >>>>>>>>>> [1] - https://issues.apache.org/jira/browse/IGNITE-12184
> >>>>>>>>>> [2] - org.apache.ignite.internal.processors.cache.CacheGroupMetricsImpl#idxBuildCntPartitionsLeft
> >>>>>>>>>> [3] - org.apache.ignite.cache.CacheMetrics#getCacheSize
> >>>>>>>>>> [4] - org.apache.ignite.cache.CacheMetrics#getNumberIndexedKeys
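The rate-based estimate mentioned in the thread (50_000 keys rebuilt per minute against 1_000_000_000 keys) can be sketched roughly as below. This is only an illustration: the class and method names are hypothetical and not part of Ignite, and the counters are assumed to be sampled by an external monitoring system at two points in time.

```java
/** Hypothetical helper, not an Ignite class: estimates remaining rebuild time from two samples. */
final class IndexRebuildEta {
    private IndexRebuildEta() {
        // Static helper only.
    }

    /**
     * @param processedBefore processed key count at the earlier sample
     * @param processedNow    processed key count at the later sample
     * @param minutesBetween  time between the two samples, in minutes
     * @param localCacheSize  node-local cache size, used as the 100% milestone
     * @return estimated minutes left, or positive infinity if no progress was observed
     */
    static double minutesLeft(long processedBefore, long processedNow,
                              double minutesBetween, long localCacheSize) {
        if (minutesBetween <= 0)
            throw new IllegalArgumentException("Sampling interval must be positive");

        long delta = processedNow - processedBefore;

        // No observed progress: the rate is unknown, so no finite estimate is possible.
        if (delta <= 0)
            return Double.POSITIVE_INFINITY;

        double keysPerMinute = delta / minutesBetween;
        long remaining = Math.max(0, localCacheSize - processedNow);

        return remaining / keysPerMinute;
    }

    public static void main(String[] args) {
        // Figures from the thread: 50_000 keys in the first minute, 1_000_000_000 keys in total.
        System.out.println(minutesLeft(0, 50_000, 1.0, 1_000_000_000L)); // prints 19999.0
    }
}
```

At that rate the rebuild would need roughly 20,000 minutes (about two weeks), which is the kind of approximate estimate the processed-keys counter is meant to enable.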