Re: [DISCUSSION] Add index rebuild time metrics

Ivan Rakov Tue, 11 Aug 2020 06:27:58 -0700

>
> As a compromise, I can add jmx methods (rebuilding indexes in the process
> and the percentage of rebuilding) for the entire node, but I tried to find
> a suitable place and did not find it, tell me where to add it?


I have checked existing JMX beans. To be honest, I struggle to find a
suitable place as well.
We have ClusterMetrics that may represent the state of a local node, but
this class is also used for aggregated cluster metrics. I can't propose a
reasonable way to merge percentages from different nodes.
On the other hand, total index rebuild for all caches isn't a common
scenario. It's performed either after manual index.bin removal or after
index creation, and both operations are performed on cache / cache-group
level. Also, all other similar metrics are provided on cache-group level.

I propose to stick with a cache-group level metric (e.g.
getIndexBuildProgress) that returns a float from 0 to 1, calculated as
[processedKeys] / [localCacheSize]. Even if a user collects metrics
through Zabbix, I anticipate that they'll perform this calculation on
their own in order to estimate progress. Let's help them a bit and perform
it on the system side.
If a per-group percentage metric is present, I think
getIndexRebuildKeyProcessed becomes redundant.
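
To make the proposed calculation concrete, here is a minimal sketch. It
is not Ignite code: the class IndexBuildProgress and both method names
are hypothetical, and treating an empty cache as fully built is my own
assumption. The second method shows the rate-based estimate discussed
earlier in the thread (keys rebuilt per minute against the local size).

```java
import java.time.Duration;

/** Hypothetical helper for illustration only; not part of Apache Ignite. */
public final class IndexBuildProgress {
    private IndexBuildProgress() {
        // Utility class, no instances.
    }

    /**
     * Computes progress as processedKeys / localCacheSize.
     *
     * @param processedKeys  keys already indexed on the local node
     * @param localCacheSize node-local cache size, used as the 100% mark
     * @return a value clamped to the range [0, 1]
     */
    public static float progress(long processedKeys, long localCacheSize) {
        if (localCacheSize <= 0)
            return 1.0f; // Assumption: nothing to index means "done".

        float ratio = (float) ((double) processedKeys / localCacheSize);

        // Clamp, since the cache may change size while the rebuild runs.
        return Math.max(0.0f, Math.min(1.0f, ratio));
    }

    /**
     * Rough remaining-time estimate from an observed rebuild rate.
     *
     * @param keysPerMinute observed number of keys indexed per minute
     * @return remaining time, rounded up to whole minutes
     */
    public static Duration estimateRemaining(long processedKeys,
                                             long localCacheSize,
                                             long keysPerMinute) {
        if (keysPerMinute <= 0)
            throw new IllegalArgumentException("keysPerMinute must be positive");

        long remainingKeys = Math.max(0L, localCacheSize - processedKeys);

        // Ceiling division so a partial minute counts as a full one.
        long minutes = (remainingKeys + keysPerMinute - 1) / keysPerMinute;

        return Duration.ofMinutes(minutes);
    }
}
```

With such a metric on the system side, a monitoring tool only has to
read a single value per cache group instead of combining two metrics.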

On Tue, Aug 11, 2020 at 8:20 AM ткаленко кирилл <tkalkir...@yandex.ru>
wrote:

> Hi, Ivan!
>
> What precision would be sufficient?
> > If the progress is very slow, I don't see issues with tracking it if the
> > percentage float has enough precision.
>
> I think we can add a mention getting cache size.
> > 1. Gain an understanding that local cache size
> > (CacheMetricsImpl#getCacheSize) should be used as a 100% milestone (it
> > isn't mentioned neither in javadoc nor in JMX method description).
>
> Do you think users collect metrics with their hands? I think this is done
> by other systems, such as zabbix.
> > 2. Manually calculate sum of all metrics and divide to sum of all cache
> > sizes.
>
> As a compromise, I can add jmx methods (rebuilding indexes in the process
> and the percentage of rebuilding) for the entire node, but I tried to find
> a suitable place and did not find it, tell me where to add it?
> > On the other hand, % of index rebuild progress is self-descriptive. I don't
> > understand why we tend to make user's life harder.
>
> 10.08.2020, 21:57, "Ivan Rakov" <ivan.glu...@gmail.com>:
> >>  This metric can be used only for local node, to get size of cache use
> >>
>  org.apache.ignite.internal.processors.cache.CacheMetricsImpl#getCacheSize.
> >
> >  Got it, agree.
> >
> >>  If there is a lot of data in node that can be rebuilt, percentage may
> >>  change very rarely and may not give an estimate of how much time is left.
> >>  If we see for example that 50_000 keys are rebuilt once a minute, and we
> >>  have 1_000_000_000 keys, then we can have an approximate estimate. What do
> >>  you think of that?
> >
> > If the progress is very slow, I don't see issues with tracking it if the
> > percentage float has enough precision.
> > Still, usability of the metric concerns me. In order to estimate
> remaining
> > time of index rebuild, user should:
> > 1. Gain an understanding that local cache size
> > (CacheMetricsImpl#getCacheSize) should be used as a 100% milestone (it
> > isn't mentioned neither in javadoc nor in JMX method description).
> > 2. Manually calculate sum of all metrics and divide to sum of all cache
> > sizes.
> > On the other hand, % of index rebuild progress is self-descriptive. I
> don't
> > understand why we tend to make user's life harder.
> >
> > --
> > Best regards,
> > Ivan
> >
> > On Mon, Aug 10, 2020 at 8:53 PM ткаленко кирилл <tkalkir...@yandex.ru>
> > wrote:
> >
> >>  Hi, Ivan!
> >>
> >>  For this you can use
> >>  org.apache.ignite.cache.CacheMetrics#IsIndexRebuildInProgress
> >>  > How can a local number of processed keys can help us to understand
> when
> >>  > index rebuild will be finished?
> >>
> >>  This metric can be used only for local node, to get size of cache use
> >>
>  org.apache.ignite.internal.processors.cache.CacheMetricsImpl#getCacheSize.
> >>  > We can't compare metric value with cache.size(). First one is
> node-local,
> >>  > while cache size covers all partitions in the cluster.
> >>
> >>  If there is a lot of data in node that can be rebuilt, percentage may
> >>  change very rarely and may not give an estimate of how much time is
> left.
> >>  If we see for example that 50_000 keys are rebuilt once a minute, and
> we
> >>  have 1_000_000_000 keys, then we can have an approximate estimate.
> What do
> >>  you think of that?
> >>  > I find one single metric much more usable. It would be perfect if
> metric
> >>  > value is represented in percentage, e.g. current progress of local
> node
> >>  > index rebuild is 60%.
> >>
> >>  10.08.2020, 19:11, "Ivan Rakov" <ivan.glu...@gmail.com>:
> >>  > Folks,
> >>  >
> >>  > Sorry for coming late to the party. I've taken a look at this issue
> >>  during
> >>  > review.
> >>  >
> >>  > How can a local number of processed keys can help us to understand
> when
> >>  > index rebuild will be finished?
> >>  > We can't compare metric value with cache.size(). First one is
> node-local,
> >>  > while cache size covers all partitions in the cluster.
> >>  > Also, I don't understand why we need to keep separate metrics for all
> >>  > caches. Of course, the metric becomes more fair, but obviously
> harder to
> >>  > make conclusions on whether "the index rebuild" process is over (and
> the
> >>  > cluster is ready to process queries quickly).
> >>  >
> >>  > I find one single metric much more usable. It would be perfect if
> metric
> >>  > value is represented in percentage, e.g. current progress of local
> node
> >>  > index rebuild is 60%.
> >>  >
> >>  > --
> >>  > Best regards,
> >>  > Ivan
> >>  >
> >>  > On Fri, Jul 24, 2020 at 1:35 PM Stanislav Lukyanov <
> >>  stanlukya...@gmail.com>
> >>  > wrote:
> >>  >
> >>  >> Got it. I thought that index building and index rebuilding are
> >>  essentially
> >>  >> the same,
> >>  >> but now I see that they are different: index rebuilding cares about
> all
> >>  >> indexes at once while index building cares about particular ones.
> >>  >>
> >>  >> Kirill's approach sounds good.
> >>  >>
> >>  >> Stan
> >>  >>
> >>  >> > On 20 Jul 2020, at 14:54, Alexey Goncharuk <
> >>  alexey.goncha...@gmail.com>
> >>  >> wrote:
> >>  >> >
> >>  >> > Stan,
> >>  >> >
> >>  >> > Currently we never build indexes one-by-one - we always use a
> cache
> >>  data
> >>  >> > row visitor which either updates all indexes (see
> >>  >> IndexRebuildFullClosure)
> >>  >> > or updates a set of all indexes that need to catch up (see
> >>  >> > IndexRebuildPartialClosure). GIven that, I do not see any need for
> >>  >> > per-index rebuild status as this status will be updated for all
> >>  outdated
> >>  >> > indexes simultaneously.
> >>  >> >
> >>  >> > Kirill's approach for the total number of processed keys per cache
> >>  seems
> >>  >> > reasonable to me.
> >>  >> >
> >>  >> > --AG
> >>  >> >
> >>  >> > пт, 3 июл. 2020 г. в 10:12, ткаленко кирилл <tkalkir...@yandex.ru
> >:
> >>  >> >
> >>  >> >> Hi, Stan!
> >>  >> >>
> >>  >> >> Perhaps it is worth clarifying what exactly I wanted to say.
> >>  >> >> Now we have 2 processes: building and rebuilding indexes.
> >>  >> >>
> >>  >> >> At moment, we have some metrics for rebuilding indexes:
> >>  >> >> "IsIndexRebuildInProgress", "IndexBuildCountPartitionsLeft".
> >>  >> >>
> >>  >> >> I suggest adding another metric "Indexrebuildkeyprocessed", which
> >>  will
> >>  >> >> allow you to determine how many records are left to rebuild for
> >>  cache.
> >>  >> >>
> >>  >> >> I think your comments are more about building an index that may
> need
> >>  >> more
> >>  >> >> metrics, but I think you should do it in a separate ticket.
> >>  >> >>
> >>  >> >> 03.07.2020, 03:09, "Stanislav Lukyanov" <stanlukya...@gmail.com
> >:
> >>  >> >>> If multiple indexes are to be built "number of indexed keys"
> >>  metric may
> >>  >> >> be misleading.
> >>  >> >>>
> >>  >> >>> As a cluster admin, I'd like to know:
> >>  >> >>> - Are all indexes ready on a node?
> >>  >> >>> - How many indexes are to be built?
> >>  >> >>> - How much resources are used by the index building (how many
> >>  threads
> >>  >> >> are used)?
> >>  >> >>> - Which index(es?) is being built right now?
> >>  >> >>> - How much time until the current (single) index building
> finishes?
> >>  >> Here
> >>  >> >> "time" can be a lot of things: partitions, entries, percent of
> the
> >>  >> cache,
> >>  >> >> minutes and hours
> >>  >> >>> - How much time until all indexes are built?
> >>  >> >>> - How much does it take to build each of my indexes / a single
> >>  index of
> >>  >> >> my cache on average?
> >>  >> >>>
> >>  >> >>> I think we need a set of metrics and/or log messages to solve
> all
> >>  of
> >>  >> >> these questions.
> >>  >> >>> I imaging something like:
> >>  >> >>> - numberOfIndexesToBuild
> >>  >> >>> - a standard set of metrics on the index building thread pool
> (do
> >>  we
> >>  >> >> already have it?)
> >>  >> >>> - currentlyBuiltIndexName (assuming we only build one at a time
> >>  which
> >>  >> is
> >>  >> >> probably not true)
> >>  >> >>> - for the "time" metrics I think percentage might be the best as
> >>  it's
> >>  >> >> the easiest to understand; we may add multiple metrics though.
> >>  >> >>> - For "time per each index" I'd add detailed log messages
> stating
> >>  how
> >>  >> >> long did it take to build a particular index
> >>  >> >>>
> >>  >> >>> Thanks,
> >>  >> >>> Stan
> >>  >> >>>
> >>  >> >>>> On 26 Jun 2020, at 12:49, ткаленко кирилл <
> tkalkir...@yandex.ru>
> >>  >> >> wrote:
> >>  >> >>>>
> >>  >> >>>> Hi, Igniters.
> >>  >> >>>>
> >>  >> >>>> I would like to know if it is possible to estimate how much the
> >>  index
> >>  >> >> rebuild will take?
> >>  >> >>>>
> >>  >> >>>> At the moment, I have found the following metrics [1] and [2]
> and
> >>  >> >> since the rebuild is based on caches, I think it would be useful
> to
> >>  know
> >>  >> >> how many records are processed in indexing. This way we can
> >>  estimate how
> >>  >> >> long we have to wait for the index to be rebuilt by subtracting
> [3]
> >>  and
> >>  >> how
> >>  >> >> many records are indexed.
> >>  >> >>>>
> >>  >> >>>> I think we should add this metric [4].
> >>  >> >>>>
> >>  >> >>>> Comments, suggestions?
> >>  >> >>>>
> >>  >> >>>> [1] - https://issues.apache.org/jira/browse/IGNITE-12184
> >>  >> >>>> [2] -
> >>  >> >>
> >>  >>
> >>
>   
> org.apache.ignite.internal.processors.cache.CacheGroupMetricsImpl#idxBuildCntPartitionsLeft
> >>  >> >>>> [3] - org.apache.ignite.cache.CacheMetrics#getCacheSize
> >>  >> >>>> [4] - org.apache.ignite.cache.CacheMetrics#getNumberIndexedKeys
> >>  >> >>
>
