Re: [DISCUSSION] Add index rebuild time metrics

ткаленко кирилл Mon, 10 Aug 2020 22:20:22 -0700

Hi, Ivan!

What precision would be sufficient?
> If the progress is very slow, I don't see issues with tracking it if the
> percentage float has enough precision.


I think we can add a mention of how to get the cache size.
> 1. Gain an understanding that local cache size
> (CacheMetricsImpl#getCacheSize) should be used as a 100% milestone (it
> isn't mentioned neither in javadoc nor in JMX method description).

Do you think users collect metrics by hand? I think this is done by
other systems, such as Zabbix.
> 2. Manually calculate sum of all metrics and divide to sum of all cache
> sizes.

As a compromise, I can add JMX methods for the entire node (whether index
rebuilding is in progress and the rebuild percentage), but I looked for a
suitable place and did not find one. Could you tell me where to add them?
> On the other hand, % of index rebuild progress is self-descriptive. I don't
> understand why we tend to make user's life harder.
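
For reference, the two calculations discussed in this thread could be
sketched roughly as below: a node-wide percentage (sum of processed keys
divided by sum of local cache sizes) and a remaining-time estimate from an
observed rate. This is only an illustration. The class and method names are
hypothetical, not existing Ignite API, and the inputs are assumed to be read
by the monitoring system from the processed-keys metric and from
CacheMetricsImpl#getCacheSize.

```java
import java.util.Locale;

// Hypothetical helper, not part of Ignite: plain arithmetic on metric values.
class IndexRebuildEstimate {
    /**
     * Node-wide progress in percent: sum of processed keys divided by
     * sum of local cache sizes. A zero total is treated as "done"
     * (an assumption made for this sketch).
     */
    static double progressPercent(long[] processedKeys, long[] localCacheSizes) {
        long processed = 0;
        long total = 0;
        for (int i = 0; i < processedKeys.length; i++) {
            processed += processedKeys[i];
            total += localCacheSizes[i];
        }
        return total == 0 ? 100.0 : 100.0 * processed / total;
    }

    /** Remaining minutes at an observed rate; infinite if the rate is not positive. */
    static double remainingMinutes(long processed, long total, double keysPerMinute) {
        if (keysPerMinute <= 0)
            return Double.POSITIVE_INFINITY;
        return Math.max(0L, total - processed) / keysPerMinute;
    }

    public static void main(String[] args) {
        // Numbers from the thread: 50_000 keys per minute, 1_000_000_000 keys.
        long[] processed = {50_000L};
        long[] sizes = {1_000_000_000L};
        System.out.println(String.format(Locale.ROOT, "progress: %.3f%%",
            progressPercent(processed, sizes)));
        System.out.println(String.format(Locale.ROOT, "remaining: %.0f min",
            remainingMinutes(50_000L, 1_000_000_000L, 50_000.0)));
    }
}
```

With the numbers from the thread, a full rebuild takes 20_000 minutes and
progress advances by 0.005 percentage points per minute, so a percentage shown
with three decimal places would still change every minute.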

10.08.2020, 21:57, "Ivan Rakov" <[email protected]>:
>>  This metric can be used only for local node, to get size of cache use
>>  org.apache.ignite.internal.processors.cache.CacheMetricsImpl#getCacheSize.
>
>  Got it, agree.
>
>>  If there is a lot of data in node that can be rebuilt, percentage may
>>  change very rarely and may not give an estimate of how much time is left.
>>  If we see for example that 50_000 keys are rebuilt once a minute, and we
>>  have 1_000_000_000 keys, then we can have an approximate estimate. What do
>>  you think of that?
>
> If the progress is very slow, I don't see issues with tracking it if the
> percentage float has enough precision.
> Still, usability of the metric concerns me. In order to estimate remaining
> time of index rebuild, user should:
> 1. Gain an understanding that local cache size
> (CacheMetricsImpl#getCacheSize) should be used as a 100% milestone (it
> isn't mentioned neither in javadoc nor in JMX method description).
> 2. Manually calculate sum of all metrics and divide to sum of all cache
> sizes.
> On the other hand, % of index rebuild progress is self-descriptive. I don't
> understand why we tend to make user's life harder.
>
> --
> Best regards,
> Ivan
>
> On Mon, Aug 10, 2020 at 8:53 PM ткаленко кирилл <[email protected]>
> wrote:
>
>>  Hi, Ivan!
>>
>>  For this you can use
>>  org.apache.ignite.cache.CacheMetrics#IsIndexRebuildInProgress
>>  > How can a local number of processed keys can help us to understand when
>>  > index rebuild will be finished?
>>
>>  This metric can be used only for local node, to get size of cache use
>>  org.apache.ignite.internal.processors.cache.CacheMetricsImpl#getCacheSize.
>>  > We can't compare metric value with cache.size(). First one is node-local,
>>  > while cache size covers all partitions in the cluster.
>>
>>  If there is a lot of data in node that can be rebuilt, percentage may
>>  change very rarely and may not give an estimate of how much time is left.
>>  If we see for example that 50_000 keys are rebuilt once a minute, and we
>>  have 1_000_000_000 keys, then we can have an approximate estimate. What do
>>  you think of that?
>>  > I find one single metric much more usable. It would be perfect if metric
>>  > value is represented in percentage, e.g. current progress of local node
>>  > index rebuild is 60%.
>>
>>  10.08.2020, 19:11, "Ivan Rakov" <[email protected]>:
>>  > Folks,
>>  >
>>  > Sorry for coming late to the party. I've taken a look at this issue during
>>  > review.
>>  >
>>  > How can a local number of processed keys can help us to understand when
>>  > index rebuild will be finished?
>>  > We can't compare metric value with cache.size(). First one is node-local,
>>  > while cache size covers all partitions in the cluster.
>>  > Also, I don't understand why we need to keep separate metrics for all
>>  > caches. Of course, the metric becomes more fair, but obviously harder to
>>  > make conclusions on whether "the index rebuild" process is over (and the
>>  > cluster is ready to process queries quickly).
>>  >
>>  > I find one single metric much more usable. It would be perfect if metric
>>  > value is represented in percentage, e.g. current progress of local node
>>  > index rebuild is 60%.
>>  >
>>  > --
>>  > Best regards,
>>  > Ivan
>>  >
>>  > On Fri, Jul 24, 2020 at 1:35 PM Stanislav Lukyanov <[email protected]> wrote:
>>  >
>>  >> Got it. I thought that index building and index rebuilding are essentially
>>  >> the same,
>>  >> but now I see that they are different: index rebuilding cares about all
>>  >> indexes at once while index building cares about particular ones.
>>  >>
>>  >> Kirill's approach sounds good.
>>  >>
>>  >> Stan
>>  >>
>>  >> > On 20 Jul 2020, at 14:54, Alexey Goncharuk <[email protected]> wrote:
>>  >> >
>>  >> > Stan,
>>  >> >
>>  >> > Currently we never build indexes one-by-one - we always use a cache data
>>  >> > row visitor which either updates all indexes (see IndexRebuildFullClosure)
>>  >> > or updates a set of all indexes that need to catch up (see
>>  >> > IndexRebuildPartialClosure). Given that, I do not see any need for
>>  >> > per-index rebuild status as this status will be updated for all outdated
>>  >> > indexes simultaneously.
>>  >> >
>>  >> > Kirill's approach for the total number of processed keys per cache seems
>>  >> > reasonable to me.
>>  >> >
>>  >> > --AG
>>  >> >
>>  >> > Fri, 3 Jul 2020 at 10:12, ткаленко кирилл <[email protected]>:
>>  >> >
>>  >> >> Hi, Stan!
>>  >> >>
>>  >> >> Perhaps it is worth clarifying what exactly I wanted to say.
>>  >> >> Now we have 2 processes: building and rebuilding indexes.
>>  >> >>
>>  >> >> At the moment, we have some metrics for rebuilding indexes:
>>  >> >> "IsIndexRebuildInProgress", "IndexBuildCountPartitionsLeft".
>>  >> >>
>>  >> >> I suggest adding another metric "Indexrebuildkeyprocessed", which will
>>  >> >> allow you to determine how many records are left to rebuild for the cache.
>>  >> >>
>>  >> >> I think your comments are more about building an index that may need more
>>  >> >> metrics, but I think you should do it in a separate ticket.
>>  >> >>
>>  >> >> 03.07.2020, 03:09, "Stanislav Lukyanov" <[email protected]>:
>>  >> >>> If multiple indexes are to be built "number of indexed keys" metric may
>>  >> >>> be misleading.
>>  >> >>>
>>  >> >>> As a cluster admin, I'd like to know:
>>  >> >>> - Are all indexes ready on a node?
>>  >> >>> - How many indexes are to be built?
>>  >> >>> - How much resources are used by the index building (how many threads
>>  >> >>>   are used)?
>>  >> >>> - Which index(es?) is being built right now?
>>  >> >>> - How much time until the current (single) index building finishes? Here
>>  >> >>>   "time" can be a lot of things: partitions, entries, percent of the cache,
>>  >> >>>   minutes and hours
>>  >> >>> - How much time until all indexes are built?
>>  >> >>> - How much does it take to build each of my indexes / a single index of
>>  >> >>>   my cache on average?
>>  >> >>>
>>  >> >>> I think we need a set of metrics and/or log messages to solve all of
>>  >> >>> these questions.
>>  >> >>> I imagine something like:
>>  >> >>> - numberOfIndexesToBuild
>>  >> >>> - a standard set of metrics on the index building thread pool (do we
>>  >> >>>   already have it?)
>>  >> >>> - currentlyBuiltIndexName (assuming we only build one at a time which is
>>  >> >>>   probably not true)
>>  >> >>> - for the "time" metrics I think percentage might be the best as it's
>>  >> >>>   the easiest to understand; we may add multiple metrics though.
>>  >> >>> - For "time per each index" I'd add detailed log messages stating how
>>  >> >>>   long it took to build a particular index
>>  >> >>>
>>  >> >>> Thanks,
>>  >> >>> Stan
>>  >> >>>
>>  >> >>>> On 26 Jun 2020, at 12:49, ткаленко кирилл <[email protected]> wrote:
>>  >> >>>>
>>  >> >>>> Hi, Igniters.
>>  >> >>>>
>>  >> >>>> I would like to know if it is possible to estimate how long the index
>>  >> >>>> rebuild will take?
>>  >> >>>>
>>  >> >>>> At the moment, I have found the following metrics [1] and [2] and
>>  >> >>>> since the rebuild is based on caches, I think it would be useful to know
>>  >> >>>> how many records are processed in indexing. This way we can estimate how
>>  >> >>>> long we have to wait for the index to be rebuilt by subtracting the
>>  >> >>>> number of indexed records from [3].
>>  >> >>>>
>>  >> >>>> I think we should add this metric [4].
>>  >> >>>>
>>  >> >>>> Comments, suggestions?
>>  >> >>>>
>>  >> >>>> [1] - https://issues.apache.org/jira/browse/IGNITE-12184
>>  >> >>>> [2] - org.apache.ignite.internal.processors.cache.CacheGroupMetricsImpl#idxBuildCntPartitionsLeft
>>  >> >>>> [3] - org.apache.ignite.cache.CacheMetrics#getCacheSize
>>  >> >>>> [4] - org.apache.ignite.cache.CacheMetrics#getNumberIndexedKeys
>>  >> >>
