[ 
https://issues.apache.org/jira/browse/KUDU-1959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17437762#comment-17437762
 ] 

ASF subversion and git services commented on KUDU-1959:
-------------------------------------------------------

Commit 59070bf5bd5924c6e4deb68434744cac3b062dcc in kudu's branch 
refs/heads/master from Abhishek Chennaka
[ https://gitbox.apache.org/repos/asf?p=kudu.git;h=59070bf ]

KUDU-1959 - Implement aggregate startup progress metrics

This commit exposes the following metrics:
* startup_progress_steps_remaining : the number of server startup steps
  that have not yet completed. This value is in the range [0,4].
* startup_progress_time_elapsed : the time elapsed so far during server
  startup, in milliseconds. Once startup has completed, this is the total
  time the startup took.
These metrics are primarily intended for third-party monitoring tools that
track how long the server has taken to start up historically, e.g. for
trend analysis.
The startup_progress_time_elapsed metric can also be used to check the
previous startup time as an alternative to the startup page in the WebUI.
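
As a rough illustration of how a monitoring script might consume these two
metrics, here is a minimal Python sketch. It assumes the server's embedded
web server serves a /metrics endpoint that returns a JSON array of entities,
each carrying a "metrics" list of objects with "name" and "value" fields.
The base URL passed to fetch_startup_metrics() is hypothetical, and the
helper names are made up for this example; none of them come from the
commit itself.

```python
import json
import urllib.request

# Metric names taken from the commit message above.
STARTUP_METRICS = (
    "startup_progress_steps_remaining",
    "startup_progress_time_elapsed",
)


def extract_startup_metrics(payload):
    """Return {name: value} for the startup metrics found in a JSON payload.

    Assumption: the payload is a JSON array of entities, each with an
    optional "metrics" list of {"name": ..., "value": ...} objects.
    """
    found = {}
    for entity in json.loads(payload):
        for metric in entity.get("metrics", []):
            name = metric.get("name")
            if name in STARTUP_METRICS and "value" in metric:
                found[name] = metric["value"]
    return found


def startup_complete(metrics):
    """True only if the steps-remaining metric is present and equal to 0."""
    return metrics.get("startup_progress_steps_remaining") == 0


def fetch_startup_metrics(base_url, timeout=5.0):
    """Fetch and parse the startup metrics from a server's web UI.

    base_url is a hypothetical address such as "http://tserver-host:8050";
    adjust it to wherever the server's web UI is actually reachable.
    """
    url = base_url.rstrip("/") + "/metrics"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return extract_startup_metrics(resp.read().decode("utf-8"))
```

A caller could poll fetch_startup_metrics() until startup_complete() returns
True, then record startup_progress_time_elapsed (milliseconds) as the total
startup time for that restart.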

Change-Id: I0a508c3baf0a0d77baf75f36f7bb305a6ad821e1
Reviewed-on: http://gerrit.cloudera.org:8080/17903
Tested-by: Kudu Jenkins
Reviewed-by: Andrew Wong <aw...@cloudera.com>


> Hard to tell when a cluster is done starting up
> -----------------------------------------------
>
>                 Key: KUDU-1959
>                 URL: https://issues.apache.org/jira/browse/KUDU-1959
>             Project: Kudu
>          Issue Type: Improvement
>          Components: ops-tooling
>            Reporter: Jean-Daniel Cryans
>            Assignee: Abhishek
>            Priority: Major
>              Labels: roadmap-candidate, usability
>
> When restarting a cluster that has a good amount of data, it's hard to tell 
> when it's "done". Right now, the things I do are:
>  - Run ksck, wait until most tablets are not in "unavailable" or 
> "bootstrapping" state.
>  - Watch the metrics and see when the data under management is close to where 
> it was before restarting (it grows as tablets are getting bootstrapped).
>  - Look at the tablet server web UIs for tablets, compare how many are done 
> bootstrapping VS in the process of VS not started.
> Ideas on how to improve this:
>  - In the master's web UI for tablet servers, show how many tablets are 
> running VS not running (I wouldn't add anything about tombstoned tablets)
>  - Add metrics for tablets in different states.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
