[ https://issues.apache.org/jira/browse/KUDU-1959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17437762#comment-17437762 ]
ASF subversion and git services commented on KUDU-1959:
-------------------------------------------------------

Commit 59070bf5bd5924c6e4deb68434744cac3b062dcc in kudu's branch refs/heads/master from Abhishek Chennaka
[ https://gitbox.apache.org/repos/asf?p=kudu.git;h=59070bf ]

KUDU-1959 - Implement aggregate startup progress metrics

We expose the below metrics as a part of this commit:
* startup_progress_steps_remaining : count of server startup steps
  which are yet to be completed. This value is in the range [0,4].
* startup_progress_time_elapsed : the time elapsed so far for the
  server to start up. If the startup is completed, this is the total
  time taken for the startup. This is in milliseconds.

These metrics are primarily expected to be used by third-party
monitoring tools to see how long the server has historically taken to
start up, for any sort of trend analysis. The
startup_progress_time_elapsed metric can also be used to check the
previous startup time as an alternative to the startup page in the
WebUI.

Change-Id: I0a508c3baf0a0d77baf75f36f7bb305a6ad821e1
Reviewed-on: http://gerrit.cloudera.org:8080/17903
Tested-by: Kudu Jenkins
Reviewed-by: Andrew Wong <aw...@cloudera.com>


> Hard to tell when a cluster is done starting up
> -----------------------------------------------
>
>                 Key: KUDU-1959
>                 URL: https://issues.apache.org/jira/browse/KUDU-1959
>             Project: Kudu
>          Issue Type: Improvement
>          Components: ops-tooling
>            Reporter: Jean-Daniel Cryans
>            Assignee: Abhishek
>            Priority: Major
>              Labels: roadmap-candidate, usability
>
> Restarting a cluster that has a good amount of data, it's hard to tell when
> it's "done". Right now the things I do:
> - Run ksck, wait until most tablets are not in "unavailable" or
> "bootstrapping" state.
> - Watch the metrics and see when the data under management is close to where
> it was before restarting (it grows as tablets are getting bootstrapped).
> - Look at the tablet server web UIs for tablets, compare how many are done
> bootstrapping VS in the process of VS not started.
> Ideas on how to improve this:
> - In the master's web UI for tablet servers, show how many tablets are
> running VS not running (I wouldn't add anything about tombstoned tablets)
> - Add metrics for tablets in different states.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
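
For anyone wiring the two metrics from the commit message into a monitoring tool, here is a minimal sketch of how they might be read. It assumes the metrics arrive as a JSON array of entities, each carrying a "metrics" list of name/value pairs; that payload shape, the helper name startup_status, and the sample values are assumptions made for illustration and do not come from the commit itself. Only the two metric names, the [0,4] range, and the millisecond unit are taken from the message above.

```python
import json

# Metric names as given in the commit message.
STEPS_REMAINING = "startup_progress_steps_remaining"
TIME_ELAPSED = "startup_progress_time_elapsed"


def startup_status(metrics_json):
    """Summarize startup progress from a metrics dump.

    Assumes (hypothetically) that the dump is a JSON array of entities,
    each with a "metrics" list of {"name": ..., "value": ...} objects.
    """
    found = {}
    for entity in json.loads(metrics_json):
        for metric in entity.get("metrics", []):
            if metric.get("name") in (STEPS_REMAINING, TIME_ELAPSED):
                found[metric["name"]] = metric.get("value")

    steps = found.get(STEPS_REMAINING)
    return {
        "steps_remaining": steps,
        "elapsed_ms": found.get(TIME_ELAPSED),
        # Startup is treated as finished once no steps remain.
        "done": steps == 0,
    }


if __name__ == "__main__":
    # Hypothetical sample payload, not captured from a real server.
    sample = json.dumps([
        {
            "type": "server",
            "metrics": [
                {"name": STEPS_REMAINING, "value": 2},
                {"name": TIME_ELAPSED, "value": 15300},
            ],
        }
    ])
    print(startup_status(sample))
```

If either metric is missing from the payload, the corresponding field comes back as None and "done" is False, so a caller can distinguish "still starting" from "metric not exposed by this server version".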