[ 
https://issues.apache.org/jira/browse/KUDU-1959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17437763#comment-17437763
 ] 

ASF subversion and git services commented on KUDU-1959:
-------------------------------------------------------

Commit 3e24e1be4362ba9efb6d295ff96d3a18893f2733 in kudu's branch 
refs/heads/master from Abhishek Chennaka
[ https://gitbox.apache.org/repos/asf?p=kudu.git;h=3e24e1b ]

KUDU-1959 - Implement startup progress metrics related to containers and tablets

This patch implements metrics that report server startup progress.
* For the log block manager, we expose:
  - log_block_manager_total_containers_startup : total number of
    containers present,
  - log_block_manager_processed_containers_startup : number of
    containers opened/processed at the time of the request, and
  - log_block_manager_containers_processing_time_startup : time elapsed
    while opening the containers. If the containers are not yet fully
    opened, this is the time elapsed so far.
* For the tablet server, we expose:
  - tablets_num_total_startup : total number of tablets present,
  - tablets_num_opened_startup : number of tablets opened/processed at
    the time of the request, and
  - tablets_opening_time_startup : time elapsed while opening the
    tablets. If the tablets are not yet fully opened, this is the time
    elapsed so far.

All times are in milliseconds, and the time metrics are exposed at the
debug level.

Change-Id: I9d1aa85b0585214475a6bdb8c0e5d7343c5bc3c9
Reviewed-on: http://gerrit.cloudera.org:8080/17947
Reviewed-by: Andrew Wong <aw...@cloudera.com>
Tested-by: Andrew Wong <aw...@cloudera.com>
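
As a rough illustration of how the counters named in the commit message
could be turned into a "how far along is startup" figure, here is a minimal
Python sketch. It is not part of the patch. It assumes the metrics have
already been fetched and parsed into a list of entities, each carrying a
"metrics" list of {"name", "value"} objects; that payload shape, the sample
values, and the helper names find_metric and startup_progress are
assumptions made for illustration only.

```python
from typing import Any, Dict, List, Optional

# Metric names as listed in the commit message.
TABLETS_TOTAL = "tablets_num_total_startup"
TABLETS_OPENED = "tablets_num_opened_startup"
CONTAINERS_TOTAL = "log_block_manager_total_containers_startup"
CONTAINERS_PROCESSED = "log_block_manager_processed_containers_startup"


def find_metric(entities: List[Dict[str, Any]], name: str) -> Optional[float]:
    """Return the value of the first metric called `name`, or None if absent."""
    for entity in entities:
        for metric in entity.get("metrics", []):
            if metric.get("name") == name:
                return metric.get("value")
    return None


def startup_progress(entities: List[Dict[str, Any]],
                     total_name: str,
                     done_name: str) -> Optional[float]:
    """Return done/total as a fraction, or None if either metric is missing."""
    total = find_metric(entities, total_name)
    done = find_metric(entities, done_name)
    if total is None or done is None:
        return None
    if total == 0:
        # Assumption: a zero total means there is nothing to open. A real
        # server might also report 0 before it has finished counting.
        return 1.0
    return done / total


# Hypothetical payload: 10 of 40 tablets opened, all 8 containers processed.
sample = [{
    "type": "server",
    "metrics": [
        {"name": TABLETS_TOTAL, "value": 40},
        {"name": TABLETS_OPENED, "value": 10},
        {"name": CONTAINERS_TOTAL, "value": 8},
        {"name": CONTAINERS_PROCESSED, "value": 8},
    ],
}]

tablet_progress = startup_progress(sample, TABLETS_TOTAL, TABLETS_OPENED)
container_progress = startup_progress(sample, CONTAINERS_TOTAL,
                                      CONTAINERS_PROCESSED)
print(tablet_progress, container_progress)
```

A monitoring script could poll each server and treat startup as finished
once both fractions reach 1.0. Fetching the payload from a live server is
deliberately left out of the sketch.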


> Hard to tell when a cluster is done starting up
> -----------------------------------------------
>
>                 Key: KUDU-1959
>                 URL: https://issues.apache.org/jira/browse/KUDU-1959
>             Project: Kudu
>          Issue Type: Improvement
>          Components: ops-tooling
>            Reporter: Jean-Daniel Cryans
>            Assignee: Abhishek
>            Priority: Major
>              Labels: roadmap-candidate, usability
>
> When restarting a cluster that has a good amount of data, it's hard to tell
> when it's "done". Right now, these are the things I do:
>  - Run ksck and wait until most tablets are no longer in the "unavailable" or
> "bootstrapping" state.
>  - Watch the metrics and see when the data under management is close to where 
> it was before restarting (it grows as tablets are getting bootstrapped).
>  - Look at the tablet server web UIs for tablets and compare how many are done
> bootstrapping vs. still bootstrapping vs. not yet started.
> Ideas on how to improve this:
>  - In the master's web UI for tablet servers, show how many tablets are 
> running VS not running (I wouldn't add anything about tombstoned tablets)
>  - Add metrics for tablets in different states.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
