[ https://issues.apache.org/jira/browse/HIVE-25737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Work on HIVE-25737 stopped by Viktor Csomor.
--------------------------------------------

> Compaction Observability: Initiator/Worker/Cleaner cycle measurement improvements
> ---------------------------------------------------------------------------------
>
>                 Key: HIVE-25737
>                 URL: https://issues.apache.org/jira/browse/HIVE-25737
>             Project: Hive
>          Issue Type: Improvement
>          Components: Standalone Metastore
>    Affects Versions: 4.0.0
>            Reporter: Viktor Csomor
>            Assignee: Viktor Csomor
>            Priority: Major
>
> In Compaction Observability, the Initiator/Worker/Cleaner cycle is
> measured with a [Dropwizard
> Timer|https://metrics.dropwizard.io/4.2.0/getting-started.html] metric.
> {noformat}
> Timers
> A timer measures both the rate that a particular piece of code is called and
> the distribution of its duration.
> {noformat}
> However, a Timer is a poor fit for simply measuring a duration. Furthermore,
> one HMS can run multiple Worker threads, and the duration of the last
> finished Worker is not informative if another Worker thread is stuck.
> Timers do not carry enough information because they only bump the counter
> once a Worker has finished a loop.
> If the Initiator/Worker/Cleaner gets stuck, no metric is reported at all,
> because the counter is never bumped.
> It would be better to implement the following:
> - Time passed since Initiator start (single threaded) -> Gauge metric
> - Oldest Working compaction -> Gauge metric
> - Oldest Working Cleaner -> Gauge metric

--
This message was sent by Atlassian Jira
(v8.20.1#820001)
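
The gauge-based idea in the issue description could be sketched roughly as follows. This is only an illustration under stated assumptions, not the Hive implementation: `CycleAgeTracker`, its method names, and the injected clock are all hypothetical. The tracker remembers when each in-flight cycle started and reports the age of the oldest one, so a stuck Worker shows up as a steadily growing value instead of a metric that silently stops updating.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;

/**
 * Hypothetical helper (not part of Hive): tracks the start time of every
 * in-flight cycle and reports the age of the oldest one.
 */
class CycleAgeTracker {
    // Cycle id (for example a worker thread name) -> start time in milliseconds.
    private final Map<String, Long> startTimes = new ConcurrentHashMap<>();
    // Injected clock so the behaviour can be tested deterministically.
    private final LongSupplier clock;

    CycleAgeTracker(LongSupplier clock) {
        this.clock = clock;
    }

    /** Record that a cycle has started. */
    void start(String cycleId) {
        startTimes.put(cycleId, clock.getAsLong());
    }

    /** Record that a cycle has finished, whether it succeeded or failed. */
    void finish(String cycleId) {
        startTimes.remove(cycleId);
    }

    /**
     * Age in milliseconds of the oldest cycle that is still running,
     * or 0 when nothing is running. This is the value a gauge would expose.
     */
    long oldestAgeMillis() {
        long now = clock.getAsLong();
        long oldestStart = Long.MAX_VALUE;
        for (long start : startTimes.values()) {
            oldestStart = Math.min(oldestStart, start);
        }
        if (oldestStart == Long.MAX_VALUE) {
            return 0L;
        }
        // Guard against a cycle that started just after 'now' was read.
        return Math.max(0L, now - oldestStart);
    }
}
```

In a real setup the tracker would be created with `System::currentTimeMillis`, and `oldestAgeMillis()` would be registered as a gauge, for example through a Dropwizard `Gauge<Long>` whose `getValue()` returns it. Because the value is computed when the gauge is read, it keeps growing while a thread is stuck, which is the property a Timer lacks.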