[ https://issues.apache.org/jira/browse/HIVE-28554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
László Bodor updated HIVE-28554:
--------------------------------
    Description:
Looking at the metrics endpoint of the Tez AM:
{code}
query_coordinator_llaptaskschedulermetrics_scheduler_cluster_node_count{kind="LlapTaskSchedulerMetrics",role="LlapTaskScheduler",service="QueryCoordinator",} 1.0
query_coordinator_llaptaskschedulermetrics_scheduler_dag_running{kind="LlapTaskSchedulerMetrics",role="LlapTaskScheduler",service="QueryCoordinator",} 1.0
query_coordinator_llaptaskschedulermetrics_scheduler_executors_per_instance{kind="LlapTaskSchedulerMetrics",role="LlapTaskScheduler",service="QueryCoordinator",} 0.0
query_coordinator_llaptaskschedulermetrics_scheduler_pending_preemption_task_count{kind="LlapTaskSchedulerMetrics",role="LlapTaskScheduler",service="QueryCoordinator",} 0.0
query_coordinator_llaptaskschedulermetrics_scheduler_memory_per_instance{kind="LlapTaskSchedulerMetrics",role="LlapTaskScheduler",service="QueryCoordinator",} 0.0
query_coordinator_llaptaskschedulermetrics_scheduler_cpu_cores_per_instance{kind="LlapTaskSchedulerMetrics",role="LlapTaskScheduler",service="QueryCoordinator",} 0.0
query_coordinator_llaptaskschedulermetrics_scheduler_pending_task_count{kind="LlapTaskSchedulerMetrics",role="LlapTaskScheduler",service="QueryCoordinator",} 32.0
query_coordinator_llaptaskschedulermetrics_scheduler_completed_dag_count{kind="LlapTaskSchedulerMetrics",role="LlapTaskScheduler",service="QueryCoordinator",} 2.0
query_coordinator_llaptaskschedulermetrics_scheduler_disabled_node_count{kind="LlapTaskSchedulerMetrics",role="LlapTaskScheduler",service="QueryCoordinator",} 1.0
query_coordinator_llaptaskschedulermetrics_scheduler_schedulable_task_count{kind="LlapTaskSchedulerMetrics",role="LlapTaskScheduler",service="QueryCoordinator",} 23.0
query_coordinator_llaptaskschedulermetrics_scheduler_running_task_count{kind="LlapTaskSchedulerMetrics",role="LlapTaskScheduler",service="QueryCoordinator",} 21.0
query_coordinator_llaptaskschedulermetrics_scheduler_preempted_task_count{kind="LlapTaskSchedulerMetrics",role="LlapTaskScheduler",service="QueryCoordinator",} 0.0
query_coordinator_llaptaskschedulermetrics_scheduler_successful_task_count{kind="LlapTaskSchedulerMetrics",role="LlapTaskScheduler",service="QueryCoordinator",} 236.0
{code}
After cancellation:
{code}
query_coordinator_llaptaskschedulermetrics_scheduler_cluster_node_count{kind="LlapTaskSchedulerMetrics",role="LlapTaskScheduler",service="QueryCoordinator",} 1.0
query_coordinator_llaptaskschedulermetrics_scheduler_dag_running{kind="LlapTaskSchedulerMetrics",role="LlapTaskScheduler",service="QueryCoordinator",} 0.0
query_coordinator_llaptaskschedulermetrics_scheduler_executors_per_instance{kind="LlapTaskSchedulerMetrics",role="LlapTaskScheduler",service="QueryCoordinator",} 0.0
query_coordinator_llaptaskschedulermetrics_scheduler_pending_preemption_task_count{kind="LlapTaskSchedulerMetrics",role="LlapTaskScheduler",service="QueryCoordinator",} 0.0
query_coordinator_llaptaskschedulermetrics_scheduler_memory_per_instance{kind="LlapTaskSchedulerMetrics",role="LlapTaskScheduler",service="QueryCoordinator",} 0.0
query_coordinator_llaptaskschedulermetrics_scheduler_cpu_cores_per_instance{kind="LlapTaskSchedulerMetrics",role="LlapTaskScheduler",service="QueryCoordinator",} 0.0
query_coordinator_llaptaskschedulermetrics_scheduler_pending_task_count{kind="LlapTaskSchedulerMetrics",role="LlapTaskScheduler",service="QueryCoordinator",} 31.0
query_coordinator_llaptaskschedulermetrics_scheduler_completed_dag_count{kind="LlapTaskSchedulerMetrics",role="LlapTaskScheduler",service="QueryCoordinator",} 3.0
query_coordinator_llaptaskschedulermetrics_scheduler_disabled_node_count{kind="LlapTaskSchedulerMetrics",role="LlapTaskScheduler",service="QueryCoordinator",} 0.0
query_coordinator_llaptaskschedulermetrics_scheduler_schedulable_task_count{kind="LlapTaskSchedulerMetrics",role="LlapTaskScheduler",service="QueryCoordinator",} 44.0
query_coordinator_llaptaskschedulermetrics_scheduler_running_task_count{kind="LlapTaskSchedulerMetrics",role="LlapTaskScheduler",service="QueryCoordinator",} 0.0
query_coordinator_llaptaskschedulermetrics_scheduler_preempted_task_count{kind="LlapTaskSchedulerMetrics",role="LlapTaskScheduler",service="QueryCoordinator",} 0.0
query_coordinator_llaptaskschedulermetrics_scheduler_successful_task_count{kind="LlapTaskSchedulerMetrics",role="LlapTaskScheduler",service="QueryCoordinator",} 236.0
{code}
Some metrics made the expected transitions:
{code}
dag_running: 1 -> 0
completed_dag_count: 2 -> 3
running_task_count: 21 -> 0
{code}
However, I think the following were also supposed to turn to 0, and did not:
{code}
pending_task_count: 32 -> 31
schedulable_task_count: 23 -> 44
{code}
As dag_running behaves correctly, we can ignore the ones that haven't turned to 0; however, for clarity's sake, it would be nice to clear them.

> LlapTaskSchedulerMetrics doesn't seem to be cleared when cancelling a query
> ---------------------------------------------------------------------------
>
>                 Key: HIVE-28554
>                 URL: https://issues.apache.org/jira/browse/HIVE-28554
>             Project: Hive
>          Issue Type: Bug
>      Security Level: Public(Viewable by anyone)
>            Reporter: László Bodor
>            Assignee: László Bodor
>            Priority: Major
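
The before/after comparison in the description can be automated. The following is a minimal sketch, not part of Hive or Tez: it parses two scrapes of the metrics endpoint and reports which gauges are still non-zero once dag_running is 0. The helper names (parse_snapshot, transitions, stale_gauges, render) are hypothetical, and the list of gauges expected to reset is an assumption taken from the reporter's reading of the issue, not from the scheduler code. The sample lines are rebuilt from the values quoted above.

```python
import re

# One sample line of the Prometheus text format: name{labels} value
SAMPLE_RE = re.compile(
    r'^(?P<name>[A-Za-z_:][A-Za-z0-9_:]*)(?:\{[^}]*\})?\s+(?P<value>\S+)$'
)

# Common prefix and label set of the metric lines quoted in this issue.
PREFIX = "query_coordinator_llaptaskschedulermetrics_scheduler_"
LABELS = ('{kind="LlapTaskSchedulerMetrics",role="LlapTaskScheduler",'
          'service="QueryCoordinator",}')

# Assumption (from the issue text): these gauges describe the running DAG
# and should read 0 when no DAG is running.
PER_DAG_GAUGES = ("running_task_count", "pending_task_count",
                  "schedulable_task_count")


def parse_snapshot(text):
    """Return {short_metric_name: value} for one scrape of the endpoint."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = SAMPLE_RE.match(line)
        if match is None:
            continue  # skip anything that is not a sample line
        name = match.group("name")
        if name.startswith(PREFIX):
            name = name[len(PREFIX):]
        values[name] = float(match.group("value"))
    return values


def transitions(before, after):
    """Return {name: (old, new)} for every metric whose value changed."""
    return {name: (before[name], after[name])
            for name in before
            if name in after and before[name] != after[name]}


def stale_gauges(after):
    """Return per-DAG gauges that are still non-zero while no DAG runs."""
    if after.get("dag_running", 0.0) != 0.0:
        return {}
    return {name: after[name]
            for name in PER_DAG_GAUGES
            if after.get(name, 0.0) != 0.0}


def render(values):
    """Rebuild endpoint-style lines from short names (sample data only)."""
    return "\n".join(f"{PREFIX}{name}{LABELS} {value}"
                     for name, value in values.items())


# Values copied from the two scrapes in the description.
BEFORE = render({"dag_running": 1.0, "pending_task_count": 32.0,
                 "completed_dag_count": 2.0, "schedulable_task_count": 23.0,
                 "running_task_count": 21.0})
AFTER = render({"dag_running": 0.0, "pending_task_count": 31.0,
                "completed_dag_count": 3.0, "schedulable_task_count": 44.0,
                "running_task_count": 0.0})

if __name__ == "__main__":
    before, after = parse_snapshot(BEFORE), parse_snapshot(AFTER)
    for name, (old, new) in transitions(before, after).items():
        print(f"{name}: {old} -> {new}")
    # prints {'pending_task_count': 31.0, 'schedulable_task_count': 44.0}
    print(stale_gauges(after))
```

In practice the two snapshots would come from the AM's metrics endpoint rather than from render(). A check like stale_gauges could serve as a regression test once the counters are cleared on cancellation.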