[ 
https://issues.apache.org/jira/browse/HIVE-28554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor updated HIVE-28554:
--------------------------------
    Description: 
Looking at the metrics endpoint of the Tez AM:
{code}
 
query_coordinator_llaptaskschedulermetrics_scheduler_cluster_node_count{kind="LlapTaskSchedulerMetrics",role="LlapTaskScheduler",service="QueryCoordinator",}
 1.0
query_coordinator_llaptaskschedulermetrics_scheduler_dag_running{kind="LlapTaskSchedulerMetrics",role="LlapTaskScheduler",service="QueryCoordinator",}
 1.0
    
query_coordinator_llaptaskschedulermetrics_scheduler_executors_per_instance{kind="LlapTaskSchedulerMetrics",role="LlapTaskScheduler",service="QueryCoordinator",}
 0.0
    
query_coordinator_llaptaskschedulermetrics_scheduler_pending_preemption_task_count{kind="LlapTaskSchedulerMetrics",role="LlapTaskScheduler",service="QueryCoordinator",}
 0.0
    
query_coordinator_llaptaskschedulermetrics_scheduler_memory_per_instance{kind="LlapTaskSchedulerMetrics",role="LlapTaskScheduler",service="QueryCoordinator",}
 0.0
    
query_coordinator_llaptaskschedulermetrics_scheduler_cpu_cores_per_instance{kind="LlapTaskSchedulerMetrics",role="LlapTaskScheduler",service="QueryCoordinator",}
 0.0
    
query_coordinator_llaptaskschedulermetrics_scheduler_pending_task_count{kind="LlapTaskSchedulerMetrics",role="LlapTaskScheduler",service="QueryCoordinator",}
 32.0
    
query_coordinator_llaptaskschedulermetrics_scheduler_completed_dag_count{kind="LlapTaskSchedulerMetrics",role="LlapTaskScheduler",service="QueryCoordinator",}
 2.0
    
query_coordinator_llaptaskschedulermetrics_scheduler_disabled_node_count{kind="LlapTaskSchedulerMetrics",role="LlapTaskScheduler",service="QueryCoordinator",}
 1.0
    
query_coordinator_llaptaskschedulermetrics_scheduler_schedulable_task_count{kind="LlapTaskSchedulerMetrics",role="LlapTaskScheduler",service="QueryCoordinator",}
 23.0
    
query_coordinator_llaptaskschedulermetrics_scheduler_running_task_count{kind="LlapTaskSchedulerMetrics",role="LlapTaskScheduler",service="QueryCoordinator",}
 21.0
    
query_coordinator_llaptaskschedulermetrics_scheduler_preempted_task_count{kind="LlapTaskSchedulerMetrics",role="LlapTaskScheduler",service="QueryCoordinator",}
 0.0
  
query_coordinator_llaptaskschedulermetrics_scheduler_successful_task_count{kind="LlapTaskSchedulerMetrics",role="LlapTaskScheduler",service="QueryCoordinator",}
 236.0
{code}


after cancellation:
{code}
    
query_coordinator_llaptaskschedulermetrics_scheduler_cluster_node_count{kind="LlapTaskSchedulerMetrics",role="LlapTaskScheduler",service="QueryCoordinator",}
 1.0
query_coordinator_llaptaskschedulermetrics_scheduler_dag_running{kind="LlapTaskSchedulerMetrics",role="LlapTaskScheduler",service="QueryCoordinator",}
 0.0
query_coordinator_llaptaskschedulermetrics_scheduler_executors_per_instance{kind="LlapTaskSchedulerMetrics",role="LlapTaskScheduler",service="QueryCoordinator",}
 0.0
query_coordinator_llaptaskschedulermetrics_scheduler_pending_preemption_task_count{kind="LlapTaskSchedulerMetrics",role="LlapTaskScheduler",service="QueryCoordinator",}
 0.0
query_coordinator_llaptaskschedulermetrics_scheduler_memory_per_instance{kind="LlapTaskSchedulerMetrics",role="LlapTaskScheduler",service="QueryCoordinator",}
 0.0
query_coordinator_llaptaskschedulermetrics_scheduler_cpu_cores_per_instance{kind="LlapTaskSchedulerMetrics",role="LlapTaskScheduler",service="QueryCoordinator",}
 0.0
query_coordinator_llaptaskschedulermetrics_scheduler_pending_task_count{kind="LlapTaskSchedulerMetrics",role="LlapTaskScheduler",service="QueryCoordinator",}
 31.0
query_coordinator_llaptaskschedulermetrics_scheduler_completed_dag_count{kind="LlapTaskSchedulerMetrics",role="LlapTaskScheduler",service="QueryCoordinator",}
 3.0
query_coordinator_llaptaskschedulermetrics_scheduler_disabled_node_count{kind="LlapTaskSchedulerMetrics",role="LlapTaskScheduler",service="QueryCoordinator",}
 0.0
query_coordinator_llaptaskschedulermetrics_scheduler_schedulable_task_count{kind="LlapTaskSchedulerMetrics",role="LlapTaskScheduler",service="QueryCoordinator",}
 44.0
query_coordinator_llaptaskschedulermetrics_scheduler_running_task_count{kind="LlapTaskSchedulerMetrics",role="LlapTaskScheduler",service="QueryCoordinator",}
 0.0
query_coordinator_llaptaskschedulermetrics_scheduler_preempted_task_count{kind="LlapTaskSchedulerMetrics",role="LlapTaskScheduler",service="QueryCoordinator",}
 0.0
query_coordinator_llaptaskschedulermetrics_scheduler_successful_task_count{kind="LlapTaskSchedulerMetrics",role="LlapTaskScheduler",service="QueryCoordinator",}
 236.0

{code}


These transitions were expected:
{code}
dag_running: 1 -> 0
completed_dag_count: 2 -> 3
running_task_count: 21 -> 0
{code}

However, I think these were also supposed to drop to 0 but did not:
{code}
pending_task_count: 32 -> 31
schedulable_task_count: 23 -> 44
{code}

Since dag_running behaves correctly, we can ignore the metrics that haven't
dropped to 0; however, for clarity's sake, it would be nice to clear them.
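
The check described above can be sketched as a small script. This is only an illustration: the helper names (parse_snapshot, stale_metrics) are hypothetical and not part of Hive or Tez, and the list of metrics expected to be 0 once dag_running is 0 is an assumption taken from this description, not from the scheduler code.

```python
import re

# One sample line: metric name, optional {labels}, then the value.
SAMPLE = re.compile(r'^\s*([A-Za-z_:][A-Za-z0-9_:]*)(?:\{[^}]*\})?\s+(\S+)\s*$')
PREFIX = "query_coordinator_llaptaskschedulermetrics_scheduler_"

# Assumption from the description: these should be 0 when no DAG is running.
EXPECTED_ZERO_WHEN_IDLE = ("pending_task_count", "schedulable_task_count",
                           "running_task_count")


def parse_snapshot(text):
    """Return {short_metric_name: value} for a pasted metrics snapshot."""
    # In the pasted snapshots the value sits on the line after the metric
    # name, so rejoin such pairs before parsing line by line.
    joined = re.sub(r'\}\s*\n\s*', '} ', text)
    metrics = {}
    for line in joined.splitlines():
        match = SAMPLE.match(line)
        if not match:
            continue
        name, raw_value = match.groups()
        if name.startswith(PREFIX):
            name = name[len(PREFIX):]
        try:
            metrics[name] = float(raw_value)
        except ValueError:
            continue  # not a numeric sample, skip it
    return metrics


def stale_metrics(metrics):
    """Return the metrics that are still non-zero although no DAG is running."""
    if metrics.get("dag_running", 0.0) != 0.0:
        return {}
    return {name: metrics[name] for name in EXPECTED_ZERO_WHEN_IDLE
            if metrics.get(name, 0.0) != 0.0}


# Excerpt of the "after cancellation" snapshot from this description.
after = '''
query_coordinator_llaptaskschedulermetrics_scheduler_dag_running{kind="LlapTaskSchedulerMetrics",role="LlapTaskScheduler",service="QueryCoordinator",}
 0.0
query_coordinator_llaptaskschedulermetrics_scheduler_pending_task_count{kind="LlapTaskSchedulerMetrics",role="LlapTaskScheduler",service="QueryCoordinator",}
 31.0
query_coordinator_llaptaskschedulermetrics_scheduler_schedulable_task_count{kind="LlapTaskSchedulerMetrics",role="LlapTaskScheduler",service="QueryCoordinator",}
 44.0
query_coordinator_llaptaskschedulermetrics_scheduler_running_task_count{kind="LlapTaskSchedulerMetrics",role="LlapTaskScheduler",service="QueryCoordinator",}
 0.0
'''

print(stale_metrics(parse_snapshot(after)))
# prints {'pending_task_count': 31.0, 'schedulable_task_count': 44.0}
```

Running it against the snapshot taken while the DAG was still running returns an empty dict, because dag_running is 1 there and non-zero task counts are expected.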


> LlapTaskSchedulerMetrics doesn't seem to be cleared when cancelling a query
> ---------------------------------------------------------------------------
>
>                 Key: HIVE-28554
>                 URL: https://issues.apache.org/jira/browse/HIVE-28554
>             Project: Hive
>          Issue Type: Bug
>      Security Level: Public(Viewable by anyone) 
>            Reporter: László Bodor
>            Assignee: László Bodor
>            Priority: Major
>
> Looking at the metrics endpoint of the Tez AM:
> {code}
>  
> query_coordinator_llaptaskschedulermetrics_scheduler_cluster_node_count{kind="LlapTaskSchedulerMetrics",role="LlapTaskScheduler",service="QueryCoordinator",}
>  1.0
> query_coordinator_llaptaskschedulermetrics_scheduler_dag_running{kind="LlapTaskSchedulerMetrics",role="LlapTaskScheduler",service="QueryCoordinator",}
>  1.0
>     
> query_coordinator_llaptaskschedulermetrics_scheduler_executors_per_instance{kind="LlapTaskSchedulerMetrics",role="LlapTaskScheduler",service="QueryCoordinator",}
>  0.0
>     
> query_coordinator_llaptaskschedulermetrics_scheduler_pending_preemption_task_count{kind="LlapTaskSchedulerMetrics",role="LlapTaskScheduler",service="QueryCoordinator",}
>  0.0
>     
> query_coordinator_llaptaskschedulermetrics_scheduler_memory_per_instance{kind="LlapTaskSchedulerMetrics",role="LlapTaskScheduler",service="QueryCoordinator",}
>  0.0
>     
> query_coordinator_llaptaskschedulermetrics_scheduler_cpu_cores_per_instance{kind="LlapTaskSchedulerMetrics",role="LlapTaskScheduler",service="QueryCoordinator",}
>  0.0
>     
> query_coordinator_llaptaskschedulermetrics_scheduler_pending_task_count{kind="LlapTaskSchedulerMetrics",role="LlapTaskScheduler",service="QueryCoordinator",}
>  32.0
>     
> query_coordinator_llaptaskschedulermetrics_scheduler_completed_dag_count{kind="LlapTaskSchedulerMetrics",role="LlapTaskScheduler",service="QueryCoordinator",}
>  2.0
>     
> query_coordinator_llaptaskschedulermetrics_scheduler_disabled_node_count{kind="LlapTaskSchedulerMetrics",role="LlapTaskScheduler",service="QueryCoordinator",}
>  1.0
>     
> query_coordinator_llaptaskschedulermetrics_scheduler_schedulable_task_count{kind="LlapTaskSchedulerMetrics",role="LlapTaskScheduler",service="QueryCoordinator",}
>  23.0
>     
> query_coordinator_llaptaskschedulermetrics_scheduler_running_task_count{kind="LlapTaskSchedulerMetrics",role="LlapTaskScheduler",service="QueryCoordinator",}
>  21.0
>     
> query_coordinator_llaptaskschedulermetrics_scheduler_preempted_task_count{kind="LlapTaskSchedulerMetrics",role="LlapTaskScheduler",service="QueryCoordinator",}
>  0.0
>   
> query_coordinator_llaptaskschedulermetrics_scheduler_successful_task_count{kind="LlapTaskSchedulerMetrics",role="LlapTaskScheduler",service="QueryCoordinator",}
>  236.0
> {code}
> after cancellation:
> {code}
>     
> query_coordinator_llaptaskschedulermetrics_scheduler_cluster_node_count{kind="LlapTaskSchedulerMetrics",role="LlapTaskScheduler",service="QueryCoordinator",}
>  1.0
> query_coordinator_llaptaskschedulermetrics_scheduler_dag_running{kind="LlapTaskSchedulerMetrics",role="LlapTaskScheduler",service="QueryCoordinator",}
>  0.0
> query_coordinator_llaptaskschedulermetrics_scheduler_executors_per_instance{kind="LlapTaskSchedulerMetrics",role="LlapTaskScheduler",service="QueryCoordinator",}
>  0.0
> query_coordinator_llaptaskschedulermetrics_scheduler_pending_preemption_task_count{kind="LlapTaskSchedulerMetrics",role="LlapTaskScheduler",service="QueryCoordinator",}
>  0.0
> query_coordinator_llaptaskschedulermetrics_scheduler_memory_per_instance{kind="LlapTaskSchedulerMetrics",role="LlapTaskScheduler",service="QueryCoordinator",}
>  0.0
> query_coordinator_llaptaskschedulermetrics_scheduler_cpu_cores_per_instance{kind="LlapTaskSchedulerMetrics",role="LlapTaskScheduler",service="QueryCoordinator",}
>  0.0
> query_coordinator_llaptaskschedulermetrics_scheduler_pending_task_count{kind="LlapTaskSchedulerMetrics",role="LlapTaskScheduler",service="QueryCoordinator",}
>  31.0
> query_coordinator_llaptaskschedulermetrics_scheduler_completed_dag_count{kind="LlapTaskSchedulerMetrics",role="LlapTaskScheduler",service="QueryCoordinator",}
>  3.0
> query_coordinator_llaptaskschedulermetrics_scheduler_disabled_node_count{kind="LlapTaskSchedulerMetrics",role="LlapTaskScheduler",service="QueryCoordinator",}
>  0.0
> query_coordinator_llaptaskschedulermetrics_scheduler_schedulable_task_count{kind="LlapTaskSchedulerMetrics",role="LlapTaskScheduler",service="QueryCoordinator",}
>  44.0
> query_coordinator_llaptaskschedulermetrics_scheduler_running_task_count{kind="LlapTaskSchedulerMetrics",role="LlapTaskScheduler",service="QueryCoordinator",}
>  0.0
> query_coordinator_llaptaskschedulermetrics_scheduler_preempted_task_count{kind="LlapTaskSchedulerMetrics",role="LlapTaskScheduler",service="QueryCoordinator",}
>  0.0
> query_coordinator_llaptaskschedulermetrics_scheduler_successful_task_count{kind="LlapTaskSchedulerMetrics",role="LlapTaskScheduler",service="QueryCoordinator",}
>  236.0
> {code}
> These transitions were expected:
> {code}
> dag_running: 1 -> 0
> completed_dag_count: 2 -> 3
> running_task_count: 21 -> 0
> {code}
> However, I think these were also supposed to drop to 0 but did not:
> {code}
> pending_task_count: 32 -> 31
> schedulable_task_count: 23 -> 44
> {code}
> Since dag_running behaves correctly, we can ignore the metrics that haven't
> dropped to 0; however, for clarity's sake, it would be nice to clear them.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
