[ https://issues.apache.org/jira/browse/IMPALA-10476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wenzhe Zhou reassigned IMPALA-10476:
------------------------------------

    Assignee:     (was: Wenzhe Zhou)

> Remove executor node with faulty disks from executor group
> ----------------------------------------------------------
>
>                 Key: IMPALA-10476
>                 URL: https://issues.apache.org/jira/browse/IMPALA-10476
>             Project: IMPALA
>          Issue Type: Sub-task
>          Components: Distributed Exec
>            Reporter: Wenzhe Zhou
>            Priority: Major
>
> If an executor node frequently encounters disk I/O failures when reading from 
> or writing to local disk, it should report its unhealthy state to the 
> statestore so that the node can be marked as down and removed from the 
> executor group, avoiding repeated query failures in the cluster. This 
> provides a mechanism for an executor node to remove itself from scheduling.
> The two major components of Impala that read from and write to local disk 
> are the spill-to-disk and data caching features. We need to add stats that 
> count such local disk failures over a recent period of time, such as the 
> last x seconds, and then use these stats to determine whether a node is 
> healthy enough to execute query fragment instances.
> The health state of an executor node should be shown on the debug WebUI. 
> Users should also be able to override the node's health state. Once its 
> health state is overridden, the node will restart to re-register itself with 
> the statestore.
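
The description proposes counting local disk failures over a recent window (the last x seconds) and using that count to judge whether a node is healthy. A minimal sketch of such a sliding-window counter is shown below in C++, Impala's backend language. The class name `DiskFailureTracker`, the window length, and the failure threshold are hypothetical choices for illustration, not Impala's actual implementation; the sketch does not model reporting to the statestore or the WebUI override.

```cpp
#include <chrono>
#include <cstddef>
#include <deque>

// Hypothetical sketch (not Impala code): tracks local disk I/O failures,
// e.g. from spill-to-disk or the data cache, inside a sliding time window.
class DiskFailureTracker {
 public:
  using Clock = std::chrono::steady_clock;

  // 'window': how far back failures are counted (the "last x seconds").
  // 'max_failures': failure count at which the node is deemed unhealthy.
  DiskFailureTracker(std::chrono::seconds window, std::size_t max_failures)
      : window_(window), max_failures_(max_failures) {}

  // Records one disk read/write failure observed at 'now'.
  // Assumes timestamps are passed in non-decreasing order.
  void RecordFailure(Clock::time_point now) {
    failures_.push_back(now);
    Evict(now);
  }

  // Returns the number of failures in the window (now - window, now].
  // Expired entries are dropped as a side effect.
  std::size_t FailuresInWindow(Clock::time_point now) {
    Evict(now);
    return failures_.size();
  }

  // The node stays healthy while the windowed failure count is below the
  // threshold.
  bool IsHealthy(Clock::time_point now) {
    return FailuresInWindow(now) < max_failures_;
  }

 private:
  // Drops failures that are at least 'window_' older than 'now'.
  void Evict(Clock::time_point now) {
    while (!failures_.empty() && now - failures_.front() >= window_) {
      failures_.pop_front();
    }
  }

  std::chrono::seconds window_;
  std::size_t max_failures_;
  std::deque<Clock::time_point> failures_;  // oldest failure at the front
};
```

A deque keeps eviction cheap: failures arrive in time order, so expired entries are always at the front. A node whose `IsHealthy()` turns false would then, per the description, report itself to the statestore as unhealthy.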



--
This message was sent by Atlassian Jira
(v8.20.10#820010)