Sumit Agrawal created HDDS-12468:
------------------------------------

             Summary: Check for space availability for all dns while container 
creation in pipeline
                 Key: HDDS-12468
                 URL: https://issues.apache.org/jira/browse/HDDS-12468
             Project: Apache Ozone
          Issue Type: Sub-task
            Reporter: Sumit Agrawal


At SCM, for Ratis, allocateBlock works as follows:
 # A pipeline is chosen randomly
 # A container is chosen round robin, subject to the required size
 # If no matching container is found
 ## A new container is created and returned
 # The block is assigned to the container and returned in the response
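
The steps above can be sketched roughly as follows. This is a simplified model and not the actual SCM code: the class AllocateBlockSketch and its nested types Container, Pipeline and Block are hypothetical names introduced only for illustration. Like the steps as listed, the sketch never looks at free space on the datanodes of the chosen pipeline.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Illustrative model only; none of these types are real Ozone classes. */
class AllocateBlockSketch {

  static final class Container {
    final long id;
    final long capacityBytes;
    long usedBytes;

    Container(long id, long capacityBytes) {
      this.id = id;
      this.capacityBytes = capacityBytes;
    }

    boolean hasSpaceFor(long size) {
      return capacityBytes - usedBytes >= size;
    }
  }

  static final class Pipeline {
    final String name;
    final List<Container> containers = new ArrayList<>();
    int nextIndex; // round-robin cursor into containers

    Pipeline(String name) {
      this.name = name;
    }
  }

  static final class Block {
    final String pipelineName;
    final long containerId;
    final long size;

    Block(String pipelineName, long containerId, long size) {
      this.pipelineName = pipelineName;
      this.containerId = containerId;
      this.size = size;
    }
  }

  private final List<Pipeline> pipelines;
  private final long containerCapacityBytes;
  private final Random random;
  private long nextContainerId = 1;

  AllocateBlockSketch(List<Pipeline> pipelines, long containerCapacityBytes,
      Random random) {
    this.pipelines = pipelines;
    this.containerCapacityBytes = containerCapacityBytes;
    this.random = random;
  }

  Block allocateBlock(long size) {
    if (size > containerCapacityBytes) {
      throw new IllegalArgumentException("block larger than a container");
    }
    // Step 1: the pipeline is chosen randomly.
    Pipeline pipeline = pipelines.get(random.nextInt(pipelines.size()));

    // Step 2: walk the containers round robin, looking for enough space.
    Container chosen = null;
    int n = pipeline.containers.size();
    for (int i = 0; i < n; i++) {
      int index = (pipeline.nextIndex + i) % n;
      Container candidate = pipeline.containers.get(index);
      if (candidate.hasSpaceFor(size)) {
        chosen = candidate;
        pipeline.nextIndex = (index + 1) % n;
        break;
      }
    }

    // Step 3: no matching container, so a new one is created.
    // Assumption: datanode disk space is not consulted at this point.
    if (chosen == null) {
      chosen = new Container(nextContainerId++, containerCapacityBytes);
      pipeline.containers.add(chosen);
    }

    // Step 4: the block is assigned to the container and returned.
    chosen.usedBytes += size;
    return new Block(pipeline.name, chosen.id, size);
  }
}
```

In this model the new container exists only in SCM's bookkeeping, so a datanode with a full disk is discovered later, when the container is actually created on the DN.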

 

Later, container creation can fail at the DN, with the negative impact described below.

The issue here is:
 * If the Leader node in the pipeline does not have capacity to create a new 
container, it returns a container creation failure
 * If a Follower node does not have capacity to create a new container, it 
fails and keeps retrying (if another follower succeeds)
 * This can result in the disk getting full while blocks are written in 
parallel via the state machine, slowing down writes and producing failure 
responses

 

It has been observed that writes on a follower node get stuck due to disk full / 
volume failure.

 

As a solution:
 * In this situation, SCM should trigger pipeline closure (including container 
closure) with a cool-down time
 * SCM should choose another pipeline for block allocation
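
One possible shape of this proposal is sketched below. It is a sketch under stated assumptions, not the planned implementation: PipelineSpaceCheckSketch, Datanode, Pipeline and all method names are hypothetical. The issue does not define "cool down time"; the sketch assumes it is a period during which the datanodes that lacked space are flagged, so that a caller could avoid reusing them right away. The caller is assumed to pass candidate pipelines in its preferred order (for example, shuffled).

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustrative model only; none of these types are real Ozone classes. */
class PipelineSpaceCheckSketch {

  static final class Datanode {
    final String id;
    long freeBytes;

    Datanode(String id, long freeBytes) {
      this.id = id;
      this.freeBytes = freeBytes;
    }
  }

  static final class Pipeline {
    final String name;
    final List<Datanode> nodes;
    boolean closed;

    Pipeline(String name, List<Datanode> nodes) {
      this.name = name;
      this.nodes = nodes;
    }
  }

  private final long containerSizeBytes;
  private final long coolDownMillis;
  // Assumption: cool-down is tracked per datanode id as an expiry timestamp.
  private final Map<String, Long> coolDownUntil = new HashMap<>();

  PipelineSpaceCheckSketch(long containerSizeBytes, long coolDownMillis) {
    this.containerSizeBytes = containerSizeBytes;
    this.coolDownMillis = coolDownMillis;
  }

  /** True only if every DN in the pipeline (leader and followers) has room. */
  boolean allNodesHaveSpace(Pipeline pipeline) {
    for (Datanode dn : pipeline.nodes) {
      if (dn.freeBytes < containerSizeBytes) {
        return false;
      }
    }
    return true;
  }

  /**
   * Returns the first open pipeline whose DNs can all host a new container.
   * Pipelines that fail the check are marked closed and their short-on-space
   * DNs are put into cool-down. Returns null if no pipeline qualifies.
   */
  Pipeline choosePipelineForNewContainer(List<Pipeline> candidates,
      long nowMillis) {
    for (Pipeline pipeline : candidates) {
      if (pipeline.closed) {
        continue;
      }
      if (allNodesHaveSpace(pipeline)) {
        return pipeline;
      }
      // Stand-in for "trigger pipeline closure (including container closure)".
      pipeline.closed = true;
      for (Datanode dn : pipeline.nodes) {
        if (dn.freeBytes < containerSizeBytes) {
          coolDownUntil.put(dn.id, nowMillis + coolDownMillis);
        }
      }
    }
    return null;
  }

  boolean isCoolingDown(String datanodeId, long nowMillis) {
    Long until = coolDownUntil.get(datanodeId);
    return until != null && nowMillis < until;
  }
}
```

Passing the current time in as a parameter keeps the sketch deterministic; a real implementation would more likely rely on the free-space figures that datanodes report to SCM and on SCM's own clock.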



