Sumit Agrawal created HDDS-12468:
------------------------------------
Summary: Check for space availability for all dns while container
creation in pipeline
Key: HDDS-12468
URL: https://issues.apache.org/jira/browse/HDDS-12468
Project: Apache Ozone
Issue Type: Sub-task
Reporter: Sumit Agrawal
At SCM, for Ratis pipelines, allocateBlock works as follows:
# A pipeline is chosen randomly
# A container is chosen round robin, matching the required size
# If no matching container is found
## A new container is created and returned
# The block is assigned to the container and returned in the response
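The steps above can be sketched roughly as follows. This is a simplified model, not the SCM code: the class AllocateBlockSketch, its nested Container and Pipeline types, and the byte-based size accounting are all hypothetical names invented for illustration. The point it shows is that the "create a new container" step never consults free space on any datanode.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical model of the flow described above; none of these are Ozone classes.
final class AllocateBlockSketch {
    static final class Container {
        final int id;
        long freeBytes;
        Container(int id, long freeBytes) { this.id = id; this.freeBytes = freeBytes; }
    }

    static final class Pipeline {
        final List<Container> containers = new ArrayList<>();
        int nextIndex = 0; // round-robin cursor into containers
    }

    private final List<Pipeline> pipelines;
    private final Random random;
    private final long containerSizeBytes;
    private int nextContainerId = 1;

    AllocateBlockSketch(List<Pipeline> pipelines, Random random, long containerSizeBytes) {
        this.pipelines = pipelines;
        this.random = random;
        this.containerSizeBytes = containerSizeBytes;
    }

    /** Returns the container that receives a block of the given size. */
    Container allocateBlock(long blockSizeBytes) {
        // Step 1: the pipeline is chosen randomly.
        Pipeline pipeline = pipelines.get(random.nextInt(pipelines.size()));

        // Step 2: walk the containers round robin, looking for enough room.
        int n = pipeline.containers.size();
        for (int i = 0; i < n; i++) {
            Container c = pipeline.containers.get((pipeline.nextIndex + i) % n);
            if (c.freeBytes >= blockSizeBytes) {
                pipeline.nextIndex = (pipeline.nextIndex + i + 1) % n;
                c.freeBytes -= blockSizeBytes; // Step 4: block assigned
                return c;
            }
        }

        // Step 3: no match, so a new container is created.
        // Note: no datanode disk space is checked at this point.
        Container created = new Container(nextContainerId++, containerSizeBytes);
        pipeline.containers.add(created);
        created.freeBytes -= blockSizeBytes; // Step 4: block assigned
        return created;
    }
}
```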
Later, container creation can fail at the DN, with the negative impacts listed below.
The issues are:
* If the leader node in the pipeline does not have capacity to create a new container, it returns a container creation failure
* If a follower node does not have capacity to create a new container, it fails and keeps retrying (if another follower succeeds)
* This can cause the disk to fill up while blocks are written in parallel via the state machine, which slows down writes and leads to failure responses
It has been observed that writes on a follower node get stuck due to a full disk or a volume failure.
As a solution:
* In this situation, SCM should trigger pipeline closure (including container closure) with a cool-down time
* SCM should choose another pipeline for block allocation
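A minimal sketch of how such a check might look, under several assumptions. The class PipelineSpaceCheckSketch, its nested Datanode and Pipeline types, and all method names are hypothetical and are not Ozone APIs. The cool-down is modelled here as a window during which the pipeline is skipped, which is only one possible reading of the proposal, and the map update is a stand-in for actually triggering pipeline and container closure. Candidates are scanned in list order for simplicity rather than chosen randomly.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch; these types do not exist in Ozone.
final class PipelineSpaceCheckSketch {
    static final class Datanode {
        final String id;
        final long freeBytes;
        Datanode(String id, long freeBytes) { this.id = id; this.freeBytes = freeBytes; }
    }

    static final class Pipeline {
        final String id;
        final List<Datanode> nodes;
        Pipeline(String id, List<Datanode> nodes) { this.id = id; this.nodes = nodes; }
    }

    private final long containerSizeBytes;
    private final long coolDownMillis;
    // Pipeline id -> time until which the pipeline is skipped (assumed cool-down model).
    private final Map<String, Long> excludedUntil = new HashMap<>();

    PipelineSpaceCheckSketch(long containerSizeBytes, long coolDownMillis) {
        this.containerSizeBytes = containerSizeBytes;
        this.coolDownMillis = coolDownMillis;
    }

    /** True only if every DN in the pipeline, leader and followers, can hold a new container. */
    boolean allNodesHaveSpace(Pipeline pipeline) {
        for (Datanode dn : pipeline.nodes) {
            if (dn.freeBytes < containerSizeBytes) {
                return false;
            }
        }
        return true;
    }

    boolean isCoolingDown(String pipelineId, long nowMillis) {
        Long until = excludedUntil.get(pipelineId);
        return until != null && nowMillis < until;
    }

    /** Picks the first pipeline whose DNs all have space; marks the others for cool-down. */
    Optional<Pipeline> choosePipelineForNewContainer(List<Pipeline> candidates, long nowMillis) {
        for (Pipeline p : candidates) {
            if (isCoolingDown(p.id, nowMillis)) {
                continue; // recently rejected, try another pipeline
            }
            if (allNodesHaveSpace(p)) {
                return Optional.of(p);
            }
            // Stand-in for triggering pipeline (and container) closure.
            excludedUntil.put(p.id, nowMillis + coolDownMillis);
        }
        return Optional.empty();
    }
}
```

With a check of this kind, a pipeline containing even one DN without room is rejected before the container is created, so the failure surfaces at SCM instead of at the leader or a follower.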