Danny Becker created HDFS-17737:
-----------------------------------

             Summary: Implement Backoff Retry for ErasureCoding reads
                 Key: HDFS-17737
                 URL: https://issues.apache.org/jira/browse/HDFS-17737
             Project: Hadoop HDFS
          Issue Type: Improvement
          Components: dfsclient, ec, erasure-coding
    Affects Versions: 3.3.4
            Reporter: Danny Becker
            Assignee: Danny Becker


#Why
Currently, EC reads are less stable than replication reads: if 4 out of 9 
DataNodes in the block group are busy, the whole read fails. Erasure coding 
reads need to handle ERROR_BUSY signals from DataNodes and retry after a 
backoff duration, which avoids overloading the DataNodes while increasing the 
stability of the read.
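The backoff itself can be sketched as exponential delay with jitter. This is a minimal illustrative sketch, not the actual HDFS client code; the names EcReadBackoff, BASE_DELAY_MS, MAX_DELAY_MS, and computeBackoffMs are all assumptions for the example:

```java
import java.util.concurrent.ThreadLocalRandom;

/**
 * Minimal sketch of a client-side backoff policy for EC reads that hit
 * ERROR_BUSY. All names and constants here are illustrative, not the
 * real DFSClient API.
 */
public class EcReadBackoff {
  static final long BASE_DELAY_MS = 50;   // first retry delay (assumed value)
  static final long MAX_DELAY_MS = 2000;  // cap so retries stay bounded (assumed value)

  /** Exponential backoff: base * 2^attempt, capped, with random jitter. */
  static long computeBackoffMs(int attempt) {
    long exp = Math.min(MAX_DELAY_MS, BASE_DELAY_MS << Math.min(attempt, 20));
    // Jitter spreads retries from many clients so they don't all re-hit
    // the busy DataNode at the same instant.
    return ThreadLocalRandom.current().nextLong(exp / 2, exp + 1);
  }

  public static void main(String[] args) {
    for (int attempt = 0; attempt < 5; attempt++) {
      System.out.println("attempt " + attempt + " -> sleep "
          + computeBackoffMs(attempt) + " ms");
    }
  }
}
```

The jitter is the important design choice: without it, every client that backed off from the same busy DataNode would retry in lockstep and re-create the overload.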

Server-side throttling was another proposed solution, but we prefer this 
client-side backoff for a few main reasons (see 
https://msasg.visualstudio.com/DefaultCollection/Multi%20Tenancy/_git/Hadoop/pullRequest/5272897#1739008224):
1. Throttling on the server would tie up connection threads, which have a 
maximum limit.
2. Throttling was originally added only for the cohosting scenario, to reduce 
impact on other services.
3. Throttling would consume resources on a DataNode that is already busy.

#What
The previous implementation followed a 4-phase read algorithm:
1. Attempt to read chunks from the data blocks.
2. Check for missing data chunks. Fail if there are more missing chunks than 
parity blocks; otherwise read parity blocks and null data blocks.
3. Wait for data to be read into the buffers and handle any read errors by 
reading from more parity blocks.
4. Check for missing blocks and either decode or fail.

The new implementation merges phases 1-3 into a single loop:
1. Loop until we have enough blocks to read or decode, or we have too many 
missing blocks to succeed:
   - Determine the number of chunks we need to fetch. ALLZERO chunks count 
towards this total; null data chunks also count towards it unless there are 
missing data chunks.
   - Read chunks until enough are pending or fetched to be able to decode or 
do a normal read.
   - Get results from the reads and handle exceptions by preparing more reads 
for decoding the missing data.
   - Check whether we should sleep before retrying any reads.
2. Check for missing blocks and either decode or fail.
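The loop above can be illustrated with a toy simulation. This is a sketch only, assuming an RS(6,3) layout (6 data + 3 parity chunks); the class, the busyCount model, and readBlockGroup are invented for illustration and are far simpler than the real striped-read logic in the DFS client:

```java
/**
 * Toy simulation of the merged read loop: retry busy chunks until enough
 * are fetched to read or decode, or too many are permanently missing.
 * Entirely illustrative; not the actual HDFS implementation.
 */
public class StripedReadLoopSketch {
  static final int DATA = 6, PARITY = 3; // assumed RS(6,3) block group

  /**
   * busyCount[i]: how many times chunk i replies ERROR_BUSY before
   * succeeding; a negative value means the chunk is permanently missing.
   * Returns true if at least DATA chunks were eventually fetched, i.e.
   * the read or decode can succeed.
   */
  static boolean readBlockGroup(int[] busyCount, int maxRetries) {
    boolean[] fetched = new boolean[DATA + PARITY];
    for (int attempt = 0; attempt <= maxRetries; attempt++) {
      int have = 0;
      for (int i = 0; i < busyCount.length; i++) {
        if (fetched[i]) { have++; continue; }
        if (busyCount[i] < 0) continue;            // permanently missing
        if (busyCount[i] == 0) { fetched[i] = true; have++; }
        else busyCount[i]--;                       // got ERROR_BUSY; retry later
      }
      if (have >= DATA) return true;               // enough to read or decode
      int lost = 0;
      for (int c : busyCount) if (c < 0) lost++;
      if (lost > PARITY) return false;             // too many missing to succeed
      // The real client would sleep with backoff here before retrying.
    }
    return false;
  }

  public static void main(String[] args) {
    // 4 of 9 chunks busy once: the old code failed outright; with one
    // retry the read succeeds.
    int[] busyOnce = {1, 1, 1, 1, 0, 0, 0, 0, 0};
    System.out.println(readBlockGroup(busyOnce, 3)); // true
    // 4 chunks permanently missing: more than PARITY, so the read fails.
    int[] gone = {-1, -1, -1, -1, 0, 0, 0, 0, 0};
    System.out.println(readBlockGroup(gone, 3)); // false
  }
}
```

The two cases in main mirror the motivation above: transient ERROR_BUSY responses now succeed after a retry, while truly missing chunks still fail fast once more than the parity count are lost.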

#Tests
Add unit tests to `TestWriteReadStripedFile`:
- Covers RS(3,2) with 1 chunk busy, 2 chunks busy, and 3 chunks busy.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
