Ethan Rose created HDDS-12986:
---------------------------------

             Summary: Race condition between BCSID update and container repair
                 Key: HDDS-12986
                 URL: https://issues.apache.org/jira/browse/HDDS-12986
             Project: Apache Ozone
          Issue Type: Sub-task
            Reporter: Ethan Rose


Right now reconciliation has no atomic view of the chunks the merkle tree 
reports as missing and the BCSID we should update to once those chunks are 
pulled. Here's an example:

Replicas r1 and r2 are missing chunk c1 and have bcsid 99. r3 has all chunks 
and bcsid 100.
    - r1 starts reconciling with r3
    - r1 writes c1 to container
    - r1 does putblock with bcsid 100
    - r2 starts reconciling with r1
    - r2 pulls tree from r1, which does not yet have c1
    - r2 pulls bcsid from r1, which is 100
    - r1 updates tree with c1
    - r2 pulls chunks based on the tree it already pulled, so it does not get c1
    - r2 updates bcsid to 100 because it had no failed chunk pulls, even though 
      it does not have c1, which bcsid 100 requires.
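
The interleaving above can be replayed in a small toy model. This is only a 
sketch for illustration, not Ozone code: the Replica class, the consistent() 
helper, the extra chunk c0, and the "required" mapping are all hypothetical 
names introduced here.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Toy model of a container replica (hypothetical, not an Ozone class)."""
    name: str
    chunks: set = field(default_factory=set)  # chunks actually on disk
    tree: set = field(default_factory=set)    # chunks recorded in the merkle tree
    bcsid: int = 0


def consistent(replica, chunks_required_at):
    """True if the replica holds every chunk its BCSID implies it has."""
    return chunks_required_at[replica.bcsid] <= replica.chunks


# Assumption for the model: bcsid 100 implies c1 is present, bcsid 99 does not.
# c0 is an extra chunk that every replica already has.
required = {99: {"c0"}, 100: {"c0", "c1"}}

r1 = Replica("r1", {"c0"}, {"c0"}, 99)
r2 = Replica("r2", {"c0"}, {"c0"}, 99)
r3 = Replica("r3", {"c0", "c1"}, {"c0", "c1"}, 100)

# r1 reconciles with r3: writes c1, then putblock bumps the bcsid.
r1.chunks.add("c1")
r1.bcsid = r3.bcsid

# r2 reads r1's tree and bcsid as two separate, non-atomic reads.
peer_tree = set(r1.tree)  # stale: c1 is not in the tree yet
peer_bcsid = r1.bcsid     # fresh: already 100

# Only now does r1 update its tree.
r1.tree.add("c1")

# r2 pulls what the stale tree says is missing (nothing), sees no failed
# pulls, and adopts the peer's bcsid.
missing = peer_tree - r2.chunks
r2.chunks |= missing
r2.bcsid = peer_bcsid

print(consistent(r1, required), consistent(r2, required))  # True False
```

r2 ends at bcsid 100 without c1, which is the inconsistent state described in 
the last step.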

The easiest fix for this is to put the BCSID in the block merkle tree proto 
(but exclude it from the hash), so that the chunk list and the BCSID are read 
together. This also removes the need for getBlock calls during reconciliation.
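
One way to picture the proposed fix, as a rough sketch under stated 
assumptions rather than the actual proto or Ozone implementation: BlockTree, 
data_hash(), and reconcile() are hypothetical names, chunk names stand in for 
chunk checksums, and SHA-256 is an arbitrary choice of hash. The BCSID 
travels in the same snapshot as the chunk list but does not feed the hash.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class BlockTree:
    """Hypothetical snapshot of a block's merkle tree entry."""
    chunks: tuple  # chunk identifiers (stand-ins for chunk checksums)
    bcsid: int     # carried in the same message, but not hashed

    def data_hash(self):
        # Only chunk data contributes to the hash; bcsid is left out.
        h = hashlib.sha256()
        for c in self.chunks:
            h.update(c.encode() + b"\n")
        return h.hexdigest()


def reconcile(local_chunks, local_bcsid, peer_tree):
    """Pull missing chunks and take the bcsid from the same snapshot."""
    missing = set(peer_tree.chunks) - local_chunks
    return local_chunks | missing, max(local_bcsid, peer_tree.bcsid)


# A stale snapshot carries the stale bcsid, so the two cannot disagree.
stale = BlockTree(("c0",), 99)
fresh = BlockTree(("c0", "c1"), 100)

print(reconcile({"c0"}, 99, stale))
print(sorted(reconcile({"c0"}, 99, fresh)[0]), reconcile({"c0"}, 99, fresh)[1])
```

In this model, a replica that reads a tree from before c1 was recorded also 
reads bcsid 99 from it, so it never advances to 100 without c1. Two trees 
with the same chunks but different BCSIDs still produce the same hash.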



