Chris Nauroth created HDFS-7121:
-----------------------------------

             Summary: For JournalNode operations that must succeed on all 
nodes, attempt to undo the operation on all nodes if it fails on one node.
                 Key: HDFS-7121
                 URL: https://issues.apache.org/jira/browse/HDFS-7121
             Project: Hadoop HDFS
          Issue Type: Improvement
          Components: journal-node
            Reporter: Chris Nauroth


Several JournalNode operations cannot be satisfied by a quorum; they must 
succeed on every JournalNode in the cluster.  If such an operation succeeds on 
some nodes but fails on others, the nodes may be left in an inconsistent 
state that requires operators to perform manual recovery steps.  For example, 
if {{doPreUpgrade}} succeeds on 2 nodes and fails on 1 node, then the operator 
will need to correct the problem on the failed node and also manually restore 
the previous.tmp directory to current on the 2 successful nodes before 
reattempting the upgrade.
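
As a rough illustration of the proposed behavior, here is a minimal sketch of 
an all-or-nothing call with best-effort undo.  It is not taken from the HDFS 
code base: the {{NodeOp}} interface, the {{applyOnAll}} method and the class 
name are hypothetical, and the sketch assumes each operation has a 
well-defined undo step.  It only undoes the nodes that already succeeded; 
cleaning up the node that failed is left out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not part of Hadoop: apply an operation on every node
// and attempt to undo it on the nodes that succeeded if any node fails.
class AllOrNothingSketch {

    /** Hypothetical stand-in for one operation against a single JournalNode. */
    interface NodeOp {
        void apply() throws Exception;
        void undo() throws Exception;
    }

    /**
     * Runs apply() on each op in order. On the first failure, calls undo()
     * on the ops that already succeeded, in reverse order, and returns false.
     * Returns true only if every apply() succeeded.
     */
    static boolean applyOnAll(List<? extends NodeOp> ops) {
        List<NodeOp> succeeded = new ArrayList<>();
        for (NodeOp op : ops) {
            try {
                op.apply();
                succeeded.add(op);
            } catch (Exception applyFailure) {
                for (int i = succeeded.size() - 1; i >= 0; i--) {
                    try {
                        succeeded.get(i).undo();
                    } catch (Exception undoFailure) {
                        // Best effort only: a real implementation would log
                        // this and still require manual recovery on the node.
                    }
                }
                return false;
            }
        }
        return true;
    }
}
```

In the {{doPreUpgrade}} example above, the undo step would correspond to 
restoring previous.tmp to current on the 2 nodes where the call succeeded, so 
that the operator only has to fix the node that failed.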



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
