[jira] [Created] (HDFS-2741) dfs.datanode.max.xcievers missing in 0.20.205.0

2012-01-02 Thread Markus Jelsma (Created) (JIRA)
dfs.datanode.max.xcievers missing in 0.20.205.0
---

                 Key: HDFS-2741
                 URL: https://issues.apache.org/jira/browse/HDFS-2741
             Project: Hadoop HDFS
          Issue Type: Bug
    Affects Versions: 0.20.205.0
            Reporter: Markus Jelsma
            Priority: Minor


The dfs.datanode.max.xcievers configuration property is missing from 
hdfs-default.xml and from the documentation.
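
One way to confirm a gap like this is to list the property names a defaults
file declares and look for the key. The sketch below is illustrative only: the
XML excerpt is a hypothetical stand-in in the hdfs-default.xml layout, not the
real file, and the helper names are made up for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical excerpt in the hdfs-default.xml layout (assumption: the real
# file is much longer; this sample exists only to make the sketch runnable).
SAMPLE_DEFAULTS = """<configuration>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
    <description>Default block replication.</description>
  </property>
</configuration>
"""


def documented_properties(xml_text):
    """Return the set of property names declared in a Hadoop-style config file."""
    root = ET.fromstring(xml_text)
    return {p.findtext("name", default="").strip() for p in root.iter("property")}


def is_documented(xml_text, key):
    """True if `key` appears as a <name> in the given config text."""
    return key in documented_properties(xml_text)


if __name__ == "__main__":
    for key in ("dfs.replication", "dfs.datanode.max.xcievers"):
        print(key, "documented:", is_documented(SAMPLE_DEFAULTS, key))
```

To check an actual release, read the shipped hdfs-default.xml into a string and
pass it in place of SAMPLE_DEFAULTS.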

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HDFS-2742) HA: observed dataloss in replication stress test

2012-01-02 Thread Todd Lipcon (Created) (JIRA)
HA: observed dataloss in replication stress test


                 Key: HDFS-2742
                 URL: https://issues.apache.org/jira/browse/HDFS-2742
             Project: Hadoop HDFS
          Issue Type: Sub-task
          Components: data-node, ha, name-node
    Affects Versions: HA branch (HDFS-1623)
            Reporter: Todd Lipcon
            Assignee: Todd Lipcon
            Priority: Blocker
         Attachments: log-colorized.txt

The replication stress test case failed over the weekend because one of the 
replicas went missing. I'm still diagnosing the issue, but the chain of events 
seems to have been something like:
- A block report was generated on one of the nodes while the block was being 
written, so the block report listed the block as RBW.
- The standby queued this message and replayed it only after the file had been 
marked complete, so it marked this replica as corrupt.
- It asked the DN holding the corrupt replica to delete it and, I think, 
removed the replica from the block map at this time.
- That DN then sent another block report before receiving the deletion. Since 
the replica was "FINALIZED" by now, it was re-added to the block map.
- Replication was lowered on the file. The NN counted the above replica as 
non-corrupt and asked for the other replicas to be deleted.
- All replicas were lost.
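
The suspected sequence can be replayed in a toy model. This is a sketch of the
tentative account above, not of the actual HDFS code: the ToyNameNode class,
its method names, the three node names, and the rule for choosing which excess
replicas to drop are all assumptions made for illustration.

```python
class ToyNameNode:
    """Toy stand-in for the NN's view of one block (hypothetical, not HDFS code)."""

    def __init__(self, locations):
        self.block_map = set(locations)  # DNs believed to hold a good replica
        self.pending_deletes = []        # deletion commands not yet delivered
        self.file_complete = False

    def process_report(self, dn, state):
        if state == "RBW" and self.file_complete:
            # Stale report replayed after completion: replica looks corrupt.
            self.block_map.discard(dn)
            self.pending_deletes.append(dn)
        elif state == "FINALIZED":
            # Suspected flaw: re-added even though a deletion is still pending.
            self.block_map.add(dn)

    def set_replication(self, target):
        # Arbitrary toy rule: keep the first `target` DNs in sorted order.
        for dn in sorted(self.block_map)[target:]:
            self.block_map.discard(dn)
            self.pending_deletes.append(dn)


def run_scenario():
    """Replay the suspected chain of events; return DNs still holding the block."""
    on_disk = {"dn1", "dn2", "dn3"}
    nn = ToyNameNode(on_disk)

    queued = ("dn1", "RBW")                # report generated mid-write, queued
    nn.file_complete = True                # file is marked complete
    nn.process_report(*queued)             # replayed late: dn1 marked corrupt
    nn.process_report("dn1", "FINALIZED")  # new report arrives before deletion
    nn.set_replication(1)                  # dn1 counted as good; dn2, dn3 dropped

    for dn in nn.pending_deletes:          # deletions finally reach the DNs
        on_disk.discard(dn)
    return on_disk


if __name__ == "__main__":
    print("replicas left on disk:", run_scenario())
```

In this model the loss comes from the FINALIZED report re-adding a replica
whose deletion is still outstanding. Whether that matches the real code path
is exactly what is still being diagnosed.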

