Number of bytes per checksum

2011-06-24 Thread Praveen Sripati
Hi, why is the checksum computed for every io.bytes.per.checksum bytes (defaults to 512) instead of for the complete block at once (dfs.block.size defaults to 67108864)? If a block is corrupt then the entire block has to be replicated anyway. Isn't it more efficient to do the checksum for the complete block at once

Re: Number of bytes per checksum

2011-06-24 Thread Doug Cutting
A smaller checksum interval decreases the overhead for random access. If one seeks to a random location, one must, on average, read and checksum an extra checksumInterval/2 bytes. 512 was chosen as a value that, with four-byte CRC32, reduced the impact on small seeks while increasing the storage a
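
The trade-off described above can be put into numbers. The following is a minimal sketch, not Hadoop code: the function name `checksum_tradeoff` and its parameters are hypothetical, and it only assumes what the thread states (a four-byte CRC32 per checksum interval, an average of checksumInterval/2 extra bytes read per random seek, and the default block size of 67108864 bytes).

```python
def checksum_tradeoff(bytes_per_checksum, checksum_size=4, block_size=67108864):
    """Return (storage overhead fraction, average extra bytes per random seek,
    number of checksums per block) for a given checksum interval.

    Hypothetical helper for illustration; the defaults mirror the values
    quoted in the thread (4-byte CRC32, 64 MB block).
    """
    # Each interval of data carries one fixed-size checksum.
    storage_overhead = checksum_size / bytes_per_checksum
    # A random seek lands, on average, in the middle of an interval, so about
    # half an interval of extra data must be read and checksummed.
    avg_extra_read = bytes_per_checksum / 2
    # Ceiling division: a partial trailing interval still needs a checksum.
    checksums_per_block = -(-block_size // bytes_per_checksum)
    return storage_overhead, avg_extra_read, checksums_per_block


if __name__ == "__main__":
    for interval in (512, 4096, 67108864):
        overhead, extra, count = checksum_tradeoff(interval)
        print(f"interval={interval:>9}  overhead={overhead:.6%}  "
              f"avg extra read={extra:,.0f} B  checksums/block={count}")
```

With a 512-byte interval the checksums cost 4/512 of the data size (under 1%), and a random seek reads about 256 extra bytes. With one checksum for the whole block the storage cost is negligible, but a random seek would have to read and checksum about half the block on average.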

Re: Number of bytes per checksum

2011-06-24 Thread Kihwal Lee
Doing CRC32 on a huge data block also reduces its error detection capability. If you need more information on this topic, this paper will be a good starting point: http://www.ece.cmu.edu/~koopman/networks/dsn02/dsn02_koopman.pdf Kihwal On 6/24/11 9:50 AM, "Doug Cutting" wrote: > A smaller c
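
To make the idea of a per-interval checksum concrete, here is a small sketch of chunked CRC32 verification. It is an illustration only and is not how HDFS implements it: `chunk_checksums` and `verify_range` are hypothetical names, and Python's standard `zlib.crc32` stands in for whatever CRC32 routine the real system uses. The point is that a read only needs to verify the chunks it overlaps, and each CRC covers a short span of data.

```python
import zlib

BYTES_PER_CHECKSUM = 512  # mirrors the io.bytes.per.checksum default from the thread


def chunk_checksums(data, bytes_per_checksum=BYTES_PER_CHECKSUM):
    """Compute one CRC32 per fixed-size chunk of data (hypothetical helper)."""
    return [
        zlib.crc32(data[i:i + bytes_per_checksum])
        for i in range(0, len(data), bytes_per_checksum)
    ]


def verify_range(data, checksums, offset, length,
                 bytes_per_checksum=BYTES_PER_CHECKSUM):
    """Verify only the chunks overlapping [offset, offset + length).

    Returns the number of bytes that had to be checksummed and raises
    ValueError on the first chunk whose CRC32 does not match.
    Assumes length > 0.
    """
    first = offset // bytes_per_checksum
    last = (offset + length - 1) // bytes_per_checksum
    verified = 0
    for idx in range(first, last + 1):
        start = idx * bytes_per_checksum
        chunk = data[start:start + bytes_per_checksum]
        if zlib.crc32(chunk) != checksums[idx]:
            raise ValueError(f"checksum mismatch in chunk {idx}")
        verified += len(chunk)
    return verified


if __name__ == "__main__":
    data = bytes(range(256)) * 8          # 2048 bytes, i.e. four 512-byte chunks
    sums = chunk_checksums(data)
    # A 10-byte read at offset 600 touches only one chunk.
    print(verify_range(data, sums, 600, 10))
```

A 10-byte read in the middle of the data verifies a single 512-byte chunk rather than everything before it, which is the random-access benefit mentioned earlier in the thread.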

[jira] [Resolved] (HDFS-2077) 1073: address checkpoint upload when one of the storage dirs is failed

2011-06-24 Thread Todd Lipcon (JIRA)
[ https://issues.apache.org/jira/browse/HDFS-2077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon resolved HDFS-2077. --- Resolution: Fixed Hadoop Flags: [Reviewed] Thanks. I added a comment in that area of the code be

[jira] [Resolved] (HDFS-2078) 1073: NN should not clear storage directory when restoring removed storage

2011-06-24 Thread Todd Lipcon (JIRA)
[ https://issues.apache.org/jira/browse/HDFS-2078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon resolved HDFS-2078. --- Resolution: Fixed Hadoop Flags: [Reviewed] Committed to branch, thanks for reviewing, Eli. > 10

[jira] [Created] (HDFS-2106) Umbrella JIRA for separating block management and name space management in NameNode

2011-06-24 Thread Tsz Wo (Nicholas), SZE (JIRA)
Umbrella JIRA for separating block management and name space management in NameNode --- Key: HDFS-2106 URL: https://issues.apache.org/jira/browse/HDFS-2106 Project: Hadoo

[jira] [Created] (HDFS-2107) Move block management code to a package

2011-06-24 Thread Tsz Wo (Nicholas), SZE (JIRA)
Move block management code to a package --- Key: HDFS-2107 URL: https://issues.apache.org/jira/browse/HDFS-2107 Project: Hadoop HDFS Issue Type: Sub-task Components: name-node Reporte

[jira] [Resolved] (HDFS-2088) Move edits log archiving logic into FSEditLog/JournalManager

2011-06-24 Thread Todd Lipcon (JIRA)
[ https://issues.apache.org/jira/browse/HDFS-2088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon resolved HDFS-2088. --- Resolution: Fixed Hadoop Flags: [Reviewed] Committed to branch, thanks Eli > Move edits log arc

[jira] [Resolved] (HDFS-2093) 1073: Handle case where an entirely empty log is left during NN crash

2011-06-24 Thread Todd Lipcon (JIRA)
[ https://issues.apache.org/jira/browse/HDFS-2093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon resolved HDFS-2093. --- Resolution: Fixed Hadoop Flags: [Reviewed] Committed to branch, thanks for review > 1073: Handl

[jira] [Created] (HDFS-2108) Move datanode heartbeat handling to BlockManager

2011-06-24 Thread Tsz Wo (Nicholas), SZE (JIRA)
Move datanode heartbeat handling to BlockManager Key: HDFS-2108 URL: https://issues.apache.org/jira/browse/HDFS-2108 Project: Hadoop HDFS Issue Type: Sub-task Components: name-node