A smaller checksum interval decreases the overhead for random access. If one seeks to a random location, one must, on average, read and checksum an extra checksumInterval/2 bytes. 512 was chosen as a value that, with four-byte CRC32, reduced the impact on small seeks while increasing the storage and transmission overheads by less than 1% (4 bytes per 512 is about 0.8%).
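To make the trade-off concrete, here is a rough Python sketch of the arithmetic. The function name and constant are made up for illustration and are not Hadoop code. It assumes a four-byte CRC32 per chunk and a uniformly random seek position, and it compares 512 against 8k and against one checksum for a whole 67108864-byte block.

```python
CRC32_BYTES = 4  # one CRC32 checksum is four bytes


def checksum_overhead(bytes_per_checksum):
    """Return (storage overhead fraction, average extra bytes read per random seek).

    Hypothetical helper: models only the two costs discussed in this thread.
    """
    # Each chunk of bytes_per_checksum data bytes carries one 4-byte checksum.
    storage_fraction = CRC32_BYTES / bytes_per_checksum
    # A random seek lands, on average, halfway into a chunk, so roughly half
    # a chunk of extra data must be read and checksummed.
    avg_extra_read = bytes_per_checksum / 2
    return storage_fraction, avg_extra_read


if __name__ == "__main__":
    for interval in (512, 8192, 67108864):
        frac, extra = checksum_overhead(interval)
        print(f"{interval:>10} bytes/checksum: "
              f"{frac:.4%} storage overhead, "
              f"~{extra:,.0f} extra bytes per random seek")
```

With one checksum per block the storage cost is negligible, but a random read would have to pull in about 32 MB on average just to verify a few bytes.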
Increasing the interval would not likely reduce the computation significantly, as the same number of bytes are checksummed regardless, but it might optimize i/o operations in some cases without harming random access much if this were increased to 8k or larger.

Doug

On 06/24/2011 04:24 PM, Praveen Sripati wrote:
>
> Hi,
>
> Why is the checksum done for io.bytes.per.checksum (defaults to 512)
> instead of the complete block at once (dfs.block.size defaults to
> 67108864)? If a block is corrupt then the entire block has to be
> replicated anyway. Isn't it more efficient to do the checksum for
> complete block at once?
>