Re: Number of bytes per checksum

Doug Cutting Fri, 24 Jun 2011 07:50:58 -0700

A smaller checksum interval decreases the overhead for random access.
If one seeks to a random location, one must, on average, read and
checksum an extra checksumInterval/2 bytes.  512 was chosen as a value
that, with four-byte CRC32, reduced the impact on small seeks while
increasing the storage and transmission overheads by less than 1%.
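
To make the arithmetic above concrete, here is a rough sketch in Python. The
function names are made up for illustration and are not Hadoop APIs; the
only inputs taken from the text are the four-byte CRC32 and the
checksumInterval/2 estimate.

```python
CRC32_BYTES = 4  # one CRC32 value is stored per checksum interval


def storage_overhead(bytes_per_checksum: int) -> float:
    """Fraction of extra bytes stored/transmitted for the checksums."""
    return CRC32_BYTES / bytes_per_checksum


def mean_extra_bytes_on_seek(bytes_per_checksum: int) -> float:
    """Estimate from the text: interval/2 extra bytes read per random seek."""
    return bytes_per_checksum / 2


if __name__ == "__main__":
    # 512 is the default; 8192 and a whole 64 MB block are shown for contrast.
    for interval in (512, 8192, 67108864):
        print(
            f"interval={interval:>9}  "
            f"overhead={storage_overhead(interval):.6%}  "
            f"extra bytes per seek ~ {mean_extra_bytes_on_seek(interval):,.0f}"
        )
```

With 512 bytes per checksum the overhead works out to 4/512, or about
0.78%, while a whole-block checksum would cost roughly 32 MB of extra
reading per random seek.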

Increasing the interval would probably not reduce the computation
significantly, since the same number of bytes is checksummed regardless.
Raising it to 8k or larger might, however, optimize i/o operations in
some cases without harming random access much.
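
The trade-off can be illustrated with a small sketch, assuming zlib's CRC32
as a stand-in for the checksum and using hypothetical helper names
(chunk_checksums, verified_read) that are not part of Hadoop. A read must
verify every chunk it touches, so a small read costs more with a larger
interval, while a full sequential scan checksums the same bytes either way.

```python
import zlib


def chunk_checksums(data: bytes, interval: int) -> list:
    """Compute one CRC32 per `interval`-sized chunk of `data`."""
    return [zlib.crc32(data[i:i + interval])
            for i in range(0, len(data), interval)]


def verified_read(data: bytes, sums: list, interval: int,
                  offset: int, length: int):
    """Return (requested bytes, number of bytes checksummed to verify them)."""
    if length <= 0 or offset < 0 or offset + length > len(data):
        raise ValueError("read range is outside the data")
    first = offset // interval                 # first chunk touched
    last = (offset + length - 1) // interval   # last chunk touched
    scanned = 0
    for idx in range(first, last + 1):
        chunk = data[idx * interval:(idx + 1) * interval]
        if zlib.crc32(chunk) != sums[idx]:
            raise IOError(f"checksum mismatch in chunk {idx}")
        scanned += len(chunk)
    return data[offset:offset + length], scanned


if __name__ == "__main__":
    data = bytes(range(256)) * 256             # 64 KB of sample data
    for interval in (512, 8192):
        sums = chunk_checksums(data, interval)
        _, scanned = verified_read(data, sums, interval, 1030, 100)
        print(f"interval={interval}: checksummed {scanned} bytes "
              f"for a 100-byte read")
```

In this toy setup a 100-byte read checksums one 512-byte chunk at the
default interval and one 8192-byte chunk at 8k, which is the extra seek cost
being weighed against fewer, larger checksum units.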

Doug

On 06/24/2011 04:24 PM, Praveen Sripati wrote:
> 
> Hi,
> 
> Why is the checksum done for io.bytes.per.checksum (defaults to 512)
> instead of the complete block at once (dfs.block.size defaults to
> 67108864)? If a block is corrupt then the entire block has to be
> replicated anyway. Isn't it more efficient to do the checksum for
> complete block at once?
>
