On Jan 5, 2011, at 4:03 PM, Milind Bhandarkar wrote:

> I agree with Jay B. Checksumming is usually the culprit for high CPU on 
> clients and datanodes. Plus, a checksum of 4 bytes for every 512 means that 
> for a 64MB block, the checksum will be 512KB, i.e. 128 ext3 blocks. Changing 
> it to generate 1 ext3 checksum block per DFS block will speed up read/write 
> without any loss of reliability.
> 
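
The arithmetic quoted above can be checked with a small sketch. The function
name and parameters are hypothetical, and the 4KB ext3 block size is an
assumption (it is what makes 512KB come out to 128 blocks); none of this is
Hadoop code.

```python
def checksum_overhead(block_size, bytes_per_checksum,
                      checksum_size=4, fs_block_size=4096):
    """Return (checksum bytes, filesystem blocks) for one DFS block.

    Assumes a 4-byte checksum per chunk and a 4KB ext3 block size.
    """
    # One checksum per chunk; round up for a partial final chunk.
    chunks = -(-block_size // bytes_per_checksum)
    checksum_bytes = chunks * checksum_size
    # Round up to whole filesystem blocks.
    fs_blocks = -(-checksum_bytes // fs_block_size)
    return checksum_bytes, fs_blocks


KB = 1024
MB = 1024 * KB

# 4 bytes per 512 bytes on a 64MB block: 512KB of checksums, 128 ext3 blocks.
print(checksum_overhead(64 * MB, 512))

# One checksum per 64KB: 4KB of checksums, which fits in 1 ext3 block.
print(checksum_overhead(64 * MB, 64 * KB))
```

Under these assumptions, one ext3 checksum block per 64MB DFS block works out
to one checksum per 64KB of data.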

But (speaking to non-MapReduce users) make sure this doesn't adversely affect 
your usage patterns.  If one checksum covers 64KB of data, then the minimum 
read size is 64KB.  So, an extremely unlucky read of 2 bytes that straddles a 
checksum boundary might cause 128KB+overhead to travel across the network.
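
A rough sketch of that worst case, assuming a read always has to fetch every
whole checksum chunk it touches. The function name is hypothetical, and the
sketch ignores the checksum bytes themselves, protocol overhead, and a short
final chunk at the end of a block.

```python
def bytes_fetched(offset, length, bytes_per_checksum):
    """Data bytes transferred for a read, rounded out to whole chunks."""
    if length <= 0:
        return 0
    first_chunk = offset // bytes_per_checksum
    last_chunk = (offset + length - 1) // bytes_per_checksum
    return (last_chunk - first_chunk + 1) * bytes_per_checksum


KB = 1024

# A 2-byte read inside a single 64KB chunk fetches one chunk.
print(bytes_fetched(0, 2, 64 * KB))

# The same read straddling a chunk boundary fetches two chunks (128KB).
print(bytes_fetched(64 * KB - 1, 2, 64 * KB))

# With 512-byte chunks, the unlucky case costs only 1KB.
print(bytes_fetched(511, 2, 512))
```

For workloads with many small random reads, the second case is the one to
measure before raising the chunk size.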

Know thine usage scenarios.

Brian
