On Jan 5, 2011, at 4:03 PM, Milind Bhandarkar wrote:

> I agree with Jay B. Checksumming is usually the culprit for high CPU on
> clients and datanodes. Plus, a checksum of 4 bytes for every 512, means for
> 64MB block, the checksum will be 512KB, i.e. 128 ext3 blocks. Changing it to
> generate 1 ext3 checksum block per DFS block will speedup read/write without
> any loss of reliability.
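The arithmetic in the quoted message can be checked with a short sketch. It assumes 4-byte checksums and a 4 KB ext3 block size (a common default, not stated in the thread). The function names are hypothetical helpers, not part of any Hadoop API.

```python
# Hypothetical helpers reproducing the checksum-overhead arithmetic.
# Assumptions: 4-byte checksums, 4 KB (4096-byte) ext3 blocks.

def checksum_overhead(block_bytes: int, bytes_per_checksum: int,
                      checksum_bytes: int = 4) -> int:
    """Total checksum bytes stored for one DFS block."""
    chunks = -(-block_bytes // bytes_per_checksum)  # ceiling division
    return chunks * checksum_bytes

def fs_blocks_needed(nbytes: int, fs_block: int = 4096) -> int:
    """Number of filesystem blocks needed to hold nbytes."""
    return -(-nbytes // fs_block)

MB = 1024 * 1024
KB = 1024

# One checksum per 512 bytes of a 64 MB block.
small = checksum_overhead(64 * MB, 512)
print(small // KB, "KB of checksums,", fs_blocks_needed(small), "ext3 blocks")

# One checksum per 64 KB: the checksums fit in a single 4 KB ext3 block.
large = checksum_overhead(64 * MB, 64 * KB)
print(large // KB, "KB of checksums,", fs_blocks_needed(large), "ext3 block(s)")
```

With 512-byte chunks this gives 512 KB of checksums in 128 ext3 blocks. With 64 KB chunks it gives 4 KB in one ext3 block, which is the "1 ext3 checksum block per DFS block" case.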
But (speaking to non-MapReduce users) make sure this doesn't adversely affect your usage patterns. If your checksum chunk size is 64KB, then the minimum read size is 64KB, because a chunk has to be read in full to verify it. So an extremely unlucky 2-byte read that straddles a chunk boundary might cause 128KB plus overhead to travel across the network.

Know thine usage scenarios.

Brian
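The read amplification described above can be sketched by rounding a requested byte range out to whole checksum chunks. This is a simplification under stated assumptions. It counts data bytes only, so checksums and protocol overhead are excluded, and it ignores a short final chunk at the end of a block. `bytes_fetched` is a hypothetical helper.

```python
# Hypothetical sketch: data bytes that must be read so that every checksum
# chunk touched by a request can be verified. Excludes checksum bytes and
# protocol overhead, and ignores a short final chunk at the end of a block.

def bytes_fetched(offset: int, length: int, chunk: int) -> int:
    if length <= 0:
        return 0
    first = offset // chunk                 # first chunk touched
    last = (offset + length - 1) // chunk   # last chunk touched
    return (last - first + 1) * chunk

KB = 1024

# An "unlucky" 2-byte read straddling a chunk boundary, with 64 KB chunks.
print(bytes_fetched(64 * KB - 1, 2, 64 * KB))  # two whole 64 KB chunks
# The same read with 512-byte chunks.
print(bytes_fetched(64 * KB - 1, 2, 512))      # two 512-byte chunks
```

The first call returns 131072 bytes (128 KB), and the second returns 1024. Larger chunks cut checksum storage and CPU cost, but they make small random reads more expensive.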