When a block is being received by a datanode (either because of a replication request or from a client write), the datanode verifies the CRC checksums. Also, there is a thread in the datanode that periodically verifies the CRCs of existing blocks.
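As a rough illustration of the receiver-side check described above, here is a minimal sketch of per-chunk CRC verification. It is not HDFS code; the 512-byte chunk size matches the HDFS default (`io.bytes.per.checksum`), but the class and method names are invented for this example.

```java
import java.util.zip.CRC32;

public class ChunkChecksumSketch {
    // HDFS default io.bytes.per.checksum is 512 bytes (an assumption here).
    static final int BYTES_PER_CHUNK = 512;

    // Compute one CRC32 per chunk of the block data, as the sender would.
    static long[] computeChecksums(byte[] data) {
        int chunks = (data.length + BYTES_PER_CHUNK - 1) / BYTES_PER_CHUNK;
        long[] sums = new long[chunks];
        CRC32 crc = new CRC32();
        for (int i = 0; i < chunks; i++) {
            crc.reset();
            int off = i * BYTES_PER_CHUNK;
            int len = Math.min(BYTES_PER_CHUNK, data.length - off);
            crc.update(data, off, len);
            sums[i] = crc.getValue();
        }
        return sums;
    }

    // Receiver-side check: recompute checksums over the received bytes and
    // compare against the sums shipped with the data. A mismatch means the
    // block was corrupted in transit (or in memory) and should be rejected.
    static boolean verify(byte[] data, long[] expected) {
        long[] actual = computeChecksums(data);
        if (actual.length != expected.length) return false;
        for (int i = 0; i < actual.length; i++) {
            if (actual[i] != expected[i]) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] block = new byte[1500];
        for (int i = 0; i < block.length; i++) block[i] = (byte) i;
        long[] sums = computeChecksums(block);
        System.out.println("intact: " + verify(block, sums));
        block[700] ^= 0x01; // simulate a single-bit error in transit
        System.out.println("after bit flip: " + verify(block, sums));
    }
}
```

The same recompute-and-compare loop is what a periodic scanner thread can run over blocks already on disk, catching corruption that happens after a block has been stored.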
dhruba

On Wed, Sep 9, 2009 at 7:27 PM, Brian Bockelman <bbock...@cse.unl.edu> wrote:
> Hey everyone,
>
> We're going through a review of our usage of HDFS (it's a good thing! -
> we're trying to get "official"). One reviewer asked a good question that I
> don't know the answer to - could you help? To quote,
>
> "What steps do you take to ensure the block rebalancing produces
> non-corrupted files? Do you have to wait 2 weeks before you discover this?"
>
> I believe the correct answer is:
>
> """
> When a block is replicated from one node to another, only the resulting
> block size is checked. The checksums on the source and destination are not
> compared. Therefore, if there's any corruption that occurs, it would take
> until the next block verification to detect it.
> """
>
> If you look at TCP error rates and random memory corruptions, it wouldn't
> be surprising to see silent errors in copying between nodes, especially on
> multi-hundred-TB or PB scale installs.
>
> Any comments?
>
> Brian