When a block is being received by a datanode (either because of a replication request or from a client write), the datanode verifies the CRC checksums. Also, there is a thread in the datanode that periodically verifies the CRCs of existing blocks.
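As a rough illustration of the receiver-side check described above, here is a minimal sketch of per-chunk CRC verification. It is not HDFS code; the 512-byte chunk size matches the HDFS default (`io.bytes.per.checksum`), but the class and method names are invented for this example.

```java
import java.util.zip.CRC32;

public class ChunkChecksumSketch {
    // HDFS default io.bytes.per.checksum is 512 bytes (an assumption here).
    static final int BYTES_PER_CHUNK = 512;

    // Compute one CRC32 per chunk of the block data, as the sender would.
    static long[] computeChecksums(byte[] data) {
        int chunks = (data.length + BYTES_PER_CHUNK - 1) / BYTES_PER_CHUNK;
        long[] sums = new long[chunks];
        CRC32 crc = new CRC32();
        for (int i = 0; i < chunks; i++) {
            crc.reset();
            int off = i * BYTES_PER_CHUNK;
            int len = Math.min(BYTES_PER_CHUNK, data.length - off);
            crc.update(data, off, len);
            sums[i] = crc.getValue();
        }
        return sums;
    }

    // Receiver-side check: recompute checksums over the received bytes and
    // compare against the sums shipped with the data. A mismatch means the
    // block was corrupted in transit (or in memory) and should be rejected.
    static boolean verify(byte[] data, long[] expected) {
        long[] actual = computeChecksums(data);
        if (actual.length != expected.length) return false;
        for (int i = 0; i < actual.length; i++) {
            if (actual[i] != expected[i]) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] block = new byte[1500];
        for (int i = 0; i < block.length; i++) block[i] = (byte) i;
        long[] sums = computeChecksums(block);
        System.out.println("intact: " + verify(block, sums));
        block[700] ^= 0x01; // simulate a single-bit error in transit
        System.out.println("after bit flip: " + verify(block, sums));
    }
}
```

The same recompute-and-compare loop is what a periodic scanner thread can run over blocks already on disk, catching corruption that happens after a block has been stored.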
dhruba

On Wed, Sep 9, 2009 at 7:27 PM, Brian Bockelman <bbock...@cse.unl.edu> wrote:
> Hey everyone,
>
> We're going through a review of our usage of HDFS (it's a good thing! -
> we're trying to get "official"). One reviewer asked a good question that I
> don't know the answer to - could you help? To quote,
>
> "What steps do you take to ensure the block rebalancing produces
> non-corrupted files? Do you have to wait 2 weeks before you discover this?"
>
> I believe the correct answer is:
>
> """
> When a block is replicated from one node to another, only the resulting
> block size is checked. The checksums on the source and destination are not
> compared. Therefore, if there's any corruption that occurs, it would take
> until the next block verification to detect it.
> """
>
> If you look at TCP error rates and random memory corruptions, it wouldn't
> be surprising to see silent errors in copying between nodes, especially on
> multi-hundred-TB or PB scale installs.
>
> Any comments?
>
> Brian