Hi Chris, BCC'ing hdfs-dev@ since you're using CDH, moving us to cdh-user@.
You should be able to manually copy the under-replicated block files and their checksum (.meta) files to a different datanode and restart it. I'm surprised you're hitting this issue, though; I haven't encountered it before. Can you send your NN logs to me, either as an attachment or a file drop? Also, what version of CDH are you using?

Here are a few things you can check:

* There are a number of block replication stats available in the NN /jmx webui, e.g. PendingReplicationBlocks, UnderReplicatedBlocks, ScheduledReplicationBlocks. These will tell you whether the NN is at least attempting to replicate your blocks (pending and scheduled).
* Look in the NN log for BlockPlacementPolicy errors. It'll help to enable DEBUG level output here.

Best,
Andrew

On Thu, Jan 9, 2014 at 10:46 AM, Cooper Bethea <co...@siftscience.com> wrote:

> I have only 9 under-replicated blocks on the cluster, and it is very
> important that I restore my cluster to a fully-replicated state. Is there a
> way I can manually copy these blocks to other datanodes, or perhaps new
> datanodes?
>
> On Thu, Jan 9, 2014 at 10:34 AM, Cooper Bethea <co...@siftscience.com> wrote:
>
> > Chris, Steve, thanks for responding.
> >
> > Overnight I ran a script to bump replication, then lower it, as Chris
> > suggested. There has been no effect--all underreplicated blocks still
> > have only 1 replica.
> >
> > Steve, I am running the rebalancer.
> >
> > On Thu, Jan 9, 2014 at 1:33 AM, Steve Loughran <ste...@hortonworks.com> wrote:
> >
> > > are you running the rebalancer?
> > >
> > > On 9 January 2014 04:40, Chris Embree <cemb...@gmail.com> wrote:
> > >
> > > > It's too bad that this hasn't been corrected in HDFS 2.0.... I have a
> > > > script that I run several times a day to ensure that blocks are
> > > > replicated correctly. Here's a link to an article about it:
> > > > http://dataforprofit.com/?p=427
> > > >
> > > > On Wed, Jan 8, 2014 at 9:00 PM, Cooper Bethea <co...@siftscience.com> wrote:
> > > >
> > > > > Following on--is there a way that I can forcibly replicate these
> > > > > blocks, perhaps by rsyncing the underlying files to other
> > > > > datanodes? As you might imagine under-replicated data makes me
> > > > > very uneasy.
> > > > >
> > > > > On Wed, Jan 8, 2014 at 12:00 PM, Cooper Bethea <co...@siftscience.com> wrote:
> > > > >
> > > > > > Hi HDFS developers,
> > > > > >
> > > > > > I have a worrying problem in a 2.0.0-cdh4.4.0 HDFS cluster I am
> > > > > > running. 9 blocks in the cluster are persistently reported to
> > > > > > be under-replicated per "hdfs fsck".
> > > > > >
> > > > > > I am able to fetch the files that contain these blocks, so I
> > > > > > know that the data is there, but for some reason replication is
> > > > > > not taking effect. In hopes of getting the cluster to notice
> > > > > > that there were under-replicated blocks I tried using
> > > > > > "hdfs dfs -setrep" to raise the replication factor, but the
> > > > > > cluster continues to report a single replica for each of these
> > > > > > blocks. When viewing master logs I see that the replication
> > > > > > factor change is respected, but there are no messages that
> > > > > > refer to the under-replicated blocks.
> > > > > >
> > > > > > Thanks for your time. Please let me know what I can do to
> > > > > > investigate further.
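
A minimal sketch of the /jmx check suggested in the reply, in case it helps to script it. It assumes the NN web UI serves its metrics as JSON under a top-level "beans" list, which is how the /jmx servlet normally responds. The three metric names are the ones listed in the reply; the function names, the host name, and port 50070 in the usage note are illustrative assumptions, not taken from this cluster.

```python
import json
from urllib.request import urlopen

# Metric names as listed in the reply above.
REPLICATION_METRICS = (
    "PendingReplicationBlocks",
    "UnderReplicatedBlocks",
    "ScheduledReplicationBlocks",
)


def replication_stats(jmx_payload):
    """Extract the replication counters from a parsed /jmx JSON document.

    Scans every bean and keeps the first value seen for each metric, so it
    does not depend on knowing which bean exposes the counters.
    """
    stats = {}
    for bean in jmx_payload.get("beans", []):
        for name in REPLICATION_METRICS:
            if name in bean and name not in stats:
                stats[name] = bean[name]
    return stats


def nn_is_attempting_replication(stats):
    """True if the NN has any pending or scheduled replication work."""
    pending = stats.get("PendingReplicationBlocks", 0)
    scheduled = stats.get("ScheduledReplicationBlocks", 0)
    return (pending + scheduled) > 0


def fetch_replication_stats(base_url):
    """Fetch /jmx from the NN web UI at base_url and return the counters."""
    url = base_url.rstrip("/") + "/jmx"
    with urlopen(url, timeout=10) as resp:
        return replication_stats(json.load(resp))
```

Usage would look something like fetch_replication_stats("http://namenode.example.com:50070"), with the host and port replaced by your own NN web address. If UnderReplicatedBlocks stays at 9 while the pending and scheduled counters stay at 0, the NN is not even trying to re-replicate, which points toward block placement rather than slow transfers.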