Aseem,
Regarding over-replication, it is mostly an application-related issue,
as Alex mentioned.
But if you are concerned about the under-replicated blocks in the fsck
output:
These blocks should not stay under-replicated if you have enough nodes
and enough space on them (check the NameNode web UI).
Try grepping for one of the blocks in the NameNode log (and the datanode
logs as well, since you have just 3 nodes).
Raghu.
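A minimal sketch of the log search suggested above, for anyone who would rather script it than run grep by hand. The function name find_block_lines and the explicit list of log paths are illustrative assumptions; where the NameNode and datanode logs actually live depends on the installation.

```python
import os


def find_block_lines(block_id, log_paths):
    """Return (path, line number, line) for every log line mentioning block_id."""
    hits = []
    for path in log_paths:
        # Skip paths that do not exist so one bad entry does not stop the search.
        if not os.path.isfile(path):
            continue
        with open(path, errors="replace") as handle:
            for lineno, line in enumerate(handle, 1):
                if block_id in line:
                    hits.append((str(path), lineno, line.rstrip("\n")))
    return hits
```

Called with a block ID taken from the fsck output (for example "blk_-1213602857020415242") and the paths of the NameNode and datanode log files, it lists every line that mentions that block, in file order.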
Puri, Aseem wrote:
Alex,
The output of the $ bin/hadoop fsck / command, after running an HBase
data insert command on a table, is:
.....
.....
.....
.....
.....
/hbase/test/903188508/tags/info/4897652949308499876: Under replicated blk_-5193695109439554521_3133. Target Replicas is 3 but found 1 replica(s).
.
/hbase/test/903188508/tags/mapfiles/4897652949308499876/data: Under replicated blk_-1213602857020415242_3132. Target Replicas is 3 but found 1 replica(s).
.
/hbase/test/903188508/tags/mapfiles/4897652949308499876/index: Under replicated blk_3934493034551838567_3132. Target Replicas is 3 but found 1 replica(s).
.
/user/HadoopAdmin/hbase table.doc: Under replicated blk_4339521803948458144_1031. Target Replicas is 3 but found 2 replica(s).
.
/user/HadoopAdmin/input/bin.doc: Under replicated blk_-3661765932004150973_1030. Target Replicas is 3 but found 2 replica(s).
.
/user/HadoopAdmin/input/file01.txt: Under replicated blk_2744169131466786624_1001. Target Replicas is 3 but found 2 replica(s).
.
/user/HadoopAdmin/input/file02.txt: Under replicated blk_2021956984317789924_1002. Target Replicas is 3 but found 2 replica(s).
.
/user/HadoopAdmin/input/test.txt: Under replicated blk_-3062256167060082648_1004. Target Replicas is 3 but found 2 replica(s).
...
/user/HadoopAdmin/output/part-00000: Under replicated blk_8908973033976428484_1010. Target Replicas is 3 but found 2 replica(s).
Status: HEALTHY
Total size: 48510226 B
Total dirs: 492
Total files: 439 (Files currently being written: 2)
Total blocks (validated): 401 (avg. block size 120973 B) (Total open file blocks (not validated): 2)
Minimally replicated blocks: 401 (100.0 %)
Over-replicated blocks: 0 (0.0 %)
Under-replicated blocks: 399 (99.50124 %)
Mis-replicated blocks: 0 (0.0 %)
Default replication factor: 2
Average block replication: 1.3117207
Corrupt blocks: 0
Missing replicas: 675 (128.327 %)
Number of data-nodes: 2
Number of racks: 1
The filesystem under path '/' is HEALTHY
Please tell me what is wrong.
Aseem
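The summary figures in the fsck output above can be cross-checked with a little arithmetic. The sketch below is one reading of the numbers, not a description of fsck's internals: it assumes the percentage after "Missing replicas" is taken relative to the replicas that currently exist, and that a datanode holds at most one replica of a given block. All function names are illustrative.

```python
# Figures copied from the fsck summary above.
TOTAL_BLOCKS = 401
AVG_REPLICATION = 1.3117207
MISSING_REPLICAS = 675
DATA_NODES = 2
TARGET_REPLICAS = 3  # from the "Target Replicas is 3" lines


def existing_replicas(total_blocks, avg_replication):
    """Replicas currently stored, recovered from the reported average."""
    return round(total_blocks * avg_replication)


def missing_percent(missing, existing):
    """Missing replicas as a percentage of the replicas that exist (assumed basis)."""
    return 100.0 * missing / existing


def reachable_replicas(target, data_nodes):
    """Upper bound on replicas per block, assuming one replica per datanode."""
    return min(target, data_nodes)


if __name__ == "__main__":
    have = existing_replicas(TOTAL_BLOCKS, AVG_REPLICATION)
    print("existing replicas:", have)
    print("missing percent:", missing_percent(MISSING_REPLICAS, have))
    print("reachable replicas per block:",
          reachable_replicas(TARGET_REPLICAS, DATA_NODES))
```

Under these assumptions the reported 128.327 % is reproduced from 675 missing replicas against 526 existing ones, and a target of 3 cannot be met while only 2 data-nodes are reported. If the third machine is expected to act as a datanode, the "Number of data-nodes: 2" line may be a reasonable place to start looking.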
-----Original Message-----
From: Alex Loddengaard [mailto:[email protected]]
Sent: Friday, April 10, 2009 11:04 PM
To: [email protected]
Subject: Re: More Replication on dfs
Aseem,
How are you verifying that blocks are not being replicated? Have you run
fsck? *bin/hadoop fsck /*
I'd be surprised if replication really wasn't happening. Can you run fsck
and pay attention to "Under-replicated blocks" and "Mis-replicated
blocks"?
In fact, can you just copy-paste the output of fsck?
Alex
On Thu, Apr 9, 2009 at 11:23 PM, Puri, Aseem <[email protected]> wrote:
Hi
I also tried the command $ bin/hadoop balancer, but I still have the
same problem.
Aseem
-----Original Message-----
From: Puri, Aseem [mailto:[email protected]]
Sent: Friday, April 10, 2009 11:18 AM
To: [email protected]
Subject: RE: More Replication on dfs
Hi Alex,
Thanks for sharing your knowledge. So far I have three machines, and
since I need to check the behavior of Hadoop I want the replication
factor to be 2. I started my Hadoop server with replication factor 3.
After that I uploaded 3 files to run the word count program. But since
all my files are stored on one machine and also replicated to the other
datanodes, my MapReduce program takes its input from one datanode only.
I want my files to be on different datanodes so that I can check the
functionality of MapReduce properly.
Also, before starting my Hadoop server again with replication factor 2,
I formatted all datanodes and deleted all old data manually.
Please suggest what I should do now.
Regards,
Aseem Puri
-----Original Message-----
From: Mithila Nagendra [mailto:[email protected]]
Sent: Friday, April 10, 2009 10:56 AM
To: [email protected]
Subject: Re: More Replication on dfs
To add to the question: how does one decide what the optimal replication
factor for a cluster is? For instance, what would be the appropriate
replication factor for a cluster consisting of 5 nodes?
Mithila
On Fri, Apr 10, 2009 at 8:20 AM, Alex Loddengaard <[email protected]>
wrote:
Did you load any files when replication was set to 3? If so, you'll have
to rebalance:
<http://hadoop.apache.org/core/docs/r0.19.1/commands_manual.html#balancer>
<http://hadoop.apache.org/core/docs/r0.19.1/hdfs_user_guide.html#Rebalancer>
Note that most people run HDFS with a replication factor of 3. There have
been cases when clusters running with a replication of 2 discovered new
bugs, because replication is so often set to 3. That said, if you can do
it, it's probably advisable to run with a replication factor of 3 instead
of 2.
Alex
On Thu, Apr 9, 2009 at 9:56 PM, Puri, Aseem <[email protected]> wrote:
Hi
I am a new Hadoop user. I have a small cluster with 3 datanodes. In
hadoop-site.xml the value of the dfs.replication property is 2, but it
is still replicating data on 3 machines.
Please tell me why this is happening.
Regards,
Aseem Puri
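For reference, a small sketch of how the dfs.replication setting mentioned in this message can be read back from a Hadoop site file, to confirm which value the configuration actually carries. The XML below is an illustrative sample in the usual configuration/property layout, not the poster's file, and read_property is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Illustrative configuration text, not the poster's actual hadoop-site.xml.
SAMPLE_SITE_XML = """<configuration>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
</configuration>
"""


def read_property(xml_text, name):
    """Return the value of the named property as a string, or None if absent."""
    root = ET.fromstring(xml_text)
    for prop in root.findall("property"):
        if (prop.findtext("name") or "").strip() == name:
            value = prop.findtext("value")
            return value.strip() if value is not None else None
    return None


if __name__ == "__main__":
    print(read_property(SAMPLE_SITE_XML, "dfs.replication"))
```

Keep in mind that dfs.replication is a default applied when a file is created, so files written while the value was 3 keep that factor after the setting is changed. That is consistent with the question in the reply quoted earlier in this thread about whether any files were loaded while replication was set to 3.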