Hello,

We are running HDFS on a 9-node Hadoop cluster; the Hadoop version is 1.2.1. We 
are using the default HDFS block size.

We have noticed that the slaves' disks are almost full. From the name node's 
status page (namenode:50070), we can see that the disks of the live nodes are 
about 90% full, and DFS Used on the cluster summary page is ~1 TB.
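
(For completeness: I believe the same per-node numbers can also be pulled on 
the command line with something like the command below; I am going by the 
standard Hadoop 1.x dfsadmin output here, which lists Configured Capacity, DFS 
Used, Non DFS Used and DFS Remaining for every datanode.)

 hadoop dfsadmin -report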

However, hadoop dfs -dus / shows that the file system size is merely 38 GB. The 
38 GB figure looks correct, because we keep only a few Hive tables and Hadoop's 
/tmp (distributed cache and job outputs) in HDFS; all other data is cleaned up. 
I cross-checked this with hadoop dfs -ls. I also don't think there is any 
internal fragmentation, because the files in our Hive tables are split into 
~50 MB chunks. Here are the last few lines of hadoop fsck / -files -blocks:

Status: HEALTHY
 Total size:    38086441332 B
 Total dirs:    232
 Total files:   802
 Total blocks (validated):      796 (avg. block size 47847288 B)
 Minimally replicated blocks:   796 (100.0 %)
 Over-replicated blocks:        0 (0.0 %)
 Under-replicated blocks:       6 (0.75376886 %)
 Mis-replicated blocks:         0 (0.0 %)
 Default replication factor:    2
 Average block replication:     3.0439699
 Corrupt blocks:                0
 Missing replicas:              6 (0.24762692 %)
 Number of data-nodes:          9
 Number of racks:               1
FSCK ended at Sun Apr 13 19:49:23 UTC 2014 in 135 milliseconds
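
On the datanode side, the block files live under dfs.data.dir, so something 
like the following on one of the slaves should show how much of that 90% is 
actual HDFS block data versus everything else on the same partition (the path 
below is just a placeholder for our real dfs.data.dir):

 du -sh /path/to/dfs/data
 df -h /path/to/dfs/data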


My question is: why are the slaves' disks getting full even though there are 
only a few files in DFS? With an average block replication of ~3, 38 GB of data 
should occupy only on the order of 115 GB across the cluster, which is nowhere 
near the ~1 TB of DFS Used being reported.
