Hello,

We are running HDFS on a 9-node Hadoop cluster, version 1.2.1, with the default HDFS block size.
We have noticed that the disks of the slaves are almost full. On the NameNode's status page (namenode:50070) the disks of the live nodes show as ~90% full, and DFS Used in the cluster summary is ~1 TB. However, hadoop dfs -dus / reports that the file system holds merely 38 GB.

The 38 GB figure looks correct, because we keep only a few Hive tables and Hadoop's /tmp (distributed cache and job outputs) in HDFS; all other data is cleaned up. I cross-checked this with hadoop dfs -ls. I also don't think there is internal fragmentation, because the files in our Hive tables are well-chopped into ~50 MB chunks.

Here are the last few lines of hadoop fsck / -files -blocks:

Status: HEALTHY
 Total size:                    38086441332 B
 Total dirs:                    232
 Total files:                   802
 Total blocks (validated):      796 (avg. block size 47847288 B)
 Minimally replicated blocks:   796 (100.0 %)
 Over-replicated blocks:        0 (0.0 %)
 Under-replicated blocks:       6 (0.75376886 %)
 Mis-replicated blocks:         0 (0.0 %)
 Default replication factor:    2
 Average block replication:     3.0439699
 Corrupt blocks:                0
 Missing replicas:              6 (0.24762692 %)
 Number of data-nodes:          9
 Number of racks:               1
FSCK ended at Sun Apr 13 19:49:23 UTC 2014 in 135 milliseconds

My question is: why are the disks of the slaves getting full even though there are only a few files in DFS?
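In case it helps to pin this down, here is a quick sketch of the checks I can run next on one of the slaves to compare what HDFS accounts for against what is actually on disk. The data directory path below is only a placeholder; the real location is whatever dfs.data.dir in hdfs-site.xml points to on our nodes:

  # Per-node breakdown of DFS Used vs. Non DFS Used, as reported by the NameNode
  hadoop dfsadmin -report

  # Logical size of everything stored in HDFS (the 38 GB figure above)
  hadoop dfs -dus /

  # Raw bytes sitting in the DataNode's block storage on this slave
  # (/data/dfs/data is a placeholder; check dfs.data.dir in hdfs-site.xml)
  du -sh /data/dfs/data

  # Overall usage of the partition, to see what else is filling it
  df -h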