Are you stopping and starting datanodes often? Are your files small on average? What Hadoop version are you running?
It looks like on startup the datanode chooses the first volume for the first block it writes, and proceeds round-robin from there. Did you simply add the extra disk and change the config, or were both mounts there from the start? Either way, it should not fail until both disks are full.

The only improvements I see in trunk (inner class FSVolumeSet in FSDataset.java) are:

* Initialize the current volume to a random index in the constructor rather than always the first one.
* Rather than choosing by round-robin, weight the choice by free space available. This does not have to check every disk's free space each time; it can remember the values for all volumes and only update the free space of the current candidate during the check it already performs.

On 3/16/09 5:19 AM, "Vaibhav J" <vaibh...@rediff.co.in> wrote:

> _____
>
> From: Vaibhav J [mailto:vaibh...@rediff.co.in]
> Sent: Monday, March 16, 2009 5:46 PM
> To: 'nutch-...@lucene.apache.org'; 'nutch-u...@lucene.apache.org'
> Subject: Problem: data distribution is non-uniform between two different disks on a datanode.
>
> We have 27 datanodes and a replication factor of 1. (Data size is ~6.75 TB.)
>
> We have specified two different disks for the dfs data directory on each datanode using the property dfs.data.dir in the hadoop-site.xml file in the conf directory.
> (Value of dfs.data.dir: /mnt/hadoop-dfs/data, /mnt2/hadoop-dfs/data)
>
> When we set the replication factor to 2, data distribution is biased toward the first disk: more data is copied to /mnt/hadoop-dfs/data, and after copying some data the first disk becomes full and shows no available space, while we still have enough space on the second disk (/mnt2/hadoop-dfs/data).
> So it is difficult to achieve a replication factor of 2.
>
> Data traffic reaches the second disk as well (/mnt2/hadoop-dfs/data), but it looks like more data is copied to the first disk (/mnt/hadoop-dfs/data).
> What should we do to get uniform data distribution between the two disks on each datanode, so that we can achieve a replication factor of 2?
>
> Regards,
> Vaibhav J.
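The free-space-weighted selection suggested above could be sketched roughly like this. This is a simplified standalone illustration, not the actual FSVolumeSet code; the Volume interface and all names here are hypothetical stand-ins. It keeps a cached free-space value per volume, refreshes only one candidate per call (round-robin), and then picks a volume with probability proportional to its cached free space:

```java
import java.util.Random;

// Hypothetical sketch of a free-space-weighted volume chooser.
// Not the real Hadoop FSVolumeSet; Volume is a stand-in interface.
class WeightedVolumeChooser {
    interface Volume {
        long getAvailable(); // free space in bytes (potentially a slow disk call)
    }

    private final Volume[] volumes;
    private final long[] cachedFree; // remembered free-space values per volume
    private int current;             // next candidate to refresh, advanced round-robin
    private final Random rand;

    WeightedVolumeChooser(Volume[] volumes, long seed) {
        this.volumes = volumes;
        this.cachedFree = new long[volumes.length];
        for (int i = 0; i < volumes.length; i++) {
            cachedFree[i] = volumes[i].getAvailable();
        }
        this.rand = new Random(seed);
        // Start at a random index instead of always volume 0.
        this.current = rand.nextInt(volumes.length);
    }

    /**
     * Pick a volume with probability proportional to its cached free space.
     * Only one volume's free space is re-read per call, so the cost stays
     * constant regardless of the number of disks.
     */
    int chooseVolume(long blockSize) {
        // Refresh just the current round-robin candidate, as suggested above.
        cachedFree[current] = volumes[current].getAvailable();
        current = (current + 1) % volumes.length;

        long total = 0;
        for (long f : cachedFree) {
            if (f >= blockSize) total += f;
        }
        if (total == 0) {
            throw new IllegalStateException("no volume has enough free space");
        }
        long r = (long) (rand.nextDouble() * total);
        for (int i = 0; i < cachedFree.length; i++) {
            if (cachedFree[i] < blockSize) continue;
            if (r < cachedFree[i]) return i;
            r -= cachedFree[i];
        }
        return volumes.length - 1; // fallback for rounding edge cases
    }
}
```

With two disks where one has far more free space, most new blocks land on the emptier disk, which over time evens out the imbalance the original poster describes.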