It may be that intermediate results are filling your disks and that, when
the jobs crash, this all gets deleted, so it would look like you have
spare space when in reality you don't.

I would check the file systems as your jobs run and see whether they are
indeed filling up.
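
For what it's worth, a rough sketch of a free-space watcher is below (untested
against your setup; point the directories at wherever mapred.local.dir /
hadoop.tmp.dir actually live on your nodes). Run on each node, it prints the
usable space once a minute, so you can see a partition filling up while the
job is in flight:

import java.io.File;

// Quick-and-dirty free-space watcher: prints usable space for the given
// directories once a minute, so you can see whether intermediate output
// is eating a partition while the job runs.
public class DiskWatch {
    public static void main(String[] args) throws InterruptedException {
        // Defaults taken from your df output; override on the command line.
        String[] dirs = args.length > 0 ? args : new String[] { "/", "/mnt" };
        while (true) {
            StringBuilder line = new StringBuilder(new java.util.Date().toString());
            for (String d : dirs) {
                long freeMb = new File(d).getUsableSpace() / (1024L * 1024L);
                line.append("  ").append(d).append("=").append(freeMb).append(" MB free");
            }
            System.out.println(line.toString());
            Thread.sleep(60 * 1000L);
        }
    }
}

(Running 'watch -n 60 df -h' on each node does much the same job.) As for
the last question in your mail, 'hadoop dfsadmin -report' should print the
same per-node configured capacity / DFS used / non-DFS used / remaining
figures as the web UI.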

Miles

2009/4/16 Rakhi Khatwani <[email protected]>:
> Hi,
>    following is the output on the df command
> [r...@domu-12-31-39-00-e5-d2 conf]# df -h
> Filesystem            Size  Used Avail Use% Mounted on
> /dev/sda1             9.9G  4.2G  5.2G  45% /
> /dev/sdb              414G  924M  392G   1% /mnt
>
> From the output it seems that I have quite a lot of space available, but I
> still get the exception :(
>
> Thanks
> Raakhi
>
> On Thu, Apr 16, 2009 at 1:18 PM, Desai, Milind B <[email protected]>wrote:
>
>> From the exception it appears that there is no space left on the machine.
>> You can check using 'df'.
>>
>> Thanks
>> Milind
>>
>> -----Original Message-----
>> From: Rakhi Khatwani [mailto:[email protected]]
>> Sent: Thursday, April 16, 2009 1:15 PM
>> To: [email protected]; [email protected]
>> Subject: No space left on device Exception
>>
>> Hi,
>>     I am running a map-reduce program on a 6-node EC2 cluster, and after a
>> couple of hours all my tasks hang.
>>
>> So I started digging into the logs...
>>
>> There were no logs for the regionserver and none for the tasktracker.
>> However, for the jobtracker I get the following:
>>
>> 2009-04-16 03:00:29,691 INFO org.apache.hadoop.ipc.Server: IPC Server handler 9 on 50002, call heartbeat(org.apache.hadoop.mapred.tasktrackersta...@2eed7d11, false, true, 10745) from 10.254.27.79:44222: error: java.io.IOException: org.apache.hadoop.fs.FSError: java.io.IOException: No space left on device
>> java.io.IOException: org.apache.hadoop.fs.FSError: java.io.IOException: No space left on device
>>       at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.write(RawLocalFileSystem.java:199)
>>       at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
>>       at java.io.BufferedOutputStream.write(BufferedOutputStream.java:109)
>>       at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:47)
>>       at java.io.DataOutputStream.write(DataOutputStream.java:90)
>>       at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSOutputSummer.writeChunk(ChecksumFileSystem.java:346)
>>       at org.apache.hadoop.fs.FSOutputSummer.writeChecksumChunk(FSOutputSummer.java:150)
>>       at org.apache.hadoop.fs.FSOutputSummer.write1(FSOutputSummer.java:100)
>>       at org.apache.hadoop.fs.FSOutputSummer.write(FSOutputSummer.java:86)
>>       at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:47)
>>       at java.io.DataOutputStream.write(DataOutputStream.java:90)
>>       at sun.nio.cs.StreamEncoder.writeBytes(StreamEncoder.java:202)
>>       at sun.nio.cs.StreamEncoder.implClose(StreamEncoder.java:297)
>>       at sun.nio.cs.StreamEncoder.close(StreamEncoder.java:130)
>>       at java.io.OutputStreamWriter.close(OutputStreamWriter.java:216)
>>       at java.io.BufferedWriter.close(BufferedWriter.java:248)
>>       at java.io.PrintWriter.close(PrintWriter.java:295)
>>       at org.apache.hadoop.mapred.JobHistory$JobInfo.logFinished(JobHistory.java:1024)
>>       at org.apache.hadoop.mapred.JobInProgress.jobComplete(JobInProgress.java:1906)
>>       at org.apache.hadoop.mapred.JobInProgress.comp
>>
>>
>>
>> Following is the disk information from the DFS web UI:
>> Node                    Last Contact  Admin State  Capacity (GB)  DFS Used (GB)  Non DFS Used (GB)  Remaining (GB)  Used (%)  Remaining (%)  Blocks
>> domU-12-31-39-00-0C-A1  0             In Service   413.38         0.83           21.19              391.36          0.2       94.67          2353
>> domU-12-31-39-00-16-F1  1             In Service   413.38         0.46           21.24              391.67          0.11      94.75          2399
>> domU-12-31-39-00-45-71  1             In Service   413.38         0.64           21.34              391.4           0.16      94.68          2303
>> domU-12-31-39-00-E5-D2  0             In Service   413.38         0.66           21.53              391.18          0.16      94.63          2319
>> domU-12-31-39-01-64-12  2             In Service   413.38         0.64           21.24              391.49          0.16      94.71          2264
>> domU-12-31-39-01-78-D1  0             In Service   413.38         0.49           21.24              391.65          0.12      94.74          1952
>>
>> I am using Hadoop 0.19.0 and HBase 0.19.0.
>>
>> On googling the error I came across the JIRA issue
>> http://issues.apache.org/jira/browse/HADOOP-4163
>>
>> which says that it has been fixed in this version. :(
>>
>> Has anyone else run into this exception?
>>
>> Also, how do we check the maximum capacity available for DFS and for non-DFS use?
>> Thanks
>> Raakhi,
>>
>



-- 
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.
