I found the problem: for each application, the Spark worker node saves the corresponding stdout and stderr under ./spark/work/appid, where appid is the id of the application. If I run several applications in a row, the disk runs out of space. In my case, the disk usage under ./spark/work/ is as follows:

1689784  ./app-20150208203033-0002/0
1689788  ./app-20150208203033-0002
40324    ./driver-20150208180505-0001
1691400  ./app-20150208180509-0001/0
1691404  ./app-20150208180509-0001
40316    ./driver-20150208203030-0002
40320    ./driver-20150208173156-0000
1649876  ./app-20150208173200-0000/0
1649880  ./app-20150208173200-0000
5152036  .

Any suggestion on how to resolve this? Thanks.

Ey-Chih Chow
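For the growth under ./spark/work itself, one mitigation is to let the standalone worker purge old application directories on its own. A minimal sketch, assuming a Spark 1.x standalone cluster (the spark.worker.cleanup.* properties come from the standalone-mode docs; the intervals below are only examples), added to conf/spark-env.sh on each worker and followed by a worker restart:

    # Periodically purge work dirs of *finished* applications under
    # SPARK_WORKER_DIR (./spark/work): check every 30 minutes and
    # delete anything older than one hour.
    export SPARK_WORKER_OPTS="-Dspark.worker.cleanup.enabled=true \
      -Dspark.worker.cleanup.interval=1800 \
      -Dspark.worker.cleanup.appDataTtl=3600"

Note that this only removes directories of applications that have already stopped; if the stdout/stderr of a single long-running application is what fills the disk, the spark.executor.logs.rolling.* properties available in recent 1.x releases can cap the log size instead.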
From: eyc...@hotmail.com
To: gen.tan...@gmail.com
CC: user@spark.apache.org
Subject: RE: no space left at worker node
Date: Sun, 8 Feb 2015 15:25:43 -0800

By the way, the input and output paths of the job are all in S3. I did not use HDFS paths as input or output.

Best regards,
Ey-Chih Chow

From: eyc...@hotmail.com
To: gen.tan...@gmail.com
CC: user@spark.apache.org
Subject: RE: no space left at worker node
Date: Sun, 8 Feb 2015 14:57:15 -0800

Hi Gen,

Thanks. I save my logs in a file under /var/log. This is the only place where I save data. Will the problem go away if I use a better machine?

Best regards,
Ey-Chih Chow

Date: Sun, 8 Feb 2015 23:32:27 +0100
Subject: Re: no space left at worker node
From: gen.tan...@gmail.com
To: eyc...@hotmail.com
CC: user@spark.apache.org

Hi,

I am sorry, I made a mistake. r3.large has only one SSD, which is mounted at /mnt, so there is no /dev/sdc. In fact, the problem is that there is no space left under the / directory. You should check whether your application writes data under this directory (for instance, saving files with file:///). If not, you can run watch du -sh while the job is running to figure out which directory is growing. Normally only the /mnt directory, which is backed by the SSD, grows significantly, because the HDFS data is stored there. Then you can find the directory that caused the no-space problem and track down the specific reason.

Cheers,
Gen

On Sun, Feb 8, 2015 at 10:45 PM, ey-chih chow <eyc...@hotmail.com> wrote:

Thanks Gen. How can I check if /dev/sdc is well mounted or not? In general, the problem shows up when I submit the second or third job. The first job I submit will most likely succeed.

Ey-Chih Chow

Date: Sun, 8 Feb 2015 18:18:03 +0100
Subject: Re: no space left at worker node
From: gen.tan...@gmail.com
To: eyc...@hotmail.com
CC: user@spark.apache.org

Hi,

In fact, /dev/sdb is /dev/xvdb. It seems that there is no problem with a double mount. However, there is no information about /mnt2. You should check whether /dev/sdc is well mounted or not. Michael's reply is a good solution for this type of problem. You can check his site.

Cheers,
Gen

On Sun, Feb 8, 2015 at 5:53 PM, ey-chih chow <eyc...@hotmail.com> wrote:

Gen,

Thanks for your information. The content of /etc/fstab at the worker node (r3.large) is:

#LABEL=/   /         ext4    defaults,noatime  1 1
tmpfs      /dev/shm  tmpfs   defaults  0 0
devpts     /dev/pts  devpts  gid=5,mode=620  0 0
sysfs      /sys      sysfs   defaults  0 0
proc       /proc     proc    defaults  0 0
/dev/sdb   /mnt      auto    defaults,noatime,nodiratime,comment=cloudconfig  0 0
/dev/sdc   /mnt2     auto    defaults,noatime,nodiratime,comment=cloudconfig  0 0

There is no entry for /dev/xvdb.

Ey-Chih Chow

Date: Sun, 8 Feb 2015 12:09:37 +0100
Subject: Re: no space left at worker node
From: gen.tan...@gmail.com
To: eyc...@hotmail.com
CC: user@spark.apache.org

Hi,

In fact, I met this problem before; it is a bug of AWS. Which type of machine do you use? If I guess well, you can check the file /etc/fstab: there would be a double mount of /dev/xvdb. If yes, you should
1. stop hdfs
2. umount /dev/xvdb at /
3. restart hdfs

Hope this could be helpful.
Cheers,
Gen
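A sketch of the check and fix Gen describes, assuming the stock spark-ec2 layout where ephemeral HDFS lives under /root/ephemeral-hdfs (the script paths and the duplicate mount point below are assumptions; substitute whatever your cluster actually uses):

    # See whether the instance-store device shows up more than once.
    grep xvdb /proc/mounts
    df -h

    # While a job runs, watch which directories keep growing.
    watch -n 10 "du -sh /root/spark/work /mnt /var/log 2>/dev/null"

    # If /dev/xvdb is mounted twice, stop HDFS, drop the duplicate
    # mount (use the mount point reported by /proc/mounts above),
    # then restart HDFS.
    /root/ephemeral-hdfs/bin/stop-dfs.sh   # path assumes the spark-ec2 AMI
    umount /path/of/duplicate/mount        # hypothetical placeholder mount point
    /root/ephemeral-hdfs/bin/start-dfs.sh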
On Sun, Feb 8, 2015 at 8:16 AM, ey-chih chow <eyc...@hotmail.com> wrote:

Hi,

I submitted a Spark job to an EC2 cluster using spark-submit. At a worker node, there is an exception of 'no space left on device' as follows.

==========================================
15/02/08 01:53:38 ERROR logging.FileAppender: Error writing stream to file /root/spark/work/app-20150208014557-0003/0/stdout
java.io.IOException: No space left on device
        at java.io.FileOutputStream.writeBytes(Native Method)
        at java.io.FileOutputStream.write(FileOutputStream.java:345)
        at org.apache.spark.util.logging.FileAppender.appendToFile(FileAppender.scala:92)
        at org.apache.spark.util.logging.FileAppender.appendStreamToFile(FileAppender.scala:72)
        at org.apache.spark.util.logging.FileAppender$$anon$1$$anonfun$run$1.apply$mcV$sp(FileAppender.scala:39)
        at org.apache.spark.util.logging.FileAppender$$anon$1$$anonfun$run$1.apply(FileAppender.scala:39)
        at org.apache.spark.util.logging.FileAppender$$anon$1$$anonfun$run$1.apply(FileAppender.scala:39)
        at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1311)
        at org.apache.spark.util.logging.FileAppender$$anon$1.run(FileAppender.scala:38)
===========================================

The command df showed the following information at the worker node:

Filesystem     1K-blocks     Used  Available  Use%  Mounted on
/dev/xvda1       8256920  8256456          0  100%  /
tmpfs            7752012        0    7752012    0%  /dev/shm
/dev/xvdb       30963708  1729652   27661192    6%  /mnt

Does anybody know how to fix this? Thanks.

Ey-Chih Chow
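Since df shows the 8 GB root volume at 100% while the 30 GB volume on /mnt is nearly empty, another mitigation is to move Spark's work and scratch directories off / and onto /mnt. A sketch for the workers, assuming a standalone cluster (the directory names are examples; SPARK_WORKER_DIR and SPARK_LOCAL_DIRS are the standard spark-env.sh variables):

    # Create directories on the large instance-store volume.
    mkdir -p /mnt/spark-work /mnt/spark-local

    # In conf/spark-env.sh on each worker:
    export SPARK_WORKER_DIR=/mnt/spark-work    # application work dirs + executor stdout/stderr
    export SPARK_LOCAL_DIRS=/mnt/spark-local   # shuffle and spill scratch space

Restart the workers afterwards so that new applications pick up the new directories.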