For ALS, I would recommend repartitioning the ratings to match the number of CPU cores, or even fewer. ALS is not computation-heavy for small k, but it is communication-heavy, so having a small number of partitions may help. For EC2 clusters, we use /mnt/spark and /mnt2/spark as the default local directories because they are local hard drives. Did your last run of ALS on MovieLens 10M-100K with the default settings succeed?

-Xiangrui
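A rough sketch of that repartitioning step in PySpark, modeled on the MLlib collaborative filtering example (the ratings path, the partition count of 20, and the ALS parameters are placeholders; pick values that match your cluster):

from pyspark import SparkConf, SparkContext
from pyspark.mllib.recommendation import ALS

conf = SparkConf().setAppName("RecommendALS")
sc = SparkContext(conf=conf)

# Placeholder path; MovieLens 10M lines look like "user::movie::rating::timestamp".
lines = sc.textFile("/vol/ml-10M100K/ratings.dat")
ratings = lines.map(lambda l: l.split("::")) \
               .map(lambda p: (int(p[0]), int(p[1]), float(p[2])))

# Repartition to roughly the number of CPU cores in the cluster, or fewer.
# ALS with a small rank is communication-heavy rather than compute-heavy,
# so a few large partitions tend to beat many small ones.
ratings = ratings.repartition(20)

model = ALS.train(ratings, 10, 10)  # rank=10, iterations=10; both placeholders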
On Wed, Jul 16, 2014 at 8:00 AM, Chris DuBois <chris.dub...@gmail.com> wrote:
> Hi Xiangrui,
>
> I accidentally did not send df -i for the master node. Here it is at the moment of failure:
>
> Filesystem      Inodes    IUsed     IFree IUse% Mounted on
> /dev/xvda1      524288   280938    243350   54% /
> tmpfs          3845409        1   3845408    1% /dev/shm
> /dev/xvdb     10002432     1027  10001405    1% /mnt
> /dev/xvdf     10002432       16  10002416    1% /mnt2
> /dev/xvdv    524288000       13 524287987    1% /vol
>
> I am using default settings now, but is there a way to make sure that the proper directories are being used? How many blocks/partitions do you recommend?
>
> Chris
>
>
> On Wed, Jul 16, 2014 at 1:09 AM, Chris DuBois <chris.dub...@gmail.com> wrote:
>>
>> Hi Xiangrui,
>>
>> Here is the result on the master node:
>> $ df -i
>> Filesystem      Inodes  IUsed     IFree IUse% Mounted on
>> /dev/xvda1      524288 273997    250291   53% /
>> tmpfs          1917974      1   1917973    1% /dev/shm
>> /dev/xvdv    524288000     30 524287970    1% /vol
>>
>> I have reproduced the error while using the MovieLens 10M data set on a newly created cluster.
>>
>> Thanks for the help.
>> Chris
>>
>>
>> On Wed, Jul 16, 2014 at 12:22 AM, Xiangrui Meng <men...@gmail.com> wrote:
>>>
>>> Hi Chris,
>>>
>>> Could you also try `df -i` on the master node? How many blocks/partitions did you set?
>>>
>>> In the current implementation, ALS doesn't clean the shuffle data because the operations are chained together. But it shouldn't run out of disk space on the MovieLens dataset, which is small. The spark-ec2 script sets /mnt/spark and /mnt2/spark as the local.dir by default; I would recommend leaving this setting at its default value.
>>>
>>> Best,
>>> Xiangrui
>>>
>>> On Wed, Jul 16, 2014 at 12:02 AM, Chris DuBois <chris.dub...@gmail.com> wrote:
>>> > Thanks for the quick responses!
>>> >
>>> > I used your final -Dspark.local.dir suggestion, but I see this during the initialization of the application:
>>> >
>>> > 14/07/16 06:56:08 INFO storage.DiskBlockManager: Created local directory at /vol/spark-local-20140716065608-7b2a
>>> >
>>> > I would have expected something in /mnt/spark/.
>>> >
>>> > Thanks,
>>> > Chris
>>> >
>>> >
>>> >
>>> > On Tue, Jul 15, 2014 at 11:44 PM, Chris Gore <cdg...@cdgore.com> wrote:
>>> >>
>>> >> Hi Chris,
>>> >>
>>> >> I've encountered this error when running Spark's ALS methods too. In my case, it was because I set spark.local.dir improperly, and every time there was a shuffle, it would spill many GB of data onto the local drive. What fixed it was setting it to use the /mnt directory, where a network drive is mounted. For example, setting an environment variable:
>>> >>
>>> >> export SPACE=$(mount | grep mnt | awk '{print $3"/spark/"}' | xargs | sed 's/ /,/g')
>>> >>
>>> >> Then adding -Dspark.local.dir=$SPACE or simply -Dspark.local.dir=/mnt/spark/,/mnt2/spark/ when you run your driver application.
>>> >>
>>> >> Chris
>>> >>
>>> >> On Jul 15, 2014, at 11:39 PM, Xiangrui Meng <men...@gmail.com> wrote:
>>> >>
>>> >> > Check the number of inodes (df -i). The assembly build may create many small files.
>>> >> > -Xiangrui
>>> >> >
>>> >> > On Tue, Jul 15, 2014 at 11:35 PM, Chris DuBois <chris.dub...@gmail.com> wrote:
>>> >> >> Hi all,
>>> >> >>
>>> >> >> I am encountering the following error:
>>> >> >>
>>> >> >> INFO scheduler.TaskSetManager: Loss was due to java.io.IOException: No space left on device [duplicate 4]
>>> >> >>
>>> >> >> For each slave, df -h looks roughly like this, which makes the above error surprising.
>>> >> >>
>>> >> >> Filesystem      Size  Used Avail Use% Mounted on
>>> >> >> /dev/xvda1      7.9G  4.4G  3.5G  57% /
>>> >> >> tmpfs           7.4G  4.0K  7.4G   1% /dev/shm
>>> >> >> /dev/xvdb        37G  3.3G   32G  10% /mnt
>>> >> >> /dev/xvdf        37G  2.0G   34G   6% /mnt2
>>> >> >> /dev/xvdv       500G   33M  500G   1% /vol
>>> >> >>
>>> >> >> I'm on an EC2 cluster (c3.xlarge + 5 x m3) that I launched using the spark-ec2 scripts and a clone of spark from today. The job I am running closely resembles the collaborative filtering example. This issue happens with the 1M version as well as the 10 million rating version of the MovieLens dataset.
>>> >> >>
>>> >> >> I have seen previous questions, but they haven't helped yet. For example, I tried setting the Spark tmp directory to the EBS volume at /vol/, both by editing the spark conf file (and copy-dir'ing it to the slaves) and through the SparkConf. Yet I still get the above error. Here is my current Spark config; note that I'm launching via ~/spark/bin/spark-submit.
>>> >> >>
>>> >> >> conf = SparkConf()
>>> >> >> conf.setAppName("RecommendALS") \
>>> >> >>     .set("spark.local.dir", "/vol/") \
>>> >> >>     .set("spark.executor.memory", "7g") \
>>> >> >>     .set("spark.akka.frameSize", "100") \
>>> >> >>     .setExecutorEnv("SPARK_JAVA_OPTS", " -Dspark.akka.frameSize=100")
>>> >> >> sc = SparkContext(conf=conf)
>>> >> >>
>>> >> >> Thanks for any advice,
>>> >> >> Chris
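A minimal sketch of the spark.local.dir suggestion from earlier in the thread: point the shuffle space at the instance-store mounts that spark-ec2 configures (/mnt/spark and /mnt2/spark) instead of /vol. The memory and frame-size values are copied from the original config; whether a driver-side setting reaches the executors can depend on how SPARK_LOCAL_DIRS is set in spark-env.sh on the workers, so treat this as a starting point rather than a guaranteed fix:

from pyspark import SparkConf, SparkContext

# Use the instance-store directories that spark-ec2 sets up by default
# (per the messages above), rather than the EBS volume at /vol.
conf = (SparkConf()
        .setAppName("RecommendALS")
        .set("spark.local.dir", "/mnt/spark,/mnt2/spark")
        .set("spark.executor.memory", "7g")
        .set("spark.akka.frameSize", "100"))
sc = SparkContext(conf=conf)

Either way, the storage.DiskBlockManager INFO line at startup shows which directory was actually chosen, which is a quick way to confirm the setting took effect.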