Ognen - just so I understand. The issue is that there weren't enough
inodes and this was causing a "No space left on device" error? Is that
correct? If so, that's good to know because it's definitely
counterintuitive.

On Sun, Mar 23, 2014 at 8:36 PM, Ognen Duzlevski
<og...@nengoiksvelzud.com> wrote:
> I would love to work on this (and other) stuff if I can bother someone with
> questions offline or on a dev mailing list.
> Ognen
>
>
> On 3/23/14, 10:04 PM, Aaron Davidson wrote:
>
> Thanks for bringing this up. 100% inode utilization is an issue I haven't
> seen raised before, and it points to another issue that is not on our
> current roadmap for state cleanup: cleaning up data left behind by a
> crashed process.
>
>
> On Sun, Mar 23, 2014 at 7:57 PM, Ognen Duzlevski
> <og...@plainvanillagames.com> wrote:
>>
>> Bleh, strike that, one of my slaves was at 100% inode utilization on the
>> file system. It was /tmp/spark* leftovers that apparently did not get
>> cleaned up properly after failed or interrupted jobs.
>> Mental note - run a cron job on all slaves and master to clean up
>> /tmp/spark* regularly.
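>> Something along these lines is what I have in mind (a rough sketch; the
>> leftover directory names and the retention period are assumptions for my
>> setup):
>>
>>   # /etc/cron.d/spark-tmp-cleanup -- install on the master and every slave
>>   # Remove /tmp/spark* leftovers that are more than a day old, once an hour.
>>   0 * * * *  root  find /tmp -maxdepth 1 -name 'spark*' -mtime +1 -exec rm -rf {} +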
>>
>> Thanks (and sorry for the noise)!
>> Ognen
>>
>>
>> On 3/23/14, 9:52 PM, Ognen Duzlevski wrote:
>>
>> Aaron, thanks for replying. I am very much puzzled as to what is going on.
>> A job that used to run on the same cluster is failing with this mysterious
>> message about not having enough disk space, when in fact I can see through
>> "watch df -h" that free space is always hovering around 3+ GB on the disk
>> and free inodes are at 50% (this is on the master). I went through the
>> spark/work/app*/stderr and stdout files and the spark/logs/*out files on
>> each slave, and there is no mention of "too many open files" failures on
>> any of the slaves or on the master :(
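>> For reference, this is roughly what I am watching on each node (the mount
>> point and interval are just examples):
>>
>>   # free space can look fine while inodes are exhausted, so watch both
>>   watch -n 5 'df -h /tmp; df -i /tmp'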
>>
>> Thanks
>> Ognen
>>
>> On 3/23/14, 8:38 PM, Aaron Davidson wrote:
>>
>> By default, with P partitions (for both the pre-shuffle and post-shuffle
>> stages), P^2 shuffle files are created. With spark.shuffle.consolidateFiles
>> turned on, we would instead create only P files. Disk space consumption is
>> largely unaffected by the number of partitions, however, unless each
>> partition is particularly small.
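>> If you want to experiment with it, one way to turn it on (assuming you set
>> Spark properties via SPARK_JAVA_OPTS in conf/spark-env.sh; setting it on
>> your SparkConf in the driver works as well) is roughly:
>>
>>   # conf/spark-env.sh on each node -- enable shuffle file consolidation
>>   export SPARK_JAVA_OPTS="-Dspark.shuffle.consolidateFiles=true $SPARK_JAVA_OPTS"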
>>
>> You might look at the actual executors' logs, as it's possible that this
>> error was caused by an earlier exception, such as "too many open files".
>>
>>
>> On Sun, Mar 23, 2014 at 4:46 PM, Ognen Duzlevski
>> <og...@plainvanillagames.com> wrote:
>>>
>>> On 3/23/14, 5:49 PM, Matei Zaharia wrote:
>>>
>>> You can set spark.local.dir to put this data somewhere other than /tmp if
>>> /tmp is full. Actually it's recommended to have multiple local disks and to
>>> set it to a comma-separated list of directories, one per disk.
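>>> For example, something along these lines in conf/spark-env.sh (the mount
>>> points are only placeholders; use whatever disks you actually have):
>>>
>>>   # spread Spark's local/shuffle data across dedicated disks instead of /tmp
>>>   export SPARK_JAVA_OPTS="-Dspark.local.dir=/mnt/disk1/spark,/mnt/disk2/spark $SPARK_JAVA_OPTS"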
>>>
>>> Matei, does the number of tasks/partitions in a transformation have any
>>> effect on disk space consumption? Or on inode consumption?
>>>
>>> Thanks,
>>> Ognen
>>
>>
>>
>> --
>> "A distributed system is one in which the failure of a computer you didn't
>> even know existed can render your own computer unusable"
>> -- Leslie Lamport
>
>
>
> --
> "No matter what they ever do to us, we must always act for the love of our
> people and the earth. We must not react out of hatred against those who have
> no sense."
> -- John Trudell
