Hi,

In YARN mode this property is used only by the driver. The executors use the directories provided by YARN for storing temporary files.
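For illustration, here is a minimal sketch of the two settings involved; the paths below are just placeholders for whatever fast local disks you have:

  # spark-defaults.conf -- in YARN mode only the driver reads this
  spark.local.dir    /data/spark-tmp

  <!-- yarn-site.xml -- YARN hands these directories to every container
       via LOCAL_DIRS, so the executors' blockmgr-* and shuffle files
       land there; the NodeManagers must be restarted to pick up a change -->
  <property>
    <name>yarn.nodemanager.local-dirs</name>
    <value>/data1/yarn/local,/data2/yarn/local</value>
  </property>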
Best,
Michael

On Wed, Mar 28, 2018 at 7:37 AM, Gourav Sengupta <gourav.sengu...@gmail.com> wrote:
> Hi,
>
> As per the documentation at
> https://spark.apache.org/docs/latest/configuration.html:
>
> spark.local.dir (default: /tmp) -- Directory to use for "scratch" space in
> Spark, including map output files and RDDs that get stored on disk. This
> should be on a fast, local disk in your system. It can also be a
> comma-separated list of multiple directories on different disks. NOTE: In
> Spark 1.0 and later this will be overridden by SPARK_LOCAL_DIRS
> (Standalone, Mesos) or LOCAL_DIRS (YARN) environment variables set by the
> cluster manager.
>
> Regards,
> Gourav Sengupta
>
> On Mon, Mar 26, 2018 at 8:28 PM, Michael Shtelma <mshte...@gmail.com>
> wrote:
>> Hi Keith,
>>
>> Thanks for the suggestion!
>> I have solved this already.
>> The problem was that the YARN process was not responding to start/stop
>> commands and had not applied my configuration changes.
>> I killed it and restarted my cluster, and after that YARN started using
>> the yarn.nodemanager.local-dirs parameter defined in yarn-site.xml.
>> After this change, -Djava.io.tmpdir for the Spark executors was set
>> correctly, according to the yarn.nodemanager.local-dirs parameter.
>>
>> Best,
>> Michael
>>
>> On Mon, Mar 26, 2018 at 9:15 PM, Keith Chapman <keithgchap...@gmail.com>
>> wrote:
>> > Hi Michael,
>> >
>> > Sorry for the late reply. I guess you may have to set it through the
>> > HDFS core-site.xml file. The property you need to set is
>> > "hadoop.tmp.dir", which defaults to "/tmp/hadoop-${user.name}".
>> >
>> > Regards,
>> > Keith.
>> >
>> > http://keith-chapman.com
>> >
>> > On Mon, Mar 19, 2018 at 1:05 PM, Michael Shtelma <mshte...@gmail.com>
>> > wrote:
>> >>
>> >> Hi Keith,
>> >>
>> >> Thank you for the idea!
>> >> I have tried it, and now the executor command looks like this:
>> >>
>> >> /bin/bash -c /usr/java/latest//bin/java -server -Xmx51200m
>> >> '-Djava.io.tmpdir=my_prefered_path'
>> >> -Djava.io.tmpdir=/tmp/hadoop-msh/nm-local-dir/usercache/msh/appcache/application_1521110306769_0041/container_1521110306769_0041_01_000004/tmp
>> >>
>> >> The JVM is using the second -Djava.io.tmpdir parameter and writing
>> >> everything to the same directory as before.
>> >>
>> >> Best,
>> >> Michael
>> >>
>> >> Sincerely,
>> >> Michael Shtelma
>> >>
>> >> On Mon, Mar 19, 2018 at 6:38 PM, Keith Chapman <keithgchap...@gmail.com>
>> >> wrote:
>> >> > Can you try setting spark.executor.extraJavaOptions to have
>> >> > -Djava.io.tmpdir=someValue?
>> >> >
>> >> > Regards,
>> >> > Keith.
>> >> >
>> >> > http://keith-chapman.com
>> >> >
>> >> > On Mon, Mar 19, 2018 at 10:29 AM, Michael Shtelma <mshte...@gmail.com>
>> >> > wrote:
>> >> >>
>> >> >> Hi Keith,
>> >> >>
>> >> >> Thank you for your answer!
>> >> >> I have done this, and it is working for the Spark driver.
>> >> >> I would like to do something like this for the executors as well, so
>> >> >> that the setting is used on all the nodes where I have executors
>> >> >> running.
>> >> >>
>> >> >> Best,
>> >> >> Michael
>> >> >>
>> >> >> On Mon, Mar 19, 2018 at 6:07 PM, Keith Chapman
>> >> >> <keithgchap...@gmail.com> wrote:
>> >> >> > Hi Michael,
>> >> >> >
>> >> >> > You could either set spark.local.dir through the Spark conf or the
>> >> >> > java.io.tmpdir system property.
>> >> >> >
>> >> >> > Regards,
>> >> >> > Keith.
>> >> >> >
>> >> >> > http://keith-chapman.com
>> >> >> >
>> >> >> > On Mon, Mar 19, 2018 at 9:59 AM, Michael Shtelma <mshte...@gmail.com>
>> >> >> > wrote:
>> >> >> >>
>> >> >> >> Hi everybody,
>> >> >> >>
>> >> >> >> I am running a Spark job on YARN, and my problem is that the
>> >> >> >> blockmgr-* folders are being created under
>> >> >> >> /tmp/hadoop-msh/nm-local-dir/usercache/msh/appcache/application_id/*
>> >> >> >> This folder can grow to a significant size and does not really
>> >> >> >> fit into the /tmp file system for a single job, which is a real
>> >> >> >> problem for my installation.
>> >> >> >> I have redefined hadoop.tmp.dir in core-site.xml and
>> >> >> >> yarn.nodemanager.local-dirs in yarn-site.xml to point to another
>> >> >> >> location and expected that the block manager would create the
>> >> >> >> files there and not under /tmp, but this is not the case. The
>> >> >> >> files are still created under /tmp.
>> >> >> >>
>> >> >> >> I am wondering if there is a way to make Spark not use /tmp at
>> >> >> >> all and configure it to create all the files somewhere else?
>> >> >> >>
>> >> >> >> Any assistance would be greatly appreciated!
>> >> >> >>
>> >> >> >> Best,
>> >> >> >> Michael