Re: Spark In Memory Shuffle / 5403

Peter Rudenko Fri, 19 Oct 2018 06:39:00 -0700

Hey Peter, in the SparkRDMA shuffle plugin (
https://github.com/Mellanox/SparkRDMA) we mmap the shuffle file to do
Remote Direct Memory Access. If the shuffle data is bigger than RAM,
Mellanox NICs support On Demand Paging, where the OS invalidates
translations that are no longer valid due to either non-present pages or
mapping changes. So if you have an RDMA-capable NIC (or you can try on Azure:
https://azure.microsoft.com/en-us/blog/introducing-the-new-hb-and-hc-azure-vm-sizes-for-hpc/
), give it a try. For network-intensive apps you should get better
performance.
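
For anyone who wants to try it, here is a rough sketch of how the plugin
might be switched on through spark-submit flags. The jar path is a
placeholder, the helper names are made up for illustration, and the
shuffle manager class name is an assumption that should be checked against
the SparkRDMA README for your Spark version.

```python
from typing import Dict, List


def rdma_shuffle_conf(jar_path: str) -> Dict[str, str]:
    """Spark properties that put the plugin jar on the classpath and
    select its shuffle manager (class name assumed, verify in the README)."""
    return {
        "spark.driver.extraClassPath": jar_path,
        "spark.executor.extraClassPath": jar_path,
        "spark.shuffle.manager": "org.apache.spark.shuffle.rdma.RdmaShuffleManager",
    }


def to_submit_args(conf: Dict[str, str]) -> List[str]:
    """Turn a property map into repeated `--conf key=value` arguments."""
    args: List[str] = []
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    return args


if __name__ == "__main__":
    # Placeholder path: replace with the jar built for your Spark version.
    print(" ".join(to_submit_args(rdma_shuffle_conf("/path/to/spark-rdma.jar"))))
```

The same three properties could instead go into spark-defaults.conf; the
flag form is only used here because it is easy to try on a single job.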


Thanks,
Peter Rudenko

On Thu, 18 Oct 2018 at 18:07, Peter Liu <peter.p...@gmail.com> wrote:

> I would be very interested in the initial question here:
>
> Is there a production-level implementation of memory-only shuffle that is
> configurable (similar to the MEMORY_ONLY and MEMORY_AND_DISK storage
> levels), as mentioned in this pull request,
> https://github.com/apache/spark/pull/5403 ?
>
> It would be quite a practical and useful option/feature. What is the
> status of this implementation?
>
> Thanks!
>
> Peter
>
> On Thu, Oct 18, 2018 at 6:51 AM ☼ R Nair <ravishankar.n...@gmail.com>
> wrote:
>
>> Thanks, great info. Will try it and let everyone know.
>>
>> Best
>>
>> On Thu, Oct 18, 2018, 3:12 AM onmstester onmstester <onmstes...@zoho.com>
>> wrote:
>>
>>> Create the ramdisk (the mount point must already exist):
>>> mount -t tmpfs -o size=2G tmpfs /mnt/spark
>>>
>>> Then point spark.local.dir at the ramdisk. How you set it depends on your
>>> deployment strategy; for me it was through the SparkConf object before
>>> passing it to SparkContext:
>>> conf.set("spark.local.dir", "/mnt/spark")
>>>
>>> To validate that Spark is actually using your ramdisk (by default it
>>> uses /tmp), ls the ramdisk after running some jobs; you should see Spark
>>> directories (with a date in the directory name) on your ramdisk.
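
Besides listing the directory, it may be worth confirming that the path
really sits on tmpfs and not on a regular disk. A minimal sketch, assuming
a Linux host where /proc/mounts is readable; the function name fs_type_for
is made up for illustration, and mount points containing spaces (which
/proc/mounts escapes) are not handled.

```python
import os
from typing import Optional


def fs_type_for(path: str, mounts_text: str) -> Optional[str]:
    """Return the filesystem type of the deepest mount point containing path.

    mounts_text is expected in /proc/mounts format:
    device, mount point, filesystem type, options, ...
    """
    path = os.path.normpath(path)
    best_mount, best_type = "", None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip malformed or empty lines
        mount_point, fs_type = fields[1], fields[2]
        prefix = mount_point.rstrip("/") + "/"
        if path == mount_point or path.startswith(prefix):
            # >= so that a later entry for the same mount point wins,
            # since later mounts shadow earlier ones.
            if len(mount_point) >= len(best_mount):
                best_mount, best_type = mount_point, fs_type
    return best_type


if __name__ == "__main__":
    # /mnt/spark is the ramdisk path used in the message above.
    with open("/proc/mounts") as f:
        print(fs_type_for("/mnt/spark", f.read()))
```

If this reports anything other than tmpfs for the spark.local.dir path,
the shuffle files are still going to a disk-backed filesystem.
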
>>>
>>>
>>> ---- On Wed, 17 Oct 2018 18:57:14 +0330 ☼ R Nair
>>> <ravishankar.n...@gmail.com> wrote ----
>>>
>>> What are the steps to configure this? Thanks
>>>
>>> On Wed, Oct 17, 2018, 9:39 AM onmstester onmstester <
>>> onmstes...@zoho.com.invalid> wrote:
>>>
>>>
>>> Hi,
>>> I failed to configure Spark for in-memory shuffle, so currently I am just
>>> using a Linux memory-backed directory (tmpfs) as Spark's working directory,
>>> so everything is fast.
