Sure - feel free to totally disagree.
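For anyone who wants to try the ramdisk workaround mentioned below, here is a
minimal sketch. The /mnt/ramdisk path and app name are just examples; the
tmpfs mount has to be created on each node first, e.g. with
`mount -t tmpfs -o size=200g tmpfs /mnt/ramdisk`:

    import org.apache.spark.{SparkConf, SparkContext}

    // Point shuffle and spill files at the tmpfs mount so they never touch
    // the disk subsystem. Size the ramdisk for your largest shuffle, since
    // writes will fail once it fills up.
    val conf = new SparkConf()
      .setAppName("shuffle-on-ramdisk")         // example app name
      .set("spark.local.dir", "/mnt/ramdisk")   // example mount point
    val sc = new SparkContext(conf)

Note that spark.local.dir is overridden by SPARK_LOCAL_DIRS (standalone,
Mesos) or LOCAL_DIRS (YARN) when the cluster manager sets those, so on YARN
the ramdisk path has to be configured through the environment instead.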
On Fri, Apr 1, 2016 at 2:10 PM, Michael Slavitch <slavi...@gmail.com> wrote:

> I totally disagree that it's not a problem.
>
> - Network fetch throughput on 40G Ethernet exceeds the throughput of NVMe
>   drives.
> - What Spark is depending on is Linux's IO cache as an effective buffer
>   pool. This is fine for small jobs but not for jobs with datasets in the
>   TB/node range.
> - On larger jobs, flushing the cache causes Linux to block.
> - On a modern 56-hyperthread 2-socket host, the latency caused by multiple
>   executors writing out to disk increases greatly.
>
> I thought the whole point of Spark was in-memory computing? It is in fact
> in-memory for some things but uses spark.local.dir as a buffer pool for
> others.
>
> *Hence, the performance of Spark is gated by the performance of
> spark.local.dir, even on large memory systems.*
>
> "Currently it is not possible to not write shuffle files to disk."
>
> What changes *would* make it possible?
>
> The only one that seems possible is to clone the shuffle service and make
> it in-memory.
>
> On Apr 1, 2016, at 4:57 PM, Reynold Xin <r...@databricks.com> wrote:
>
> spark.shuffle.spill actually has nothing to do with whether we write
> shuffle files to disk. Currently it is not possible to not write shuffle
> files to disk, and typically it is not a problem because the network fetch
> throughput is lower than what disks can sustain. In most cases, especially
> with SSDs, there is little difference between putting all of those in
> memory and on disk.
>
> However, it is becoming more common to run Spark on a small number of
> beefy nodes (e.g. 2 nodes each with 1TB of RAM). We do want to look into
> improving performance for those. In the meantime, you can set up local
> ramdisks on each node for shuffle writes.
>
> On Fri, Apr 1, 2016 at 11:32 AM, Michael Slavitch <slavi...@gmail.com>
> wrote:
>
>> Hello;
>>
>> I'm working on Spark with very large memory systems (2TB+) and notice
>> that Spark spills to disk in shuffle. Is there a way to force Spark to
>> stay in memory when doing shuffle operations? The goal is to keep the
>> shuffle data either in the heap or in off-heap memory (in 1.6.x) and
>> never touch the IO subsystem. I am willing to have the job fail if it
>> runs out of RAM.
>>
>> Setting spark.shuffle.spill to false is deprecated in 1.6 and does not
>> work with the tungsten-sort shuffle manager in 1.5.x:
>>
>> "WARN UnsafeShuffleManager: spark.shuffle.spill was set to false, but
>> this is ignored by the tungsten-sort shuffle manager; its optimized
>> shuffles will continue to spill to disk when necessary."
>>
>> If this is impossible via configuration changes, what code changes would
>> be needed to accomplish this?