As I mentioned earlier, this flag is now ignored.
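For reference, a minimal sketch of the setting in question, assuming the Spark 1.6 Scala API (the app name is just a placeholder); it is still accepted but no longer changes behavior:

    import org.apache.spark.{SparkConf, SparkContext}

    // spark.shuffle.spill used to request that sorts/aggregations not spill;
    // under the tungsten-sort shuffle manager it is ignored, and shuffle
    // output is still written to spark.local.dir.
    val conf = new SparkConf()
      .setAppName("shuffle-spill-demo")     // placeholder
      .set("spark.shuffle.spill", "false")  // accepted, but a no-op here
    val sc = new SparkContext(conf)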
On Fri, Apr 1, 2016, 6:39 PM Michael Slavitch <slavi...@gmail.com> wrote:

Shuffling a 1 TB set of keys and values (i.e., a sort by key) results in about 500 GB of disk I/O if compression is enabled. Is there any way to eliminate the I/O caused by shuffling?

On Fri, Apr 1, 2016, 6:32 PM Reynold Xin <r...@databricks.com> wrote:

Michael - I'm not sure you actually read my email, but spill has nothing to do with the shuffle files on disk. It was for the partitioning (i.e. sorting) process. If that flag is off, Spark will simply run out of memory when the data doesn't fit in memory.

On Fri, Apr 1, 2016 at 3:28 PM, Michael Slavitch <slavi...@gmail.com> wrote:

A RAM disk is a fine interim step, but a lot of layers are eliminated by keeping things in memory unless there is a need for spillover. At one time there was support for turning off spilling. That was eliminated. Why?

On Fri, Apr 1, 2016, 6:05 PM Mridul Muralidharan <mri...@gmail.com> wrote:

I think Reynold's suggestion of using a ram disk would be a good way to test whether these are the bottlenecks or something else is. For most practical purposes, pointing the local dir to a ramdisk should effectively give you 'similar' performance to shuffling from memory.

Are there concerns with taking that approach to test? (I don't see any, but I am not sure if I missed something.)

Regards,
Mridul

On Fri, Apr 1, 2016 at 2:10 PM, Michael Slavitch <slavi...@gmail.com> wrote:

I totally disagree that it's not a problem.

- Network fetch throughput on 40G Ethernet exceeds the throughput of NVMe drives.
- What Spark is depending on is Linux's I/O cache as an effective buffer pool. This is fine for small jobs, but not for jobs with datasets in the TB-per-node range.
- On larger jobs, flushing the cache causes Linux to block.
- On a modern 56-hyperthread, 2-socket host, the latency caused by multiple executors writing out to disk increases greatly.

I thought the whole point of Spark was in-memory computing? It is in fact in-memory for some things, but it uses spark.local.dir as a buffer pool for others. Hence, the performance of Spark is gated by the performance of spark.local.dir, even on large-memory systems.

"Currently it is not possible to not write shuffle files to disk."

What changes *would* make it possible? The only one that seems feasible is to clone the shuffle service and make it in-memory.

On Apr 1, 2016, at 4:57 PM, Reynold Xin <r...@databricks.com> wrote:

spark.shuffle.spill actually has nothing to do with whether we write shuffle files to disk. Currently it is not possible to avoid writing shuffle files to disk, and typically that is not a problem because network fetch throughput is lower than what disks can sustain. In most cases, especially with SSDs, there is little difference between keeping all of it in memory and keeping it on disk.

However, it is becoming more common to run Spark on a small number of beefy nodes (e.g. 2 nodes, each with 1 TB of RAM). We do want to look into improving performance for those. In the meantime, you can set up local ramdisks on each node for shuffle writes.
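For example, a rough sketch of that setup (the mount point, size, and app name are illustrative placeholders):

    import org.apache.spark.{SparkConf, SparkContext}

    // On each node, first create the ramdisk (path and size are illustrative):
    //   sudo mkdir -p /mnt/spark-ramdisk
    //   sudo mount -t tmpfs -o size=200g tmpfs /mnt/spark-ramdisk
    // Then point Spark's shuffle scratch space at it. Note that some cluster
    // managers override this setting with their own local-dir configuration
    // (e.g. via SPARK_LOCAL_DIRS), in which case set it there instead.
    val conf = new SparkConf()
      .setAppName("ramdisk-shuffle-test")            // placeholder
      .set("spark.local.dir", "/mnt/spark-ramdisk")  // shuffle files land here
    val sc = new SparkContext(conf)

Shuffle and spill files then live in RAM-backed tmpfs rather than on the drives, which is roughly the behavior being asked for, at the cost of RAM.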
On Fri, Apr 1, 2016 at 11:32 AM, Michael Slavitch <slavi...@gmail.com> wrote:

Hello;

I'm working on Spark with very large memory systems (2 TB+) and notice that Spark spills to disk during shuffles. Is there a way to force Spark to stay in memory when doing shuffle operations? The goal is to keep the shuffle data either in the heap or in off-heap memory (in 1.6.x) and never touch the I/O subsystem. I am willing to have the job fail if it runs out of RAM.

spark.shuffle.spill is deprecated in 1.6 and does not work with the tungsten-sort shuffle in 1.5.x:

"WARN UnsafeShuffleManager: spark.shuffle.spill was set to false, but this is ignored by the tungsten-sort shuffle manager; its optimized shuffles will continue to spill to disk when necessary."

If this is impossible via configuration changes, what code changes would be needed to accomplish this?

--
Michael Slavitch
62 Renfrew Ave.
Ottawa Ontario
K1S 1Z5