Error running spark-sql-perf version 0.3.2 against Spark 1.6

2016-04-27 Thread Michael Slavitch
Hello; I'm trying to run spark-sql-perf version 0.3.2 (hash cb0347b) against Spark 1.6. I get the following when running ./bin/run --benchmark DatsetPerformance: Exception in thread "main" java.lang.ClassNotFoundException: com.databricks.spark.sql.perf.DatsetPerformance. Even though the cl…
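The ClassNotFoundException mirrors the misspelled benchmark name on the command line. Assuming the class in that build is spelled com.databricks.spark.sql.perf.DatasetPerformance ("Dataset", not "Datset" — an assumption about the 0.3.2 source tree), a corrected invocation would look like:

    # hypothetical corrected invocation; benchmark name fixed from "DatsetPerformance"
    ./bin/run --benchmark DatasetPerformance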

Re: RDD Partitions not distributed evenly to executors

2016-04-04 Thread Michael Slavitch
Just to be sure: have spark-env.sh and spark-defaults.conf been correctly propagated to all nodes? Are they identical? > On Apr 4, 2016, at 9:12 AM, Mike Hynes <91m...@gmail.com> wrote: > > [ CC'ing dev list since nearly identical questions have occurred in > user list recently w/o resolution;…
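A minimal shell sketch of one way to push identical config files to every worker and verify them, assuming a standalone cluster with a conf/slaves file listing worker hostnames and passwordless ssh (both assumptions, not stated in the thread):

    # push the two config files to each worker, then compare checksums
    for host in $(cat "$SPARK_HOME/conf/slaves"); do
      rsync -av "$SPARK_HOME/conf/spark-env.sh" "$SPARK_HOME/conf/spark-defaults.conf" \
        "$host:$SPARK_HOME/conf/"
      ssh "$host" "md5sum $SPARK_HOME/conf/spark-env.sh $SPARK_HOME/conf/spark-defaults.conf"
    done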

Re: Eliminating shuffle write and spill disk IO reads/writes in Spark

2016-04-01 Thread Michael Slavitch
You could try to > hack a new shuffle implementation, since the shuffle framework is pluggable. > > > On Sat, Apr 2, 2016 at 6:48 AM, Michael Slavitch > wrote: > >> As I mentioned earlier, this flag is now ignored. >> >> >> On Fri, Apr 1, 2016, 6:39 PM Michael S…
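The shuffle manager is selected through the spark.shuffle.manager property; in Spark 1.6 the built-in values are sort (the default), hash, and tungsten-sort. A sketch of how a custom implementation would be plugged in (the custom class name below is hypothetical):

    # spark-defaults.conf
    # built-in managers in Spark 1.6: sort (default), hash, tungsten-sort
    spark.shuffle.manager  sort
    # a custom ShuffleManager can be named by fully-qualified class (hypothetical class):
    # spark.shuffle.manager  com.example.shuffle.InMemoryShuffleManager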

Re: Eliminating shuffle write and spill disk IO reads/writes in Spark

2016-04-01 Thread Michael Slavitch
As I mentioned earlier, this flag is now ignored. On Fri, Apr 1, 2016, 6:39 PM Michael Slavitch wrote: > Shuffling a 1 TB set of keys and values (aka sort by key) results in about > 500 GB of I/O to disk if compression is enabled. Is there any way to > eliminate shuffling causing I/O? …
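The compression behavior described here is governed by a few properties; a sketch of the relevant spark-defaults.conf entries, with what I believe are the Spark 1.6 defaults noted in comments:

    # spark-defaults.conf
    spark.shuffle.compress        true    # compress map output files (default true in 1.6)
    spark.shuffle.spill.compress  true    # compress data spilled during shuffles (default true)
    spark.io.compression.codec    snappy  # codec used for shuffle/spill compression (1.6 default)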

Re: Eliminating shuffle write and spill disk IO reads/writes in Spark

2016-04-01 Thread Michael Slavitch
Spill has > nothing to do with the shuffle files on disk. It was for the partitioning > (i.e. sorting) process. If that flag is off, Spark will just run out of > memory when data doesn't fit in memory. > > > On Fri, Apr 1, 2016 at 3:28 PM, Michael Slavitch > wrote: > >…
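The flag under discussion is spark.shuffle.spill. As of Spark 1.6 it is deprecated and ignored for the sort-based shuffle: spilling always happens once in-memory structures exceed their threshold, so setting it to false no longer keeps data in memory. For reference:

    # spark-defaults.conf
    # Pre-1.6 behavior: false meant "never spill" (risking OOM on large partitions).
    # From Spark 1.6 on, this property is ignored; spilling is always enabled.
    spark.shuffle.spill  false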

Re: Eliminating shuffle write and spill disk IO reads/writes in Spark

2016-04-01 Thread Michael Slavitch
…concerns with taking that approach to test? (I don't see > any, but I am not sure if I missed something.) > > > Regards, > Mridul > > > > > On Fri, Apr 1, 2016 at 2:10 PM, Michael Slavitch > wrote: > > I totally disagree that it's not a problem. > >…

Re: Eliminating shuffle write and spill disk IO reads/writes in Spark

2016-04-01 Thread Michael Slavitch
…number of beefy > nodes (e.g. 2 nodes each with 1 TB of RAM). We do want to look into improving > performance for those. In the meantime, you can set up local ramdisks on each node > for shuffle writes. > > > > On Fri, Apr 1, 2016 at 11:32 AM, Michael Slavitch <mailto:slavi...@g…
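A minimal sketch of the ramdisk workaround suggested here, assuming root access and roughly 200 GB of RAM set aside per node (the size and mount point are illustrative, not from the thread):

    # on each node: back Spark's scratch space with tmpfs instead of disk
    sudo mkdir -p /mnt/spark-ramdisk
    sudo mount -t tmpfs -o size=200g tmpfs /mnt/spark-ramdisk

    # spark-env.sh: point shuffle/scratch writes at the ramdisk
    export SPARK_LOCAL_DIRS=/mnt/spark-ramdisk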

Eliminating shuffle write and spill disk IO reads/writes in Spark

2016-04-01 Thread Michael Slavitch
Hello; I’m working on Spark with very large memory systems (2 TB+) and notice that Spark spills to disk during shuffle. Is there a way to force Spark to stay in memory when doing shuffle operations? The goal is to keep the shuffle data either in the heap or in off-heap memory (in 1.6.x) and never…
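Spark 1.6 exposes off-heap execution memory through the unified memory manager. A sketch of the related settings, with the caveat that this moves execution memory off-heap but does not by itself stop shuffle files from being written under spark.local.dir (the size below is illustrative):

    # spark-defaults.conf (Spark 1.6+)
    spark.memory.offHeap.enabled  true
    spark.memory.offHeap.size     64g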