Hello;
I'm trying to run spark-sql-perf version 0.3.2 (hash cb0347b) against Spark
1.6, and I get the following when running
./bin/run --benchmark DatsetPerformance
Exception in thread "main" java.lang.ClassNotFoundException:
com.databricks.spark.sql.perf.DatsetPerformance
Even though the class should be present.
Just to be sure: have spark-env.sh and spark-defaults.conf been correctly
propagated to all nodes? Are they identical?
> On Apr 4, 2016, at 9:12 AM, Mike Hynes <91m...@gmail.com> wrote:
>
> [ CC'ing dev list since nearly identical questions have occurred in
> user list recently w/o resolution;
> You could try to
> hack a new shuffle implementation, since the shuffle framework is pluggable.
>
>
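For context, here is a rough sketch of what plugging in a custom shuffle
manager can look like against the Spark 1.6 internals. Caveats: ShuffleManager
is a private[spark] trait rather than a public API, so the class has to live
under an org.apache.spark subpackage, and RamShuffleManager is a purely
hypothetical name that only delegates to the built-in sort-based
implementation; a real in-memory variant would replace the writer/reader logic.

    package org.apache.spark.shuffle.example

    import org.apache.spark.{ShuffleDependency, SparkConf, TaskContext}
    import org.apache.spark.shuffle._
    import org.apache.spark.shuffle.sort.SortShuffleManager

    // Hypothetical skeleton: delegates everything to SortShuffleManager.
    class RamShuffleManager(conf: SparkConf) extends ShuffleManager {
      private val delegate = new SortShuffleManager(conf)

      override def registerShuffle[K, V, C](
          shuffleId: Int,
          numMaps: Int,
          dependency: ShuffleDependency[K, V, C]): ShuffleHandle =
        delegate.registerShuffle(shuffleId, numMaps, dependency)

      override def getWriter[K, V](
          handle: ShuffleHandle,
          mapId: Int,
          context: TaskContext): ShuffleWriter[K, V] =
        delegate.getWriter(handle, mapId, context)

      override def getReader[K, C](
          handle: ShuffleHandle,
          startPartition: Int,
          endPartition: Int,
          context: TaskContext): ShuffleReader[K, C] =
        delegate.getReader(handle, startPartition, endPartition, context)

      override def unregisterShuffle(shuffleId: Int): Boolean =
        delegate.unregisterShuffle(shuffleId)

      override def shuffleBlockResolver: ShuffleBlockResolver =
        delegate.shuffleBlockResolver

      override def stop(): Unit = delegate.stop()
    }

Such a class would then be selected by setting spark.shuffle.manager to its
fully qualified class name.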
> On Sat, Apr 2, 2016 at 6:48 AM, Michael Slavitch
> wrote:
>
>> As I mentioned earlier, this flag is now ignored.
>>
>>
>> On Fri, Apr 1, 2016, 6:39 PM Michael Slavitch wrote:
As I mentioned earlier, this flag is now ignored.
On Fri, Apr 1, 2016, 6:39 PM Michael Slavitch wrote:
> Shuffling a 1 TB set of keys and values (aka sort by key) results in about
> 500 GB of I/O to disk if compression is enabled. Is there any way to
> eliminate the I/O caused by shuffling?
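For reference, a minimal self-contained sketch of the kind of job being
described (names and sizes are illustrative, not from the original mail): a
sortByKey over a pair RDD, which forces a full shuffle whose map outputs are
written, compressed by default, under spark.local.dir on each node.

    import org.apache.spark.{SparkConf, SparkContext}

    object SortByKeyShuffleSketch {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf()
          .setAppName("sort-by-key-shuffle")
          // Shuffle map outputs are compressed by default; made explicit here.
          .set("spark.shuffle.compress", "true")
        val sc = new SparkContext(conf)

        // Stand-in for a large (key, value) dataset; the real one is ~1 TB.
        val pairs = sc.parallelize(1 to 10000000)
          .map(i => (i.toString, Array.fill(100)(0.toByte)))

        // sortByKey range-partitions and sorts, triggering a full shuffle;
        // the shuffle blocks land on local disk under spark.local.dir.
        val sorted = pairs.sortByKey()
        println(sorted.count())

        sc.stop()
      }
    }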
> The spill flag has
> nothing to do with the shuffle files on disk. It was for the partitioning
> (i.e. sorting) process. If that flag is off, Spark will just run out of
> memory when data doesn't fit in memory.
>
>
> On Fri, Apr 1, 2016 at 3:28 PM, Michael Slavitch
> wrote:
>
> Are there any concerns with taking that approach to test? (I don't see
> any, but I am not sure if I missed something.)
>
>
> Regards,
> Mridul
>
>
>
>
> On Fri, Apr 1, 2016 at 2:10 PM, Michael Slavitch
> wrote:
> > I totally disagree that it’s not a problem.
> >
> a small number of beefy
> nodes (e.g. 2 nodes each with 1TB of RAM). We do want to look into improving
> performance for those. In the meantime, you can set up local ramdisks on each
> node for shuffle writes.
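As a concrete illustration of that suggestion (the mount point is an
assumption, not something from this thread): with a tmpfs/ramdisk mounted at,
say, /mnt/ramdisk on every worker, shuffle files can be pointed at it through
spark.local.dir.

    import org.apache.spark.SparkConf

    // Illustrative only: /mnt/ramdisk is an assumed tmpfs mount on every node.
    // spark.local.dir is where shuffle and spill files are written; note that
    // cluster managers such as YARN override it with their own local dirs.
    val conf = new SparkConf()
      .setAppName("shuffle-on-ramdisk")
      .set("spark.local.dir", "/mnt/ramdisk/spark")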
>
>
>
> On Fri, Apr 1, 2016 at 11:32 AM, Michael Slavitch wrote:
Hello;
I'm working on Spark with very large memory systems (2TB+) and notice that
Spark spills to disk during shuffles. Is there a way to force Spark to stay in
memory when doing shuffle operations? The goal is to keep the shuffle data
either in the heap or in off-heap memory (in 1.6.x) and never spill to disk.
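For reference, a rough sketch of the 1.6.x settings this question is circling
(values are illustrative): spark.memory.offHeap.enabled and
spark.memory.offHeap.size control off-heap execution memory used by the
shuffle/sort machinery, but, as discussed above, they do not stop the shuffle
map outputs themselves from being written under spark.local.dir.

    import org.apache.spark.SparkConf

    // Illustrative sizes; enables off-heap execution memory in Spark 1.6.x.
    // This governs where sort/aggregation buffers live, not where shuffle
    // files are written (those still go to spark.local.dir).
    val conf = new SparkConf()
      .setAppName("large-memory-shuffle")
      .set("spark.memory.offHeap.enabled", "true")
      .set("spark.memory.offHeap.size", "512g")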