Hello;
I'm trying to run spark-sql-perf version 0.3.2 (hash cb0347b) against Spark
1.6, and I get the following when running:
./bin/run --benchmark DatsetPerformance
Exception in thread "main" java.lang.ClassNotFoundException:
com.databricks.spark.sql.perf.DatsetPerformance
Even though the cl…
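If the intended benchmark is the DatasetPerformance class, the spelling in the
command may be the culprit; assuming spark-sql-perf 0.3.2 ships
com.databricks.spark.sql.perf.DatasetPerformance (note "Dataset", not "Datset"),
the corrected invocation would be:

    ./bin/run --benchmark DatasetPerformance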
You should consider mobile agents that feed data into a Spark datacenter via
Spark Streaming.
> On Apr 7, 2016, at 8:28 AM, Ashic Mahtab wrote:
>
> Spark may not be the right tool for this. Working on just the mobile device,
> you won't be scaling out stuff, and as such most of the benefits of …
> …increase
> spark.storage.memoryFraction? Also I'm thinking maybe I should repartition
> all_pairs so that each partition will be small enough to be handled.
>
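A minimal Scala sketch of the two ideas above, assuming the all_pairs RDD holds
tab-separated key/value lines; the input path, partition count, and fraction
value are placeholders, and on 1.6 spark.storage.memoryFraction only takes
effect when spark.memory.useLegacyMode is enabled:

    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setAppName("repartition-sketch")
      .set("spark.memory.useLegacyMode", "true")   // required on 1.6 for the legacy fraction below
      .set("spark.storage.memoryFraction", "0.6")  // the knob asked about above
    val sc = new SparkContext(conf)

    // Placeholder input; all_pairs stands in for whatever RDD the poster has.
    val allPairs = sc.textFile("hdfs:///data/pairs")
      .map { line => val Array(k, v) = line.split("\t", 2); (k, v) }

    // More partitions -> smaller partitions, so each one is more likely to fit
    // in memory during the shuffle.
    val repartitioned = allPairs.repartition(2000)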
> On Tue, Apr 5, 2016 at 8:03 PM, Michael Slavitch <slavi...@gmail.com> wrote:
Do you have enough disk space for the spill? It seems it has lots of memory
reserved but not enough for the spill. You will need a disk that can handle the
entire data partition for each host. Compression of the spilled data saves
about 50% in most if not all cases.
Given the large data set I…
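For reference, the spill and compression settings involved, in
spark-defaults.conf form; spark.shuffle.compress and spark.shuffle.spill.compress
already default to true on 1.6, the codec line is just an explicit example, and
the spark.local.dir paths are placeholders that should point at disks large
enough to hold the spilled partitions:

    spark.shuffle.compress          true
    spark.shuffle.spill.compress    true
    spark.io.compression.codec      lz4
    spark.local.dir                 /mnt/disk1/spark,/mnt/disk2/spark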
Just to be sure: Have spark-env.sh and spark-defaults.conf been correctly
propagated to all nodes? Are they identical?
> On Apr 4, 2016, at 9:12 AM, Mike Hynes <91m...@gmail.com> wrote:
>
> [ CC'ing dev list since nearly identical questions have occurred in
> user list recently w/o resolution; …
> …you could try to
> hack a new shuffle implementation, since shuffle framework is pluggable.
>
>
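The pluggable hook referred to above is the spark.shuffle.manager setting; on
1.6 it accepts the built-in short names hash, sort, and tungsten-sort, or the
fully qualified name of a custom implementation. The class below is purely
hypothetical:

    spark.shuffle.manager    org.example.shuffle.MyShuffleManager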
> On Sat, Apr 2, 2016 at 6:48 AM, Michael Slavitch wrote:
>
>> As I mentioned earlier, this flag is now ignored.
>>
>>
>> On Fri, Apr 1, 2016, 6:39 PM Michael Slavitch wrote:
As I mentioned earlier, this flag is now ignored.
On Fri, Apr 1, 2016, 6:39 PM Michael Slavitch wrote:
> Shuffling a 1 TB set of keys and values (aka sort by key) results in about
> 500 GB of I/O to disk if compression is enabled. Is there any way to
> eliminate the I/O caused by shuffling?
> …spill has
> nothing to do with the shuffle files on disk. It was for the partitioning
> (i.e. sorting) process. If that flag is off, Spark will just run out of
> memory when data doesn't fit in memory.
>
>
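The flag under discussion is presumably spark.shuffle.spill; it is shown here
only to make the exchange concrete, since (as noted above) the sort-based
shuffle in 1.6 ignores it and always spills once its in-memory buffers fill up:

    spark.shuffle.spill    false    # honored by older releases; ignored by Spark 1.6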
> On Fri, Apr 1, 2016 at 3:28 PM, Michael Slavitch wrote:
>
> …concerns with taking that approach to test? (I don't see
> any, but I am not sure if I missed something).
>
>
> Regards,
> Mridul
>
>
>
>
> On Fri, Apr 1, 2016 at 2:10 PM, Michael Slavitch wrote:
> > I totally disagree that it’s not a problem.
> >
> …number of beefy
> nodes (e.g. 2 nodes each with 1 TB of RAM). We do want to look into improving
> performance for those. Meantime, you can set up local ramdisks on each node
> for shuffle writes.
>
>
>
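A sketch of the ramdisk suggestion, with the size and mount point as
placeholders: mount a tmpfs on every worker and point spark.local.dir (or
SPARK_LOCAL_DIRS in spark-env.sh) at it, so shuffle files and spills land on
RAM-backed storage:

    # on each worker node (placeholder size and mount point)
    sudo mkdir -p /mnt/spark-ramdisk
    sudo mount -t tmpfs -o size=200g tmpfs /mnt/spark-ramdisk

    # spark-defaults.conf
    spark.local.dir    /mnt/spark-ramdisk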
> On Fri, Apr 1, 2016 at 11:32 AM, Michael Slavitch <slavi...@gmail.com> wrote:
Hello;
I’m working on Spark with very large memory systems (2 TB+) and notice that
Spark spills to disk during shuffle. Is there a way to force Spark to stay
exclusively in memory when doing shuffle operations? The goal is to keep
the shuffle data either in the heap or in off-heap memory (in 1.6.x).