I am not sure what you are comparing here. You would need to provide additional
details, such as the algorithms and functionality supported by your framework. For
instance, Spark has built-in fault tolerance and is a generic framework, which
is an advantage for development and operations, but ma
That seems reasonable, but it seems pretty unfair to the HPC setup, in which
the master is reading all the data. Basically, you can make HPC look
arbitrarily worse just by adding more nodes to Spark.
On Monday, February 29, 2016, yasincelik wrote:
> Hello,
>
> I am working on a project as a part
Hello,
I am working on a project as part of my research. The system I am working
on is basically an in-memory computing system, and I want to compare its
performance with Spark. Here is how I conduct the experiments. For my project,
I have a software-defined network (SDN) that allows HPC applications to
I have created a JIRA for this feature; comments and feedback are welcome
on how to improve it and whether it is valuable for users.
https://issues.apache.org/jira/browse/SPARK-13587
Here is some background info and the status of this work.
Currently, it's not easy for users to add third-party pyth
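The message is truncated above, but assuming SPARK-13587 concerns shipping third-party Python dependencies to executors, the existing workaround it would improve is, roughly, bundling the dependencies yourself at submit time; deps.zip and my_app.py below are placeholder names:

    spark-submit --py-files deps.zip my_app.py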
In YARN mode, spark.local.dir is replaced by yarn.nodemanager.local-dirs,
which is where shuffle data and block-manager disk data go. What do you mean
by "but output files to upload to S3 are still created in /tmp on slaves"?
You should have control over where your output data is stored, if you mean
the job's output.
On Tue, Mar 1
I have Spark on YARN.
I defined yarn.nodemanager.local-dirs to be /data01/yarn/nm,/data02/yarn/nm.
When I look at the YARN executor container logs, I see that the block-manager
files are created in /data01/yarn/nm and /data02/yarn/nm, but the output files
to upload to S3 are still created in /tmp on the slaves.
I do not want Spa
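If the /tmp files in question come from the S3 upload path itself, here is a sketch of one possible fix, assuming the s3a connector is in use: s3a buffers blocks on local disk before uploading, in the directories given by the Hadoop property fs.s3a.buffer.dir, which defaults to a directory under hadoop.tmp.dir (i.e. /tmp). For the older s3n connector the analogous property is fs.s3.buffer.dir. The paths below are placeholders.

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaSparkContext;

    SparkConf conf = new SparkConf().setAppName("s3-output");
    JavaSparkContext jsc = new JavaSparkContext(conf);
    // Point the s3a local buffer at the YARN local dirs instead of /tmp.
    jsc.hadoopConfiguration().set("fs.s3a.buffer.dir",
        "/data01/yarn/nm/s3a,/data02/yarn/nm/s3a");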
Hello all,
Just to add a bit more context to this question: I have the join stated
above, and I see the following in my executor logs:
16/02/29 17:02:35 INFO TaskSetManager: Finished task 198.0 in stage 7.0
(TID 1114) in 20354 ms on localhost (196/200)
16/02/29 17:02:35 INFO ShuffleBlockFetcher
Hello,
I'm trying to join two DataFrames, A and B, with
sqlContext.sql("SELECT * FROM A INNER JOIN B ON A.a=B.a");
I have registered temp tables for A and B after loading these DataFrames
from different sources. I need the join to be really fast, and I was
wondering if th
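If one of the two tables is small enough to fit in executor memory, a broadcast join avoids shuffling both sides; the 200 tasks in the log above are the default spark.sql.shuffle.partitions of a shuffled join. A minimal sketch against the 1.6-era DataFrame API, assuming B is the small side and A and B are your DataFrames:

    import org.apache.spark.sql.DataFrame;
    import static org.apache.spark.sql.functions.broadcast;

    // Mark B as broadcastable so the join runs map-side, with no shuffle.
    DataFrame joined = A.join(broadcast(B), A.col("a").equalTo(B.col("a")));

Alternatively, raising spark.sql.autoBroadcastJoinThreshold lets the planner choose a broadcast join on its own when it knows the table sizes.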
On 27 Feb 2016, at 20:40, Prabhu Joseph wrote:
Hi All,
When I change the Spark log4j.properties conversion pattern to show the
fully qualified class name, all of the logs show the FQCN as
org.apache.spark.Logging. The actual fully qualified class name is ov
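A likely cause, for what it's worth: %C in a log4j pattern inspects the physical call site, and Spark's log statements go through helper methods in the org.apache.spark.Logging trait, so %C always reports that trait. %c prints the logger's category instead, which Spark sets to the owning class's name. A minimal log4j.properties sketch:

    log4j.rootCategory=INFO, console
    log4j.appender.console=org.apache.log4j.ConsoleAppender
    log4j.appender.console.layout=org.apache.log4j.PatternLayout
    # %c (logger category) resolves to the real class name; %C would show
    # org.apache.spark.Logging, the physical call site of the log call.
    log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c: %m%n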
Hi,
You can create a log4j.properties for the executors and use
"--files log4j.properties" when submitting.
When we initialize the Spark context via Java, how can we pass the same
parameter?
jsc = new JavaSparkContext(conf);
Is it possible to set this parameter in spark-defaults.con
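The submit-time flags map to ordinary configuration keys, so the same thing can be set on the SparkConf before creating the context, or in spark-defaults.conf. A sketch, with placeholder app name and paths:

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaSparkContext;

    SparkConf conf = new SparkConf()
        .setAppName("my-app")
        // programmatic equivalent of --files: ship the file to each executor
        .set("spark.files", "/path/to/log4j.properties")
        // make the executor JVMs load the shipped file
        .set("spark.executor.extraJavaOptions",
             "-Dlog4j.configuration=log4j.properties");
    JavaSparkContext jsc = new JavaSparkContext(conf);

The same two keys (spark.files, spark.executor.extraJavaOptions) can also be put in spark-defaults.conf.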