I am not sure what you are comparing here. You would need to provide additional
details, such as the algorithms and functionality supported by your framework. For
instance, Spark has built-in fault tolerance and is a generic framework, which
is an advantage for development and operations, but ma
That seems reasonable, but it seems pretty unfair to the HPC setup, in which
the master is reading all the data. Basically, you can make HPC look
arbitrarily worse just by adding more nodes to Spark.
On Monday, February 29, 2016, yasincelik wrote:
> Hello,
>
> I am working on a project as a part
Hello,
I am working on a project as part of my research. The system I am working
on is basically an in-memory computing system, and I want to compare its
performance with Spark. Here is how I conduct the experiments. For my project,
I have a software-defined network (SDN) that allows HPC applications to
I have created a JIRA for this feature; comments and feedback are welcome
on how to improve it and whether it is valuable for users.
https://issues.apache.org/jira/browse/SPARK-13587
Here is some background info and the status of this work.
Currently, it's not easy for users to add third-party pyth
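The message is truncated above, but assuming SPARK-13587 concerns shipping third-party Python dependencies to executors, the existing workaround it would improve is, roughly, bundling the dependencies yourself at submit time; deps.zip and my_app.py below are placeholder names:

    spark-submit --py-files deps.zip my_app.py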
In YARN mode, spark.local.dir is replaced by yarn.nodemanager.local-dirs,
which is where shuffle data and block-manager disk data go. What do you mean
by "but output files to upload to S3 are still created in /tmp on slaves"?
You should have control over where your output data is stored, if you mean
the job's output.
On Tue, Mar 1
I have Spark on YARN.
I defined yarn.nodemanager.local-dirs to be /data01/yarn/nm,/data02/yarn/nm.
When I look at the YARN executor container logs, I see that the block-manager
files are created in /data01/yarn/nm and /data02/yarn/nm, but the output files
to upload to S3 are still created in /tmp on the slaves.
I do not want Spa
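If the /tmp files in question come from the S3 upload path itself, here is a sketch of one possible fix, assuming the s3a connector is in use: s3a buffers blocks on local disk before uploading, in the directories given by the Hadoop property fs.s3a.buffer.dir, which defaults to a directory under hadoop.tmp.dir (i.e. /tmp). For the older s3n connector the analogous property is fs.s3.buffer.dir. The paths below are placeholders.

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaSparkContext;

    SparkConf conf = new SparkConf().setAppName("s3-output");
    JavaSparkContext jsc = new JavaSparkContext(conf);
    // Point the s3a local buffer at the YARN local dirs instead of /tmp.
    jsc.hadoopConfiguration().set("fs.s3a.buffer.dir",
        "/data01/yarn/nm/s3a,/data02/yarn/nm/s3a");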
Hello all,
Just to add a bit more context to this question: I have the join stated
above, and I see the following in my executor logs:
16/02/29 17:02:35 INFO TaskSetManager: Finished task 198.0 in stage 7.0
(TID 1114) in 20354 ms on localhost (196/200)
16/02/29 17:02:35 INFO ShuffleBlockFetcher
Hello,
I'm trying to join two DataFrames, A and B, with
sqlContext.sql("SELECT * FROM A INNER JOIN B ON A.a=B.a");
I have registered temp tables for A and B after loading these DataFrames
from different sources. I need the join to be really fast, and I was
wondering if th
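If one of the two tables is small enough to fit in executor memory, a broadcast join avoids shuffling both sides; the 200 tasks in the log above are the default spark.sql.shuffle.partitions of a shuffled join. A minimal sketch against the 1.6-era DataFrame API, assuming B is the small side and A and B are your DataFrames:

    import org.apache.spark.sql.DataFrame;
    import static org.apache.spark.sql.functions.broadcast;

    // Mark B as broadcastable so the join runs map-side, with no shuffle.
    DataFrame joined = A.join(broadcast(B), A.col("a").equalTo(B.col("a")));

Alternatively, raising spark.sql.autoBroadcastJoinThreshold lets the planner choose a broadcast join on its own when it knows the table sizes.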
On 27 Feb 2016, at 20:40, Prabhu Joseph wrote:
Hi All,
When I change the Spark log4j.properties conversion pattern to show the
fully qualified class name, all of the logs show the FQCN as
org.apache.spark.Logging. The actual fully qualified class name is ov
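A likely cause, for what it's worth: %C in a log4j pattern inspects the physical call site, and Spark's log statements go through helper methods in the org.apache.spark.Logging trait, so %C always reports that trait. %c prints the logger's category instead, which Spark sets to the owning class's name. A minimal log4j.properties sketch:

    log4j.rootCategory=INFO, console
    log4j.appender.console=org.apache.log4j.ConsoleAppender
    log4j.appender.console.layout=org.apache.log4j.PatternLayout
    # %c (logger category) resolves to the real class name; %C would show
    # org.apache.spark.Logging, the physical call site of the log call.
    log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c: %m%n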
Hi,
You can create a log4j.properties for the executors and use
"--files log4j.properties" when submitting.
When we initialize the Spark context via Java, how can we pass the same
parameter?
jsc = new JavaSparkContext(conf);
Is it possible to set this parameter in spark-defaults.con
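The submit-time flags map to ordinary configuration keys, so the same thing can be set on the SparkConf before creating the context, or in spark-defaults.conf. A sketch, with placeholder app name and paths:

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaSparkContext;

    SparkConf conf = new SparkConf()
        .setAppName("my-app")
        // programmatic equivalent of --files: ship the file to each executor
        .set("spark.files", "/path/to/log4j.properties")
        // make the executor JVMs load the shipped file
        .set("spark.executor.extraJavaOptions",
             "-Dlog4j.configuration=log4j.properties");
    JavaSparkContext jsc = new JavaSparkContext(conf);

The same two keys (spark.files, spark.executor.extraJavaOptions) can also be put in spark-defaults.conf.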