Re: [VOTE] Release Apache Spark 1.3.0 (RC2)

2015-03-03 Thread Xiangrui Meng
On Tue, Mar 3, 2015 at 11:15 PM, Krishna Sankar wrote: > +1 (non-binding, of course) > > 1. Compiled OSX 10.10 (Yosemite) OK Total time: 13:53 min > mvn clean package -Pyarn -Dyarn.version=2.6.0 -Phadoop-2.4 > -Dhadoop.version=2.6.0 -Phive -DskipTests -Dscala-2.11 > 2. Tested pyspark, MLlib -

Re: [VOTE] Release Apache Spark 1.3.0 (RC2)

2015-03-03 Thread Krishna Sankar
+1 (non-binding, of course) 1. Compiled OSX 10.10 (Yosemite) OK Total time: 13:53 min mvn clean package -Pyarn -Dyarn.version=2.6.0 -Phadoop-2.4 -Dhadoop.version=2.6.0 -Phive -DskipTests -Dscala-2.11 2. Tested pyspark, MLlib - running as well as comparing results with 1.1.x & 1.2.x 2.1. statisti

Sharing SparkContext across multiple Unit Test Scala files

2015-03-03 Thread spotvenky
Can someone show me a code snippet on how I can create one SparkContext and share it across multiple unit test files? I want the tests to run in parallel as well (i.e. parallelExecution in Test := true). I looked up SharedSparkContext, but it doesn't seem to work when tests are run in parallel. Can someo

[VOTE] Release Apache Spark 1.3.0 (RC2)

2015-03-03 Thread Patrick Wendell
Please vote on releasing the following candidate as Apache Spark version 1.3.0! The tag to be voted on is v1.3.0-rc2 (commit 3af2687): https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=3af26870e5163438868c4eb2df88380a533bb232 The release files, including signatures, digests, etc. can

[RESULT] [VOTE] Release Apache Spark 1.3.0 (RC1)

2015-03-03 Thread Patrick Wendell
This vote is cancelled in favor of RC2. On Thu, Feb 26, 2015 at 9:50 AM, Sandor Van Wassenhove wrote: > FWIW, I tested the first rc and saw no regressions. I ran our benchmarks > built against spark 1.3 and saw results consistent with spark 1.2/1.2.1. > > On 2/25/15, 5:51 PM, "Patrick Wendell" w

Re: ideas for MLlib development

2015-03-03 Thread Evan R. Sparks
Hi Robert, There's some work to do LDA via Gibbs sampling in this JIRA: https://issues.apache.org/jira/browse/SPARK-1405 as well as this one: https://issues.apache.org/jira/browse/SPARK-5556 It may make sense to have a more general Gibbs sampling framework, but it might be good to have a few desi

ideas for MLlib development

2015-03-03 Thread Robert Dodier
Hi, I have some ideas for MLlib that I think might be of general interest so I'd like to see what people think and maybe find some collaborators. (1) Some form of Markov chain Monte Carlo such as Gibbs sampling or Metropolis-Hastings. Any kind of Monte Carlo method is readily parallelized so Spar
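
As an illustration of point (1) in the preview above, here is a minimal sketch of random-walk Metropolis-Hastings in plain Python. It is not MLlib code and not the poster's proposal: the function names, the standard-normal target, the step size, and the burn-in length are all assumptions chosen for the example. The independent chains at the end hint at why such methods parallelize easily, since each chain only needs its own seed.

```python
import math
import random


def log_target(x):
    # Log-density of a standard normal, up to an additive constant (assumed target).
    return -0.5 * x * x


def metropolis_hastings(log_density, n_samples, step=1.0, x0=0.0, seed=0, burn_in=500):
    """Random-walk Metropolis-Hastings with a symmetric Gaussian proposal."""
    rng = random.Random(seed)
    x = x0
    log_p = log_density(x)
    samples = []
    for i in range(n_samples + burn_in):
        proposal = x + rng.gauss(0.0, step)
        log_p_new = log_density(proposal)
        # Accept with probability min(1, p(proposal) / p(x)).
        # 1.0 - rng.random() lies in (0, 1], so the log is always defined.
        if math.log(1.0 - rng.random()) < log_p_new - log_p:
            x, log_p = proposal, log_p_new
        if i >= burn_in:
            samples.append(x)
    return samples


# Independent chains share no state, so each could run on a separate worker.
chains = [metropolis_hastings(log_target, 5000, seed=s) for s in range(4)]
pooled = [x for chain in chains for x in chain]

mean = sum(pooled) / len(pooled)
variance = sum((x - mean) ** 2 for x in pooled) / len(pooled)
print(f"mean={mean:.3f} variance={variance:.3f}")
```

The pooled mean and variance should land near 0 and 1 for this target. On a cluster, the list comprehension over seeds is the part that would be distributed.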

Re: Using CUDA within Spark / boosting linear algebra

2015-03-03 Thread Sam Halliday
BTW, is anybody on this list going to the London Meetup in a few weeks? https://skillsmatter.com/meetups/6987-apache-spark-living-the-post-mapreduce-world#community Would be nice to meet other people working on the guts of Spark! :-) Xiangrui Meng writes: > Hey Alexander, > > I don't quite un

Deploying master and worker programmatically in Java

2015-03-03 Thread Niranda Perera
Hi, I want to start a Spark standalone cluster programmatically in Java. I have been checking these classes, - org.apache.spark.deploy.master.Master - org.apache.spark.deploy.worker.Worker I successfully started a master with this simple main class. public static void main(String[] args) {