Re: Announcing Spark 1.0.1

2014-07-11 Thread Henry Saputra
Congrats to the Spark community! On Friday, July 11, 2014, Patrick Wendell wrote: > I am happy to announce the availability of Spark 1.0.1! This release > includes contributions from 70 developers. Spark 1.0.1 includes fixes > across several areas of Spark, including the core API, PySpark, and

Announcing Spark 1.0.1

2014-07-11 Thread Patrick Wendell
I am happy to announce the availability of Spark 1.0.1! This release includes contributions from 70 developers. Spark 1.0.1 includes fixes across several areas of Spark, including the core API, PySpark, and MLlib. It also includes new features in Spark's (alpha) SQL library, including support for J

Re: Calling Scala/Java methods which operates on RDD

2014-07-11 Thread Kan Zhang
Hi Jai, Your suspicion is correct. In general, Python RDDs are pickled into byte arrays and stored in Java land as RDDs of byte arrays. union/zip operate on the byte arrays directly without deserializing. Currently, Python byte arrays only get unpickled into Java objects in special cases, like SQL fu
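
To illustrate the point about union/zip working on pickled byte arrays, here is a minimal sketch in plain Python. It is not PySpark's actual code: plain lists stand in for RDDs, and the helper names (to_byte_rdd, union, zip_rdds, collect) are hypothetical. It only shows that elements can be combined as opaque bytes and unpickled later.

```python
import pickle


def to_byte_rdd(items):
    """Pickle each element so it is held as an opaque byte string (hypothetical helper)."""
    return [pickle.dumps(item) for item in items]


def union(a, b):
    """Concatenate two lists of byte strings without unpickling any element."""
    return a + b


def zip_rdds(a, b):
    """Pair up byte strings positionally, again without unpickling."""
    return list(zip(a, b))


def collect(byte_rdd):
    """Unpickle only at the point where Python objects are actually needed."""
    return [pickle.loads(blob) for blob in byte_rdd]


left = to_byte_rdd([1, "two", {"three": 3}])
right = to_byte_rdd([4.0, (5, 6)])

combined = union(left, right)
# Every element is still raw bytes after the union.
assert all(isinstance(blob, bytes) for blob in combined)

print(collect(combined))  # [1, 'two', {'three': 3}, 4.0, (5, 6)]
```

The consequence for a Scala/Java utility is the one described above: a method that needs to look inside the elements would see byte arrays, not deserialized objects, unless something unpickles them first.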

Re: How pySpark works?

2014-07-11 Thread Reynold Xin
Also take a look at this: https://cwiki.apache.org/confluence/display/SPARK/PySpark+Internals On Fri, Jul 11, 2014 at 10:29 AM, Andrew Or wrote: > Hi Egor, > > Here are a few answers to your questions: > > 1) Python needs to be installed on all machines, but not pyspark. The way > the executors

Re: How pySpark works?

2014-07-11 Thread Andrew Or
Hi Egor, Here are a few answers to your questions: 1) Python needs to be installed on all machines, but not pyspark. The way the executors get the pyspark code depends on which cluster manager you use. In standalone mode, your executors need to have the actual python files in their working direct

[RESULT] [VOTE] Release Apache Spark 1.0.1 (RC2)

2014-07-11 Thread Patrick Wendell
This vote has passed with 9 +1 votes (5 binding) and 1 -1 vote (0 binding). +1: Patrick Wendell*, Mark Hamstra*, DB Tsai, Krishna Sankar, Soren Macbeth, Andrew Or, Matei Zaharia*, Xiangrui Meng*, Tom Graves*. 0: (none). -1: Gary Malouf.

Re: [VOTE] Release Apache Spark 1.0.1 (RC2)

2014-07-11 Thread Patrick Wendell
Okay just FYI - I'm closing this vote since many people are waiting on the release and I was hoping to package it today. If we find a reproducible Mesos issue here, we can definitely spin the fix into a subsequent release. On Fri, Jul 11, 2014 at 9:37 AM, Patrick Wendell wrote: > Hey Gary, > >

Re: [VOTE] Release Apache Spark 1.0.1 (RC2)

2014-07-11 Thread Patrick Wendell
Hey Gary, Why do you think the akka frame size changed? It didn't change - we added some fixes for cases where users were setting non-default values. On Fri, Jul 11, 2014 at 9:31 AM, Gary Malouf wrote: > Hi Matei, > > We have not had time to re-deploy the rc today, but one thing that jumps > out

Re: [VOTE] Release Apache Spark 1.0.1 (RC2)

2014-07-11 Thread Gary Malouf
Hi Matei, We have not had time to re-deploy the RC today, but one thing that jumps out is the shrinking of the default akka frame size from 10MB to around 128KB. That is my first suspicion for our issue - could imagine that biting others as well. I'll try to re-test that today - eithe

Re: [VOTE] Release Apache Spark 1.0.1 (RC2)

2014-07-11 Thread Matei Zaharia
Unless you can diagnose the problem quickly, Gary, I think we need to go ahead with this release as is. This release didn't touch the Mesos support as far as I know, so the problem might be a nondeterministic issue with your application. But on the other hand the release does fix some critical b

Calling Scala/Java methods which operates on RDD

2014-07-11 Thread Jai Kumar Singh
Hi, I want to write some common utility functions in Scala and call them from the Java/Python Spark API (maybe adding some wrapper code around the Scala calls). Calling Scala functions from Java works fine. I was reading the pyspark RDD code and found out that pyspark is able to call JavaRDD functio

Re: Random forest - is it under implementation?

2014-07-11 Thread Egor Pahomov
Great. Then one question is left: what would you recommend for implementation? 2014-07-11 17:43 GMT+04:00 Chester At Work : > Sung Chung from Alpine Data Labs presented the random forest > implementation at Spark Summit 2014. The work will be open sourced and > contributed back to MLlib. > > Stay

Re: Random forest - is it under implementation?

2014-07-11 Thread Chester At Work
Sung Chung from Alpine Data Labs presented the random forest implementation at Spark Summit 2014. The work will be open sourced and contributed back to MLlib. Stay tuned. On Jul 11, 2014, at 6:02 AM, Egor Pahomov wrote: > Hi, I have an intern who wants to implement some ML

Random forest - is it under implementation?

2014-07-11 Thread Egor Pahomov
Hi, I have an intern who wants to implement an ML algorithm for Spark. Which algorithm would be a good one to implement (it should not be very difficult)? I heard someone is already working on random forest, but couldn't find proof of that. I'm aware of the new policy, where we should implement stable, g

How pySpark works?

2014-07-11 Thread Egor Pahomov
Hi, I want to use PySpark, but can't understand how it works. The documentation doesn't provide enough information. 1) How is Python shipped to the cluster? Should the machines in the cluster already have Python? 2) What happens when I write some Python code in a "map" function - is it shipped to the cluster and just exec