Re: [VOTE] Release Apache Spark 1.2.0 (RC1)

2014-11-28 Thread Matei Zaharia
Hey Patrick, unfortunately you got some of the text here wrong, saying 1.1.0 instead of 1.2.0. Not sure it will matter, since there may well be another RC after testing, but we should be careful. Matei

> On Nov 28, 2014, at 9:16 PM, Patrick Wendell wrote:
>
> Please vote on releasing the following candidate…

Re: [VOTE] Release Apache Spark 1.2.0 (RC1)

2014-11-28 Thread Reynold Xin
Krishna, docs don't block the RC voting because docs can be updated in parallel with release candidates, up until the point a release is made.

On Fri, Nov 28, 2014 at 9:55 PM, Krishna Sankar wrote:
> Looks like the documentation hasn't caught up with the new features.
> On the machine learning side…

Re: [VOTE] Release Apache Spark 1.2.0 (RC1)

2014-11-28 Thread Krishna Sankar
Looks like the documentation hasn't caught up with the new features: on the machine learning side, for example, org.apache.spark.ml, RandomForest, gbtree, and so forth. Is a refresh of the documentation planned? I am happy to see these capabilities, but they would need good explanations as well, especially…
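For concreteness, here is a minimal sketch of the kind of call that currently lacks a guide, assuming the Spark 1.2 mllib API for random forests; the data path and parameter values are illustrative, not recommendations:

    import org.apache.spark.mllib.tree.RandomForest
    import org.apache.spark.mllib.util.MLUtils

    // Load a LIBSVM-format dataset (the path is made up for this example);
    // sc is the SparkContext, e.g. the one provided by spark-shell.
    val data = MLUtils.loadLibSVMFile(sc, "data/sample_libsvm_data.txt")

    // Train a random forest for binary classification; an empty
    // categoricalFeaturesInfo map means all features are continuous.
    val model = RandomForest.trainClassifier(
      data,
      numClasses = 2,
      categoricalFeaturesInfo = Map[Int, Int](),
      numTrees = 10,
      featureSubsetStrategy = "auto",
      impurity = "gini",
      maxDepth = 4,
      maxBins = 32)

    // Score one example with the trained ensemble.
    val prediction = model.predict(data.first().features)

Even a short worked example like this in the docs would go a long way toward making the new capabilities usable.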

[VOTE] Release Apache Spark 1.2.0 (RC1)

2014-11-28 Thread Patrick Wendell
Please vote on releasing the following candidate as Apache Spark version 1.2.0! The tag to be voted on is v1.2.0-rc1 (commit 1056e9ec1): https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=1056e9ec13203d0c51564265e94d77a054498fdb The release files, including signatures, digests, etc. can be found at…

Re: [mllib] Which is the correct package to add a new algorithm?

2014-11-28 Thread Joseph Bradley
Hi Yu,

Thanks for bringing it up for clarification. Here's a rough draft of a section for the soon-to-be-updated programming guide, which will have more info on the spark.ml package.

Joseph

## spark.mllib vs. spark.ml

Spark 1.2 will include a new machine learning package called spark.ml, currently…
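To make the contrast concrete, here is a minimal sketch of the pipeline style spark.ml introduces, assuming the 1.2 alpha API; the column names, toy data, and parameter values are illustrative:

    import org.apache.spark.ml.Pipeline
    import org.apache.spark.ml.classification.LogisticRegression
    import org.apache.spark.ml.feature.{HashingTF, Tokenizer}

    // Training data as a case class, turned into a SchemaRDD via the
    // sqlContext implicits (spark.ml operates on SchemaRDDs in 1.2).
    case class LabeledDocument(id: Long, text: String, label: Double)
    import sqlContext._
    val training = sc.parallelize(Seq(
      LabeledDocument(0L, "spark is fast", 1.0),
      LabeledDocument(1L, "hadoop mapreduce", 0.0)))

    // Each stage reads named input columns and appends output columns.
    val tokenizer = new Tokenizer().setInputCol("text").setOutputCol("words")
    val hashingTF = new HashingTF().setInputCol("words").setOutputCol("features")
    val lr = new LogisticRegression().setMaxIter(10).setRegParam(0.01)

    // The stages chain into a single estimator that is fit in one call.
    val pipeline = new Pipeline().setStages(Array(tokenizer, hashingTF, lr))
    val model = pipeline.fit(training)

The point of the design is that feature transformations and the learning algorithm travel together as one pipeline, rather than as hand-wired RDD transformations as in spark.mllib.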

Re: Creating a SchemaRDD from an existing API

2014-11-28 Thread Michael Armbrust
You probably don't need to create a new kind of SchemaRDD. Instead I'd suggest taking a look at the data sources API that we are adding in Spark 1.2. There is not a ton of documentation, but the test cases show how to implement the various interfaces…
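For anyone who wants to try it before the docs land, here is a minimal sketch of a read-only data source, assuming the 1.2 interfaces (where TableScan extends BaseRelation); the class names and the toy one-column schema are made up for illustration:

    import org.apache.spark.rdd.RDD
    import org.apache.spark.sql.{IntegerType, Row, SQLContext, StructField, StructType}
    import org.apache.spark.sql.sources.{BaseRelation, RelationProvider, TableScan}

    // Spark SQL looks up a class named DefaultSource in the package you
    // pass to USING, so this name is the convention rather than a choice.
    class DefaultSource extends RelationProvider {
      override def createRelation(
          sqlContext: SQLContext,
          parameters: Map[String, String]): BaseRelation =
        IntegerRelation(sqlContext)
    }

    // A toy relation exposing the integers 0..9 as a one-column table.
    case class IntegerRelation(sqlContext: SQLContext) extends TableScan {
      override def schema: StructType =
        StructType(StructField("value", IntegerType, nullable = false) :: Nil)

      // A full scan; Spark SQL evaluates any remaining predicates on top.
      override def buildScan(): RDD[Row] =
        sqlContext.sparkContext.parallelize(0 until 10).map(Row(_))
    }

Once that is on the classpath, CREATE TEMPORARY TABLE ints USING <package> registers it, and ordinary SQL queries then run against the output of buildScan.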