+1
1. Compiled on OSX 10.10 (Yosemite): mvn -Pyarn -Phadoop-2.4
-Dhadoop.version=2.4.0 -DskipTests clean package, 16:46 min (slightly slower
connection)
2. Tested pyspark, MLlib - ran the examples and compared results with 1.1.x
2.1. statistics OK
2.2. Linear/Ridge/Lasso Regression OK
       Slight difference in the print method (vs. 1.1.x) of the model
object - with a label & more details. This is good.
2.3. Decision Tree, Naive Bayes OK
       Changes in print(model) - now print(model.toDebugString()) - OK
       Some changes in NaiveBayes. Different from my 1.1.x code - had to
flatten list structures; zip required the same number of elements in each
partition.
       After the code changes it ran fine.
2.4. KMeans OK
       zip occasionally fails with error "localhost):
org.apache.spark.SparkException: Can only zip RDDs with same number of
elements in each partition"
Has https://issues.apache.org/jira/browse/SPARK-2251 reappeared?
Made it work by doing a different transformation, i.e. reusing an original
rdd.
2.5. rdd operations OK
       State of the Union Texts - MapReduce, Filter, sortByKey (word count)
2.6. recommendation OK
2.7. Good work! In 1.x.x, I had a map distinct over the movielens medium
dataset which never worked. Works fine in 1.2.0!
3. Scala MLlib - subset of examples as in #2 above, with Scala
3.1. statistics OK
3.2. Linear Regression OK
3.3. Decision Tree OK
3.4. KMeans OK
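
To make the zip issue in 2.3 and 2.4 concrete, here is a rough plain-Python model of a partition-wise zip. It is only a sketch, not Spark code: a dataset is modelled as a list of partitions, and the names zip_partitions and pair_with_source are made up for illustration. It shows why two datasets with the same total size can still fail to zip, and why deriving both halves of the pair from the original data in a single map avoids the problem.

```python
def zip_partitions(left, right):
    """Pair two partitioned datasets element by element, partition by partition.

    `left` and `right` are lists of partitions (each partition is a list).
    This is an illustrative model, not Spark's implementation.
    """
    if len(left) != len(right):
        raise ValueError("Can only zip datasets with the same number of partitions")
    zipped = []
    for lpart, rpart in zip(left, right):
        if len(lpart) != len(rpart):
            raise ValueError(
                "Can only zip datasets with same number of elements in each partition")
        zipped.append(list(zip(lpart, rpart)))
    return zipped


def pair_with_source(partitions, fn):
    """Workaround: build (fn(x), x) pairs in one pass over the original data.

    Both halves of each pair come from the same element, so partition
    sizes cannot get out of step and no zip is needed.
    """
    return [[(fn(x), x) for x in part] for part in partitions]


points = [[1.0, 2.0], [9.0]]   # 3 elements, split 2 + 1
labels = [[0], [0, 1]]         # 3 elements, split 1 + 2

try:
    zip_partitions(labels, points)
    zip_failed = False
except ValueError:
    zip_failed = True           # same total size, but partitions differ

paired = pair_with_source(points, lambda x: int(x > 5.0))
```

In pyspark terms the second function corresponds to something like points.map(lambda p: (model.predict(p), p)) instead of model.predict(points).zip(points); that is the "reusing an original rdd" transformation I mean, assuming the model can be called on a single point inside map.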
Cheers
<k/>
P.S: Plan to add RF and .ml mechanics to this bank
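
For reference, the word count in 2.5 has roughly the shape below. This is a minimal sketch and not my exact test code: the stop-word list, the tokenize helper and the function name word_counts are illustrative assumptions, and `lines` is expected to be an RDD of text lines (for example from sc.textFile on a local copy of the speeches).

```python
import re
from operator import add

# Hypothetical stop-word list, for illustration only.
STOP_WORDS = frozenset(["the", "of", "and", "to", "in", "a"])


def tokenize(line):
    """Lower-case a line and split it into words (letters and apostrophes)."""
    return re.findall(r"[a-z']+", line.lower())


def word_counts(lines):
    """Return an RDD of (count, word) pairs, most frequent words first.

    `lines` is assumed to be an RDD of strings.
    """
    return (lines.flatMap(tokenize)                       # one record per word
                 .filter(lambda w: w not in STOP_WORDS)   # drop stop words
                 .map(lambda w: (w, 1))
                 .reduceByKey(add)                        # sum counts per word
                 .map(lambda kv: (kv[1], kv[0]))          # swap to (count, word)
                 .sortByKey(ascending=False))
```

Calling word_counts(lines).take(10) would then give the ten most frequent words.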

On Fri, Nov 28, 2014 at 9:16 PM, Patrick Wendell <pwend...@gmail.com> wrote:

> Please vote on releasing the following candidate as Apache Spark version
> 1.2.0!
>
> The tag to be voted on is v1.2.0-rc1 (commit 1056e9ec1):
>
> https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=1056e9ec13203d0c51564265e94d77a054498fdb
>
> The release files, including signatures, digests, etc. can be found at:
> http://people.apache.org/~pwendell/spark-1.2.0-rc1/
>
> Release artifacts are signed with the following key:
> https://people.apache.org/keys/committer/pwendell.asc
>
> The staging repository for this release can be found at:
> https://repository.apache.org/content/repositories/orgapachespark-1048/
>
> The documentation corresponding to this release can be found at:
> http://people.apache.org/~pwendell/spark-1.2.0-rc1-docs/
>
> Please vote on releasing this package as Apache Spark 1.2.0!
>
> The vote is open until Tuesday, December 02, at 05:15 UTC and passes
> if a majority of at least 3 +1 PMC votes are cast.
>
> [ ] +1 Release this package as Apache Spark 1.1.0
> [ ] -1 Do not release this package because ...
>
> To learn more about Apache Spark, please see
> http://spark.apache.org/
>
> == What justifies a -1 vote for this release? ==
> This vote is happening very late into the QA period compared with
> previous votes, so -1 votes should only occur for significant
> regressions from 1.0.2. Bugs already present in 1.1.X, minor
> regressions, or bugs related to new features will not block this
> release.
>
> == What default changes should I be aware of? ==
> 1. The default value of "spark.shuffle.blockTransferService" has been
> changed to "netty"
> --> Old behavior can be restored by switching to "nio"
>
> 2. The default value of "spark.shuffle.manager" has been changed to "sort".
> --> Old behavior can be restored by setting "spark.shuffle.manager" to
> "hash".
>
> == Other notes ==
> Because this vote is occurring over a weekend, I will likely extend
> the vote if this RC survives until the end of the vote period.
>
> - Patrick
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
> For additional commands, e-mail: dev-h...@spark.apache.org
>
>
