Re: [VOTE] Release Apache Spark 1.6.0 (RC3)

2015-12-19 Thread Zsolt Tóth
+1 (non-binding)

Testing environment:
-CDH5.5 single node docker
-Prebuilt spark-1.6.0-hadoop2.6.tgz
-Yarn-cluster mode

Comparing outputs of Spark 1.5.x and 1.6.0-RC3:

Pyspark
OK?: K-Means (ml) - Note: our tests show a numerical diff here compared to
the 1.5.2 output. Since K-Means has a random factor, this can be expected
behaviour - is it because of SPARK-10779? If so, I think it should be
listed in the Mllib/ml docs.
OK: Logistic Regression (ml), Linear Regression (mllib)
OK: Nested Spark SQL query

SparkR
OK: Logistic Regression
OK: Nested Spark SQL query

Machine learning - Java:
OK: Decision Tree (mllib and ml): Gini, Entropy
OK. Random Forest (ml): Gini, Entropy
OK: Linear, Lasso, Ridge Regression (mllib)
OK: Logistc Regression (mllib): SGD, L-BFGS
OK: SVM (mllib)

I/O:
OK: Reading/Writing Parquet to/from DataFrame
OK: Reading/Writing Textfile to/from RDD

2015-12-19 2:09 GMT+01:00 Marcelo Vanzin :

> +1 (non-binding)
>
> Tests the without-hadoop binaries (so didn't run Hive-related tests)
> with a test batch including standalone / client, yarn / client and
> cluster, including core, mllib and streaming (flume and kafka).
>
> On Wed, Dec 16, 2015 at 1:32 PM, Michael Armbrust
>  wrote:
> > Please vote on releasing the following candidate as Apache Spark version
> > 1.6.0!
> >
> > The vote is open until Saturday, December 19, 2015 at 18:00 UTC and
> passes
> > if a majority of at least 3 +1 PMC votes are cast.
> >
> > [ ] +1 Release this package as Apache Spark 1.6.0
> > [ ] -1 Do not release this package because ...
> >
> > To learn more about Apache Spark, please see http://spark.apache.org/
> >
> > The tag to be voted on is v1.6.0-rc3
> > (168c89e07c51fa24b0bb88582c739cec0acb44d7)
> >
> > The release files, including signatures, digests, etc. can be found at:
> > http://people.apache.org/~pwendell/spark-releases/spark-1.6.0-rc3-bin/
> >
> > Release artifacts are signed with the following key:
> > https://people.apache.org/keys/committer/pwendell.asc
> >
> > The staging repository for this release can be found at:
> > https://repository.apache.org/content/repositories/orgapachespark-1174/
> >
> > The test repository (versioned as v1.6.0-rc3) for this release can be
> found
> > at:
> > https://repository.apache.org/content/repositories/orgapachespark-1173/
> >
> > The documentation corresponding to this release can be found at:
> > http://people.apache.org/~pwendell/spark-releases/spark-1.6.0-rc3-docs/
> >
> > ===
> > == How can I help test this release? ==
> > ===
> > If you are a Spark user, you can help us test this release by taking an
> > existing Spark workload and running on this release candidate, then
> > reporting any regressions.
> >
> > 
> > == What justifies a -1 vote for this release? ==
> > 
> > This vote is happening towards the end of the 1.6 QA period, so -1 votes
> > should only occur for significant regressions from 1.5. Bugs already
> present
> > in 1.5, minor regressions, or bugs related to new features will not block
> > this release.
> >
> > ===
> > == What should happen to JIRA tickets still targeting 1.6.0? ==
> > ===
> > 1. It is OK for documentation patches to target 1.6.0 and still go into
> > branch-1.6, since documentations will be published separately from the
> > release.
> > 2. New features for non-alpha-modules should target 1.7+.
> > 3. Non-blocker bug fixes should target 1.6.1 or 1.7.0, or drop the target
> > version.
> >
> >
> > ==
> > == Major changes to help you focus your testing ==
> > ==
> >
> > Notable changes since 1.6 RC2
> >
> >
> > - SPARK_VERSION has been set correctly
> > - SPARK-12199 ML Docs are publishing correctly
> > - SPARK-12345 Mesos cluster mode has been fixed
> >
> > Notable changes since 1.6 RC1
> >
> > Spark Streaming
> >
> > SPARK-2629  trackStateByKey has been renamed to mapWithState
> >
> > Spark SQL
> >
> > SPARK-12165 SPARK-12189 Fix bugs in eviction of storage memory by
> execution.
> > SPARK-12258 correct passing null into ScalaUDF
> >
> > Notable Features Since 1.5
> >
> > Spark SQL
> >
> > SPARK-11787 Parquet Performance - Improve Parquet scan performance when
> > using flat schemas.
> > SPARK-10810 Session Management - Isolated devault database (i.e USE mydb)
> > even on shared clusters.
> > SPARK-  Dataset API - A type-safe API (similar to RDDs) that performs
> > many operations on serialized binary data and code generation (i.e.
> Project
> > Tungsten).
> > SPARK-1 Unified Memory Management - Shared memory for execution and
> > caching instead of exclusive division of the regions.
> > SPARK-11197 SQL Queries on Files - Concise syntax for ru

Re: [VOTE] Release Apache Spark 1.6.0 (RC3)

2015-12-19 Thread Luciano Resende
+1 (non-binding)

Tested Standalone mode, SparkR and couple Stream Apps, all seem ok.

On Wed, Dec 16, 2015 at 1:32 PM, Michael Armbrust 
wrote:

> Please vote on releasing the following candidate as Apache Spark version
> 1.6.0!
>
> The vote is open until Saturday, December 19, 2015 at 18:00 UTC and
> passes if a majority of at least 3 +1 PMC votes are cast.
>
> [ ] +1 Release this package as Apache Spark 1.6.0
> [ ] -1 Do not release this package because ...
>
> To learn more about Apache Spark, please see http://spark.apache.org/
>
> The tag to be voted on is *v1.6.0-rc3
> (168c89e07c51fa24b0bb88582c739cec0acb44d7)
> *
>
> The release files, including signatures, digests, etc. can be found at:
> http://people.apache.org/~pwendell/spark-releases/spark-1.6.0-rc3-bin/
>
> Release artifacts are signed with the following key:
> https://people.apache.org/keys/committer/pwendell.asc
>
> The staging repository for this release can be found at:
> https://repository.apache.org/content/repositories/orgapachespark-1174/
>
> The test repository (versioned as v1.6.0-rc3) for this release can be
> found at:
> https://repository.apache.org/content/repositories/orgapachespark-1173/
>
> The documentation corresponding to this release can be found at:
> http://people.apache.org/~pwendell/spark-releases/spark-1.6.0-rc3-docs/
>
> ===
> == How can I help test this release? ==
> ===
> If you are a Spark user, you can help us test this release by taking an
> existing Spark workload and running on this release candidate, then
> reporting any regressions.
>
> 
> == What justifies a -1 vote for this release? ==
> 
> This vote is happening towards the end of the 1.6 QA period, so -1 votes
> should only occur for significant regressions from 1.5. Bugs already
> present in 1.5, minor regressions, or bugs related to new features will not
> block this release.
>
> ===
> == What should happen to JIRA tickets still targeting 1.6.0? ==
> ===
> 1. It is OK for documentation patches to target 1.6.0 and still go into
> branch-1.6, since documentations will be published separately from the
> release.
> 2. New features for non-alpha-modules should target 1.7+.
> 3. Non-blocker bug fixes should target 1.6.1 or 1.7.0, or drop the target
> version.
>
>
> ==
> == Major changes to help you focus your testing ==
> ==
>
> Notable changes since 1.6 RC2
> - SPARK_VERSION has been set correctly
> - SPARK-12199 ML Docs are publishing correctly
> - SPARK-12345 Mesos cluster mode has been fixed
>
> Notable changes since 1.6 RC1
> Spark Streaming
>
>- SPARK-2629  
>trackStateByKey has been renamed to mapWithState
>
> Spark SQL
>
>- SPARK-12165 
>SPARK-12189  Fix
>bugs in eviction of storage memory by execution.
>- SPARK-12258  correct
>passing null into ScalaUDF
>
> Notable Features Since 1.5Spark SQL
>
>- SPARK-11787  Parquet
>Performance - Improve Parquet scan performance when using flat schemas.
>- SPARK-10810 
>Session Management - Isolated devault database (i.e USE mydb) even on
>shared clusters.
>- SPARK-   Dataset
>API - A type-safe API (similar to RDDs) that performs many operations
>on serialized binary data and code generation (i.e. Project Tungsten).
>- SPARK-1  Unified
>Memory Management - Shared memory for execution and caching instead of
>exclusive division of the regions.
>- SPARK-11197  SQL
>Queries on Files - Concise syntax for running SQL queries over files
>of any supported format without registering a table.
>- SPARK-11745  Reading
>non-standard JSON files - Added options to read non-standard JSON
>files (e.g. single-quotes, unquoted attributes)
>- SPARK-10412  
> Per-operator
>Metrics for SQL Execution - Display statistics on a peroperator basis
>for memory usage and spilled data size.
>- SPARK-11329  Star
>(*) expansion for StructTypes - Makes it easier to nest and unest
>arbitrary numb

Re: [VOTE] Release Apache Spark 1.6.0 (RC3)

2015-12-19 Thread Jeff Zhang
+1 (non-binding)

All the test passed, and run it on HDP 2.3.2 sandbox successfully.

On Sun, Dec 20, 2015 at 10:43 AM, Luciano Resende 
wrote:

> +1 (non-binding)
>
> Tested Standalone mode, SparkR and couple Stream Apps, all seem ok.
>
> On Wed, Dec 16, 2015 at 1:32 PM, Michael Armbrust 
> wrote:
>
>> Please vote on releasing the following candidate as Apache Spark version
>> 1.6.0!
>>
>> The vote is open until Saturday, December 19, 2015 at 18:00 UTC and
>> passes if a majority of at least 3 +1 PMC votes are cast.
>>
>> [ ] +1 Release this package as Apache Spark 1.6.0
>> [ ] -1 Do not release this package because ...
>>
>> To learn more about Apache Spark, please see http://spark.apache.org/
>>
>> The tag to be voted on is *v1.6.0-rc3
>> (168c89e07c51fa24b0bb88582c739cec0acb44d7)
>> *
>>
>> The release files, including signatures, digests, etc. can be found at:
>> http://people.apache.org/~pwendell/spark-releases/spark-1.6.0-rc3-bin/
>>
>> Release artifacts are signed with the following key:
>> https://people.apache.org/keys/committer/pwendell.asc
>>
>> The staging repository for this release can be found at:
>> https://repository.apache.org/content/repositories/orgapachespark-1174/
>>
>> The test repository (versioned as v1.6.0-rc3) for this release can be
>> found at:
>> https://repository.apache.org/content/repositories/orgapachespark-1173/
>>
>> The documentation corresponding to this release can be found at:
>> http://people.apache.org/~pwendell/spark-releases/spark-1.6.0-rc3-docs/
>>
>> ===
>> == How can I help test this release? ==
>> ===
>> If you are a Spark user, you can help us test this release by taking an
>> existing Spark workload and running on this release candidate, then
>> reporting any regressions.
>>
>> 
>> == What justifies a -1 vote for this release? ==
>> 
>> This vote is happening towards the end of the 1.6 QA period, so -1 votes
>> should only occur for significant regressions from 1.5. Bugs already
>> present in 1.5, minor regressions, or bugs related to new features will not
>> block this release.
>>
>> ===
>> == What should happen to JIRA tickets still targeting 1.6.0? ==
>> ===
>> 1. It is OK for documentation patches to target 1.6.0 and still go into
>> branch-1.6, since documentations will be published separately from the
>> release.
>> 2. New features for non-alpha-modules should target 1.7+.
>> 3. Non-blocker bug fixes should target 1.6.1 or 1.7.0, or drop the target
>> version.
>>
>>
>> ==
>> == Major changes to help you focus your testing ==
>> ==
>>
>> Notable changes since 1.6 RC2
>> - SPARK_VERSION has been set correctly
>> - SPARK-12199 ML Docs are publishing correctly
>> - SPARK-12345 Mesos cluster mode has been fixed
>>
>> Notable changes since 1.6 RC1
>> Spark Streaming
>>
>>- SPARK-2629  
>>trackStateByKey has been renamed to mapWithState
>>
>> Spark SQL
>>
>>- SPARK-12165 
>>SPARK-12189  Fix
>>bugs in eviction of storage memory by execution.
>>- SPARK-12258  correct
>>passing null into ScalaUDF
>>
>> Notable Features Since 1.5Spark SQL
>>
>>- SPARK-11787  Parquet
>>Performance - Improve Parquet scan performance when using flat
>>schemas.
>>- SPARK-10810 
>>Session Management - Isolated devault database (i.e USE mydb) even on
>>shared clusters.
>>- SPARK-   Dataset
>>API - A type-safe API (similar to RDDs) that performs many operations
>>on serialized binary data and code generation (i.e. Project Tungsten).
>>- SPARK-1  Unified
>>Memory Management - Shared memory for execution and caching instead
>>of exclusive division of the regions.
>>- SPARK-11197  SQL
>>Queries on Files - Concise syntax for running SQL queries over files
>>of any supported format without registering a table.
>>- SPARK-11745  Reading
>>non-standard JSON files - Added options to read non-standard JSON
>>files (e.g. single-quotes, unquoted attributes)
>>- SPARK-10412  
>> Per-operator
>>Metrics for SQL E