Excuse me,

I'm working on SPARK-10259 (its parent issue is SPARK-7751).
https://issues.apache.org/jira/browse/SPARK-10259

The purpose of this issue is to add the @Since annotation to stable and
experimental methods in MLlib.
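
For reference, a minimal sketch of what the annotation looks like on an
MLlib method inside the Spark codebase (the package, class, and method
below are illustrative, not taken from the actual patch):

  package org.apache.spark.mllib.example

  import org.apache.spark.annotation.Since

  class ExampleModel(private var k: Int) {
    /** Sets the number of clusters to find. */
    @Since("1.6.0")
    def setK(k: Int): this.type = {
      this.k = k
      this
    }
  }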

In SPARK-7751, this and its sibling child issues are targeted at v1.6.0,
but some of them are still in progress. (One of the reasons is a delay in
my own work; at the moment I am waiting for the Jenkins tests.)

If these issues are to be merged into v1.6.0, please run the Jenkins tests
for SPARK-10259.

Thanks,
Hiroshi Takahashi

2015-12-03 11:13 GMT+09:00 Ted Yu <yuzhih...@gmail.com>:

> +1
>
> Ran through test suite (minus docker-integration-tests) which passed.
>
> Overall experience was much better compared with some of the prior RC's.
>
> [INFO] Spark Project External Kafka ....................... SUCCESS [ 53.956 s]
> [INFO] Spark Project Examples ............................. SUCCESS [02:05 min]
> [INFO] Spark Project External Kafka Assembly .............. SUCCESS [ 11.298 s]
> [INFO] ------------------------------------------------------------------------
> [INFO] BUILD SUCCESS
> [INFO] ------------------------------------------------------------------------
> [INFO] Total time: 01:42 h
> [INFO] Finished at: 2015-12-02T17:19:02-08:00
>
> On Wed, Dec 2, 2015 at 4:23 PM, Michael Armbrust <mich...@databricks.com> wrote:
>
>> I'm going to kick the voting off with a +1 (binding).  We ran TPC-DS and
>> most queries are faster than 1.5.  We've also ported several production
>> pipelines to 1.6.
>>
>
>
2015-12-03 5:26 GMT+09:00 Michael Armbrust <mich...@databricks.com>:

> Please vote on releasing the following candidate as Apache Spark version
> 1.6.0!
>
> The vote is open until Saturday, December 5, 2015 at 21:00 UTC and passes
> if a majority of at least 3 +1 PMC votes are cast.
>
> [ ] +1 Release this package as Apache Spark 1.6.0
> [ ] -1 Do not release this package because ...
>
> To learn more about Apache Spark, please see http://spark.apache.org/
>
> The tag to be voted on is v1.6.0-rc1
> (bf525845cef159d2d4c9f4d64e158f037179b5c4)
> <https://github.com/apache/spark/tree/v1.6.0-rc1>
>
> The release files, including signatures, digests, etc. can be found at:
> http://people.apache.org/~pwendell/spark-releases/spark-v1.6.0-rc1-bin/
>
> Release artifacts are signed with the following key:
> https://people.apache.org/keys/committer/pwendell.asc
>
> The staging repository for this release can be found at:
> https://repository.apache.org/content/repositories/orgapachespark-1165/
>
> The test repository (versioned as v1.6.0-rc1) for this release can be
> found at:
> https://repository.apache.org/content/repositories/orgapachespark-1164/
>
> The documentation corresponding to this release can be found at:
> http://people.apache.org/~pwendell/spark-releases/spark-1.6.0-rc1-docs/
>
>
> =======================================
> == How can I help test this release? ==
> =======================================
> If you are a Spark user, you can help us test this release by taking an
> existing Spark workload, running it on this release candidate, and
> reporting any regressions.
>
> ================================================
> == What justifies a -1 vote for this release? ==
> ================================================
> This vote is happening towards the end of the 1.6 QA period, so -1 votes
> should only occur for significant regressions from 1.5. Bugs already
> present in 1.5, minor regressions, or bugs related to new features will not
> block this release.
>
> ===============================================================
> == What should happen to JIRA tickets still targeting 1.6.0? ==
> ===============================================================
> 1. It is OK for documentation patches to target 1.6.0 and still go into
> branch-1.6, since documentation will be published separately from the
> release.
> 2. New features for non-alpha modules should target 1.7+.
> 3. Non-blocker bug fixes should target 1.6.1 or 1.7.0, or drop the target
> version.
>
>
> ==================================================
> == Major changes to help you focus your testing ==
> ==================================================
>
> Spark SQL
>
>    - SPARK-10810 <https://issues.apache.org/jira/browse/SPARK-10810>
>    Session Management - The ability to create multiple isolated SQL
>    Contexts that have their own configuration and default database.  This is
>    turned on by default in the thrift server.
>    - SPARK-9999  <https://issues.apache.org/jira/browse/SPARK-9999> Dataset
>    API - A type-safe API (similar to RDDs) that performs many operations
>    on serialized binary data and uses code generation (i.e. Project
>    Tungsten); see the sketch after this list.
>    - SPARK-10000 <https://issues.apache.org/jira/browse/SPARK-10000> Unified
>    Memory Management - Shared memory for execution and caching instead of
>    exclusive division of the regions.
>    - SPARK-11197 <https://issues.apache.org/jira/browse/SPARK-11197> SQL
>    Queries on Files - Concise syntax for running SQL queries over files
>    of any supported format without registering a table (see the example
>    after this list).
>    - SPARK-11745 <https://issues.apache.org/jira/browse/SPARK-11745> Reading
>    non-standard JSON files - Added options to read non-standard JSON
>    files (e.g. single-quotes, unquoted attributes).
>    - SPARK-10412 <https://issues.apache.org/jira/browse/SPARK-10412>
>    Per-operator Metrics for SQL Execution - Display statistics on a
>    per-operator basis for memory usage and spilled data size.
>    - SPARK-11329 <https://issues.apache.org/jira/browse/SPARK-11329> Star
>    (*) expansion for StructTypes - Makes it easier to nest and unnest
>    arbitrary numbers of columns.
>    - SPARK-10917 <https://issues.apache.org/jira/browse/SPARK-10917>,
>    SPARK-11149 <https://issues.apache.org/jira/browse/SPARK-11149> In-memory
>    Columnar Cache Performance - Significant (up to 14x) speed up when
>    caching data that contains complex types in DataFrames or SQL.
>    - SPARK-11111 <https://issues.apache.org/jira/browse/SPARK-11111> Fast
>    null-safe joins - Joins using null-safe equality (<=>) will now
>    execute using SortMergeJoin instead of computing a cartesian product.
>    - SPARK-11389 <https://issues.apache.org/jira/browse/SPARK-11389> SQL
>    Execution Using Off-Heap Memory - Support for configuring query
>    execution to occur using off-heap memory to avoid GC overhead.
>    - SPARK-10978 <https://issues.apache.org/jira/browse/SPARK-10978>
>    Datasource API Avoid Double Filter - When implementing a datasource
>    with filter pushdown, developers can now tell Spark SQL to avoid
>    evaluating a pushed-down filter twice.
>    - SPARK-4849  <https://issues.apache.org/jira/browse/SPARK-4849> Advanced
>    Layout of Cached Data - Storing partitioning and ordering schemes in
>    the in-memory table scan, and adding distributeBy and localSort to the
>    DataFrame API.
>    - SPARK-9858  <https://issues.apache.org/jira/browse/SPARK-9858> Adaptive
>    query execution - Initial support for automatically selecting the
>    number of reducers for joins and aggregations.
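>
> For SPARK-9999, a minimal sketch of the Dataset API (assuming a
> SQLContext named sqlContext; the case class and data are illustrative):
>
>   case class Person(name: String, age: Int)
>
>   import sqlContext.implicits._
>   val ds = Seq(Person("Ann", 34), Person("Bob", 17)).toDS()
>   // Typed operations are checked at compile time and run against
>   // Tungsten's serialized binary representation.
>   val adultNames = ds.filter(_.age >= 18).map(_.name)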
>
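> For SPARK-11197, queries can now name a file directly by prefixing the
> path with the format, with no table registration (the path and predicate
> are illustrative):
>
>   val df = sqlContext.sql(
>     "SELECT * FROM parquet.`/data/events.parquet` WHERE year = 2015")
>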
> Spark Streaming
>
>    - API Updates
>       - SPARK-2629  <https://issues.apache.org/jira/browse/SPARK-2629> New
>       improved state management - trackStateByKey, a DStream
>       transformation for stateful stream processing that supersedes
>       updateStateByKey in functionality and performance (see the sketch
>       after this list).
>       - SPARK-11198 <https://issues.apache.org/jira/browse/SPARK-11198>
>       Kinesis record deaggregation - Kinesis streams have been upgraded to
>       use KCL 1.4.0 and support transparent deaggregation of KPL-aggregated
>       records.
>       - SPARK-10891 <https://issues.apache.org/jira/browse/SPARK-10891>
>       Kinesis message handler function - Allows an arbitrary function to
>       be applied to a Kinesis record in the Kinesis receiver to customize
>       what data is stored in memory.
>       - SPARK-6328  <https://issues.apache.org/jira/browse/SPARK-6328>
>       Python Streaming Listener API - Get streaming statistics
>       (scheduling delays, batch processing times, etc.) from Python.
>
>
>    - UI Improvements
>       - Made failures visible in the streaming tab, in the timelines,
>       batch list, and batch details page.
>       - Made output operations visible in the streaming tab as progress
>       bars
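>
> For SPARK-2629, a minimal sketch of a running word count with
> trackStateByKey, based on the API in this RC (the surrounding stream
> setup is omitted, and names may still change before the final release):
>
>   import org.apache.spark.streaming.{State, StateSpec, Time}
>
>   // Keeps a running sum per word and emits the updated pair each batch.
>   val trackStateFunc = (batchTime: Time, word: String,
>       count: Option[Int], state: State[Long]) => {
>     val sum = count.getOrElse(0).toLong + state.getOption.getOrElse(0L)
>     state.update(sum)
>     Some((word, sum))
>   }
>
>   // Assumes wordCounts is a DStream[(String, Int)].
>   val stateStream = wordCounts.trackStateByKey(
>     StateSpec.function(trackStateFunc))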
>
> MLlib
>
> New algorithms/models
>
>    - SPARK-8518  <https://issues.apache.org/jira/browse/SPARK-8518> Survival
>    analysis - Log-linear model for survival analysis
>    - SPARK-9834  <https://issues.apache.org/jira/browse/SPARK-9834> Normal
>    equation for least squares - Normal equation solver, providing R-like
>    model summary statistics
>    - SPARK-3147  <https://issues.apache.org/jira/browse/SPARK-3147> Online
>    hypothesis testing - A/B testing in the Spark Streaming framework
>    - SPARK-9930  <https://issues.apache.org/jira/browse/SPARK-9930> New
>    feature transformers - ChiSqSelector, QuantileDiscretizer, SQL
>    transformer
>    - SPARK-6517  <https://issues.apache.org/jira/browse/SPARK-6517> Bisecting
>    K-Means clustering - Fast top-down clustering variant of K-Means (see
>    the sketch after this list).
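>
> For SPARK-6517, a minimal sketch of the new clustering algorithm
> (assuming an RDD[Vector] named data; the parameter value is
> illustrative):
>
>   import org.apache.spark.mllib.clustering.BisectingKMeans
>
>   // Top-down: repeatedly bisects clusters until k leaf clusters remain.
>   val model = new BisectingKMeans().setK(4).run(data)
>   model.clusterCenters.foreach(println)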
>
> API improvements
>
>    - ML Pipelines
>       - SPARK-6725  <https://issues.apache.org/jira/browse/SPARK-6725>
>       Pipeline persistence - Save/load for ML Pipelines, with partial
>       coverage of spark.ml algorithms (see the sketch after this list)
>       - SPARK-5565  <https://issues.apache.org/jira/browse/SPARK-5565> LDA
>       in ML Pipelines - API for Latent Dirichlet Allocation in ML
>       Pipelines
>    - R API
>       - SPARK-9836  <https://issues.apache.org/jira/browse/SPARK-9836> R-like
>       statistics for GLMs - (Partial) R-like stats for ordinary least
>       squares via summary(model)
>       - SPARK-9681  <https://issues.apache.org/jira/browse/SPARK-9681> Feature
>       interactions in R formula - Interaction operator ":" in R formula
>    - Python API - Many improvements to the Python API to approach feature
>    parity.
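>
> For SPARK-6725, a minimal sketch of pipeline save/load (the path is
> illustrative, and since coverage is partial, not every stage type
> supports persistence yet):
>
>   import org.apache.spark.ml.Pipeline
>
>   // Assumes `pipeline` is an org.apache.spark.ml.Pipeline built earlier.
>   pipeline.save("/tmp/my-pipeline")
>   val restored = Pipeline.load("/tmp/my-pipeline")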
>
> Misc improvements
>
>    - SPARK-7685  <https://issues.apache.org/jira/browse/SPARK-7685>,
>    SPARK-9642  <https://issues.apache.org/jira/browse/SPARK-9642> Instance
>    weights for GLMs - Logistic and Linear Regression can take instance
>    weights
>    - SPARK-10384 <https://issues.apache.org/jira/browse/SPARK-10384>,
>    SPARK-10385 <https://issues.apache.org/jira/browse/SPARK-10385> Univariate
>    and bivariate statistics in DataFrames - Variance, stddev,
>    correlations, etc.
>    - SPARK-10117 <https://issues.apache.org/jira/browse/SPARK-10117> LIBSVM
>    data source - LIBSVM as a SQL data source (see the sketch below)
>
> Documentation improvements
>
>    - SPARK-7751  <https://issues.apache.org/jira/browse/SPARK-7751> @since
>    versions - Documentation includes the initial version in which classes
>    and methods were added
>    - SPARK-11337 <https://issues.apache.org/jira/browse/SPARK-11337> Testable
>    example code - Automated testing for code in user guide examples
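>
> For SPARK-10117 above, a short sketch of the new data source (the path
> is illustrative):
>
>   val df = sqlContext.read.format("libsvm").load("/data/sample.libsvm")
>   // df has two columns: "label" (double) and "features" (vector).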
>
> Deprecations
>
>    - In spark.mllib.clustering.KMeans, the "runs" parameter has been
>    deprecated.
>    - In spark.ml.classification.LogisticRegressionModel and
>    spark.ml.regression.LinearRegressionModel, the "weights" field has been
>    deprecated, in favor of the new name "coefficients." This helps
>    disambiguate from instance (row) weights given to algorithms.
>
> Changes of behavior
>
>    - spark.mllib.tree.GradientBoostedTrees validationTol has changed
>    semantics in 1.6. Previously, it was a threshold for absolute change in
>    error. Now, it resembles the behavior of GradientDescent convergenceTol:
>    For large errors, it uses relative error (relative to the previous error);
>    for small errors (< 0.01), it uses absolute error.
>    - spark.ml.feature.RegexTokenizer: Previously, it did not convert
>    strings to lowercase before tokenizing. Now, it converts to lowercase by
>    default, with an option not to. This matches the behavior of the simpler
>    Tokenizer transformer (see the sketch after this list).
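>
> A minimal sketch of opting out of the new default (assuming the
> toLowercase param and its setter as they appear in this RC; column names
> are illustrative):
>
>   import org.apache.spark.ml.feature.RegexTokenizer
>
>   val tokenizer = new RegexTokenizer()
>     .setInputCol("text")
>     .setOutputCol("words")
>     .setToLowercase(false)  // restore the pre-1.6 behavior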
>
>
