+1 (non-binding)

(PySpark K-Means still shows the numeric diff, of course.)
2015-12-23 9:33 GMT+01:00 Kousuke Saruta <saru...@oss.nttdata.co.jp>:

> +1
>
> On 2015/12/23 16:14, Jean-Baptiste Onofré wrote:
>
>> +1 (non-binding)
>>
>> Tested with samples on standalone and yarn.
>>
>> Regards
>> JB
>>
>> On 12/22/2015 09:10 PM, Michael Armbrust wrote:
>>
>>> Please vote on releasing the following candidate as Apache Spark
>>> version 1.6.0!
>>>
>>> The vote is open until Friday, December 25, 2015 at 18:00 UTC and
>>> passes if a majority of at least 3 +1 PMC votes are cast.
>>>
>>> [ ] +1 Release this package as Apache Spark 1.6.0
>>> [ ] -1 Do not release this package because ...
>>>
>>> To learn more about Apache Spark, please see http://spark.apache.org/
>>>
>>> The tag to be voted on is v1.6.0-rc4
>>> (4062cda3087ae42c6c3cb24508fc1d3a931accdf):
>>> https://github.com/apache/spark/tree/v1.6.0-rc4
>>>
>>> The release files, including signatures, digests, etc. can be found at:
>>> http://people.apache.org/~pwendell/spark-releases/spark-1.6.0-rc4-bin/
>>>
>>> Release artifacts are signed with the following key:
>>> https://people.apache.org/keys/committer/pwendell.asc
>>>
>>> The staging repository for this release can be found at:
>>> https://repository.apache.org/content/repositories/orgapachespark-1176/
>>>
>>> The test repository (versioned as v1.6.0-rc4) for this release can be
>>> found at:
>>> https://repository.apache.org/content/repositories/orgapachespark-1175/
>>>
>>> The documentation corresponding to this release can be found at:
>>> http://people.apache.org/~pwendell/spark-releases/spark-1.6.0-rc4-docs/
>>>
>>> =======================================
>>> == How can I help test this release? ==
>>> =======================================
>>> If you are a Spark user, you can help us test this release by taking
>>> an existing Spark workload, running it on this release candidate, and
>>> reporting any regressions.
>>>
>>> ================================================
>>> == What justifies a -1 vote for this release? ==
>>> ================================================
>>> This vote is happening towards the end of the 1.6 QA period, so -1
>>> votes should only occur for significant regressions from 1.5. Bugs
>>> already present in 1.5, minor regressions, or bugs related to new
>>> features will not block this release.
>>>
>>> ===============================================================
>>> == What should happen to JIRA tickets still targeting 1.6.0? ==
>>> ===============================================================
>>> 1. It is OK for documentation patches to target 1.6.0 and still go
>>> into branch-1.6, since documentation will be published separately
>>> from the release.
>>> 2. New features for non-alpha modules should target 1.7+.
>>> 3. Non-blocker bug fixes should target 1.6.1 or 1.7.0, or drop the
>>> target version.
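For anyone who wants to run an existing workload against the staged
artifacts: a minimal sketch of pointing an sbt build at the staging
repository listed above. The module coordinates are the usual Spark ones
and are my assumption; adjust for the components you actually use.

    // build.sbt -- resolve the RC from the staging repository above.
    resolvers += "Spark 1.6.0 RC4 staging" at
      "https://repository.apache.org/content/repositories/orgapachespark-1176/"

    // Usual Spark coordinates; add spark-sql, spark-streaming, etc. as needed.
    libraryDependencies += "org.apache.spark" %% "spark-core" % "1.6.0"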
>>>
>>> ==================================================
>>> == Major changes to help you focus your testing ==
>>> ==================================================
>>>
>>> Notable changes since 1.6 RC3
>>>
>>> - SPARK-12404 - Fix serialization error for Datasets with
>>> Timestamps/Arrays/Decimal
>>> - SPARK-12218 - Fix incorrect pushdown of filters to parquet
>>> - SPARK-12395 - Fix join columns of outer join for DataFrame using
>>> - SPARK-12413 - Fix Mesos HA
>>>
>>> Notable changes since 1.6 RC2
>>>
>>> - SPARK_VERSION has been set correctly
>>> - SPARK-12199 ML Docs are publishing correctly
>>> - SPARK-12345 Mesos cluster mode has been fixed
>>>
>>> Notable changes since 1.6 RC1
>>>
>>> Spark Streaming
>>>
>>> * SPARK-2629 <https://issues.apache.org/jira/browse/SPARK-2629>
>>> |trackStateByKey| has been renamed to |mapWithState|
>>>
>>> Spark SQL
>>>
>>> * SPARK-12165 <https://issues.apache.org/jira/browse/SPARK-12165>
>>> SPARK-12189 <https://issues.apache.org/jira/browse/SPARK-12189> Fix
>>> bugs in eviction of storage memory by execution.
>>> * SPARK-12258 <https://issues.apache.org/jira/browse/SPARK-12258>
>>> Correct passing null into ScalaUDF
>>>
>>> Notable Features Since 1.5
>>>
>>> Spark SQL
>>>
>>> * SPARK-11787 <https://issues.apache.org/jira/browse/SPARK-11787>
>>> Parquet Performance - Improve Parquet scan performance when using
>>> flat schemas.
>>> * SPARK-10810 <https://issues.apache.org/jira/browse/SPARK-10810>
>>> Session Management - Isolated default database (i.e. |USE mydb|)
>>> even on shared clusters.
>>> * SPARK-9999 <https://issues.apache.org/jira/browse/SPARK-9999>
>>> Dataset API - A type-safe API (similar to RDDs) that performs many
>>> operations on serialized binary data and code generation (i.e.
>>> Project Tungsten).
>>> * SPARK-10000 <https://issues.apache.org/jira/browse/SPARK-10000>
>>> Unified Memory Management - Shared memory for execution and caching
>>> instead of exclusive division of the regions.
>>> * SPARK-11197 <https://issues.apache.org/jira/browse/SPARK-11197> SQL
>>> Queries on Files - Concise syntax for running SQL queries over files
>>> of any supported format without registering a table.
>>> * SPARK-11745 <https://issues.apache.org/jira/browse/SPARK-11745>
>>> Reading non-standard JSON files - Added options to read non-standard
>>> JSON files (e.g. single quotes, unquoted attributes).
>>> * SPARK-10412 <https://issues.apache.org/jira/browse/SPARK-10412>
>>> Per-operator Metrics for SQL Execution - Display statistics on a
>>> per-operator basis for memory usage and spilled data size.
>>> * SPARK-11329 <https://issues.apache.org/jira/browse/SPARK-11329>
>>> Star (*) expansion for StructTypes - Makes it easier to nest and
>>> unnest arbitrary numbers of columns.
>>> * SPARK-10917 <https://issues.apache.org/jira/browse/SPARK-10917>,
>>> SPARK-11149 <https://issues.apache.org/jira/browse/SPARK-11149>
>>> In-memory Columnar Cache Performance - Significant (up to 14x)
>>> speedup when caching data that contains complex types in DataFrames
>>> or SQL.
>>> * SPARK-11111 <https://issues.apache.org/jira/browse/SPARK-11111>
>>> Fast null-safe joins - Joins using null-safe equality (|<=>|) will
>>> now execute using SortMergeJoin instead of computing a cartesian
>>> product.
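On SPARK-11111 above: a minimal sketch of the kind of query that should
now plan as a SortMergeJoin instead of a cartesian product. This assumes
spark-shell (so |sqlContext| exists); the column names are illustrative.

    // Two small DataFrames with a shared join key (names illustrative).
    val left  = sqlContext.range(10).selectExpr("id AS k", "id AS l")
    val right = sqlContext.range(10).selectExpr("id AS k", "id AS r")

    // <=> is null-safe equality: NULL <=> NULL evaluates to true.
    val joined = left.join(right, left("k") <=> right("k"))
    joined.explain()  // expect SortMergeJoin in the physical plan on 1.6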
>>> * SPARK-11389 <https://issues.apache.org/jira/browse/SPARK-11389> SQL
>>> Execution Using Off-Heap Memory - Support for configuring query
>>> execution to occur using off-heap memory to avoid GC overhead.
>>> * SPARK-10978 <https://issues.apache.org/jira/browse/SPARK-10978>
>>> Datasource API Avoid Double Filter - When implementing a datasource
>>> with filter pushdown, developers can now tell Spark SQL to avoid
>>> double-evaluating a pushed-down filter.
>>> * SPARK-4849 <https://issues.apache.org/jira/browse/SPARK-4849>
>>> Advanced Layout of Cached Data - Storing partitioning and ordering
>>> schemes in the in-memory table scan, and adding distributeBy and
>>> localSort to the DataFrame API.
>>> * SPARK-9858 <https://issues.apache.org/jira/browse/SPARK-9858>
>>> Adaptive query execution - Initial support for automatically
>>> selecting the number of reducers for joins and aggregations.
>>> * SPARK-9241 <https://issues.apache.org/jira/browse/SPARK-9241>
>>> Improved query planner for queries having distinct aggregations -
>>> Query plans of distinct aggregations are more robust when distinct
>>> columns have high cardinality.
>>>
>>> Spark Streaming
>>>
>>> * API Updates
>>> o SPARK-2629 <https://issues.apache.org/jira/browse/SPARK-2629>
>>> New improved state management - |mapWithState| - a DStream
>>> transformation for stateful stream processing; supersedes
>>> |updateStateByKey| in functionality and performance.
>>> o SPARK-11198 <https://issues.apache.org/jira/browse/SPARK-11198>
>>> Kinesis record deaggregation - Kinesis streams have been upgraded
>>> to use KCL 1.4.0 and support transparent deaggregation of
>>> KPL-aggregated records.
>>> o SPARK-10891 <https://issues.apache.org/jira/browse/SPARK-10891>
>>> Kinesis message handler function - Allows an arbitrary function to
>>> be applied to a Kinesis record in the Kinesis receiver to customize
>>> what data is stored in memory.
>>> o SPARK-6328 <https://issues.apache.org/jira/browse/SPARK-6328>
>>> Python Streaming Listener API - Get streaming statistics
>>> (scheduling delays, batch processing times, etc.) from Python.
>>>
>>> * UI Improvements
>>> o Made failures visible in the streaming tab, in the timelines,
>>> batch list, and batch details page.
>>> o Made output operations visible in the streaming tab as progress
>>> bars.
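On SPARK-2629 above: a minimal sketch of |mapWithState| for a running
word count. The |pairs| stream is an assumption (built elsewhere, e.g.
from |words.map(w => (w, 1))|), and names are illustrative.

    import org.apache.spark.streaming.{State, StateSpec}

    // Mapping function: fold the new count for a key into its running
    // state and emit the updated (word, total) pair downstream.
    val spec = StateSpec.function(
      (word: String, one: Option[Int], state: State[Int]) => {
        val total = one.getOrElse(0) + state.getOption.getOrElse(0)
        state.update(total)
        (word, total)
      })

    // pairs: DStream[(String, Int)], assumed to exist upstream.
    // Note: mapWithState requires checkpointing (ssc.checkpoint(...)).
    val running = pairs.mapWithState(spec)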
>>>
>>> MLlib
>>>
>>> New algorithms/models
>>>
>>> * SPARK-8518 <https://issues.apache.org/jira/browse/SPARK-8518>
>>> Survival analysis - Log-linear model for survival analysis
>>> * SPARK-9834 <https://issues.apache.org/jira/browse/SPARK-9834>
>>> Normal equation for least squares - Normal equation solver,
>>> providing R-like model summary statistics
>>> * SPARK-3147 <https://issues.apache.org/jira/browse/SPARK-3147>
>>> Online hypothesis testing - A/B testing in the Spark Streaming
>>> framework
>>> * SPARK-9930 <https://issues.apache.org/jira/browse/SPARK-9930> New
>>> feature transformers - ChiSqSelector, QuantileDiscretizer, SQL
>>> transformer
>>> * SPARK-6517 <https://issues.apache.org/jira/browse/SPARK-6517>
>>> Bisecting K-Means clustering - Fast top-down clustering variant of
>>> K-Means
>>>
>>> API improvements
>>>
>>> * ML Pipelines
>>> o SPARK-6725 <https://issues.apache.org/jira/browse/SPARK-6725>
>>> Pipeline persistence - Save/load for ML Pipelines, with partial
>>> coverage of spark.ml algorithms
>>> o SPARK-5565 <https://issues.apache.org/jira/browse/SPARK-5565>
>>> LDA in ML Pipelines - API for Latent Dirichlet Allocation in ML
>>> Pipelines
>>> * R API
>>> o SPARK-9836 <https://issues.apache.org/jira/browse/SPARK-9836>
>>> R-like statistics for GLMs - (Partial) R-like stats for ordinary
>>> least squares via summary(model)
>>> o SPARK-9681 <https://issues.apache.org/jira/browse/SPARK-9681>
>>> Feature interactions in R formula - Interaction operator ":" in
>>> R formula
>>> * Python API - Many improvements to the Python API to approach
>>> feature parity
>>>
>>> Misc improvements
>>>
>>> * SPARK-7685 <https://issues.apache.org/jira/browse/SPARK-7685>,
>>> SPARK-9642 <https://issues.apache.org/jira/browse/SPARK-9642>
>>> Instance weights for GLMs - Logistic and Linear Regression can take
>>> instance weights
>>> * SPARK-10384 <https://issues.apache.org/jira/browse/SPARK-10384>,
>>> SPARK-10385 <https://issues.apache.org/jira/browse/SPARK-10385>
>>> Univariate and bivariate statistics in DataFrames - Variance,
>>> stddev, correlations, etc.
>>> * SPARK-10117 <https://issues.apache.org/jira/browse/SPARK-10117>
>>> LIBSVM data source - LIBSVM as a SQL data source
>>>
>>> Documentation improvements
>>>
>>> * SPARK-7751 <https://issues.apache.org/jira/browse/SPARK-7751>
>>> @since versions - Documentation includes the initial version when
>>> classes and methods were added
>>> * SPARK-11337 <https://issues.apache.org/jira/browse/SPARK-11337>
>>> Testable example code - Automated testing for code in user guide
>>> examples
>>>
>>> Deprecations
>>>
>>> * In spark.mllib.clustering.KMeans, the "runs" parameter has been
>>> deprecated.
>>> * In spark.ml.classification.LogisticRegressionModel and
>>> spark.ml.regression.LinearRegressionModel, the "weights" field has
>>> been deprecated in favor of the new name "coefficients". This helps
>>> disambiguate from instance (row) weights given to algorithms.
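On SPARK-6725 above: a minimal sketch of the new pipeline save/load,
assuming spark-shell and stages that are covered by persistence in 1.6
(the stage choice and path are illustrative, not from the notes).

    import org.apache.spark.ml.Pipeline
    import org.apache.spark.ml.feature.{HashingTF, Tokenizer}

    // A two-stage pipeline (assumed to be persistence-covered in 1.6).
    val tokenizer = new Tokenizer().setInputCol("text").setOutputCol("words")
    val hashingTF = new HashingTF().setInputCol("words").setOutputCol("features")
    val pipeline  = new Pipeline().setStages(Array(tokenizer, hashingTF))

    pipeline.save("/tmp/spark-1.6-test/pipeline")   // write the pipeline out
    val restored = Pipeline.load("/tmp/spark-1.6-test/pipeline")  // read back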
>>>
>>> Changes of behavior
>>>
>>> * spark.mllib.tree.GradientBoostedTrees validationTol has changed
>>> semantics in 1.6. Previously, it was a threshold for absolute change
>>> in error. Now, it resembles the behavior of GradientDescent
>>> convergenceTol: for large errors, it uses relative error (relative
>>> to the previous error); for small errors (< 0.01), it uses absolute
>>> error.
>>> * spark.ml.feature.RegexTokenizer: Previously, it did not convert
>>> strings to lowercase before tokenizing. Now, it converts to
>>> lowercase by default, with an option not to. This matches the
>>> behavior of the simpler Tokenizer transformer.
>>> * Spark SQL's partition discovery has been changed to only discover
>>> partition directories that are children of the given path. (I.e. if
>>> |path="/my/data/x=1"|, then |x=1| will no longer be considered a
>>> partition; only children of |x=1| will be.) This behavior can be
>>> overridden by manually specifying the |basePath| that partition
>>> discovery should start with (SPARK-11678
>>> <https://issues.apache.org/jira/browse/SPARK-11678>).
>>> * When casting a value of an integral type to timestamp (e.g. casting
>>> a long value to timestamp), the value is treated as being in seconds
>>> instead of milliseconds (SPARK-11724
>>> <https://issues.apache.org/jira/browse/SPARK-11724>).
>>> * With the improved query planner for queries having distinct
>>> aggregations (SPARK-9241
>>> <https://issues.apache.org/jira/browse/SPARK-9241>), the plan of a
>>> query having a single distinct aggregation has been changed to a
>>> more robust version. To switch back to the plan generated by Spark
>>> 1.5's planner, please set
>>> |spark.sql.specializeSingleDistinctAggPlanning| to |true|
>>> (SPARK-12077 <https://issues.apache.org/jira/browse/SPARK-12077>).
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
> For additional commands, e-mail: dev-h...@spark.apache.org
>
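One more note, on the partition-discovery change above (SPARK-11678):
with layouts like /my/data/x=1, the new |basePath| option restores the
old behavior. A minimal sketch, following the path from the notes and
assuming spark-shell with parquet data already at that location:

    // Without basePath, reading /my/data/x=1 no longer yields an "x"
    // partition column; only children of x=1 are discovered.
    val df = sqlContext.read
      .option("basePath", "/my/data")
      .parquet("/my/data/x=1")
    // With basePath set to /my/data, x=1 is again treated as a partition,
    // so df carries an "x" column with value 1.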