For me, mostly the same as before: tests are mostly passing, but I can never get the docker tests to pass. If anyone knows a special profile or package that needs to be enabled, I can try that and/or fix/document it. Just wondering if it's me.
I'm on Java 7 + Ubuntu 15.10, with -Pyarn -Phive -Phive-thriftserver -Phadoop-2.6.

On Wed, Dec 16, 2015 at 9:32 PM, Michael Armbrust <mich...@databricks.com> wrote:
> Please vote on releasing the following candidate as Apache Spark version 1.6.0!
>
> The vote is open until Saturday, December 19, 2015 at 18:00 UTC and passes if a majority of at least 3 +1 PMC votes are cast.
>
> [ ] +1 Release this package as Apache Spark 1.6.0
> [ ] -1 Do not release this package because ...
>
> To learn more about Apache Spark, please see http://spark.apache.org/
>
> The tag to be voted on is v1.6.0-rc3 (168c89e07c51fa24b0bb88582c739cec0acb44d7)
>
> The release files, including signatures, digests, etc. can be found at:
> http://people.apache.org/~pwendell/spark-releases/spark-1.6.0-rc3-bin/
>
> Release artifacts are signed with the following key:
> https://people.apache.org/keys/committer/pwendell.asc
>
> The staging repository for this release can be found at:
> https://repository.apache.org/content/repositories/orgapachespark-1174/
>
> The test repository (versioned as v1.6.0-rc3) for this release can be found at:
> https://repository.apache.org/content/repositories/orgapachespark-1173/
>
> The documentation corresponding to this release can be found at:
> http://people.apache.org/~pwendell/spark-releases/spark-1.6.0-rc3-docs/
>
> =======================================
> == How can I help test this release? ==
> =======================================
> If you are a Spark user, you can help us test this release by taking an existing Spark workload, running it on this release candidate, and reporting any regressions.
>
> ================================================
> == What justifies a -1 vote for this release? ==
> ================================================
> This vote is happening towards the end of the 1.6 QA period, so -1 votes should only occur for significant regressions from 1.5. Bugs already present in 1.5, minor regressions, or bugs related to new features will not block this release.
>
> ===============================================================
> == What should happen to JIRA tickets still targeting 1.6.0? ==
> ===============================================================
> 1. It is OK for documentation patches to target 1.6.0 and still go into branch-1.6, since documentation will be published separately from the release.
> 2. New features for non-alpha modules should target 1.7+.
> 3. Non-blocker bug fixes should target 1.6.1 or 1.7.0, or drop the target version.
>
> ==================================================
> == Major changes to help you focus your testing ==
> ==================================================
>
> Notable changes since 1.6 RC2
>
> - SPARK_VERSION has been set correctly
> - SPARK-12199 ML Docs are publishing correctly
> - SPARK-12345 Mesos cluster mode has been fixed
>
> Notable changes since 1.6 RC1
>
> Spark Streaming
>
> SPARK-2629 trackStateByKey has been renamed to mapWithState
>
> Spark SQL
>
> SPARK-12165, SPARK-12189 Fix bugs in eviction of storage memory by execution.
> SPARK-12258 Correct passing null into ScalaUDF.
>
> Notable Features Since 1.5
>
> Spark SQL
>
> SPARK-11787 Parquet Performance - Improve Parquet scan performance when using flat schemas.
> SPARK-10810 Session Management - Isolated default database (i.e. USE mydb), even on shared clusters.
> SPARK-9999 Dataset API - A type-safe API (similar to RDDs) that performs many operations on serialized binary data and code generation (i.e. Project Tungsten).
> SPARK-10000 Unified Memory Management - Shared memory for execution and caching instead of exclusive division of the regions.
> SPARK-11197 SQL Queries on Files - Concise syntax for running SQL queries over files of any supported format without registering a table.
> SPARK-11745 Reading non-standard JSON files - Added options to read non-standard JSON files (e.g. single quotes, unquoted attributes).
> SPARK-10412 Per-operator Metrics for SQL Execution - Display statistics on a per-operator basis for memory usage and spilled data size.
> SPARK-11329 Star (*) expansion for StructTypes - Makes it easier to nest and unnest arbitrary numbers of columns.
> SPARK-10917, SPARK-11149 In-memory Columnar Cache Performance - Significant (up to 14x) speedup when caching data that contains complex types in DataFrames or SQL.
> SPARK-11111 Fast null-safe joins - Joins using null-safe equality (<=>) will now execute using SortMergeJoin instead of computing a cartesian product.
> SPARK-11389 SQL Execution Using Off-Heap Memory - Support for configuring query execution to occur using off-heap memory to avoid GC overhead.
> SPARK-10978 Datasource API Avoid Double Filter - When implementing a datasource with filter pushdown, developers can now tell Spark SQL to avoid double-evaluating a pushed-down filter.
> SPARK-4849 Advanced Layout of Cached Data - Storing partitioning and ordering schemes in the in-memory table scan, and adding distributeBy and localSort to the DataFrame API.
> SPARK-9858 Adaptive query execution - Initial support for automatically selecting the number of reducers for joins and aggregations.
> SPARK-9241 Improved query planner for queries having distinct aggregations - Query plans of distinct aggregations are more robust when distinct columns have high cardinality.
>
> Spark Streaming
>
> API Updates
>
> SPARK-2629 New improved state management - mapWithState - a DStream transformation for stateful stream processing, supersedes updateStateByKey in functionality and performance.
> SPARK-11198 Kinesis record deaggregation - Kinesis streams have been upgraded to use KCL 1.4.0 and support transparent deaggregation of KPL-aggregated records.
> SPARK-10891 Kinesis message handler function - Allows an arbitrary function to be applied to a Kinesis record in the Kinesis receiver to customize what data is stored in memory.
> SPARK-6328 Python Streaming Listener API - Get streaming statistics (scheduling delays, batch processing times, etc.) in streaming.
>
> UI Improvements
>
> Made failures visible in the streaming tab, in the timelines, batch list, and batch details page.
> Made output operations visible in the streaming tab as progress bars.
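For anyone else poking at the SQL side of the RC, here is a minimal sketch of how the Dataset API (SPARK-9999) and the SQL-directly-on-files syntax (SPARK-11197) quoted above could be exercised; the case class, app name, and parquet path are just placeholders, not anything from the release notes:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

// Hypothetical record type used only for this smoke test.
case class Person(name: String, age: Long)

object DatasetSmokeTest {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("rc3-sql-smoke").setMaster("local[2]"))
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._

    // SPARK-9999: type-safe, RDD-like operations backed by Tungsten binary encoding.
    val ds = Seq(Person("alice", 30L), Person("bob", 25L)).toDS()
    ds.filter(_.age >= 18).map(_.name).collect().foreach(println)

    // SPARK-11197: query a file of a supported format without registering a table
    // (assumes a Parquet file already exists at this placeholder path).
    sqlContext.sql("SELECT * FROM parquet.`/tmp/people.parquet`").show()

    sc.stop()
  }
}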
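Similarly, a rough sketch for the renamed mapWithState API (SPARK-2629) from the streaming section above; the socket source, port, and checkpoint directory are placeholders:

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, State, StateSpec, StreamingContext}

object MapWithStateSmokeTest {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("rc3-streaming-smoke").setMaster("local[2]")
    val ssc = new StreamingContext(conf, Seconds(1))
    ssc.checkpoint("/tmp/rc3-checkpoint") // required for stateful operations

    val words = ssc.socketTextStream("localhost", 9999)
      .flatMap(_.split(" "))
      .map(word => (word, 1))

    // Keep a running count per word in managed state.
    val mappingFunc = (word: String, one: Option[Int], state: State[Int]) => {
      val sum = one.getOrElse(0) + state.getOption.getOrElse(0)
      state.update(sum)
      (word, sum)
    }

    words.mapWithState(StateSpec.function(mappingFunc)).print()

    ssc.start()
    ssc.awaitTermination()
  }
}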
>
> MLlib
>
> New algorithms/models
>
> SPARK-8518 Survival analysis - Log-linear model for survival analysis
> SPARK-9834 Normal equation for least squares - Normal equation solver, providing R-like model summary statistics
> SPARK-3147 Online hypothesis testing - A/B testing in the Spark Streaming framework
> SPARK-9930 New feature transformers - ChiSqSelector, QuantileDiscretizer, SQL transformer
> SPARK-6517 Bisecting K-Means clustering - Fast top-down clustering variant of K-Means
>
> API improvements
>
> ML Pipelines
>
> SPARK-6725 Pipeline persistence - Save/load for ML Pipelines, with partial coverage of spark.ml algorithms
> SPARK-5565 LDA in ML Pipelines - API for Latent Dirichlet Allocation in ML Pipelines
>
> R API
>
> SPARK-9836 R-like statistics for GLMs - (Partial) R-like stats for ordinary least squares via summary(model)
> SPARK-9681 Feature interactions in R formula - Interaction operator ":" in R formula
>
> Python API - Many improvements to the Python API to approach feature parity
>
> Misc improvements
>
> SPARK-7685, SPARK-9642 Instance weights for GLMs - Logistic and Linear Regression can take instance weights
> SPARK-10384, SPARK-10385 Univariate and bivariate statistics in DataFrames - Variance, stddev, correlations, etc.
> SPARK-10117 LIBSVM data source - LIBSVM as a SQL data source
>
> Documentation improvements
>
> SPARK-7751 @since versions - Documentation includes the initial version when classes and methods were added
> SPARK-11337 Testable example code - Automated testing for code in user guide examples
>
> Deprecations
>
> In spark.mllib.clustering.KMeans, the "runs" parameter has been deprecated.
> In spark.ml.classification.LogisticRegressionModel and spark.ml.regression.LinearRegressionModel, the "weights" field has been deprecated in favor of the new name "coefficients." This helps disambiguate from instance (row) weights given to algorithms.
>
> Changes of behavior
>
> spark.mllib.tree.GradientBoostedTrees validationTol has changed semantics in 1.6. Previously, it was a threshold for absolute change in error. Now, it resembles the behavior of GradientDescent convergenceTol: for large errors, it uses relative error (relative to the previous error); for small errors (< 0.01), it uses absolute error.
> spark.ml.feature.RegexTokenizer: Previously, it did not convert strings to lowercase before tokenizing. Now, it converts to lowercase by default, with an option not to. This matches the behavior of the simpler Tokenizer transformer.
> Spark SQL's partition discovery has been changed to only discover partition directories that are children of the given path (i.e. if path="/my/data/x=1", then x=1 will no longer be considered a partition, but only children of x=1 will be). This behavior can be overridden by manually specifying the basePath that partition discovery should start with (SPARK-11678).
> When casting a value of an integral type to timestamp (e.g. casting a long value to timestamp), the value is treated as being in seconds instead of milliseconds (SPARK-11724).
> With the improved query planner for queries having distinct aggregations (SPARK-9241), the plan of a query having a single distinct aggregation has been changed to a more robust version. To switch back to the plan generated by Spark 1.5's planner, please set spark.sql.specializeSingleDistinctAggPlanning to true (SPARK-12077).
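On the MLlib side, a small sketch touching the pipeline persistence (SPARK-6725) and LIBSVM data source (SPARK-10117) items quoted above; the file paths are placeholders and the single-stage pipeline is deliberately trivial:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.ml.{Pipeline, PipelineStage}
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.sql.SQLContext

object MlSmokeTest {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("rc3-ml-smoke").setMaster("local[2]"))
    val sqlContext = new SQLContext(sc)

    // SPARK-10117: LIBSVM files load as a DataFrame with "label" and "features" columns.
    val data = sqlContext.read.format("libsvm").load("/tmp/sample_libsvm_data.txt")

    // SPARK-6725: save an unfitted pipeline, load it back, then fit it.
    val lr = new LogisticRegression().setMaxIter(10)
    val pipeline = new Pipeline().setStages(Array[PipelineStage](lr))
    pipeline.save("/tmp/rc3-pipeline")
    val model = Pipeline.load("/tmp/rc3-pipeline").fit(data)
    println(model.stages.length)

    sc.stop()
  }
}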
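And for the behavior changes quoted just above: a quick way to check the new basePath handling in partition discovery (SPARK-11678) and the seconds-based integral-to-timestamp cast (SPARK-11724); the /my/data layout is made up:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

object BehaviorChangeSmokeTest {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("rc3-behavior-smoke").setMaster("local[2]"))
    val sqlContext = new SQLContext(sc)

    // SPARK-11678: with path=/my/data/x=1, x is no longer inferred as a partition
    // column unless basePath points at the partitioned root.
    val partitioned = sqlContext.read
      .option("basePath", "/my/data")
      .parquet("/my/data/x=1")
    partitioned.printSchema() // should still show the x column

    // SPARK-11724: the long literal is now interpreted as seconds, not milliseconds.
    sqlContext.sql("SELECT CAST(1450000000 AS TIMESTAMP)").show()

    sc.stop()
  }
}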
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org