+1

On Fri, Dec 25, 2015 at 8:31 PM, vaquar khan <vaquar.k...@gmail.com> wrote:
> +1
>
> On 24 Dec 2015 22:01, "Vinay Shukla" <vinayshu...@gmail.com> wrote:
>>
>> +1
>> Tested on HDP 2.3, YARN cluster mode, spark-shell
>>
>> On Wed, Dec 23, 2015 at 6:14 AM, Allen Zhang <allenzhang...@126.com> wrote:
>>>
>>> +1 (non-binding)
>>>
>>> I have just built a new binary tarball and manually tested am.nodelabelexpression and executor.nodelabelexpression; the results are as expected.
>>>
>>> At 2015-12-23 21:44:08, "Iulian Dragoș" <iulian.dra...@typesafe.com> wrote:
>>>
>>> +1 (non-binding)
>>>
>>> Tested Mesos deployments (client and cluster mode, fine-grained and coarse-grained). Things look good <https://ci.typesafe.com/view/Spark/job/mit-docker-test-ref/8/console>.
>>>
>>> iulian
>>>
>>> On Wed, Dec 23, 2015 at 2:35 PM, Sean Owen <so...@cloudera.com> wrote:
>>>>
>>>> Docker integration tests still fail for Mark and me, and should probably be disabled: https://issues.apache.org/jira/browse/SPARK-12426
>>>>
>>>> ... but if anyone else successfully runs these (and I assume Jenkins does), then it's not a blocker.
>>>>
>>>> I'm having intermittent trouble with other tests passing, but nothing unusual.
>>>> Sigs and hashes are OK.
>>>>
>>>> We have 30 issues fixed for 1.6.1. All but those resolved in the last 24 hours or so should be fixed for 1.6.0, right? I can touch that up.
>>>>
>>>> On Tue, Dec 22, 2015 at 8:10 PM, Michael Armbrust <mich...@databricks.com> wrote:
>>>> > Please vote on releasing the following candidate as Apache Spark version 1.6.0!
>>>> >
>>>> > The vote is open until Friday, December 25, 2015 at 18:00 UTC and passes if a majority of at least 3 +1 PMC votes are cast.
>>>> >
>>>> > [ ] +1 Release this package as Apache Spark 1.6.0
>>>> > [ ] -1 Do not release this package because ...
>>>> >
>>>> > To learn more about Apache Spark, please see http://spark.apache.org/
>>>> >
>>>> > The tag to be voted on is v1.6.0-rc4 (4062cda3087ae42c6c3cb24508fc1d3a931accdf)
>>>> >
>>>> > The release files, including signatures, digests, etc. can be found at:
>>>> > http://people.apache.org/~pwendell/spark-releases/spark-1.6.0-rc4-bin/
>>>> >
>>>> > Release artifacts are signed with the following key:
>>>> > https://people.apache.org/keys/committer/pwendell.asc
>>>> >
>>>> > The staging repository for this release can be found at:
>>>> > https://repository.apache.org/content/repositories/orgapachespark-1176/
>>>> >
>>>> > The test repository (versioned as v1.6.0-rc4) for this release can be found at:
>>>> > https://repository.apache.org/content/repositories/orgapachespark-1175/
>>>> >
>>>> > The documentation corresponding to this release can be found at:
>>>> > http://people.apache.org/~pwendell/spark-releases/spark-1.6.0-rc4-docs/
>>>> >
>>>> > =======================================
>>>> > == How can I help test this release? ==
>>>> > =======================================
>>>> > If you are a Spark user, you can help us test this release by taking an existing Spark workload, running it on this release candidate, and reporting any regressions.
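For anyone who wants to compile an existing sbt-based workload against the staged artifacts, here is a minimal build.sbt sketch (not an official recipe). The repository URL is taken from the vote email above; the "1.6.0" version string and the %% Scala-suffix handling are assumptions, so adjust them to match your build and whatever versions the staging repository actually publishes:

  // Minimal sketch: resolve Spark from the RC4 staging repository and build
  // an existing workload against it. Version string "1.6.0" is an assumption.
  resolvers += "spark-1.6.0-rc4-staging" at
    "https://repository.apache.org/content/repositories/orgapachespark-1176/"

  libraryDependencies ++= Seq(
    "org.apache.spark" %% "spark-core" % "1.6.0" % "provided",
    "org.apache.spark" %% "spark-sql"  % "1.6.0" % "provided"
  )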
>>>> >
>>>> > ================================================
>>>> > == What justifies a -1 vote for this release? ==
>>>> > ================================================
>>>> > This vote is happening towards the end of the 1.6 QA period, so -1 votes should only occur for significant regressions from 1.5. Bugs already present in 1.5, minor regressions, or bugs related to new features will not block this release.
>>>> >
>>>> > ===============================================================
>>>> > == What should happen to JIRA tickets still targeting 1.6.0? ==
>>>> > ===============================================================
>>>> > 1. It is OK for documentation patches to target 1.6.0 and still go into branch-1.6, since documentation will be published separately from the release.
>>>> > 2. New features for non-alpha modules should target 1.7+.
>>>> > 3. Non-blocker bug fixes should target 1.6.1 or 1.7.0, or drop the target version.
>>>> >
>>>> > ==================================================
>>>> > == Major changes to help you focus your testing ==
>>>> > ==================================================
>>>> >
>>>> > Notable changes since 1.6 RC3
>>>> >
>>>> > - SPARK-12404 - Fix serialization error for Datasets with Timestamps/Arrays/Decimal
>>>> > - SPARK-12218 - Fix incorrect pushdown of filters to parquet
>>>> > - SPARK-12395 - Fix join columns of outer join for DataFrame using
>>>> > - SPARK-12413 - Fix Mesos HA
>>>> >
>>>> > Notable changes since 1.6 RC2
>>>> >
>>>> > - SPARK_VERSION has been set correctly
>>>> > - SPARK-12199 ML Docs are publishing correctly
>>>> > - SPARK-12345 Mesos cluster mode has been fixed
>>>> >
>>>> > Notable changes since 1.6 RC1
>>>> >
>>>> > Spark Streaming
>>>> >
>>>> > SPARK-2629 trackStateByKey has been renamed to mapWithState
>>>> >
>>>> > Spark SQL
>>>> >
>>>> > SPARK-12165 SPARK-12189 Fix bugs in eviction of storage memory by execution.
>>>> > SPARK-12258 Correct passing null into ScalaUDF
>>>> >
>>>> > Notable Features Since 1.5
>>>> >
>>>> > Spark SQL
>>>> >
>>>> > SPARK-11787 Parquet Performance - Improve Parquet scan performance when using flat schemas.
>>>> > SPARK-10810 Session Management - Isolated default database (i.e. USE mydb) even on shared clusters.
>>>> > SPARK-9999 Dataset API - A type-safe API (similar to RDDs) that performs many operations on serialized binary data and uses code generation (i.e. Project Tungsten); a short sketch follows this list.
>>>> > SPARK-10000 Unified Memory Management - Shared memory for execution and caching instead of exclusive division of the regions.
>>>> > SPARK-11197 SQL Queries on Files - Concise syntax for running SQL queries over files of any supported format without registering a table.
>>>> > SPARK-11745 Reading non-standard JSON files - Added options to read non-standard JSON files (e.g. single quotes, unquoted attributes)
>>>> > SPARK-10412 Per-operator Metrics for SQL Execution - Display statistics on a per-operator basis for memory usage and spilled data size.
>>>> > SPARK-11329 Star (*) expansion for StructTypes - Makes it easier to nest and unnest arbitrary numbers of columns
>>>> > SPARK-10917, SPARK-11149 In-memory Columnar Cache Performance - Significant (up to 14x) speed up when caching data that contains complex types in DataFrames or SQL.
>>>> > SPARK-11111 Fast null-safe joins - Joins using null-safe equality (<=>) will now execute using SortMergeJoin instead of computing a cartesian product.
>>>> > SPARK-11389 SQL Execution Using Off-Heap Memory - Support for configuring query execution to occur using off-heap memory to avoid GC overhead
>>>> > SPARK-10978 Datasource API Avoid Double Filter - When implementing a datasource with filter pushdown, developers can now tell Spark SQL to avoid double-evaluating a pushed-down filter.
>>>> > SPARK-4849 Advanced Layout of Cached Data - Storing partitioning and ordering schemes in in-memory table scan, and adding distributeBy and localSort to the DataFrame API
>>>> > SPARK-9858 Adaptive query execution - Initial support for automatically selecting the number of reducers for joins and aggregations.
>>>> > SPARK-9241 Improved query planner for queries having distinct aggregations - Query plans of distinct aggregations are more robust when distinct columns have high cardinality.
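To make the Dataset API and SQL-on-files items above concrete, here is a rough spark-shell sketch. It assumes the shell's sqlContext and its implicits; the Person case class, the sample values, and the parquet path are made up for illustration:

  import sqlContext.implicits._

  // Typed Dataset API (SPARK-9999): toDS() builds a Dataset[Person] whose
  // operations run over Tungsten's binary encoding but stay type-safe.
  case class Person(name: String, age: Long)
  val people = Seq(Person("Andy", 30), Person("Justin", 19)).toDS()
  people.filter(_.age > 21).map(_.name).show()

  // SQL directly over files without registering a table (SPARK-11197);
  // the path is only a placeholder.
  sqlContext.sql("SELECT * FROM parquet.`/path/to/events.parquet`").show()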
>>>> >
>>>> > Spark Streaming
>>>> >
>>>> > API Updates
>>>> >
>>>> > SPARK-2629 New improved state management - mapWithState - a DStream transformation for stateful stream processing that supersedes updateStateByKey in functionality and performance (a short sketch follows this list).
>>>> > SPARK-11198 Kinesis record deaggregation - Kinesis streams have been upgraded to use KCL 1.4.0 and support transparent deaggregation of KPL-aggregated records.
>>>> > SPARK-10891 Kinesis message handler function - Allows an arbitrary function to be applied to a Kinesis record in the Kinesis receiver to customize what data is stored in memory.
>>>> > SPARK-6328 Python Streaming Listener API - Get streaming statistics (scheduling delays, batch processing times, etc.) in streaming.
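A minimal sketch of mapWithState on a word-count style stream, assuming a DStream[String] named words already exists in a running StreamingContext; the mapping function is only illustrative:

  import org.apache.spark.streaming._

  // Keep a running count per word: the State holds the count seen so far,
  // each batch's new value is folded in, and the updated pair is emitted.
  val mappingFunc = (word: String, one: Option[Int], state: State[Int]) => {
    val sum = one.getOrElse(0) + state.getOption.getOrElse(0)
    state.update(sum)
    (word, sum)
  }

  val counts = words.map(w => (w, 1)).mapWithState(StateSpec.function(mappingFunc))
  counts.print()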
>>>> >
>>>> > UI Improvements
>>>> >
>>>> > Made failures visible in the streaming tab, in the timelines, batch list, and batch details page.
>>>> > Made output operations visible in the streaming tab as progress bars.
>>>> >
>>>> > MLlib
>>>> >
>>>> > New algorithms/models
>>>> >
>>>> > SPARK-8518 Survival analysis - Log-linear model for survival analysis
>>>> > SPARK-9834 Normal equation for least squares - Normal equation solver, providing R-like model summary statistics
>>>> > SPARK-3147 Online hypothesis testing - A/B testing in the Spark Streaming framework
>>>> > SPARK-9930 New feature transformers - ChiSqSelector, QuantileDiscretizer, SQL transformer
>>>> > SPARK-6517 Bisecting K-Means clustering - Fast top-down clustering variant of K-Means
>>>> >
>>>> > API improvements
>>>> >
>>>> > ML Pipelines
>>>> >
>>>> > SPARK-6725 Pipeline persistence - Save/load for ML Pipelines, with partial coverage of spark.ml algorithms
>>>> > SPARK-5565 LDA in ML Pipelines - API for Latent Dirichlet Allocation in ML Pipelines
>>>> >
>>>> > R API
>>>> >
>>>> > SPARK-9836 R-like statistics for GLMs - (Partial) R-like stats for ordinary least squares via summary(model)
>>>> > SPARK-9681 Feature interactions in R formula - Interaction operator ":" in R formula
>>>> >
>>>> > Python API - Many improvements to Python API to approach feature parity
>>>> >
>>>> > Misc improvements
>>>> >
>>>> > SPARK-7685, SPARK-9642 Instance weights for GLMs - Logistic and Linear Regression can take instance weights
>>>> > SPARK-10384, SPARK-10385 Univariate and bivariate statistics in DataFrames - Variance, stddev, correlations, etc.
>>>> > SPARK-10117 LIBSVM data source - LIBSVM as a SQL data source
>>>> >
>>>> > Documentation improvements
>>>> >
>>>> > SPARK-7751 @since versions - Documentation includes the initial version when classes and methods were added
>>>> > SPARK-11337 Testable example code - Automated testing for code in user guide examples
>>>> >
>>>> > Deprecations
>>>> >
>>>> > In spark.mllib.clustering.KMeans, the "runs" parameter has been deprecated.
>>>> > In spark.ml.classification.LogisticRegressionModel and spark.ml.regression.LinearRegressionModel, the "weights" field has been deprecated in favor of the new name "coefficients." This helps disambiguate it from instance (row) weights given to algorithms.
>>>> >
>>>> > Changes of behavior
>>>> >
>>>> > spark.mllib.tree.GradientBoostedTrees validationTol has changed semantics in 1.6. Previously, it was a threshold for absolute change in error. Now, it resembles the behavior of GradientDescent convergenceTol: for large errors, it uses relative error (relative to the previous error); for small errors (< 0.01), it uses absolute error.
>>>> > spark.ml.feature.RegexTokenizer: Previously, it did not convert strings to lowercase before tokenizing. Now, it converts to lowercase by default, with an option not to. This matches the behavior of the simpler Tokenizer transformer.
>>>> > Spark SQL's partition discovery has been changed to only discover partition directories that are children of the given path. (i.e. if path="/my/data/x=1" then x=1 will no longer be considered a partition but only children of x=1.) This behavior can be overridden by manually specifying the basePath that partition discovery should start with (SPARK-11678); see the sketch after this list.
>>>> > When casting a value of an integral type to timestamp (e.g. casting a long value to timestamp), the value is treated as being in seconds instead of milliseconds (SPARK-11724).
>>>> > With the improved query planner for queries having distinct aggregations (SPARK-9241), the plan of a query having a single distinct aggregation has been changed to a more robust version. To switch back to the plan generated by Spark 1.5's planner, please set spark.sql.specializeSingleDistinctAggPlanning to true (SPARK-12077).
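To make the partition-discovery, timestamp-cast, and planner changes above concrete, a small sketch against a SQLContext; all paths and values are placeholders:

  // Partition discovery (SPARK-11678): when reading a single partition
  // directory, set basePath to the table root so x=1 is still treated as a
  // partition column rather than part of the data path.
  val df = sqlContext.read
    .option("basePath", "/my/data")
    .parquet("/my/data/x=1")

  // Integral-to-timestamp casts (SPARK-11724) now interpret the value as
  // seconds since the epoch rather than milliseconds.
  sqlContext.sql("SELECT CAST(1450000000 AS TIMESTAMP)").show()

  // Falling back to the 1.5 planner for single distinct aggregations (SPARK-12077):
  sqlContext.setConf("spark.sql.specializeSingleDistinctAggPlanning", "true")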