+1

On 24 Dec 2015 22:01, "Vinay Shukla" <vinayshu...@gmail.com> wrote:
> +1
> Tested on HDP 2.3, YARN cluster mode, spark-shell
>
> On Wed, Dec 23, 2015 at 6:14 AM, Allen Zhang <allenzhang...@126.com> wrote:
>
>> +1 (non-binding)
>>
>> I have just built a new binary tarball and manually tested
>> spark.yarn.am.nodeLabelExpression and
>> spark.yarn.executor.nodeLabelExpression; the results are as expected.
>>
>> At 2015-12-23 21:44:08, "Iulian Dragoș" <iulian.dra...@typesafe.com> wrote:
>>
>> +1 (non-binding)
>>
>> Tested Mesos deployments (client and cluster mode, fine-grained and
>> coarse-grained). Things look good
>> <https://ci.typesafe.com/view/Spark/job/mit-docker-test-ref/8/console>.
>>
>> iulian
>>
>> On Wed, Dec 23, 2015 at 2:35 PM, Sean Owen <so...@cloudera.com> wrote:
>>
>>> Docker integration tests still fail for Mark and me, and should
>>> probably be disabled:
>>> https://issues.apache.org/jira/browse/SPARK-12426
>>>
>>> ... but if anyone else successfully runs these (and I assume Jenkins
>>> does), then this is not a blocker.
>>>
>>> I'm having intermittent trouble with other tests passing, but nothing
>>> unusual. Sigs and hashes are OK.
>>>
>>> We have 30 issues marked as fixed for 1.6.1. All but those resolved in
>>> the last 24 hours or so should be marked as fixed for 1.6.0, right? I
>>> can touch that up.
>>>
>>> On Tue, Dec 22, 2015 at 8:10 PM, Michael Armbrust
>>> <mich...@databricks.com> wrote:
>>> > Please vote on releasing the following candidate as Apache Spark
>>> > version 1.6.0!
>>> >
>>> > The vote is open until Friday, December 25, 2015 at 18:00 UTC and
>>> > passes if a majority of at least 3 +1 PMC votes are cast.
>>> >
>>> > [ ] +1 Release this package as Apache Spark 1.6.0
>>> > [ ] -1 Do not release this package because ...
>>> >
>>> > To learn more about Apache Spark, please see http://spark.apache.org/
>>> >
>>> > The tag to be voted on is v1.6.0-rc4
>>> > (4062cda3087ae42c6c3cb24508fc1d3a931accdf)
>>> >
>>> > The release files, including signatures, digests, etc. can be found at:
>>> > http://people.apache.org/~pwendell/spark-releases/spark-1.6.0-rc4-bin/
>>> >
>>> > Release artifacts are signed with the following key:
>>> > https://people.apache.org/keys/committer/pwendell.asc
>>> >
>>> > The staging repository for this release can be found at:
>>> > https://repository.apache.org/content/repositories/orgapachespark-1176/
>>> >
>>> > The test repository (versioned as v1.6.0-rc4) for this release can
>>> > be found at:
>>> > https://repository.apache.org/content/repositories/orgapachespark-1175/
>>> >
>>> > The documentation corresponding to this release can be found at:
>>> > http://people.apache.org/~pwendell/spark-releases/spark-1.6.0-rc4-docs/
>>> >
>>> > =======================================
>>> > == How can I help test this release? ==
>>> > =======================================
>>> > If you are a Spark user, you can help us test this release by taking
>>> > an existing Spark workload, running it on this release candidate, and
>>> > then reporting any regressions.
>>> >
>>> > ================================================
>>> > == What justifies a -1 vote for this release? ==
>>> > ================================================
>>> > This vote is happening towards the end of the 1.6 QA period, so -1
>>> > votes should only occur for significant regressions from 1.5. Bugs
>>> > already present in 1.5, minor regressions, or bugs related to new
>>> > features will not block this release.
>>> >
>>> > ===============================================================
>>> > == What should happen to JIRA tickets still targeting 1.6.0? ==
>>> > ===============================================================
>>> > 1. It is OK for documentation patches to target 1.6.0 and still go
>>> > into branch-1.6, since documentation will be published separately
>>> > from the release.
>>> > 2. New features for non-alpha modules should target 1.7+.
>>> > 3. Non-blocker bug fixes should target 1.6.1 or 1.7.0, or drop the
>>> > target version.
>>> >
>>> > ==================================================
>>> > == Major changes to help you focus your testing ==
>>> > ==================================================
>>> >
>>> > Notable changes since 1.6 RC3
>>> >
>>> > - SPARK-12404 - Fix serialization error for Datasets with
>>> >   Timestamps/Arrays/Decimal
>>> > - SPARK-12218 - Fix incorrect pushdown of filters to Parquet
>>> > - SPARK-12395 - Fix join columns of outer join for DataFrame using
>>> > - SPARK-12413 - Fix Mesos HA
>>> >
>>> > Notable changes since 1.6 RC2
>>> >
>>> > - SPARK_VERSION has been set correctly
>>> > - SPARK-12199 - ML docs are publishing correctly
>>> > - SPARK-12345 - Mesos cluster mode has been fixed
>>> >
>>> > Notable changes since 1.6 RC1
>>> >
>>> > Spark Streaming
>>> >
>>> > - SPARK-2629 - trackStateByKey has been renamed to mapWithState (a
>>> >   sketch of the new API follows below)
>>> >
>>> > Spark SQL
>>> >
>>> > - SPARK-12165, SPARK-12189 - Fix bugs in eviction of storage memory
>>> >   by execution
>>> > - SPARK-12258 - Correct passing null into ScalaUDF
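To make the mapWithState rename above concrete, here is a minimal sketch
of a stateful word count against the 1.6 API. The socket host/port and
the checkpoint path are placeholders invented for the example, not part
of the release notes:

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, State, StateSpec, StreamingContext}

    object MapWithStateSketch {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("mapWithState-sketch").setMaster("local[2]")
        val ssc = new StreamingContext(conf, Seconds(1))
        ssc.checkpoint("/tmp/checkpoint") // mapWithState requires checkpointing

        // Toy input stream of (word, 1) pairs.
        val words = ssc.socketTextStream("localhost", 9999)
          .flatMap(_.split(" "))
          .map((_, 1))

        // Mapping function: fold each new value into per-key running state.
        val trackingFunc = (word: String, one: Option[Int], state: State[Int]) => {
          val sum = one.getOrElse(0) + state.getOption.getOrElse(0)
          state.update(sum)
          (word, sum)
        }

        words.mapWithState(StateSpec.function(trackingFunc)).print()

        ssc.start()
        ssc.awaitTermination()
      }
    }

Unlike updateStateByKey, only keys that receive new data in a batch are
touched, which is where the performance win comes from.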
>>> >
>>> > Notable Features Since 1.5
>>> >
>>> > Spark SQL
>>> >
>>> > SPARK-11787 Parquet Performance - Improve Parquet scan performance
>>> > when using flat schemas.
>>> > SPARK-10810 Session Management - Isolated default database (i.e. USE
>>> > mydb) even on shared clusters.
>>> > SPARK-9999 Dataset API - A type-safe API (similar to RDDs) that
>>> > performs many operations on serialized binary data and code
>>> > generation (i.e. Project Tungsten). See the sketch after this list.
>>> > SPARK-10000 Unified Memory Management - Shared memory for execution
>>> > and caching instead of exclusive division of the regions.
>>> > SPARK-11197 SQL Queries on Files - Concise syntax for running SQL
>>> > queries over files of any supported format without registering a
>>> > table.
>>> > SPARK-11745 Reading non-standard JSON files - Added options to read
>>> > non-standard JSON files (e.g. single quotes, unquoted attributes).
>>> > SPARK-10412 Per-operator Metrics for SQL Execution - Display
>>> > statistics on a per-operator basis for memory usage and spilled data
>>> > size.
>>> > SPARK-11329 Star (*) expansion for StructTypes - Makes it easier to
>>> > nest and unnest arbitrary numbers of columns.
>>> > SPARK-10917, SPARK-11149 In-memory Columnar Cache Performance -
>>> > Significant (up to 14x) speedup when caching data that contains
>>> > complex types in DataFrames or SQL.
>>> > SPARK-11111 Fast null-safe joins - Joins using null-safe equality
>>> > (<=>) will now execute using SortMergeJoin instead of computing a
>>> > Cartesian product.
>>> > SPARK-11389 SQL Execution Using Off-Heap Memory - Support for
>>> > configuring query execution to occur using off-heap memory to avoid
>>> > GC overhead.
>>> > SPARK-10978 Datasource API Avoid Double Filter - When implementing a
>>> > datasource with filter pushdown, developers can now tell Spark SQL
>>> > to avoid double-evaluating a pushed-down filter.
>>> > SPARK-4849 Advanced Layout of Cached Data - Storing partitioning and
>>> > ordering schemes in in-memory table scan, and adding distributeBy
>>> > and localSort to the DataFrame API.
>>> > SPARK-9858 Adaptive query execution - Initial support for
>>> > automatically selecting the number of reducers for joins and
>>> > aggregations.
>>> > SPARK-9241 Improved query planner for queries having distinct
>>> > aggregations - Query plans of distinct aggregations are more robust
>>> > when distinct columns have high cardinality.
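As an illustration of the Dataset API above (SPARK-9999), a minimal
sketch against the 1.6 API; the Person case class and the commented-out
file path are invented for the example:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.SQLContext

    case class Person(name: String, age: Long)

    object DatasetSketch {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(
          new SparkConf().setAppName("dataset-sketch").setMaster("local[2]"))
        val sqlContext = new SQLContext(sc)
        import sqlContext.implicits._

        // A typed Dataset built from a local collection.
        val people = Seq(Person("alice", 30), Person("bob", 25)).toDS()

        // Checked against Person at compile time; executed over
        // serialized binary data with generated code (Tungsten).
        people.filter(_.age >= 18).map(_.name).show()

        // SPARK-11197: SQL directly over files, no table registration.
        // sqlContext.sql("SELECT * FROM parquet.`/path/to/people.parquet`").show()
      }
    }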
>>> >
>>> > Spark Streaming
>>> >
>>> > API Updates
>>> >
>>> > SPARK-2629 New improved state management - mapWithState - a DStream
>>> > transformation for stateful stream processing; supersedes
>>> > updateStateByKey in functionality and performance.
>>> > SPARK-11198 Kinesis record deaggregation - Kinesis streams have been
>>> > upgraded to use KCL 1.4.0 and support transparent deaggregation of
>>> > KPL-aggregated records.
>>> > SPARK-10891 Kinesis message handler function - Allows an arbitrary
>>> > function to be applied to a Kinesis record in the Kinesis receiver,
>>> > to customize what data is stored in memory.
>>> > SPARK-6328 Python Streaming Listener API - Get streaming statistics
>>> > (scheduling delays, batch processing times, etc.) from Python.
>>> >
>>> > UI Improvements
>>> >
>>> > Made failures visible in the streaming tab, in the timelines, batch
>>> > list, and batch details page.
>>> > Made output operations visible in the streaming tab as progress bars.
>>> >
>>> > MLlib
>>> >
>>> > New algorithms/models
>>> >
>>> > SPARK-8518 Survival analysis - Log-linear model for survival
>>> > analysis.
>>> > SPARK-9834 Normal equation for least squares - Normal equation
>>> > solver, providing R-like model summary statistics.
>>> > SPARK-3147 Online hypothesis testing - A/B testing in the Spark
>>> > Streaming framework.
>>> > SPARK-9930 New feature transformers - ChiSqSelector,
>>> > QuantileDiscretizer, SQL transformer.
>>> > SPARK-6517 Bisecting K-Means clustering - Fast top-down clustering
>>> > variant of K-Means. See the sketch after this section.
>>> >
>>> > API improvements
>>> >
>>> > ML Pipelines
>>> >
>>> > SPARK-6725 Pipeline persistence - Save/load for ML Pipelines, with
>>> > partial coverage of spark.ml algorithms.
>>> > SPARK-5565 LDA in ML Pipelines - API for Latent Dirichlet Allocation
>>> > in ML Pipelines.
>>> >
>>> > R API
>>> >
>>> > SPARK-9836 R-like statistics for GLMs - (Partial) R-like stats for
>>> > ordinary least squares via summary(model).
>>> > SPARK-9681 Feature interactions in R formula - Interaction operator
>>> > ":" in R formula.
>>> >
>>> > Python API - Many improvements to the Python API to approach feature
>>> > parity.
>>> >
>>> > Misc improvements
>>> >
>>> > SPARK-7685, SPARK-9642 Instance weights for GLMs - Logistic and
>>> > Linear Regression can take instance weights.
>>> > SPARK-10384, SPARK-10385 Univariate and bivariate statistics in
>>> > DataFrames - Variance, stddev, correlations, etc.
>>> > SPARK-10117 LIBSVM data source - LIBSVM as a SQL data source.
>>> >
>>> > Documentation improvements
>>> >
>>> > SPARK-7751 @since versions - Documentation includes the initial
>>> > version when classes and methods were added.
>>> > SPARK-11337 Testable example code - Automated testing for code in
>>> > user guide examples.
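For the Bisecting K-Means entry above (SPARK-6517), a rough sketch
against the 1.6 spark.mllib API; the toy 2-D points are invented for the
example:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.mllib.clustering.BisectingKMeans
    import org.apache.spark.mllib.linalg.Vectors

    object BisectingKMeansSketch {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(
          new SparkConf().setAppName("bkm-sketch").setMaster("local[2]"))

        // Toy 2-D points; real input would come from a data source.
        val data = sc.parallelize(Seq(
          Vectors.dense(0.1, 0.1), Vectors.dense(0.3, 0.3),
          Vectors.dense(10.1, 10.1), Vectors.dense(10.3, 10.3)))

        // Top-down (divisive) clustering: repeatedly bisect clusters
        // until k leaf clusters remain.
        val model = new BisectingKMeans().setK(2).run(data)
        model.clusterCenters.foreach(println)
        println(s"cost: ${model.computeCost(data)}")
      }
    }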
>>> >
>>> > Deprecations
>>> >
>>> > In spark.mllib.clustering.KMeans, the "runs" parameter has been
>>> > deprecated.
>>> > In spark.ml.classification.LogisticRegressionModel and
>>> > spark.ml.regression.LinearRegressionModel, the "weights" field has
>>> > been deprecated in favor of the new name "coefficients". This helps
>>> > disambiguate from instance (row) weights given to algorithms.
>>> >
>>> > Changes of behavior
>>> >
>>> > spark.mllib.tree.GradientBoostedTrees validationTol has changed
>>> > semantics in 1.6. Previously, it was a threshold for absolute change
>>> > in error. Now, it resembles the behavior of GradientDescent
>>> > convergenceTol: for large errors, it uses relative error (relative
>>> > to the previous error); for small errors (< 0.01), it uses absolute
>>> > error.
>>> > spark.ml.feature.RegexTokenizer: Previously, it did not convert
>>> > strings to lowercase before tokenizing. Now, it converts to
>>> > lowercase by default, with an option not to. This matches the
>>> > behavior of the simpler Tokenizer transformer.
>>> > Spark SQL's partition discovery has been changed to only discover
>>> > partition directories that are children of the given path (i.e. if
>>> > path="/my/data/x=1", then x=1 will no longer be considered a
>>> > partition, but only children of x=1 will be). This behavior can be
>>> > overridden by manually specifying the basePath that partition
>>> > discovery should start with (SPARK-11678). See the sketch at the
>>> > end of this thread.
>>> > When casting a value of an integral type to timestamp (e.g. casting
>>> > a long value to timestamp), the value is treated as being in seconds
>>> > instead of milliseconds (SPARK-11724).
>>> > With the improved query planner for queries having distinct
>>> > aggregations (SPARK-9241), the plan of a query having a single
>>> > distinct aggregation has been changed to a more robust version. To
>>> > switch back to the plan generated by Spark 1.5's planner, please set
>>> > spark.sql.specializeSingleDistinctAggPlanning to true (SPARK-12077).
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
>>> For additional commands, e-mail: dev-h...@spark.apache.org
>>
>> --
>> Iulian Dragos
>>
>> ------
>> Reactive Apps on the JVM
>> www.typesafe.com
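For the partition-discovery and timestamp-cast changes of behavior
listed above, a minimal sketch against the 1.6 API (the paths and the
literal epoch value are placeholders invented for the example):

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.SQLContext

    object BehaviorChangesSketch {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(
          new SparkConf().setAppName("behavior-sketch").setMaster("local[2]"))
        val sqlContext = new SQLContext(sc)

        // SPARK-11678: reading "/my/data/x=1" directly no longer treats
        // x=1 as a partition column; point basePath at the table root
        // to recover the old behavior.
        val df = sqlContext.read
          .option("basePath", "/my/data")
          .parquet("/my/data/x=1")
        df.printSchema() // includes the x partition column again

        // SPARK-11724: an integral value cast to timestamp is now read
        // as seconds since the epoch, not milliseconds.
        sqlContext.sql("SELECT CAST(1450000000 AS TIMESTAMP)").show()
      }
    }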