For me, mostly the same as before: tests are mostly passing, but I can never get the docker tests to pass. If anyone knows a special profile or package that needs to be enabled, I can try that and/or fix/document it. Just wondering if it's me.
I'm on Java 7 + Ubuntu 15.10, with -Pyarn -Phive -Phive-thriftserver -Phadoop-2.6.

On Wed, Dec 16, 2015 at 9:32 PM, Michael Armbrust <mich...@databricks.com> wrote:
> Please vote on releasing the following candidate as Apache Spark version 1.6.0!
>
> The vote is open until Saturday, December 19, 2015 at 18:00 UTC and passes if a majority of at least 3 +1 PMC votes are cast.
>
> [ ] +1 Release this package as Apache Spark 1.6.0
> [ ] -1 Do not release this package because ...
>
> To learn more about Apache Spark, please see http://spark.apache.org/
>
> The tag to be voted on is v1.6.0-rc3 (168c89e07c51fa24b0bb88582c739cec0acb44d7)
>
> The release files, including signatures, digests, etc. can be found at:
> http://people.apache.org/~pwendell/spark-releases/spark-1.6.0-rc3-bin/
>
> Release artifacts are signed with the following key:
> https://people.apache.org/keys/committer/pwendell.asc
>
> The staging repository for this release can be found at:
> https://repository.apache.org/content/repositories/orgapachespark-1174/
>
> The test repository (versioned as v1.6.0-rc3) for this release can be found at:
> https://repository.apache.org/content/repositories/orgapachespark-1173/
>
> The documentation corresponding to this release can be found at:
> http://people.apache.org/~pwendell/spark-releases/spark-1.6.0-rc3-docs/
>
> =======================================
> == How can I help test this release? ==
> =======================================
> If you are a Spark user, you can help us test this release by taking an existing Spark workload, running it on this release candidate, and reporting any regressions.
>
> ================================================
> == What justifies a -1 vote for this release? ==
> ================================================
> This vote is happening towards the end of the 1.6 QA period, so -1 votes should only occur for significant regressions from 1.5. Bugs already present in 1.5, minor regressions, or bugs related to new features will not block this release.
>
> ===============================================================
> == What should happen to JIRA tickets still targeting 1.6.0? ==
> ===============================================================
> 1. It is OK for documentation patches to target 1.6.0 and still go into branch-1.6, since documentation will be published separately from the release.
> 2. New features for non-alpha modules should target 1.7+.
> 3. Non-blocker bug fixes should target 1.6.1 or 1.7.0, or drop the target version.
>
> ==================================================
> == Major changes to help you focus your testing ==
> ==================================================
>
> Notable changes since 1.6 RC2
>
> - SPARK_VERSION has been set correctly
> - SPARK-12199 ML Docs are publishing correctly
> - SPARK-12345 Mesos cluster mode has been fixed
>
> Notable changes since 1.6 RC1
>
> Spark Streaming
>
> SPARK-2629 trackStateByKey has been renamed to mapWithState
>
> Spark SQL
>
> SPARK-12165, SPARK-12189 Fix bugs in eviction of storage memory by execution.
> SPARK-12258 Correct passing null into ScalaUDF.
>
> Notable Features Since 1.5
>
> Spark SQL
>
> SPARK-11787 Parquet Performance - Improve Parquet scan performance when using flat schemas.
> SPARK-10810 Session Management - Isolated default database (i.e. USE mydb), even on shared clusters.
> SPARK-9999 Dataset API - A type-safe API (similar to RDDs) that performs many operations on serialized binary data and code generation (i.e. Project Tungsten).
> SPARK-10000 Unified Memory Management - Shared memory for execution and caching instead of exclusive division of the regions.
> SPARK-11197 SQL Queries on Files - Concise syntax for running SQL queries over files of any supported format without registering a table.
> SPARK-11745 Reading non-standard JSON files - Added options to read non-standard JSON files (e.g. single quotes, unquoted attributes).
> SPARK-10412 Per-operator Metrics for SQL Execution - Display statistics on a per-operator basis for memory usage and spilled data size.
> SPARK-11329 Star (*) expansion for StructTypes - Makes it easier to nest and unnest arbitrary numbers of columns.
> SPARK-10917, SPARK-11149 In-memory Columnar Cache Performance - Significant (up to 14x) speedup when caching data that contains complex types in DataFrames or SQL.
> SPARK-11111 Fast null-safe joins - Joins using null-safe equality (<=>) will now execute using SortMergeJoin instead of computing a cartesian product.
> SPARK-11389 SQL Execution Using Off-Heap Memory - Support for configuring query execution to occur using off-heap memory to avoid GC overhead.
> SPARK-10978 Datasource API Avoid Double Filter - When implementing a datasource with filter pushdown, developers can now tell Spark SQL to avoid double-evaluating a pushed-down filter.
> SPARK-4849 Advanced Layout of Cached Data - Storing partitioning and ordering schemes in the in-memory table scan, and adding distributeBy and localSort to the DataFrame API.
> SPARK-9858 Adaptive query execution - Initial support for automatically selecting the number of reducers for joins and aggregations.
> SPARK-9241 Improved query planner for queries having distinct aggregations - Query plans of distinct aggregations are more robust when distinct columns have high cardinality.
>
> Spark Streaming
>
> API Updates
>
> SPARK-2629 New improved state management - mapWithState - a DStream transformation for stateful stream processing, supersedes updateStateByKey in functionality and performance.
> SPARK-11198 Kinesis record deaggregation - Kinesis streams have been upgraded to use KCL 1.4.0 and support transparent deaggregation of KPL-aggregated records.
> SPARK-10891 Kinesis message handler function - Allows an arbitrary function to be applied to a Kinesis record in the Kinesis receiver to customize what data is stored in memory.
> SPARK-6328 Python Streaming Listener API - Get streaming statistics (scheduling delays, batch processing times, etc.) in streaming.
>
> UI Improvements
>
> Made failures visible in the streaming tab, in the timelines, batch list, and batch details page.
> Made output operations visible in the streaming tab as progress bars.
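For anyone else poking at the SQL side of the RC, here is a minimal sketch of how the Dataset API (SPARK-9999) and the SQL-directly-on-files syntax (SPARK-11197) quoted above could be exercised; the case class, app name, and parquet path are just placeholders, not anything from the release notes:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

// Hypothetical record type used only for this smoke test.
case class Person(name: String, age: Long)

object DatasetSmokeTest {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("rc3-sql-smoke").setMaster("local[2]"))
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._

    // SPARK-9999: type-safe, RDD-like operations backed by Tungsten binary encoding.
    val ds = Seq(Person("alice", 30L), Person("bob", 25L)).toDS()
    ds.filter(_.age >= 18).map(_.name).collect().foreach(println)

    // SPARK-11197: query a file of a supported format without registering a table
    // (assumes a Parquet file already exists at this placeholder path).
    sqlContext.sql("SELECT * FROM parquet.`/tmp/people.parquet`").show()

    sc.stop()
  }
}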
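Similarly, a rough sketch for the renamed mapWithState API (SPARK-2629) from the streaming section above; the socket source, port, and checkpoint directory are placeholders:

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, State, StateSpec, StreamingContext}

object MapWithStateSmokeTest {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("rc3-streaming-smoke").setMaster("local[2]")
    val ssc = new StreamingContext(conf, Seconds(1))
    ssc.checkpoint("/tmp/rc3-checkpoint") // required for stateful operations

    val words = ssc.socketTextStream("localhost", 9999)
      .flatMap(_.split(" "))
      .map(word => (word, 1))

    // Keep a running count per word in managed state.
    val mappingFunc = (word: String, one: Option[Int], state: State[Int]) => {
      val sum = one.getOrElse(0) + state.getOption.getOrElse(0)
      state.update(sum)
      (word, sum)
    }

    words.mapWithState(StateSpec.function(mappingFunc)).print()

    ssc.start()
    ssc.awaitTermination()
  }
}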
>
> MLlib
>
> New algorithms/models
>
> SPARK-8518 Survival analysis - Log-linear model for survival analysis
> SPARK-9834 Normal equation for least squares - Normal equation solver, providing R-like model summary statistics
> SPARK-3147 Online hypothesis testing - A/B testing in the Spark Streaming framework
> SPARK-9930 New feature transformers - ChiSqSelector, QuantileDiscretizer, SQL transformer
> SPARK-6517 Bisecting K-Means clustering - Fast top-down clustering variant of K-Means
>
> API improvements
>
> ML Pipelines
>
> SPARK-6725 Pipeline persistence - Save/load for ML Pipelines, with partial coverage of spark.ml algorithms
> SPARK-5565 LDA in ML Pipelines - API for Latent Dirichlet Allocation in ML Pipelines
>
> R API
>
> SPARK-9836 R-like statistics for GLMs - (Partial) R-like stats for ordinary least squares via summary(model)
> SPARK-9681 Feature interactions in R formula - Interaction operator ":" in R formula
>
> Python API - Many improvements to the Python API to approach feature parity
>
> Misc improvements
>
> SPARK-7685, SPARK-9642 Instance weights for GLMs - Logistic and Linear Regression can take instance weights
> SPARK-10384, SPARK-10385 Univariate and bivariate statistics in DataFrames - Variance, stddev, correlations, etc.
> SPARK-10117 LIBSVM data source - LIBSVM as a SQL data source
>
> Documentation improvements
>
> SPARK-7751 @since versions - Documentation includes the initial version when classes and methods were added
> SPARK-11337 Testable example code - Automated testing for code in user guide examples
>
> Deprecations
>
> In spark.mllib.clustering.KMeans, the "runs" parameter has been deprecated.
> In spark.ml.classification.LogisticRegressionModel and spark.ml.regression.LinearRegressionModel, the "weights" field has been deprecated in favor of the new name "coefficients." This helps disambiguate from instance (row) weights given to algorithms.
>
> Changes of behavior
>
> spark.mllib.tree.GradientBoostedTrees validationTol has changed semantics in 1.6. Previously, it was a threshold for absolute change in error. Now, it resembles the behavior of GradientDescent convergenceTol: for large errors, it uses relative error (relative to the previous error); for small errors (< 0.01), it uses absolute error.
> spark.ml.feature.RegexTokenizer: Previously, it did not convert strings to lowercase before tokenizing. Now, it converts to lowercase by default, with an option not to. This matches the behavior of the simpler Tokenizer transformer.
> Spark SQL's partition discovery has been changed to only discover partition directories that are children of the given path (i.e. if path="/my/data/x=1", then x=1 will no longer be considered a partition, but only children of x=1 will be). This behavior can be overridden by manually specifying the basePath that partition discovery should start with (SPARK-11678).
> When casting a value of an integral type to timestamp (e.g. casting a long value to timestamp), the value is treated as being in seconds instead of milliseconds (SPARK-11724).
> With the improved query planner for queries having distinct aggregations (SPARK-9241), the plan of a query having a single distinct aggregation has been changed to a more robust version. To switch back to the plan generated by Spark 1.5's planner, please set spark.sql.specializeSingleDistinctAggPlanning to true (SPARK-12077).
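On the MLlib side, a small sketch touching the pipeline persistence (SPARK-6725) and LIBSVM data source (SPARK-10117) items quoted above; the file paths are placeholders and the single-stage pipeline is deliberately trivial:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.ml.{Pipeline, PipelineStage}
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.sql.SQLContext

object MlSmokeTest {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("rc3-ml-smoke").setMaster("local[2]"))
    val sqlContext = new SQLContext(sc)

    // SPARK-10117: LIBSVM files load as a DataFrame with "label" and "features" columns.
    val data = sqlContext.read.format("libsvm").load("/tmp/sample_libsvm_data.txt")

    // SPARK-6725: save an unfitted pipeline, load it back, then fit it.
    val lr = new LogisticRegression().setMaxIter(10)
    val pipeline = new Pipeline().setStages(Array[PipelineStage](lr))
    pipeline.save("/tmp/rc3-pipeline")
    val model = Pipeline.load("/tmp/rc3-pipeline").fit(data)
    println(model.stages.length)

    sc.stop()
  }
}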
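And for the behavior changes quoted just above: a quick way to check the new basePath handling in partition discovery (SPARK-11678) and the seconds-based integral-to-timestamp cast (SPARK-11724); the /my/data layout is made up:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

object BehaviorChangeSmokeTest {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("rc3-behavior-smoke").setMaster("local[2]"))
    val sqlContext = new SQLContext(sc)

    // SPARK-11678: with path=/my/data/x=1, x is no longer inferred as a partition
    // column unless basePath points at the partitioned root.
    val partitioned = sqlContext.read
      .option("basePath", "/my/data")
      .parquet("/my/data/x=1")
    partitioned.printSchema() // should still show the x column

    // SPARK-11724: the long literal is now interpreted as seconds, not milliseconds.
    sqlContext.sql("SELECT CAST(1450000000 AS TIMESTAMP)").show()

    sc.stop()
  }
}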
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org