In order to facilitate community testing of Spark 1.6.0, I'm excited to announce the availability of an early preview of the release. This is not a release candidate, so there is no voting involved. However, it'd be awesome if community members can start testing with this preview package and report any problems they encounter.
This preview package contains all the commits to branch-1.6 <https://github.com/apache/spark/tree/branch-1.6> till commit 308381420f51b6da1007ea09a02d740613a226e0 <https://github.com/apache/spark/tree/v1.6.0-preview2>.

The staging maven repository for this preview build can be found here:
https://repository.apache.org/content/repositories/orgapachespark-1162

Binaries for this preview build can be found here:
http://people.apache.org/~pwendell/spark-releases/spark-v1.6.0-preview2-bin/

A build of the docs can also be found here:
http://people.apache.org/~pwendell/spark-releases/spark-v1.6.0-preview2-docs/

The full change log for this release can be found on JIRA <https://issues.apache.org/jira/browse/SPARK-11908?jql=project%20%3D%20SPARK%20AND%20fixVersion%20%3D%201.6.0>.

*== How can you help? ==*

If you are a Spark user, you can help us test this release by taking a Spark workload and running it on this preview release, then reporting any regressions.

*== Major Features ==*

When testing, we'd appreciate it if users could focus on areas that have changed in this release. Some notable new features include:

SPARK-11787 <https://issues.apache.org/jira/browse/SPARK-11787> *Parquet Performance* - Improve Parquet scan performance when using flat schemas.
SPARK-10810 <https://issues.apache.org/jira/browse/SPARK-10810> *Session Management* - Multiple users of the thrift (JDBC/ODBC) server now have isolated sessions, including their own default database (i.e. USE mydb), even on shared clusters.
SPARK-9999 <https://issues.apache.org/jira/browse/SPARK-9999> *Dataset API* - A new, experimental type-safe API (similar to RDDs) that performs many operations on serialized binary data and code generation (i.e. Project Tungsten).
SPARK-10000 <https://issues.apache.org/jira/browse/SPARK-10000> *Unified Memory Management* - Shared memory for execution and caching instead of exclusive division of the regions.
SPARK-10978 <https://issues.apache.org/jira/browse/SPARK-10978> *Datasource API Avoid Double Filter* - When implementing a datasource with filter pushdown, developers can now tell Spark SQL to avoid double evaluating a pushed-down filter.
SPARK-2629 <https://issues.apache.org/jira/browse/SPARK-2629> *New improved state management* - trackStateByKey - a DStream transformation for stateful stream processing, supersedes updateStateByKey in functionality and performance.

Happy testing!

Michael