0

Currently figuring out who is responsible for a regression I am seeing in some
user-code ScalaUDFs that make use of Timestamps: a NULL in a CSV file read in
via TestHive#registerTestTable now produces 1969-12-31 23:59:59.999999 instead
of null.
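For context, a rough sketch of the kind of UDF involved; the table name, column
name, and UDF body are placeholders for the user code being debugged, not taken
from it, and a spark-shell style session is assumed:

  import java.sql.Timestamp
  import org.apache.spark.sql.hive.test.TestHive

  // Hypothetical reproduction sketch: a Scala UDF over a TimestampType column
  // whose CSV source contains empty fields. On 1.5 the UDF saw null; on this
  // RC it appears to see 1969-12-31 23:59:59.999999 instead.
  val sqlContext = TestHive
  sqlContext.udf.register("describeTs", (ts: Timestamp) =>
    if (ts == null) "null" else ts.toString)

  // "events" stands in for a table registered from a CSV file via
  // TestHive#registerTestTable; the schema and registration details are
  // assumptions.
  sqlContext.sql("SELECT describeTs(event_time) FROM events").show()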
On Thu, Dec 3, 2015 at 1:57 PM, Sean Owen <so...@cloudera.com> wrote:
> Licenses and signature are all fine.
>
> Docker integration tests consistently fail for me with Java 7 / Ubuntu
> and "-Pyarn -Phadoop-2.6 -Phive -Phive-thriftserver"
>
> *** RUN ABORTED ***
>   java.lang.NoSuchMethodError:
>     org.apache.http.impl.client.HttpClientBuilder.setConnectionManagerShared(Z)Lorg/apache/http/impl/client/HttpClientBuilder;
>   at org.glassfish.jersey.apache.connector.ApacheConnector.<init>(ApacheConnector.java:240)
>   at org.glassfish.jersey.apache.connector.ApacheConnectorProvider.getConnector(ApacheConnectorProvider.java:115)
>   at org.glassfish.jersey.client.ClientConfig$State.initRuntime(ClientConfig.java:418)
>   at org.glassfish.jersey.client.ClientConfig$State.access$000(ClientConfig.java:88)
>   at org.glassfish.jersey.client.ClientConfig$State$3.get(ClientConfig.java:120)
>   at org.glassfish.jersey.client.ClientConfig$State$3.get(ClientConfig.java:117)
>   at org.glassfish.jersey.internal.util.collection.Values$LazyValueImpl.get(Values.java:340)
>   at org.glassfish.jersey.client.ClientConfig.getRuntime(ClientConfig.java:726)
>   at org.glassfish.jersey.client.ClientRequest.getConfiguration(ClientRequest.java:285)
>   at org.glassfish.jersey.client.JerseyInvocation.validateHttpMethodAndEntity(JerseyInvocation.java:126)
>
> I also get this failure consistently:
>
> DirectKafkaStreamSuite
> - offset recovery *** FAILED ***
>   recoveredOffsetRanges.forall(((or: (org.apache.spark.streaming.Time,
>     Array[org.apache.spark.streaming.kafka.OffsetRange])) =>
>     earlierOffsetRangesAsSets.contains(scala.Tuple2.apply[org.apache.spark.streaming.Time,
>     scala.collection.immutable.Set[org.apache.spark.streaming.kafka.OffsetRange]](or._1,
>     scala.this.Predef.refArrayOps[org.apache.spark.streaming.kafka.OffsetRange](or._2).toSet[org.apache.spark.streaming.kafka.OffsetRange]))))
>   was false Recovered ranges are not the same as the ones generated
>   (DirectKafkaStreamSuite.scala:301)
>
> On Wed, Dec 2, 2015 at 8:26 PM, Michael Armbrust <mich...@databricks.com>
> wrote:
> > Please vote on releasing the following candidate as Apache Spark version
> > 1.6.0!
> >
> > The vote is open until Saturday, December 5, 2015 at 21:00 UTC and passes
> > if a majority of at least 3 +1 PMC votes are cast.
> >
> > [ ] +1 Release this package as Apache Spark 1.6.0
> > [ ] -1 Do not release this package because ...
> >
> > To learn more about Apache Spark, please see http://spark.apache.org/
> >
> > The tag to be voted on is v1.6.0-rc1
> > (bf525845cef159d2d4c9f4d64e158f037179b5c4)
> >
> > The release files, including signatures, digests, etc. can be found at:
> > http://people.apache.org/~pwendell/spark-releases/spark-v1.6.0-rc1-bin/
> >
> > Release artifacts are signed with the following key:
> > https://people.apache.org/keys/committer/pwendell.asc
> >
> > The staging repository for this release can be found at:
> > https://repository.apache.org/content/repositories/orgapachespark-1165/
> >
> > The test repository (versioned as v1.6.0-rc1) for this release can be
> > found at:
> > https://repository.apache.org/content/repositories/orgapachespark-1164/
> >
> > The documentation corresponding to this release can be found at:
> > http://people.apache.org/~pwendell/spark-releases/spark-1.6.0-rc1-docs/
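For anyone compiling a test workload against the staging repository above, a
minimal sbt sketch; the resolver name and the selection of modules are my own
assumptions, not part of the vote email (the rc1-versioned test repository
would instead use the 1.6.0-rc1 version string):

  // build.sbt: resolve the 1.6.0 RC1 artifacts from the staging repository
  resolvers += "Apache Spark 1.6.0 RC1 staging" at
    "https://repository.apache.org/content/repositories/orgapachespark-1165/"

  scalaVersion := "2.10.5"  // 2.11 builds are also published

  libraryDependencies ++= Seq(
    "org.apache.spark" %% "spark-core" % "1.6.0",
    "org.apache.spark" %% "spark-sql"  % "1.6.0"
  )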
> >
> > =======================================
> > == How can I help test this release? ==
> > =======================================
> > If you are a Spark user, you can help us test this release by taking an
> > existing Spark workload and running it on this release candidate, then
> > reporting any regressions.
> >
> > ================================================
> > == What justifies a -1 vote for this release? ==
> > ================================================
> > This vote is happening towards the end of the 1.6 QA period, so -1 votes
> > should only occur for significant regressions from 1.5. Bugs already
> > present in 1.5, minor regressions, or bugs related to new features will
> > not block this release.
> >
> > ===============================================================
> > == What should happen to JIRA tickets still targeting 1.6.0? ==
> > ===============================================================
> > 1. It is OK for documentation patches to target 1.6.0 and still go into
> >    branch-1.6, since documentation will be published separately from the
> >    release.
> > 2. New features for non-alpha-modules should target 1.7+.
> > 3. Non-blocker bug fixes should target 1.6.1 or 1.7.0, or drop the target
> >    version.
> >
> > ==================================================
> > == Major changes to help you focus your testing ==
> > ==================================================
> >
> > Spark SQL
> >
> > SPARK-10810 Session Management - The ability to create multiple isolated
> >   SQL Contexts that have their own configuration and default database.
> >   This is turned on by default in the thrift server.
> > SPARK-9999 Dataset API - A type-safe API (similar to RDDs) that performs
> >   many operations on serialized binary data and code generation (i.e.
> >   Project Tungsten).
> > SPARK-10000 Unified Memory Management - Shared memory for execution and
> >   caching instead of exclusive division of the regions.
> > SPARK-11197 SQL Queries on Files - Concise syntax for running SQL queries
> >   over files of any supported format without registering a table.
> > SPARK-11745 Reading non-standard JSON files - Added options to read
> >   non-standard JSON files (e.g. single quotes, unquoted attributes).
> > SPARK-10412 Per-operator Metrics for SQL Execution - Display statistics
> >   on a per-operator basis for memory usage and spilled data size.
> > SPARK-11329 Star (*) expansion for StructTypes - Makes it easier to nest
> >   and unnest arbitrary numbers of columns.
> > SPARK-10917, SPARK-11149 In-memory Columnar Cache Performance -
> >   Significant (up to 14x) speedup when caching data that contains complex
> >   types in DataFrames or SQL.
> > SPARK-11111 Fast null-safe joins - Joins using null-safe equality (<=>)
> >   will now execute using SortMergeJoin instead of computing a cartesian
> >   product.
> > SPARK-11389 SQL Execution Using Off-Heap Memory - Support for configuring
> >   query execution to occur using off-heap memory to avoid GC overhead.
> > SPARK-10978 Datasource API Avoid Double Filter - When implementing a
> >   datasource with filter pushdown, developers can now tell Spark SQL to
> >   avoid double evaluating a pushed-down filter.
> > SPARK-4849 Advanced Layout of Cached Data - Storing partitioning and
> >   ordering schemes in the in-memory table scan, and adding distributeBy
> >   and localSort to the DataFrame API.
> > SPARK-9858 Adaptive query execution - Initial support for automatically
> >   selecting the number of reducers for joins and aggregations.
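A minimal sketch of how one might exercise two of the SQL items above while
testing the RC (SPARK-11197 SQL queries on files and SPARK-9999 Dataset API);
the parquet path and the Event case class are made-up placeholders, and a
spark-shell style sqlContext is assumed:

  import sqlContext.implicits._

  // SPARK-11197: run SQL directly over a file, no table registration needed.
  sqlContext.sql("SELECT * FROM parquet.`/tmp/events.parquet`").show()

  // SPARK-9999: the typed Dataset API.
  case class Event(id: Long, name: String)
  val ds = Seq(Event(1L, "a"), Event(2L, "b")).toDS()
  ds.filter(_.id > 1L).show()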
> >
> > Spark Streaming
> >
> > API Updates
> >
> > SPARK-2629 New improved state management - trackStateByKey - a DStream
> >   transformation for stateful stream processing; supersedes
> >   updateStateByKey in functionality and performance.
> > SPARK-11198 Kinesis record deaggregation - Kinesis streams have been
> >   upgraded to use KCL 1.4.0 and support transparent deaggregation of
> >   KPL-aggregated records.
> > SPARK-10891 Kinesis message handler function - Allows an arbitrary
> >   function to be applied to a Kinesis record in the Kinesis receiver to
> >   customize what data is to be stored in memory.
> > SPARK-6328 Python Streaming Listener API - Get streaming statistics
> >   (scheduling delays, batch processing times, etc.) in streaming.
> >
> > UI Improvements
> >
> > Made failures visible in the streaming tab, in the timelines, batch list,
> >   and batch details page.
> > Made output operations visible in the streaming tab as progress bars.
> >
> > MLlib
> >
> > New algorithms/models
> >
> > SPARK-8518 Survival analysis - Log-linear model for survival analysis.
> > SPARK-9834 Normal equation for least squares - Normal equation solver,
> >   providing R-like model summary statistics.
> > SPARK-3147 Online hypothesis testing - A/B testing in the Spark Streaming
> >   framework.
> > SPARK-9930 New feature transformers - ChiSqSelector, QuantileDiscretizer,
> >   SQL transformer.
> > SPARK-6517 Bisecting K-Means clustering - Fast top-down clustering
> >   variant of K-Means.
> >
> > API improvements
> >
> > ML Pipelines
> >
> > SPARK-6725 Pipeline persistence - Save/load for ML Pipelines, with
> >   partial coverage of spark.ml algorithms.
> > SPARK-5565 LDA in ML Pipelines - API for Latent Dirichlet Allocation in
> >   ML Pipelines.
> >
> > R API
> >
> > SPARK-9836 R-like statistics for GLMs - (Partial) R-like stats for
> >   ordinary least squares via summary(model).
> > SPARK-9681 Feature interactions in R formula - Interaction operator ":"
> >   in R formula.
> >
> > Python API - Many improvements to Python API to approach feature parity.
> >
> > Misc improvements
> >
> > SPARK-7685, SPARK-9642 Instance weights for GLMs - Logistic and Linear
> >   Regression can take instance weights.
> > SPARK-10384, SPARK-10385 Univariate and bivariate statistics in
> >   DataFrames - Variance, stddev, correlations, etc.
> > SPARK-10117 LIBSVM data source - LIBSVM as a SQL data source.
> >
> > Documentation improvements
> >
> > SPARK-7751 @since versions - Documentation includes the initial version
> >   when classes and methods were added.
> > SPARK-11337 Testable example code - Automated testing for code in user
> >   guide examples.
> >
> > Deprecations
> >
> > In spark.mllib.clustering.KMeans, the "runs" parameter has been
> >   deprecated.
> > In spark.ml.classification.LogisticRegressionModel and
> >   spark.ml.regression.LinearRegressionModel, the "weights" field has been
> >   deprecated in favor of the new name "coefficients". This helps
> >   disambiguate from instance (row) weights given to algorithms.
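A small sketch of the "weights" to "coefficients" rename noted in the
deprecations above; the tiny training DataFrame is a made-up placeholder and a
spark-shell style sqlContext is assumed:

  import org.apache.spark.ml.classification.LogisticRegression
  import org.apache.spark.mllib.linalg.Vectors

  // Placeholder training data: (label, features) rows.
  val training = sqlContext.createDataFrame(Seq(
    (0.0, Vectors.dense(0.0, 1.1)),
    (1.0, Vectors.dense(2.0, 1.0))
  )).toDF("label", "features")

  val model = new LogisticRegression().setMaxIter(10).fit(training)
  println(model.coefficients)  // preferred name in 1.6
  // model.weights             // still present, but deprecated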
> >
> > Changes of behavior
> >
> > spark.mllib.tree.GradientBoostedTrees validationTol has changed semantics
> >   in 1.6. Previously, it was a threshold for absolute change in error.
> >   Now, it resembles the behavior of GradientDescent convergenceTol: for
> >   large errors, it uses relative error (relative to the previous error);
> >   for small errors (< 0.01), it uses absolute error.
> > spark.ml.feature.RegexTokenizer: Previously, it did not convert strings
> >   to lowercase before tokenizing. Now, it converts to lowercase by
> >   default, with an option not to. This matches the behavior of the
> >   simpler Tokenizer transformer.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
> For additional commands, e-mail: dev-h...@spark.apache.org