Re: [VOTE] Release Apache Spark 1.5.0 (RC2)

chester Tue, 01 Sep 2015 06:49:30 -0700

Sorry, I still don't follow. I assume the release would be built from 1.5.0
before the branch moves to 1.5.1. Are you saying that 1.5.0 RC3 could be built
from the 1.5.1 snapshot during the release? Or would 1.5.0 RC3 be built from
the last commit of 1.5.0 (before the change to the 1.5.1 snapshot)?




> On Sep 1, 2015, at 1:52 AM, Sean Owen <so...@cloudera.com> wrote:
> 
> That's correct for the 1.5 branch, right? This doesn't mean that the
> next RC would have this value. You choose the release version during
> the release process.
> 
>> On Tue, Sep 1, 2015 at 2:40 AM, Chester Chen <ches...@alpinenow.com> wrote:
>> It seems that branch-1.5 on GitHub has already changed the version to
>> 1.5.1-SNAPSHOT.
>> 
>> I am a bit confused: are we still on 1.5.0 RC3, or are we on 1.5.1?
>> 
>> Chester
>> 
>>> On Mon, Aug 31, 2015 at 3:52 PM, Reynold Xin <r...@databricks.com> wrote:
>>> 
>>> I'm going to -1 the release myself since the issue @yhuai identified is
>>> pretty serious. It basically OOMs the driver when reading any files with a
>>> large number of partitions. Looks like the patch for that has already been
>>> merged.
>>> 
>>> I'm going to cut rc3 momentarily.
>>> 
>>> 
>>> On Sun, Aug 30, 2015 at 11:30 AM, Sandy Ryza <sandy.r...@cloudera.com>
>>> wrote:
>>>> 
>>>> +1 (non-binding)
>>>> built from source and ran some jobs against YARN
>>>> 
>>>> -Sandy
>>>> 
>>>> On Sat, Aug 29, 2015 at 5:50 AM, vaquar khan <vaquar.k...@gmail.com>
>>>> wrote:
>>>>> 
>>>>> 
>>>>> +1 (1.5.0 RC2) Compiled on Windows with YARN.
>>>>> 
>>>>> Regards,
>>>>> Vaquar khan
>>>>> 
>>>>> +1 (non-binding, of course)
>>>>> 
>>>>> 1. Compiled on OS X 10.10 (Yosemite) OK. Total time: 42:36 min
>>>>>     mvn clean package -Pyarn -Phadoop-2.6 -DskipTests
>>>>> 2. Tested pyspark, mllib
>>>>> 2.1. statistics (min,max,mean,Pearson,Spearman) OK
>>>>> 2.2. Linear/Ridge/Lasso Regression OK
>>>>> 2.3. Decision Tree, Naive Bayes OK
>>>>> 2.4. KMeans OK
>>>>>       Center And Scale OK
>>>>> 2.5. RDD operations OK
>>>>>      State of the Union Texts - MapReduce, Filter, sortByKey (word count)
>>>>> 2.6. Recommendation (Movielens medium dataset ~1 M ratings) OK
>>>>>       Model evaluation/optimization (rank, numIter, lambda) with itertools OK
>>>>> 3. Scala - MLlib
>>>>> 3.1. statistics (min,max,mean,Pearson,Spearman) OK
>>>>> 3.2. LinearRegressionWithSGD OK
>>>>> 3.3. Decision Tree OK
>>>>> 3.4. KMeans OK
>>>>> 3.5. Recommendation (Movielens medium dataset ~1 M ratings) OK
>>>>> 3.6. saveAsParquetFile OK
>>>>> 3.7. Read and verify the 3.6 save (above) - sqlContext.parquetFile,
>>>>> registerTempTable, sql OK
>>>>> 3.8. result = sqlContext.sql("SELECT
>>>>> OrderDetails.OrderID,ShipCountry,UnitPrice,Qty,Discount FROM Orders INNER
>>>>> JOIN OrderDetails ON Orders.OrderID = OrderDetails.OrderID") OK
>>>>> 4.0. Spark SQL from Python OK
>>>>> 4.1. result = sqlContext.sql("SELECT * from people WHERE State = 'WA'") OK
>>>>> 5.0. Packages
>>>>> 5.1. com.databricks.spark.csv - read/write OK
>>>>> (--packages com.databricks:spark-csv_2.11:1.2.0-s_2.11 didn’t work. But
>>>>> com.databricks:spark-csv_2.11:1.2.0 worked)
>>>>> 6.0. DataFrames
>>>>> 6.1. cast,dtypes OK
>>>>> 6.2. groupBy,avg,crosstab,corr,isNull,na.drop OK
>>>>> 6.3. joins,sql,set operations,udf OK
>>>>> 
>>>>> Cheers
>>>>> <k/>
>>>>> 
>>>>> On Tue, Aug 25, 2015 at 9:28 PM, Reynold Xin <r...@databricks.com>
>>>>> wrote:
>>>>>> 
>>>>>> Please vote on releasing the following candidate as Apache Spark
>>>>>> version 1.5.0. The vote is open until Friday, Aug 29, 2015 at 5:00 UTC and
>>>>>> passes if a majority of at least 3 +1 PMC votes are cast.
>>>>>> 
>>>>>> [ ] +1 Release this package as Apache Spark 1.5.0
>>>>>> [ ] -1 Do not release this package because ...
>>>>>> 
>>>>>> To learn more about Apache Spark, please see http://spark.apache.org/
>>>>>> 
>>>>>> 
>>>>>> The tag to be voted on is v1.5.0-rc2:
>>>>>> 
>>>>>> https://github.com/apache/spark/tree/727771352855dbb780008c449a877f5aaa5fc27a
>>>>>> 
>>>>>> The release files, including signatures, digests, etc. can be found at:
>>>>>> http://people.apache.org/~pwendell/spark-releases/spark-1.5.0-rc2-bin/
>>>>>> 
>>>>>> Release artifacts are signed with the following key:
>>>>>> https://people.apache.org/keys/committer/pwendell.asc
>>>>>> 
>>>>>> The staging repository for this release (published as 1.5.0-rc2) can be
>>>>>> found at:
>>>>>> https://repository.apache.org/content/repositories/orgapachespark-1141/
>>>>>> 
>>>>>> The staging repository for this release (published as 1.5.0) can be
>>>>>> found at:
>>>>>> https://repository.apache.org/content/repositories/orgapachespark-1140/
>>>>>> 
>>>>>> The documentation corresponding to this release can be found at:
>>>>>> http://people.apache.org/~pwendell/spark-releases/spark-1.5.0-rc2-docs/
>>>>>> 
>>>>>> 
>>>>>> =======================================
>>>>>> How can I help test this release?
>>>>>> =======================================
>>>>>> If you are a Spark user, you can help us test this release by taking an
>>>>>> existing Spark workload and running on this release candidate, then
>>>>>> reporting any regressions.
>>>>>> 
>>>>>> 
>>>>>> ================================================
>>>>>> What justifies a -1 vote for this release?
>>>>>> ================================================
>>>>>> This vote is happening towards the end of the 1.5 QA period, so -1
>>>>>> votes should only occur for significant regressions from 1.4. Bugs already
>>>>>> present in 1.4, minor regressions, or bugs related to new features will not
>>>>>> block this release.
>>>>>> 
>>>>>> 
>>>>>> ===============================================================
>>>>>> What should happen to JIRA tickets still targeting 1.5.0?
>>>>>> ===============================================================
>>>>>> 1. It is OK for documentation patches to target 1.5.0 and still go into
>>>>>> branch-1.5, since documentation will be packaged separately from the
>>>>>> release.
>>>>>> 2. New features for non-alpha-modules should target 1.6+.
>>>>>> 3. Non-blocker bug fixes should target 1.5.1 or 1.6.0, or drop the
>>>>>> target version.
>>>>>> 
>>>>>> 
>>>>>> ==================================================
>>>>>> Major changes to help you focus your testing
>>>>>> ==================================================
>>>>>> 
>>>>>> As of today, Spark 1.5 contains more than 1000 commits from 220+
>>>>>> contributors. I've curated a list of important changes for 1.5. For the
>>>>>> complete list, please refer to Apache JIRA changelog.
>>>>>> 
>>>>>> RDD/DataFrame/SQL APIs
>>>>>> 
>>>>>> - New UDAF interface
>>>>>> - DataFrame hints for broadcast join
>>>>>> - expr function for turning a SQL expression into DataFrame column
>>>>>> - Improved support for NaN values
>>>>>> - StructType now supports ordering
>>>>>> - TimestampType precision is reduced to 1us
>>>>>> - 100 new built-in expressions, including date/time, string, math
>>>>>> - memory and local disk only checkpointing
>>>>>> 
>>>>>> DataFrame/SQL Backend Execution
>>>>>> 
>>>>>> - Code generation on by default
>>>>>> - Improved join, aggregation, shuffle, sorting with cache friendly
>>>>>> algorithms and external algorithms
>>>>>> - Improved window function performance
>>>>>> - Better metrics instrumentation and reporting for DF/SQL execution
>>>>>> plans
>>>>>> 
>>>>>> Data Sources, Hive, Hadoop, Mesos and Cluster Management
>>>>>> 
>>>>>> - Dynamic allocation support in all resource managers (Mesos, YARN,
>>>>>> Standalone)
>>>>>> - Improved Mesos support (framework authentication, roles, dynamic
>>>>>> allocation, constraints)
>>>>>> - Improved YARN support (dynamic allocation with preferred locations)
>>>>>> - Improved Hive support (metastore partition pruning, metastore
>>>>>> connectivity to 0.13 to 1.2, internal Hive upgrade to 1.2)
>>>>>> - Support persisting data in Hive compatible format in metastore
>>>>>> - Support data partitioning for JSON data sources
>>>>>> - Parquet improvements (upgrade to 1.7, predicate pushdown, faster
>>>>>> metadata discovery and schema merging, support reading non-standard legacy
>>>>>> Parquet files generated by other libraries)
>>>>>> - Faster and more robust dynamic partition insert
>>>>>> - DataSourceRegister interface for external data sources to specify
>>>>>> short names
>>>>>> 
>>>>>> SparkR
>>>>>> 
>>>>>> - YARN cluster mode in R
>>>>>> - GLMs with R formula, binomial/Gaussian families, and elastic-net
>>>>>> regularization
>>>>>> - Improved error messages
>>>>>> - Aliases to make DataFrame functions more R-like
>>>>>> 
>>>>>> Streaming
>>>>>> 
>>>>>> - Backpressure for handling bursty input streams.
>>>>>> - Improved Python support for streaming sources (Kafka offsets,
>>>>>> Kinesis, MQTT, Flume)
>>>>>> - Improved Python streaming machine learning algorithms (K-Means,
>>>>>> linear regression, logistic regression)
>>>>>> - Native reliable Kinesis stream support
>>>>>> - Input metadata like Kafka offsets made visible in the batch details
>>>>>> UI
>>>>>> - Better load balancing and scheduling of receivers across cluster
>>>>>> - Include streaming storage in web UI
>>>>>> 
>>>>>> Machine Learning and Advanced Analytics
>>>>>> 
>>>>>> - Feature transformers: CountVectorizer, Discrete Cosine
>>>>>> transformation, MinMaxScaler, NGram, PCA, RFormula, StopWordsRemover, and
>>>>>> VectorSlicer.
>>>>>> - Estimators under pipeline APIs: naive Bayes, k-means, and isotonic
>>>>>> regression.
>>>>>> - Algorithms: multilayer perceptron classifier, PrefixSpan for
>>>>>> sequential pattern mining, association rule generation, 1-sample
>>>>>> Kolmogorov-Smirnov test.
>>>>>> - Improvements to existing algorithms: LDA, trees/ensembles, GMMs
>>>>>> - More efficient Pregel API implementation for GraphX
>>>>>> - Model summary for linear and logistic regression.
>>>>>> - Python API: distributed matrices, streaming k-means and linear
>>>>>> models, LDA, power iteration clustering, etc.
>>>>>> - Tuning and evaluation: train-validation split and multiclass
>>>>>> classification evaluator.
>>>>>> - Documentation: document the release version of public API methods
>>>>>> 
>>>>> 
>>>> 
>>> 
>> 

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org
