Thanks Sean, that makes it clear.
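For anyone following along, here is a minimal sketch of the two build paths Sean describes below, using the rc2 tag and the mvn invocation mentioned elsewhere in this thread (adjust the tag, branch, and profiles as needed):

  # Build the exact source a given RC was cut from, via its tag:
  git clone https://github.com/apache/spark.git && cd spark
  git checkout v1.5.0-rc2
  mvn clean package -Pyarn -Phadoop-2.6 -DskipTests

  # Or build whatever is currently at the head of the 1.5 maintenance branch:
  git checkout branch-1.5
  mvn clean package -Pyarn -Phadoop-2.6 -DskipTests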
On Tue, Sep 1, 2015 at 7:17 AM, Sean Owen <so...@cloudera.com> wrote:
> Any 1.5 RC comes from the latest state of the 1.5 branch at some point in time. The next RC will be cut from whatever the latest commit is. You can see the tags in git for the specific commits for each RC. There's no such thing as "1.5.1 SNAPSHOT" commits, just commits to branch 1.5. I would ignore the "SNAPSHOT" version for your purpose.
>
> You can always build from the exact commit that an RC did by looking at tags. There is no 1.5.0 yet so you can't build that, but once it's released, you would be able to find its tag as well. You can always build the latest 1.5.x branch by building from HEAD of that branch.
>
> On Tue, Sep 1, 2015 at 3:13 PM, <ches...@alpinenow.com> wrote:
> > Thanks for the explanation. Since 1.5.0 rc3 is not yet released, I assume it would be cut from the 1.5 branch; doesn't that bring in the 1.5.1 snapshot code?
> >
> > The reason I am asking these questions is that I would like to know, if I want to build 1.5.0 myself, which commit should I use?
> >
> > Sent from my iPad
> >
> >> On Sep 1, 2015, at 6:57 AM, Sean Owen <so...@cloudera.com> wrote:
> >>
> >> The head of branch 1.5 will always be a "1.5.x-SNAPSHOT" version. Yeah, technically you would expect it to be 1.5.0-SNAPSHOT until 1.5.0 is released. In practice I think it's simpler to follow the defaults of the Maven release plugin, which will set this to 1.5.1-SNAPSHOT after any 1.5.0-rc is released. It doesn't affect later RCs. This has nothing to do with what commits go into 1.5.0; it's an ignorable detail of the version in POMs in the source tree, which don't mean much anyway as the source tree itself is not a released version.
> >>
> >>> On Tue, Sep 1, 2015 at 2:48 PM, <ches...@alpinenow.com> wrote:
> >>> Sorry, I still don't follow. I assume the release would be built from 1.5.0 before moving to 1.5.1. Are you saying 1.5.0 rc3 could be built from the 1.5.1 snapshot during release? Or would 1.5.0 rc3 be built from the last commit of 1.5.0 (before the change to the 1.5.1 snapshot)?
> >>>
> >>> Sent from my iPad
> >>>
> >>>> On Sep 1, 2015, at 1:52 AM, Sean Owen <so...@cloudera.com> wrote:
> >>>>
> >>>> That's correct for the 1.5 branch, right? This doesn't mean that the next RC would have this value. You choose the release version during the release process.
> >>>>
> >>>>> On Tue, Sep 1, 2015 at 2:40 AM, Chester Chen <ches...@alpinenow.com> wrote:
> >>>>> It seems that the GitHub branch-1.5 has already changed the version to 1.5.1-SNAPSHOT.
> >>>>>
> >>>>> I am a bit confused: are we still on 1.5.0 RC3, or are we on 1.5.1?
> >>>>>
> >>>>> Chester
> >>>>>
> >>>>>> On Mon, Aug 31, 2015 at 3:52 PM, Reynold Xin <r...@databricks.com> wrote:
> >>>>>>
> >>>>>> I'm going to -1 the release myself since the issue @yhuai identified is pretty serious. It basically OOMs the driver for reading any files with a large number of partitions. Looks like the patch for that has already been merged.
> >>>>>>
> >>>>>> I'm going to cut rc3 momentarily.
> >>>>>>
> >>>>>> On Sun, Aug 30, 2015 at 11:30 AM, Sandy Ryza <sandy.r...@cloudera.com> wrote:
> >>>>>>>
> >>>>>>> +1 (non-binding)
> >>>>>>> Built from source and ran some jobs against YARN.
> >>>>>>>
> >>>>>>> -Sandy
> >>>>>>>
> >>>>>>> On Sat, Aug 29, 2015 at 5:50 AM, vaquar khan <vaquar.k...@gmail.com> wrote:
> >>>>>>>>
> >>>>>>>> +1 (1.5.0 RC2). Compiled on Windows with YARN.
> >>>>>>>>
> >>>>>>>> Regards,
> >>>>>>>> Vaquar khan
> >>>>>>>>
> >>>>>>>> +1 (non-binding, of course)
> >>>>>>>>
> >>>>>>>> 1. Compiled OS X 10.10 (Yosemite) OK. Total time: 42:36 min
> >>>>>>>>    mvn clean package -Pyarn -Phadoop-2.6 -DskipTests
> >>>>>>>> 2. Tested pyspark, mllib
> >>>>>>>> 2.1. statistics (min, max, mean, Pearson, Spearman) OK
> >>>>>>>> 2.2. Linear/Ridge/Lasso Regression OK
> >>>>>>>> 2.3. Decision Tree, Naive Bayes OK
> >>>>>>>> 2.4. KMeans OK
> >>>>>>>>      Center and Scale OK
> >>>>>>>> 2.5. RDD operations OK
> >>>>>>>>      State of the Union texts - MapReduce, Filter, sortByKey (word count)
> >>>>>>>> 2.6. Recommendation (MovieLens medium dataset, ~1M ratings) OK
> >>>>>>>>      Model evaluation/optimization (rank, numIter, lambda) with itertools OK
> >>>>>>>> 3. Scala - MLlib
> >>>>>>>> 3.1. statistics (min, max, mean, Pearson, Spearman) OK
> >>>>>>>> 3.2. LinearRegressionWithSGD OK
> >>>>>>>> 3.3. Decision Tree OK
> >>>>>>>> 3.4. KMeans OK
> >>>>>>>> 3.5. Recommendation (MovieLens medium dataset, ~1M ratings) OK
> >>>>>>>> 3.6. saveAsParquetFile OK
> >>>>>>>> 3.7. Read and verify the 4.3 save (above) - sqlContext.parquetFile, registerTempTable, sql OK
> >>>>>>>> 3.8. result = sqlContext.sql("SELECT OrderDetails.OrderID, ShipCountry, UnitPrice, Qty, Discount FROM Orders INNER JOIN OrderDetails ON Orders.OrderID = OrderDetails.OrderID") OK
> >>>>>>>> 4.0. Spark SQL from Python OK
> >>>>>>>> 4.1. result = sqlContext.sql("SELECT * from people WHERE State = 'WA'") OK
> >>>>>>>> 5.0. Packages
> >>>>>>>> 5.1. com.databricks.spark.csv - read/write OK
> >>>>>>>>      (--packages com.databricks:spark-csv_2.11:1.2.0-s_2.11 didn’t work, but com.databricks:spark-csv_2.11:1.2.0 worked)
> >>>>>>>> 6.0. DataFrames
> >>>>>>>> 6.1. cast, dtypes OK
> >>>>>>>> 6.2. groupBy, avg, crosstab, corr, isNull, na.drop OK
> >>>>>>>> 6.3. joins, sql, set operations, udf OK
> >>>>>>>>
> >>>>>>>> Cheers
> >>>>>>>> <k/>
> >>>>>>>>
> >>>>>>>> On Tue, Aug 25, 2015 at 9:28 PM, Reynold Xin <r...@databricks.com> wrote:
> >>>>>>>>>
> >>>>>>>>> Please vote on releasing the following candidate as Apache Spark version 1.5.0. The vote is open until Friday, Aug 29, 2015 at 5:00 UTC and passes if a majority of at least 3 +1 PMC votes are cast.
> >>>>>>>>>
> >>>>>>>>> [ ] +1 Release this package as Apache Spark 1.5.0
> >>>>>>>>> [ ] -1 Do not release this package because ...
> >>>>>>>>>
> >>>>>>>>> To learn more about Apache Spark, please see http://spark.apache.org/
> >>>>>>>>>
> >>>>>>>>> The tag to be voted on is v1.5.0-rc2:
> >>>>>>>>> https://github.com/apache/spark/tree/727771352855dbb780008c449a877f5aaa5fc27a
> >>>>>>>>>
> >>>>>>>>> The release files, including signatures, digests, etc. can be found at:
> >>>>>>>>> http://people.apache.org/~pwendell/spark-releases/spark-1.5.0-rc2-bin/
> >>>>>>>>>
> >>>>>>>>> Release artifacts are signed with the following key:
> >>>>>>>>> https://people.apache.org/keys/committer/pwendell.asc
> >>>>>>>>>
> >>>>>>>>> The staging repository for this release (published as 1.5.0-rc2) can be found at:
> >>>>>>>>> https://repository.apache.org/content/repositories/orgapachespark-1141/
> >>>>>>>>>
> >>>>>>>>> The staging repository for this release (published as 1.5.0) can be found at:
> >>>>>>>>> https://repository.apache.org/content/repositories/orgapachespark-1140/
> >>>>>>>>>
> >>>>>>>>> The documentation corresponding to this release can be found at:
> >>>>>>>>> http://people.apache.org/~pwendell/spark-releases/spark-1.5.0-rc2-docs/
> >>>>>>>>>
> >>>>>>>>> =======================================
> >>>>>>>>> How can I help test this release?
> >>>>>>>>> =======================================
> >>>>>>>>> If you are a Spark user, you can help us test this release by taking an existing Spark workload, running it on this release candidate, and then reporting any regressions.
> >>>>>>>>>
> >>>>>>>>> ================================================
> >>>>>>>>> What justifies a -1 vote for this release?
> >>>>>>>>> ================================================
> >>>>>>>>> This vote is happening towards the end of the 1.5 QA period, so -1 votes should only occur for significant regressions from 1.4. Bugs already present in 1.4, minor regressions, or bugs related to new features will not block this release.
> >>>>>>>>>
> >>>>>>>>> ===============================================================
> >>>>>>>>> What should happen to JIRA tickets still targeting 1.5.0?
> >>>>>>>>> ===============================================================
> >>>>>>>>> 1. It is OK for documentation patches to target 1.5.0 and still go into branch-1.5, since documentation will be packaged separately from the release.
> >>>>>>>>> 2. New features for non-alpha modules should target 1.6+.
> >>>>>>>>> 3. Non-blocker bug fixes should target 1.5.1 or 1.6.0, or drop the target version.
> >>>>>>>>>
> >>>>>>>>> ==================================================
> >>>>>>>>> Major changes to help you focus your testing
> >>>>>>>>> ==================================================
> >>>>>>>>>
> >>>>>>>>> As of today, Spark 1.5 contains more than 1000 commits from 220+ contributors. I've curated a list of important changes for 1.5. For the complete list, please refer to the Apache JIRA changelog.
> >>>>>>>>>
> >>>>>>>>> RDD/DataFrame/SQL APIs
> >>>>>>>>>
> >>>>>>>>> - New UDAF interface
> >>>>>>>>> - DataFrame hints for broadcast join
> >>>>>>>>> - expr function for turning a SQL expression into DataFrame column
> >>>>>>>>> - Improved support for NaN values
> >>>>>>>>> - StructType now supports ordering
> >>>>>>>>> - TimestampType precision is reduced to 1us
> >>>>>>>>> - 100 new built-in expressions, including date/time, string, math
> >>>>>>>>> - memory and local disk only checkpointing
> >>>>>>>>>
> >>>>>>>>> DataFrame/SQL Backend Execution
> >>>>>>>>>
> >>>>>>>>> - Code generation on by default
> >>>>>>>>> - Improved join, aggregation, shuffle, sorting with cache friendly algorithms and external algorithms
> >>>>>>>>> - Improved window function performance
> >>>>>>>>> - Better metrics instrumentation and reporting for DF/SQL execution plans
> >>>>>>>>>
> >>>>>>>>> Data Sources, Hive, Hadoop, Mesos and Cluster Management
> >>>>>>>>>
> >>>>>>>>> - Dynamic allocation support in all resource managers (Mesos, YARN, Standalone)
> >>>>>>>>> - Improved Mesos support (framework authentication, roles, dynamic allocation, constraints)
> >>>>>>>>> - Improved YARN support (dynamic allocation with preferred locations)
> >>>>>>>>> - Improved Hive support (metastore partition pruning, metastore connectivity to 0.13 to 1.2, internal Hive upgrade to 1.2)
> >>>>>>>>> - Support persisting data in Hive compatible format in metastore
> >>>>>>>>> - Support data partitioning for JSON data sources
> >>>>>>>>> - Parquet improvements (upgrade to 1.7, predicate pushdown, faster metadata discovery and schema merging, support reading non-standard legacy Parquet files generated by other libraries)
> >>>>>>>>> - Faster and more robust dynamic partition insert
> >>>>>>>>> - DataSourceRegister interface for external data sources to specify short names
> >>>>>>>>>
> >>>>>>>>> SparkR
> >>>>>>>>>
> >>>>>>>>> - YARN cluster mode in R
> >>>>>>>>> - GLMs with R formula, binomial/Gaussian families, and elastic-net regularization
> >>>>>>>>> - Improved error messages
> >>>>>>>>> - Aliases to make DataFrame functions more R-like
> >>>>>>>>>
> >>>>>>>>> Streaming
> >>>>>>>>>
> >>>>>>>>> - Backpressure for handling bursty input streams.
> >>>>>>>>> - Improved Python support for streaming sources (Kafka offsets, Kinesis, MQTT, Flume)
> >>>>>>>>> - Improved Python streaming machine learning algorithms (K-Means, linear regression, logistic regression)
> >>>>>>>>> - Native reliable Kinesis stream support
> >>>>>>>>> - Input metadata like Kafka offsets made visible in the batch details UI
> >>>>>>>>> - Better load balancing and scheduling of receivers across cluster
> >>>>>>>>> - Include streaming storage in web UI
> >>>>>>>>>
> >>>>>>>>> Machine Learning and Advanced Analytics
> >>>>>>>>>
> >>>>>>>>> - Feature transformers: CountVectorizer, Discrete Cosine transformation, MinMaxScaler, NGram, PCA, RFormula, StopWordsRemover, and VectorSlicer.
> >>>>>>>>> - Estimators under pipeline APIs: naive Bayes, k-means, and isotonic regression.
> >>>>>>>>> - Algorithms: multilayer perceptron classifier, PrefixSpan for sequential pattern mining, association rule generation, 1-sample Kolmogorov-Smirnov test.
> >>>>>>>>> - Improvements to existing algorithms: LDA, trees/ensembles, GMMs
> >>>>>>>>> - More efficient Pregel API implementation for GraphX
> >>>>>>>>> - Model summary for linear and logistic regression.
> >>>>>>>>> - Python API: distributed matrices, streaming k-means and linear models, LDA, power iteration clustering, etc.
> >>>>>>>>> - Tuning and evaluation: train-validation split and multiclass classification evaluator.
> >>>>>>>>> - Documentation: document the release version of public API methods