Re: [VOTE] Apache Spark 2.1.0 (RC5)

2016-12-16 Thread Holden Karau
Thanks for the specific mention of the new PySpark packaging, Shivaram.

For *nix (Linux, Unix, OS X, etc.) Python users interested in helping test
the new artifacts, you can proceed as follows:

Set up PySpark with pip:

1. Download the artifact from
http://home.apache.org/~pwendell/spark-releases/spark-2.1.0-rc5-bin/pyspark-2.1.0+hadoop2.7.tar.gz
2. (Optional) Create a virtual env (e.g. virtualenv /tmp/pysparktest;
source /tmp/pysparktest/bin/activate)
3. (Possibly required, depending on pip version) Upgrade pip to a recent
version (e.g. pip install --upgrade pip)
4. Install the package with pip install pyspark-2.1.0+hadoop2.7.tar.gz
5. If you have SPARK_HOME set to any specific path, unset it to force the
pip-installed PySpark to run with its provided jars

In the future we hope to publish to PyPI, allowing you to skip the download
step, but there just wasn't a chance to get that part included for this
release. If everything goes smoothly, hopefully we can add that soon (see
SPARK-18128) :)

Some things to verify:
1) Verify you can start the PySpark shell (e.g. run pyspark)
2) Verify you can start PySpark from python (e.g. run python, verify you
can import pyspark and construct a SparkContext; a minimal sketch appears after this list).
3) Verify your PySpark programs work with pip-installed PySpark as well as
regular Spark (e.g. spark-submit my-workload.py)
4) If you have a different version of Spark downloaded locally as well, verify
that it launches and runs correctly and that the pip-installed PySpark is not taking
precedence (make sure to use the fully qualified path when executing).
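
As a minimal sketch of checks 1 and 2, assuming the pip-installed package from
above and that SPARK_HOME is unset (the app name and version attribute are
illustrative, not part of the official instructions):

    # Smoke test for a pip-installed PySpark RC; assumes SPARK_HOME is unset
    # so the jars bundled with the pip package are used.
    import pyspark
    from pyspark import SparkContext

    # pyspark.__version__ should report 2.1.0 (assuming the new packaging exposes it)
    print(pyspark.__version__)
    print(pyspark.__file__)  # should point into your (virtual) env, not a Spark download

    sc = SparkContext("local[2]", "pip-install-smoke-test")
    print(sc.parallelize(range(100)).sum())  # expect 4950
    sc.stop()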

Some things that are explicitly not supported in pip installed PySpark:
1) Starting a new standalone cluster with pip-installed PySpark (connecting
to an existing standalone cluster is expected to work; a sketch follows this list)
2) non-Python Spark interfaces (e.g. don't pip install pyspark for SparkR,
use the SparkR packaging instead :)).
3) PyPI - if things go well, coming in a future release (track the progress
on https://issues.apache.org/jira/browse/SPARK-18128)
4) Python versions prior to 2.7
5) Full Windows support - a later follow-up task (if you're interested in this
please chat with me or see https://issues.apache.org/jira/browse/SPARK-18136)
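
As a rough illustration of point 1, connecting a pip-installed PySpark to an
already-running standalone cluster looks like the sketch below (the master
host/port are placeholders; the master and workers themselves must already be
running from a full Spark distribution):

    from pyspark import SparkContext

    # Only the driver side comes from the pip install here; the standalone
    # master at the placeholder URL below must already be running.
    sc = SparkContext("spark://your-master-host:7077", "pip-install-cluster-test")
    print(sc.parallelize([1, 2, 3]).count())  # expect 3
    sc.stop()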

Post verification cleanup:
1. Uninstall the pip-installed PySpark since it is just an RC and you don't
want it getting in the way later (e.g. pip uninstall pyspark)
2. (Optional) Deactivate your virtual env (e.g. run deactivate)

If anyone has any questions about the new PySpark packaging I'm more than
happy to chat :)

Cheers,

Holden :)



On Thu, Dec 15, 2016 at 9:44 PM, Reynold Xin  wrote:

> I'm going to start this with a +1!
>
>
> On Thu, Dec 15, 2016 at 9:42 PM, Shivaram Venkataraman <
> shiva...@eecs.berkeley.edu> wrote:
>
>> In addition to the usual binary artifacts, this is the first release where
>> we have installable packages for Python [1] and R [2] that are part of
>> the release.  I'm including instructions to test the R package below.
>> Holden / other Python developers can chime in if there are special
>> instructions to test the pip package.
>>
>> To test the R source package you can use the following commands.
>> 1. Download the SparkR source package from
>> http://people.apache.org/~pwendell/spark-releases/spark-2.1.
>> 0-rc5-bin/SparkR_2.1.0.tar.gz
>> 2. Install the source package with R CMD INSTALL SparkR_2.1.0.tar.gz
>> 3. As the SparkR package doesn't contain Spark JARs (this is due to
>> package size limits from CRAN), we'll need to run [3]
>> export SPARKR_RELEASE_DOWNLOAD_URL="http://people.apache.org/~pwend
>> ell/spark-releases/spark-2.1.0-rc5-bin/spark-2.1.0-bin-hadoop2.6.tgz"
>> 4. Launch R. You can now load SparkR with `library(SparkR)` and
>> test it with your applications.
>> 5. Note that the first time a SparkSession is created, the binary
>> artifacts will be downloaded.
>>
>> Thanks
>> Shivaram
>>
>> [1] https://issues.apache.org/jira/browse/SPARK-18267
>> [2] https://issues.apache.org/jira/browse/SPARK-18590
>> [3] Note that this isn't required once 2.1.0 has been released as
>> SparkR can automatically resolve and download releases.
>>
>> On Thu, Dec 15, 2016 at 9:16 PM, Reynold Xin  wrote:
>> > Please vote on releasing the following candidate as Apache Spark version
>> > 2.1.0. The vote is open until Sun, December 18, 2016 at 21:30 PT and
>> passes
>> > if a majority of at least 3 +1 PMC votes are cast.
>> >
>> > [ ] +1 Release this package as Apache Spark 2.1.0
>> > [ ] -1 Do not release this package because ...
>> >
>> >
>> > To learn more about Apache Spark, please see http://spark.apache.org/
>> >
>> > The tag to be voted on is v2.1.0-rc5
>> > (cd0a08361e2526519e7c131c42116bf56fa62c76)
>> >
>> > List of JIRA tickets resolved are:
>> > https://issues.apache.org/jira/issues/?jql=project%20%3D%
>> 20SPARK%20AND%20fixVersion%20%3D%202.1.0
>> >
>> > The release files, including signatures, digests, etc. can be found at:
>> > http://home.apache.org/~pwendell/spark-releases/spark-2.1.

Re: Mistake in Apache Spark Java.

2016-12-16 Thread Liang-Chi Hsieh

Hi,

I tried your example with the latest Spark master branch and with branch-2.0. It
works well.





-
Liang-Chi Hsieh | @viirya 
Spark Technology Center 
http://www.spark.tc/ 
--
View this message in context: 
http://apache-spark-developers-list.1001551.n3.nabble.com/Mistake-in-Apache-Spark-Java-tp20233p20254.html
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: Expand the Spark SQL programming guide?

2016-12-16 Thread Thakrar, Jayesh
Yes - that sounds good Anton, I can work on documenting the window functions.

From: Anton Okolnychyi 
Date: Thursday, December 15, 2016 at 4:34 PM
To: Conversant 
Cc: Michael Armbrust , Jim Hughes , 
"dev@spark.apache.org" 
Subject: Re: Expand the Spark SQL programming guide?

I think it will make sense to show a sample implementation of 
UserDefinedAggregateFunction for DataFrames, and an example of the Aggregator 
API for typed Datasets.

Jim, what if I submit a PR and you join the review process? I also do not mind
splitting this if you want, but it seems to be overkill for this part.

Jayesh, shall I skip the window functions part since you are going to work on 
that?
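
(For readers following along, here is a minimal PySpark sketch of the kind of
window-function example under discussion; the DataFrame and column names are
purely illustrative, not taken from any existing guide example:)

    from pyspark.sql import SparkSession, functions as F
    from pyspark.sql.window import Window

    spark = SparkSession.builder.appName("window-example").getOrCreate()

    # Illustrative data: department and salary columns.
    df = spark.createDataFrame(
        [("eng", 100), ("eng", 120), ("sales", 90), ("sales", 95)],
        ["dept", "salary"])

    # Rank rows by salary within each department.
    w = Window.partitionBy("dept").orderBy(F.desc("salary"))
    df.withColumn("rank", F.rank().over(w)).show()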

2016-12-15 22:48 GMT+01:00 Thakrar, Jayesh 
mailto:jthak...@conversantmedia.com>>:
I too am interested in expanding the documentation for Spark SQL.
For my work I needed to get some info/examples/guidance on window functions and 
have been using 
https://databricks.com/blog/2015/07/15/introducing-window-functions-in-spark-sql.html
 .
How about divide and conquer?


From: Michael Armbrust mailto:mich...@databricks.com>>
Date: Thursday, December 15, 2016 at 3:21 PM
To: Jim Hughes mailto:jn...@ccri.com>>
Cc: "dev@spark.apache.org" 
mailto:dev@spark.apache.org>>
Subject: Re: Expand the Spark SQL programming guide?

Pull requests would be welcome for any major missing features in the guide: 
https://github.com/apache/spark/blob/master/docs/sql-programming-guide.md

On Thu, Dec 15, 2016 at 11:48 AM, Jim Hughes 
mailto:jn...@ccri.com>> wrote:
Hi Anton,

I'd like to see this as well.  I've been working on implementing geospatial 
user-defined types and functions.  Having examples of aggregations and window 
functions would be awesome!

I did test out implementing a distributed convex hull as a 
UserDefinedAggregateFunction, and that seemed to work sensibly.

Cheers,

Jim

On 12/15/2016 03:28 AM, Anton Okolnychyi wrote:
Hi,

I am wondering whether it makes sense to expand the Spark SQL programming guide 
with examples of aggregations (including user-defined via the Aggregator API) 
and window functions.  For instance, there might be a separate subsection under 
"Getting Started" for each functionality.

SPARK-16046 seems to be related but there is no activity for more than 4 months.

Best regards,
Anton






Mesos Spark Fine Grained Execution - CPU count

2016-12-16 Thread Chawla,Sumit
Hi

I am using Spark 1.6 and have a question about the fine-grained model in Spark.
I have a simple Spark application which transforms A -> B.  It's a single-stage
application that starts with 48 partitions.  When the program starts running,
the Mesos UI shows 48 tasks and 48 CPUs allocated to the job.  As the tasks get
done, the number of active tasks decreases.  However, the number of CPUs does
not decrease proportionally.  When the job was about to finish, there was a
single remaining task, yet the CPU count was still 20.

My questions are: why is there no one-to-one mapping between tasks and CPUs
in fine-grained mode?  And how can these CPUs be released when the job is done,
so that other jobs can start?
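
(For reference, a rough sketch of how the fine-grained scheduler is selected in
Spark 1.6, where spark.mesos.coarse defaults to false; the Mesos master URL and
app name below are placeholders:)

    from pyspark import SparkConf, SparkContext

    # In fine-grained mode each Spark task runs as its own Mesos task, so CPU
    # offers are taken and released per task (aside from per-executor overhead).
    # spark.mesos.coarse=false is the Spark 1.6 default; it is set explicitly here.
    conf = (SparkConf()
            .setAppName("fine-grained-example")
            .setMaster("mesos://mesos-master.example.com:5050")
            .set("spark.mesos.coarse", "false"))
    sc = SparkContext(conf=conf)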


Regards
Sumit Chawla


Re: [VOTE] Apache Spark 2.1.0 (RC5)

2016-12-16 Thread Sean Owen
(If you have a template for these emails, maybe update it to use https
links. They work for apache.org domains. After all we are asking people to
verify the integrity of release artifacts, so it might as well be secure.)

(Also the new archives use .tar.gz instead of .tgz like the others. No big
deal, my OCD eye just noticed it.)

I don't see an Apache license / notice for the Pyspark or SparkR artifacts.
It would be good practice to include this in a convenience binary. I'm not
sure if it's strictly mandatory, but something to adjust in any event. I
think that's all there is to do for SparkR. For Pyspark, which packages a
bunch of dependencies, it does include the licenses (good) but I think it
should include the NOTICE file.

This is the first time I recall getting 0 test failures off the bat!
I'm using Java 8 / Ubuntu 16 and yarn/hive/hadoop-2.7 profiles.

I think I'd +1 this therefore unless someone knows that the license issue
above is real and a blocker.

On Fri, Dec 16, 2016 at 5:17 AM Reynold Xin  wrote:

> Please vote on releasing the following candidate as Apache Spark version
> 2.1.0. The vote is open until Sun, December 18, 2016 at 21:30 PT and passes
> if a majority of at least 3 +1 PMC votes are cast.
>
> [ ] +1 Release this package as Apache Spark 2.1.0
> [ ] -1 Do not release this package because ...
>
>
> To learn more about Apache Spark, please see http://spark.apache.org/
>
> The tag to be voted on is v2.1.0-rc5
> (cd0a08361e2526519e7c131c42116bf56fa62c76)
>
> List of JIRA tickets resolved are:
> https://issues.apache.org/jira/issues/?jql=project%20%3D%20SPARK%20AND%20fixVersion%20%3D%202.1.0
>
> The release files, including signatures, digests, etc. can be found at:
> http://home.apache.org/~pwendell/spark-releases/spark-2.1.0-rc5-bin/
>
> Release artifacts are signed with the following key:
> https://people.apache.org/keys/committer/pwendell.asc
>
> The staging repository for this release can be found at:
> https://repository.apache.org/content/repositories/orgapachespark-1223/
>
> The documentation corresponding to this release can be found at:
> http://people.apache.org/~pwendell/spark-releases/spark-2.1.0-rc5-docs/
>
>
> *FAQ*
>
> *How can I help test this release?*
>
> If you are a Spark user, you can help us test this release by taking an
> existing Spark workload and running on this release candidate, then
> reporting any regressions.
>
> *What should happen to JIRA tickets still targeting 2.1.0?*
>
> Committers should look at those and triage. Extremely important bug fixes,
> documentation, and API tweaks that impact compatibility should be worked on
> immediately. Everything else please retarget to 2.1.1 or 2.2.0.
>
> *What happened to RC3/RC5?*
>
> They had issues with the release packaging and as a result were skipped.
>
>


Re: [VOTE] Apache Spark 2.1.0 (RC5)

2016-12-16 Thread Dongjoon Hyun
RC5 is also tested on CentOS 6.8, OpenJDK 1.8.0_111, R 3.3.2 with profiles 
`-Pyarn -Phadoop-2.7 -Pkinesis-asl -Phive -Phive-thriftserver -Psparkr`.

BTW, there still exist five on-going issues in JIRA (with target version 2.1.0).

1. SPARK-16845 "org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificOrdering"
grows beyond 64 KB
2. SPARK-18669 Update Apache docs regard watermarking in Structured Streaming
3. SPARK-18894 Event time watermark delay threshold specified in months or 
years gives incorrect results
4. SPARK-18899 append data to a bucketed table with mismatched bucketing should 
fail

+1 with known issues for now.

Bests,
Dongjoon.

On 2016-12-16 09:57 (-0800), Sean Owen  wrote: 
> (If you have a template for these emails, maybe update it to use https
> links. They work for apache.org domains. After all we are asking people to
> verify the integrity of release artifacts, so it might as well be secure.)
> 
> (Also the new archives use .tar.gz instead of .tgz like the others. No big
> deal, my OCD eye just noticed it.)
> 
> I don't see an Apache license / notice for the Pyspark or SparkR artifacts.
> It would be good practice to include this in a convenience binary. I'm not
> sure if it's strictly mandatory, but something to adjust in any event. I
> think that's all there is to do for SparkR. For Pyspark, which packages a
> bunch of dependencies, it does include the licenses (good) but I think it
> should include the NOTICE file.
> 
> This is the first time I recall getting 0 test failures off the bat!
> I'm using Java 8 / Ubuntu 16 and yarn/hive/hadoop-2.7 profiles.
> 
> I think I'd +1 this therefore unless someone knows that the license issue
> above is real and a blocker.
> 
> On Fri, Dec 16, 2016 at 5:17 AM Reynold Xin  wrote:
> 
> > Please vote on releasing the following candidate as Apache Spark version
> > 2.1.0. The vote is open until Sun, December 18, 2016 at 21:30 PT and passes
> > if a majority of at least 3 +1 PMC votes are cast.
> >
> > [ ] +1 Release this package as Apache Spark 2.1.0
> > [ ] -1 Do not release this package because ...
> >
> >
> > To learn more about Apache Spark, please see http://spark.apache.org/
> >
> > The tag to be voted on is v2.1.0-rc5
> > (cd0a08361e2526519e7c131c42116bf56fa62c76)
> >
> > List of JIRA tickets resolved are:
> > https://issues.apache.org/jira/issues/?jql=project%20%3D%20SPARK%20AND%20fixVersion%20%3D%202.1.0
> >
> > The release files, including signatures, digests, etc. can be found at:
> > http://home.apache.org/~pwendell/spark-releases/spark-2.1.0-rc5-bin/
> >
> > Release artifacts are signed with the following key:
> > https://people.apache.org/keys/committer/pwendell.asc
> >
> > The staging repository for this release can be found at:
> > https://repository.apache.org/content/repositories/orgapachespark-1223/
> >
> > The documentation corresponding to this release can be found at:
> > http://people.apache.org/~pwendell/spark-releases/spark-2.1.0-rc5-docs/
> >
> >
> > *FAQ*
> >
> > *How can I help test this release?*
> >
> > If you are a Spark user, you can help us test this release by taking an
> > existing Spark workload and running on this release candidate, then
> > reporting any regressions.
> >
> > *What should happen to JIRA tickets still targeting 2.1.0?*
> >
> > Committers should look at those and triage. Extremely important bug fixes,
> > documentation, and API tweaks that impact compatibility should be worked on
> > immediately. Everything else please retarget to 2.1.1 or 2.2.0.
> >
> > *What happened to RC3/RC5?*
> >
> > They had issues with the release packaging and as a result were skipped.
> >
> >
> 

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: Expand the Spark SQL programming guide?

2016-12-16 Thread Jim Hughes
I'd be happy to review a PR.  At the minute, I'm still learning Spark 
SQL, so writing documentation might be a bit of a stretch, but reviewing 
would be fine.


Thanks!

On 12/16/2016 08:39 AM, Thakrar, Jayesh wrote:


Yes - that sounds good Anton, I can work on documenting the window 
functions.


*From: *Anton Okolnychyi 
*Date: *Thursday, December 15, 2016 at 4:34 PM
*To: *Conversant 
*Cc: *Michael Armbrust , Jim Hughes 
, "dev@spark.apache.org" 

*Subject: *Re: Expand the Spark SQL programming guide?

I think it will make sense to show a sample implementation of 
UserDefinedAggregateFunction for DataFrames, and an example of the 
Aggregator API for typed Datasets.


Jim, what if I submit a PR and you join the review process? I also do
not mind splitting this if you want, but it seems to be overkill for
this part.


Jayesh, shall I skip the window functions part since you are going to 
work on that?


2016-12-15 22:48 GMT+01:00 Thakrar, Jayesh 
mailto:jthak...@conversantmedia.com>>:


I too am interested in expanding the documentation for Spark SQL.

For my work I needed to get some info/examples/guidance on window
functions and have been using

https://databricks.com/blog/2015/07/15/introducing-window-functions-in-spark-sql.html
.

How about divide and conquer?

*From: *Michael Armbrust mailto:mich...@databricks.com>>
*Date: *Thursday, December 15, 2016 at 3:21 PM
*To: *Jim Hughes mailto:jn...@ccri.com>>
*Cc: *"dev@spark.apache.org "
mailto:dev@spark.apache.org>>
*Subject: *Re: Expand the Spark SQL programming guide?

Pull requests would be welcome for any major missing features in
the guide:
https://github.com/apache/spark/blob/master/docs/sql-programming-guide.md

On Thu, Dec 15, 2016 at 11:48 AM, Jim Hughes mailto:jn...@ccri.com>> wrote:

Hi Anton,

I'd like to see this as well.  I've been working on
implementing geospatial user-defined types and functions. 
Having examples of aggregations and window functions would be awesome!

I did test out implementing a distributed convex hull as a
UserDefinedAggregateFunction, and that seemed to work sensibly.

Cheers,

Jim

On 12/15/2016 03:28 AM, Anton Okolnychyi wrote:

Hi,

I am wondering whether it makes sense to expand the Spark
SQL programming guide with examples of aggregations
(including user-defined via the Aggregator API) and window
functions. For instance, there might be a separate
subsection under "Getting Started" for each functionality.

SPARK-16046 seems to be related but there is no activity
for more than 4 months.

Best regards,

Anton





Re: [VOTE] Apache Spark 2.1.0 (RC5)

2016-12-16 Thread Felix Cheung
For R we have a license field in the DESCRIPTION, and this is standard practice 
(and requirement) for R packages.

https://cran.r-project.org/doc/manuals/R-exts.html#Licensing


From: Sean Owen 
Sent: Friday, December 16, 2016 9:57:15 AM
To: Reynold Xin; dev@spark.apache.org
Subject: Re: [VOTE] Apache Spark 2.1.0 (RC5)

(If you have a template for these emails, maybe update it to use https links. 
They work for apache.org domains. After all we are asking 
people to verify the integrity of release artifacts, so it might as well be 
secure.)

(Also the new archives use .tar.gz instead of .tgz like the others. No big 
deal, my OCD eye just noticed it.)

I don't see an Apache license / notice for the Pyspark or SparkR artifacts. It 
would be good practice to include this in a convenience binary. I'm not sure if 
it's strictly mandatory, but something to adjust in any event. I think that's 
all there is to do for SparkR. For Pyspark, which packages a bunch of 
dependencies, it does include the licenses (good) but I think it should include 
the NOTICE file.

This is the first time I recall getting 0 test failures off the bat!
I'm using Java 8 / Ubuntu 16 and yarn/hive/hadoop-2.7 profiles.

I think I'd +1 this therefore unless someone knows that the license issue above 
is real and a blocker.

On Fri, Dec 16, 2016 at 5:17 AM Reynold Xin 
mailto:r...@databricks.com>> wrote:
Please vote on releasing the following candidate as Apache Spark version 2.1.0. 
The vote is open until Sun, December 18, 2016 at 21:30 PT and passes if a 
majority of at least 3 +1 PMC votes are cast.

[ ] +1 Release this package as Apache Spark 2.1.0
[ ] -1 Do not release this package because ...


To learn more about Apache Spark, please see http://spark.apache.org/

The tag to be voted on is v2.1.0-rc5 (cd0a08361e2526519e7c131c42116bf56fa62c76)

List of JIRA tickets resolved are:  
https://issues.apache.org/jira/issues/?jql=project%20%3D%20SPARK%20AND%20fixVersion%20%3D%202.1.0

The release files, including signatures, digests, etc. can be found at:
http://home.apache.org/~pwendell/spark-releases/spark-2.1.0-rc5-bin/

Release artifacts are signed with the following key:
https://people.apache.org/keys/committer/pwendell.asc

The staging repository for this release can be found at:
https://repository.apache.org/content/repositories/orgapachespark-1223/

The documentation corresponding to this release can be found at:
http://people.apache.org/~pwendell/spark-releases/spark-2.1.0-rc5-docs/


FAQ

How can I help test this release?

If you are a Spark user, you can help us test this release by taking an 
existing Spark workload and running on this release candidate, then reporting 
any regressions.

What should happen to JIRA tickets still targeting 2.1.0?

Committers should look at those and triage. Extremely important bug fixes, 
documentation, and API tweaks that impact compatibility should be worked on 
immediately. Everything else please retarget to 2.1.1 or 2.2.0.

What happened to RC3/RC5?

They had issues with the release packaging and as a result were skipped.



Re: [VOTE] Apache Spark 2.1.0 (RC5)

2016-12-16 Thread Xiao Li
+1

Xiao Li

2016-12-16 12:19 GMT-08:00 Felix Cheung :

> For R we have a license field in the DESCRIPTION, and this is standard
> practice (and requirement) for R packages.
>
> https://cran.r-project.org/doc/manuals/R-exts.html#Licensing
>
> --
> *From:* Sean Owen 
> *Sent:* Friday, December 16, 2016 9:57:15 AM
> *To:* Reynold Xin; dev@spark.apache.org
> *Subject:* Re: [VOTE] Apache Spark 2.1.0 (RC5)
>
> (If you have a template for these emails, maybe update it to use https
> links. They work for apache.org domains. After all we are asking people
> to verify the integrity of release artifacts, so it might as well be
> secure.)
>
> (Also the new archives use .tar.gz instead of .tgz like the others. No big
> deal, my OCD eye just noticed it.)
>
> I don't see an Apache license / notice for the Pyspark or SparkR
> artifacts. It would be good practice to include this in a convenience
> binary. I'm not sure if it's strictly mandatory, but something to adjust in
> any event. I think that's all there is to do for SparkR. For Pyspark, which
> packages a bunch of dependencies, it does include the licenses (good) but I
> think it should include the NOTICE file.
>
> This is the first time I recall getting 0 test failures off the bat!
> I'm using Java 8 / Ubuntu 16 and yarn/hive/hadoop-2.7 profiles.
>
> I think I'd +1 this therefore unless someone knows that the license issue
> above is real and a blocker.
>
> On Fri, Dec 16, 2016 at 5:17 AM Reynold Xin  wrote:
>
>> Please vote on releasing the following candidate as Apache Spark version
>> 2.1.0. The vote is open until Sun, December 18, 2016 at 21:30 PT and passes
>> if a majority of at least 3 +1 PMC votes are cast.
>>
>> [ ] +1 Release this package as Apache Spark 2.1.0
>> [ ] -1 Do not release this package because ...
>>
>>
>> To learn more about Apache Spark, please see http://spark.apache.org/
>>
>> The tag to be voted on is v2.1.0-rc5 (cd0a08361e2526519e7c131c42116b
>> f56fa62c76)
>>
>> List of JIRA tickets resolved are:  https://issues.apache.org/
>> jira/issues/?jql=project%20%3D%20SPARK%20AND%20fixVersion%20%3D%202.1.0
>>
>> The release files, including signatures, digests, etc. can be found at:
>> http://home.apache.org/~pwendell/spark-releases/spark-2.1.0-rc5-bin/
>>
>> Release artifacts are signed with the following key:
>> https://people.apache.org/keys/committer/pwendell.asc
>>
>> The staging repository for this release can be found at:
>> https://repository.apache.org/content/repositories/orgapachespark-1223/
>>
>> The documentation corresponding to this release can be found at:
>> http://people.apache.org/~pwendell/spark-releases/spark-2.1.0-rc5-docs/
>>
>>
>> *FAQ*
>>
>> *How can I help test this release?*
>>
>> If you are a Spark user, you can help us test this release by taking an
>> existing Spark workload and running on this release candidate, then
>> reporting any regressions.
>>
>> *What should happen to JIRA tickets still targeting 2.1.0?*
>>
>> Committers should look at those and triage. Extremely important bug
>> fixes, documentation, and API tweaks that impact compatibility should be
>> worked on immediately. Everything else please retarget to 2.1.1 or 2.2.0.
>>
>> *What happened to RC3/RC5?*
>>
>> They had issues with the release packaging and as a result were skipped.
>>
>>


Re: [VOTE] Apache Spark 2.1.0 (RC5)

2016-12-16 Thread Herman van Hövell tot Westerflier
+1

On Sat, Dec 17, 2016 at 12:14 AM, Xiao Li  wrote:

> +1
>
> Xiao Li
>
> 2016-12-16 12:19 GMT-08:00 Felix Cheung :
>
>> For R we have a license field in the DESCRIPTION, and this is standard
>> practice (and requirement) for R packages.
>>
>> https://cran.r-project.org/doc/manuals/R-exts.html#Licensing
>>
>> --
>> *From:* Sean Owen 
>> *Sent:* Friday, December 16, 2016 9:57:15 AM
>> *To:* Reynold Xin; dev@spark.apache.org
>> *Subject:* Re: [VOTE] Apache Spark 2.1.0 (RC5)
>>
>> (If you have a template for these emails, maybe update it to use https
>> links. They work for apache.org domains. After all we are asking people
>> to verify the integrity of release artifacts, so it might as well be
>> secure.)
>>
>> (Also the new archives use .tar.gz instead of .tgz like the others. No
>> big deal, my OCD eye just noticed it.)
>>
>> I don't see an Apache license / notice for the Pyspark or SparkR
>> artifacts. It would be good practice to include this in a convenience
>> binary. I'm not sure if it's strictly mandatory, but something to adjust in
>> any event. I think that's all there is to do for SparkR. For Pyspark, which
>> packages a bunch of dependencies, it does include the licenses (good) but I
>> think it should include the NOTICE file.
>>
>> This is the first time I recall getting 0 test failures off the bat!
>> I'm using Java 8 / Ubuntu 16 and yarn/hive/hadoop-2.7 profiles.
>>
>> I think I'd +1 this therefore unless someone knows that the license issue
>> above is real and a blocker.
>>
>> On Fri, Dec 16, 2016 at 5:17 AM Reynold Xin  wrote:
>>
>>> Please vote on releasing the following candidate as Apache Spark version
>>> 2.1.0. The vote is open until Sun, December 18, 2016 at 21:30 PT and passes
>>> if a majority of at least 3 +1 PMC votes are cast.
>>>
>>> [ ] +1 Release this package as Apache Spark 2.1.0
>>> [ ] -1 Do not release this package because ...
>>>
>>>
>>> To learn more about Apache Spark, please see http://spark.apache.org/
>>>
>>> The tag to be voted on is v2.1.0-rc5 (cd0a08361e2526519e7c131c42116
>>> bf56fa62c76)
>>>
>>> List of JIRA tickets resolved are:  https://issues.apache.org/jir
>>> a/issues/?jql=project%20%3D%20SPARK%20AND%20fixVersion%20%3D%202.1.0
>>>
>>> The release files, including signatures, digests, etc. can be found at:
>>> http://home.apache.org/~pwendell/spark-releases/spark-2.1.0-rc5-bin/
>>>
>>> Release artifacts are signed with the following key:
>>> https://people.apache.org/keys/committer/pwendell.asc
>>>
>>> The staging repository for this release can be found at:
>>> https://repository.apache.org/content/repositories/orgapachespark-1223/
>>>
>>> The documentation corresponding to this release can be found at:
>>> http://people.apache.org/~pwendell/spark-releases/spark-2.1.0-rc5-docs/
>>>
>>>
>>> *FAQ*
>>>
>>> *How can I help test this release?*
>>>
>>> If you are a Spark user, you can help us test this release by taking an
>>> existing Spark workload and running on this release candidate, then
>>> reporting any regressions.
>>>
>>> *What should happen to JIRA tickets still targeting 2.1.0?*
>>>
>>> Committers should look at those and triage. Extremely important bug
>>> fixes, documentation, and API tweaks that impact compatibility should be
>>> worked on immediately. Everything else please retarget to 2.1.1 or 2.2.0.
>>>
>>> *What happened to RC3/RC5?*
>>>
>>> They had issues with the release packaging and as a result were skipped.
>>>
>>>
>


-- 

Herman van Hövell

Software Engineer

Databricks Inc.

hvanhov...@databricks.com

+31 6 420 590 27

databricks.com



Re: [VOTE] Apache Spark 2.1.0 (RC5)

2016-12-16 Thread Joseph Bradley
+1

On Fri, Dec 16, 2016 at 3:21 PM, Herman van Hövell tot Westerflier <
hvanhov...@databricks.com> wrote:

> +1
>
> On Sat, Dec 17, 2016 at 12:14 AM, Xiao Li  wrote:
>
>> +1
>>
>> Xiao Li
>>
>> 2016-12-16 12:19 GMT-08:00 Felix Cheung :
>>
>>> For R we have a license field in the DESCRIPTION, and this is standard
>>> practice (and requirement) for R packages.
>>>
>>> https://cran.r-project.org/doc/manuals/R-exts.html#Licensing
>>>
>>> --
>>> *From:* Sean Owen 
>>> *Sent:* Friday, December 16, 2016 9:57:15 AM
>>> *To:* Reynold Xin; dev@spark.apache.org
>>> *Subject:* Re: [VOTE] Apache Spark 2.1.0 (RC5)
>>>
>>> (If you have a template for these emails, maybe update it to use https
>>> links. They work for apache.org domains. After all we are asking people
>>> to verify the integrity of release artifacts, so it might as well be
>>> secure.)
>>>
>>> (Also the new archives use .tar.gz instead of .tgz like the others. No
>>> big deal, my OCD eye just noticed it.)
>>>
>>> I don't see an Apache license / notice for the Pyspark or SparkR
>>> artifacts. It would be good practice to include this in a convenience
>>> binary. I'm not sure if it's strictly mandatory, but something to adjust in
>>> any event. I think that's all there is to do for SparkR. For Pyspark, which
>>> packages a bunch of dependencies, it does include the licenses (good) but I
>>> think it should include the NOTICE file.
>>>
>>> This is the first time I recall getting 0 test failures off the bat!
>>> I'm using Java 8 / Ubuntu 16 and yarn/hive/hadoop-2.7 profiles.
>>>
>>> I think I'd +1 this therefore unless someone knows that the license
>>> issue above is real and a blocker.
>>>
>>> On Fri, Dec 16, 2016 at 5:17 AM Reynold Xin  wrote:
>>>
 Please vote on releasing the following candidate as Apache Spark
 version 2.1.0. The vote is open until Sun, December 18, 2016 at 21:30 PT
 and passes if a majority of at least 3 +1 PMC votes are cast.

 [ ] +1 Release this package as Apache Spark 2.1.0
 [ ] -1 Do not release this package because ...


 To learn more about Apache Spark, please see http://spark.apache.org/

 The tag to be voted on is v2.1.0-rc5 (cd0a08361e2526519e7c131c42116
 bf56fa62c76)

 List of JIRA tickets resolved are:  https://issues.apache.org/jir
 a/issues/?jql=project%20%3D%20SPARK%20AND%20fixVersion%20%3D%202.1.0

 The release files, including signatures, digests, etc. can be found at:
 http://home.apache.org/~pwendell/spark-releases/spark-2.1.0-rc5-bin/

 Release artifacts are signed with the following key:
 https://people.apache.org/keys/committer/pwendell.asc

 The staging repository for this release can be found at:
 https://repository.apache.org/content/repositories/orgapachespark-1223/

 The documentation corresponding to this release can be found at:
 http://people.apache.org/~pwendell/spark-releases/spark-2.1.0-rc5-docs/


 *FAQ*

 *How can I help test this release?*

 If you are a Spark user, you can help us test this release by taking an
 existing Spark workload and running on this release candidate, then
 reporting any regressions.

 *What should happen to JIRA tickets still targeting 2.1.0?*

 Committers should look at those and triage. Extremely important bug
 fixes, documentation, and API tweaks that impact compatibility should be
 worked on immediately. Everything else please retarget to 2.1.1 or 2.2.0.

 *What happened to RC3/RC5?*

 They had issues with the release packaging and as a result were skipped.


>>
>
>
> --
>
> Herman van Hövell
>
> Software Engineer
>
> Databricks Inc.
>
> hvanhov...@databricks.com
>
> +31 6 420 590 27
>
> databricks.com
>
>



-- 

Joseph Bradley

Software Engineer - Machine Learning

Databricks, Inc.



Re: [VOTE] Apache Spark 2.1.0 (RC5)

2016-12-16 Thread Yuming Wang
I hope https://github.com/apache/spark/pull/16252 can be merged before the
2.1.0 release. It is a fix for broadcasts that cannot fit in memory.

On Sat, Dec 17, 2016 at 10:23 AM, Joseph Bradley 
wrote:

> +1
>
> On Fri, Dec 16, 2016 at 3:21 PM, Herman van Hövell tot Westerflier <
> hvanhov...@databricks.com> wrote:
>
>> +1
>>
>> On Sat, Dec 17, 2016 at 12:14 AM, Xiao Li  wrote:
>>
>>> +1
>>>
>>> Xiao Li
>>>
>>> 2016-12-16 12:19 GMT-08:00 Felix Cheung :
>>>
 For R we have a license field in the DESCRIPTION, and this is standard
 practice (and requirement) for R packages.

 https://cran.r-project.org/doc/manuals/R-exts.html#Licensing

 --
 *From:* Sean Owen 
 *Sent:* Friday, December 16, 2016 9:57:15 AM
 *To:* Reynold Xin; dev@spark.apache.org
 *Subject:* Re: [VOTE] Apache Spark 2.1.0 (RC5)

 (If you have a template for these emails, maybe update it to use https
 links. They work for apache.org domains. After all we are asking
 people to verify the integrity of release artifacts, so it might as well be
 secure.)

 (Also the new archives use .tar.gz instead of .tgz like the others. No
 big deal, my OCD eye just noticed it.)

 I don't see an Apache license / notice for the Pyspark or SparkR
 artifacts. It would be good practice to include this in a convenience
 binary. I'm not sure if it's strictly mandatory, but something to adjust in
 any event. I think that's all there is to do for SparkR. For Pyspark, which
 packages a bunch of dependencies, it does include the licenses (good) but I
 think it should include the NOTICE file.

 This is the first time I recall getting 0 test failures off the bat!
 I'm using Java 8 / Ubuntu 16 and yarn/hive/hadoop-2.7 profiles.

 I think I'd +1 this therefore unless someone knows that the license
 issue above is real and a blocker.

 On Fri, Dec 16, 2016 at 5:17 AM Reynold Xin 
 wrote:

> Please vote on releasing the following candidate as Apache Spark
> version 2.1.0. The vote is open until Sun, December 18, 2016 at 21:30 PT
> and passes if a majority of at least 3 +1 PMC votes are cast.
>
> [ ] +1 Release this package as Apache Spark 2.1.0
> [ ] -1 Do not release this package because ...
>
>
> To learn more about Apache Spark, please see http://spark.apache.org/
>
> The tag to be voted on is v2.1.0-rc5 (cd0a08361e2526519e7c131c42116
> bf56fa62c76)
>
> List of JIRA tickets resolved are:  https://issues.apache.org/jir
> a/issues/?jql=project%20%3D%20SPARK%20AND%20fixVersion%20%3D%202.1.0
>
> The release files, including signatures, digests, etc. can be found at:
> http://home.apache.org/~pwendell/spark-releases/spark-2.1.0-rc5-bin/
>
> Release artifacts are signed with the following key:
> https://people.apache.org/keys/committer/pwendell.asc
>
> The staging repository for this release can be found at:
> https://repository.apache.org/content/repositories/orgapache
> spark-1223/
>
> The documentation corresponding to this release can be found at:
> http://people.apache.org/~pwendell/spark-releases/spark-2.1.
> 0-rc5-docs/
>
>
> *FAQ*
>
> *How can I help test this release?*
>
> If you are a Spark user, you can help us test this release by taking
> an existing Spark workload and running on this release candidate, then
> reporting any regressions.
>
> *What should happen to JIRA tickets still targeting 2.1.0?*
>
> Committers should look at those and triage. Extremely important bug
> fixes, documentation, and API tweaks that impact compatibility should be
> worked on immediately. Everything else please retarget to 2.1.1 or 2.2.0.
>
> *What happened to RC3/RC5?*
>
> They had issues with the release packaging and as a result were skipped.
>
>
>>>
>>
>>
>> --
>>
>> Herman van Hövell
>>
>> Software Engineer
>>
>> Databricks Inc.
>>
>> hvanhov...@databricks.com
>>
>> +31 6 420 590 27
>>
>> databricks.com
>>
>>
>
>
>
> --
>
> Joseph Bradley
>
> Software Engineer - Machine Learning
>
> Databricks, Inc.
>
>