Re: Welcoming two new committers

2016-02-08 Thread Corey Nolet
Congrats guys! On Mon, Feb 8, 2016 at 12:23 PM, Ted Yu wrote: > Congratulations, Herman and Wenchen. > > On Mon, Feb 8, 2016 at 9:15 AM, Matei Zaharia > wrote: > >> Hi all, >> >> The PMC has recently added two new Spark committers -- Herman van Hovell >> and Wenchen Fan. Both have been heavily

Re: ROSE: Spark + R on the JVM.

2016-01-12 Thread Corey Nolet
David, Thank you very much for announcing this! It looks like it could be very useful. Would you mind providing a link to the github? On Tue, Jan 12, 2016 at 10:03 AM, David wrote: > Hi all, > > I'd like to share news of the recent release of a new Spark package, ROSE. > > > ROSE is a Scala lib

Re: Forecasting Library For Apache Spark

2015-09-21 Thread Corey Nolet
Mohamed, Have you checked out the Spark Timeseries [1] project? Non-seasonal ARIMA was added to this recently and seasonal ARIMA should be following shortly. [1] https://github.com/cloudera/spark-timeseries On Mon, Sep 21, 2015 at 7:47 AM, Mohamed Baddar wrote: > Hello everybody , this my firs

Re: MongoDB and Spark

2015-09-11 Thread Corey Nolet
Unfortunately, MongoDB does not directly expose its locality via its client API so the problem with trying to schedule Spark tasks against it is that the tasks themselves cannot be scheduled locally on nodes containing query results- which means you can only assume most results will be sent over th

Re: Welcoming some new committers

2015-06-20 Thread Corey Nolet
Congrats guys! Keep up the awesome work! On Sat, Jun 20, 2015 at 3:28 PM, Guru Medasani wrote: > Congratulations to all the new committers! > > Guru Medasani > gdm...@gmail.com > > > > > On Jun 17, 2015, at 5:12 PM, Matei Zaharia > wrote: > > > > Hey all, > > > > Over the past 1.5 months we add

Re: [VOTE] Release Apache Spark 1.3.0 (RC3)

2015-03-09 Thread Corey Nolet
+1 (non-binding) - Verified signatures - Built on Mac OS X and Fedora 21. On Mon, Mar 9, 2015 at 11:01 PM, Krishna Sankar wrote: > Excellent, Thanks Xiangrui. The mystery is solved. > Cheers > > > > On Mon, Mar 9, 2015 at 3:30 PM, Xiangrui Meng wrote: > > > Krishna, I tested your linear regre

Re: [VOTE] Release Apache Spark 1.3.0 (RC1)

2015-02-23 Thread Corey Nolet
x Parquet filter push-down > SPARK-5310 SPARK-5166 Update SQL programming guide for 1.3 > SPARK-5183 SPARK-5180 Document data source API > SPARK-3650 Triangle Count handles reverse edges incorrectly > SPARK-3511 Create a RELEASE-NOTES.txt file in the repo > > > On Mon, Feb 23, 20

Re: [VOTE] Release Apache Spark 1.3.0 (RC1)

2015-02-23 Thread Corey Nolet
This vote was supposed to close on Saturday but it looks like no PMCs voted (other than the implicit vote from Patrick). Was there a discussion offline to cut an RC2? Was the vote extended? On Mon, Feb 23, 2015 at 6:59 AM, Robin East wrote: > Running ec2 launch scripts gives me the following err

Re: [VOTE] Release Apache Spark 1.3.0 (RC1)

2015-02-19 Thread Corey Nolet
+1 (non-binding) - Verified signatures using [1] - Built on MacOSX Yosemite - Built on Fedora 21 Each build was run with the Hadoop-2.4 version with yarn, hive, and hive-thriftserver profiles. I am having trouble getting all the tests passing on a single run on both machines but we have this same

Re: Replacing Jetty with TomCat

2015-02-17 Thread Corey Nolet
Niranda, I'm not sure if I'd say Spark's use of Jetty to expose its UI monitoring layer constitutes a use of "two web servers in a single product". Hadoop uses Jetty as well, as do many other applications today that need embedded http layers for serving up their monitoring UI to users. This is comp

Re: Welcoming three new committers

2015-02-03 Thread Corey Nolet
Congrats guys! On Tue, Feb 3, 2015 at 7:01 PM, Evan Chan wrote: > Congrats everyone!!! > > On Tue, Feb 3, 2015 at 3:17 PM, Timothy Chen wrote: > > Congrats all! > > > > Tim > > > > > >> On Feb 4, 2015, at 7:10 AM, Pritish Nawlakhe < > prit...@nirvana-international.com> wrote: > >> > >> Congrats

Re: Spark SQL API changes and stabilization

2015-01-15 Thread Corey Nolet
Reynold, One thing I'd like worked into the public portion of the API is the json inferencing logic that creates a Set[(String, StructType)] out of Map[String,Any]. SPARK-5260 addresses this so that I can use Accumulators to infer my schema instead of forcing a map/reduce phase to occur on an RDD

Re: [ANNOUNCE] Spark 1.2.0 Release Preview Posted

2014-11-20 Thread Corey Nolet
I was actually about to post this myself- I have a complex join that could benefit from something like a GroupComparator vs having to do multiple groupBy operations. This is probably the wrong thread for a full discussion on this but I didn't see a JIRA ticket for this or anything similar- any reas

Re: Spark & Hadoop 2.5.1

2014-11-14 Thread Corey Nolet
ere is no further > specialization needed beyond that. The profile sets hadoop.version to > 2.4.0 by default, but this can be overridden. > > On Fri, Nov 14, 2014 at 3:43 PM, Corey Nolet wrote: > > I noticed Spark 1.2.0-SNAPSHOT still has 2.4.x in the pom. Since 2.5.x is > >

Spark & Hadoop 2.5.1

2014-11-14 Thread Corey Nolet
I noticed Spark 1.2.0-SNAPSHOT still has 2.4.x in the pom. Since 2.5.x is the current stable Hadoop 2.x, would it make sense for us to update the poms?

Re: [VOTE] Designating maintainers for some Spark components

2014-11-06 Thread Corey Nolet
I'm actually going to change my non-binding to +0 for the proposal as-is. I overlooked some parts of the original proposal that, when reading over them again, do not sit well with me. "one of the maintainers needs to sign off on each patch to the component", as Greg has pointed out, does seem to i

Re: [VOTE] Designating maintainers for some Spark components

2014-11-06 Thread Corey Nolet
PMC [1] is responsible for oversight and does not designate partial or full committer. There are projects where all committers become PMC and others where PMC is reserved for committers with the most merit (and willingness to take on the responsibility of project oversight, releases, etc...). Commu

Re: [VOTE] Designating maintainers for some Spark components

2014-11-06 Thread Corey Nolet
+1 (non-binding) [for original process proposal] Greg, the first time I've seen the word "ownership" on this thread is in your message. The first time the word "lead" has appeared in this thread is in your message as well. I don't think that was the intent. The PMC and Committers have a responsibi

Re: Raise Java dependency from 6 to 7

2014-10-19 Thread Corey Nolet
A concrete plan and a definite version upon which the upgrade would be applied sound like they would benefit the community. If you plan far enough out (as Hadoop has done) and give the community enough of a notice, I can't see it being a problem as they would have ample time to upgrade. On Sat, Oct