Re: Spark streaming with Kafka- couldnt find KafkaUtils

2015-04-07 Thread Felix C
Or you could build an uber jar (you could google that): https://eradiating.wordpress.com/2015/02/15/getting-spark-streaming-on-kafka-to-work/ --- Original Message --- From: "Akhil Das" Sent: April 4, 2015 11:52 PM To: "Priya Ch" Cc: u...@spark.apache.org, "dev" Subject: Re: Spark streaming w

Re: [VOTE] Release Apache Spark 1.2.2

2015-04-07 Thread Sean Owen
I think that's close enough for a +1: Signatures and hashes are good. LICENSE, NOTICE still check out. Compiles for a Hadoop 2.6 + YARN + Hive profile. JIRAs with target version = 1.2.x look legitimate; no blockers. I still observe several Hive test failures with: mvn -Phadoop-2.4 -Pyarn -Phive

not in gzip format

2015-04-07 Thread prabeesh k
Please check the apache mirror http://www.apache.org/dyn/closer.cgi/spark/spark-1.3.0/spark-1.3.0.tgz file. It is not in the gzip format.

Re: not in gzip format

2015-04-07 Thread Sean Owen
Er, click the link? It is indeed a redirector HTML page. This is how all Apache releases are served. On Apr 7, 2015 8:32 AM, "prabeesh k" wrote: > Please check the apache mirror > http://www.apache.org/dyn/closer.cgi/spark/spark-1.3.0/spark-1.3.0.tgz > file. It is not in the gzip format. >
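
One way to tell a real tarball from the redirector page is to look at the first two bytes of the downloaded file: a gzip stream always starts with 0x1f 0x8b, while an HTML page does not. The following is only a minimal sketch of that check; looks_like_gzip is a hypothetical helper name and is not part of any Spark or Apache tooling.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # every gzip stream starts with these two bytes


def looks_like_gzip(data: bytes) -> bool:
    """Return True if the byte string starts with the gzip magic number."""
    return data[:2] == GZIP_MAGIC


# Illustrative inputs (made up for this sketch): an HTML page saved under a
# .tgz name, and some genuinely gzip-compressed bytes.
html_page = b"<!DOCTYPE html><html><body>mirror list</body></html>"
real_archive = gzip.compress(b"payload")
```

Running looks_like_gzip on the first bytes of a file saved from the closer.cgi URL would be expected to return False, since that URL serves the mirror-selection page rather than the archive itself.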

Re: not in gzip format

2015-04-07 Thread prabeesh k
But the name is just confusing. On 7 April 2015 at 16:35, Sean Owen wrote: > Er, click the link? It is indeed a redirector HTML page. This is how all > Apache releases are served. > On Apr 7, 2015 8:32 AM, "prabeesh k" wrote: > >> Please check the apache mirror >> http://www.apache.org/dyn/closer.cgi/s

Re: [VOTE] Release Apache Spark 1.3.1

2015-04-07 Thread Marcelo Vanzin
+1 (non-binding) Ran standalone and yarn tests on the hadoop-2.6 tarball, with and without the external shuffle service in yarn mode. On Sat, Apr 4, 2015 at 5:09 PM, Patrick Wendell wrote: > Please vote on releasing the following candidate as Apache Spark version > 1.3.1! > > The tag to be vote

Regularization in MLlib

2015-04-07 Thread Ulanov, Alexander
Hi, Could anyone elaborate on the regularization in Spark? I've found that L1 and L2 are implemented with Updaters (L1Updater, SquaredL2Updater). 1) Why is the loss reported by L2 (0.5 * regParam * norm * norm), where norm is Norm(weights, 2.0)? It should be 0.5*regParam*norm (0.5 to disappear aft

Re: Regularization in MLlib

2015-04-07 Thread DB Tsai
1) Norm(weights, N) will return (w_1^N + w_2^N + ...)^(1/N), so norm * norm is required. 2) This is a bug, as you said. I intend to fix this using weighted regularization, and the intercept term will be regularized with weight zero. https://github.com/apache/spark/pull/1518 But I never actually have tim
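
The point in 1) can be sketched in a few lines: because the 2-norm already includes the square root, it has to be squared again to get the usual L2 penalty 0.5 * regParam * (w_1^2 + w_2^2 + ...), whose gradient is regParam * w. This is a plain-Python illustration under that assumption, not MLlib's SquaredL2Updater code; the function names p_norm, l2_penalty and l2_penalty_gradient are hypothetical, and absolute values are used so the norm is defined for any p.

```python
def p_norm(weights, p):
    """Return (|w_1|^p + |w_2|^p + ...)^(1/p)."""
    return sum(abs(w) ** p for w in weights) ** (1.0 / p)


def l2_penalty(weights, reg_param):
    """L2 regularization loss: 0.5 * regParam * norm * norm."""
    norm = p_norm(weights, 2.0)  # includes the square root
    return 0.5 * reg_param * norm * norm  # so square it again


def l2_penalty_gradient(weights, reg_param):
    """Gradient of the penalty above; the 0.5 cancels the 2 from the square."""
    return [reg_param * w for w in weights]


# Illustrative values only.
weights = [3.0, 4.0]
penalty = l2_penalty(weights, 0.1)
gradient = l2_penalty_gradient(weights, 0.1)
```

With these illustrative weights the 2-norm is 5, so the penalty uses 25 (the squared norm), not 5.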

RE: Regularization in MLlib

2015-04-07 Thread Ulanov, Alexander
Hi DB, Thank you! In the general case (not only for regression), I think that the Regularizer should be tightly coupled with the Gradient; otherwise it will have no idea which weights are bias (intercept). Best regards, Alexander -Original Message- From: DB Tsai [mailto:dbt...@dbtsai.com] Sent: T

Re: [VOTE] Release Apache Spark 1.3.1

2015-04-07 Thread Patrick Wendell
Hey All, Today SPARK-6737 came to my attention. This is a bug that causes a memory leak for any long running program that repeatedly saves data out to a Hadoop FileSystem. For that reason, it is problematic for Spark Streaming. My sense is that this is severe enough to cut another RC once the fix

Re: extended jenkins downtime, thursday april 9th 7am-noon PDT (moving to anaconda python & more)

2015-04-07 Thread shane knapp
reminder! this is happening thursday morning. On Fri, Apr 3, 2015 at 9:59 AM, shane knapp wrote: > welcome to python2.7+, java 8 and more! :) > > i'll be doing a major upgrade to our build system next thursday morning. > here's a quick list of what's going on: > > * installation of anaconda py

Re: [VOTE] Release Apache Spark 1.3.1

2015-04-07 Thread Josh Rosen
The leak will impact long running streaming jobs even if they don't write Hadoop files, although the problem may take much longer to manifest itself for those jobs. I think we currently leak an empty HashMap per stage submitted in the common case, so it could take a very long time for this to t

Contributor CLAs

2015-04-07 Thread Nicholas Chammas
I've seen many other OSS projects ask contributors to sign CLAs. I've never seen us do that. I assume it's not an issue, since people opening PRs generally understand what it means. But legally I'm sure there's some danger in taking an implied vs. explicit license to do something. So: Do we need

Re: Contributor CLAs

2015-04-07 Thread Sean Owen
Yeah, this is why this pops up when you open a PR: https://github.com/apache/spark/blob/master/CONTRIBUTING.md Mostly, I want to take all reasonable steps to ensure that when somebody offers a code contribution, they are fine with the ways in which it is actually used (redistributed under the te

Re: Spark + Kinesis

2015-04-07 Thread Vadim Bichutskiy
Hey y'all, While I haven't been able to get Spark + Kinesis integration working, I pivoted to plan B: I now push data to S3 where I set up a DStream to monitor an S3 bucket with textFileStream, and that works great. I <3 Spark! Best, Vadim On Mon, Apr 6, 2015 at 12:23 PM, Vadim Bichutskiy <

Re: 1.3 Build Error with Scala-2.11

2015-04-07 Thread Imran Rashid
did you run dev/change-version-to-2.11.sh before compiling? When I ran this on current master, it mostly worked: dev/change-version-to-2.11.sh mvn -Pyarn -Phadoop-2.4 -Pscala-2.11 -DskipTests clean package There was a failure in building catalyst, but core built just fine for me. The error I g

Re: Contributor CLAs

2015-04-07 Thread Nicholas Chammas
SGTM. On Tue, Apr 7, 2015 at 9:11 PM Sean Owen wrote: > Yeah, this is why this pops up when you open a PR: > https://github.com/apache/spark/blob/master/CONTRIBUTING.md > > Mostly, I want to take all reasonable steps to ensure that when > somebody offers a code contribution, that they are fine w

Re: Contributor CLAs

2015-04-07 Thread Matei Zaharia
You do actually sign a CLA when you become a committer, and in general, we should ask for CLAs from anyone who contributes a large piece of code. This is the individual CLA: https://www.apache.org/licenses/icla.txt. Some people have sent them proactively because their employer asks them to. Ma

Re: 1.3 Build Error with Scala-2.11

2015-04-07 Thread Marty Bower
Yes - ran dev/change-version-to-2.11.sh But was missing -Dscala-2.11 on mvn command after a -2.10 build. Building successfully again now after adding that. On Tue, Apr 7, 2015 at 7:04 PM Imran Rashid wrote: > did you run > > dev/change-version-to-2.11.sh > > before compiling? When I ran this o

[RESULT] [VOTE] Release Apache Spark 1.3.1

2015-04-07 Thread Patrick Wendell
This vote is cancelled in favor of RC2. On Tue, Apr 7, 2015 at 8:13 PM, Josh Rosen wrote: > The leak will impact long running streaming jobs even if they don't write > Hadoop files, although the problem may take much longer to manifest itself > for those jobs. > > I think we currently leak an e

[VOTE] Release Apache Spark 1.3.1 (RC2)

2015-04-07 Thread Patrick Wendell
Please vote on releasing the following candidate as Apache Spark version 1.3.1! The tag to be voted on is v1.3.1-rc2 (commit 7c4473a): https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=7c4473aa5a7f5de0323394aaedeefbf9738e8eb5 The list of fixes present in this release can be found at: