Re: Scala 2.11 external dependencies

2014-08-05 Thread Anand Avati
On Mon, Aug 4, 2014 at 1:01 PM, Anand Avati wrote: > > > > On Sun, Aug 3, 2014 at 9:09 PM, Patrick Wendell > wrote: > >> Hey Anand, >> >> Thanks for looking into this - it's great to see momentum towards Scala >> 2.11 and I'd love it if this landed in Spark 1.2. >> >> For the external dependencies, i

Re: Using mllib-1.1.0-SNAPSHOT on Spark 1.0.1

2014-08-05 Thread Debasish Das
Hi Xiangrui, I used your idea and kept a cherry-picked version of ALS.scala in my application, calling it ALSQp.scala... This is an OK workaround for now until a version makes it into master... As for the bug with userClassPathFirst, it looks like Koert already found this issue in the following J

Unit test best practice for Spark-derived projects

2014-08-05 Thread Dmitriy Lyubimov
Hello, I've been switching Mahout from Spark 0.9 to Spark 1.0.x [1] and noticed that tests now run much slower compared to 0.9, with the CPU idle most of the time. I had to conclude that most of that time is spent tearing down/resetting the Spark context, which apparently now takes significantly
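
One common mitigation is to share a single SparkContext across a whole suite instead of creating and stopping one per test. A minimal ScalaTest sketch, assuming a local[2] master; the trait and suite names are illustrative, not Spark's or Mahout's actual test helpers:

import org.apache.spark.{SparkConf, SparkContext}
import org.scalatest.{BeforeAndAfterAll, FunSuite, Suite}

// Illustrative helper: one local SparkContext per suite instead of per test.
trait SharedLocalSparkContext extends BeforeAndAfterAll { self: Suite =>
  @transient var sc: SparkContext = _

  override def beforeAll() {
    sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("unit-tests"))
    super.beforeAll()
  }

  override def afterAll() {
    if (sc != null) sc.stop()
    // Spark's own test utilities clear this property so the next suite can bind cleanly.
    System.clearProperty("spark.driver.port")
    super.afterAll()
  }
}

class MyAlgorithmSuite extends FunSuite with SharedLocalSparkContext {
  test("sum of an RDD") {
    assert(sc.parallelize(1 to 10).reduce(_ + _) === 55)
  }
}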

Re: Hello All

2014-08-05 Thread Burak Yavuz
Hi Guru, Take a look at: https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark It has all the information you need on how to contribute to Spark. Also take a look at: https://issues.apache.org/jira/browse/SPARK/?selectedTab=com.atlassian.jira.jira-projects-plugin:summary-panel

Hello All

2014-08-05 Thread Gurumurthy Yeleswarapu
I'm new to the Spark community. I'm actively working on the Hadoop ecosystem (more specifically YARN), and I'm very keen on getting my hands dirty with Spark. Please let me know any pointers to start with. Thanks in advance. Best regards, Guru Yeleswarapu

Re: Tiny curiosity question on closing the jdbc connection

2014-08-05 Thread Stephen Boesch
The existing callback does take care of it: within the DAGScheduler there is a finally block to ensure the callbacks are executed. try { val result = job.func(taskContext, rdd.iterator(split, taskContext)) job.listener.taskSucceeded(0, result) } finally { task

Re: Tiny curiosity question on closing the jdbc connection

2014-08-05 Thread Reynold Xin
Yes it is. I actually commented on it: https://github.com/apache/spark/pull/1792/files#r15840899 On Tue, Aug 5, 2014 at 1:58 PM, Cody Koeninger wrote: > The stmt.isClosed just looks like stupidity on my part, no secret > motivation :) Thanks for noticing it. > > As for the leaking in the case

Re: Tiny curiosity question on closing the jdbc connection

2014-08-05 Thread Stephen Boesch
Hi, yes, that callback takes care of it. Thanks! 2014-08-05 13:58 GMT-07:00 Cody Koeninger : > The stmt.isClosed just looks like stupidity on my part, no secret > motivation :) Thanks for noticing it. > > As for the leaking in the case of malformed statements, isn't that > addressed by > > contex

Re: Tiny curiosity question on closing the jdbc connection

2014-08-05 Thread Cody Koeninger
The stmt.isClosed just looks like stupidity on my part, no secret motivation :) Thanks for noticing it. As for the leaking in the case of malformed statements, isn't that addressed by context.addOnCompleteCallback{ () => closeIfNeeded() } or am I misunderstanding? On Tue, Aug 5, 2014 at 3:15
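
For readers following the thread, the pattern under discussion looks roughly like the sketch below: register a task-completion callback so the JDBC resources are closed even if the task fails or the iterator is never fully consumed. This is a simplified illustration, not the actual JdbcRDD code; the JDBC URL, query, and iterator are placeholders, and addOnCompleteCallback is the Spark 1.0/1.1-era API (later versions use a task completion listener instead).

import java.sql.DriverManager
import org.apache.spark.{Partition, TaskContext}

object JdbcComputeSketch {
  // Simplified compute(): open JDBC resources, register cleanup with the task,
  // then stream rows out of the ResultSet.
  def compute(part: Partition, context: TaskContext): Iterator[String] = {
    val conn = DriverManager.getConnection("jdbc:h2:mem:example") // placeholder URL
    val stmt = conn.prepareStatement("SELECT name FROM people WHERE id >= ? AND id <= ?")

    def close() {
      try { if (stmt != null && !stmt.isClosed) stmt.close() }
      catch { case e: Exception => () /* log and continue */ }
      try { if (conn != null && !conn.isClosed) conn.close() }
      catch { case e: Exception => () /* log and continue */ }
    }

    // Ensures cleanup runs when the task completes, even on failure or
    // when the iterator is abandoned partway through.
    context.addOnCompleteCallback(() => close())

    stmt.setLong(1, 1L)
    stmt.setLong(2, 100L)
    val rs = stmt.executeQuery()

    new Iterator[String] {
      private var lookahead = fetch()
      private def fetch(): Option[String] =
        if (rs.next()) Some(rs.getString(1)) else { close(); None }
      def hasNext = lookahead.isDefined
      def next() = { val row = lookahead.get; lookahead = fetch(); row }
    }
  }
}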

Re: Tiny curiosity question on closing the jdbc connection

2014-08-05 Thread Reynold Xin
Thanks. Those are definitely great problems to fix! On Tue, Aug 5, 2014 at 1:11 PM, Stephen Boesch wrote: > Thanks Reynold, Ted Yu did mention offline and I put in a jira already. > Another small concern: there appears to be no exception handling from the > creation of the prepared statement (

Re: Tiny curiosity question on closing the jdbc connection

2014-08-05 Thread Stephen Boesch
Thanks Reynold. Ted Yu did mention it offline and I put in a JIRA already. Another small concern: there appears to be no exception handling from the creation of the prepared statement (line 74) through to the executeQuery (line 86). In case of an error/exception it would seem to be leaking connections

Re: Tiny curiosity question on closing the jdbc connection

2014-08-05 Thread Reynold Xin
I'm pretty sure it is an oversight. Would you like to submit a pull request to fix that? On Tue, Aug 5, 2014 at 12:14 PM, Stephen Boesch wrote: > Within its compute.close method, the JdbcRDD class has this interesting > logic for closing jdbc connection: > > > try { > if (null !=

Tiny curiosity question on closing the jdbc connection

2014-08-05 Thread Stephen Boesch
Within its compute.close method, the JdbcRDD class has this interesting logic for closing the JDBC connection: try { if (null != conn && ! stmt.isClosed()) conn.close() logInfo("closed connection") } catch { case e: Exception => logWarning("Exception closing connec
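
For context, the oversight discussed above is that the guard tests stmt.isClosed() but then closes conn. A sketch of the likely intent, checking and closing the statement and the connection independently so a failure on one cannot leak the other; this is a hypothetical helper, not the actual JdbcRDD code:

import java.sql.{Connection, PreparedStatement}
import org.apache.spark.Logging

object JdbcCleanupSketch extends Logging {
  // Close the statement and the connection with separate guards and separate
  // try/catch blocks, so an exception closing one does not leak the other.
  def closeQuietly(stmt: PreparedStatement, conn: Connection) {
    try {
      if (null != stmt && !stmt.isClosed) stmt.close()
      logInfo("closed statement")
    } catch {
      case e: Exception => logWarning("Exception closing statement", e)
    }
    try {
      if (null != conn && !conn.isClosed) conn.close()
      logInfo("closed connection")
    } catch {
      case e: Exception => logWarning("Exception closing connection", e)
    }
  }
}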

Re: Using mllib-1.1.0-SNAPSHOT on Spark 1.0.1

2014-08-05 Thread Xiangrui Meng
If you cannot change the Spark jar deployed on the cluster, an easy solution would be renaming ALS in your jar. If userClassPathFirst doesn't work, could you create a JIRA and attach the log? Thanks! -Xiangrui On Tue, Aug 5, 2014 at 9:10 AM, Debasish Das wrote: > I created the assembly file but s
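
For reference, the userClassPathFirst route amounts to a single experimental setting in the Spark 1.0/1.1 timeframe. A minimal sketch; the app name is illustrative, and later Spark releases renamed this flag, so check the docs for your version:

import org.apache.spark.{SparkConf, SparkContext}

object UserClassPathFirstSketch {
  def main(args: Array[String]) {
    // Experimental setting in this era: prefer classes from the user's jar over
    // the Spark assembly when loading classes on the executors.
    val conf = new SparkConf()
      .setAppName("als-with-local-mllib") // illustrative
      .set("spark.files.userClassPathFirst", "true")
    val sc = new SparkContext(conf)
    // ... run the job that should use the locally modified mllib classes ...
    sc.stop()
  }
}

The other workaround mentioned above, renaming the class (for example keeping a copy under a different name or package, as with ALSQp.scala earlier in this thread), sidesteps the classloader question entirely.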

Re: -1s on pull requests?

2014-08-05 Thread Xiangrui Meng
I think the build number is included in the SparkQA message, for example: https://github.com/apache/spark/pull/1788 The build number 17941 is in the URL "https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17941/consoleFull". Just need to be careful to match the number. Another so

Re: Using mllib-1.1.0-SNAPSHOT on Spark 1.0.1

2014-08-05 Thread Debasish Das
I created the assembly file but it still wants to pick up the mllib from the cluster: jar tf ./target/ml-0.0.1-SNAPSHOT-jar-with-dependencies.jar | grep QuadraticMinimizer org/apache/spark/mllib/optimization/QuadraticMinimizer$$anon$1.class /Users/v606014/dist-1.0.1/bin/spark-submit --master spark:

Spark maven project with the latest Spark jars

2014-08-05 Thread Ulanov, Alexander
Hi, I'm trying to create a maven project that references the latest build of Spark. 1) downloaded sources and compiled the latest version of Spark. 2) added the new spark-core jar to a new local maven repo 3) created a Scala maven project with net.alchim31.maven (scala-archetype-simple v 1.5) 4) added

Re: -1s on pull requests?

2014-08-05 Thread Nicholas Chammas
> > 1. Include the commit hash in the "tests have started/completed" > FYI: Looks like Xiangrui's already got a JIRA issue for this. SPARK-2622: Add Jenkins build numbers to SparkQA messages 2. "Pin" a message to the start or end of the PR Sho

Re: Low Level Kafka Consumer for Spark

2014-08-05 Thread Dibyendu Bhattacharya
Hi, this fault-tolerance aspect is already taken care of in the Kafka-Spark Consumer code, e.g. if the Leader of a partition changes, in ZkCoordinator.java. Basically it refreshes the PartitionManagers every X seconds to make sure the partition details are correct and the consumer doesn't fail. Dib On Tue,

any interest in something like rdd.parent[T](n) (equivalent to firstParent[T] for n==0) ?

2014-08-05 Thread Erik Erlandson
Not that rdd.dependencies(n).rdd.asInstanceOf[RDD[T]] is terrible, but rdd.parent[T](n) better captures the intent.
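
A minimal sketch of the proposed convenience, written here as an enrichment so it can be tried outside of Spark itself; the object and class names are illustrative:

import org.apache.spark.rdd.RDD

object RddParentSyntax {
  // rdd.parent[T](n) as sugar over rdd.dependencies(n).rdd.asInstanceOf[RDD[T]],
  // mirroring what firstParent[T] does for n == 0.
  implicit class RichParentAccess(val self: RDD[_]) extends AnyVal {
    def parent[T](n: Int): RDD[T] = self.dependencies(n).rdd.asInstanceOf[RDD[T]]
  }
}

With import RddParentSyntax._ in scope, rdd.parent[SomeType](1) reads the same way the proposed RDD method would.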

RE: Low Level Kafka Consumer for Spark

2014-08-05 Thread Shao, Saisai
Hi, I think this is an awesome feature for the Spark Streaming Kafka interface, offering the user control over partition offsets so that more applications can be built on top of it. What I'm concerned about is that if we want to do offset management, fault-tolerance-related control and other things, we have to t

Re: -1s on pull requests?

2014-08-05 Thread Mridul Muralidharan
Just came across this mail, thanks for initiating this discussion Kay. To add: another issue which recurs is very rapid commits, before most contributors have had a chance to even look at the proposed changes. There is not much prior discussion on the jira or pr, and the time between submitting th

Re: Low Level Kafka Consumer for Spark

2014-08-05 Thread Dibyendu Bhattacharya
Thanks Jonathan. Yes, until non-ZK-based offset management is available in Kafka, I need to maintain the offsets in ZK. And yes, in both cases an explicit commit is necessary. I modified the Low Level Kafka Spark Consumer a little bit so that the Receiver spawns threads for every partition of the topic and perf
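
For readers following along, an explicit commit of this kind boils down to writing each partition's offset under a consumer-group/topic/partition znode. A minimal sketch against the plain ZooKeeper client; the paths, names, and error handling here are illustrative assumptions, not what the consumer linked in this thread actually does:

import org.apache.zookeeper.{CreateMode, WatchedEvent, Watcher, ZooDefs, ZooKeeper}

// Minimal sketch of explicit offset commits to ZooKeeper. A real consumer
// (including the one discussed here) adds retries, session handling,
// recursive znode creation, and a thread per partition.
class ZkOffsetStoreSketch(zkConnect: String) {
  private val zk = new ZooKeeper(zkConnect, 30000, new Watcher {
    def process(event: WatchedEvent) {} // no-op default watcher
  })

  private def path(group: String, topic: String, partition: Int) =
    s"/consumers/$group/offsets/$topic/$partition"

  // Persist the next offset to consume for a partition (creates the znode on
  // first use; assumes the parent znodes already exist).
  def commit(group: String, topic: String, partition: Int, offset: Long) {
    val p = path(group, topic, partition)
    val data = offset.toString.getBytes("UTF-8")
    if (zk.exists(p, false) == null) {
      zk.create(p, data, ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT)
    } else {
      zk.setData(p, data, -1) // -1 = ignore the znode version
    }
  }

  // Read the last committed offset, or None if nothing has been committed yet.
  def fetch(group: String, topic: String, partition: Int): Option[Long] = {
    val p = path(group, topic, partition)
    if (zk.exists(p, false) == null) None
    else Some(new String(zk.getData(p, false, null), "UTF-8").toLong)
  }

  def close() { zk.close() }
}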