Re: Assorted project updates (tests, build, etc)

2014-07-03 Thread Patrick Wendell
Sorry all, I sent the wrong pull request to refer to Prashant's work: https://github.com/apache/spark/pull/772 On Thu, Jul 3, 2014 at 1:37 PM, Patrick Wendell wrote: > Just a reminder here - we'll soon be merging a patch that changes the > SBT build internals significantly. We've tried to make t

Re: Eliminate copy while sending data : any Akka experts here ?

2014-07-03 Thread Reynold Xin
Yes, that number is likely == 0 in any real workload ... On Thu, Jul 3, 2014 at 8:01 AM, Mridul Muralidharan wrote: > On Thu, Jul 3, 2014 at 11:32 AM, Reynold Xin wrote: > > On Wed, Jul 2, 2014 at 3:44 AM, Mridul Muralidharan > > wrote: > > > >> > >> > > >> > The other thing we do need is the

Re: Eliminate copy while sending data : any Akka experts here ?

2014-07-03 Thread Reynold Xin
Note that in my original proposal, I was suggesting we could track whether block size = 0 using a compressed bitmap. That way we can still avoid requests for zero-sized blocks. On Thu, Jul 3, 2014 at 3:12 PM, Reynold Xin wrote: > Yes, that number is likely == 0 in any real workload ... > > > O
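
The idea of tracking which blocks have size 0 in a bitmap can be sketched as follows. This is only an illustration under stated assumptions: it uses a plain java.util.BitSet rather than a compressed bitmap, and the names EmptyBlockTracker, isEmptyBlock and blocksToFetch are hypothetical, not part of Spark.

```scala
import java.util.BitSet

// Hypothetical helper (not Spark's API): remembers which blocks are empty
// so that a reader can skip requesting them.
// Assumption: an uncompressed java.util.BitSet stands in for the
// compressed bitmap mentioned in the message.
final class EmptyBlockTracker(blockSizes: Array[Long]) {

  // Bit i is set when block i has size zero.
  private val emptyBlocks: BitSet = {
    val bits = new BitSet(blockSizes.length)
    var i = 0
    while (i < blockSizes.length) {
      if (blockSizes(i) == 0L) bits.set(i)
      i += 1
    }
    bits
  }

  // True when the block with this index is known to be empty.
  def isEmptyBlock(blockId: Int): Boolean = emptyBlocks.get(blockId)

  // Indices of the blocks that still need to be requested.
  def blocksToFetch: Seq[Int] =
    blockSizes.indices.filterNot(i => isEmptyBlock(i))
}

object EmptyBlockTrackerDemo {
  def main(args: Array[String]): Unit = {
    val tracker = new EmptyBlockTracker(Array(0L, 128L, 0L, 4096L))
    // Only the non-empty blocks are listed.
    println(tracker.blocksToFetch.mkString(", "))
  }
}
```

One bit per block keeps the bookkeeping small even when exact sizes are not stored, and a compressed representation would shrink it further when long runs of blocks are empty or non-empty.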

Re: Assorted project updates (tests, build, etc)

2014-07-03 Thread Patrick Wendell
Just a reminder here - we'll soon be merging a patch that changes the SBT build internals significantly. We've tried to make this fully backwards compatible, but there may be issues (which we'll resolve as they arrive). https://github.com/apache/spark/pull/77 - Patrick On Sun, Jun 22, 2014 at 10

Re: PLSA

2014-07-03 Thread Debasish Das
Hi Denis, Are you using matrix factorization to generate the latent factors ? Thanks. Deb On Thu, Jul 3, 2014 at 8:49 AM, Denis Turdakov wrote: > Hello guys, > > We made pull request with PLSA and its modifications: > - https://github.com/apache/spark/pull/1269 > - JIRA issue SPARK-2199 > Co

Re: task always lost

2014-07-03 Thread Aaron Davidson
The issue you're seeing is not the same as the one you linked to -- your serialized task sizes are very small, and Mesos fine-grained mode doesn't use Akka anyway. The error log you printed seems to be from some sort of Mesos logs, but do you happen to have the logs from the actual executors thems

Re: Pass parameters to RDD functions

2014-07-03 Thread Aaron Davidson
Either Serializable works; Scala's Serializable extends Java's (it was originally intended as a common interface for people who didn't want to run Scala on a JVM). Class fields require the enclosing class to be serialized along with the closure in order to be accessed. If you declared "val n" inside a method's scope instead, though, we
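
The difference between a closure that reads a class field and one that reads a local val can be sketched without Spark, using plain Java serialization. This is a minimal illustration under assumptions: Holder, viaField, viaLocal and ClosureCheck are hypothetical names, and Java's ObjectOutputStream stands in for whatever serializer the job actually uses.

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

// Hypothetical class, deliberately NOT Serializable, like the one in the question.
class Holder {
  val n = 1

  // Reads the field n, i.e. this.n, so the closure captures `this`.
  def viaField: Array[String] => String = arr => arr(n)

  // Copies n into a local val first, so the closure captures only an Int.
  def viaLocal: Array[String] => String = {
    val localN = n
    arr => arr(localN)
  }
}

object ClosureCheck {
  // Returns true when Java serialization accepts the object.
  def canSerialize(f: AnyRef): Boolean =
    try {
      val out = new ObjectOutputStream(new ByteArrayOutputStream())
      out.writeObject(f)
      out.close()
      true
    } catch {
      case _: NotSerializableException => false
    }

  def main(args: Array[String]): Unit = {
    val holder = new Holder
    println("field closure serializable: " + canSerialize(holder.viaField))
    println("local closure serializable: " + canSerialize(holder.viaLocal))
  }
}
```

The closure built from the local copy should serialize even though Holder itself is not Serializable, while the one that reads the field should fail because it drags the whole Holder instance along.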

PLSA

2014-07-03 Thread Denis Turdakov
Hello guys, We made a pull request with PLSA and its modifications: - https://github.com/apache/spark/pull/1269 - JIRA issue SPARK-2199 Could somebody look at the code and provide some feedback on what we should improve? Best regards, Denis Turdakov -- View this message in context: http://apache-

Re: Contributing to MLlib

2014-07-03 Thread salexln
Thanks for the input. At the moment, I don't have any code commits yet. I wanted to discuss the algorithm implementation prior to the code submission. (I have never worked with Git/GitHub, so I hope this isn't very basic stuff.) -- View this message in context: http://apache-spark-developer

Re: Eliminate copy while sending data : any Akka experts here ?

2014-07-03 Thread Mridul Muralidharan
On Thu, Jul 3, 2014 at 11:32 AM, Reynold Xin wrote: > On Wed, Jul 2, 2014 at 3:44 AM, Mridul Muralidharan > wrote: > >> >> > >> > The other thing we do need is the location of blocks. This is actually >> just >> > O(n) because we just need to know where the map was run. >> >> For well partitioned

RE: Pass parameters to RDD functions

2014-07-03 Thread Ulanov, Alexander
Thanks, this works both with Scala and Java Serializable. Which one should I use? Related question: I would like only the particular val to be used instead of the whole class, what should I do? As far as I understand, the whole class is serialized and transferred between nodes (am I right?) Al

Re: Pass parameters to RDD functions

2014-07-03 Thread Sean Owen
Declare this class with "extends Serializable", meaning java.io.Serializable? On Thu, Jul 3, 2014 at 12:24 PM, Ulanov, Alexander wrote: > Hi, > > I wonder how I can pass parameters to RDD functions with closures. If I do it > in a following way, Spark crashes with NotSerializableException: > > c

Pass parameters to RDD functions

2014-07-03 Thread Ulanov, Alexander
Hi, I wonder how I can pass parameters to RDD functions with closures. If I do it in the following way, Spark crashes with NotSerializableException: class TextToWordVector(csvData:RDD[Array[String]]) { val n = 1 lazy val x = csvData.map{ stringArr => stringArr(n)}.collect() } Exception: Job

Re: Constraint Solver for Spark

2014-07-03 Thread Debasish Das
Hi Xiangrui, I did some out-of-box comparisons with ECOS and PDCO from SOL. Both of them seem to be running on par, but I will do a more detailed analysis. I used pdco's testQP randomized problem generation. pdcotestQP(m, n) means m constraints and n variables. For ECOS runtime reference here is t

RE: Artificial Neural Network in Spark?

2014-07-03 Thread Bert Greevenbosch
Hi Alexander, all, I have now uploaded the code (see links below), and look forward to learning about the outcome of your experiments! Best regards, Bert --- https://github.com/apache/spark/pull/1290 https://issues.apache.org/jira/browse/SPARK-2352 > -Original Message- > From: Ulanov, A

RE: Artificial Neural Network in Spark?

2014-07-03 Thread Bert Greevenbosch
Hi Debasish, all, Thanks for your feedback. I have submitted the code to GitHub and created a Jira ticket (links below). The ANN uses back-propagation with the Steepest Gradient Descent (SGD) method. Best regards, Bert https://github.com/apache/spark/pull/1290 https://issues.apache.org/jira/br

Re: Contributing to MLlib

2014-07-03 Thread Xiangrui Meng
Alex, please send the pull request to apache/spark instead of your own repo, following the instructions in https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark Thanks, Xiangrui On Wed, Jul 2, 2014 at 12:41 PM, RJ Nowling wrote: > Hey Alex, > > I'm also a new contributor. I c