LabeledPoint dump LibSVM if SparseVector

2014-05-11 Thread Debasish Das
Hi, I need to change the toString on LabeledPoint to LibSVM format so that I can dump an RDD[LabeledPoint] in a format that can be read by sparse glmnet in R and other packages, to benchmark MLlib classification accuracy... Basically I have to change the toString of LabeledPoint and the toString of Sparse
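For reference, a LibSVM line is the label followed by space-separated `index:value` pairs, with 1-based, ascending feature indices. The sketch below shows one way to render a sparse labeled point in that format. It is a minimal, self-contained illustration: `SparseVec`, `LabeledPt` and `LibSVMFormat.toLibSVM` are hypothetical stand-ins defined here so the code compiles without Spark, not MLlib's own classes or its toString.

```scala
// Hypothetical stand-ins for MLlib's SparseVector and LabeledPoint,
// defined only so this sketch compiles without Spark on the classpath.
final case class SparseVec(size: Int, indices: Array[Int], values: Array[Double])
final case class LabeledPt(label: Double, features: SparseVec)

object LibSVMFormat {
  /** Render one point as a LibSVM line: "label idx1:val1 idx2:val2 ...".
    * LibSVM indices are 1-based and ascending, so 0-based indices are
    * sorted and shifted by one. */
  def toLibSVM(p: LabeledPt): String = {
    val feats = p.features.indices.zip(p.features.values)
      .sortBy(_._1)
      .map { case (i, v) => s"${i + 1}:$v" }
    (p.label.toString +: feats).mkString(" ")
  }

  def main(args: Array[String]): Unit = {
    val p = LabeledPt(1.0, SparseVec(5, Array(0, 3), Array(0.5, 2.0)))
    println(toLibSVM(p)) // prints "1.0 1:0.5 4:2.0"
  }
}
```

With a real RDD, the same function would be applied per record (e.g. `rdd.map(...)` followed by `saveAsTextFile`). Later MLlib releases also include `MLUtils.saveAsLibSVMFile` for writing an `RDD[LabeledPoint]` directly.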

Re: Spark on Scala 2.11

2014-05-11 Thread Matei Zaharia
We do want to support it eventually, possibly as early as Spark 1.1 (which we’d cross-build on Scala 2.10 and 2.11). If someone wants to look at it before, feel free to do so! Scala 2.11 is very close to 2.10 so I think things will mostly work, except for possibly the REPL (which has require por

Re: Spark on Scala 2.11

2014-05-11 Thread Koert Kuipers
I believe Matei has said before that he would like to cross-build for 2.10 and 2.11, given that the difference is not as big as between 2.9 and 2.10. But I don't know when this would happen... On Sat, May 10, 2014 at 11:02 PM, Gary Malouf wrote: > Considering the team just bumped to 2.10 in 0.9, I

Re: Updating docs for running on Mesos

2014-05-11 Thread Andy Konwinski
Thanks for suggesting this and volunteering to do it. On May 11, 2014 3:32 AM, "Andrew Ash" wrote: > > The docs for how to run Spark on Mesos have changed very little since > 0.6.0, but setting it up is much easier now than then. Does it make sense > to revamp with the below changes? > > > You n

Re: Updating docs for running on Mesos

2014-05-11 Thread Patrick Wendell
Andrew, Updating these docs would be great! I think this would be a welcome change. In terms of packaging, it would be good to mention the binaries produced by the upstream project as well, in addition to Mesosphere. - Patrick On Thu, May 8, 2014 at 12:51 AM, Andrew Ash wrote: > The docs for h

Re: mllib vector templates

2014-05-11 Thread Debasish Das
Hi, I see ALS is still using Array[Int], but for the other MLlib algorithms we moved to Vector[Double] so that they can support either dense or sparse formats... ALS can stay with Array[Int] because the Netflix format for input datasets is well defined, but it would help if we moved ALS to Vector[Double] as
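To illustrate why a common vector type lets one code path accept both dense and sparse input, here is a minimal sketch. The names `Vec`, `Dense`, `Sparse` and `RatingsRow.fromRatings` are hypothetical and are neither MLlib's actual classes nor how ALS currently stores its data; the sketch also assumes each user row has unique item ids.

```scala
// Hypothetical minimal vector abstraction (not MLlib's API).
sealed trait Vec {
  def size: Int
  def apply(i: Int): Double
  // Materialize any representation as a dense array.
  def toDense: Array[Double] = Array.tabulate(size)(i => this(i))
}

final case class Dense(values: Array[Double]) extends Vec {
  def size: Int = values.length
  def apply(i: Int): Double = values(i)
}

// Sparse storage: sorted indices with matching values; absent entries are 0.0.
final case class Sparse(size: Int, indices: Array[Int], values: Array[Double]) extends Vec {
  def apply(i: Int): Double = {
    val k = java.util.Arrays.binarySearch(indices, i)
    if (k >= 0) values(k) else 0.0
  }
}

object RatingsRow {
  // Build one user's row from (itemId, rating) pairs, e.g. parsed from a
  // Netflix-style input file. Assumes item ids are unique within the row.
  def fromRatings(numItems: Int, ratings: Seq[(Int, Double)]): Vec = {
    val sorted = ratings.sortBy(_._1)
    Sparse(numItems, sorted.map(_._1).toArray, sorted.map(_._2).toArray)
  }
}
```

Code written against `Vec` works unchanged whether a row arrives as `Dense` or `Sparse`, which is the flexibility the message is asking for in ALS.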