Hi, I need LabeledPoint's toString to emit libsvm format, so that I can dump an RDD[LabeledPoint] in a form that sparse glmnet (R) and other packages can read, in order to benchmark MLlib classification accuracy.
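For reference, a libsvm line has the form "label index:value index:value ...", with indices conventionally 1-based and in ascending order. Here is a minimal sketch of that formatting on plain arrays, independent of Spark. The object and method names (LibSvmFormat, formatLine) are hypothetical, and the sketch assumes the input indices are 0-based and already sorted:

```scala
object LibSvmFormat {
  // Formats one example as "label i1:v1 i2:v2 ...".
  // Assumption: input indices are 0-based and sorted ascending;
  // they are shifted by one to follow the usual 1-based libsvm convention.
  def formatLine(label: Double, indices: Array[Int], values: Array[Double]): String = {
    require(indices.length == values.length, "indices and values must have the same length")
    val pairs = indices.zip(values).map { case (i, v) => s"${i + 1}:$v" }
    (label.toString +: pairs).mkString(" ")
  }
}
```

For example, LibSvmFormat.formatLine(1.0, Array(0, 2), Array(1.0, -2.0)) produces one line with the label followed by two index:value pairs at positions 1 and 3.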
Basically, I have to change the toString of both LabeledPoint and SparseVector. Should I submit this as a PR, or is it already being added? For now I have added these toLibSvm functions to my internal util class:

  def toLibSvm(labelPoint: LabeledPoint): String = {
    // Assumes sparse features; a DenseVector would fail this cast.
    labelPoint.label.toString + " " +
      toLibSvm(labelPoint.features.asInstanceOf[SparseVector])
  }

  def toLibSvm(features: SparseVector): String = {
    // Emits "index:value" pairs with indices as stored (0-based);
    // libsvm readers usually expect 1-based indices.
    features.indices.zip(features.values)
      .map { case (i, v) => s"$i:$v" }
      .mkString(" ")
  }

Thanks.
Deb

On Fri, May 9, 2014 at 10:09 PM, mateiz <g...@git.apache.org> wrote:

> Github user mateiz commented on a diff in the pull request:
>
>     https://github.com/apache/spark/pull/685#discussion_r12502569
>
> --- Diff: mllib/src/test/scala/org/apache/spark/mllib/linalg/VectorsSuite.scala ---
> @@ -100,4 +100,27 @@ class VectorsSuite extends FunSuite {
>      assert(vec2(6) === 4.0)
>      assert(vec2(7) === 0.0)
>    }
> +
> +  test("parse vectors") {
> +    val vectors = Seq(
> +      Vectors.dense(Array.empty[Double]),
> +      Vectors.dense(1.0),
> +      Vectors.dense(1.0, 0.0, -2.0),
> +      Vectors.sparse(0, Array.empty[Int], Array.empty[Double]),
> +      Vectors.sparse(1, Array(0), Array(1.0)),
> +      Vectors.sparse(3, Array(0, 2), Array(1.0, -2.0)))
> +    vectors.foreach { v =>
> +      val v1 = Vectors.parse(v.toString)
> +      assert(v.getClass === v1.getClass)
> +      assert(v === v1)
> +    }
> +
> +    val malformatted = Seq("1", "[1,,]", "[1,2", "(1,[1,2])", "(1,[1],[2.0,1.0])")
> +    malformatted.foreach { s =>
> +      intercept[RuntimeException] {
> --- End diff --
>
> Should be Exception instead