Re: How to use Mahout VectorWritable in Spark.

2014-05-15 Thread Xiangrui Meng
You need

    val raw = sc.sequenceFile(path, classOf[Text], classOf[VectorWritable])

to load the data. After that, you can do

    val data = raw.values.map(_.get)

to get an RDD of Mahout's Vector. You can use `--jar mahout-math.jar` when you launch spark-shell to include mahout-math.

Best,
Xiangrui
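A minimal sketch of the steps above, assuming spark-shell was launched with mahout-math on the classpath; the HDFS path is a placeholder:

```scala
import org.apache.hadoop.io.Text
import org.apache.mahout.math.VectorWritable

val path = "hdfs:///path/to/vectors.seq"  // placeholder path to a (Text, VectorWritable) sequence file
val raw  = sc.sequenceFile(path, classOf[Text], classOf[VectorWritable])
// Note: Hadoop may reuse Writable instances when reading, so consider
// cloning the vectors if you plan to cache this RDD rather than consume it once.
val data = raw.values.map(_.get)          // RDD of Mahout Vector
```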

Re: How to use Mahout VectorWritable in Spark.

2014-05-15 Thread Dmitriy Lyubimov
Mahout now supports doing its distributed linear algebra natively on Spark, so the problem of loading sequence-file input into Spark is already solved there (in trunk; see http://mahout.apache.org/users/sparkbindings/home.html and the drmFromHDFS() call). You can then access the underlying RDD directly via the "rdd" matrix property.
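A hedged sketch of that route, following the names given above (drmFromHDFS and the "rdd" property); the imports and context setup are assumptions to verify against your Mahout version and the linked docs:

```scala
import org.apache.mahout.sparkbindings._

// Assumed setup helper from the Spark bindings; check the docs for your version.
implicit val ctx = mahoutSparkContext(masterUrl = "local[2]", appName = "drm-demo")

val drm = drmFromHDFS("hdfs:///path/to/vectors.seq")  // distributed row matrix (DRM)
val rdd = drm.rdd                                     // direct access to the backing Spark RDD
```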

Re: How to use Mahout VectorWritable in Spark.

2014-05-15 Thread Koert Kuipers
> ...Means/HAR/KMeans_dataset_seq/part-r-0",
>   classOf[Text], classOf[VectorWritable])
>     ^
> I'm using Spark 0.9, Hadoop 1.0.4, and Mahout 0.7.
>
> Thanks
> Stuti
>
> -----Original Message-----
> From: Xiangrui Meng [mailto:men...@gmail.com] ...

Re: How to use Mahout VectorWritable in Spark.

2014-05-15 Thread Dmitriy Lyubimov
>> ...classOf[VectorWritable])
>>     ^
>> I'm using Spark 0.9, Hadoop 1.0.4, and Mahout 0.7.
>>
>> Thanks
>> Stuti
>>
>> -----Original Message-----
>> From: Xiangrui Meng [mailto:men...@gmail.com] ...

Re: How to use Mahout VectorWritable in Spark.

2014-05-15 Thread Dmitriy Lyubimov
>     ^
> I'm using Spark 0.9, Hadoop 1.0.4, and Mahout 0.7.
>
> Thanks
> Stuti
>
> -----Original Message-----
> From: Xiangrui Meng [mailto:men...@gmail.com]
> Sent: Wednesday, May 14, 2014 11:56 AM
> To: user@spark.apache.org
> Subject: Re: How to ...

Re: How to use Mahout VectorWritable in Spark.

2014-05-14 Thread Debasish Das
You will get a ~10x speedup by not using Mahout's Vector and instead using Breeze-backed sparse vectors from MLlib in your MLlib KMeans run. @Xiangrui showed the comparison chart some time back...

On May 14, 2014 6:33 AM, "Xiangrui Meng" wrote:
> You need
>
> val raw = sc.sequenceFile(path, classOf[Text], classOf[...
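A hedged sketch of the suggested switch: convert each Mahout vector to an MLlib vector before running MLlib KMeans. The iterateNonZero()-based copy assumes the Mahout 0.7-era Vector API, and the MLlib types follow the Spark 1.x API:

```scala
import scala.collection.JavaConverters._
import org.apache.mahout.math.{Vector => MahoutVector}
import org.apache.spark.mllib.linalg.{Vector => MLlibVector, Vectors}
import org.apache.spark.mllib.clustering.KMeans

// Copy a Mahout vector into an MLlib sparse vector, keeping only nonzeros.
def toMLlib(v: MahoutVector): MLlibVector = {
  val nz = v.iterateNonZero().asScala.map(e => (e.index, e.get)).toSeq
  Vectors.sparse(v.size, nz)
}

// Usage, given `data: RDD[MahoutVector]` from the earlier load step:
// val mllibData = data.map(toMLlib).cache()
// val model = KMeans.train(mllibData, k = 10, maxIterations = 20)
```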

RE: How to use Mahout VectorWritable in Spark.

2014-05-14 Thread Stuti Awasthi
-----Original Message-----
From: Xiangrui Meng [mailto:men...@gmail.com]
Sent: Wednesday, May 14, 2014 11:56 AM
To: user@spark.apache.org
Subject: Re: How to use Mahout VectorWritable in Spark.

You need
> val raw = sc.sequenceFile(path, classOf[Text], classOf[VectorWritable])
to load the data. ...

RE: How to use Mahout VectorWritable in Spark.

2014-05-14 Thread Stuti Awasthi
Sent: Wednesday, May 14, 2014 1:13 PM
To: user@spark.apache.org
Subject: RE: How to use Mahout VectorWritable in Spark.

Hi Xiangrui,

Thanks for the response. I tried a few ways to include the mahout-math jar while launching the Spark shell, but with no success. Can you please point out what I am doing wrong?
1. ma...
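For reference, a hedged sketch of common ways to put mahout-math on the spark-shell classpath around Spark 0.9; the jar path is a placeholder, and flag support varies by Spark version:

```sh
# Spark 0.9-era: pass extra jars via the ADD_JARS environment variable.
ADD_JARS=/path/to/mahout-math-0.7.jar ./bin/spark-shell

# Later Spark releases accept a --jars flag instead.
./bin/spark-shell --jars /path/to/mahout-math-0.7.jar
```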