Re: How to use Mahout VectorWritable in Spark.

2014-05-15 Thread Xiangrui Meng
You need

    val raw = sc.sequenceFile(path, classOf[Text], classOf[VectorWritable])

to load the data. After that, you can do

    val data = raw.values.map(_.get)

to get an RDD of Mahout's Vector. You can use `--jars mahout-math.jar` when you launch spark-shell to include mahout-math. Best, Xiangrui
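
Spelled out, that load looks roughly like the following; the path is a placeholder, mahout-math has to be on both the driver and executor classpath, and the clone() call is an addition of mine (Mahout vectors expose a public clone(), and Hadoop record readers reuse Writable instances, so unwrapped values should be copied before caching):

    import org.apache.hadoop.io.Text
    import org.apache.mahout.math.VectorWritable

    // In spark-shell, sc is predefined. Read the Mahout-generated
    // sequence file as (Text, VectorWritable) pairs.
    val raw = sc.sequenceFile("/path/to/kmeans_input",
      classOf[Text], classOf[VectorWritable])

    // Unwrap the writables into an RDD of org.apache.mahout.math.Vector.
    // clone() copies each vector, since Hadoop reuses the Writable objects,
    // which matters if the RDD is cached or shuffled later.
    val data = raw.values.map(_.get.clone())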

Re: How to use Mahout VectorWritable in Spark.

2014-05-15 Thread Dmitriy Lyubimov
Mahout now supports doing its distributed linalg natively on Spark, so the problem of loading sequence-file input into Spark is already solved there (in trunk; see http://mahout.apache.org/users/sparkbindings/home.html). Use the drmFromHDFS() call, and then you can access the underlying RDD via the "rdd" matrix property.
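
A minimal sketch of that route, assuming a Mahout trunk build of that era with the spark bindings on the classpath; sc2sdc, drmFromHDFS, and rdd are the names from the message and the linked page, but treat the exact signatures as assumptions to verify against your Mahout version:

    import org.apache.mahout.sparkbindings._

    // Wrap the shell's SparkContext in Mahout's distributed context
    // (the bindings ship a conversion for this).
    implicit val mahoutCtx = sc2sdc(sc)

    // Load the sequence file directly as a distributed row matrix (DRM).
    val drm = drmFromHDFS("/path/to/kmeans_input")

    // Access the underlying Spark RDD via the "rdd" matrix property.
    val rows = drm.rdd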

Re: How to use Mahout VectorWritable in Spark.

2014-05-15 Thread Koert Kuipers
(Koert's reply text is truncated in the archive; only quoted lines from Stuti Awasthi's error report survive:)
> ...Means/HAR/KMeans_dataset_seq/part-r-0", classOf[Text], classOf[VectorWritable])
>     ^
> Im using Spark 0.9 and Hadoop 1.0.4 and Mahout 0.7
> Thanks, Stuti

Re: How to use Mahout VectorWritable in Spark.

2014-05-15 Thread Dmitriy Lyubimov
(Truncated; only a duplicate of Stuti Awasthi's quoted error report survives. See Koert Kuipers' reply above.)

Re: How to use Mahout VectorWritable in Spark.

2014-05-15 Thread Dmitriy Lyubimov
(Truncated; only a duplicate of the quoted error report and the original-message header survives.)

Re: How to use Mahout VectorWritable in Spark.

2014-05-14 Thread Debasish Das
You will get a 10x speedup by not using the Mahout vector and instead using the Breeze-backed sparse vector from MLlib in your MLlib KMeans run. @Xiangrui showed the comparison chart some time back... On May 14, 2014 6:33 AM, "Xiangrui Meng" wrote: (quote truncated)
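
To make that concrete, here is a sketch of the conversion under the Spark 1.0-era MLlib API (in 0.9, KMeans.train took RDD[Array[Double]] rather than MLlib vectors); data is the RDD of Mahout vectors from Xiangrui's snippet, and the k and iteration values are placeholders:

    import scala.collection.JavaConverters._
    import org.apache.spark.mllib.clustering.KMeans
    import org.apache.spark.mllib.linalg.Vectors

    // Convert each Mahout vector to an MLlib sparse vector, copying only
    // the non-zero entries. Mahout's iterator reuses its Element object,
    // so the (index, value) pairs are materialized into a strict list.
    val points = data.map { v =>
      val elems = v.iterateNonZero().asScala.map(e => (e.index, e.get)).toList
      Vectors.sparse(v.size, elems)
    }.cache()

    // Run MLlib KMeans: k = 10 clusters, 20 iterations (placeholders).
    val model = KMeans.train(points, 10, 20)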

RE: How to use Mahout VectorWritable in Spark.

2014-05-14 Thread Stuti Awasthi
(Truncated; only the quoted text of Xiangrui Meng's reply survives.)

RE: How to use Mahout VectorWritable in Spark.

2014-05-14 Thread Stuti Awasthi
Hi Xiangrui, Thanks for the response. I tried a few ways to include the mahout-math jar while launching the Spark shell, but with no success. Can you please point out what I am doing wrong? 1. ma…
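
For later readers: how a jar gets attached to spark-shell depended on the Spark version, and neither line below comes from this thread; the paths are placeholders, and the 0.9 form is my recollection of that era's shell rather than something confirmed here:

    # Spark 0.9-era shell (assumption; verify against the 0.9 docs):
    ADD_JARS=/path/to/mahout-math-0.7.jar \
    SPARK_CLASSPATH=/path/to/mahout-math-0.7.jar \
    ./bin/spark-shell

    # Spark 1.0 and later: pass the jar explicitly.
    ./bin/spark-shell --jars /path/to/mahout-math-0.7.jar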

How to use Mahout VectorWritable in Spark.

2014-05-13 Thread Stuti Awasthi
Hi All, I am very new to Spark and trying to play around with MLlib, hence apologies for the basic question. I am trying to run the KMeans algorithm with both Mahout and Spark MLlib to compare their performance. The initial data size was 10 GB. Mahout converts the data into a sequence file, which is used for KMean…