You need
> val raw = sc.sequenceFile(path, classOf[Text], classOf[VectorWritable])
to load the data. After that, you can do
> val data = raw.values.map(_.get)
to get an RDD of Mahout's Vector. You can use `--jar mahout-math.jar`
when you launch spark-shell to include mahout-math.
Best,
Xiangrui
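For completeness, a minimal spark-shell sketch of the above (the HDFS path is a
placeholder, and mahout-math is assumed to already be on the classpath):

import org.apache.hadoop.io.Text
import org.apache.mahout.math.VectorWritable

// Assumes a spark-shell session, so `sc` (the SparkContext) is already defined.
val path = "hdfs:///path/to/KMeans_dataset_seq"   // placeholder location

// Read the sequence file as (Text, VectorWritable) pairs.
val raw = sc.sequenceFile(path, classOf[Text], classOf[VectorWritable])

// Unwrap the writables: yields an RDD of org.apache.mahout.math.Vector.
val data = raw.values.map(_.get)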
Mahout now supports doing its distributed linear algebra natively on Spark, so
the problem of loading sequence-file input into Spark is already solved there
(trunk, http://mahout.apache.org/users/sparkbindings/home.html; see the
drmFromHDFS() call -- you can then get at the underlying RDD via the matrix's
"rdd" property).
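A rough sketch of that route, for reference -- the exact class and property
names here are assumptions based on the linked sparkbindings docs, and it is
meant to be run from Mahout's own spark-shell, which supplies the implicit
distributed context:

import org.apache.mahout.math.drm._
import org.apache.mahout.sparkbindings._

// Placeholder path; drmFromHDFS reads a DRM persisted as a sequence file.
val drmA = drmFromHDFS("hdfs:///path/to/KMeans_dataset_seq")

// Per the note above, the Spark-backed matrix exposes its underlying RDD via
// an "rdd" property; the cast to the Spark-specific DRM type is an assumption.
val rows = drmA.asInstanceOf[org.apache.mahout.sparkbindings.drm.CheckpointedDrmSpark[_]].rdd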
> ...Means/HAR/KMeans_dataset_seq/part-r-0",
> classOf[Text], classOf[VectorWritable])
>
> ^
>
> I'm using Spark 0.9, Hadoop 1.0.4, and Mahout 0.7.
>
> Thanks
> Stuti
You will get a 10x speedup by not using Mahout vectors and instead using the
Breeze-backed sparse vectors from MLlib in your MLlib KMeans run.
@Xiangrui showed the comparison chart some time back...
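A rough sketch of that conversion, assuming Spark 1.0+ MLlib (whose KMeans takes
org.apache.spark.mllib.linalg.Vector; Spark 0.9's KMeans took RDD[Array[Double]]
instead) and the `data` RDD of Mahout vectors built earlier in the thread:

import org.apache.spark.mllib.clustering.KMeans
import org.apache.spark.mllib.linalg.Vectors

// Copy each non-zero (index, value) out of the Mahout vector (its iterator may
// reuse Element objects) and sort by index, since MLlib's SparseVector expects
// increasing indices.
val points = data.map { v =>
  val buf = scala.collection.mutable.ArrayBuffer[(Int, Double)]()
  val it = v.iterateNonZero()
  while (it.hasNext) {
    val e = it.next()
    buf += ((e.index, e.get))
  }
  val nz = buf.sortBy(_._1)
  Vectors.sparse(v.size, nz.map(_._1).toArray, nz.map(_._2).toArray)
}.cache()

// k and maxIterations below are placeholder values.
val model = KMeans.train(points, 10, 20)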
On May 14, 2014 6:33 AM, "Xiangrui Meng" wrote:
> You need
>
> > val raw = sc.sequenceFile(path, classOf[Text], classOf[VectorWritable])
> ...
Sent: Wednesday, May 14, 2014 1:13 PM
To: user@spark.apache.org
Subject: RE: How to use Mahout VectorWritable in Spark.
Hi Xiangrui,
Thanks for the response. I tried a few ways to include the mahout-math jar while
launching the Spark shell, but with no success. Can you please point out what I
am doing wrong?
1. ma