You need
> val raw = sc.sequenceFile(path, classOf[Text], classOf[VectorWritable])
to load the data. After that, you can do
> val data = raw.values.map(_.get)
to get an RDD of Mahout's Vector. You can use `--jar mahout-math.jar`
when you launch spark-shell to include mahout-math.
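Putting those two lines together, a minimal spark-shell session would look
something like this (a sketch; the path is a placeholder, and mahout-math
must already be on the shell classpath):

    import org.apache.hadoop.io.Text
    import org.apache.mahout.math.VectorWritable

    // Load the Mahout sequence file as (Text, VectorWritable) pairs.
    val raw = sc.sequenceFile("/path/to/seqfile", classOf[Text], classOf[VectorWritable])

    // Unwrap each VectorWritable into the Mahout Vector it carries.
    // Note: Hadoop reuses Writable instances, so clone the vectors
    // before caching this RDD.
    val data = raw.values.map(_.get)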
Best,
Xiangrui
Mahout now supports doing its distributed linear algebra natively on Spark,
so the problem of loading sequence-file input into Spark is already solved
there (in trunk, see http://mahout.apache.org/users/sparkbindings/home.html
and the drmFromHDFS() call), and you can then access the underlying RDD
directly via the matrix's "rdd" property.
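A rough sketch of that route, assuming the Mahout spark-shell with the
sparkbindings imports in scope (the path here is hypothetical):

    // Read a distributed row matrix (DRM) from a sequence file on HDFS.
    val drm = drmFromHDFS("/path/to/KMeans_dataset_seq")

    // The underlying Spark RDD, via the "rdd" matrix property mentioned above.
    val rdd = drm.rdd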
> val raw = sc.sequenceFile("Means/HAR/KMeans_dataset_seq/part-r-0",
> classOf[Text], classOf[VectorWritable])
>
> I'm using Spark 0.9 and Hadoop 1.0.4 and Mahout 0.7
>
> Thanks
> Stuti
You will get a ~10x speedup by not using Mahout vectors and instead using
Breeze-backed sparse vectors from MLlib in your MLlib KMeans run.
@Xiangrui showed the comparison chart some time back...
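For illustration, the conversion could look roughly like this. This is a
sketch, not from the thread: it assumes the Spark 1.0-style MLlib API and
the RDD of Mahout Vectors (`data`) built earlier:

    import scala.collection.JavaConverters._
    import org.apache.spark.mllib.linalg.Vectors
    import org.apache.spark.mllib.clustering.KMeans

    // Turn each Mahout Vector into an MLlib sparse vector.
    val points = data.map { v =>
      val nz = v.iterateNonZero().asScala.map(e => (e.index, e.get)).toSeq
      Vectors.sparse(v.size, nz)
    }.cache()

    // Run MLlib KMeans on the converted points; k = 10 and 20 iterations
    // are arbitrary example values.
    val model = KMeans.train(points, 10, 20)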
On May 14, 2014 6:33 AM, "Xiangrui Meng" wrote:
> You need
>
> > val raw = sc.sequenceFile(path, classOf[Text], classOf[VectorWritable])
>
> to load the data.

-----Original Message-----
From: Xiangrui Meng [mailto:men...@gmail.com]
Sent: Wednesday, May 14, 2014 11:56 AM
To: user@spark.apache.org
Subject: Re: How to use Mahout VectorWritable in Spark.

You need
> val raw = sc.sequenceFile(path, classOf[Text], classOf[VectorWritable])
to load the data.
Sent: Wednesday, May 14, 2014 1:13 PM
To: user@spark.apache.org
Subject: RE: How to use Mahout VectorWritable in Spark.
Hi Xiangrui,
Thanks for the response. I tried a few ways to include the mahout-math jar
while launching the Spark shell, but with no success. Can you please point
out what I am doing wrong?
1. ma
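(The list above is truncated. For what it's worth, on Spark 0.9 the shell
usually picked up extra jars through environment variables rather than a
flag, e.g. something like

    ADD_JARS=/path/to/mahout-math.jar SPARK_CLASSPATH=/path/to/mahout-math.jar bin/spark-shell

where the paths are placeholders.)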
Hi All,
I am very new to Spark and trying to play around with MLlib, hence apologies
for the basic question.
I am trying to run the KMeans algorithm using Mahout and Spark MLlib to
compare their performance. The initial data size was 10 GB. Mahout converts
the data into a sequence file, which is used for KMeans