Hi Xiangrui,

Thanks for the response. I tried a few ways to include the mahout-math jar while launching the Spark shell, but with no success. Can you please point out what I am doing wrong? Here is what I tried:
1. Exported mahout-math.jar in CLASSPATH and PATH.

2. Launched the Spark shell with:

   MASTER=spark://<HOSTNAME>:<PORT> ADD_JARS=~/installations/work-space/mahout-math-0.7.jar spark-0.9.0/bin/spark-shell

After launching, I checked the environment details on the WebUI, and it looks like the mahout-math jar is included:

   spark.jars   /home/hduser/installations/work-space/mahout-math-0.7.jar

Then I try:

   scala> import org.apache.mahout.math.VectorWritable
   <console>:10: error: object mahout is not a member of package org.apache
          import org.apache.mahout.math.VectorWritable

   scala> val raw = sc.sequenceFile(path, classOf[Text], classOf[VectorWritable])
   <console>:12: error: not found: type Text
          val data = sc.sequenceFile("/stuti/ML/Clustering/KMeans/HAR/KMeans_dataset_seq/part-r-00000", classOf[Text], classOf[VectorWritable])
                                                                                                        ^

I'm using Spark 0.9, Hadoop 1.0.4, and Mahout 0.7.

Thanks
Stuti

-----Original Message-----
From: Xiangrui Meng [mailto:men...@gmail.com]
Sent: Wednesday, May 14, 2014 11:56 AM
To: user@spark.apache.org
Subject: Re: How to use Mahout VectorWritable in Spark.

You need

   val raw = sc.sequenceFile(path, classOf[Text], classOf[VectorWritable])

to load the data. After that, you can do

   val data = raw.values.map(_.get)

to get an RDD of Mahout's Vector. You can use `--jar mahout-math.jar` when you launch spark-shell to include mahout-math.

Best,
Xiangrui

On Tue, May 13, 2014 at 10:37 PM, Stuti Awasthi <stutiawas...@hcl.com> wrote:
> Hi All,
>
> I am very new to Spark and am trying to play around with MLlib, hence
> apologies for the basic question.
>
> I am trying to run the KMeans algorithm in both Mahout and Spark MLlib
> to compare their performance. The initial data size was 10 GB. Mahout
> converts the data into a sequence file <Text,VectorWritable>, which is
> used for KMeans clustering. The sequence file created was ~6 GB in size.
>
> Now I wanted to know whether I can use the Mahout sequence file as input
> to Spark MLlib for KMeans. I have read that SparkContext.sequenceFile
> may be used here, so I tried to read my sequence file as below, but I
> get an error:
>
> Command on the Spark shell:
>
> scala> val data = sc.sequenceFile[String,VectorWritable]("/KMeans_dataset_seq/part-r-00000",String,VectorWritable)
> <console>:12: error: not found: type VectorWritable
>        val data = sc.sequenceFile[String,VectorWritable]("/KMeans_dataset_seq/part-r-00000",String,VectorWritable)
>
> Here I have 2 questions:
>
> 1. Mahout has "Text" as the key, but Spark printed "not found: type Text",
>    so I changed it to String. Is this correct?
> 2. How will VectorWritable be found in Spark? Do I need to include the
>    Mahout jar in the classpath, or is there another option?
>
> Please suggest.
>
> Regards
> Stuti Awasthi
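Putting Xiangrui's suggestion together with the errors above, here is a
minimal sketch of a session that should load the file. The launch line is
an assumption about Spark 0.9 behavior, not a verified fix: ADD_JARS ships
a jar to the executors, while SPARK_CLASSPATH puts it on the shell's own
(driver) classpath, and "object mahout is not a member of package
org.apache" is the error you see when the jar is missing from the latter.
The jar path, master placeholder, and sequence-file path are taken from
the thread.

   # Hypothetical launch line; adjust host, port, and jar path to your setup.
   # Setting both variables covers the executor and driver classpaths.
   SPARK_CLASSPATH=~/installations/work-space/mahout-math-0.7.jar \
   ADD_JARS=~/installations/work-space/mahout-math-0.7.jar \
   MASTER=spark://<HOSTNAME>:<PORT> spark-0.9.0/bin/spark-shell

Then, in the shell:

   // Text lives in org.apache.hadoop.io and is not imported by default,
   // which explains the "not found: type Text" error above.
   import org.apache.hadoop.io.Text
   import org.apache.mahout.math.VectorWritable

   val path = "/stuti/ML/Clustering/KMeans/HAR/KMeans_dataset_seq/part-r-00000"
   val raw = sc.sequenceFile(path, classOf[Text], classOf[VectorWritable])

   // Hadoop readers may reuse Writable instances, so clone each vector out
   // of its writable before caching; Mahout's Vector declares clone().
   val data = raw.values.map(_.get.clone())
   data.count()   // quick sanity check that the file loads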
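The stated goal of the thread was to feed these vectors to MLlib's KMeans.
As a follow-on, here is a hedged sketch of one possible bridge, written
against Spark 0.9's API, in which KMeans.train takes an RDD[Array[Double]]
(the org.apache.spark.mllib.linalg.Vector type only arrived in Spark 1.0).
The name `points` and the choice of k = 10 with 20 iterations are purely
illustrative, not from the thread:

   import org.apache.spark.mllib.clustering.KMeans

   // Copy each Mahout Vector into a plain Array[Double] using only
   // size() and get(i), which the Mahout 0.7 Vector interface provides.
   val points = data.map { v =>
     val arr = new Array[Double](v.size)
     var i = 0
     while (i < v.size) { arr(i) = v.get(i); i += 1 }
     arr
   }.cache()

   // k and maxIterations below are placeholder values.
   val model = KMeans.train(points, 10, 20)
   val wssse = model.computeCost(points)   // within-cluster sum of squares

Note that copying through get(i) builds a dense array, which is fine for
dense data but wasteful for very sparse vectors; a sparse-aware conversion
would iterate only the non-zero entries instead.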