The "<console>:12: error: not found: type Text" error is resolved by adding the
import statement, but I am still having trouble importing VectorWritable.
The mahout-math jar is added to the classpath, as I can see both on the web UI
and in the shell:
scala> System.getenv
res1: java.util.Map[String,String] = {TERM=xterm,
JAVA_HOME=/usr/lib/jvm/java-6-openjdk, SHLVL=2,
SHELL_JARS=/home/hduser/installations/work-space/mahout-math-0.7.jar,
SPARK_MASTER_WEBUI_PORT=5050, LESSCLOSE=/usr/bin/lesspipe %s %s,
SSH_CLIENT=10.112.67.149 55123 22,
SPARK_HOME=/home/hduser/installations/spark-0.9.0, MAIL=/var/mail/hduser,
SPARK_WORKER_DIR=/tmp/spark-hduser-worklogs/work,
XDG_SESSION_COOKIE=fbd2e4304c8c75dd606c361000000186-1400039480.256868-916349946,
https_proxy=https://DS-1078D2486320:3128/, NICKNAME=vm01, JAVA_OPTS=
-Djava.library.path= -Xms512m -Xmx512m,
PWD=/home/hduser/installations/work-space/KMeansClustering_1,
SSH_TTY=/dev/pts/0, SPARK_MASTER_PORT=7077, LOGNAME=hduser,
MASTER=spark://VM-52540048731A:7077, SPARK_WORKER_MEMORY=2g,
HADOOP_HOME=/usr/lib/hadoop, SS...
I am still not able to import the Mahout classes. Any ideas?
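Seeing the jar listed in the environment does not by itself show that the class is visible to the shell. A minimal sketch of a direct check is below; the object name ClasspathCheck and the helper isLoadable are hypothetical names chosen for illustration, and the check only covers the class loader that loaded the calling code, not the workers.

```scala
object ClasspathCheck {
  // Returns true if the named class can be loaded from the current classpath.
  // ClassNotFoundException means the class itself is missing; LinkageError
  // (e.g. NoClassDefFoundError) means one of its dependencies is missing.
  def isLoadable(className: String): Boolean =
    try {
      Class.forName(className)
      true
    } catch {
      case _: ClassNotFoundException | _: LinkageError => false
    }

  def main(args: Array[String]): Unit = {
    val names = Seq(
      "org.apache.hadoop.io.Text",
      "org.apache.mahout.math.VectorWritable"
    )
    // Print one line per class so a missing jar is easy to spot.
    names.foreach(n => println(s"$n loadable: ${isLoadable(n)}"))
  }
}
```

Pasting the object into the shell and calling ClasspathCheck.main(Array()) would report whether each class can be loaded; a "false" for VectorWritable would suggest the jar never reached the shell's classpath, whatever the environment listing says.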
Thanks
Stuti Awasthi
-----Original Message-----
From: Stuti Awasthi
Sent: Wednesday, May 14, 2014 1:13 PM
To: [email protected]
Subject: RE: How to use Mahout VectorWritable in Spark.
Hi Xiangrui,
Thanks for the response. I tried a few ways to include the mahout-math jar while
launching the Spark shell, but with no success. Can you please point out what I
am doing wrong?
1. Exported mahout-math.jar in CLASSPATH and PATH.
2. Tried launching the Spark shell with:
MASTER=spark://<HOSTNAME>:<PORT> ADD_JARS=~/installations/work-space/mahout-math-0.7.jar spark-0.9.0/bin/spark-shell
After launching, I checked the environment details on the web UI. It looks like
the mahout-math jar is included:
spark.jars /home/hduser/installations/work-space/mahout-math-0.7.jar
Then I tried:
scala> import org.apache.mahout.math.VectorWritable
<console>:10: error: object mahout is not a member of package org.apache
import org.apache.mahout.math.VectorWritable
scala> val raw = sc.sequenceFile(path, classOf[Text], classOf[VectorWritable])
<console>:12: error: not found: type Text
val data =
sc.sequenceFile("/stuti/ML/Clustering/KMeans/HAR/KMeans_dataset_seq/part-r-00000",
classOf[Text], classOf[VectorWritable])
^
I'm using Spark 0.9, Hadoop 1.0.4, and Mahout 0.7.
Thanks
Stuti
-----Original Message-----
From: Xiangrui Meng [mailto:[email protected]]
Sent: Wednesday, May 14, 2014 11:56 AM
To: [email protected]
Subject: Re: How to use Mahout VectorWritable in Spark.
You need
> val raw = sc.sequenceFile(path, classOf[Text],
> classOf[VectorWritable])
to load the data. After that, you can do
> val data = raw.values.map(_.get)
to get an RDD of Mahout's Vector. You can use `--jars mahout-math.jar` when you
launch spark-shell to include mahout-math.
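As a rough sketch of the whole sequence inside spark-shell, assuming the mahout-math jar is already on the shell's classpath and that `sc` is the SparkContext the shell provides: the path below is a placeholder, and copying into Array[Double] is one possible choice rather than the only way to hand the data on.

```scala
// Run inside spark-shell, where `sc` (the SparkContext) is already defined.
import org.apache.hadoop.io.Text
import org.apache.mahout.math.VectorWritable

// Placeholder path (an assumption); substitute your own sequence file.
val path = "/path/to/KMeans_dataset_seq/part-r-00000"

// Keys are Hadoop Text, values are Mahout VectorWritable.
val raw = sc.sequenceFile(path, classOf[Text], classOf[VectorWritable])

// Copy each Mahout vector into a fresh Array[Double] immediately, because
// Hadoop record readers reuse the same Writable instance for every record.
val data = raw.map { case (_, vw) =>
  val v = vw.get()
  Array.tabulate(v.size())(i => v.get(i))
}

// Inspect the first converted record.
println(data.first().mkString(", "))
```

Copying inside the map step also means only plain arrays are cached or shuffled, which avoids relying on the Mahout classes being serializable.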
Best,
Xiangrui
On Tue, May 13, 2014 at 10:37 PM, Stuti Awasthi <[email protected]> wrote:
> Hi All,
>
> I am very new to Spark and trying to play around with MLlib, hence
> apologies for the basic question.
>
>
>
> I am trying to run the KMeans algorithm using Mahout and Spark MLlib to
> compare the performance. The initial data size was 10 GB. Mahout converts
> the data into a Sequence File <Text,VectorWritable>, which is used for KMeans
> clustering.
> The Sequence File created was ~6 GB in size.
>
>
>
> Now I wanted to know if I can use the Mahout Sequence File as input to
> Spark MLlib for KMeans. I have read that SparkContext.sequenceFile
> may be used here. Hence I tried to read my sequence file as below, but I am
> getting an error:
>
>
>
> Command on Spark Shell :
>
> scala> val data = sc.sequenceFile[String,VectorWritable]("/KMeans_dataset_seq/part-r-00000",String,VectorWritable)
>
> <console>:12: error: not found: type VectorWritable
>
> val data = sc.sequenceFile[String,VectorWritable]("/KMeans_dataset_seq/part-r-00000",String,VectorWritable)
>
>
>
> Here I have 2 questions:
>
> 1. Mahout has “Text” as the key, but Spark prints “not found: type Text”,
> so I changed it to String. Is this correct?
>
> 2. How will VectorWritable be found in Spark? Do I need to include the
> Mahout jar in the classpath, or is there another option?
>
>
>
> Please suggest.
>
>
>
> Regards
>
> Stuti Awasthi
>
>
>