I believe I may have found a solution to this problem, which I will eventually try to put on GitHub, but for now I am not sure how to run it on the cluster. I created the code in my Eclipse IDE as a Maven project and then copied the jar file to the Hadoop cluster (vectorCode-1.0.jar).
I now try to run it as follows:

hadoop jar vectorCode-1.0.jar vectorCode.vectorMapReduce hdfs://172.28.104.198/trainingFourColumns hdfs://172.28.104.198/trainingMahoutVectors3

but I get the following error:

Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/mahout/math/VectorWritable
    at transactionCode.transactionMapReduce.main(transactionMapReduce.java:130)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
Caused by: java.lang.ClassNotFoundException: org.apache.mahout.math.VectorWritable
    at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:358)

It seems that the Mahout vector libraries are not being included. How would I include them in my MapReduce job?

Chirag

On Mon, Jan 5, 2015 at 5:28 PM, chirag lakhani <[email protected]> wrote:
> I am trying to emulate something similar to what was done in this chimpler
> example:
>
> https://chimpler.wordpress.com/2013/03/13/using-the-mahout-naive-bayes-classifier-to-automatically-classify-twitter-messages/
>
> If you have data like this:
>
> tech 308215054011194110 Limited 3-Box $20 BOGO, Supreme $9 BOGO,
> art 308215054011194118 Purchase The Jeopardy! Book by Alex Trebek
> apparel 308215054011194146 #Shopping #Bargain #Deals Designer KATHY Van Zeeland
>
> I would like to write map-reduce code that will take each record and
> ultimately create a sequence file of Mahout vectors that can then be used
> by the Naive Bayes algorithm. I have not been able to find any examples of
> this seemingly basic task online. One thing that confuses me about writing
> such code is how you call Lucene analyzers and vectorizers so that they are
> consistent across map tasks. Could someone provide either an example of
> this online or some advice about how I would do such a thing? My
> understanding is that the first column should be the key and the vectorized
> form of the third column the value of this sequence file.
>
> Chimpler provides some code, but it seems to use a local file system
> instead of the map-reduce framework.
>
> Chirag
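On the quoted question about keeping vectorizers "consistent among each map-task": hashed feature encoding (the approach behind Mahout's encoders under org.apache.mahout.vectorizer.encoders) is deterministic, so every mapper that hashes the same token lands on the same vector index and no state needs to be shared between tasks. A minimal self-contained sketch of that idea in plain Java follows; the class and method names here are my own for illustration, not Mahout's API:

```java
import java.util.Locale;

// Illustration of hashed ("feature hashing") text vectorization: each token
// is mapped to a fixed index by a deterministic hash, so every map task
// produces identical vectors for identical text without coordination.
public class HashingVectorizer {

    /** Lowercases, splits on non-letters, and counts tokens into a fixed-size vector. */
    public static double[] vectorize(String text, int dimensions) {
        double[] vector = new double[dimensions];
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (token.isEmpty()) continue;
            // String.hashCode is specified by the Java language, so the same
            // token maps to the same slot in every JVM / map task.
            int index = Math.floorMod(token.hashCode(), dimensions);
            vector[index] += 1.0;
        }
        return vector;
    }

    public static void main(String[] args) {
        double[] v = vectorize("Limited 3-Box $20 BOGO, Supreme $9 BOGO,", 1000);
        double sum = 0;
        for (double x : v) sum += x;
        System.out.println("tokens counted = " + sum);
    }
}
```

In a real mapper the same logic would run inside map(): emit the category column as a Text key and the vector wrapped in a VectorWritable as the value, writing through a SequenceFileOutputFormat. A Lucene analyzer, if used instead of the naive split above, should be instantiated once in the mapper's setup() so each task tokenizes identically.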

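As for the NoClassDefFoundError itself: hadoop jar only puts your own jar on the classpath, so the Mahout classes have to travel with the job, typically either by building a fat ("shaded") jar or by shipping them with -libjars. A sketch of the fat-jar route, assuming a standard Maven build with mahout-math as a compile-scope dependency (the plugin version shown is illustrative):

```xml
<!-- pom.xml: bundle compile-scope dependencies (e.g. mahout-math) into the job jar -->
<build>
  <plugins>
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-shade-plugin</artifactId>
      <version>2.3</version>
      <executions>
        <execution>
          <phase>package</phase>
          <goals><goal>shade</goal></goals>
        </execution>
      </executions>
    </plugin>
  </plugins>
</build>
```

Alternatively, add the Mahout jars to HADOOP_CLASSPATH for the client side and pass them to the job with -libjars so the task JVMs see them as well.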