I believe I may have found a solution to this problem, which I will eventually try to put on GitHub, but for now I am not sure how to run it on the cluster. I created the code in my Eclipse IDE as a Maven project and then copied the jar file to the Hadoop cluster (vectorCode-1.0.jar).
I now try to run it as follows:

hadoop jar vectorCode-1.0.jar vectorCode.vectorMapReduce hdfs://172.28.104.198/trainingFourColumns hdfs://172.28.104.198/trainingMahoutVectors3

but I get the following error:

Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/mahout/math/VectorWritable
    at transactionCode.transactionMapReduce.main(transactionMapReduce.java:130)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
Caused by: java.lang.ClassNotFoundException: org.apache.mahout.math.VectorWritable
    at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:358)

It seems that the Mahout vector libraries are not being included. How would I include them in my MapReduce job?

Chirag

On Mon, Jan 5, 2015 at 5:28 PM, chirag lakhani <[email protected]> wrote:
> I am trying to emulate something similar to what was done in this chimpler
> example:
>
> https://chimpler.wordpress.com/2013/03/13/using-the-mahout-naive-bayes-classifier-to-automatically-classify-twitter-messages/
>
> If you have data like this:
>
> tech 308215054011194110 Limited 3-Box $20 BOGO, Supreme $9 BOGO,
> art 308215054011194118 Purchase The Jeopardy! Book by Alex Trebek
> apparel 308215054011194146 #Shopping #Bargain #Deals Designer KATHY Van Zeeland
>
> I would like to write map-reduce code that will take each record and
> ultimately create a sequence file of Mahout vectors that can then be used
> by the Naive Bayes algorithm. I have not been able to find any examples of
> this seemingly basic task online. One thing that confuses me about writing
> such code is how you call Lucene analyzers and vectorizers so that they are
> consistent across map tasks. Could someone provide either an example of
> this online or some advice about how I would do such a thing? My
> understanding is that the first column should be the key and the vectorized
> form of the third column the value of this sequence file.
>
> Chimpler provides some code, but it seems to use a local file system
> instead of the map-reduce framework.
>
> Chirag
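On the quoted question about keeping vectorizers "consistent among each map-task": hashed feature encoding (the approach behind Mahout's encoders under org.apache.mahout.vectorizer.encoders) is deterministic, so every mapper that hashes the same token lands on the same vector index and no state needs to be shared between tasks. A minimal self-contained sketch of that idea in plain Java follows; the class and method names here are my own for illustration, not Mahout's API:

```java
import java.util.Locale;

// Illustration of hashed ("feature hashing") text vectorization: each token
// is mapped to a fixed index by a deterministic hash, so every map task
// produces identical vectors for identical text without coordination.
public class HashingVectorizer {

    /** Lowercases, splits on non-letters, and counts tokens into a fixed-size vector. */
    public static double[] vectorize(String text, int dimensions) {
        double[] vector = new double[dimensions];
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (token.isEmpty()) continue;
            // String.hashCode is specified by the Java language, so the same
            // token maps to the same slot in every JVM / map task.
            int index = Math.floorMod(token.hashCode(), dimensions);
            vector[index] += 1.0;
        }
        return vector;
    }

    public static void main(String[] args) {
        double[] v = vectorize("Limited 3-Box $20 BOGO, Supreme $9 BOGO,", 1000);
        double sum = 0;
        for (double x : v) sum += x;
        System.out.println("tokens counted = " + sum);
    }
}
```

In a real mapper the same logic would run inside map(): emit the category column as a Text key and the vector wrapped in a VectorWritable as the value, writing through a SequenceFileOutputFormat. A Lucene analyzer, if used instead of the naive split above, should be instantiated once in the mapper's setup() so each task tokenizes identically.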

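As for the NoClassDefFoundError itself: hadoop jar only puts your own jar on the classpath, so the Mahout classes have to travel with the job, typically either by building a fat ("shaded") jar or by shipping them with -libjars. A sketch of the fat-jar route, assuming a standard Maven build with mahout-math as a compile-scope dependency (the plugin version shown is illustrative):

```xml
<!-- pom.xml: bundle compile-scope dependencies (e.g. mahout-math) into the job jar -->
<build>
  <plugins>
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-shade-plugin</artifactId>
      <version>2.3</version>
      <executions>
        <execution>
          <phase>package</phase>
          <goals><goal>shade</goal></goals>
        </execution>
      </executions>
    </plugin>
  </plugins>
</build>
```

Alternatively, add the Mahout jars to HADOOP_CLASSPATH for the client side and pass them to the job with -libjars so the task JVMs see them as well.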