Hi all,
I'm trying to use the Naive Bayes classifier in Mahout to classify some
product data. This data was originally in a CSV file, and some fields
have no value (i.e. a ,, in the CSV file).
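In case it helps to know how many records are affected, the empty fields can be counted with something like the sketch below. It is only a minimal Python illustration: the function name, the sample rows and the column names are made up here and are not my real data.

```python
import csv
import io

def count_empty_fields(csv_text):
    """Return (rows_with_empty_fields, total_rows) for a block of CSV text."""
    reader = csv.reader(io.StringIO(csv_text))
    total = 0
    with_empty = 0
    for row in reader:
        total += 1
        # A ",," in the file shows up as an empty string in the parsed row.
        if any(field.strip() == "" for field in row):
            with_empty += 1
    return with_empty, total

# Hypothetical sample standing in for the real products file.
sample = "id,name,category\n1,Widget,Tools\n2,,Toys\n3,Gadget,\n"
print(count_empty_fields(sample))
```

Note that the header line is counted as a row here, so the total includes it.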
I have used Solr to convert both my datasets into Lucene indexes, then used
the Mahout split command to create the training and holdout sets. This
appeared to work fine.
Now I am up to the stage of training the Naive Bayes model with trainnb,
but I'm receiving the following error:
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 1
at org.apache.mahout.classifier.naivebayes.BayesUtils.writeLabelIndex(BayesUtils.java:119)
at org.apache.mahout.classifier.naivebayes.training.TrainNaiveBayesJob.createLabelIndex(TrainNaiveBayesJob.java:152)
at org.apache.mahout.classifier.naivebayes.training.TrainNaiveBayesJob.run(TrainNaiveBayesJob.java:92)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.mahout.classifier.naivebayes.training.TrainNaiveBayesJob.main(TrainNaiveBayesJob.java:62)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:195)
This is my command input:
$MAHOUT_HOME/bin/./mahout trainnb -i ~/training_output/Amazon_training_output/ -el -o ~/model/Amazon -li ~/labelindex/Amazon -ow -c
What does the error mean in this context, and how do I resolve it? Is it
possible that my original index is to blame?
I'm pretty new to this so I'm open to any and all advice on how to go about
this.
Cheers,
-dcf