Hi, this problem has been troubling me for a couple of days and I still cannot find the cause. Can anyone help?
Here is my command:

    mahout clusterdump -i video_tags_kmean_job/clusters/clusters-10-final -o ~/video_tags_clusters_dump -p video_tags_kmean_job/clusters/clusteredPoints -dt sequencefile -d video_tags_kmean_job/vectors/dictionary.file-0 -n 50

First, I ran this command with the same data on a single VM, and there was no problem :)

Then I ran it on a real web server cluster, and this happened :(

    15:54 [username@servername]$ mahout clusterdump -i video_tags_kmean_job/clusters/clusters-10-final -o ~/video_tags_clusters_dump -p video_tags_kmean_job/clusters/clusteredPoints -dt sequencefile -d video_tags_kmean_job/vectors/dictionary.file-0 -n 50
    MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.
    Running on hadoop, using /opt/hadoop/default/bin/hadoop and HADOOP_CONF_DIR=/opt/hadoop/default/etc/hadoop
    MAHOUT-JOB: /home/username/apps/mahout-distribution-0.9/mahout-examples-0.9-job.jar
    14/11/04 15:55:30 ERROR common.AbstractJob: Unexpected sequencefile while processing Job-Specific Options:
    Unexpected sequencefile while processing Job-Specific Options:
    Usage:
     [--input <input> --output <output> --outputFormat <outputFormat>
     --substring <substring> --numWords <numWords> --pointsDir <pointsDir>
     --samplePoints <samplePoints> --dictionary <dictionary>
     --dictionaryType <dictionaryType> --evaluate
     --distanceMeasure <distanceMeasure> --help --tempDir <tempDir>
     --startPhase <startPhase> --endPhase <endPhase>]
    Job-Specific Options:
      --input (-i) input                        Path to job input directory.
      --output (-o) output                      The directory pathname for output.
      --outputFormat (-of) outputFormat         The optional output format for the results. Options: TEXT, CSV, JSON or GRAPH_ML
      --substring (-b) substring                The number of chars of the asFormatString() to print
      --numWords (-n) numWords                  The number of top terms to print
      --pointsDir (-p) pointsDir                The directory containing points sequence files mapping input vectors to their cluster. If specified, then the program will output the points associated with a cluster
      --samplePoints (-sp) samplePoints         Specifies the maximum number of points to include _per_ cluster. The default is to include all points
      --dictionary (-d) dictionary              The dictionary file
      --dictionaryType (-dt) dictionaryType     The dictionary file type (text|sequencefile)
      --evaluate (-e)                           Run ClusterEvaluator and CDbwEvaluator over the input. The output will be appended to the rest of the output at the end.
      --distanceMeasure (-dm) distanceMeasure   The classname of the DistanceMeasure. Default is SquaredEuclidean
      --help (-h)                               Print out help
      --tempDir tempDir                         Intermediate output directory
      --startPhase startPhase                   First phase to run
      --endPhase endPhase                       Last phase to run
    14/11/04 15:55:31 INFO driver.MahoutDriver: Program took 439 ms (Minutes: 0.007316666666666667)

PS: I configured Mahout in /home/username/.bashrc:

    # set mahout path
    export MAHOUT_HOME=/home/username/apps/mahout-distribution-0.9
    export MAHOUT_LOCAL=
    export PATH=$PATH:$MAHOUT_HOME/bin
    export CLASSPATH=$CLASSPATH:$MAHOUT_HOME/mahout-core-0.9.jar:$MAHOUT_HOME/mahout-math-0.9.jar:$MAHOUT_HOME/mahout-integration-0.9.jar

Thanks a lot,
Spike
