Problem with K-Means clustering on Amazon EMR

Konstantin Slisenko Sun, 16 Mar 2014 04:08:45 -0700

Hello!

I am running text-document clustering on a Hadoop cluster in Amazon Elastic
MapReduce. I use the Amazon S3 file system for both input and output, and I
specify all paths as "s3://bucket-name/folder-name".


SparseVectorsFromSequenceFiles works correctly with S3,
but when I start the K-Means clustering job, I get this error:

Exception in thread "main" java.lang.IllegalArgumentException: This
file system object (hdfs://172.31.41.65:9000) does not support access
to the request path
's3://by.kslisenko.bigdata/stackovweflow-small/out_new/sparse/tfidf-vectors'
You possibly called FileSystem.get(conf) when you should have called
FileSystem.get(uri, conf) to obtain a file system supporting your
path.

        at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:375)
        at org.apache.hadoop.hdfs.DistributedFileSystem.checkPath(DistributedFileSystem.java:106)
        at org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:162)
        at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:530)
        at org.apache.mahout.clustering.kmeans.RandomSeedGenerator.buildRandom(RandomSeedGenerator.java:76)
        at org.apache.mahout.clustering.kmeans.KMeansDriver.run(KMeansDriver.java:93)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at bbuzz2011.stackoverflow.runner.RunnerWithInParams.cluster(RunnerWithInParams.java:121)
        at bbuzz2011.stackoverflow.runner.RunnerWithInParams.run(RunnerWithInParams.java:52)
        at bbuzz2011.stackoverflow.runner.RunnerWithInParams.main(RunnerWithInParams.java:41)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:156)


I checked RandomSeedGenerator.buildRandom
(http://grepcode.com/file/repo1.maven.org/maven2/org.apache.mahout/mahout-core/0.8/org/apache/mahout/clustering/kmeans/EigenSeedGenerator.java?av=f)
and, as far as I can tell, it obtains the file system correctly:

FileSystem fs = FileSystem.get(output.toUri(), conf);
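
As a rough illustration of what the exception message describes, here is a
minimal sketch in plain Java that does not depend on Hadoop. The names
SchemeCheckSketch, BoundFileSystem, getDefault and getFor are hypothetical and
only model the scheme check; this is not Hadoop's actual implementation. It
assumes the default file system is the HDFS address from the error and uses
the placeholder S3 path from above.

```java
import java.net.URI;

public class SchemeCheckSketch {

    /** Hypothetical stand-in for a file system object bound to one scheme and authority. */
    static final class BoundFileSystem {
        final URI base;

        BoundFileSystem(URI base) {
            this.base = base;
        }

        /** True if the path has no scheme, or matches this file system's scheme and authority. */
        boolean supports(URI path) {
            if (path.getScheme() == null) {
                return true; // scheme-less paths resolve against this file system
            }
            if (!path.getScheme().equalsIgnoreCase(base.getScheme())) {
                return false; // e.g. an s3:// path handed to an hdfs:// file system
            }
            String authority = path.getAuthority();
            return authority == null || authority.equalsIgnoreCase(base.getAuthority());
        }
    }

    /** Models FileSystem.get(conf): always returns the cluster's default file system. */
    static BoundFileSystem getDefault(URI defaultFs) {
        return new BoundFileSystem(defaultFs);
    }

    /** Models FileSystem.get(uri, conf): picks the file system from the path's own scheme. */
    static BoundFileSystem getFor(URI path, URI defaultFs) {
        if (path.getScheme() == null) {
            return new BoundFileSystem(defaultFs);
        }
        String authority = path.getAuthority() == null ? "" : path.getAuthority();
        return new BoundFileSystem(URI.create(path.getScheme() + "://" + authority + "/"));
    }

    public static void main(String[] args) {
        URI defaultFs = URI.create("hdfs://172.31.41.65:9000");
        URI input = URI.create("s3://bucket-name/folder-name");

        // Default file system with an S3 path: the situation in the exception.
        System.out.println(getDefault(defaultFs).supports(input));

        // File system derived from the path itself: the variant the message recommends.
        System.out.println(getFor(input, defaultFs).supports(input));
    }
}
```

In this model the first check reports false and the second reports true. Since
the trace fails in DistributedFileSystem.getFileStatus at
RandomSeedGenerator.java:76, one possibility is that the input path is checked
against a file system that was obtained for a different path or for the
default configuration.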


I cannot run the clustering because of this error. Do you have any
ideas on how to fix it?
