Re: Problem with K-Means clustering on Amazon EMR

Frank Scholten Sun, 16 Mar 2014 06:23:24 -0700

Hi Konstantin,

Good to hear from you.


The link you mentioned points to EigenSeedGenerator, not
RandomSeedGenerator. The problem seems to be with the call to

fs.getFileStatus(input).isDir()


It's been a while and I don't remember the details, but perhaps you have to
set additional Hadoop fs properties to use S3. See
https://wiki.apache.org/hadoop/AmazonS3. Perhaps you can isolate the cause
of this by creating a small Java main app with that line of code and running
it in the debugger.
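
For what it's worth, here is a minimal sketch of why the stack trace complains, using only java.net.URI so it runs without a cluster. The supports method is a hypothetical, simplified model of the path check, not Hadoop's actual implementation; the address and path are taken from your mail. In the real test app you would obtain the file system with FileSystem.get(input.toUri(), conf), as the error message suggests, and then call fs.getFileStatus(input).isDir().

```java
import java.net.URI;

class SchemeCheck {
    // Simplified model (an assumption, not Hadoop's code): a file system
    // accepts a path only if the path has no scheme, or if its scheme and
    // authority match those of the file system's own URI.
    static boolean supports(URI fileSystemUri, URI path) {
        String scheme = path.getScheme();
        if (scheme == null) {
            return true; // scheme-less paths resolve against the file system itself
        }
        if (!scheme.equalsIgnoreCase(fileSystemUri.getScheme())) {
            return false; // e.g. an s3:// path handed to an hdfs:// file system
        }
        String authority = path.getAuthority();
        return authority == null
                || authority.equalsIgnoreCase(fileSystemUri.getAuthority());
    }

    public static void main(String[] args) {
        URI defaultFs = URI.create("hdfs://172.31.41.65:9000");
        URI input = URI.create("s3://bucket-name/folder-name");

        // Default (HDFS) file system asked about an S3 path: rejected.
        System.out.println(supports(defaultFs, input)); // prints "false"

        // File system chosen from the path's own URI: accepted.
        System.out.println(supports(URI.create("s3://bucket-name"), input)); // prints "true"
    }
}
```

If the real app fails the same way even with FileSystem.get(uri, conf), the S3 configuration is the more likely culprit.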

Cheers,

Frank



On Sun, Mar 16, 2014 at 12:07 PM, Konstantin Slisenko
<[email protected]> wrote:

> Hello!
>
> I run text-document clustering on a Hadoop cluster in Amazon Elastic
> MapReduce. As input and output I use the Amazon S3 file system. I specify
> all paths as "s3://bucket-name/folder-name".
>
> SparseVectorsFromSequenceFiles works correctly with S3,
> but when I start the K-Means clustering job, I get this error:
>
> Exception in thread "main" java.lang.IllegalArgumentException: This
> file system object (hdfs://172.31.41.65:9000) does not support access
> to the request path
>
> 's3://by.kslisenko.bigdata/stackovweflow-small/out_new/sparse/tfidf-vectors'
> You possibly called FileSystem.get(conf) when you should have called
> FileSystem.get(uri, conf) to obtain a file system supporting your
> path.
>
>         at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:375)
>         at
> org.apache.hadoop.hdfs.DistributedFileSystem.checkPath(DistributedFileSystem.java:106)
>         at
> org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:162)
>         at
> org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:530)
>         at
> org.apache.mahout.clustering.kmeans.RandomSeedGenerator.buildRandom(RandomSeedGenerator.java:76)
>         at
> org.apache.mahout.clustering.kmeans.KMeansDriver.run(KMeansDriver.java:93)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>         at
> bbuzz2011.stackoverflow.runner.RunnerWithInParams.cluster(RunnerWithInParams.java:121)
>         at
> bbuzz2011.stackoverflow.runner.RunnerWithInParams.run(RunnerWithInParams.java:52)
>         at
> bbuzz2011.stackoverflow.runner.RunnerWithInParams.main(RunnerWithInParams.java:41)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>         at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
>
>
> I checked RandomSeedGenerator.buildRandom
> (
> http://grepcode.com/file/repo1.maven.org/maven2/org.apache.mahout/mahout-core/0.8/org/apache/mahout/clustering/kmeans/EigenSeedGenerator.java?av=f
> )
> and I assume it has correct code:
>
> FileSystem fs = FileSystem.get(output.toUri(), conf);
>
>
> I cannot run clustering because of this error. Do you have any
> ideas how to fix this?
>
