Hi Konstantin,

Good to hear from you.
The link you mentioned points to EigenSeedGenerator, not RandomSeedGenerator.
The problem seems to be with the call to fs.getFileStatus(input).isDir().

It's been a while and I don't remember the details, but perhaps you have to
set additional Hadoop fs properties to use S3. See
https://wiki.apache.org/hadoop/AmazonS3.

Perhaps you can isolate the cause of this by creating a small Java main app
with that line of code and running it in the debugger.

Cheers,
Frank

On Sun, Mar 16, 2014 at 12:07 PM, Konstantin Slisenko <[email protected]> wrote:

> Hello!
>
> I run a text-documents clustering on Hadoop cluster in Amazon Elastic Map
> Reduce. As input and output I use S3 Amazon file system. I specify all
> paths as "s3://bucket-name/folder-name".
>
> SparceVectorsFromSequenceFile works correctly with S3
> but when I start K-Means clustering job, I get this error:
>
> Exception in thread "main" java.lang.IllegalArgumentException: This
> file system object (hdfs://172.31.41.65:9000) does not support access
> to the request path
> 's3://by.kslisenko.bigdata/stackovweflow-small/out_new/sparse/tfidf-vectors'
> You possibly called FileSystem.get(conf) when you should have called
> FileSystem.get(uri, conf) to obtain a file system supporting your
> path.
>
> at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:375)
> at org.apache.hadoop.hdfs.DistributedFileSystem.checkPath(DistributedFileSystem.java:106)
> at org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:162)
> at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:530)
> at org.apache.mahout.clustering.kmeans.RandomSeedGenerator.buildRandom(RandomSeedGenerator.java:76)
> at org.apache.mahout.clustering.kmeans.KMeansDriver.run(KMeansDriver.java:93)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
> at bbuzz2011.stackoverflow.runner.RunnerWithInParams.cluster(RunnerWithInParams.java:121)
> at bbuzz2011.stackoverflow.runner.RunnerWithInParams.run(RunnerWithInParams.java:52)
> at bbuzz2011.stackoverflow.runner.RunnerWithInParams.main(RunnerWithInParams.java:41)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
>
> I checked RandomSeedGenerator.buildRandom
> (
> http://grepcode.com/file/repo1.maven.org/maven2/org.apache.mahout/mahout-core/0.8/org/apache/mahout/clustering/kmeans/EigenSeedGenerator.java?av=f
> )
> and I assume it has correct code:
>
> FileSystem fs = FileSystem.get(output.toUri(), conf);
>
> I cannot run clustering because of this error. Maybe you have any
> ideas how to fix this?
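
P.S. To illustrate what the exception is complaining about, here is a rough sketch that uses only the Java standard library. It is not Hadoop's actual checkPath code, just a simplified stand-in for the idea: a file system object bound to one URI (here the hdfs:// one from the error message) rejects a path whose scheme or authority differs. The class and method names (SchemeCheckSketch, accepts) are made up for this example.

```java
import java.net.URI;

class SchemeCheckSketch {

    // Simplified assumption of the check: a file system bound to fsUri accepts
    // a path only if the path has no scheme, or the same scheme and authority.
    static boolean accepts(URI fsUri, URI path) {
        String scheme = path.getScheme();
        if (scheme == null) {
            return true; // scheme-less paths are resolved against the file system itself
        }
        if (!scheme.equalsIgnoreCase(fsUri.getScheme())) {
            return false; // e.g. an s3:// path handed to an hdfs:// file system
        }
        String authority = path.getAuthority();
        return authority == null || authority.equalsIgnoreCase(fsUri.getAuthority());
    }

    public static void main(String[] args) {
        // URIs taken from the error message in the quoted mail.
        URI defaultFs = URI.create("hdfs://172.31.41.65:9000");
        URI input = URI.create(
            "s3://by.kslisenko.bigdata/stackovweflow-small/out_new/sparse/tfidf-vectors");

        // File system obtained without looking at the path: scheme mismatch.
        System.out.println(accepts(defaultFs, input));

        // File system chosen from the path's own scheme and authority: match.
        System.out.println(accepts(URI.create("s3://by.kslisenko.bigdata"), input));
    }
}
```

If that is indeed what is happening, then the fs used for getFileStatus(input) was probably obtained from a different URI than the input path, which is what the FileSystem.get(uri, conf) hint in the exception text suggests checking.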
