Hi all,

I am trying to copy a file out of Riak into HDFS using the S3 protocol.
I created the following file, /etc/hadoop/conf/jets3t.properties:

    s3service.s3-endpoint=myhost
    s3service.s3-endpoint-http-port=8080
    s3service.disable-dns-buckets=true
    s3service.s3-endpoint-virtual-path=/
    s3service.max-thread-count=10
    threaded-service.max-thread-count=10
    s3service.https-only=false
    httpclient.proxy-autodetect=false
    httpclient.proxy-host=myhost
    httpclient.proxy-port=8080
    httpclient.retry-max=11

When I run:

    hadoop distcp s3://<access key>:<secret key>@test/test hdfs://localhost/tmp/test

I get this stack trace:

    org.apache.hadoop.fs.s3.S3Exception: org.jets3t.service.S3ServiceException: Request Error. -- ResponseCode: 404, ResponseStatus: Object Not Found
        at org.apache.hadoop.fs.s3.Jets3tFileSystemStore.get(Jets3tFileSystemStore.java:175)
        at org.apache.hadoop.fs.s3.Jets3tFileSystemStore.retrieveINode(Jets3tFileSystemStore.java:221)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:497)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
        at com.sun.proxy.$Proxy25.retrieveINode(Unknown Source)
        at org.apache.hadoop.fs.s3.S3FileSystem.getFileStatus(S3FileSystem.java:340)
        at org.apache.hadoop.fs.Globber.getFileStatus(Globber.java:57)
        at org.apache.hadoop.fs.Globber.glob(Globber.java:252)
        at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:1655)
        at org.apache.hadoop.tools.GlobbedCopyListing.doBuildListing(GlobbedCopyListing.java:77)
        at org.apache.hadoop.tools.CopyListing.buildListing(CopyListing.java:84)
        at org.apache.hadoop.tools.DistCp.createInputFileListing(DistCp.java:382)
        at org.apache.hadoop.tools.DistCp.createAndSubmitJob(DistCp.java:181)
        at org.apache.hadoop.tools.DistCp.execute(DistCp.java:153)
        at org.apache.hadoop.tools.DistCp.run(DistCp.java:126)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
        at org.apache.hadoop.tools.DistCp.main(DistCp.java:430)
    Caused by: org.jets3t.service.S3ServiceException: Request Error. -- ResponseCode: 404, ResponseStatus: Object Not Found
        at org.jets3t.service.S3Service.getObject(S3Service.java:1379)
        at org.apache.hadoop.fs.s3.Jets3tFileSystemStore.get(Jets3tFileSystemStore.java:163)
        ... 20 more
    Caused by: org.jets3t.service.impl.rest.HttpException
        at org.jets3t.service.impl.rest.httpclient.RestStorageService.performRequest(RestStorageService.java:519)
        at org.jets3t.service.impl.rest.httpclient.RestStorageService.performRequest(RestStorageService.java:281)
        at org.jets3t.service.impl.rest.httpclient.RestStorageService.performRestGet(RestStorageService.java:981)
        at org.jets3t.service.impl.rest.httpclient.RestStorageService.getObjectImpl(RestStorageService.java:2150)
        at org.jets3t.service.impl.rest.httpclient.RestStorageService.getObjectImpl(RestStorageService.java:2087)
        at org.jets3t.service.StorageService.getObject(StorageService.java:1140)
        at org.jets3t.service.S3Service.getObject(S3Service.java:2583)
        at org.jets3t.service.S3Service.getObject(S3Service.java:84)
        at org.jets3t.service.StorageService.getObject(StorageService.java:525)
        at org.jets3t.service.S3Service.getObject(S3Service.java:1377)
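To rule out the endpoint itself, a raw path-style request is a quick sanity check. This is just a sketch using the same myhost/8080 placeholders as jets3t.properties above; since it is unauthenticated, I would only expect a well-formed S3 XML error back, not a connection failure:

    # Sanity-check that the Riak endpoint answers path-style bucket requests.
    # "myhost" and 8080 are the placeholders from jets3t.properties above.
    # Unauthenticated, so an S3-style XML error is the expected response; a
    # connection error would instead point at the endpoint/port settings.
    curl -i "http://myhost:8080/test/"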
However, with a local .s3cfg file that points at the Riak cluster, I can do this:

    [hdfs@dsg01 ~]$ s3cmd ls s3://test
        DIR   s3://test/home/
        DIR   s3://test/setup/
        DIR   s3://test/test/
        DIR   s3://test/tmp/

So s3://test/test does exist, and it is in Riak, not AWS. (The relevant .s3cfg settings are sketched below.)

Now, if I comment out s3service.s3-endpoint-virtual-path and run:

    hadoop distcp s3://<access key>:<secret key>@test/test hdfs://localhost/tmp/test

I see:

    java.io.IOException: /test doesn't exist
        at org.apache.hadoop.fs.s3.Jets3tFileSystemStore.get(Jets3tFileSystemStore.java:170)
        at org.apache.hadoop.fs.s3.Jets3tFileSystemStore.retrieveINode(Jets3tFileSystemStore.java:221)
        ... (the remaining frames are identical to the trace above, from retrieveINode down through DistCp.main)

Using @test/test/ (with a trailing slash) produces the same exception as above.
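For comparison, here is the shape of the .s3cfg entries I believe matter. This is a sketch with placeholders, not a paste of my actual file; keeping host_bucket identical to host_base is the usual way to make s3cmd issue path-style requests against a non-AWS endpoint:

    # Sketch of the relevant ~/.s3cfg entries (placeholders, not the real file).
    access_key = <access key>
    secret_key = <secret key>
    host_base = myhost:8080
    # Same value as host_base: with no %(bucket)s template here, s3cmd puts the
    # bucket in the URL path rather than in the hostname.
    host_bucket = myhost:8080
    use_https = False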
Using just the bucket, with no path:

    hadoop distcp s3://<access key>:<secret key>@test hdfs://localhost/tmp/test

I get:

    java.io.IOException: /user/hdfs doesn't exist
        at org.apache.hadoop.fs.s3.Jets3tFileSystemStore.get(Jets3tFileSystemStore.java:170)
        at org.apache.hadoop.fs.s3.Jets3tFileSystemStore.retrieveINode(Jets3tFileSystemStore.java:221)
        ... (the remaining frames are identical to the /test trace above)

I am the user 'hdfs'.
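That path at least makes sense: with no path after the bucket, Hadoop resolves the request against the filesystem's default working directory, /user/<username>, hence the lookup of /user/hdfs for me. If that reading is right, naming the bucket root explicitly should sidestep the working-directory lookup (an untested sketch):

    # Untested sketch: the trailing "/" names the bucket root explicitly, so
    # the s3:// filesystem should not fall back to the default working
    # directory /user/<username>.
    hadoop distcp "s3://<access key>:<secret key>@test/" hdfs://localhost/tmp/test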
If I comment out these properties:

    #s3service.s3-endpoint=myhost
    #s3service.s3-endpoint-http-port=8080
    #s3service.disable-dns-buckets=true
    #s3service.s3-endpoint-virtual-path=/

and run:

    hadoop distcp s3://<access key>:<secret key>@test/test hdfs://localhost/tmp/test

I get a fresh, new exception:

    16/04/22 21:53:34 ERROR tools.DistCp: Exception encountered
    org.apache.hadoop.fs.s3.S3Exception: org.jets3t.service.S3ServiceException: S3 Error Message. -- ResponseCode: 403, ResponseStatus: Forbidden, XML Error Message: <?xml version="1.0" encoding="UTF-8"?><Error><Code>AccessDenied</Code><Message>Access Denied</Message><Resource>/%2Ftest</Resource><RequestId></RequestId></Error>
        at org.apache.hadoop.fs.s3.Jets3tFileSystemStore.get(Jets3tFileSystemStore.java:175)
        at org.apache.hadoop.fs.s3.Jets3tFileSystemStore.retrieveINode(Jets3tFileSystemStore.java:221)
        ... (the remaining frames are identical to the first 404 trace above)
    Caused by: org.jets3t.service.S3ServiceException: S3 Error Message. -- ResponseCode: 403, ResponseStatus: Forbidden, XML Error Message: <?xml version="1.0" encoding="UTF-8"?><Error><Code>AccessDenied</Code><Message>Access Denied</Message><Resource>/%2Ftest</Resource><RequestId></RequestId></Error>
        at org.jets3t.service.S3Service.getObject(S3Service.java:1379)
        at org.apache.hadoop.fs.s3.Jets3tFileSystemStore.get(Jets3tFileSystemStore.java:163)
        ... 20 more

It's odd to see "/%2Ftest", since %2F is the URL encoding of '/'. Why is that there?

Note: 'myhost' is just a placeholder for the actual hostname, which does resolve.

What am I missing?
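P.S. One guess on my own "/%2Ftest" question: the earlier traces show this s3:// filesystem using whole absolute paths, leading slash included, as object keys ("/test", "/user/hdfs"). If jets3t percent-encodes that key when building the request path, the key's leading slash becomes %2F and lands after the path's own "/", which matches the <Resource> in the 403 exactly. A trivial illustration of that encoding (my assumption, not a capture from the wire):

    # Assumption, not a wire capture: percent-encode the object key "/test"
    # the way an S3 client would before appending it to the request path "/".
    key='/test'
    printf '/%s\n' "$(printf '%s' "$key" | sed 's|/|%2F|g')"
    # prints: /%2Ftest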