Hi Robert, thank you very much for your input!

Have you tried that yourself? With org.apache.hadoop.fs.s3native.NativeS3FileSystem
I moved forward, and now I get a new exception:

Caused by: org.jets3t.service.S3ServiceException: S3 HEAD request failed
for '/***.csv' - ResponseCode=403, ResponseMessage=Forbidden

It's really strange, because I gave full permissions to authenticated users,
and I can fetch the target file with s3cmd or S3 Browser from the same PC...
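To take Flink out of the picture, something like the standalone check below,
with plain jets3t (the client NativeS3FileSystem uses underneath), should issue
the same HEAD request. This is only a rough sketch, assuming jets3t 0.9.x on
the classpath; the class name, bucket, key and credentials are placeholders:

import org.jets3t.service.S3Service;
import org.jets3t.service.impl.rest.httpclient.RestS3Service;
import org.jets3t.service.security.AWSCredentials;

public class S3HeadCheck {
    public static void main(String[] args) throws Exception {
        // Placeholder key pair: the same one configured in core-site.xml.
        AWSCredentials creds = new AWSCredentials("putKeyHere", "putSecretHere");
        S3Service s3 = new RestS3Service(creds);
        // getObjectDetails issues a HEAD request, like the one failing in Flink.
        System.out.println(s3.getObjectDetails("my-bucket-name", "some-file.csv"));
    }
}

If this also comes back with 403, the key pair itself has no permission on the
object, and Flink is not the problem.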
I realize this question isn't really addressed to you, but perhaps you have
faced the same issue.

Thanks in advance!
Kostia

Thank you,
Konstantin Kudryavtsev

On Mon, Oct 5, 2015 at 10:13 PM, Robert Metzger <rmetz...@apache.org> wrote:

> Hi Kostia,
>
> thank you for writing to the Flink mailing list. I actually started to
> try out our S3 file system support after I saw your question on
> StackOverflow [1].
> I found that our S3 connector is very broken. I had to resolve two more
> issues with it before I was able to get the same exception you reported.
>
> Another Flink committer looked into the issue as well (and confirmed it),
> but there was no solution [2].
>
> So for now, I would say we have to assume that our S3 connector is not
> working. I will start a separate discussion on the developer mailing list
> about removing our S3 connector.
>
> The good news is that you can just use Hadoop's S3 file system
> implementation with Flink.
>
> I used this Flink program to verify that it's working:
>
> import org.apache.flink.api.java.DataSet;
> import org.apache.flink.api.java.ExecutionEnvironment;
>
> public class S3FileSystem {
>     public static void main(String[] args) throws Exception {
>         ExecutionEnvironment ee = ExecutionEnvironment.createLocalEnvironment();
>         DataSet<String> myLines =
>             ee.readTextFile("s3n://my-bucket-name/some-test-file.xml");
>         myLines.print();
>     }
> }
>
> You also need to make a Hadoop configuration file available to Flink.
> When running Flink locally in your IDE, just create a "core-site.xml" in
> the src/main/resources folder with the following content:
>
> <configuration>
>   <property>
>     <name>fs.s3n.awsAccessKeyId</name>
>     <value>putKeyHere</value>
>   </property>
>   <property>
>     <name>fs.s3n.awsSecretAccessKey</name>
>     <value>putSecretHere</value>
>   </property>
>   <property>
>     <name>fs.s3n.impl</name>
>     <value>org.apache.hadoop.fs.s3native.NativeS3FileSystem</value>
>   </property>
> </configuration>
>
> If you are running on a cluster, re-use the existing core-site.xml file
> (= edit it) and point to its directory using Flink's fs.hdfs.hadoopconf
> configuration option.
>
> With these two things in place, you should be good to go.
>
> [1] http://stackoverflow.com/questions/32959790/run-apache-flink-with-amazon-s3
> [2] http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Problem-with-Amazon-S3-td946.html
>
> On Mon, Oct 5, 2015 at 8:19 PM, Kostiantyn Kudriavtsev <
> kudryavtsev.konstan...@gmail.com> wrote:
>
>> Hi guys,
>>
>> I'm trying to get Apache Flink 0.9.1 working on EMR, basically to read
>> data from S3. I tried the following path for the data,
>> s3://mybucket.s3.amazonaws.com/folder, but it throws the following
>> exception:
>>
>> java.io.IOException: Cannot establish connection to Amazon S3:
>> com.amazonaws.services.s3.model.AmazonS3Exception: The request signature
>> we calculated does not match the signature you provided. Check your key
>> and signing method. (Service: Amazon S3; Status Code: 403;
>>
>> I added the access and secret keys, so the problem is not there. I'm
>> using the standard region and gave read credentials to everyone.
>>
>> Any ideas how this can be fixed?
>>
>> Thank you in advance,
>> Kostia
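P.S. On the cluster case you mention: if I understand it correctly, pointing
Flink at the existing Hadoop configuration should be a single entry in
flink-conf.yaml; the path below is only my guess for a typical EMR layout:

fs.hdfs.hadoopconf: /etc/hadoop/conf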