Re: Processing S3 data with Apache Flink

2015-11-21 Thread Konstantin Knauf
I see, thank you, Robert.

Re: Processing S3 data with Apache Flink

2015-11-21 Thread Robert Metzger
Ah, I see. Maybe it would make sense then for you to use the latest Hadoop version we are supporting. This way, you get the most recent Hadoop S3 file system implementation. Note that there might be an issue with starting Flink 0.10.0 for Hadoop 2.7.0. We'll fix it with Flink 0.10.1. But if everyt

Re: Processing S3 data with Apache Flink

2015-11-21 Thread Konstantin Knauf
Hi Robert, I am basically only reading from Kafka and S3 and writing to S3 in this job. So I am using the Hadoop S3 FileSystem classes, but that's it. Cheers, Konstantin

Re: Processing S3 data with Apache Flink

2015-11-21 Thread Robert Metzger
Hi, great to hear that it's working. I've updated the documentation (for 1.0) and made the word "directory" bold ;) You should try to match your Hadoop version as closely as possible. Are you not using HDFS at all? Then it doesn't matter which version of Flink you are downloading. When using Hadoop

Re: Processing S3 data with Apache Flink

2015-11-21 Thread Konstantin Knauf
Hi Robert, thanks a lot, it's working now. Actually, it also says "directory" in the description, so I should have known :/ One additional question though: if I use the Flink binary for Hadoop 1.2.1 and run Flink in standalone mode, should I use the *-hadoop1 dependencies even if I am not interact
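
For readers with the same question: at that time Flink shipped separate artifact builds per Hadoop major version, chosen by the version suffix. A hedged Maven sketch (the exact version string is assumed for illustration):

    <!-- Hadoop-1 build of the Flink artifacts (illustrative version) -->
    <dependency>
      <groupId>org.apache.flink</groupId>
      <artifactId>flink-java</artifactId>
      <version>0.10.0-hadoop1</version>
    </dependency>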

Re: Processing S3 data with Apache Flink

2015-11-21 Thread Robert Metzger
Hi, It seems that you've set the "fs.hdfs.hadoopconf" configuration parameter to a file. I think you have to set it to the directory containing the configuration. Sorry, I know that's not very intuitive, but in Hadoop the settings live in different files: (hdfs|yarn|core)-site.xml. On Sat, Nov 21, 20
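
The fix being described, as a minimal sketch (the /etc/hadoop/conf path is only an example): fs.hdfs.hadoopconf in flink-conf.yaml should name the directory that holds the (hdfs|yarn|core)-site.xml files, not one of the files itself.

    # flink-conf.yaml -- point at the configuration directory, not at core-site.xml
    fs.hdfs.hadoopconf: /etc/hadoop/conf

Flink's Hadoop file system wrappers then load core-site.xml (and the other *-site.xml files) from that directory, which is where the S3 credentials live.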

Re: Processing S3 data with Apache Flink

2015-11-21 Thread Konstantin Knauf
Hi Ufuk, sorry for not getting back to you for so long, and thanks for your answer. Unfortunately, the problem persists. Running the job from the IDE works (with core-site.xml on the classpath), but running it in local standalone mode does not. AccessKeyID and SecretAccessKey are not found. Attached the jo

Re: Processing S3 data with Apache Flink

2015-10-20 Thread Stephan Ewen
@Konstantin (2) : Can you try the workaround described by Robert, with the "s3n" file system scheme? We are removing the custom S3 connector now, simply reusing Hadoop's S3 connector for all cases. @Kostia: You are right, there should be no broken stuff that is not clearly marked as "beta". For t
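
As an illustration of the s3n workaround (a sketch only; the bucket and path are made up), a job simply addresses S3 through the s3n:// scheme so that Hadoop's NativeS3FileSystem handles the access instead of the custom Flink connector:

    import org.apache.flink.api.java.DataSet;
    import org.apache.flink.api.java.ExecutionEnvironment;

    public class S3nReadExample {
        public static void main(String[] args) throws Exception {
            ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

            // s3n:// hands the path to Hadoop's NativeS3FileSystem,
            // configured via the core-site.xml that Flink is pointed at.
            DataSet<String> lines = env.readTextFile("s3n://my-bucket/input/data.csv");

            lines.first(10).print();
        }
    }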

Re: Processing S3 data with Apache Flink

2015-10-14 Thread Ufuk Celebi
> On 10 Oct 2015, at 22:59, snntr wrote: > > Hey everyone, > > I was having the same problem with S3 and found this thread very useful. > Everything works fine now, when I start Flink from my IDE, but when I run > the jar in local mode I keep getting > > java.lang.IllegalArgumentException: A

Re: Processing S3 data with Apache Flink

2015-10-10 Thread snntr
Hey everyone, I was having the same problem with S3 and found this thread very useful. Everything works fine now when I start Flink from my IDE, but when I run the jar in local mode I keep getting java.lang.IllegalArgumentException: AWS Access Key ID and Secret Access Key must be specified as
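
For reference, a sketch of the Hadoop-side credentials configuration this exception is asking for, assuming the s3n scheme and placeholder values; for local/standalone mode the file has to sit in the directory that fs.hdfs.hadoopconf points to, which matches the symptom above (the IDE run finds core-site.xml on the classpath, the local mode run does not).

    <!-- core-site.xml (placeholder values, not real credentials) -->
    <configuration>
      <property>
        <name>fs.s3n.awsAccessKeyId</name>
        <value>YOUR_ACCESS_KEY_ID</value>
      </property>
      <property>
        <name>fs.s3n.awsSecretAccessKey</name>
        <value>YOUR_SECRET_ACCESS_KEY</value>
      </property>
    </configuration>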

Re: Processing S3 data with Apache Flink

2015-10-06 Thread Robert Metzger
Hi Kostia, I understand your concern. I am going to propose to the Flink developers to remove the S3 File System support in Flink. Also, regarding these annotations, we are actually planning to add them for the 1.0 release so that users know which interfaces they can rely on. Which other component

Re: Processing S3 data with Apache Flink

2015-10-06 Thread KOSTIANTYN Kudriavtsev
Hi Robert, you are right, I just misspelled the name of the file :( Everything works fine! Basically, I'd suggest moving this workaround into the official docs and marking the custom S3FileSystem as @Deprecated... In fact, I like the idea of marking all untested functionality with a specific annotation, for example @Be

Re: Processing S3 data with Apache Flink

2015-10-06 Thread Robert Metzger
Mh. I tried out the code I posted yesterday and it worked immediately. The security settings of AWS are sometimes a bit complicated. I think there are some logs for S3 buckets; maybe they contain some more information. Maybe there are other users facing the same issue. Since the S3FileSyst

Re: Processing S3 data with Apache Flink

2015-10-06 Thread KOSTIANTYN Kudriavtsev
Hi Robert, thank you very much for your input! Have you tried that? With org.apache.hadoop.fs.s3native.NativeS3FileSystem I moved forward, and now got a new exception: Caused by: org.jets3t.service.S3ServiceException: S3 HEAD request failed for '/***.csv' - ResponseCode=403, ResponseMessage=For
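
For context, the class named here is normally wired in on the Hadoop side; a hedged sketch of that mapping in core-site.xml (the exact configuration used in this thread is not shown, so treat the snippet as an assumption):

    <!-- core-site.xml: map the s3n:// scheme to Hadoop's native S3 implementation -->
    <property>
      <name>fs.s3n.impl</name>
      <value>org.apache.hadoop.fs.s3native.NativeS3FileSystem</value>
    </property>

A 403 (Forbidden) at this stage usually means the request reached S3 but the supplied credentials lack permission for the bucket or object, so the remaining issue is on the AWS side rather than in Flink.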

Re: Processing S3 data with Apache Flink

2015-10-05 Thread Robert Metzger
Hi Kostia, thank you for writing to the Flink mailing list. I actually started to try out our S3 file system support after I saw your question on StackOverflow [1]. I found that our S3 connector is very broken. I had to resolve two more issues with it before I was able to get the same exception y