Hi Vijay,

Can you post the error you are referring to?
Did you properly set up an S3 plugin (https://ci.apache.org/projects/flink/flink-docs-stable/ops/filesystems/)?
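
For reference, here is a rough sketch of what the plugin setup usually amounts to. A message about HDFS/Hadoop typically shows up when Flink finds no file system for the s3:// scheme and falls back to looking for a Hadoop one, but that is a guess until we see the actual error. I am assuming a standard Flink distribution here: copy the flink-s3-fs-hadoop jar (or flink-s3-fs-presto) from opt/ into its own subfolder under plugins/, for example plugins/s3-fs-hadoop/, and restart the cluster. If you are not relying on IAM roles, credentials can go into flink-conf.yaml; the values below are placeholders:

```yaml
# flink-conf.yaml (sketch; both values are placeholders, not real credentials)
# Only needed when credentials are not supplied via IAM roles or the environment.
s3.access-key: your-access-key
s3.secret-key: your-secret-key
```

With the plugin in place, env.readTextFile("s3://<bucket>/<endpoint>") should resolve the s3:// scheme without any HDFS setup.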

On Fri, Sep 11, 2020 at 8:42 AM Vijay Balakrishnan <[email protected]> wrote:

> Hi,
>
> I want to *get data from S3 and process and send to Kinesis.*
> 1. Get gzip files from an s3 folder(s3://bucket/prefix)
> 2. Sort each file
> 3. Do some map/processing on each record in the file
> 4. send to Kinesis
>
> Idea is:
> env.readTextFile(s3Folder)
>     .sort(SortFunction)
>     .map(MapFunction)
>     .sink(KinesisSink)
>
> Struggling with reading the file from s3.
> //Assume env is setup properly
> //The endpoint can either be a single file or a directory - "s3://<bucket>/<endpoint>"
> final DataStreamSource<String> stringDataStreamSource = env.readTextFile(s3Folder);
> stringDataStreamSource.print();
>
> It keeps *erroring* saying I need some kind of *HDFS* setup ??? I don't
> want anything to do with HDFS.
> Just want to read from S3.
> Saw a StackOverflow mention by David Anderson I think about using the
> Flink SQL API.
> I would appreciate any decent example to get the reading from S3 working.
>
> TIA,
> Vijay
