Re: Struggling with reading the file from s3 as Source

Robert Metzger Fri, 11 Sep 2020 11:09:55 -0700

Hi Vijay,

Can you post the error you are referring to?
Did you properly set up an S3 filesystem plugin
(https://ci.apache.org/projects/flink/flink-docs-stable/ops/filesystems/)?
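Once such a plugin is in place, the read, map, and Kinesis steps from the quoted message could look roughly like the sketch below. It assumes Flink 1.11, that `flink-s3-fs-hadoop` or `flink-s3-fs-presto` has been copied into its own folder under `plugins/`, that S3 credentials are configured (for example `s3.access-key`/`s3.secret-key` in `flink-conf.yaml`, or an IAM role), and that `flink-connector-kinesis` is on the job's classpath. The bucket, stream name, region, and the `trim()` mapping are hypothetical placeholders. The DataStream API has no sort operator, so the per-file sort is not part of this job. Files ending in `.gz` are decompressed by Flink's file input format based on their extension.

```java
import java.util.Properties;

import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kinesis.FlinkKinesisProducer;
import org.apache.flink.streaming.connectors.kinesis.config.AWSConfigConstants;

public class S3ToKinesisJob {

    // Producer configuration: only the region is set here; credentials are left to
    // the connector's default AWS credentials provider (an assumption of this sketch).
    static Properties kinesisProducerConfig(String region) {
        Properties config = new Properties();
        config.setProperty(AWSConfigConstants.AWS_REGION, region);
        return config;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical placeholders: replace with your own bucket/prefix, stream, and region.
        String s3Folder = "s3://my-bucket/my-prefix/";
        String streamName = "my-output-stream";
        String region = "us-east-1";

        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // The s3:// scheme is resolved through the S3 filesystem plugin, not through HDFS.
        // A directory path reads every file directly inside it; .gz files are inflated by extension.
        DataStream<String> lines = env.readTextFile(s3Folder);

        // Stand-in for the real per-record processing (hypothetical transformation).
        DataStream<String> processed = lines.map(line -> line.trim());

        // Write each processed line to Kinesis as a UTF-8 string.
        FlinkKinesisProducer<String> kinesis =
                new FlinkKinesisProducer<>(new SimpleStringSchema(), kinesisProducerConfig(region));
        kinesis.setDefaultStream(streamName);
        kinesis.setDefaultPartition("0");

        processed.addSink(kinesis);

        // Nothing is read from S3 until the job is executed.
        env.execute("Read from S3 and write to Kinesis");
    }
}
```

Keeping the producer properties in a small helper makes it easy to add further connector settings later without touching the pipeline itself.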


On Fri, Sep 11, 2020 at 8:42 AM Vijay Balakrishnan wrote:

> Hi,
>
> I want to *get data from S3, process it, and send it to Kinesis.*
> 1. Get gzip files from an S3 folder (s3://bucket/prefix)
> 2. Sort each file
> 3. Do some map/processing on each record in the file
> 4. Send to Kinesis
>
> The idea is:
> env.readTextFile(s3Folder)
> .sort(SortFunction)
> .map(MapFunction)
> .sink(KinesisSink)
>
> I'm struggling with reading the file from S3.
> // Assume env is set up properly
> // The endpoint can be either a single file or a directory: "s3://<bucket>/<endpoint>"
> final DataStreamSource<String> stringDataStreamSource = env.readTextFile(s3Folder);
> stringDataStreamSource.print();
> // Nothing is read until the job is submitted
> env.execute("Read text from S3");
>
> It keeps *erroring*, saying I need some kind of *HDFS* setup. I don't
> want anything to do with HDFS; I just want to read from S3.
> I saw a StackOverflow mention, by David Anderson I think, about using the
> Flink SQL API.
> I would appreciate any working example of reading from S3.
>
> TIA,
> Vijay
>
>
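Because the DataStream API has no built-in sort, the "sort each file" step in the list above has to run somewhere that sees a whole file at once. A minimal plain-Java sketch of that per-file logic follows, assuming each decompressed file fits in memory and that lexicographic line order is what "sorted" means. `sortAndMap` and `gzip` are hypothetical helper names, and `String::toUpperCase` only stands in for the real per-record processing.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class SortGzipFile {

    // Decompresses one gzip stream, sorts its lines, then applies the per-record mapper.
    static List<String> sortAndMap(InputStream gzipped, Function<String, String> mapper)
            throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new GZIPInputStream(gzipped), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        // Natural (lexicographic) order; a custom Comparator would go here instead.
        Collections.sort(lines);
        return lines.stream().map(mapper).collect(Collectors.toList());
    }

    // Demo-only helper: gzip a string in memory so the example needs no real file.
    static byte[] gzip(String text) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(bytes)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] file = gzip("charlie\nalpha\nbravo\n");
        List<String> result = sortAndMap(new ByteArrayInputStream(file), String::toUpperCase);
        System.out.println(result); // prints [ALPHA, BRAVO, CHARLIE]
    }
}
```

Taking an `InputStream` rather than a path keeps the logic independent of where the bytes come from, so the same method could be fed from an S3 object stream or a local test file.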
