Hi,

I want to *get data from S3, process it, and send it to Kinesis.*
1. Get gzip files from an S3 prefix (s3://bucket/prefix)
2. Sort each file
3. Do some map/processing on each record in the file
4. Send to Kinesis
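For what it's worth, here is a tiny Flink-free sketch of step 1 in plain Java, just to show the kind of read I mean (my understanding is that Flink's readTextFile decompresses .gz input based on the file extension; the temp file below stands in for one object under s3://bucket/prefix):

```java
import java.io.*;
import java.nio.file.*;
import java.util.*;
import java.util.zip.*;

public class GzipReadSketch {
    // Read all lines out of a single gzip-compressed text file.
    static List<String> readGzipLines(Path gz) throws IOException {
        try (BufferedReader r = new BufferedReader(new InputStreamReader(
                new GZIPInputStream(Files.newInputStream(gz))))) {
            List<String> lines = new ArrayList<>();
            String line;
            while ((line = r.readLine()) != null) lines.add(line);
            return lines;
        }
    }

    public static void main(String[] args) throws IOException {
        // Write a tiny gzip file to stand in for one S3 object.
        Path gz = Files.createTempFile("sample", ".gz");
        try (Writer w = new OutputStreamWriter(
                new GZIPOutputStream(Files.newOutputStream(gz)))) {
            w.write("b,2\na,1\n");
        }
        System.out.println(readGzipLines(gz)); // [b,2, a,1]
    }
}
```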

The idea is:
env.readTextFile(s3Folder)
.sort(SortFunction)
.map(MapFunction)
.sink(KinesisSink)
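To make the middle of that pipeline concrete, here is a Flink-free sketch of what I mean by steps 2-3 on a single file's records (the CSV "key,value" record format and the uppercase map are just placeholders for illustration):

```java
import java.util.*;
import java.util.stream.*;

public class SortMapSketch {
    // Sort one file's records, then apply a per-record transformation.
    static List<String> sortAndMap(List<String> records) {
        return records.stream()
                .sorted()                              // step 2: sort the file
                .map(r -> r.toUpperCase(Locale.ROOT))  // step 3: per-record map
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(sortAndMap(Arrays.asList("b,2", "a,1", "c,3")));
        // [A,1, B,2, C,3]
    }
}
```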

I'm struggling with reading the files from S3:
// Assume env is set up properly.
// The endpoint can be either a single file or a directory:
// "s3://<bucket>/<endpoint>"
final DataStreamSource<String> stringDataStreamSource =
        env.readTextFile(s3Folder);
stringDataStreamSource.print();

It keeps *failing* with an error saying I need some kind of *HDFS* setup. I
don't want anything to do with HDFS; I just want to read from S3.
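From the Flink docs, I suspect the HDFS-sounding error just means the S3 filesystem plugin isn't installed. If I read them right, the setup looks roughly like the fragment below (FLINK_HOME and the exact jar version are placeholders) - is this what's missing?

```shell
# Assumed setup from the Flink docs: copy the bundled S3 filesystem jar
# from opt/ into a plugins/ subdirectory before starting the cluster.
mkdir -p "$FLINK_HOME/plugins/s3-fs-hadoop"
cp "$FLINK_HOME/opt/flink-s3-fs-hadoop-"*.jar "$FLINK_HOME/plugins/s3-fs-hadoop/"
```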
I saw a StackOverflow answer (by David Anderson, I think) that mentioned
using the Flink SQL API instead.
I would appreciate any decent example that gets reading from S3 working.

TIA,
Vijay
