Re: S3 Source support in Flink

Martijn Visser Fri, 29 Oct 2021 00:21:47 -0700

Hi,

When using the DataStream API, the new File Source already supports
continuous streaming, but this isn't documented yet [1]. There is an
end-to-end (e2e) test for it, so you could look at that to understand how
it works. It's not yet supported for the Table API/SQL [2].
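For intuition only: in continuous mode (enabled through the `FileSource` builder's `monitorContinuously(Duration)` method), the source periodically re-scans its input path and turns any file it has not seen before into a new split. The plain-Java sketch below models that loop. `PeriodicDiscovery` and the in-memory "bucket" are hypothetical stand-ins, not Flink's enumerator or the S3 SDK. It also counts list calls, which shows why every polling round pays for a full listing of the prefix (the ListObjects cost raised further down the thread).

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Supplier;

public class PeriodicDiscovery {
    // Hypothetical stand-in for a ListObjects-style call: returns every key under the prefix.
    private final Supplier<List<String>> lister;
    // Keys already handed out as splits, so re-listing does not emit duplicates.
    private final Set<String> seen = new HashSet<>();
    private int listCalls = 0;

    public PeriodicDiscovery(Supplier<List<String>> lister) {
        this.lister = lister;
    }

    // One polling round: list everything, keep only keys not seen before.
    public List<String> discoverNewKeys() {
        listCalls++;
        List<String> fresh = new ArrayList<>();
        for (String key : lister.get()) {
            if (seen.add(key)) {
                fresh.add(key);
            }
        }
        return fresh;
    }

    public int listCalls() {
        return listCalls;
    }

    public static void main(String[] args) {
        List<String> bucket = new ArrayList<>(List.of("logs/a.json", "logs/b.json"));
        PeriodicDiscovery discovery = new PeriodicDiscovery(() -> new ArrayList<>(bucket));

        System.out.println(discovery.discoverNewKeys()); // [logs/a.json, logs/b.json]
        bucket.add("logs/c.json");
        System.out.println(discovery.discoverNewKeys()); // [logs/c.json]
        System.out.println(discovery.discoverNewKeys()); // []
        System.out.println("list calls: " + discovery.listCalls()); // list calls: 3
    }
}
```

Note that every round lists the whole prefix even when nothing changed, and the `seen` set grows with the number of files; a real implementation would have to bound or checkpoint that state.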


Best regards,

Martijn

[1] https://issues.apache.org/jira/browse/FLINK-20188
[2] https://issues.apache.org/jira/browse/FLINK-20286

On Fri, 29 Oct 2021 at 07:56, Yuval Itzchakov <yuva...@gmail.com> wrote:

> Hi Abhishek,
>
> You can use `readFileStream`, which is defined directly on
> `StreamExecutionEnvironment`. With that method you will still pay for a
> ListObjects call on every iteration.
> If you want a source that does not rely on listing, you can implement a
> custom SQS source (there is currently no official one) and use Amazon S3
> Event Notifications to ship events from S3 to SQS:
> https://docs.aws.amazon.com/AmazonS3/latest/userguide/NotificationHowTo.html
>
>
>
> On Fri, Oct 29, 2021 at 3:34 AM Abhishek SP <abhisheksp1...@gmail.com> wrote:
>
>> Hello,
>>
>> I see S3 is supported as a sink through StreamingFileSink
>> <https://ci.apache.org/projects/flink/flink-docs-master/docs/connectors/datastream/streamfile_sink/>,
>> but I do not see an equivalent source (a StreamingFileSource).
>>
>> *Questions:*
>> 1. What is the current recommendation for using S3 as a continuous source
>> for a Flink streaming application?
>> 2. If we have to implement a custom continuous S3 source, how should the
>> SplitEnumerator be implemented, given that the S3 ListObjects API can
>> become expensive as the bucket grows?
>>
>> Thanks in advance
>>
>> Best,
>> Abhishek
>>
>
>
> --
> Best Regards,
> Yuval Itzchakov.
>
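Yuval's alternative avoids listing entirely: S3 Event Notifications push a message to an SQS queue whenever an object is created, and the source consumes those messages instead of calling ListObjects. The plain-Java sketch below illustrates that idea under stated assumptions. The `ObjectCreatedEvent` class, the `BlockingQueue` standing in for SQS, `NotificationDrivenDiscovery`, and the bucket name `my-bucket` are all hypothetical; the code does not use the AWS SDK or Flink's `SplitEnumerator` interface. Each call drains whatever events are pending, so its cost scales with the number of new objects rather than with the size of the bucket.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class NotificationDrivenDiscovery {
    // Hypothetical, simplified form of an S3 "object created" notification.
    public static final class ObjectCreatedEvent {
        final String bucket;
        final String key;

        public ObjectCreatedEvent(String bucket, String key) {
            this.bucket = bucket;
            this.key = key;
        }
    }

    // Stands in for the SQS queue that S3 Event Notifications deliver to.
    private final BlockingQueue<ObjectCreatedEvent> queue;
    // Notifications can be delivered more than once, so de-duplicate by full path.
    private final Set<String> seen = new HashSet<>();

    public NotificationDrivenDiscovery(BlockingQueue<ObjectCreatedEvent> queue) {
        this.queue = queue;
    }

    // Drain pending events without blocking; no listing call is made.
    public List<String> discoverNewPaths() {
        List<ObjectCreatedEvent> pending = new ArrayList<>();
        queue.drainTo(pending);
        List<String> fresh = new ArrayList<>();
        for (ObjectCreatedEvent event : pending) {
            String path = "s3://" + event.bucket + "/" + event.key;
            if (seen.add(path)) {
                fresh.add(path);
            }
        }
        return fresh;
    }

    public static void main(String[] args) {
        BlockingQueue<ObjectCreatedEvent> queue = new LinkedBlockingQueue<>();
        NotificationDrivenDiscovery discovery = new NotificationDrivenDiscovery(queue);

        queue.add(new ObjectCreatedEvent("my-bucket", "logs/a.json"));
        queue.add(new ObjectCreatedEvent("my-bucket", "logs/a.json")); // duplicate delivery
        queue.add(new ObjectCreatedEvent("my-bucket", "logs/b.json"));
        System.out.println(discovery.discoverNewPaths()); // [s3://my-bucket/logs/a.json, s3://my-bucket/logs/b.json]
        System.out.println(discovery.discoverNewPaths()); // []
    }
}
```

In a real source, each SQS message would presumably be deleted only after the corresponding split is safely checkpointed; that acknowledgement step is omitted here.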
