Hi all,
I’m developing an event-based file source to continuously monitor an S3
bucket. The problem with the existing file source is that continuously
listing the bucket is expensive, and the state grows with the number of
files.

I was thinking of using SQS and listening for *ObjectCreated* events
instead of polling the bucket.
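
To make the event shape concrete, here is a rough sketch of turning one SQS message body into file splits. It assumes the bucket publishes notifications directly to SQS (no SNS envelope) and that the body follows the standard S3 event notification JSON layout. `FileSplit` and `splits_from_message` are hypothetical names used only for illustration, not part of any Flink or AWS API.

```python
import json
from dataclasses import dataclass
from urllib.parse import unquote_plus


@dataclass(frozen=True)
class FileSplit:
    """Hypothetical split type: one S3 object to be read."""
    bucket: str
    key: str
    size: int


def splits_from_message(body: str) -> list[FileSplit]:
    """Turn one SQS message body (an S3 event notification) into splits."""
    payload = json.loads(body)
    splits = []
    # Messages such as s3:TestEvent carry no "Records" field.
    for record in payload.get("Records", []):
        # Keep only ObjectCreated:* events (Put, Post, Copy, ...).
        if not record.get("eventName", "").startswith("ObjectCreated:"):
            continue
        s3 = record["s3"]
        splits.append(FileSplit(
            bucket=s3["bucket"]["name"],
            # Object keys are URL-encoded in S3 event notifications.
            key=unquote_plus(s3["object"]["key"]),
            size=s3["object"].get("size", 0),
        ))
    return splits
```

Either design below would need a step like this somewhere: in the enumerator for the first option, or in the reader for the second.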

I’m currently considering two design alternatives:

   1. *Periodic enumerator* – The enumerator is triggered periodically and
      drains the SQS queue. Each S3 object becomes a split, similar to how
      Flink’s current file source works.

   2. *Single-reader enumerator* – The enumerator simply assigns the SQS queue
      to a single reader, which continuously consumes it. In this model, there is
      a single split (the SQS queue itself), similar to how FLIP-27
      <https://cwiki.apache.org/confluence/display/FLINK/FLIP-27%3A+Refactor+Source+Interface>
      treats Kafka partitions as splits assigned to readers.
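
As a rough illustration of the first alternative, the sketch below drains a queue on each periodic trigger and spreads the resulting messages (one split each) across readers. It is plain Python rather than Flink code. `receive_batch` is a hypothetical callable standing in for an SQS receive call, and the `(receipt_handle, body)` tuple shape, `drain_queue`, and `assign_round_robin` are all assumptions made for this example.

```python
from typing import Callable, Dict, List, Tuple

Message = Tuple[str, str]  # (receipt_handle, body); hypothetical shape


def drain_queue(receive_batch: Callable[[], List[Message]],
                max_messages: int = 1000) -> List[Message]:
    """Called on a timer; pulls batches until one comes back empty.

    max_messages is a soft cap: the last batch may push past it.
    """
    drained: List[Message] = []
    while len(drained) < max_messages:
        batch = receive_batch()
        if not batch:
            break
        drained.extend(batch)
    return drained


def assign_round_robin(messages: List[Message],
                       num_readers: int) -> Dict[int, List[Message]]:
    """Spread drained messages (one split each) across reader indices."""
    assignment: Dict[int, List[Message]] = {r: [] for r in range(num_readers)}
    for i, msg in enumerate(messages):
        assignment[i % num_readers].append(msg)
    return assignment
```

Two caveats with this sketch: an empty receive does not strictly mean the queue is empty (SQS short polling only samples a subset of servers), and received messages reappear after the visibility timeout unless they are deleted, so deletion would probably need to be tied to a completed checkpoint. In the second alternative the loop in `drain_queue` would instead run continuously inside the single reader, and the assignment step would disappear.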

Has anyone worked on a similar approach or explored event-driven file
sources before?
