Hi,
I worked on a theoretical PoC project in the past that needed quite similar
functionality.
Bounded read makes sense, and can be treated as a special case of
unbounded read. For unbounded read I could imagine the same pattern (reading
emails for downstream processing, such as logic triggers or ML
categorization, and then routing them to different departments).
From my perspective, write is far more complicated, and I am not sure
Beam/streaming applications are the best pick for this task. I see two
potential problems. The first is that sending emails needs distributed
throttling out of the box. This can be done with fixed parallelism (for
example, a fixed number of keys) and adaptive throttling (there is already
some out-of-the-box code for that). The second is that even the exactly-once
processing options in runners (Dataflow/Flink) do not guarantee that
the send will be executed only once in all cases (they only guarantee that
a single output will be seen downstream). Getting around that would probably
require double locking, but combining this with throttling might
be challenging.
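To make the two problems above a bit more concrete, here is a minimal, single-process sketch in plain Python (not Beam code). All names (`shard_for`, `TokenBucket`, `send_once`) are hypothetical, and the in-memory `claimed` set only stands in for what would have to be an external store with an atomic "claim" operation in a real distributed setup.

```python
import hashlib
import time


def shard_for(recipient: str, num_shards: int) -> int:
    """Map a recipient to one of a fixed number of keys (stable across workers)."""
    digest = hashlib.sha256(recipient.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


class TokenBucket:
    """Per-shard limiter: about `rate` sends per second, bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: float, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = capacity
        self.last = clock()

    def try_acquire(self) -> bool:
        # Refill according to elapsed time, then try to take one token.
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


def send_once(message_id, send, claimed: set, bucket: TokenBucket) -> str:
    """Send unless this id was already claimed or the shard is over its rate."""
    if message_id in claimed:
        return "duplicate"
    if not bucket.try_acquire():
        return "throttled"  # the caller is expected to retry later
    # Claiming before the side effect means a retry cannot send twice,
    # at the cost of losing the message if `send` fails after the claim.
    claimed.add(message_id)
    send(message_id)
    return "sent"
```

With one bucket per shard, the overall send rate is bounded by roughly `num_shards * rate`. Claiming before sending is only one possible trade-off; claiming after the send would give at-least-once behaviour instead.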
Regarding potential use cases for write: definitely distributed
notification systems. I have already seen ideas for such projects in at
least 3 corporations. Some features they required (as far as I
remember):
- templating messages for output (Jinja-like), but this could technically be
pushed upstream
- priority queue - so that a more urgent message in the priority
queue is sent before messages in the normal queue, while still respecting
throttling.
- single-destination throttling - so that a single email address gets at most
x msgs per week.
- channel configuration - so that the user receiving notifications can
configure which channel they want to get msgs on (email, Slack, mobile push,
SMS, etc.).
But the above are typical requirements for whole notification apps, not only
for the mail IO, though I guess you could extract some use cases from this.
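As a rough illustration of how the priority queue and single-destination throttling requirements interact, here is a small in-memory sketch in plain Python. The class and method names are hypothetical, and keeping capped messages queued (rather than dropping them) is an assumption on my side, not something those projects specified.

```python
import heapq
import itertools
from collections import defaultdict, deque

WEEK_SECONDS = 7 * 24 * 3600


class NotificationQueue:
    """Priority queue that skips recipients who already hit their weekly cap."""

    def __init__(self, max_per_week: int):
        self.max_per_week = max_per_week
        self._heap = []
        # Tie-breaker so messages with equal priority keep insertion order.
        self._counter = itertools.count()
        # recipient -> timestamps of sends within the last week
        self._sent_at = defaultdict(deque)

    def push(self, priority: int, recipient: str, body: str) -> None:
        # Lower number means more urgent.
        heapq.heappush(self._heap, (priority, next(self._counter), recipient, body))

    def _under_cap(self, recipient: str, now: float) -> bool:
        window = self._sent_at[recipient]
        while window and now - window[0] >= WEEK_SECONDS:
            window.popleft()  # forget sends older than one week
        return len(window) < self.max_per_week

    def pop_sendable(self, now: float):
        """Return the most urgent (recipient, body) that is under the cap, or None."""
        skipped = []
        result = None
        while self._heap:
            item = heapq.heappop(self._heap)
            _, _, recipient, body = item
            if self._under_cap(recipient, now):
                self._sent_at[recipient].append(now)
                result = (recipient, body)
                break
            skipped.append(item)
        # Capped messages go back into the queue for a later attempt.
        for item in skipped:
            heapq.heappush(self._heap, item)
        return result
```

In a distributed pipeline the per-recipient counters would need to live in keyed state or an external store; the sketch only shows the ordering and capping logic.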
For the unbounded read, emails could definitely be used as a kind of
interface that users could use to trigger asynchronous tasks (GDPR data
deletion, for example). Having a dedicated mail IO read would avoid the need
for a separate backend app to fetch the emails, or additional broker
configuration for email systems (sometimes this is not possible because of
security policies in corporations).
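For what it is worth, one way an unbounded read could discover new messages is by polling IMAP with a UID watermark. The sketch below uses Python's standard `imaplib`; the helper names `open_inbox` and `fetch_new_uids` are hypothetical, and it assumes the mailbox's UIDVALIDITY does not change between polls (otherwise the watermark would have to be reset).

```python
import imaplib


def open_inbox(host: str, user: str, password: str) -> imaplib.IMAP4_SSL:
    """Connect and select INBOX read-only (host and credentials are placeholders)."""
    conn = imaplib.IMAP4_SSL(host)
    conn.login(user, password)
    conn.select("INBOX", readonly=True)
    return conn


def fetch_new_uids(conn, last_uid: int) -> list:
    """Return UIDs greater than last_uid in the currently selected mailbox."""
    status, data = conn.uid("SEARCH", None, f"UID {last_uid + 1}:*")
    if status != "OK" or not data or not data[0]:
        return []
    uids = sorted(int(u) for u in data[0].split())
    # "n:*" always matches the newest message, even when its UID is below n,
    # so filter again on the client side.
    return [u for u in uids if u > last_uid]
```

A reader would call `fetch_new_uids` periodically, emit the corresponding messages, and checkpoint the highest UID seen as its watermark.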

Let me know if this is helpful. Happy to see such initiatives 🙂
Best,
Wiśniowski Piotr

On Tue, 12 Nov 2024, 13:03, LDesire <two_som...@icloud.com> wrote:

> Hello,
>
> I am currently working on developing a MailIO connector for Apache Beam.
>
> While I have made progress implementing bounded read functionality, I'm
> somewhat uncertain about the practical use cases where users would need the
> MailIO connector.
>
> The use cases I've considered are:
>
> - Bounded Read:
> Email folder archiving - For example, archiving all messages from specific
> folders to storage systems like GCS, HDFS, or S3.
>
> - Write:
> Integrating with messaging systems like Pub/Sub to collect user behavior
> data, generating AI-powered messages based on these behaviors, and then
> using MailIO.write to compose and send emails.
>
> I haven't considered implementing Unbounded Read yet.
>
> I'm wondering if there might be other valuable use cases that I haven't
> thought of?
>
> Thank you.
