Hi,

I had a theoretical PoC project in the past that needed quite similar functionality.

Bounded read makes sense, and it can be treated as a special case of unbounded read. A second use I could imagine is much the same: reading emails for downstream processing (logic triggers or ML categorization, for example) and then sending them on to different departments.

From my perspective write is much more complicated, and I am not sure Beam/streaming applications are the best pick for this task. I see two potential problems.

The first is that sending emails needs distributed throttling out of the box. This can be done with fixed parallelism (for example, a fixed number of keys) and adaptive throttling (there is some out-of-the-box code for that already).

The second is that even the exactly-once processing options in runners (Dataflow/Flink) do not guarantee that the send will be executed only once in all cases (they only guarantee that a single output will be seen downstream). Getting around that would probably require double locking, but combining this with throttling might be challenging.

Regarding potential use cases for write: definitely distributed notification systems. I have already seen ideas for such projects in at least 3 corporations. Some features they required (as far as I remember):

- templating messages for output (Jinja-like), but this could technically be pushed upstream
- priority queue: if there is a more urgent message in the priority queue, it should be sent before messages in the normal queue, while still respecting throttling
- single-destination throttling: a single email address gets at most x messages per week
- channel configuration: the user receiving notifications can configure which channel they want to get messages on (email, Slack, mobile push, SMS, etc.)

These are typical requirements for whole notification apps, not only for the mail IO, but I guess you could extract some use cases from this.
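
The priority queue and single-destination throttling requirements mentioned above could be sketched roughly as follows. This is a minimal single-process illustration in plain Python, not Beam code and not anything an existing connector provides; the class name `NotificationQueue`, its methods, and the one-week sliding window are all assumptions made up for the example. A distributed version would need shared state for the per-recipient counters.

```python
import heapq
import itertools
from collections import defaultdict, deque

WEEK_SECONDS = 7 * 24 * 3600


class NotificationQueue:
    """Hypothetical priority queue with a per-recipient cap per sliding week."""

    def __init__(self, max_per_week):
        self.max_per_week = max_per_week
        self._heap = []                  # entries: (priority, seq, recipient, body)
        self._seq = itertools.count()    # tie-breaker: FIFO order within a priority
        self._sent = defaultdict(deque)  # recipient -> timestamps of recent sends

    def put(self, recipient, body, priority=1):
        # Lower number means more urgent, e.g. 0 = urgent, 1 = normal.
        heapq.heappush(self._heap, (priority, next(self._seq), recipient, body))

    def _under_cap(self, recipient, now):
        sent = self._sent[recipient]
        # Forget sends that have fallen out of the one-week window.
        while sent and now - sent[0] >= WEEK_SECONDS:
            sent.popleft()
        return len(sent) < self.max_per_week

    def pop_sendable(self, now):
        """Return the most urgent (recipient, body) allowed at `now`, or None."""
        deferred = []
        result = None
        while self._heap:
            entry = heapq.heappop(self._heap)
            recipient, body = entry[2], entry[3]
            if self._under_cap(recipient, now):
                self._sent[recipient].append(now)
                result = (recipient, body)
                break
            deferred.append(entry)
        # Throttled messages go back on the queue for a later attempt.
        for entry in deferred:
            heapq.heappush(self._heap, entry)
        return result
```

With `max_per_week=1`, an urgent message queued for a recipient is returned ahead of an earlier normal one, and the normal one is held back until a week after the urgent send. A real sender would call `pop_sendable` with the current time (for example `time.time()`) and pass the result to whatever actually delivers the mail.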
For the unbounded read, emails could definitely be used as a kind of interface through which users trigger asynchronous tasks (GDPR data deletion, for example). Having a dedicated mail IO read would avoid the need for a separate backend app to fetch the emails, or for additional broker configuration for email systems (sometimes this is not possible because of security policies in corporations).

Let me know if this is helpful. Happy to see such initiatives 🙂

Best,
Wiśniowski Piotr

On Tue, 12 Nov 2024 at 13:03, LDesire <two_som...@icloud.com> wrote:

> Hello,
>
> I am currently working on developing a MailIO connector for Apache Beam.
>
> While I have made progress implementing bounded read functionality, I'm
> somewhat uncertain about the practical use cases where users would need the
> MailIO connector.
>
> The use cases I've considered are:
>
> - Bounded Read:
> Email folder archiving - For example, archiving all messages from specific
> folders to storage systems like GCS, HDFS, or S3.
>
> - Write:
> Integrating with messaging systems like Pub/Sub to collect user behavior
> data, generating AI-powered messages based on these behaviors, and then
> using MailIO.write to compose and send emails.
>
> I haven't considered implementing Unbounded Read yet.
>
> I'm wondering if there might be other valuable use cases that I haven't
> thought of?
>
> Thank you.