Hello,

One additional scenario that might be useful is the ability to fetch and process email attachments, such as CSV files, from specific senders who send automated reports with a consistent schema. This would allow recurring email-based data to be integrated directly into data pipelines.
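
To make the idea concrete, here is a rough sketch of the per-message step I have in mind. It uses only the Python standard library, not MailIO or any Beam API; the function name `csv_rows_from_email` and the `allowed_senders` filter are hypothetical names of my own, and I am assuming the connector would hand over each message as raw RFC 5322 bytes.

```python
import csv
import io
from email import message_from_bytes, policy
from email.utils import parseaddr


def csv_rows_from_email(raw_bytes, allowed_senders):
    """Yield one dict per CSV row found in the attachments of a raw message.

    Hypothetical helper: messages whose From address is not in
    allowed_senders (lowercase addresses) produce no rows.
    """
    msg = message_from_bytes(raw_bytes, policy=policy.default)
    _, sender = parseaddr(str(msg.get("From", "")))
    if sender.lower() not in allowed_senders:
        return
    for part in msg.iter_attachments():
        filename = part.get_filename() or ""
        is_csv = (part.get_content_type() == "text/csv"
                  or filename.lower().endswith(".csv"))
        if not is_csv:
            continue
        payload = part.get_content()
        # text/* parts decode to str; other types come back as bytes.
        # UTF-8 is an assumption for the bytes case.
        text = payload if isinstance(payload, str) else payload.decode("utf-8")
        for row in csv.DictReader(io.StringIO(text)):
            yield row
```

In a pipeline this could sit in a FlatMap-style step right after the read, with the header row of each report acting as the consistent schema.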
Looking forward to hearing about your progress!

Best regards,
Marcin

On Tue, Nov 12, 2024 at 1:42 PM Piotr Wiśniowski <contact.wisniowskipi...@gmail.com> wrote:
>
> Hi,
>
> I had a theoretical PoC project in the past that needed quite similar
> functionality.
>
> Bounded read makes sense, and it can be treated as a special case of
> unbounded read. The second thing I could imagine is doing the same
> (reading emails for downstream processing, such as logic triggers or ML
> categorization, and then sending them to different departments).
>
> From my perspective, write is far more complicated, and I am not sure
> Beam/streaming applications are the best pick for this task. I see two
> potential problems. The first is that it needs distributed throttling out
> of the box for sending emails. This can be done with fixed parallelism
> (for example, a fixed number of keys) and adaptive throttling (there is
> some out-of-the-box code for that already). The second problem is that
> even the exactly-once processing options in runners (Dataflow/Flink) do
> not guarantee that sending will be executed only once in all cases (they
> only guarantee that a single output will be seen downstream). Getting
> around that would probably require double locking, but combining this
> with throttling might be challenging.
>
> Regarding potential use cases for write: definitely distributed
> notification systems. I have seen ideas for such projects in at least 3
> corporations already. Some features they required (as far as my memory is
> correct):
> - templating messages for output (Jinja-like), but this could technically
>   be pushed upstream
> - priority queue: if there is a more urgent message in a priority queue,
>   it should be sent before messages in the normal queue, while still
>   respecting throttling
> - single-destination throttling: a single email address gets at most x
>   msgs per week
> - channel configuration: the user receiving notifications can configure
>   which channel they want to get msgs on (email, Slack, mobile push, SMS,
>   etc.)
>
> The above are typical requirements for whole notification apps, not only
> for the mail IO, but I guess you could extract some use cases from them.
>
> For the unbounded read, emails could definitely be used as a kind of
> interface that users could use to trigger asynchronous tasks (GDPR data
> deletion, for example). A dedicated mail IO read would avoid the need for
> a separate backend app to fetch the emails, or for additional broker
> configuration for email systems (sometimes this is not possible because
> of security policies in corporations).
>
> Let me know if this is helpful. Happy to see such initiatives 🙂
>
> Best,
> Wiśniowski Piotr
>
>
> On Tue, Nov 12, 2024, 13:03 LDesire <two_som...@icloud.com> wrote:
>>
>> Hello,
>>
>> I am currently working on developing a MailIO connector for Apache Beam.
>>
>> While I have made progress implementing bounded read functionality, I'm
>> somewhat uncertain about the practical use cases where users would need
>> the MailIO connector.
>>
>> The use cases I've considered are:
>>
>> - Bounded Read:
>> Email folder archiving - For example, archiving all messages from
>> specific folders to storage systems like GCS, HDFS, or S3.
>>
>> - Write:
>> Integrating with messaging systems like Pub/Sub to collect user behavior
>> data, generating AI-powered messages based on these behaviors, and then
>> using MailIO.write to compose and send emails.
>>
>> I haven't considered implementing Unbounded Read yet.
>>
>> I'm wondering if there might be other valuable use cases that I haven't
>> thought of?
>>
>> Thank you.