Hello,

One additional scenario that might be useful is the ability to fetch
and process email attachments, such as CSV files, from specific
recipients who send automated reports with a consistent schema. This
would allow for seamless integration of recurring email-based data
into data pipelines.
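
To make the attachment scenario concrete, here is a minimal sketch of the parsing step using only the Python standard library. It assumes the raw RFC 822 bytes of each message are already available from some source; the function names, the sender allow-list, and the sample addresses are hypothetical and are not part of any existing MailIO API.

```python
import csv
import io
from email import message_from_bytes, policy
from email.message import EmailMessage
from email.utils import parseaddr


def csv_rows_from_message(raw_bytes, allowed_senders):
    """Yield one dict per CSV row found in attachments from an allowed sender."""
    msg = message_from_bytes(raw_bytes, policy=policy.default)
    sender = parseaddr(str(msg.get("From", "")))[1].lower()
    if sender not in allowed_senders:
        return  # ignore mail from senders we do not expect reports from
    for part in msg.iter_attachments():
        filename = part.get_filename() or ""
        is_csv = (part.get_content_type() == "text/csv"
                  or filename.lower().endswith(".csv"))
        if not is_csv:
            continue
        payload = part.get_content()
        # text/* parts come back as str; other types come back as bytes.
        # Assumption: byte payloads are UTF-8 encoded.
        text = payload if isinstance(payload, str) else payload.decode("utf-8")
        for row in csv.DictReader(io.StringIO(text)):
            yield {"sender": sender, "filename": filename, **row}


def build_sample_report():
    """Build a small message with one CSV attachment (illustrative data only)."""
    msg = EmailMessage()
    msg["From"] = "Reports <reports@example.com>"
    msg["To"] = "pipeline@example.com"
    msg["Subject"] = "Daily report"
    msg.set_content("Report attached.")
    msg.add_attachment("id,amount\n1,9.50\n2,3.25\n",
                       subtype="csv", filename="daily.csv")
    return msg.as_bytes()
```

In a pipeline, a function like this could sit in a `beam.FlatMap` step behind whatever source yields the raw message bytes.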

Looking forward to hearing about your progress!

Best regards,
Marcin

On Tue, Nov 12, 2024 at 1:42 PM Piotr Wiśniowski
<contact.wisniowskipi...@gmail.com> wrote:
>
> Hi,
> I worked on a theoretical PoC project in the past that needed quite similar 
> functionality.
> A bounded read makes sense, and it can be treated as a special case of an 
> unbounded read. The second thing I could imagine is doing the same (reading 
> emails for downstream processing, such as logic triggers or ML 
> categorization, and then sending them to different departments).
> From my perspective, write is far more complicated, and I am not sure 
> Beam/streaming applications are the best pick for this task. I see two 
> potential problems. The first is that sending emails needs distributed 
> throttling out of the box. This can be done with fixed parallelism (for 
> example, a fixed number of keys) and adaptive throttling (there is already 
> some out-of-the-box code for that). The second problem is that even the 
> exactly-once processing options in runners (Dataflow/Flink) do not guarantee 
> that sending will be executed only once in all cases (they only guarantee 
> that a single output will be seen downstream). Getting around that would 
> probably require double locking, but combining it with throttling might be 
> challenging.
> Regarding potential use cases for write: definitely distributed notification 
> systems. I have already seen ideas for such projects in at least 3 
> corporations. Some features they required (as far as I remember):
> - templating messages for output (Jinja-like), but this could technically be 
> pushed upstream
> - priority queue - if there is a more urgent message in the priority queue, 
> it should be sent before messages in the normal queue, while still 
> respecting throttling.
> - single destination throttling - so a single email address will get at most 
> x msgs per week.
> - channel configuration - so that the user receiving notifications can 
> configure which channel they want to get msgs on (email, Slack, mobile push, 
> SMS, etc.).
> The above are typical requirements for whole notification apps, not only for 
> the mail IO, but I guess you could extract some use cases from them.
>
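
The single destination throttling requirement above can be sketched as a sliding-window counter per recipient. This is only an in-memory, single-process illustration: the class name and parameters are hypothetical, and a distributed pipeline would need to keep this history in shared or per-key state instead.

```python
from collections import defaultdict, deque

WEEK_SECONDS = 7 * 24 * 60 * 60


class RecipientThrottle:
    """Allow at most `max_per_window` messages per recipient per time window."""

    def __init__(self, max_per_window, window_seconds=WEEK_SECONDS):
        self.max_per_window = max_per_window
        self.window_seconds = window_seconds
        # recipient -> timestamps (in seconds) of sends still inside the window
        self._sent = defaultdict(deque)

    def allow(self, recipient, now):
        """Return True and record the send if the recipient is under the limit."""
        history = self._sent[recipient]
        # Drop sends that have aged out of the window.
        while history and now - history[0] >= self.window_seconds:
            history.popleft()
        if len(history) >= self.max_per_window:
            return False
        history.append(now)
        return True
```

A caller would check `allow(recipient, now)` before each send and either drop or re-queue the message when it returns `False`.
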
> For the unbounded read, emails could definitely serve as an interface that 
> users use to trigger asynchronous tasks (GDPR data deletion, for example). 
> A dedicated mail IO read would avoid the need for a separate backend app to 
> fetch the emails, or additional broker configuration for email systems 
> (sometimes this is not possible because of security policies in 
> corporations).
>
> Let me know if this is helpful. Happy to see such initiatives 🙂
> Best, Wiśniowski Piotr
>
>
> On Tue, Nov 12, 2024 at 13:03, LDesire <two_som...@icloud.com> wrote:
>>
>> Hello,
>>
>> I am currently working on developing a MailIO connector for Apache Beam.
>>
>> While I have made progress implementing bounded read functionality, I'm 
>> somewhat uncertain about the practical use cases where users would need the 
>> MailIO connector.
>>
>> The use cases I've considered are:
>>
>> - Bounded Read:
>> Email folder archiving - For example, archiving all messages from specific 
>> folders to storage systems like GCS, HDFS, or S3.
>>
>> - Write:
>> Integrating with messaging systems like Pub/Sub to collect user behavior 
>> data, generating AI-powered messages based on these behaviors, and then 
>> using MailIO.write to compose and send emails.
>>
>> I haven't considered implementing Unbounded Read yet.
>>
>> I'm wondering if there might be other valuable use cases that I haven't 
>> thought of?
>>
>> Thank you.
