Dear Marcin,

Thank you for your suggestion. I think it's a great idea.

What do you think about providing the attachment details to users including 
filename, type, and data (as byte array)?
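
To make the question concrete, here is a minimal sketch of what such an attachment record could look like. It uses only Python's standard `email` package; `MailAttachment` and `extract_attachments` are hypothetical names chosen for illustration and are not part of any existing MailIO API.

```python
from dataclasses import dataclass
from email import message_from_bytes, policy
from typing import List


@dataclass(frozen=True)
class MailAttachment:
    """Hypothetical record exposing the three fields proposed above."""
    filename: str
    content_type: str
    data: bytes


def extract_attachments(raw_message: bytes) -> List[MailAttachment]:
    """Parse a raw RFC 5322 message and return its attachments."""
    # policy.default makes the parser return an EmailMessage,
    # which provides iter_attachments().
    msg = message_from_bytes(raw_message, policy=policy.default)
    attachments = []
    for part in msg.iter_attachments():
        # get_payload(decode=True) undoes base64/quoted-printable encoding;
        # it returns None for multipart parts, so fall back to empty bytes.
        payload = part.get_payload(decode=True) or b""
        attachments.append(
            MailAttachment(
                filename=part.get_filename() or "",
                content_type=part.get_content_type(),
                data=payload,
            )
        )
    return attachments
```

A downstream step could then filter on `content_type` or `filename` before parsing `data`, for example to pick out CSV reports.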

Thank you for your valuable feedback.

Best regards,
LDesire


> On Nov 12, 2024, at 9:49 PM, Marcin Stańczak <m.stanczak...@gmail.com> wrote:
> 
> Hello,
> 
> One additional scenario that might be useful is the ability to fetch
> and process email attachments, such as CSV files, from specific
> recipients who send automated reports with a consistent schema. This
> would allow for seamless integration of recurring email-based data
> into data pipelines.
> 
> Looking forward to hearing about your progress!
> 
> Best regards,
> Marcin
> 
> On Tue, Nov 12, 2024 at 1:42 PM Piotr Wiśniowski
> <contact.wisniowskipi...@gmail.com> wrote:
>> 
>> Hi,
>> I had a theoretical PoC project in the past that needed quite similar
>> functionality.
>> Bounded read makes sense, and can be treated as a special case of unbounded
>> read. A second use case I can imagine is doing the same for downstream
>> processing: reading emails to drive logic triggers or ML categorization,
>> then routing them to different departments.
>> From my perspective, write is far more complicated, and I am not sure
>> whether Beam/streaming applications are the best pick for this task. There
>> are two potential problems. The first is that it needs distributed
>> throttling out of the box for sending emails. This can be done by using
>> fixed parallelism (for example, a fixed number of keys) and adaptive
>> throttling (there is some out-of-the-box code for that already). The second
>> problem I see is that even the exactly-once processing options in runners
>> (Dataflow/Flink) do not guarantee that sending will be executed only once
>> in all cases (they only guarantee that a single output will be seen
>> downstream). Getting around that would probably require double locking, but
>> combining this with throttling might be challenging.
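
As a rough illustration of the "fixed number of keys" idea, the sketch below maps each recipient to one of a fixed set of shard keys. `NUM_SHARDS` and `shard_key` are hypothetical names, and the value 8 is an arbitrary assumption. The thought is that keying outgoing messages this way before a per-key processing step would bound how many senders run at once, though the exact behavior depends on the runner.

```python
import zlib

# Assumed upper bound on concurrent senders; tune to the mail server's limits.
NUM_SHARDS = 8


def shard_key(recipient: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a recipient address to a stable shard in [0, num_shards)."""
    # Normalizing only affects which shard is chosen, not the address used
    # for delivery. crc32 is deterministic across processes, unlike hash().
    normalized = recipient.strip().lower().encode("utf-8")
    return zlib.crc32(normalized) % num_shards
```

Because the same recipient always lands on the same shard, any per-key rate limiting state would see all messages for that recipient.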
>> Regarding potential use cases for write: definitely distributed
>> notification systems. I have seen ideas for such projects in at least 3
>> corporations. Some features they required (as far as I remember):
>> - templating messages for output (Jinja-like), but this could technically
>> be pushed upstream
>> - priority queue - so that a more urgent message in the priority queue is
>> sent before messages in the normal queue, while still respecting
>> throttling.
>> - single destination throttling - so a single email address will get at
>> most x msgs per week.
>> - channel configuration - so that a user receiving notifications can
>> configure which channel they want to get msgs on (email, Slack, mobile
>> push, SMS, etc.).
>> These are typical requirements for whole notification apps, not only for
>> the mail IO, but I guess you could extract some use cases from this.
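
The "single destination throttling" requirement in the list above could be sketched as a sliding-window counter per recipient. This is an in-memory, single-process illustration only; `RecipientThrottle` is a hypothetical name, and a distributed pipeline would need to keep this state per key in whatever state mechanism the runner offers.

```python
from collections import defaultdict, deque

WEEK_SECONDS = 7 * 24 * 60 * 60


class RecipientThrottle:
    """Allow at most max_per_window sends per recipient in a sliding window."""

    def __init__(self, max_per_window: int, window_seconds: int = WEEK_SECONDS):
        self.max_per_window = max_per_window
        self.window_seconds = window_seconds
        # recipient -> timestamps (seconds) of sends still inside the window
        self._sent = defaultdict(deque)

    def try_send(self, recipient: str, now: float) -> bool:
        """Record a send and return True if allowed, else return False."""
        sent = self._sent[recipient]
        # Drop timestamps that have aged out of the window.
        while sent and now - sent[0] >= self.window_seconds:
            sent.popleft()
        if len(sent) >= self.max_per_window:
            return False
        sent.append(now)
        return True
```

The caller passes the current time explicitly, which keeps the class easy to test and leaves the choice of clock (processing time or event time) open.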
>> 
>> For the unbounded read, emails could definitely be used as a kind of
>> interface that users can use to trigger asynchronous tasks (GDPR data
>> deletion, for example). Having a dedicated mail IO read would avoid the
>> need for a separate backend app to fetch the emails, or additional broker
>> configuration for email systems (sometimes this is not possible because of
>> security policies in corporations).
>> 
>> Let me know if this is helpful. Happy to see such initiatives 🙂
>> Best Wiśniowski Piotr
>> 
>> 
>> On Tue, Nov 12, 2024 at 1:03 PM LDesire <two_som...@icloud.com> wrote:
>>> 
>>> Hello,
>>> 
>>> I am currently working on developing a MailIO connector for Apache Beam.
>>> 
>>> While I have made progress implementing bounded read functionality, I'm 
>>> somewhat uncertain about the practical use cases where users would need the 
>>> MailIO connector.
>>> 
>>> The use cases I've considered are:
>>> 
>>> - Bounded Read:
>>> Email folder archiving - For example, archiving all messages from specific 
>>> folders to storage systems like GCS, HDFS, or S3.
>>> 
>>> - Write:
>>> Integrating with messaging systems like Pub/Sub to collect user behavior 
>>> data, generating AI-powered messages based on these behaviors, and then 
>>> using MailIO.write to compose and send emails.
>>> 
>>> I haven't considered implementing Unbounded Read yet.
>>> 
>>> I'm wondering if there might be other valuable use cases that I haven't 
>>> thought of?
>>> 
>>> Thank you.
