RE: [PROPOSAL] Add streaming support to PartialOperator

2024-12-16 Thread Blain David
Hi David, As it s

RE: [PROPOSAL] Add streaming support to PartialOperator

2024-12-03 Thread Blain David
xpand functionality. Kind regards, David -----Original Message----- From: Ash Berlin-Taylor Sent: Tuesday, 3 December 2024 11:44 To: dev@airflow.apache.org Subject: Re: [PROPOSAL] Add streaming support to PartialOperator

Re: [PROPOSAL] Add streaming support to PartialOperator

2024-12-03 Thread Jarek Potiuk
e better to do an official AIP proposal. I just planted the seed here to > see how this proposal would be received. I will try to do this as soon as > possible. > > > > Kind regards, > > David > > > > From: Constance Martineau > > Sent: Wednesday, 16 Octo

Re: [PROPOSAL] Add streaming support to PartialOperator

2024-12-03 Thread Ash Berlin-Taylor
To: dev@airflow.apache.org > Cc: Blain David > Subject: Re: [PROPOSAL] Add streaming support to PartialOperator

RE: [PROPOSAL] Add streaming support to PartialOperator

2024-11-06 Thread Blain David
soon as possible. Kind regards, David From: Constance Martineau Sent: Wednesday, 16 October 2024 23:06 To: dev@airflow.apache.org Cc: Blain David Subject: Re: [PROPOSAL] Add streaming support to PartialOperator

Re: [PROPOSAL] Add streaming support to PartialOperator

2024-10-17 Thread Jarek Potiuk
Oh. I don't think we want to "vote" on it (but I will let David chime in, because I was mostly guessing what his expectations and worries are). On Thu, Oct 17, 2024 at 1:07 AM Vikram Koka wrote: > Hmm, I can think of a different solution to the problem here as well, but I > could be misunder

Re: [PROPOSAL] Add streaming support to PartialOperator

2024-10-16 Thread Vikram Koka
Hmm, I can think of a different solution to the problem here as well, but I could be misunderstanding the problem. I understand that producing a full AIP may be frustrating, but I don't feel confident enough in my understanding that I can vote on this at this time. I do think a "light AIP" in the

Re: [PROPOSAL] Add streaming support to PartialOperator

2024-10-16 Thread Daniel Standish
Yeah I agree with Jens here. I think it makes sense to produce an AIP so people can understand the proposal better. We can't really give a thumbs up or down without a proposal. At least it's not like python where you have to implement the whole thing first :)

Re: [PROPOSAL] Add streaming support to PartialOperator

2024-10-16 Thread Constance Martineau
That was a lot to read through, and to be honest, it's hard for me to tell whether or not Jarek's proposal solves David's problem. However, if the debate is whether it's worthwhile or not to provide a first-class way for DAG authors to use Operators as part of TaskFlow Tasks, it is. Operators are

Re: [PROPOSAL] Add streaming support to PartialOperator

2024-10-15 Thread Jens Scheffler
Hi all, thanks for picking up the discussion. Following the email chain a bit, I would recommend spinning up an AIP for the implementation. There might be one or multiple cases where this is a cool feature. Still, it will add complexity and needs a closer discussion. The best discussion might be on

Re: [PROPOSAL] Add streaming support to PartialOperator

2024-10-15 Thread Daniel Standish
RE SLAs, there were actually a lot of people who chimed in and expressed concerns with the approach, but no one took the step of actually down-voting it. It's hard to down-vote and say no, this does not seem right. And sometimes these things gain momentum and you don't want to be a stick in the mud

Re: [PROPOSAL] Add streaming support to PartialOperator

2024-10-15 Thread Jarek Potiuk
So I think what David really needs (from you, Daniel, and others) is to hear whether the idea sounds right. If it does, and we agree it is something that should be clarified in detail and there are no major blockers to moving in this direction, this can be turned into a detailed proposal with the syntax. I think we

Re: [PROPOSAL] Add streaming support to PartialOperator

2024-10-15 Thread Jarek Potiuk
It's about the same thing. David's proposal is about stream syntax to run the operators in the task. So those are not two things - this is the "idea" (run operators in a loop in a task) and the implementation detail (stream syntax). I think at this stage I distilled the idea from the syntax proposal, and wha

Re: [PROPOSAL] Add streaming support to PartialOperator

2024-10-15 Thread Daniel Standish
I'm still a bit fuzzy on the proposal. It also seems at times like you two (David and Jarek) are sorta talking about two different things. David: "stream" syntax. Jarek: run operator in a task. I would suggest @David maybe just produce a sort of draft AIP maybe in google docs or something and s

Re: [PROPOSAL] Add streaming support to PartialOperator

2024-10-15 Thread Jarek Potiuk
s were check marked, we could > think of a better implementation in Airflow 3.x, if this feature would be > accepted of course. > > -----Original Message- > From: Jarek Potiuk > Sent: Saturday, October 5, 2024 12:35 AM > To: dev@airflow.apache.org > Subject: Re: [P

RE: [PROPOSAL] Add streaming support to PartialOperator

2024-10-07 Thread Blain David
ked, we could think of a better implementation in Airflow 3.x, if this feature would be accepted of course. -----Original Message----- From: Jarek Potiuk Sent: Saturday, October 5, 2024 12:35 AM To: dev@airflow.apache.org Subject: Re: [PROPOSAL] Add streaming support to PartialOperator

Re: [PROPOSAL] Add streaming support to PartialOperator

2024-10-04 Thread Jarek Potiuk
From the earlier discussions with David - this is also (and mainly) about optimisation. Those operators do very little, and when you add the total overhead that Airflow adds for scheduling and running every task, then it turns out that looping such an operator's execute in a single interpreter is many, m
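To make the idea in this message concrete, here is a minimal plain-Python sketch of calling an operator's execute in a loop inside one process. ToyOperator and run_in_one_task are hypothetical names made up for illustration; this is not Airflow code and not David's proposed stream syntax, just the shape of the loop being discussed.

```python
class ToyOperator:
    """Hypothetical stand-in for an operator: built from its arguments, then executed."""

    def __init__(self, user_id):
        self.user_id = user_id

    def execute(self, context):
        # A real operator would call out to an external system here.
        return {"user_id": self.user_id, "status": "done"}


def run_in_one_task(operator_cls, inputs, context=None):
    """Instantiate and execute one operator per input, sequentially, in this process."""
    context = context or {}
    return [operator_cls(**kwargs).execute(context) for kwargs in inputs]


inputs = [{"user_id": i} for i in range(3)]
outputs = run_in_one_task(ToyOperator, inputs)
```

The point of the sketch is only that no scheduling happens between iterations: every execute call runs back to back in the same interpreter.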

Re: [PROPOSAL] Add streaming support to PartialOperator

2024-10-04 Thread Daniel Standish
Well, it looks like we do have concurrency control for mapped tasks after all. See max_active_tis_per_dagrun which was added in https://github.com/apache/airflow/pull/29094. So this would allow you to map over your 3000 users in a single run, but process only one at a time (or 5 or 10 at a time).
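As a rough analogy for what such a cap does, the plain-Python sketch below maps over 3000 items while never running more than a fixed number at once. It is not Airflow code: run_mapped and max_active are hypothetical names, and a thread pool only stands in for the scheduler-side limit that max_active_tis_per_dagrun provides.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def run_mapped(items, work, max_active):
    """Run work(item) for every item, with at most max_active running at a time."""
    lock = threading.Lock()
    active = 0
    peak = 0

    def tracked(item):
        nonlocal active, peak
        with lock:
            active += 1
            peak = max(peak, active)  # record the highest concurrency observed
        try:
            return work(item)
        finally:
            with lock:
                active -= 1

    # The pool size is the concurrency cap; pool.map keeps results in input order.
    with ThreadPoolExecutor(max_workers=max_active) as pool:
        results = list(pool.map(tracked, items))
    return results, peak


users = [f"user-{i}" for i in range(3000)]
results, peak = run_mapped(users, str.upper, max_active=5)
```

Setting max_active=1 in this sketch gives the strictly sequential case described in the message.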

Re: [PROPOSAL] Add streaming support to PartialOperator

2024-10-04 Thread Daniel Standish
One thing, it would have to be 3.0 since no new features are going into 2.x anymore AFAIK. Do I understand correctly that essentially what you want to be able to do is limit parallelism in mapped task? E.g. is it correct that you essentially want to do task mapping, but with parallelism=1? Would

RE: [PROPOSAL] Add streaming support to PartialOperator

2024-10-04 Thread Blain David
s we can already use it, but it would be nice if other Airflow users could also benefit from this functionality, as I know this topic has been discussed many times. -----Original Message----- From: Jarek Potiuk Sent: Friday, October 4, 2024 5:52 AM To: dev@airflow.apache.org Subject: Re: [PROPOSA

Re: [PROPOSAL] Add streaming support to PartialOperator

2024-10-03 Thread Jarek Potiuk
> why not just do things sequentially in a loop inside of a task? Yes I think you nailed it - and I think it's just the abstraction you use in this case. When you loop in the task to do a small thing many times with one of the integrations of Airflow - you could use Hook for that. But - apparentl

Re: [PROPOSAL] Add streaming support to PartialOperator

2024-10-03 Thread Daniel Standish
The thing I'm having trouble with is that the problem the user, David, is trying to solve is basically that Airflow doesn't like super fine-grained tasks. Like let's push this to the limit. I run an ecommerce company that has 10M visitors per day and each time they visit we update the visitor t

Re: [PROPOSAL] Add streaming support to PartialOperator

2024-10-03 Thread Jarek Potiuk
Often hooks do a lot of validation -> Often operators do a lot of validation On Thu, Oct 3, 2024 at 7:50 PM Jarek Potiuk wrote: > I think this is very similar to past discussions that we had about > allowing operators to be used in task flow as a "first class citizen". > https://lists.apache.org

Re: [PROPOSAL] Add streaming support to PartialOperator

2024-10-03 Thread Jarek Potiuk
I think this is very similar to past discussions that we had about allowing operators to be used in TaskFlow as a "first class citizen". https://lists.apache.org/thread/nflt9h6dc5obzztmyqxlpxfs950rtqsq I re-read the original discussion and ... In theory it sounds like you should be able to do the

RE: [PROPOSAL] Add streaming support to PartialOperator

2024-09-19 Thread Blain David
responses so we can handle it in one task, but in some situations, you just can't do that as explained in my previous example. -Original Message- From: Daniel Standish Sent: Wednesday, September 18, 2024 6:41 PM To: dev@airflow.apache.org Subject: Re: [PROPOSAL] Add streaming suppo

RE: [PROPOSAL] Add streaming support to PartialOperator

2024-09-18 Thread Blain David
Standish Sent: Wednesday, September 18, 2024 6:41 PM To: dev@airflow.apache.org Subject: Re: [PROPOSAL] Add streaming support to PartialOperator

Re: [PROPOSAL] Add streaming support to PartialOperator

2024-09-18 Thread Daniel Standish
Curious why you want to model this as many tasks, e.g. one page == one task. Another option would be to handle many pages in one task. And I'm curious what were the factors that led you to split it out more granularly.
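The "many pages in one task" alternative could look roughly like the sketch below. It assumes a hypothetical paged source, fetch_page, that returns an empty list once the data runs out; the page size and row count are made-up values for illustration, not anything taken from David's use case.

```python
def fetch_page(page, page_size=100, total=250):
    """Hypothetical paged source: returns the rows for one page, or [] past the end."""
    start = page * page_size
    return list(range(start, min(start + page_size, total)))


def fetch_all_pages():
    """Read every page inside a single task instead of mapping one task per page."""
    rows = []
    page = 0
    while True:
        batch = fetch_page(page)
        if not batch:
            break  # an empty page means there is nothing left to read
        rows.extend(batch)
        page += 1
    return rows, page


rows, pages_read = fetch_all_pages()
```

The trade-off raised in the thread still applies: one task is cheap to schedule, but a failure on any page fails the whole task.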