Hi David,

As it stands today I’m -1 to accepting this for a couple of reasons, sorry:

First of all: the implementation looks like it is a “parallel implementation” 
of the scheduler, triggerer. I know that it is in some ways a POC only, but 
there is more for my reasoning.

It feels counter to mapped tasks — the entire point of using mapped tasks is to 
run a single task (or a group of tasks) over a repeated input, and have each 
one be able to run independently, be restarted or retried independently and to 
scale out independently. Right now we loose almost all of those benefits.

On the name, “stream” is 100% the wrong word for this concept as it is not 
streaming to process data as it comes in, but the opposite almost, it’s 
batching it all up to run in one group. “Iterate” is much better.

From chatting to David on slack it feels to me like this entire feature is 
built to work around a problem with a large number of mapped tasks. So I’m -1 
to accepting this in the core project and we should instead spend our 
maintenance effort on improving mapped tasks. It can be maintained as a 
separate operator in a provider out of tree, and if it gains traction we could 
see about bringing it into core as an apache-maintained provider.

In this particular case it also feels like it could be achieved with 
“executor=‘Celery’” + KEDA to scale the worker and get 90% of the same 
behaviour without any changes at all, or as Daniel suggested earlier, simply do 
this in a plain old `for` loop inside a task.

The other idea might be to be able to have a mapped task go directly into a 
triggered — that might also gain the performance you want.

Please let me know if I’ve misunderstood anything about your proposal, and 
sorry to be harsh, but one of the hardest things as a maintainer of an open 
source project is saying no to feature requests.

-ash 

> On 6 Nov 2024, at 11:21, Blain David <david.bl...@infrabel.be> wrote:
> 
> Hello guys,
> 
> First of all, thank you all for taking your time and giving your opinions and 
> insights regarding my proposal.  I also think it would indeed be better to do 
> an official AIP proposal.  I just planted the seed here to see how this 
> proposal would be received.  I will try to do this as soon as possible.
> 
> Kind regards,
> David
> 
> From: Constance Martineau <consta...@astronomer.io>
> Sent: Wednesday, 16 October 2024 23:06
> To: dev@airflow.apache.org
> Cc: Blain David <david.bl...@infrabel.be>
> Subject: Re: [PROPOSAL] Add streaming support to PartialOperator
> 
> You don't often get email from 
> consta...@astronomer.io<mailto:consta...@astronomer.io>. Learn why this is 
> important<https://aka.ms/LearnAboutSenderIdentification>
> 
> 
> EXTERNAL MAIL: Indien je de afzender van deze e-mail niet kent en deze niet 
> vertrouwt, klik niet op een link of open geen bijlages. Bij twijfel, stuur 
> deze e-mail als bijlage naar ab...@infrabel.be<mailto:ab...@infrabel.be>.
> That was a lot to read through, and to be honest, it's hard for me to tell 
> whether or not Jarek's proposal solves David's problem. However, if the 
> debate is whether it's worthwhile or not to provide a first-class way for DAG 
> authors to use Operators as part of TaskFlow Tasks, it is.
> 
> Operators are a major value-add to the Airflow Ecosystem, and we're 
> implicitly forcing DAG authors to choose whether they value their DAGs being 
> pythonic and simpler to read and reason (Taskflow), or whether they value 
> limiting custom code (Traditional Syntax with Operators). There should be a 
> first class way to have both, and while it's possible to have dependencies 
> between decorated tasks and traditional tasks (and use hooks within tasks), 
> you lose a lot of the benefits and it's easier to revert to traditional 
> syntax.
> 
> On Tue, Oct 15, 2024 at 2:46 PM Jens Scheffler 
> <j_scheff...@gmx.de.invalid<mailto:j_scheff...@gmx.de.invalid>> wrote:
> Hi all,
> 
> thanks for picking-up the discussion. So following the email chain a bit
> I would recommend to spin an AIP for the implementation. There might be
> one or multiple cases where this is a cool feature. Still it will add
> complexity and needs a closer discussion. The best discussion might be
> on the AIP itself and then once all questions and details are described
> we still can VOTE on it.
> 
> @David, can you follow as described in
> https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvement+Proposals
> ?
> 
> (I also have another use case in mind and am courious if the propsal
> would also support this)
> 
> Jens
> 
> On 15.10.24 18:24, Daniel Standish wrote:
>> RE SLAs there was actually a lot of people who chimed in and expressed
>> concerns with the approach, but no one took the step of actually down
>> voting it.  It's hard to down vote and say no this does not seem right.
>> And sometimes these things gain a momentum and you don't want to be a stick
>> in the mud, particularly if you don't have a better solution and someone
>> has spent a lot of time on it.  But yeah we should not be so timid about
>> saying no that we never do it.
>> 
>> I think I did not really engage with it until substantially later in the
>> process, wish I could have engaged earlier.
>> 
>> On the topic of streaming, yeah, I'm trying to do my part to engage in this
>> thread.  I don't yet see and understand the value so that's why I suggested
>> fleshing out the proposal in a doc. I'm not ready to give any thumbs up yet
>> cus I'd don't see it.  That doesn't mean the value isn't there, just I
>> don't see it / understand it yet.
>> 
>> And yeah we're only two people here engaging with this one so, it's good if
>> others could consider the proposal also.  But people only have so much
>> time.  And anyway, I think the proposal needs more clarity to be
>> efficiently and accurately evaluated -- so formalizing it a bit, even if
>> not precisely an AIP, would help in others to chime in.  Really get into
>> what problem it's solving and why and how.
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> On Tue, Oct 15, 2024 at 9:05 AM Jarek Potiuk 
>> <ja...@potiuk.com<mailto:ja...@potiuk.com>> wrote:
>> 
>>> So I think what David really needs (from you Daniel and others) if is the
>>> idaa sounds right, if it does and we agree it is something that should be
>>> clarified in detail and there are no major blockers to move in this
>>> direction - this can be turned into detailed proposal with the syntax,
>>> 
>>> I think we had a long story of some cases (like SLA) where we asked for
>>> detailed AIPs and then after it has been delivered it turned out that the
>>> idea from the very beginning was not right, but this feedback has been
>>> missing. SLA feature sufferred from late feedback that "the whole idea
>>> seems wrong".
>>> 
>>> I think we should avoid such an approach. If we see that the general idea
>>> is wrong we should give early feedback - and then engage in detailed
>>> discussion - but without the "I have not paid attention before but the
>>> whole thing is wrong".
>>> 
>>> I think David is looking for this kind of confirmation, so that he does not
>>> spend days and weeks on detailing a proposal then was strangled to death
>>> because we did not like the idea in the first place. That's very
>>> discouraging.
>>> 
>>> J,
>>> 
>>> On Tue, Oct 15, 2024 at 6:00 PM Jarek Potiuk 
>>> <ja...@potiuk.com<mailto:ja...@potiuk.com>> wrote:
>>> 
>>>> It's about the same David's proposal is about stream syntax to run the
>>>> operators in the task. So those are not two things - this is the "idea"
>>>> (run operators in a loop in a task) and implementation detail (stream
>>>> syntax).
>>>> 
>>>> I think at this stage I distilled the idea from the syntax proposal, and
>>>> what we could do in the future is to make sure that syntax is good.
>>>> 
>>>> 
>>>> J.
>>>> 
>>>> 
>>>> On Tue, Oct 15, 2024 at 4:11 PM Daniel Standish
>>>> <daniel.stand...@astronomer.io.invalid<mailto:daniel.stand...@astronomer.io.invalid>>
>>>>  wrote:
>>>> 
>>>>> I'm still a bit fuzzy on the proposal.  It also seems at times like you
>>>>> two
>>>>> (David and Jarek) are sorta talking about two different things.  David:
>>>>> "stream" syntax.  Jarek: run operator in a task.
>>>>> 
>>>>> I would suggest @David maybe just produce a sort of draft AIP maybe in
>>>>> google docs or something and share and interested parties can review and
>>>>> understand better and possibly help shape the direction.
>>>>> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: 
> dev-unsubscr...@airflow.apache.org<mailto:dev-unsubscr...@airflow.apache.org>
> For additional commands, e-mail: 
> dev-h...@airflow.apache.org<mailto:dev-h...@airflow.apache.org>


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org
For additional commands, e-mail: dev-h...@airflow.apache.org

Reply via email to