Re: [PROPOSAL] Add streaming support to PartialOperator

Jarek Potiuk Thu, 03 Oct 2024 20:52:41 -0700

> why not just do things sequentially in a loop inside of a task?

Yes, I think you nailed it, and I think it comes down to the abstraction you
use in this case.

When you loop in a task to do a small thing many times with one of
Airflow's integrations, you could use a Hook for that. But this abstraction
is apparently difficult for users to discover, because all they know is
"operators" and they do not know what Hooks do.
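To make the Hook route concrete, here is a minimal pure-Python sketch of that pattern. `HypotheticalApiHook` and `process_items` are made-up names: the class only stands in for a real provider Hook (it records calls instead of talking to an external system), so the sketch runs without Airflow installed. In a real DAG, the body of `process_items` would sit inside a single TaskFlow `@task` function and use a real Hook.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class HypotheticalApiHook:
    """Hypothetical stand-in for an Airflow provider Hook.

    A real Hook would hold a connection to the external system; this one
    just records each call so the sketch can run without Airflow.
    """
    calls: List[str] = field(default_factory=list)

    def send(self, item: str) -> str:
        # One small unit of work against the (simulated) external system.
        self.calls.append(item)
        return f"sent:{item}"


def process_items(items: List[str], hook: HypotheticalApiHook) -> List[str]:
    """Hypothetical body of ONE task: loop sequentially and reuse a single
    hook, instead of creating one task (or operator) per item."""
    results = []
    for item in items:
        results.append(hook.send(item))
    return results


hook = HypotheticalApiHook()
print(process_items(["a", "b", "c"], hook))  # ['sent:a', 'sent:b', 'sent:c']
```

The point of the shape is that one Hook instance and one task cover the whole loop, so the scheduler tracks a single task rather than one per item.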

So the natural way for many of our users to interact with external
integrations is via Operators, and allowing such looping with operators
sounds like "follow what is natural for your users".

Basically, we are not telling the users "use hooks"; we are following
what our users want to do, "use operators" in this case, as it feels more
natural for them. Now that I think of it, it simply shows that we have two
kinds of users in this case:

1) those who know how to, and are happy to, write custom operators (they
will use hooks)
2) those who are more comfortable just putting together existing building
blocks, i.e. operators (all they know are DAGs, operators, and
dependencies, and when they come to composing things, they think of
TaskFlow as the way to compose the things they know)

Clearly, allowing operators to be used in TaskFlow in this mode would
serve the second group of users.

For me, this is the kind of leadership we should model: when you, as a
leader in a space, try to convince others to do things one way, but pretty
much nobody follows and everyone stubbornly keeps using the thing you think
is wrong, maybe it's a good time to think "well, maybe they are right".

J.

On Thu, Oct 3, 2024 at 8:10 PM Daniel Standish
<daniel.stand...@astronomer.io.invalid> wrote:

> The thing I'm having trouble with is that the problem the user, David, is
> trying to solve is basically that Airflow doesn't like super fine-grained
> tasks. Let's push this to the limit. I run an e-commerce company that has
> 10M visitors per day, and each time they visit we update the visitor
> table. I want to run a daily job to process the updates. Should I model
> my pipeline as 1 task per customer? Probably not a good idea.
>
> There's a reason, e.g., that databases exist and that you can do things in
> a set-based way. There seems to be an analogy here with David's example.
> That's why I asked why model it so fine-grained. He does not seem to want
> to write a custom operator, but that would probably be a good idea here.
> One way of thinking about the use case is: I want to do things
> sequentially in a loop, so why not just do things sequentially in a loop
> inside of a task?
>
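Daniel's alternative, one task that works through all the updates instead of one task per customer, can be sketched in plain Python as follows. This is an illustration under assumptions, not Airflow code: `apply_updates` and the row shape (`visitor_id`, `visits`) are hypothetical, and the in-memory `dict.update` per chunk only stands in for one set-based write (for example, a bulk upsert sent through a database Hook). The local `batched` helper is defined here because `itertools.batched` is only available from Python 3.12.

```python
from itertools import islice
from typing import Dict, Iterable, Iterator, List, Tuple


def batched(rows: Iterable[dict], size: int) -> Iterator[List[dict]]:
    """Yield successive lists of at most `size` rows."""
    it = iter(rows)
    while chunk := list(islice(it, size)):
        yield chunk


def apply_updates(
    rows: Iterable[dict], batch_size: int = 1000
) -> Tuple[Dict[str, int], int]:
    """Hypothetical body of ONE daily task: process every visitor update in
    chunks instead of modelling each customer as its own Airflow task."""
    table: Dict[str, int] = {}
    batch_count = 0
    for chunk in batched(rows, batch_size):
        # Stand-in for one set-based write per chunk (e.g. a bulk upsert);
        # later rows for the same visitor overwrite earlier ones.
        table.update({row["visitor_id"]: row["visits"] for row in chunk})
        batch_count += 1
    return table, batch_count


updates = [{"visitor_id": f"v{i % 3}", "visits": i} for i in range(7)]
print(apply_updates(updates, batch_size=3))  # ({'v0': 6, 'v1': 4, 'v2': 5}, 3)
```

Whatever the batch size, the scheduler still sees a single task; the granularity of the work moves inside the task, where set-based operations can do it efficiently.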
