> why not just do things sequentially in a loop inside of a task?

Yes, I think you nailed it - and I think it's just the abstraction you use in this case.
When you loop in a task to do a small thing many times with one of the Airflow integrations, you could use a Hook for that. But - apparently - this abstraction is difficult for users to discover, because all they know is "operators" and they do not know what Hooks do. So the natural way for many of our users to interact with an external integration is via Operators - and allowing such looping with operators sounds like "follow what is natural for your users". Basically, we are not telling the users "use Hooks"; we are following what our users want to do - "use operators" in this case - as it feels more natural to them.

Now that I think of it, it simply shows that we have two kinds of users in this case:

1) those who know how to write custom operators and are happy to do so (they will use Hooks)
2) those who are more comfortable just putting together existing building blocks, i.e. operators (all they know are DAGs, operators and dependencies - and when it comes to composing things, they think of TaskFlow as the way to compose the things they know)

Clearly, allowing operators to be used in TaskFlow in this mode would respond to the second group of users.

For me this is the kind of model leadership we should practice: when you, as a leader in a space, try to convince others to do things one way, but pretty much everyone is not following and stubbornly attempts to use the thing you think is wrong, maybe it's a good time to think "well, maybe they are right".

J.

On Thu, Oct 3, 2024 at 8:10 PM Daniel Standish <daniel.stand...@astronomer.io.invalid> wrote:

> The thing i'm having trouble with is that the problem the user, David, is
> trying to solve is basically, that airflow doesn't like super fine-grained
> tasks. Like let's push this to the limit. I run an ecommerce company
> that has 10M visitors per day and each time they visit we update the
> visitor table. I want to run a daily job to process updates.
> Should I model my pipeline as 1 task per customer? Probably not a good idea.
>
> There's a reason e.g. that databases exist and you can do things in a
> set-based way. There seems to be an analogy here with David's example.
> That's why I asked why model it so fine grained. He does not seem to want
> to write a custom operator, but it would seem it's probably a good idea
> here. One way of thinking about the use case is, I want to do things
> sequentially in a loop -- why not just do things sequentially in a loop
> inside of a task?
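
As a rough illustration of the "loop sequentially inside one task, using a Hook" option discussed in this thread, here is a minimal sketch. It is not taken from the thread: `FakeTransferHook`, its `send` method, `process_all` and the `my_conn` connection id are all hypothetical stand-ins, not real Airflow or provider APIs. In a real DAG the function would presumably be wrapped with Airflow's `@task` decorator and would use an actual provider Hook; it is left undecorated here so the sketch runs without Airflow installed.

```python
from dataclasses import dataclass, field


@dataclass
class FakeTransferHook:
    """Hypothetical stand-in for an Airflow Hook (not a real provider class)."""

    conn_id: str
    sent: list = field(default_factory=list)

    def send(self, item: str) -> str:
        # A real hook would talk to the external system here.
        self.sent.append(item)
        return f"{self.conn_id}:{item}"


def process_all(items, hook):
    """Body of a single task: handle every item in one sequential loop.

    One hook instance is reused for all items, instead of creating
    one fine-grained task per item.
    """
    results = []
    for item in items:
        results.append(hook.send(item))
    return results


hook = FakeTransferHook(conn_id="my_conn")
results = process_all(["a", "b", "c"], hook)
print(results)
```

The trade-off is the one raised in the thread: this keeps the number of tasks small, but it requires the DAG author to know that Hooks exist and how to use them, which is exactly what the second group of users tends not to know.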