The thing I'm having trouble with is that the problem the user, David, is
trying to solve basically comes down to Airflow not liking super
fine-grained tasks. Let's push this to the limit: I run an ecommerce
company that has 10M visitors per day, and each time they visit we update
the visitor table. I want to run a daily job to process those updates.
Should I model my pipeline as one task per customer? Probably not a good
idea.
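For the daily job above, here is a rough sketch of the set-based alternative to one task per customer: a single task runs one statement that applies the whole day's updates at once. It assumes hypothetical `visitor` and `visitor_updates` tables (neither comes from David's setup) and uses Python's built-in `sqlite3` purely as a stand-in for the real database.

```python
import sqlite3

# Hypothetical schema, for illustration only:
#   visitor(visitor_id, visit_count, last_seen)   one row per visitor
#   visitor_updates(visitor_id, visited_at)       one row per visit today
SCHEMA = """
CREATE TABLE visitor (
    visitor_id  INTEGER PRIMARY KEY,
    visit_count INTEGER NOT NULL,
    last_seen   TEXT
);
CREATE TABLE visitor_updates (
    visitor_id INTEGER NOT NULL,
    visited_at TEXT NOT NULL
);
"""


def apply_daily_updates(conn: sqlite3.Connection) -> int:
    """Apply all staged visits in one set-based UPDATE, then clear the staging table.

    Returns the number of visitor rows that were updated.
    """
    cur = conn.execute(
        """
        UPDATE visitor
        SET visit_count = visit_count + (
                SELECT COUNT(*) FROM visitor_updates AS u
                WHERE u.visitor_id = visitor.visitor_id),
            last_seen = (
                SELECT MAX(u.visited_at) FROM visitor_updates AS u
                WHERE u.visitor_id = visitor.visitor_id)
        WHERE visitor_id IN (SELECT visitor_id FROM visitor_updates)
        """
    )
    updated = cur.rowcount
    conn.execute("DELETE FROM visitor_updates")
    conn.commit()  # the UPDATE and DELETE are committed together
    return updated
```

However many visitors there are, this is one unit of work for the scheduler; the database does the per-row work.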

There's a reason, e.g., that databases exist and let you do things in a
set-based way, and there seems to be an analogy with David's example.
That's why I asked why he models it so fine-grained. He doesn't seem to
want to write a custom operator, but that seems like a good idea here.
One way of thinking about the use case is "I want to do things
sequentially in a loop." Then why not just do them sequentially in a loop
inside a single task?
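A minimal sketch of the loop-inside-one-task idea, written in plain Python so it runs without Airflow installed. `process_sequentially` and `run_as_single_task` are hypothetical names; in a real DAG the loop would live in a custom operator's `execute(self, context)` method, or in the callable of a single `PythonOperator`.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def process_sequentially(
    items: Sequence[T], handler: Callable[[T], None]
) -> List[Tuple[T, Exception]]:
    """Call handler on each item in order, collecting failures instead of stopping."""
    failures: List[Tuple[T, Exception]] = []
    for item in items:
        try:
            handler(item)
        except Exception as exc:  # keep going; report everything at the end
            failures.append((item, exc))
    return failures


def run_as_single_task(items: Sequence[T], handler: Callable[[T], None]) -> None:
    """Body of one task: loop over all items, then fail if any item failed.

    In Airflow this would be called from a custom operator's execute method;
    an uncaught exception there is what marks the task as failed.
    """
    failures = process_sequentially(items, handler)
    if failures:
        raise RuntimeError(
            f"{len(failures)} of {len(items)} items failed; first: {failures[0][0]!r}"
        )
```

Collecting failures rather than stopping at the first one is just one possible choice. The trade-off either way is that a retry reruns the whole loop, whereas one task per item lets Airflow retry items individually.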
