Correction: "Often hooks do a lot of validation" -> "Often operators do a lot of validation"
On Thu, Oct 3, 2024 at 7:50 PM Jarek Potiuk <ja...@potiuk.com> wrote:

> I think this is very similar to past discussions that we had about
> allowing operators to be used in task flow as a "first class citizen".
> https://lists.apache.org/thread/nflt9h6dc5obzztmyqxlpxfs950rtqsq
> I re-read the original discussion and ...
>
> In theory it sounds like you should be able to do the same using Hooks -
> you should not need to refer to operators. That's (in theory) what the
> hooks are for.
>
> But one of my favourite sayings ...
>
> In theory practice is the same as theory, but in practice it's not.
>
> It definitely looks like our users are not really "aware" of hook
> capability, and in fact for a number of implemented operators, even if
> they are theoretically thin wrappers around Hook functionality, the main
> benefit (i.e. Hook reusability) is not really materializing. Often hooks
> do a lot of validation and some pre-processing that would otherwise have
> to be copied to Task Flow operators to make the hooks really useful.
>
> For me that sounds like a "design smell" that we could also get rid of in
> Airflow 3 if we make operators runnable directly from task flow - with all
> the bells and whistles including automated jinja template pre-processing
> etc. (as discussed in the original thread). I think we have a great
> opportunity with Airflow 3 to introduce this and pretty much get rid of
> the notion or even hint that Hooks are the reusability components for DAG
> authors (they would still provide reusability for Operator authors though,
> as they do today) - and instead we could promote "Airflow 3 can now mix
> and match operators in Task Flow". So a similar approach to the one you
> proposed, David, would be one of the ways to do it (with the exception
> that I really do not like the "streaming" name, which, as David mentioned
> in a Slack conversation, is a direct Java rip-off).
>
> Also following the Slack conversation - I know David is eager to do it in
> a way that will be Airflow 2 compatible, but I think making it an Airflow
> 3 feature would be much, much, much more powerful. Even if we **could** do
> it for Airflow 2, it does not necessarily mean we **should** do it.
> Implementing it for Airflow 2 means that any problems and fixes would have
> to be backported and implemented in both, etc. - also being able to
> announce it as a "feature" of Airflow 3 ("Hey, you finally can mix and
> match operators in Airflow Dag tasks") is a very cool incentive for people
> to migrate.
>
> I would love to hear what others think.
>
> J.