I am all for accepting it. Naming etc. can be worked out, but I am very much for adding multiple optimisations to Airflow, and this one seems not only easy but also pretty obvious.
Just a comment here. I spent the last two weeks in the Bay Area and at Community Over Code in Denver, and I met a lot of people from our industry who are a lot smarter than me and whose experience and knowledge I tapped into to learn what is happening in our industry: people from the Apache Software Foundation (various data engineering, big data, and ML/AI related projects); my old friend Roman Shaposhnik - VP Legal at the foundation, currently running startups especially in the ML/AI area (that is the hot area, of course); Sid Anand - our PMC member; Greg Czajkowski - ex-VP of Engineering at Snowflake, ex-Google, ex-Sun/Java engineer; and a number of other people. I was also a "Data Engineering" track lead at Community Over Code - two days, 16 talks, with countless discussions and questions around a number of data engineering tools, libraries, and projects.

There is one theme I have been hearing over and over and over again: we are on the brink of getting into "optimization wars". Running ML/LLM/AI workloads - both training and inference related ones - is currently terribly, terribly expensive. It's not sustainable in the long run. And while the ideas about "what solutions should be used for LLM/ML/AI" are already pretty firm, with the transformer architecture pretty much overtaking everything else, everyone is now focusing on running whatever they are running in far more optimized ways: reusing GPUs for multiple parallel workloads so that parts of the models can be loaded to the GPU once and reused; using Arrow as the ubiquitous format to store data in order to exploit Arrow's zero-copy capabilities, and chaining multiple tools through Arrow so the same data is reused not only in memory but, as of recently, on the GPU; co-locating data centers from multiple clouds closer to each other in order to save bandwidth and decrease latency; heck - even running all workloads on K8S in a way that lets you move stable workloads from clouds to a 3x cheaper on-premise K8S cluster at any time. All that and more - "optimizing the cost" (which translates to optimizing all resource use) is all the rage now.

A lot of people are also working on optimizing hardware. NVIDIA is currently the top player, but there are hardware contenders working on specialized hardware that will optimize some of the workloads 100x (for example, dedicated transformer-only-capable hardware). It will take many years for that hardware to get adopted, though - it needs quality APIs, compilers, and adoption by the existing solutions and tools - so for now everyone is trying to optimize their processes at the "data engineering" and software level, by optimizing out all the processing overhead introduced by the pipelines, libraries, and tools they run their workflows with. This will easily be a theme for the next two or three years, and we have a chance to shine as a leader in that space - also by making it very easy and intuitive to optimize the workflows our users currently have.

I think the solution proposed by David is a very intuitive and easy way for users to optimize their workflows. We should not stop there - I will soon start another discussion about one of the possible things we should consider implementing (unicorn executor) - but I think the more options we give our users to easily optimise their workflows in Airflow 3.*, the more suitable it will be for some of the new workflows we have in mind for Airflow 3 (mainly inference related).

J.
On Mon, Oct 7, 2024 at 10:33 AM Blain David <david.bl...@infrabel.be> wrote:

> Exactly Jarek, and yes Daniel, I use that max_active_tis_per_dag parameter
> to control parallelism. The main reason for this solution is not the
> parallelism itself, but being able to loop within one and the same task
> instance without the overhead of XComs and task scheduling, which of course
> makes it (a lot) faster. By default my implementation uses parallelism, but
> sometimes it's necessary not to do so when doing IO while still wanting this
> approach, as the processing will still be faster than when doing it through
> dynamic task mapping.
>
> I was aware that if this feature were added, it would be Airflow 3.x
> anyway. As I mentioned before, I wanted a POC to know if it was easily
> feasible without too many changes, and I also didn't want to wait until we
> have 3.0. That's why I tried to implement this generic approach at our
> company, to see if it:
>
> 1. Would be easily feasible to implement without too many changes in the
> Airflow base code.
> 2. Would offer better performance (you always need to measure to know).
> 3. Would make the DAG code cleaner (less or no custom code required).
>
> Now that I know it's possible and all 3 points were checked off, we could
> think of a better implementation in Airflow 3.x, if this feature were
> accepted of course.
>
> -----Original Message-----
> From: Jarek Potiuk <ja...@potiuk.com>
> Sent: Saturday, October 5, 2024 12:35 AM
> To: dev@airflow.apache.org
> Subject: Re: [PROPOSAL] Add streaming support to PartialOperator
>
> From the earlier discussions with David - this is also (and mainly) about
> optimisation. Those operators do very little, and when you add the total
> overhead that Airflow adds for scheduling and running every task, it turns
> out that looping such an operator's execute in a single interpreter is
> many, many times faster (several orders of magnitude) than running them
> sequentially as tasks.
>
> On Fri, Oct 4, 2024 at 10:16 AM Daniel Standish
> <daniel.stand...@astronomer.io.invalid> wrote:
>
> > Well, it looks like we do have concurrency control for mapped tasks
> > after all.
> >
> > See max_active_tis_per_dagrun, which was added in
> > https://github.com/apache/airflow/pull/29094.
> >
> > So this would allow you to map over your 3000 users in a single run,
> > but process only one at a time (or 5 or 10 at a time). Does that help
> > your use case?
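To make the two approaches being compared in this thread concrete, here are minimal sketches. The DAG and function names, the user list, and the worker counts are illustrative assumptions, not part of the proposal.

First, Daniel's suggestion: dynamic task mapping with per-run concurrency capped via max_active_tis_per_dagrun (available since the PR linked above). Each element becomes its own task instance, so Airflow schedules, retries, and records each one separately.

from airflow.decorators import dag, task
import pendulum


@dag(schedule=None, start_date=pendulum.datetime(2024, 10, 1), catchup=False)
def mapped_with_concurrency_cap():

    @task
    def list_users() -> list[str]:
        # Hypothetical source of work items; David's scenario has ~3000 users.
        return [f"user_{i}" for i in range(3000)]

    @task(max_active_tis_per_dagrun=5)  # at most 5 mapped instances run at once per DAG run
    def process_user(user_id: str) -> None:
        ...  # per-element work; each element is a separate task instance

    process_user.expand(user_id=list_users())


mapped_with_concurrency_cap()

Second, a hedged sketch of the behaviour David describes: a single task instance loops over all the elements itself (in parallel by default, sequentially when IO makes that preferable), trading per-task scheduling and XCom overhead for in-process iteration. This only illustrates the idea; the actual streaming API proposed for PartialOperator may look different.

from concurrent.futures import ThreadPoolExecutor
from airflow.decorators import task


@task
def process_all_users(user_ids: list[str], parallel: bool = True) -> None:
    def handle(user_id: str) -> None:
        ...  # the same per-element work, run inside one task instance

    if parallel:
        # Default: process elements in parallel threads within the single task.
        with ThreadPoolExecutor(max_workers=5) as pool:
            list(pool.map(handle, user_ids))
    else:
        # Sequential fallback, e.g. when IO makes parallelism undesirable.
        for user_id in user_ids:
            handle(user_id)

The trade-off follows the thread: the mapped variant keeps Airflow-level visibility and retries per element, while the in-task loop avoids the scheduling and XCom overhead that dominates when each element does very little work.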