This is cool.

If I read it correctly, this is mostly a deployment issue - we don't even
need a dedicated YuniKorn executor. We can use the KubernetesExecutor and it
will work out of the box, with scheduling controlled by YuniKorn, but then
we need to find a way to configure the behaviour of tasks and DAGs (likely
via pod annotations?). That would mean it's mostly documentation on "How can
I leverage YuniKorn with Airflow" + maybe a Helm chart modification to
install YuniKorn as an option?
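To make the "annotations" idea concrete, here is a rough sketch (untested;
the helper name is made up, and the annotation keys are the ones I believe
YuniKorn documents - worth verifying against the deployed version) of the
metadata a KubernetesExecutor worker pod would need so that YuniKorn, not
the default K8s scheduler, places it:

```python
# Sketch only, not a tested integration. The annotation keys follow
# YuniKorn's documented pod metadata; the helper name is hypothetical.

def yunikorn_pod_metadata(app_id: str, queue: str) -> dict:
    """Build the pod fields YuniKorn looks at for an Airflow worker pod."""
    return {
        "metadata": {
            "annotations": {
                # Groups all pods of one DAG run into a single YuniKorn app.
                "yunikorn.apache.org/app-id": app_id,
                # Target leaf queue in the YuniKorn queue hierarchy.
                "yunikorn.apache.org/queue": queue,
            }
        },
        # Only needed when the admission controller is NOT rewriting pods;
        # with the admission controller deployed this is mutated in for us.
        "spec": {"schedulerName": "yunikorn"},
    }

pod_patch = yunikorn_pod_metadata("my_dag__2024-10-29", "root.analytics")
```

In Airflow this patch would presumably be delivered per task via
`executor_config={"pod_override": ...}` or globally via the worker pod
template file - which is exactly why this looks like a documentation +
Helm-chart effort rather than a new executor.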

And then we would likely need to add a little bit of metadata and some
mapping of "task", "dag" or "task group" properties to open up more of
YuniKorn's scheduling capabilities?
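The "mapping" could be as small as a translation table from Airflow-level
concepts to YuniKorn pod metadata. A hypothetical sketch (the function and
field names are invented for illustration; the annotation keys are the ones
I believe YuniKorn reads):

```python
# Hypothetical mapping from Airflow concepts to YuniKorn pod metadata.
# Nothing here is an existing Airflow API - it only illustrates the idea.
AIRFLOW_TO_YUNIKORN = {
    "dag_run":    "yunikorn.apache.org/app-id",           # one app per DAG run
    "queue":      "yunikorn.apache.org/queue",            # leaf queue placement
    "task_group": "yunikorn.apache.org/task-group-name",  # gang member grouping
}

def annotations_for(ti: dict) -> dict:
    """Translate a task-instance-like dict into YuniKorn annotations."""
    out = {"yunikorn.apache.org/app-id": f'{ti["dag_id"]}__{ti["run_id"]}'}
    if ti.get("queue"):
        out["yunikorn.apache.org/queue"] = ti["queue"]
    if ti.get("task_group"):
        out["yunikorn.apache.org/task-group-name"] = ti["task_group"]
    return out
```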

Do I understand correctly?

> 1. Yunikorn treats applications at the DAG level not at the task level,
> which is great. Due to this, we can try to leverage
> gang scheduling abilities of Yunikorn.

This is great. I was wondering if we could also allow an application at the
"Task Group" level. I find it a really interesting feature to be able to
treat a "Task Group" as an entity that acts as an "application" - this way
the "Task Group" becomes a "schedulable entity" and you could, for example,
set preemption properties for all tasks in the same task group, or gang
scheduling for the task group ("only schedule tasks in the task group when
there are enough resources for the whole group"). Or - and this is something
that I see as a "holy grail" of scheduling in the context of optimising
machine learning workflows: "make sure that all the tasks in a group are
scheduled on the same node and use the same local hardware resources" + if
any of them fail, retry the whole group - also on the same instance. I think
this is partially possible with some node affinity setup, but I would love
it if we were able to set a property on a task group effectively meaning
"execute all tasks in the group on the same hardware" - a bit higher
abstraction - and have YuniKorn handle all the preemption and scheduling
optimisations for that.
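For the gang-scheduling part of this, I believe YuniKorn already has the
vocabulary: every pod in the gang carries a task-group-name, and a shared
task-groups annotation asks YuniKorn to reserve capacity for the whole gang
before scheduling any member. A rough sketch (field names as I recall them
from YuniKorn's gang scheduling docs; the resource numbers are placeholders,
and the helper is hypothetical):

```python
import json

# Sketch: express an Airflow Task Group as a YuniKorn "gang". Field names
# follow YuniKorn's gang scheduling annotation format as documented; verify
# against the YuniKorn version in use before relying on this.

def gang_annotations(group: str, members: int, cpu: str, mem: str) -> dict:
    task_groups = [{
        "name": group,
        "minMember": members,  # schedule members only when all of them fit
        "minResource": {"cpu": cpu, "memory": mem},
    }]
    return {
        "yunikorn.apache.org/task-group-name": group,
        "yunikorn.apache.org/task-groups": json.dumps(task_groups),
    }

ann = gang_annotations("feature_engineering", 4, "2", "4Gi")
```

Note that gang scheduling only guarantees capacity, not co-location - the
"same node / same hardware" part would still need pod affinity on top, which
is exactly where a higher-level task-group property could hide the plumbing.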

> 2. With the admission controller running, even the older DAGs will be able
> to benefit from the Yunikorn scheduling abilities
>
> without the need to make changes to the DAGs. This means that the same DAG
> will run with default scheduler (K8s default)

> as well as Yunikorn if need be!

Fantastic!

> 3. As Mani mentioned, preemption capabilities can be explored due to this
> as well.
>
> I am happy to work on this effort and looking forward to it.

Yeah, that would be cool - also see above. I think if we are able to have
some "light touch" integration with YuniKorn, where we could handle a "Task
Group" as a schedulable entity + have some higher-level abstractions /
properties of it that map to "scheduling behaviour" (preemption / gang
scheduling) and document it, that would be a great and easy way of
expanding Airflow's capabilities - especially for ML workflows.
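On the preemption side, the "light touch" property could be a single
task-level flag. A sketch (the Airflow-side `preemptible` parameter is
invented; the annotation key is the per-pod preemption opt-out I believe
recent YuniKorn versions document - please double-check before use):

```python
# Hypothetical translation of a task-level flag into YuniKorn's per-pod
# preemption opt-in/opt-out annotation. Verify the key against the
# deployed YuniKorn version's preemption docs.

def preemption_annotation(preemptible: bool) -> dict:
    value = "true" if preemptible else "false"
    return {"yunikorn.apache.org/allow-preemption": value}
```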

J.


On Tue, Oct 29, 2024 at 8:10 AM Amogh Desai <amoghdesai....@gmail.com>
wrote:

> Building upon the POC done by Manikandan, I tried my hands at an experiment
> too.
>
> I wanted to mainly experiment with the Yunikorn admission controller, with
> an aim to make no changes to my older DAGs.
>
>
> Deployed a setup that looks like this:
>
> - Deployed Yunikorn in a kind cluster with the default configurations. The
> default configuration launches the Yunikorn scheduler as well as an
> admission controller which watches for a `yunikorn-configs` configmap that
> can define queues, partitions, placement rules etc.
>
> - Deployed Airflow using helm charts in the same kind cluster while
> specifying the executor as KubernetesExecutor.
>
>
>
> Wanted to test out if Yunikorn can take over the scheduling of Airflow
> workers. Created some queues using the config present here:
> https://github.com/apache/yunikorn-k8shim/blob/master/deployments/examples/namespace/queues.yaml
>
>
> Tried running the Airflow K8s executor dag
> <https://github.com/apache/airflow/blob/main/airflow/example_dags/example_kubernetes_executor.py>
> without any changes to the DAG.
>
> I was able to run the DAG successfully.
>
>
> Results
>
> 1. The task pods get scheduled by Yunikorn instead of the default K8s
> scheduler
>
>
> 2. I was able to observe a single application run for the Airflow DAG in
> the Yunikorn UI.
>
>
> Observations
>
> 1. Yunikorn treats applications at the DAG level not at the task level,
> which is great. Due to this, we can try to leverage gang scheduling
> abilities of Yunikorn.
>
> 2. With the admission controller running, even the older DAGs will be
> able to benefit from the Yunikorn scheduling abilities without the need
> to make changes to the DAGs. This means that the same DAG will run with
> the default scheduler (K8s default) as well as Yunikorn if need be!
>
> 3. As Mani mentioned, preemption capabilities can be explored due to this
> as well.
>
>
> I am happy to work on this effort and looking forward to it.
>
>
>
> Thanks & Regards,
> Amogh Desai
>
>
> On Tue, Oct 15, 2024 at 4:26 PM Jarek Potiuk <ja...@potiuk.com> wrote:
>
> > Hello here,
> >
> > *TL;DR: I would love to start a discussion about creating (for Airflow
> > 3.x - it does not have to be Airflow 3.0) a new community executor based
> > on YuniKorn*
> >
> > You might remember my point about "replacing Celery Executor" when I
> > raised the Airflow 3 question. I never actually "meant" to replace (and
> > remove) Celery Executor - I was more on a quest to see whether we have a
> > viable alternative.
> >
> > And I think we have one with Apache YuniKorn:
> > https://yunikorn.apache.org/
> >
> > While it is not a direct replacement (so I'd say it should be an
> > additional executor), I think YuniKorn can provide us with a number of
> > features that we currently cannot give to our users. From the
> > discussions I had and the talk I saw at Community Over Code in Denver, I
> > believe it might make Airflow more capable, especially in the
> > "optimization wars" context that I wrote about in
> > https://lists.apache.org/thread/1mp6jcfvx67zd3jjt9w2hlj0c5ysbh8r
> >
> > It seems like quite a good fit for the "Inference" use case that we want
> > to support for Airflow 3.
> >
> > At Community Over Code I attended a talk by Apple engineers named
> > "Maximizing GPU Utilization: Apache YuniKorn Preemption" (and had quite
> > a nice follow-up discussion), and I had a very long discussion with
> > Cloudera people who have been using YuniKorn for years to optimize their
> > workloads.
> >
> > The presentation was not recorded, but I will try to get the slides and
> > send them your way.
> >
> > I think we should take a close look at it - because it seems to have
> > saved a ton of implementation effort for the Apple team running batch
> > inference in their multi-tenant internal environment - which I think is
> > precisely what you want to do.
> >
> > YuniKorn (https://yunikorn.apache.org/) is an "app-aware" scheduler that
> > has a number of queue / capacity management models and policies that
> > allow controlling various applications competing for GPUs from a common
> > pool.
> >
> > They mention things like:
> >
> > * Gang scheduling, with gang-scheduling preemption, for workloads
> > requiring a minimum number of workers
> > * Support for latency-sensitive workloads
> > * Resource quota management - things like priorities of execution
> > * YuniKorn preemption - with guaranteed capacity and preemption when
> > needed - which improves utilisation
> > * Preemption that minimizes preemption cost (pod-level preemption rather
> > than application-level preemption) - very customizable preemption with
> > opt-in/opt-out, queues, resource weights, fencing, FIFO/LIFO sorting
> > etc.
> > * Runs in the cloud and on-premise
> >
> > The talk described quite a few scenarios of preemption / utilization /
> > guaranteed resources etc. They also outlined what new features YuniKorn
> > is working on (intra-queue preemption etc.) and what can be done in the
> > future.
> >
> >
> > Coincidentally - Amogh Desai with a friend submitted a talk for Airflow
> > Summit:
> >
> > "A Step Towards Multi-Tenant Airflow Using Apache YuniKorn"
> >
> > It did not make it to the Summit (another talk of Amogh's did) - but I
> > think back then we had not realized the potential of utilising YuniKorn
> > to optimize workflows managed by Airflow.
> >
> > But we seem to have people in the community who know more about the
> > YuniKorn <> Airflow relation (Amogh :) ) and could probably comment and
> > add some "from the trenches" experience to the discussion.
> >
> > Here is the description of the talk that Amogh submitted:
> >
> > Multi-tenant Airflow is hard and there have been novel approaches in the
> > recent past to bridge this gap. A key obstacle in multi-tenant Airflow
> > is the management of cluster resources. This is crucial to prevent one
> > malformed workload from hijacking an entire cluster. It is also vital to
> > restrict users and groups from monopolizing resources in a shared
> > cluster with their workloads.
> >
> > To tackle these challenges, we turn to Apache YuniKorn, a K8s scheduler
> > catering to all kinds of workloads. We leverage YuniKorn's hierarchical
> > queues in conjunction with resource quotas to establish multi-tenancy at
> > both the shared namespace level and within individual namespaces where
> > Airflow is deployed.
> >
> > YuniKorn also introduces Airflow to a new dimension of preemption. Now,
> > Airflow workers can preempt resources from lower-priority jobs, ensuring
> > critical schedules in our data pipelines are met without compromise.
> >
> > Join us for a discussion on integrating Airflow with YuniKorn, unraveling
> > solutions to these multi-tenancy challenges. We will also share our past
> > experiences while scaling Airflow and the steps we have taken to handle
> > real world production challenges in equitable multi-tenant K8s clusters.
> >
> > I would love to hear what you think about it. I know we are deep into
> > Airflow 3.0 implementation - but this one can be discussed/implemented
> > independently, and maybe it's a good idea to start sooner rather than
> > later if we see that it has good potential.
> >
> > J.
> >
>
