This all looks really good - sounds like something that we could do purely within the K8s Executor, and likely even make it compatible with Airflow 2 and release it independently.
On Tue, Oct 29, 2024 at 1:30 PM Amogh Desai <amoghdesai....@gmail.com> wrote:
>
> > As I understand what it means - if I read it correctly, it's mostly a
> > deployment issue - we don't even have to have a YuniKorn Executor - we can
> > use the K8s Executor and it will work out of the box, with scheduling
> > controlled by YuniKorn, but then we need to find a way to configure the
> > behaviour of tasks and DAGs (likely via pod annotations maybe?). That would
> > mean that it's mostly documentation on "How can I leverage YuniKorn with
> > Airflow" + maybe a Helm chart modification to install YuniKorn as an option?
> >
> > And then likely we need to add a little bit of metadata and some mapping of
> > "task" or "dag" or "task group" properties to open up more capabilities of
> > YuniKorn scheduling?
> >
> > Do I understand correctly?
>
> You mostly summed it up, but a few things.
>
> Yes, we can open up YuniKorn to schedule Airflow workloads by doing basically
> nothing, or at most very little manual work.
>
> But to really enable YuniKorn at full power, we will have to make some changes
> to the Airflow codebase. A few things off the top of my head: the admission
> controller will take care of the applicationId, scheduler name, etc., but from
> an initial read, if we want things like "schedule DAGs to a certain queue only"
> or something of that sort, we will need some labels to be injected - or, a
> level above, get the KPO to add some labels, like a queue. OR, even if we could
> specify the queue for every operator by extending the BaseOperator, that would
> be cool too.
>
> I personally think that if we could extend the KubernetesExecutor into a
> YunikornExecutor (naming doesn't matter to me), we could handle things like
> installing YuniKorn along with Airflow by making changes to the Helm chart,
> making it come up with the scheduler, admission controller, etc. We would also
> be able to make code changes in Airflow by controlling the internal logic via
> the executor type instead of leaving it all to the end user (I mean options
> like label injection, or labelling all the tasks of a group as an application,
> to adhere to Jarek's thought).
>
> Manikandan, feel free to add anything more from the YuniKorn side in case I
> have misinterpreted or just generally missed something :)
>
> Thanks & Regards,
> Amogh Desai
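A minimal sketch of the label/queue injection idea discussed above, using the plain
KubernetesExecutor: an executor_config with a pod_override can attach the metadata that
YuniKorn's admission controller reads. The annotation keys ("yunikorn.apache.org/queue",
"yunikorn.apache.org/app-id"), the queue name, and the DAG/task names here are assumptions
to be verified against the YuniKorn documentation; nothing below is from the thread itself.

from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator
from kubernetes.client import models as k8s

# Assumed annotation keys - verify against the YuniKorn admission controller docs.
YUNIKORN_QUEUE_CONFIG = {
    "pod_override": k8s.V1Pod(
        metadata=k8s.V1ObjectMeta(
            annotations={
                "yunikorn.apache.org/queue": "root.default",
                "yunikorn.apache.org/app-id": "yunikorn-queue-example",
            }
        )
    )
}

with DAG(
    dag_id="yunikorn_queue_example",
    start_date=datetime(2024, 10, 1),
    schedule=None,
):
    routed_task = PythonOperator(
        task_id="routed_task",
        python_callable=lambda: print("hello from a YuniKorn-scheduled pod"),
        # Only this task's worker pod carries the queue hint; everything else
        # keeps the executor defaults.
        executor_config=YUNIKORN_QUEUE_CONFIG,
    )

A YunikornExecutor (or a Helm-chart option) could inject this kind of config automatically
instead of leaving it to DAG authors, which is the direction Amogh describes above.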
>
> On Tue, Oct 29, 2024 at 1:28 PM Jarek Potiuk <ja...@potiuk.com> wrote:
> >
> > This is cool.
> >
> > As I understand what it means - if I read it correctly, it's mostly a
> > deployment issue - we don't even have to have a YuniKorn Executor - we can
> > use the K8s Executor and it will work out of the box, with scheduling
> > controlled by YuniKorn, but then we need to find a way to configure the
> > behaviour of tasks and DAGs (likely via pod annotations maybe?). That would
> > mean that it's mostly documentation on "How can I leverage YuniKorn with
> > Airflow" + maybe a Helm chart modification to install YuniKorn as an option?
> >
> > And then likely we need to add a little bit of metadata and some mapping of
> > "task" or "dag" or "task group" properties to open up more capabilities of
> > YuniKorn scheduling?
> >
> > Do I understand correctly?
> >
> > > 1. Yunikorn treats applications at the DAG level, not at the task level,
> > > which is great. Due to this, we can try to leverage the gang scheduling
> > > abilities of Yunikorn.
> >
> > This is great. I was wondering if we could also allow the application on
> > the "Task Group" level. I find it a really interesting feature to be able
> > to treat a "Task Group" as an entity that we could treat as an
> > "application" - this way you could treat the "Task Group" as a "schedulable
> > entity" and, for example, set preemption properties for all tasks in the
> > same task group. Or gang scheduling for the task group ("only schedule
> > tasks in the task group when there are enough resources for the whole task
> > group"). Or - and this is something that I think of as a "holy grail" of
> > scheduling in the context of optimisation of machine learning workflows:
> > "make sure that all the tasks in a group are scheduled on the same node and
> > use the same local hardware resources" + if any of them fail, retry the
> > whole group - also on the same instance. I think this is partially possible
> > with some node affinity setup - but I would love it if we were able to set
> > a property on a task group effectively meaning "execute all tasks in the
> > group on the same hardware" - so a bit higher abstraction - and have
> > YuniKorn handle all the preemption and optimisation of scheduling for that.
> >
> > > 2. With the admission controller running, even the older DAGs will be
> > > able to benefit from the Yunikorn scheduling abilities without the need
> > > to make changes to the DAGs. This means that the same DAG will run with
> > > the default scheduler (K8s default) as well as Yunikorn if need be!
> >
> > Fantastic!
> >
> > > 3. As Mani mentioned, preemption capabilities can be explored due to this
> > > as well.
> > >
> > > I am happy to work on this effort and looking forward to it.
> >
> > Yeah that would be cool - also see above. I think if we are able to have
> > some "light touch" integration with YuniKorn, where we could handle a
> > "Task Group" as a schedulable entity + have some higher-level abstractions
> > / properties of it that would map into some "scheduling behaviour"
> > (preemption / gang scheduling) and document it, that would be a great and
> > easy way of expanding Airflow capabilities - especially for ML workflows.
> >
> > J.
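To illustrate the "task group as an application" idea above: a minimal sketch, assuming
that a shared "yunikorn.apache.org/app-id" annotation is what groups pods into one YuniKorn
application (the key is an assumption to be checked against the YuniKorn docs). Every task
in a TaskGroup gets the same id via executor_config. This alone would not give gang
scheduling or same-node placement - it only shows where such task-group-level properties
could be attached.

from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator
from airflow.utils.task_group import TaskGroup
from kubernetes.client import models as k8s


def group_as_yunikorn_app(app_id: str) -> dict:
    """Executor config tagging a worker pod with a shared (assumed) YuniKorn app id."""
    return {
        "pod_override": k8s.V1Pod(
            metadata=k8s.V1ObjectMeta(
                # Assumed annotation key - to be verified against the YuniKorn docs.
                annotations={"yunikorn.apache.org/app-id": app_id}
            )
        )
    }


with DAG(
    dag_id="yunikorn_task_group_example",
    start_date=datetime(2024, 10, 1),
    schedule=None,
):
    with TaskGroup(group_id="training") as training_group:
        for i in range(3):
            PythonOperator(
                task_id=f"worker_{i}",
                python_callable=lambda: None,
                # Every task in the group shares the same application id, so the
                # scheduler would see the group's pods as one application.
                executor_config=group_as_yunikorn_app("training-group-app"),
            )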
> >
> > On Tue, Oct 29, 2024 at 8:10 AM Amogh Desai <amoghdesai....@gmail.com> wrote:
> > >
> > > Building upon the POC done by Manikandan, I tried my hand at an
> > > experiment too.
> > >
> > > I wanted to mainly experiment with the Yunikorn admission controller,
> > > with the aim of making no changes to my older DAGs.
> > >
> > > I deployed a setup that looks like this:
> > >
> > > - Deployed Yunikorn in a kind cluster with the default configuration. The
> > > default configuration launches the Yunikorn scheduler as well as an
> > > admission controller, which watches for a `yunikorn-configs` configmap
> > > that can define queues, partitions, placement rules, etc.
> > >
> > > - Deployed Airflow using the Helm chart in the same kind cluster while
> > > specifying the executor as KubernetesExecutor.
> > >
> > > I wanted to test whether Yunikorn can take over the scheduling of Airflow
> > > workers. I created some queues using the config present here:
> > > https://github.com/apache/yunikorn-k8shim/blob/master/deployments/examples/namespace/queues.yaml
> > >
> > > I tried running the Airflow K8s executor example DAG
> > > https://github.com/apache/airflow/blob/main/airflow/example_dags/example_kubernetes_executor.py
> > > without any changes to the DAG, and was able to run it successfully.
> > >
> > > Results
> > >
> > > 1. The task pods get scheduled by Yunikorn instead of the default K8s
> > > scheduler.
> > >
> > > 2. I was able to observe a single application run for the Airflow DAG in
> > > the Yunikorn UI.
> > >
> > > Observations
> > >
> > > 1. Yunikorn treats applications at the DAG level, not at the task level,
> > > which is great. Due to this, we can try to leverage the gang scheduling
> > > abilities of Yunikorn.
> > >
> > > 2. With the admission controller running, even the older DAGs will be
> > > able to benefit from the Yunikorn scheduling abilities without the need
> > > to make changes to the DAGs. This means that the same DAG will run with
> > > the default scheduler (K8s default) as well as Yunikorn if need be!
> > >
> > > 3. As Mani mentioned, preemption capabilities can be explored due to this
> > > as well.
> > >
> > > I am happy to work on this effort and looking forward to it.
> > >
> > > Thanks & Regards,
> > > Amogh Desai
> > >
> > > On Tue, Oct 15, 2024 at 4:26 PM Jarek Potiuk <ja...@potiuk.com> wrote:
> > > >
> > > > Hello here,
> > > >
> > > > *Tl;DR; I would love to start a discussion about creating (for Airflow
> > > > 3.x - it does not have to be Airflow 3.0) a new community executor
> > > > based on YuniKorn*
> > > >
> > > > You might remember my point about "replacing the Celery Executor" when
> > > > I raised the Airflow 3 question. I never actually meant to replace (and
> > > > remove) the Celery Executor; I was more on a quest to see if we have a
> > > > viable alternative.
> > > >
> > > > And I think we have one with Apache YuniKorn:
> > > > https://yunikorn.apache.org/
> > > >
> > > > While it is not a direct replacement (so I'd say it should be an
> > > > additional executor), I think YuniKorn can provide us with a number of
> > > > features that we currently cannot give to our users, and from the
> > > > discussions I had and the talk I saw at Community Over Code in Denver,
> > > > I believe it might make Airflow more capable, especially in the
> > > > "optimization wars" context that I wrote about in
> > > > https://lists.apache.org/thread/1mp6jcfvx67zd3jjt9w2hlj0c5ysbh8r
> > > >
> > > > It seems like quite a good fit for the "Inference" use case that we
> > > > want to support for Airflow 3.
> > > >
> > > > At Community Over Code I attended a talk (and had quite a nice
> > > > follow-up discussion) from Apple engineers, named "Maximizing GPU
> > > > Utilization: Apache YuniKorn Preemption", and had a very long
> > > > discussion with Cloudera people who have been using YuniKorn for years
> > > > to optimize their workloads.
> > > >
> > > > The presentation is not recorded, but I will try to get the slides and
> > > > send them your way.
> > > >
> > > > I think we should take a close look at it - because it seems to save a
> > > > ton of implementation effort for the Apple team running batch inference
> > > > for their multi-tenant internal environment - which I think is
> > > > precisely what you want to do.
> > > >
> > > > YuniKorn (https://yunikorn.apache.org/) is an "app-aware" scheduler
> > > > that has a number of queue / capacity management models and policies
> > > > that allow controlling various applications competing for GPUs from a
> > > > common pool.
> > > >
> > > > They mention things like:
> > > >
> > > > * Gang scheduling / gang scheduling preemption, for workloads requiring
> > > > a minimum number of workers
> > > > * Support for latency-sensitive workloads
> > > > * Resource quota management - things like priorities of execution
> > > > * YuniKorn preemption - with guaranteed capacity and preemption when
> > > > needed - which improves utilisation
> > > > * Preemption that minimizes preemption cost (pod-level preemption
> > > > rather than application-level preemption) - very customizable
> > > > preemption with opt-in/opt-out, queues, resource weights, fencing,
> > > > support for FIFO/LIFO sorting, etc.
> > > > * Runs in the cloud and on-premise
> > > >
> > > > The talk described quite a few scenarios of preemption / utilization /
> > > > guaranteed resources etc. They also outlined what YuniKorn is working
> > > > on as new features (intra-queue preemption etc.) and what future things
> > > > can be done.
> > > >
> > > > Coincidentally - Amogh Desai and a friend submitted a talk for the
> > > > Airflow Summit:
> > > >
> > > > "A Step Towards Multi-Tenant Airflow Using Apache YuniKorn"
> > > >
> > > > It did not make it to the Summit (another talk of Amogh's did) - but I
> > > > think back then we had not realized the potential of utilising YuniKorn
> > > > to optimize workflows managed by Airflow.
> > > >
> > > > But we seem to have people in the community who know more about the
> > > > YuniKorn <> Airflow relationship (Amogh :) ) and could probably comment
> > > > and add some "from the trenches" experience to the discussion.
> > > >
> > > > Here is the description of the talk that Amogh submitted:
> > > >
> > > > Multi-tenant Airflow is hard and there have been novel approaches in
> > > > the recent past to converge this gap. A key obstacle in multi-tenant
> > > > Airflow is the management of cluster resources. This is crucial to
> > > > prevent one malformed workload from hijacking an entire cluster. It is
> > > > also vital to restrict users and groups from monopolizing resources in
> > > > a shared cluster with their workloads.
> > > >
> > > > To tackle these challenges, we turn to Apache YuniKorn, a K8s scheduler
> > > > catering to all kinds of workloads. We leverage YuniKorn's hierarchical
> > > > queues in conjunction with resource quotas to establish multi-tenancy
> > > > both at the shared namespace level and within individual namespaces
> > > > where Airflow is deployed.
> > > >
> > > > YuniKorn also introduces Airflow to a new dimension of preemption. Now,
> > > > Airflow workers can preempt resources from lower-priority jobs,
> > > > ensuring critical schedules in our data pipelines are met without
> > > > compromise.
> > > >
> > > > Join us for a discussion on integrating Airflow with YuniKorn,
> > > > unraveling solutions to these multi-tenancy challenges. We will also
> > > > share our past experiences while scaling Airflow and the steps we have
> > > > taken to handle real-world production challenges in equitable
> > > > multi-tenant K8s clusters.
> > > >
> > > > I would love to hear what you think about it. I know we are deep into
> > > > the Airflow 3.0 implementation - but this one can be discussed and
> > > > implemented independently, and maybe it's a good idea to start doing it
> > > > sooner rather than later if we see that it has good potential.
> > > >
> > > > J.
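On the gang scheduling point in the quoted list above: YuniKorn drives gang scheduling
through pod annotations describing task groups with a minimum member count. A minimal
sketch, assuming the "yunikorn.apache.org/task-group-name" and
"yunikorn.apache.org/task-groups" annotation keys and the JSON shape below (to be verified
against the YuniKorn gang scheduling docs), of what an Airflow worker pod could carry:

import json

from kubernetes.client import models as k8s

# Assumed annotation keys and JSON shape - double-check against the YuniKorn
# gang scheduling documentation before relying on them.
GANG_ANNOTATIONS = {
    "yunikorn.apache.org/task-group-name": "workers",
    "yunikorn.apache.org/task-groups": json.dumps(
        [
            {
                "name": "workers",
                # Do not start any pod of the group until resources for 4 members
                # of this size can be reserved.
                "minMember": 4,
                "minResource": {"cpu": "1", "memory": "2Gi"},
            }
        ]
    ),
}

# This could be attached to Airflow worker pods the same way as in the earlier
# sketches, e.g. via executor_config={"pod_override": ...} or a pod template file.
GANG_EXECUTOR_CONFIG = {
    "pod_override": k8s.V1Pod(
        metadata=k8s.V1ObjectMeta(annotations=GANG_ANNOTATIONS)
    )
}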