Re: [DISCUSS] pools and deferrables

2024-10-20 Thread Jarek Potiuk
Small update to that - with 2 machines that would not work perfectly (I leave the exercise for others to think about), you'd **really** need a YuniKorn Executor for that. But it would work with 1 machine with 10 GPUs and 20 CPUs. And another comment - I used to work with a similar setup when I wor

Re: [DISCUSS] pools and deferrables

2024-10-20 Thread Jarek Potiuk
Sure. Say the company has 2 workers with 5 GPUs and 10 CPUs each and wants to optimize the load so that they can run up to 20 CPU-bound tasks (very short ones, not doing much) and up to 10 GPU-bound tasks. When GPU-bound tasks start, they request (and wait for) a GPU to become available. Airflow has

Re: [DISCUSS] pools and deferrables

2024-10-20 Thread Jarek Potiuk
Also, just to anticipate one of the possible counter-solutions: the same scenario applies when there are several DAGs, each using the same GPU operator, with the "pool" being the only limiting factor (in one DAG you could have max_active_tasks etc.). But the customer has several different DAGs - eac

Re: [DISCUSS] pools and deferrables

2024-10-20 Thread Daniel Standish
With your GPU example, that seems like a bit of a stretch. Can you flesh out the example a little more, get more into the details of how it actually works (and I understand it's made up anyway)? Sounds like something other than this task is spinning up a gpu resource? And this is just a "wait the

Re: [DISCUSS] pools and deferrables

2024-10-20 Thread Jarek Potiuk
Was supposed to be ... Enough? On Sun, Oct 20, 2024 at 7:05 PM Jarek Potiuk wrote: > Yeah... Off-by one... Good eye - I lied too :) (noticed it after I sent > the message. I wish email had the ability to correct typos).. > > 2) -> yes we agree, but to clarify a bit - we need it at OPERATOR leve

Re: [DISCUSS] pools and deferrables

2024-10-20 Thread Jarek Potiuk
Yeah... Off-by-one... Good eye - I lied too :) (noticed it after I sent the message. I wish email had the ability to correct typos). 2) -> yes we agree, but to clarify a bit - we need it at OPERATOR level not TASK level. The difference comes from who defines it should be the Operator's Author not

Re: [DISCUSS] pools and deferrables

2024-10-20 Thread Daniel Standish
So yeah hopefully we all agree that if we keep it, we should move it to task. I guess we can think of this thread as two items: 1. if we keep ability to have tasks not occupy a pool slot, shouldn't it be configured at task level? I think so. 2. but should we keep the ability to have tas

Re: [DISCUSS] pools and deferrables

2024-10-20 Thread Daniel Standish
Totally agree that if we *must* make it configurable, then at task is the right place. Will read the rest of it later :)

Re: [DISCUSS] pools and deferrables

2024-10-20 Thread Jarek Potiuk
Hello everyone, TL;DR; I also read Constance's point and I wholeheartedly agree with her. And after thinking about it my answer is slightly changed: *"we should keep the option, but having it per-pool is wrong, it should be defined at operator level".* > I think the confusion comes from who’s res

Re: [DISCUSS] pools and deferrables

2024-10-20 Thread Jarek Potiuk
TL;DR; I think we should keep what we have. (Daniel and just to clarify - I am not trying to derail the discussion on what we should do with pool + deferrable - just trying to understand what use cases it serves now and maybe what use cases it should serve in the future world where ML/AI and parti

Re: [DISCUSS] pools and deferrables

2024-10-19 Thread Daniel Standish
> > I’m wondering why do we want to remove this. The design seems to be > reasonable, but yep, it might not be as helpful as mentioned. > > > is it useful to have it take up a slot at the first and last couple > seconds > of its lifecycle? methinks no. > > Some edge cases “might” be helpful. I gue

Re: [DISCUSS] pools and deferrables

2024-10-11 Thread Constance Martineau
The main reason for pools is to control task execution parallelism, especially when tasks interact with systems like APIs or databases, to avoid overwhelming them. For deferrable operators, if a trigger is just 'sleeping' or waiting, it’s fine to exclude the task from the pool while in a deferred s

Re: [DISCUSS] pools and deferrables

2024-10-10 Thread Wei Lee
I’m wondering why we want to remove this. The design seems to be reasonable, but yep, it might not be as helpful as mentioned. > is it useful to have it take up a slot at the first and last couple seconds of its lifecycle? methinks no. Some edge cases “might” be helpful. I guess? A deferrabl

Re: [DISCUSS] pools and deferrables

2024-10-10 Thread Daniel Standish
No worries. The other issue is good to know about. I'm just trying to keep the discussion focused on the main question. People don't have a lot of excess time to attend to things like this and weigh in. We're lucky if we get 3 people to weigh in on a discussion item like this. So, I'm interest

Re: [DISCUSS] pools and deferrables

2024-10-10 Thread Pavankumar Gopidesu
Apologies if my earlier message wasn't clear! What I was trying to convey is that, since we're discussing pools and deferrable tasks, there's also been some discussion around managing deferred tasks at the DAG level, which might be relevant here. The PR I mentioned (https://github.com/apache/ai

Re: [DISCUSS] pools and deferrables

2024-10-09 Thread Daniel Standish
Pavan, I didn't understand... If you have a position on what I am proposing, can you try to clarify it? Or maybe you are just trying to let people know that there's a PR that is related to the general topic of deferrables and concurrency?

Re: [DISCUSS] pools and deferrables

2024-10-09 Thread Pavankumar Gopidesu
Thanks Daniel for putting this together. Since we are discussing pool slot occupancy here: at present, IMHO, there is no way to control how many deferred tasks can run other than pool configuration. There is an issue created by Raphaelauv to control the number of deferred tasks at the DAG level. ht

Re: [DISCUSS] pools and deferrables

2024-10-09 Thread Daniel Standish
And in case it wasn't clear, part of what I'm proposing is to remove the "pool can be configured to include deferred tasks" configuration option, because we should just always count a deferred task as taking up a pool slot, and if the user doesn't want it to be part of the pool during deferral, just don't give i

[DISCUSS] pools and deferrables

2024-10-09 Thread Daniel Standish
Some time ago, we added the ability to optionally configure a pool so that it would count a deferred task as taking up a pool slot. It strikes me that there isn't much point in specifying a pool for a deferrable task if the task will not take up a slot when it is deferred. Why? Well, the point of