With your GPU example, that seems like a bit of a stretch.  Can you flesh
out the example a little more and get into the details of how it
actually works (I understand it's made up anyway)?

Sounds like something other than this task is spinning up a GPU resource?
And this is just a "wait then run" operator?  If something else is
controlling the resource spin-up, then why does this task need a pool at
all?  It's not controlling the increase in load.

No example is needed for use_pool_when_deferred=True, because that's the
behavior I think should be universal.


On Sun, Oct 20, 2024 at 10:05 AM Jarek Potiuk <ja...@potiuk.com> wrote:

> Yeah... Off-by-one... Good eye - I lied too :) (I noticed it after I sent
> the message. I wish email had the ability to correct typos.)
>
> 2) -> yes, we agree, but to clarify a bit - we need it at OPERATOR level, not
> TASK level. The difference comes from who defines it: it should be the
> Operator's author, not the DAG author. I.e. we should be able to define
> "use_pool_when_deferred" (working name) when we write the operator class, not
> when we instantiate the operator as a task in a DAG. So basically, IMHO, the
> operator should be able to set this property of the BaseOperator internally,
> but it's not necessary to expose it via the `__init__` method of the actual
> CustomDeferrableOperator(BaseOperator). We still CAN expose it via
> __init__, but I'd not say it's desired.
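>
> A rough sketch of what I mean - purely illustrative, "use_pool_when_deferred"
> is a working name, not an existing BaseOperator attribute:
>
>     from airflow.models.baseoperator import BaseOperator
>
>     class CustomDeferrableOperator(BaseOperator):
>         # Set by the operator's author as a class-level default, so the
>         # DAG author never has to pass it when creating the task.
>         use_pool_when_deferred = False  # working name, hypothetical attribute
>
>         # Note: deliberately no "use_pool_when_deferred" argument in __init__.
>         # We CAN add one, but it's not needed for the operator-level case.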
>
> Example:
>
> 1) Example 1: RunMyGPUFineTuningOperator. Pool = number of shared GPUs. The
> operator: a) waits in deferrable mode for a MODEL to appear, b) uploads
> the model and fine-tunes it (non-deferrable, uses the GPU).
> "use_pool_when_deferred" = False
> 2) Example 2: UpdateNewSalesforceUsersOperator. Pool = number of Salesforce
> connections (to protect the Salesforce API from being overloaded - our licence
> allows only 10 parallel connections). The operator: a) checks whether new
> users are defined (by polling the Salesforce API) - deferred, b) updates the
> users with new fields via the Salesforce API. "use_pool_when_deferred" = True
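>
> And a rough sketch of how example 1 could be structured - the trigger and the
> fine-tuning helper below are made up, only the "defer first, then run on GPU"
> shape matters:
>
>     from airflow.models.baseoperator import BaseOperator
>
>     class RunMyGPUFineTuningOperator(BaseOperator):
>         # Operator author's decision: don't hold a slot in the GPU pool
>         # while we are only waiting for the model to appear.
>         use_pool_when_deferred = False  # working name, hypothetical attribute
>
>         def __init__(self, *, model_uri: str, **kwargs):
>             super().__init__(**kwargs)
>             self.model_uri = model_uri
>
>         def execute(self, context):
>             # a) wait in deferrable mode for the MODEL to appear - no GPU is
>             #    used yet, so the pool slot (pool = number of shared GPUs)
>             #    should be free for tasks that are actually fine-tuning.
>             self.defer(
>                 trigger=ModelAvailableTrigger(self.model_uri),  # made-up trigger
>                 method_name="fine_tune",
>             )
>
>         def fine_tune(self, context, event=None):
>             # b) upload the model and fine-tune it - this is the part that
>             #    actually occupies a GPU and needs the pool slot.
>             upload_and_fine_tune(self.model_uri)  # placeholder for the real work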
>
> Enough.
>
> J.
>
> On Sun, Oct 20, 2024 at 4:45 PM Daniel Standish
> <daniel.stand...@astronomer.io.invalid> wrote:
>
> > So yeah, hopefully we all agree that if we keep it, we should move it to
> > the task level.
> >
> > I guess we can think of this thread as two items:
> >
> >    1. if we keep the ability to have tasks not occupy a pool slot, shouldn't
> >    it be configured at task level?  I think so.
> >    2. but should we keep the ability to have tasks drop out of the pool when
> >    deferred at all?
> >    3. if tasks are to stay in pools when deferred, ideally they should do
> >    so continuously (e.g. including when in between worker and triggerer)
> >
> >
> > Ok I lied, three items. But 3 is more like a reminder that there is this
> > bad behavior :)
> >
> > Anyway, let's move on to focus on number 2, whether we should provide users
> > a configuration option to make tasks "drop out" of the pool when deferred.
> >
> > After reading your message, Jarek, I did not come away with an understanding
> > of a practical use case for having the task vacate the pool slot when
> > deferred.  Can you offer an example or two?
> >
> >
> >
> > On Sun, Oct 20, 2024 at 7:29 AM Daniel Standish <
> > daniel.stand...@astronomer.io> wrote:
> >
> > > Totally agree that if we *must* make it configurable, then the task level
> > > is the right place.  Will read the rest of it later :)
> >
>
