Hi scientific team,

As Jay mentioned in a previous mail, I drafted the instance reservation spec[1] for Blazar, and some people have already added their comments.

1. https://etherpad.openstack.org/p/new-instance-reservation

Please add your comments, concerns, and/or requirements. That would make it clearer what the draft is currently missing and what instance reservation needs to include. Additionally, I think we would have a better discussion in the forum session if we build on the previous discussion.


best regards,
Masahito


On 2017/04/12 4:22, Jay Pipes wrote:
On 04/11/2017 02:08 PM, Pierre Riteau wrote:
On 4 Apr 2017, at 22:23, Jay Pipes <jaypi...@gmail.com> wrote:

On 04/04/2017 02:48 PM, Tim Bell wrote:
Some combination of spot/OPIE

What is OPIE?

Maybe I missed a message: I didn’t see any reply to Jay’s question about
OPIE.

Thanks!

OPIE is the OpenStack Preemptible Instances
Extension: https://github.com/indigo-dc/opie
I am sure others on this list can provide more information.

Got it.

I think running OPIE instances inside Blazar reservations would be
doable without many changes to the implementation.
We’ve talked about this idea several times, this forum session would be
an ideal place to draw up an implementation plan.

I just looked through the OPIE source code. One thing I'm wondering is
why the code for killing off pre-emptible instances lives in the
filter_scheduler module.

Why not have a separate service that simply responds to a NoValidHost
exception raised from the scheduler with a call to go and terminate one
or more instances that would have allowed the original request to land
on a host?

Right here is where OPIE goes and terminates pre-emptible instances:

https://github.com/indigo-dc/opie/blob/master/opie/scheduler/filter_scheduler.py#L92-L100


However, that code should actually be run when line 90 raises NoValidHost:

https://github.com/indigo-dc/opie/blob/master/opie/scheduler/filter_scheduler.py#L90


There would be no need at all for "detecting overcommit" here:

https://github.com/indigo-dc/opie/blob/master/opie/scheduler/filter_scheduler.py#L96


Simply detect a NoValidHost being returned to the conductor from the
scheduler, check whether there are pre-emptible instances currently
running that could be terminated, terminate them, and re-run the
original call to select_destinations() (the scheduler call) just like a
Retry operation normally does.

There'd be no need whatsoever to involve any changes to the scheduler at
all.
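To make the suggestion concrete, here is a minimal sketch of that conductor-side retry loop. All of the names (NoValidHost, select_destinations, the helper callables) are illustrative stand-ins, not real Nova or OPIE APIs, and the victim-selection policy is deliberately naive:

```python
class NoValidHost(Exception):
    """Raised by the scheduler when no host can satisfy the request."""


def schedule_with_preemption(select_destinations, request_spec,
                             list_preemptibles, terminate, max_retries=3):
    """Retry scheduling, freeing one pre-emptible instance per failure.

    The scheduler itself is untouched: on NoValidHost we terminate a
    pre-emptible instance and re-run the original scheduling call, just
    like a normal Retry operation would.
    """
    for _ in range(max_retries + 1):
        try:
            return select_destinations(request_spec)
        except NoValidHost:
            victims = list_preemptibles()
            if not victims:
                raise  # nothing left to preempt; surface the failure
            terminate(victims[0])
    raise NoValidHost("retries exhausted")
```

A real implementation would need a smarter victim-selection policy (e.g. pick instances whose resources actually fit the failed request), but the point stands: the preemption logic can live entirely outside the filter scheduler.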

and Blazar would seem doable as long as the resource provider
reserves capacity appropriately (i.e. spot resources>>blazar
committed along with no non-spot requests for the same aggregate).
Is this feasible?

No. :)

As mentioned in previous emails and on the etherpad here:

https://etherpad.openstack.org/p/new-instance-reservation

I am firmly against having the resource tracker or the placement API
represent inventory or allocations with a temporal aspect to them (i.e.
allocations in the future).

A separate system (hopefully Blazar) is needed to manage the time-based
associations to inventories of resources over a period in the future.

Best,
-jay

I'm not sure how the above is different from the constraints I mention
below about having separate sets of resource providers for preemptible
instances than for non-preemptible instances?

Best,
-jay

Tim

On 04.04.17, 19:21, "Jay Pipes" <jaypi...@gmail.com> wrote:

   On 04/03/2017 06:07 PM, Blair Bethwaite wrote:
   > Hi Jay,
   >
   > On 4 April 2017 at 00:20, Jay Pipes <jaypi...@gmail.com> wrote:
   >> However, implementing the above in any useful fashion requires that
   >> Blazar be placed *above* Nova and essentially that the cloud operator
   >> turns off access to Nova's POST /servers API call for regular users.
   >> Because if not, the information that Blazar acts upon can be simply
   >> circumvented by any user at any time.
   >
   > That's something of an oversimplification. A reservation system
   > outside of Nova could manipulate Nova host-aggregates to "cordon off"
   > infrastructure from on-demand access (I believe Blazar already uses
   > this approach), and it's not much of a jump to imagine operators being
   > able to twiddle the available reserved capacity in a finite cloud so
   > that reserved capacity can be offered to the subset of users/projects
   > that need (or perhaps have paid for) it.
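The "cordon off via host aggregates" idea described above can be sketched as follows. Aggregates are modeled here as plain in-memory sets purely for illustration; a real reservation system such as Blazar would drive the Nova aggregates API instead, and the class and method names are hypothetical:

```python
class AggregatePool:
    """Toy model of cordoning hosts off from on-demand access."""

    def __init__(self, hosts):
        self.on_demand = set(hosts)
        self.reserved = {}  # host -> project holding the reservation

    def reserve(self, host, project):
        """Move a host out of the on-demand pool for a project's lease."""
        self.on_demand.discard(host)
        self.reserved[host] = project

    def release(self, host):
        """Return a host to the on-demand pool when a lease ends."""
        self.reserved.pop(host, None)
        self.on_demand.add(host)

    def schedulable_hosts(self, project=None):
        """On-demand hosts, plus any hosts reserved for this project."""
        mine = {h for h, p in self.reserved.items() if p == project}
        return self.on_demand | mine
```

On-demand requests only ever see the on-demand pool, so reserved capacity cannot be consumed out from under a lease; twiddling the split between the two pools is just a matter of calling reserve() and release().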

   Sure, I'm following you up until here.

   > Such a reservation system would even be able to backfill capacity
   > between reservations. At the end of the reservation the system
   > cleans-up any remaining instances and preps for the next
   > reservation.

   By "backfill capacity between reservations", do you mean consume
   resources on the compute hosts that are "reserved" by this paying
   customer at some date in the future? I.e., spot instances that can be
   killed off as necessary by the reservation system to free resources
   to meet its reservation schedule?

   > There are a couple of problems with putting this outside of Nova,
   > though. The main issue is that pre-emptible/spot type instances can't
   > be accommodated within the on-demand cloud capacity.

   Correct. The reservation system needs complete control over a subset
   of resource providers to be used for these spot instances. It would be
   like a hotel reservation system managing a motel where cars could
   simply pull up to any room with a vacant sign outside the door. The
   reservation system could never work from accurate data unless some of
   the motel's rooms were carved out for the reservation system to use,
   where cars could not simply pull up and take them.

   > You could have the reservation system implementing this feature, but
   > that would then put other scheduling constraints on the cloud in
   > order to be effective (e.g., there would need to be automation
   > changing the size of the on-demand capacity so that the maximum
   > pre-emptible capacity was always available). The other issue
   > (admittedly minor, but still a consideration) is that it's another
   > service - personally I'd love to see Nova support these advanced
   > use-cases directly.

   Welcome to the world of microservices. :)

   -jay

   _______________________________________________
   OpenStack-operators mailing list
   OpenStack-operators@lists.openstack.org
   http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators







--
室井 雅仁(Masahito MUROI)
Software Innovation Center, NTT
Tel: +81-422-59-4539



