On 03/06/2018 09:36 PM, Alex Xu wrote:
2018-03-07 10:21 GMT+08:00 Alex Xu <sou...@gmail.com>:
2018-03-06 22:45 GMT+08:00 Mooney, Sean K <sean.k.moo...@intel.com>:
From: Matthew Booth [mailto:mbo...@redhat.com]
Sent: Saturday, March 3, 2018 4:15 PM
To: OpenStack Development Mailing List (not for usage questions)
<openstack-dev@lists.openstack.org>
Subject: Re: [openstack-dev] [Nova] [Cyborg] Tracking multiple functions
On 2 March 2018 at 14:31, Jay Pipes <jaypi...@gmail.com> wrote:
On 03/02/2018 02:00 PM, Nadathur, Sundar wrote:
Hello Nova team,
During the Cyborg discussion at the Rocky PTG, we
proposed a flow for FPGAs wherein the request spec asks
for a device type as a resource class, and optionally a
function (such as encryption) in the extra specs. This
does not seem to work well for the usage model that I’ll
describe below.
An FPGA device may implement more than one function. For
example, it may implement both compression and
encryption. Say a cluster has 10 devices of device type
X, and each of them is programmed to offer 2 instances
of function A and 4 instances of function B. More
specifically, the device may implement 6 PCI functions,
with 2 of them tied to function A, and the other 4 tied
to function B. So, we could have 6 separate instances
accessing functions on the same device.
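As a rough illustration of the numbers above, the cluster can be modeled
in a few lines of Python. The class and field names are hypothetical and
are not Cyborg or Placement APIs; the sketch only makes the arithmetic
explicit.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class FpgaDevice:
    """Hypothetical model of one device of type X (not a Cyborg class)."""
    device_type: str
    # Maps a function name to the number of PCI functions tied to it.
    functions: dict = field(default_factory=dict)

    def pci_function_count(self) -> int:
        """Number of PCI functions the device exposes in total."""
        return sum(self.functions.values())


def cluster_capacity(devices):
    """Total function instances offered across all devices, per function."""
    totals = Counter()
    for dev in devices:
        totals.update(dev.functions)
    return dict(totals)


# The example from the text: 10 devices, each with 2 x A and 4 x B.
cluster = [FpgaDevice("X", {"A": 2, "B": 4}) for _ in range(10)]
```

With this model, each device exposes 6 PCI functions, and the cluster as
a whole offers 20 instances of function A and 40 of function B.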
Does this imply that Cyborg can't reprogram the FPGA at all?
[Mooney, Sean K] Cyborg is intended to support fixed-function
accelerators as well, so it will not always be able to program the
accelerator. In this case, where an FPGA is preprogrammed with a
statically provisioned multi-function bitstream, Cyborg will not be
able to reprogram the slot if any of the functions from that slot
are already allocated to an instance. It will have to treat the
device like a fixed-function device and simply allocate an unused
VF of the correct type, if one is available.
In the current flow, the device type X is modeled as a
resource class, so Placement will count how many of them
are in use. A flavor for ‘RC device-type-X + function A’
will consume one instance of the RC device-type-X. But
this is not right because this precludes other functions
on the same device instance from getting used.
One way to solve this is to declare functions A and B as
resource classes themselves and have the flavor request
the function RC. Placement will then correctly count the
function instances. However, there is still a problem:
if the requested function A is not available, Placement
will return an empty list of RPs, but we need some way
to reprogram some device to create an instance of
function A.
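The difference between the two modeling choices can be sketched with a
toy inventory counter. This is not the Placement API; the `Inventory`
class and the resource class names `DEVICE_TYPE_X`, `FUNCTION_A` and
`FUNCTION_B` are placeholders invented for this example.

```python
class Inventory:
    """Toy stand-in for the inventory of a single resource provider."""

    def __init__(self, totals):
        self.totals = dict(totals)            # resource class -> total units
        self.used = {rc: 0 for rc in totals}  # resource class -> units in use

    def claim(self, rc, amount=1):
        """Consume `amount` of `rc`; return False if not enough is free."""
        if self.totals.get(rc, 0) - self.used.get(rc, 0) < amount:
            return False
        self.used[rc] += amount
        return True


def count_successful_claims(inventory, requests):
    """Try each requested resource class in turn and count the successes."""
    return sum(1 for rc in requests if inventory.claim(rc))


# Model 1: the device itself is the resource class (one unit per device).
per_device = Inventory({"DEVICE_TYPE_X": 1})
# Model 2: each function is its own resource class.
per_function = Inventory({"FUNCTION_A": 2, "FUNCTION_B": 4})
```

For one device and six instances, model 1 satisfies only the first
request, while model 2 satisfies all six (two for A, four for B). A
seventh request for function A fails in model 2 as well, which is the
empty-result case described above: nothing in the counting itself
triggers a reprogram.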
Clearly, nova is not going to be reprogramming devices with
an instance of a particular function.
Cyborg might need a separate agent that listens to the nova
notifications queue. Upon seeing an event that indicates a
failed build due to lack of resources, Cyborg can try to
reprogram a device and then try rebuilding the original request.
It was my understanding from that discussion that we intend to
insert Cyborg into the spawn workflow for device configuration
in the same way that we currently insert resources provided by
Cinder and Neutron. So while Nova won't be reprogramming a
device, it will be calling out to Cyborg to reprogram a device,
and waiting while that happens.
My understanding is (and I concede some areas are a little
hazy):
* The flavor says device type X with function Y
* Placement tells us everywhere with device type X
* A weigher orders these by devices which already have an
available function Y (where is this metadata stored?)
* Nova schedules to host Z
* Nova host Z asks Cyborg for a local function Y and blocks
* Cyborg hopefully returns function Y which is already
available
* If not, Cyborg reprograms a function Y, then returns it
Can anybody correct me/fill in the gaps?
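The steps listed above can be walked through over plain dictionaries.
This is only a sketch of the flow as described in this message: the
function name, the shape of `hosts`, and the in-place "reprogram" are
all assumptions made for illustration, and no nova, Placement or Cyborg
call is involved. It also ignores the case where a device cannot be
reprogrammed because some of its functions are already in use.

```python
def schedule_and_attach(hosts, device_type, function):
    """Pick a host and obtain one instance of `function` on it.

    `hosts` maps a host name to
    {"device_type": str, "functions": {function name: free count}}.
    Returns (host, reprogrammed), or None if no host has the device type.
    """
    # Placement step: every host with the requested device type.
    candidates = [name for name, info in hosts.items()
                  if info["device_type"] == device_type]
    if not candidates:
        return None
    # Weigher step: hosts that already have a free function come first.
    # The sort is stable, so ties keep their original order.
    candidates.sort(
        key=lambda name: hosts[name]["functions"].get(function, 0) > 0,
        reverse=True)
    host = candidates[0]
    functions = hosts[host]["functions"]
    # Cyborg step: hand out an existing function, or create one first.
    reprogrammed = functions.get(function, 0) == 0
    if reprogrammed:
        functions[function] = 1  # stand-in for a real reprogramming call
    functions[function] -= 1
    return host, reprogrammed
```

Given two hosts with device type X where only the second already offers
function Y, the sketch picks the second host without reprogramming, and
falls back to reprogramming only once no candidate has a free function Y
left.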
[Mooney, Sean K] That matches my recollection closely as well.
As for the metadata, I think the weigher may need to call Cyborg
to retrieve it, as it will not be available in the host state
object.
Is this the nova scheduler weigher, or do we want to support weighing
in Placement? I think functions are traits, so could we have
preferred_traits? I remember we talked about that parameter in the
past, but we didn't have a good use case at the time. This is a good
use case.
If we call Cyborg from the nova scheduler weigher, that will also slow
down scheduling a lot.
Right, which is why I don't want to do any weighing in Placement at all.
If folks want to sort by things that require long-running code/callbacks
or silly temporal things like metrics, they can do that in a custom
weigher in the nova-scheduler and take the performance hit there.
Best,
-jay
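The custom-weigher idea from the reply above can be sketched as follows.
The class does not import or subclass anything from nova; its name, its
methods and the `lookup` callable (a stand-in for a call out to Cyborg)
are assumptions made for this example.

```python
class FunctionAvailabilityWeigher:
    """Hypothetical weigher that prefers hosts already offering a function.

    `lookup` takes a host name and returns the number of free instances
    of the wanted function on that host. In a real deployment this would
    be the slow, out-of-process part.
    """

    def __init__(self, lookup):
        self._lookup = lookup

    def weigh(self, host):
        """Return 1.0 if the host has a free function, else 0.0."""
        return 1.0 if self._lookup(host) > 0 else 0.0

    def sort_hosts(self, hosts):
        """Order hosts by weight, highest first; ties keep input order."""
        return sorted(hosts, key=self.weigh, reverse=True)
```

Each call to `sort_hosts` invokes `lookup` once per candidate host, which
is the performance cost discussed in this thread: the hit is taken in the
scheduler, per request, rather than in Placement.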