2018-03-07 10:21 GMT+08:00 Alex Xu <sou...@gmail.com>:

> 2018-03-06 22:45 GMT+08:00 Mooney, Sean K <sean.k.moo...@intel.com>:
>
>> *From:* Matthew Booth [mailto:mbo...@redhat.com]
>> *Sent:* Saturday, March 3, 2018 4:15 PM
>> *To:* OpenStack Development Mailing List (not for usage questions)
>> <openstack-dev@lists.openstack.org>
>> *Subject:* Re: [openstack-dev] [Nova] [Cyborg] Tracking multiple
>> functions
>>
>> On 2 March 2018 at 14:31, Jay Pipes <jaypi...@gmail.com> wrote:
>>
>> On 03/02/2018 02:00 PM, Nadathur, Sundar wrote:
>>
>> Hello Nova team,
>>
>> During the Cyborg discussion at Rocky PTG, we proposed a flow for
>> FPGAs wherein the request spec asks for a device type as a resource
>> class, and optionally a function (such as encryption) in the extra
>> specs. This does not seem to work well for the usage model that I’ll
>> describe below.
>>
>> An FPGA device may implement more than one function. For example, it
>> may implement both compression and encryption. Say a cluster has 10
>> devices of device type X, and each of them is programmed to offer 2
>> instances of function A and 4 instances of function B. More
>> specifically, the device may implement 6 PCI functions, with 2 of them
>> tied to function A, and the other 4 tied to function B. So, we could
>> have 6 separate instances accessing functions on the same device.
>>
>> Does this imply that Cyborg can't reprogram the FPGA at all?
>>
>> *[Mooney, Sean K] Cyborg is intended to support fixed-function
>> accelerators as well, so it will not always be able to program the
>> accelerator. In this case, where an FPGA is preprogrammed with a
>> statically provisioned multi-function bitstream, Cyborg will not be
>> able to reprogram the slot if any of the functions from that slot are
>> already allocated to an instance. It will then have to treat the FPGA
>> like a fixed-function device and simply allocate an unused VF of the
>> correct type, if one is available.*
>>
>> In the current flow, the device type X is modeled as a resource class,
>> so Placement will count how many of them are in use. A flavor for ‘RC
>> device-type-X + function A’ will consume one instance of the RC
>> device-type-X. But this is not right because this precludes other
>> functions on the same device instance from getting used.
>>
>> One way to solve this is to declare functions A and B as resource
>> classes themselves and have the flavor request the function RC.
>> Placement will then correctly count the function instances. However,
>> there is still a problem: if the requested function A is not
>> available, Placement will return an empty list of RPs, but we need
>> some way to reprogram some device to create an instance of function A.
>>
>> Clearly, nova is not going to be reprogramming devices with an
>> instance of a particular function.
>>
>> Cyborg might need to have a separate agent that listens to the nova
>> notifications queue and, upon seeing an event that indicates a failed
>> build due to lack of resources, tries to reprogram a device and then
>> rebuild the original request.
>>
>> It was my understanding from that discussion that we intend to insert
>> Cyborg into the spawn workflow for device configuration in the same
>> way that we currently insert resources provided by Cinder and Neutron.
>> So while Nova won't be reprogramming a device, it will be calling out
>> to Cyborg to reprogram a device, and waiting while that happens.
>>
>> My understanding is (and I concede some areas are a little hazy):
>>
>> * The flavor says device type X with function Y
>>
>> * Placement tells us everywhere with device type X
>>
>> * A weigher orders these by devices which already have an available
>> function Y (where is this metadata stored?)
>>
>> * Nova schedules to host Z
>>
>> * Nova host Z asks Cyborg for a local function Y and blocks
>>
>> * Cyborg hopefully returns function Y which is already available
>>
>> * If not, Cyborg reprograms a function Y, then returns it
>>
>> Can anybody correct me/fill in the gaps?
>>
>> *[Mooney, Sean K] That correlates closely to my recollection also. As
>> for the metadata, I think the weigher may need to call Cyborg to
>> retrieve this, as it will not be available in the host state object.*
>>
> Is this the nova scheduler weigher, or do we want to support weighing
> in placement? I think a function is a trait, so can we have
> preferred_traits? I remember we talked about that parameter in the
> past, but we didn't have a good use case at that time. This is a good
> use case.
>

Also, if we call Cyborg from the nova scheduler weigher, that will slow
down scheduling a lot.

>> Matt
>>
>> --
>>
>> Matthew Booth
>>
>> Red Hat OpenStack Engineer, Compute DFG
>>
>> Phone: +442070094448 (UK)
>>
>> __________________________________________________________________________
>> OpenStack Development Mailing List (not for usage questions)
>> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
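
For illustration only, the "prefer hosts that already have function Y" ordering discussed in this thread could look roughly like the Python sketch below. It is not Nova, Placement, or Cyborg code: `Host`, `free_functions` and `weigh_hosts` are hypothetical names, and it assumes the per-host function counts are already available to the scheduler locally instead of being fetched from Cyborg for every host at scheduling time.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    """Hypothetical view of one compute host (not a Nova HostState)."""
    name: str
    # Free function instances per function name, e.g. {"A": 2, "B": 4}.
    free_functions: dict = field(default_factory=dict)


def weigh_hosts(hosts, wanted_function):
    """Put hosts with a free instance of wanted_function first.

    Hosts without one are kept at the end rather than filtered out,
    because a device there could still be reprogrammed. This is the
    "preferred" (not "required") behaviour.
    """
    return sorted(
        hosts,
        key=lambda h: h.free_functions.get(wanted_function, 0) > 0,
        reverse=True,  # True sorts before False; ties keep input order
    )


if __name__ == "__main__":
    candidates = [
        Host("h1", {"B": 4}),
        Host("h2", {"A": 1}),
        Host("h3", {}),
        Host("h4", {"A": 2, "B": 1}),
    ]
    for host in weigh_hosts(candidates, "A"):
        print(host.name, host.free_functions)
```

Because the ordering only reads data the scheduler already holds, it adds no remote call per host. Whether that data would come from traits in placement or from somewhere else is exactly the open question here.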