On Wed, May 30, 2018 at 1:33 AM, Nadathur, Sundar <sundar.nadat...@intel.com> wrote:
> Hi all,
> The Cyborg/Nova scheduling spec [1] details what traits will be applied
> to the resource providers that represent devices like GPUs. Some of the
> traits referred to vendor names. I got feedback that traits must not refer
> to products or specific models of devices. I agree. However, we need some
> reference to device types to enable matching the VM driver with the device.
>
> TL;DR We need some reference to device types, but we don't need product
> names. I will update the spec [1] to clarify that. The rest of this email
> clarifies why we need device types in traits, and what traits we propose to
> include.
>
> In general, an accelerator device is operated by two pieces of software: a
> driver in the kernel (which may discover and handle the PF for SR-IOV
> devices), and a driver/library in the guest (which may handle the assigned
> VF).
>
> The device assigned to the VM must match the driver/library packaged in
> the VM. For this, the request must explicitly state what category of
> devices it needs. For example, if the VM needs a GPU, it needs to say
> whether it needs an AMD GPU or an Nvidia GPU, since it may have the
> driver/libraries for that vendor alone. It may also need to state what
> version of CUDA is needed, if it is an Nvidia GPU. These aspects are
> necessarily vendor-specific.
>

FWIW, the vGPU implementation for Nova has the same concern. We want to
provide traits to explicitly say "use this vGPU type", but since that is
tied to a specific vendor, we can't just say "ask for this frame buffer
size, or just for the display heads"; we rather have to say "we need a
vGPU accepting Quadro vDWS license".

> Further, one driver/library version may handle multiple devices. Since a
> new driver version may be backwards compatible, multiple driver versions
> may manage the same device. The development/release of the driver/library
> inside the VM should be independent of the kernel driver for that device.
>

I agree.

> For FPGAs, there is an additional twist as the VM may need specific
> bitstream(s), and they match only specific device/region types. The
> bitstream for a device from a vendor will not fit any other device from the
> same vendor, let alone other vendors. IOW, the region type is specific not
> just to a vendor but to a device type within the vendor. So, it is
> essential to identify the device type.
>
> So, the proposed set of RCs and traits are as below. As we learn more
> about actual usages by operators, we may need to evolve this set.
>
>    - There is a resource class per device category e.g.
>    CUSTOM_ACCELERATOR_GPU, CUSTOM_ACCELERATOR_FPGA.
>    - The resource provider that represents a device has the following
>    traits:
>       - Vendor/Category trait: e.g. CUSTOM_GPU_AMD, CUSTOM_FPGA_XILINX.
>       - Device type trait which is a refinement of vendor/category trait
>       e.g. CUSTOM_FPGA_XILINX_VU9P.
>
> NOTE: This is not a product or model, at least for FPGAs. Multiple
> products may use the same FPGA chip.
> NOTE: The reason for having both the vendor/category and this one is that
> a flavor may ask for either, depending on the granularity desired. IOW, if
> one driver can handle all devices from a vendor (*eye roll*), the flavor
> can ask for the vendor/category trait alone. If there are separate drivers
> for different device families from the same vendor, the flavor must specify
> the trait for the device family.
> NOTE: The equivalent trait for GPUs may be like CUSTOM_GPU_NVIDIA_P90, but
> I'll let others decide if that is a product or not.
>

I was about to propose the same for vGPUs in Nova, i.e. using custom
traits. The only concern is that operators would need to set the traits
directly using osc-placement instead of having Nova magically provide
them. But anyway, given that operators already need to set the vGPU types
they want, I think it's acceptable.

>    - For FPGAs, we have additional traits:
>       - Functionality trait: e.g. CUSTOM_FPGA_COMPUTE,
>       CUSTOM_FPGA_NETWORK, CUSTOM_FPGA_STORAGE
>       - Region type ID. e.g. CUSTOM_FPGA_INTEL_REGION_<uuid>.
>       - Optionally, a function ID, indicating what function is
>       currently programmed in the region RP. e.g.
>       CUSTOM_FPGA_INTEL_FUNCTION_<uuid>.
>       Not all implementations may provide it. The function trait may
>       change on reprogramming, but it is not expected to be frequent.
>       - Possibly, CUSTOM_PROGRAMMABLE as a separate trait.
>
> [1] https://review.openstack.org/#/c/554717/
>

I'll try to review the spec as soon as I can.

-Sylvain

> Thanks.
>
> Regards,
> Sundar
>
> __________________________________________________________________________
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
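To make the vendor-versus-device-type granularity discussed above concrete, here is a minimal sketch in plain Python. It does not use the Placement API or any Nova/Cyborg code: the Provider class, the candidates() and region_trait() helpers, the provider names, the CUSTOM_FPGA_XILINX_KU115 trait and the sample UUID are all hypothetical and only illustrate the idea. region_trait() also assumes that custom trait names may contain only A-Z, 0-9 and underscores, so the UUID is upper-cased and its hyphens are replaced.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Provider:
    """Hypothetical stand-in for a resource provider representing one device."""
    name: str
    resource_class: str
    traits: set = field(default_factory=set)


def region_trait(vendor, region_id):
    """Build a name of the form CUSTOM_FPGA_<VENDOR>_REGION_<uuid>.

    Assumption: only A-Z, 0-9 and '_' are allowed in trait names, so every
    other character of the upper-cased region ID becomes an underscore.
    """
    suffix = re.sub(r"[^A-Z0-9]", "_", str(region_id).upper())
    return f"CUSTOM_FPGA_{vendor.upper()}_REGION_{suffix}"


def candidates(providers, resource_class, required_traits):
    """Return names of providers of the given class holding all required traits."""
    required = set(required_traits)
    return [
        p.name
        for p in providers
        if p.resource_class == resource_class and required <= p.traits
    ]


# Made-up inventory: each device carries its vendor/category trait and,
# where known, the finer device type trait.
providers = [
    Provider("fpga0", "CUSTOM_ACCELERATOR_FPGA",
             {"CUSTOM_FPGA_XILINX", "CUSTOM_FPGA_XILINX_VU9P"}),
    Provider("fpga1", "CUSTOM_ACCELERATOR_FPGA",
             {"CUSTOM_FPGA_XILINX", "CUSTOM_FPGA_XILINX_KU115"}),
    Provider("gpu0", "CUSTOM_ACCELERATOR_GPU", {"CUSTOM_GPU_AMD"}),
]

# A flavor whose driver handles every Xilinx FPGA asks for the vendor trait only.
any_xilinx = candidates(providers, "CUSTOM_ACCELERATOR_FPGA",
                        {"CUSTOM_FPGA_XILINX"})

# A flavor whose bitstream fits only one device type asks for the finer trait.
vu9p_only = candidates(providers, "CUSTOM_ACCELERATOR_FPGA",
                       {"CUSTOM_FPGA_XILINX_VU9P"})

print(any_xilinx, vu9p_only)
print(region_trait("intel", "12345678-1234-5678-1234-567812345678"))
```

Because every device carries both traits, the same inventory serves both kinds of request: the coarse one matches both FPGAs, the fine one matches only the VU9P device.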