Sorry for the delayed response. I broadly agree with previous replies.
For the concerns about the impact of a Cyborg weigher on scheduling
performance, there are some options (apart from filtering candidates as
much as possible in Placement):
* Handle hosts in bulk by extending BaseWeigher
<https://github.com/openstack/nova/blob/master/nova/weights.py#L67> and
overriding weigh_objects()
<https://github.com/openstack/nova/blob/master/nova/weights.py#L92>,
instead of handling one host at a time.
* If we have to handle one host at a time for whatever reason, then since
the weigher is maintained by Cyborg, it could query the Cyborg DB
directly rather than go through the Cyborg REST API. This would be not
unlike other weighers.
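To illustrate the first option, here is a minimal sketch in Python. It
does not import Nova: BaseWeigher and WeighedObject below are simplified
stand-ins for the classes in nova/weights.py, hosts are plain strings
rather than HostState objects, and fetch_function_counts() is a
hypothetical bulk lookup (it could be backed by the Cyborg DB or API),
not an existing Cyborg call.

```python
import abc

# Records every (hypothetical) backend lookup so the call count is visible.
LOOKUP_LOG = []


class WeighedObject:
    """Simplified stand-in for nova.weights.WeighedObject."""

    def __init__(self, obj, weight=0.0):
        self.obj = obj
        self.weight = weight


class BaseWeigher(abc.ABC):
    """Simplified stand-in for nova.weights.BaseWeigher."""

    @abc.abstractmethod
    def _weigh_object(self, obj, weight_properties):
        """Weigh a single object."""

    def weigh_objects(self, weighed_obj_list, weight_properties):
        # Default behaviour: one _weigh_object() call per host.
        return [self._weigh_object(w.obj, weight_properties)
                for w in weighed_obj_list]


def fetch_function_counts(host_names, function):
    """Hypothetical bulk lookup: free instances of `function` per host."""
    LOOKUP_LOG.append(list(host_names))
    # Assumed sample data, standing in for whatever Cyborg would report.
    fake_inventory = {"host1": {"A": 0}, "host2": {"A": 2}, "host3": {"A": 1}}
    return {h: fake_inventory.get(h, {}).get(function, 0) for h in host_names}


class CyborgBulkWeigher(BaseWeigher):
    def _weigh_object(self, obj, weight_properties):
        # Per-host path, kept only to satisfy the base interface.
        counts = fetch_function_counts([obj], weight_properties["function"])
        return float(counts[obj])

    def weigh_objects(self, weighed_obj_list, weight_properties):
        # Bulk path: a single lookup covers all candidate hosts.
        hosts = [w.obj for w in weighed_obj_list]
        counts = fetch_function_counts(hosts, weight_properties["function"])
        return [float(counts[h]) for h in hosts]


if __name__ == "__main__":
    candidates = [WeighedObject(h) for h in ("host1", "host2", "host3")]
    print(CyborgBulkWeigher().weigh_objects(candidates, {"function": "A"}))
    print("backend lookups:", len(LOOKUP_LOG))
```

The point of the sketch is only that overriding weigh_objects() turns N
per-host lookups into one; how the lookup is implemented is left open.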
Given these and other possible optimizations, it may be too soon to
worry about the performance impact.
I am working on a spec that will capture the flow discussed in the PTG.
I will try to address these aspects as well.
Thanks & Regards,
Sundar
On 3/8/2018 4:53 AM, Zhipeng Huang wrote:
@jay I'm also against a weigher in nova/placement. This should be an
optional step that depends on the vendor implementation, not a default one.
@Alex I think we should explore the idea of preferred trait.
@Matthew: Like Sean said, Cyborg wants to support both reprogrammable
FPGAs and pre-programmed ones.
Therefore it is correct that in your description, the programming
operation should be a call from Nova to Cyborg, and Cyborg will
complete the operation while Nova waits. The only problem is that the
weigher step should be an optional one.
On Wed, Mar 7, 2018 at 9:21 PM, Jay Pipes <jaypi...@gmail.com> wrote:
On 03/06/2018 09:36 PM, Alex Xu wrote:
2018-03-07 10:21 GMT+08:00 Alex Xu <sou...@gmail.com>:
2018-03-06 22:45 GMT+08:00 Mooney, Sean K <sean.k.moo...@intel.com>:
*From:* Matthew Booth [mailto:mbo...@redhat.com]
*Sent:* Saturday, March 3, 2018 4:15 PM
*To:* OpenStack Development Mailing List (not for usage questions)
<openstack-dev@lists.openstack.org>
*Subject:* Re: [openstack-dev] [Nova] [Cyborg] Tracking multiple
functions
On 2 March 2018 at 14:31, Jay Pipes <jaypi...@gmail.com> wrote:
On 03/02/2018 02:00 PM, Nadathur, Sundar wrote:
Hello Nova team,
During the Cyborg discussion at Rocky PTG, we proposed a flow for
FPGAs wherein the request spec asks for a device type as a resource
class, and optionally a function (such as encryption) in the extra
specs. This does not seem to work well for the usage model that I’ll
describe below.
An FPGA device may implement more than one function. For example, it
may implement both compression and encryption. Say a cluster has 10
devices of device type X, and each of them is programmed to offer 2
instances of function A and 4 instances of function B. More
specifically, the device may implement 6 PCI functions, with 2 of them
tied to function A, and the other 4 tied to function B. So, we could
have 6 separate instances accessing functions on the same device.
Does this imply that Cyborg can't reprogram the FPGA at all?
*/[Mooney, Sean K] Cyborg is intended to support fixed-function
accelerators also, so it will not always be able to program the
accelerator. In this case, where an FPGA is preprogrammed with a
multi-function bitstream that is statically provisioned, Cyborg will
not be able to reprogram the slot if any of the functions from that
slot are already allocated to an instance. In this case it will have
to treat it like a fixed-function device and simply allocate an unused
VF of the correct type if available./*
In the current flow, the device type X is modeled as a resource class,
so Placement will count how many of them are in use. A flavor for ‘RC
device-type-X + function A’ will consume one instance of the RC
device-type-X. But this is not right because this precludes other
functions on the same device instance from getting used.
One way to solve this is to declare functions A and B as resource
classes themselves and have the flavor request the function RC.
Placement will then correctly count the function instances. However,
there is still a problem: if the requested function A is not
available, Placement will return an empty list of RPs, but we need
some way to reprogram some device to create an instance of function A.
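The counting difference between the two models can be worked through
with the numbers from the example above (10 devices, each offering 2
instances of function A and 4 of function B). This is plain arithmetic
in Python, not Placement code; the variable names are made up for
illustration.

```python
NUM_DEVICES = 10
FUNCTIONS_PER_DEVICE = {"A": 2, "B": 4}

# Model 1: the device type is the resource class. Each request consumes
# one whole device, so at most NUM_DEVICES requests can be satisfied.
device_rc_capacity = NUM_DEVICES

# Model 2: each function is its own resource class. Inventory is counted
# per function instance across all devices.
function_rc_capacity = {
    name: per_device * NUM_DEVICES
    for name, per_device in FUNCTIONS_PER_DEVICE.items()
}
total_function_instances = sum(function_rc_capacity.values())

# Function instances that model 1 can never hand out.
stranded = total_function_instances - device_rc_capacity

if __name__ == "__main__":
    print("device-type RC capacity:", device_rc_capacity)
    print("function RC capacity:", function_rc_capacity)
    print("stranded under model 1:", stranded)
```

Under the device-type model only 10 of the 60 function instances in the
cluster can ever be allocated; under the function model all 60 are
countable, which is the motivation for the second proposal.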
Clearly, nova is not going to be reprogramming devices with an
instance of a particular function.
Cyborg might need to have a separate agent that listens to the nova
notifications queue and upon seeing an event that indicates a failed
build due to lack of resources, then Cyborg can try and reprogram a
device and then try rebuilding the original request.
It was my understanding from that discussion that we intend to insert
Cyborg into the spawn workflow for device configuration in the same
way that we currently insert resources provided by Cinder and Neutron.
So while Nova won't be reprogramming a device, it will be calling out
to Cyborg to reprogram a device, and waiting while that happens.
My understanding is (and I concede some areas are a little hazy):
* The flavor says device type X with function Y
* Placement tells us everywhere with device type X
* A weigher orders these by devices which already have an available
function Y (where is this metadata stored?)
* Nova schedules to host Z
* Nova host Z asks Cyborg for a local function Y and blocks
* Cyborg hopefully returns function Y which is already available
* If not, Cyborg reprograms a function Y, then returns it
Can anybody correct me/fill in the gaps?
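The steps above can be sketched as a toy model in Python. Every name
here (Host, placement_candidates, weigh, cyborg_get_function, schedule)
is hypothetical and none of them is a Nova, Placement, or Cyborg API.
The sketch also assumes that reprogramming always succeeds and yields
the requested function, which the thread does not guarantee.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    name: str
    device_types: set
    # Free instances per function already programmed on this host.
    free_functions: dict = field(default_factory=dict)


def placement_candidates(hosts, device_type):
    """Placement step: every host that has the requested device type."""
    return [h for h in hosts if device_type in h.device_types]


def weigh(hosts, function):
    """Weigher step: hosts with the function already available come first."""
    return sorted(hosts,
                  key=lambda h: h.free_functions.get(function, 0) > 0,
                  reverse=True)


def cyborg_get_function(host, function):
    """Cyborg step: return (function, reprogrammed)."""
    if host.free_functions.get(function, 0) > 0:
        host.free_functions[function] -= 1
        return function, False
    # Assumption: reprogramming succeeds and the new instance is handed
    # straight to the caller, which blocks until this returns.
    return function, True


def schedule(hosts, device_type, function):
    candidates = placement_candidates(hosts, device_type)
    if not candidates:
        return None
    chosen = weigh(candidates, function)[0]
    _, reprogrammed = cyborg_get_function(chosen, function)
    return chosen.name, reprogrammed


if __name__ == "__main__":
    cluster = [
        Host("h1", {"X"}, {"Y": 0}),
        Host("h2", {"X"}, {"Y": 1}),
        Host("h3", {"W"}, {"Y": 3}),
    ]
    print(schedule(cluster, "X", "Y"))
    print(schedule(cluster, "X", "Y"))
```

The open question in the list, where the weigher gets its "function Y
is available" metadata, corresponds to free_functions here; in this toy
it is simply held on the host object.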
*/[Mooney, Sean K] That correlates closely to my recollection also. As
for the metadata, I think the weigher may need to call to Cyborg to
retrieve this, as it will not be available in the host state object./*
Is it the nova scheduler weigher, or do we want to support weighing in
placement? A function is a trait, I think, so can we have
preferred_traits? I remember we talked about that parameter in the
past, but we didn't have a good use case at that time. This is a good
use case.
If we call Cyborg from the nova scheduler weigher, that will also slow
down the scheduling a lot.
Right, which is why I don't want to do any weighing in Placement
at all. If folks want to sort by things that require long-running
code/callbacks or silly temporal things like metrics, they can do
that in a custom weigher in the nova-scheduler and take the
performance hit there.
Best,
-jay
__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe:
openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
<http://openstack-dev-requ...@lists.openstack.org?subject:unsubscribe>
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
<http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev>
--
Zhipeng (Howard) Huang
Standard Engineer
IT Standard & Patent/IT Product Line
Huawei Technologies Co,. Ltd
Email: huangzhip...@huawei.com
Office: Huawei Industrial Base, Longgang, Shenzhen
(Previous)
Research Assistant
Mobile Ad-Hoc Network Lab, Calit2
University of California, Irvine
Email: zhipe...@uci.edu
Office: Calit2 Building Room 2402
OpenStack, OPNFV, OpenDaylight, OpenCompute Aficionado