Sorry for the delayed response. I broadly agree with previous replies.

For the concerns about the impact of a Cyborg weigher on scheduling performance, there are some options (apart from filtering candidates as much as possible in Placement):

* Handle hosts in bulk by extending BaseWeigher <https://github.com/openstack/nova/blob/master/nova/weights.py#L67> and overriding weigh_objects() <https://github.com/openstack/nova/blob/master/nova/weights.py#L92>, instead of handling one host at a time.
* If we have to handle one host at a time for whatever reason, the weigher could query the Cyborg DB directly rather than go through the Cyborg REST API, since the weigher is maintained by Cyborg. This would be similar to other weighers.
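
To make the first option concrete, here is a minimal sketch of a weigher that overrides weigh_objects() so that all hosts are resolved in one lookup. It does not import Nova: BaseWeigher, WeighedObject and HostState below are simplified stand-ins for the Nova classes, and FakeCyborgClient with its free_function_counts() method is purely hypothetical, as is passing the requested function in a plain dict.

```python
from dataclasses import dataclass


@dataclass
class HostState:
    """Minimal stand-in for a scheduler host state (assumption)."""
    host: str


@dataclass
class WeighedObject:
    """Stand-in for nova.weights.WeighedObject: wraps one host state."""
    obj: HostState
    weight: float = 0.0


class BaseWeigher:
    """Simplified stand-in exposing the two hooks discussed above."""

    def _weigh_object(self, obj, weight_properties):
        raise NotImplementedError

    def weigh_objects(self, weighed_obj_list, weight_properties):
        # Default behaviour: one _weigh_object() call per host.
        return [self._weigh_object(w.obj, weight_properties)
                for w in weighed_obj_list]


class FakeCyborgClient:
    """Hypothetical Cyborg lookup; counts how many queries were issued."""

    def __init__(self, free_functions):
        self._free = free_functions  # {host: {function: free count}}
        self.calls = 0

    def free_function_counts(self, hosts, function):
        self.calls += 1
        return {h: self._free.get(h, {}).get(function, 0) for h in hosts}


class CyborgBulkWeigher(BaseWeigher):
    """Weighs every candidate host with a single Cyborg query."""

    def __init__(self, client):
        self.client = client

    def _weigh_object(self, obj, weight_properties):
        # Per-host fallback; it still goes through the bulk lookup.
        counts = self.client.free_function_counts(
            [obj.host], weight_properties["function"])
        return float(counts[obj.host])

    def weigh_objects(self, weighed_obj_list, weight_properties):
        hosts = [w.obj.host for w in weighed_obj_list]
        counts = self.client.free_function_counts(
            hosts, weight_properties["function"])
        # Hosts with more free instances of the function weigh more.
        return [float(counts[h]) for h in hosts]


client = FakeCyborgClient(
    {"host1": {"A": 2}, "host2": {"A": 0}, "host3": {"A": 1}})
weigher = CyborgBulkWeigher(client)
candidates = [WeighedObject(HostState(h)) for h in ("host1", "host2", "host3")]
weights = weigher.weigh_objects(candidates, {"function": "A"})
print(weights, "lookups:", client.calls)
```

The point of the sketch is only that the number of Cyborg lookups stays at one regardless of how many hosts Placement returns.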

Given these and other possible optimizations, it may be too soon to worry about the performance impact.

I am working on a spec that will capture the flow discussed in the PTG. I will try to address these aspects as well.

Thanks & Regards,
Sundar

On 3/8/2018 4:53 AM, Zhipeng Huang wrote:
@jay I'm also against a weigher in nova/placement. This should be an optional step that depends on the vendor implementation, not a default one.

@Alex I think we should explore the idea of preferred trait.

@Matthew: As Sean said, Cyborg wants to support both reprogrammable FPGAs and pre-programmed ones. So your description is correct: the programming operation should be a call from Nova to Cyborg, and Cyborg will complete the operation while Nova waits. The only problem is that the weigher step should be optional.


On Wed, Mar 7, 2018 at 9:21 PM, Jay Pipes <jaypi...@gmail.com> wrote:

    On 03/06/2018 09:36 PM, Alex Xu wrote:

        2018-03-07 10:21 GMT+08:00 Alex Xu <sou...@gmail.com>:

            2018-03-06 22:45 GMT+08:00 Mooney, Sean K <sean.k.moo...@intel.com>:

                From: Matthew Booth [mailto:mbo...@redhat.com]
                Sent: Saturday, March 3, 2018 4:15 PM
                To: OpenStack Development Mailing List (not for usage
                questions) <openstack-dev@lists.openstack.org>
                Subject: Re: [openstack-dev] [Nova] [Cyborg] Tracking multiple
                functions

                On 2 March 2018 at 14:31, Jay Pipes <jaypi...@gmail.com> wrote:

                    On 03/02/2018 02:00 PM, Nadathur, Sundar wrote:

                        Hello Nova team,

                        During the Cyborg discussion at the Rocky PTG, we
                        proposed a flow for FPGAs wherein the request spec asks
                        for a device type as a resource class, and optionally a
                        function (such as encryption) in the extra specs. This
                        does not seem to work well for the usage model that I’ll
                        describe below.

                        An FPGA device may implement more than one function. For
                        example, it may implement both compression and
                        encryption. Say a cluster has 10 devices of device type
                        X, and each of them is programmed to offer 2 instances
                        of function A and 4 instances of function B. More
                        specifically, the device may implement 6 PCI functions,
                        with 2 of them tied to function A, and the other 4 tied
                        to function B. So, we could have 6 separate instances
                        accessing functions on the same device.

                Does this imply that Cyborg can't reprogram the FPGA at all?

                [Mooney, Sean K] Cyborg is intended to support fixed-function
                accelerators also, so it will not always be able to program the
                accelerator. In this case, where an FPGA is pre-programmed with
                a multi-function bitstream that is statically provisioned,
                Cyborg will not be able to reprogram the slot if any of the
                functions from that slot are already allocated to an instance.
                In this case it will have to treat it like a fixed-function
                device and simply allocate an unused VF of the correct type if
                available.


                        In the current flow, the device type X is modeled as a
                        resource class, so Placement will count how many of them
                        are in use. A flavor for ‘RC device-type-X + function A’
                        will consume one instance of the RC device-type-X. But
                        this is not right because this precludes other functions
                        on the same device instance from getting used.

                        One way to solve this is to declare functions A and B as
                        resource classes themselves and have the flavor request
                        the function RC. Placement will then correctly count the
                        function instances. However, there is still a problem:
                        if the requested function A is not available, Placement
                        will return an empty list of RPs, but we need some way
                        to reprogram some device to create an instance of
                        function A.
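
A small worked example of the counting difference, using the numbers from the example above (10 devices, each offering 2 instances of function A and 4 of function B). The variable names are illustrative only and do not correspond to any Placement API.

```python
# Numbers taken from the example in the text.
DEVICES = 10
FUNCTIONS_PER_DEVICE = {"A": 2, "B": 4}

# Model 1: the device type is the resource class, one unit per device.
# Every instance consumes a whole device, whichever function it uses.
device_rc_inventory = DEVICES
max_instances_device_rc = device_rc_inventory

# Model 2: each function is its own resource class, so the inventory is
# the number of function instances summed over all devices.
function_rc_inventory = {
    name: per_device * DEVICES
    for name, per_device in FUNCTIONS_PER_DEVICE.items()
}
max_instances_function_rc = sum(function_rc_inventory.values())

print("device RC model:", max_instances_device_rc, "instances")
print("function RC model:", function_rc_inventory,
      "->", max_instances_function_rc, "instances")
```

With the device type as the resource class the cluster is exhausted after 10 instances, whereas counting functions allows 20 uses of A and 40 of B.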


                    Clearly, nova is not going to be reprogramming devices with
                    an instance of a particular function.

                    Cyborg might need to have a separate agent that listens to
                    the nova notifications queue. Upon seeing an event that
                    indicates a failed build due to lack of resources, Cyborg
                    can try to reprogram a device and then try rebuilding the
                    original request.

                It was my understanding from that discussion that we intend to
                insert Cyborg into the spawn workflow for device configuration
                in the same way that we currently insert resources provided by
                Cinder and Neutron. So while Nova won't be reprogramming a
                device, it will be calling out to Cyborg to reprogram a device,
                and waiting while that happens.

                My understanding is (and I concede some areas are a little
                hazy):

                * The flavor says device type X with function Y

                * Placement tells us everywhere with device type X

                * A weigher orders these by devices which already have an
                  available function Y (where is this metadata stored?)

                * Nova schedules to host Z

                * Nova host Z asks Cyborg for a local function Y and blocks

                   * Cyborg hopefully returns function Y which is already
                     available

                   * If not, Cyborg reprograms a function Y, then returns it

                Can anybody correct me/fill in the gaps?
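
The steps above can be sketched roughly as follows. This is a toy model under stated assumptions, not Nova or Cyborg code: hosts are plain dicts, and candidates_with_device(), order_by_available_function(), cyborg_get_function() and schedule() are hypothetical helpers standing in for Placement, the weigher, Cyborg and the scheduler respectively.

```python
def candidates_with_device(hosts, device_type):
    """Placement step: every host that has the requested device type."""
    return [h for h in hosts if device_type in h["devices"]]


def order_by_available_function(hosts, function):
    """Weigher step: hosts with a free instance of the function first."""
    return sorted(hosts,
                  key=lambda h: h["free_functions"].get(function, 0),
                  reverse=True)


def cyborg_get_function(host, function):
    """Cyborg step: hand out a free function, else reprogram a device."""
    if host["free_functions"].get(function, 0) > 0:
        host["free_functions"][function] -= 1
        return "existing"
    # No free instance: the caller blocks while the device is reprogrammed.
    return "reprogrammed"


def schedule(hosts, device_type, function):
    """End-to-end flow: filter, weigh, pick a host, ask Cyborg."""
    candidates = candidates_with_device(hosts, device_type)
    if not candidates:
        return None
    chosen = order_by_available_function(candidates, function)[0]
    return chosen["name"], cyborg_get_function(chosen, function)


hosts = [
    {"name": "z1", "devices": {"X"}, "free_functions": {"Y": 0}},
    {"name": "z2", "devices": {"X"}, "free_functions": {"Y": 1}},
    {"name": "z3", "devices": set(), "free_functions": {}},
]
first = schedule(hosts, "X", "Y")
second = schedule(hosts, "X", "Y")
print(first, second)
```

In this toy run the first request lands on the host that already has function Y free, and the second request, with no free instance left anywhere, falls back to reprogramming.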

                [Mooney, Sean K] That correlates closely to my recollection
                also. As for the metadata, I think the weigher may need to call
                Cyborg to retrieve it, as it will not be available in the
                host state object.

            Is it the nova scheduler weigher, or do we want to support weighing
            in Placement? Functions are traits, I think, so can we have
            preferred_traits? I remember we talked about that parameter in the
            past, but we didn't have a good use case at that time. This is a
            good use case.


        If we call Cyborg from the nova scheduler weigher, that will also
        slow down scheduling a lot.


    Right, which is why I don't want to do any weighing in Placement
    at all. If folks want to sort by things that require long-running
    code/callbacks or silly temporal things like metrics, they can do
    that in a custom weigher in the nova-scheduler and take the
    performance hit there.

    Best,
    -jay


    __________________________________________________________________________
    OpenStack Development Mailing List (not for usage questions)
    Unsubscribe:
    openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
    <http://openstack-dev-requ...@lists.openstack.org?subject:unsubscribe>
    http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
    <http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev>




--
Zhipeng (Howard) Huang

Standard Engineer
IT Standard & Patent/IT Product Line
Huawei Technologies Co,. Ltd
Email: huangzhip...@huawei.com <mailto:huangzhip...@huawei.com>
Office: Huawei Industrial Base, Longgang, Shenzhen

(Previous)
Research Assistant
Mobile Ad-Hoc Network Lab, Calit2
University of California, Irvine
Email: zhipe...@uci.edu <mailto:zhipe...@uci.edu>
Office: Calit2 Building Room 2402

OpenStack, OPNFV, OpenDaylight, OpenCompute Aficionado


