Unfortunately, Eddie, I'm not entirely sure what is going on in your situation. According to the code, the no-longer-present PCI device should be removed from the pci_devices table once the PCI manager notices that the device is no longer on the local host...
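For reference, the reconciliation step that code performs works roughly like this (a simplified sketch in plain Python, not the actual Nova code; the real logic in nova/pci/manager.py works on device objects and also handles claimed/allocated state transitions):

```python
# Simplified sketch of PCI device reconciliation, NOT the real Nova code.
# Devices are keyed by PCI address; anything tracked but no longer reported
# by the hypervisor should be dropped (soft-deleted in the real database).

def reconcile(tracked, discovered):
    """tracked/discovered: dicts mapping PCI address -> device info dict.

    Returns the new tracked dict; devices missing from `discovered` are
    dropped unless they are still allocated to an instance.
    """
    result = {}
    for addr, dev in tracked.items():
        if addr in discovered:
            # Device still present at this address: refresh its info
            # (vendor/product may differ if a different card now sits
            # in the same slot).
            result[addr] = dict(dev, **discovered[addr])
        elif dev.get("status") == "allocated":
            # Still assigned to an instance: keep it, but flag it.
            result[addr] = dict(dev, status="removed")
        # else: drop it entirely (soft-delete in the real database).
    # Newly discovered devices become available.
    for addr, dev in discovered.items():
        if addr not in tracked:
            result[addr] = dict(dev, status="available")
    return result
```

The interesting case for this thread is an available (unallocated) device that disappears from the host: the sketch drops it, which is what the real code *should* do as well.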

On 07/09/2017 08:36 PM, Eddie Yen wrote:
Hi there,

Is the information above sufficient, or do you need additional items?

Thanks,
Eddie.

2017-07-07 10:49 GMT+08:00 Eddie Yen <missile0...@gmail.com>:

    Sorry,

    Here is the new nova-compute log after removing "1002:68c8" and
    restarting nova-compute:
    http://paste.openstack.org/show/qUCOX09jyeMydoYHc8Oz/

    2017-07-07 10:37 GMT+08:00 Eddie Yen <missile0...@gmail.com>:

        Hi Jay,

        Below are few logs and information you may want to check.



        I wrote the GPU information into nova.conf like this:

        pci_passthrough_whitelist = [{ "product_id":"0ff3",
        "vendor_id":"10de"}, { "product_id":"68c8", "vendor_id":"1002"}]

        pci_alias = [{ "product_id":"0ff3", "vendor_id":"10de",
        "device_type":"type-PCI", "name":"k420"}, { "product_id":"68c8",
        "vendor_id":"1002", "device_type":"type-PCI", "name":"v4800"}]
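        (These values are JSON lists, so a quick hypothetical sanity check
        before restarting the services is to parse them with any JSON
        parser, for example:)

```python
import json

# Whitelist and alias values copied verbatim from nova.conf.
whitelist = ('[{ "product_id":"0ff3", "vendor_id":"10de"},'
             ' { "product_id":"68c8", "vendor_id":"1002"}]')
aliases = ('[{ "product_id":"0ff3", "vendor_id":"10de",'
           ' "device_type":"type-PCI", "name":"k420"},'
           ' { "product_id":"68c8", "vendor_id":"1002",'
           ' "device_type":"type-PCI", "name":"v4800"}]')

devices = json.loads(whitelist)     # raises ValueError on a typo
alias_list = json.loads(aliases)

# Every whitelisted VID:PID should have a matching alias, and vice versa.
wl_ids = {(d["vendor_id"], d["product_id"]) for d in devices}
al_ids = {(a["vendor_id"], a["product_id"]) for a in alias_list}
assert wl_ids == al_ids, "whitelist and alias entries disagree"
```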


        Then I restarted the services.

        nova-compute log after inserting the new GPU device info into
        nova.conf and restarting the service:
        http://paste.openstack.org/show/z015rYGXaxYhVoafKdbx/

        The strange thing is that the log shows the resource tracker
        collecting information only for the newly installed GPU, not the
        old one.


        But if I perform some action on the instance that contains the
        old GPU, the tracker picks up both GPUs:
        http://paste.openstack.org/show/614658/

        The Nova database shows correct information for both GPUs:
        http://paste.openstack.org/show/8JS0i6BMitjeBVRJTkRo/



        Now I removed the "1002:68c8" card from the compute node and its
        entries from nova.conf, and restarted the services.

        The pci_passthrough_whitelist and pci_alias now keep only the
        "10de:0ff3" GPU info.

        pci_passthrough_whitelist = { "product_id":"0ff3",
        "vendor_id":"10de" }

        pci_alias = { "product_id":"0ff3", "vendor_id":"10de",
        "device_type":"type-PCI", "name":"k420" }


        The nova-compute log shows the resource tracker reporting that
        the node has only the "10de:0ff3" PCI resource:
        http://paste.openstack.org/show/VjLinsipne5nM8o0TYcJ/

        But in the Nova database, "1002:68c8" still exists and stays in
        "available" status, even though its "deleted" value is non-zero:
        http://paste.openstack.org/show/SnJ8AzJYD6wCo7jslIc2/


        Many thanks,
        Eddie.

        2017-07-07 9:05 GMT+08:00 Eddie Yen <missile0...@gmail.com>:

            Uh wait,

            Is it possible that it still shows as available because a
            PCI device still exists at the same address?

            When I removed the GPU card, I replaced it with an SFP+
            network card in the same slot, so when I run lspci the SFP+
            card sits at the same address.

            But that still doesn't make sense, because the two cards are
            definitely not the same VID:PID, and I set the information in
            nova.conf by VID:PID.


            I'll try to reproduce this issue and post a log to this list.

            Thanks,

            2017-07-07 9:01 GMT+08:00 Jay Pipes <jaypi...@gmail.com>:

                Hmm, very odd indeed. Could you save the nova-compute
                logs from when you removed the GPU and restarted the
                nova-compute service, and paste them to
                paste.openstack.org? That would be useful in tracking
                down this buggy behaviour...

                Best,
                -jay

                On 07/06/2017 08:54 PM, Eddie Yen wrote:

                    Hi Jay,

                    The status of the "removed" GPU still shows as
                    "Available" in the pci_devices table.

                    2017-07-07 8:34 GMT+08:00 Jay Pipes <jaypi...@gmail.com>:


                         Hi again, Eddie :) Answer inline...

                         On 07/06/2017 08:14 PM, Eddie Yen wrote:

                             Hi everyone,

                             I'm using OpenStack Mitaka (deployed from
                             Fuel 9.2).

                             At present, I have two different models of
                             GPU card installed.

                             I wrote their information into pci_alias
                             and pci_passthrough_whitelist in nova.conf
                             on the Controller and the Compute node (the
                             one with the GPUs installed), then
                             restarted nova-api, nova-scheduler, and
                             nova-compute.

                             When I checked the database, both GPUs were
                             registered in the pci_devices table.

                             Now I have removed one of the GPUs from the
                             compute node, removed its information from
                             nova.conf, and restarted the services.

                             But when I check the database again, the
                             information for the removed card still
                             exists in the pci_devices table.

                             How can I fix this problem?


                         So, when you removed the GPU from the compute
                         node and restarted the nova-compute service, it
                         *should* have noticed that the GPU was gone and
                         marked that PCI device as deleted. At least,
                         according to this code in the PCI manager:

                         https://github.com/openstack/nova/blob/master/nova/pci/manager.py#L168-L183

                         Question for you: what is the value of the
                    status field in the
                         pci_devices table for the GPU that you removed?

                         Best,
                         -jay

                         p.s. If you really want to get rid of that
                    device, simply remove
                         that record from the pci_devices table. But,
                    again, it *should* be
                         removed automatically...
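
                         If you do delete the row by hand, note that
                         Nova soft-deletes rows rather than dropping
                         them. A toy illustration of that pattern using
                         sqlite3 (hypothetical, heavily simplified
                         schema; the real pci_devices table has many
                         more columns):

```python
import sqlite3

# Toy stand-in for Nova's pci_devices table (heavily simplified schema).
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE pci_devices (
    id INTEGER PRIMARY KEY,
    address TEXT, vendor_id TEXT, product_id TEXT,
    status TEXT, deleted INTEGER DEFAULT 0)""")
conn.execute("INSERT INTO pci_devices (address, vendor_id, product_id, status) "
             "VALUES ('0000:03:00.0', '1002', '68c8', 'available')")

# Soft delete in the Nova style: mark the row deleted instead of removing it.
conn.execute("UPDATE pci_devices SET deleted = id, status = 'deleted' "
             "WHERE vendor_id = '1002' AND product_id = '68c8' AND deleted = 0")
conn.commit()

# Queries that filter on deleted = 0 no longer see the stale device.
rows = conn.execute(
    "SELECT address FROM pci_devices WHERE deleted = 0").fetchall()
print(rows)  # -> []
```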

                         _______________________________________________
                         Mailing list: http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
                         Post to     : openstack@lists.openstack.org
                         Unsubscribe : http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack






