Sorry, here is the regenerated nova-compute log after removing "1002:68c8" and restarting nova-compute: http://paste.openstack.org/show/qUCOX09jyeMydoYHc8Oz/
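A query along the following lines can be used to confirm what the database reports for the removed card (a minimal sketch only; it assumes the stock Mitaka pci_devices schema in the default "nova" MySQL database, with vendor/product IDs adjusted to the card in question):

    -- Sketch: inspect the leftover PCI device record for the removed GPU.
    -- Assumes the standard "nova" database and pci_devices table layout.
    SELECT id, compute_node_id, address, vendor_id, product_id,
           status, deleted, deleted_at
      FROM nova.pci_devices
     WHERE vendor_id = '1002'
       AND product_id = '68c8';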
2017-07-07 10:37 GMT+08:00 Eddie Yen <missile0...@gmail.com>:

> Hi Jay,
>
> Below are a few logs and pieces of information you may want to check.
>
> I wrote the GPU information into nova.conf like this:
>
> pci_passthrough_whitelist = [{ "product_id":"0ff3", "vendor_id":"10de" },
> { "product_id":"68c8", "vendor_id":"1002" }]
>
> pci_alias = [{ "product_id":"0ff3", "vendor_id":"10de", "device_type":
> "type-PCI", "name":"k420" }, { "product_id":"68c8", "vendor_id":"1002",
> "device_type":"type-PCI", "name":"v4800" }]
>
> Then I restarted the services.
>
> nova-compute log after inserting the new GPU device info into nova.conf
> and restarting the service:
> http://paste.openstack.org/show/z015rYGXaxYhVoafKdbx/
>
> Strangely, the log shows that the resource tracker only collects
> information for the newly installed GPU, not the old one.
>
> But if I perform actions on the instance that contains the old GPU, the
> tracker reports both GPUs:
> http://paste.openstack.org/show/614658/
>
> The Nova database shows correct information for both GPUs:
> http://paste.openstack.org/show/8JS0i6BMitjeBVRJTkRo/
>
> Now I removed device "1002:68c8" from nova.conf and from the compute
> node, and restarted the services.
>
> pci_passthrough_whitelist and pci_alias now only keep the "10de:0ff3"
> GPU info:
>
> pci_passthrough_whitelist = { "product_id":"0ff3", "vendor_id":"10de" }
>
> pci_alias = { "product_id":"0ff3", "vendor_id":"10de", "device_type":
> "type-PCI", "name":"k420" }
>
> The nova-compute log shows the resource tracker reporting that the node
> only has the "10de:0ff3" PCI resource:
> http://paste.openstack.org/show/VjLinsipne5nM8o0TYcJ/
>
> But in the Nova database, "1002:68c8" still exists and remains in
> "available" status, even though its "deleted" value is non-zero:
> http://paste.openstack.org/show/SnJ8AzJYD6wCo7jslIc2/
>
> Many thanks,
> Eddie.
>
> 2017-07-07 9:05 GMT+08:00 Eddie Yen <missile0...@gmail.com>:
>
>> Uh wait,
>>
>> Is it possible that it still shows as available because a PCI device
>> still exists at the same address?
>>
>> Because when I removed the GPU card, I replaced it with an SFP+ network
>> card in the same slot.
>> So when I run lspci, the SFP+ card sits at the same address.
>>
>> But it still doesn't make sense, because these two cards definitely
>> don't have the same VID:PID, and I set the information by VID:PID in
>> nova.conf.
>>
>> I'll try to reproduce this issue and post a log to this list.
>>
>> Thanks,
>>
>> 2017-07-07 9:01 GMT+08:00 Jay Pipes <jaypi...@gmail.com>:
>>
>>> Hmm, very odd indeed. Any way you can save the nova-compute logs from
>>> when you removed the GPU and restarted the nova-compute service and
>>> paste those logs to paste.openstack.org? Would be useful in tracking
>>> down this buggy behaviour...
>>>
>>> Best,
>>> -jay
>>>
>>> On 07/06/2017 08:54 PM, Eddie Yen wrote:
>>>
>>>> Hi Jay,
>>>>
>>>> The status of the "removed" GPU still shows as "available" in the
>>>> pci_devices table.
>>>>
>>>> 2017-07-07 8:34 GMT+08:00 Jay Pipes <jaypi...@gmail.com>:
>>>>
>>>> Hi again, Eddie :) Answer inline...
>>>>
>>>> On 07/06/2017 08:14 PM, Eddie Yen wrote:
>>>>
>>>> Hi everyone,
>>>>
>>>> I'm using OpenStack Mitaka (deployed from Fuel 9.2).
>>>>
>>>> At present, I have two different models of GPU card installed.
>>>>
>>>> I wrote their information into pci_alias and
>>>> pci_passthrough_whitelist in nova.conf on the controller and the
>>>> compute node (the node with the GPUs installed), then restarted
>>>> nova-api, nova-scheduler, and nova-compute.
>>>>
>>>> When I check the database, both GPUs are registered in the
>>>> pci_devices table.
>>>> Now I removed one of the GPUs from the compute node, removed its
>>>> information from nova.conf, and restarted the services.
>>>>
>>>> But when I check the database again, the information for the removed
>>>> card still exists in the pci_devices table.
>>>>
>>>> How can I fix this problem?
>>>>
>>>>
>>>> So, when you removed the GPU from the compute node and restarted the
>>>> nova-compute service, it *should* have noticed you had removed the
>>>> GPU and marked that PCI device as deleted. At least, according to
>>>> this code in the PCI manager:
>>>>
>>>> https://github.com/openstack/nova/blob/master/nova/pci/manager.py#L168-L183
>>>>
>>>> Question for you: what is the value of the status field in the
>>>> pci_devices table for the GPU that you removed?
>>>>
>>>> Best,
>>>> -jay
>>>>
>>>> p.s. If you really want to get rid of that device, simply remove
>>>> that record from the pci_devices table. But, again, it *should* be
>>>> removed automatically...
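For the manual cleanup suggested above, something along these lines should work (a sketch only; it assumes the default "nova" MySQL database, and it is worth backing up the database and confirming the row is not allocated to an instance before deleting it):

    -- Sketch: drop the stale record for the removed GPU, as suggested above.
    -- The status check is a safety guard so a device still attached to an
    -- instance is not removed by accident.
    DELETE FROM nova.pci_devices
     WHERE vendor_id = '1002'
       AND product_id = '68c8'
       AND status <> 'allocated';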
_______________________________________________
Mailing list: http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
Post to     : openstack@lists.openstack.org
Unsubscribe : http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack