Roger that. I'll probably report this bug on the OpenStack Compute (Nova) Launchpad and see what happens.
Anyway, thanks for your help, I really appreciate it.

Eddie.

2017-07-11 8:12 GMT+08:00 Jay Pipes <jaypi...@gmail.com>:

> Unfortunately, Eddie, I'm not entirely sure what is going on with your
> situation. According to the code, the non-existing PCI device should be
> removed from the pci_devices table when the PCI manager notices the PCI
> device is no longer on the local host...
>
> On 07/09/2017 08:36 PM, Eddie Yen wrote:
>
>> Hi there,
>>
>> Is the information enough, or do you need additional items?
>>
>> Thanks,
>> Eddie.
>>
>> 2017-07-07 10:49 GMT+08:00 Eddie Yen <missile0...@gmail.com>:
>>
>> Sorry,
>>
>> Here is the updated nova-compute log after removing "1002:68c8" and
>> restarting nova-compute.
>> http://paste.openstack.org/show/qUCOX09jyeMydoYHc8Oz/
>>
>> 2017-07-07 10:37 GMT+08:00 Eddie Yen <missile0...@gmail.com>:
>>
>> Hi Jay,
>>
>> Below are a few logs and some information you may want to check.
>>
>> I wrote the GPU information into nova.conf like this:
>>
>> pci_passthrough_whitelist = [{ "product_id":"0ff3", "vendor_id":"10de"}, { "product_id":"68c8", "vendor_id":"1002"}]
>>
>> pci_alias = [{ "product_id":"0ff3", "vendor_id":"10de", "device_type":"type-PCI", "name":"k420"}, { "product_id":"68c8", "vendor_id":"1002", "device_type":"type-PCI", "name":"v4800"}]
>>
>> Then I restarted the services.
>>
>> nova-compute log after inserting the new GPU device info into
>> nova.conf and restarting the service:
>> http://paste.openstack.org/show/z015rYGXaxYhVoafKdbx/
>>
>> The strange thing is that the log shows the resource tracker only
>> collecting information about the newly set up GPU, not the old one.
>>
>> But if I perform some action on the instance that holds the old GPU,
>> the tracker picks up both GPUs.
>> http://paste.openstack.org/show/614658/
>>
>> The Nova database shows correct information for both GPUs:
>> http://paste.openstack.org/show/8JS0i6BMitjeBVRJTkRo/
>>
>> Now remove ID "1002:68c8" from nova.conf and the compute node, and
>> restart the services.
>>
>> The pci_passthrough_whitelist and pci_alias only keep the "10de:0ff3"
>> GPU info.
>>
>> pci_passthrough_whitelist = { "product_id":"0ff3", "vendor_id":"10de" }
>>
>> pci_alias = { "product_id":"0ff3", "vendor_id":"10de", "device_type":"type-PCI", "name":"k420" }
>>
>> The nova-compute log shows the resource tracker reporting that the
>> node only has the "10de:0ff3" PCI resource:
>> http://paste.openstack.org/show/VjLinsipne5nM8o0TYcJ/
>>
>> But in the Nova database, "1002:68c8" still exists and stays in
>> "Available" status. Even the "deleted" value shows not zero.
>> http://paste.openstack.org/show/SnJ8AzJYD6wCo7jslIc2/
>>
>> Many thanks,
>> Eddie.
>>
>> 2017-07-07 9:05 GMT+08:00 Eddie Yen <missile0...@gmail.com>:
>>
>> Uh wait,
>>
>> Is it possible that it still shows as available if a PCI device
>> still exists at the same address?
>>
>> Because when I removed the GPU card, I replaced it with an SFP+
>> network card in the same slot.
>> So when I run lspci, the SFP+ card sits at the same address.
>>
>> But it still doesn't make any sense, because these two cards
>> definitely do not have the same VID:PID.
>> And I set the information as VID:PID in nova.conf.
>>
>> I'll try to reproduce this issue and post a log to this list.
>>
>> Thanks,
>>
>> 2017-07-07 9:01 GMT+08:00 Jay Pipes <jaypi...@gmail.com>:
>>
>> Hmm, very odd indeed. Any way you can save the nova-compute logs
>> from when you removed the GPU and restarted the nova-compute service
>> and paste those logs to paste.openstack.org? Would be useful in
>> tracking down this buggy behaviour...
>>
>> Best,
>> -jay
>>
>> On 07/06/2017 08:54 PM, Eddie Yen wrote:
>>
>> Hi Jay,
>>
>> The status of the "removed" GPU still shows as "Available" in the
>> pci_devices table.
>>
>> 2017-07-07 8:34 GMT+08:00 Jay Pipes <jaypi...@gmail.com>:
>>
>> Hi again, Eddie :) Answer inline...
>>
>> On 07/06/2017 08:14 PM, Eddie Yen wrote:
>>
>> Hi everyone,
>>
>> I'm using OpenStack Mitaka (deployed from Fuel 9.2).
>>
>> At present, I have installed two different models of GPU card.
>>
>> I wrote their information into pci_alias and
>> pci_passthrough_whitelist in nova.conf on the Controller and the
>> Compute node (the node with the GPUs installed).
>> Then I restarted nova-api, nova-scheduler, and nova-compute.
>>
>> When I checked the database, both GPUs were registered in the
>> pci_devices table.
>>
>> Now I have removed one of the GPUs from the compute node, removed
>> its information from nova.conf, and restarted the services.
>>
>> But when I check the database again, the information for the removed
>> card still exists in the pci_devices table.
>>
>> What can I do to fix this problem?
>>
>> So, when you removed the GPU from the compute node and restarted the
>> nova-compute service, it *should* have noticed you had removed the
>> GPU and marked that PCI device as deleted. At least, according to
>> this code in the PCI manager:
>>
>> https://github.com/openstack/nova/blob/master/nova/pci/manager.py#L168-L183
>>
>> Question for you: what is the value of the status field in the
>> pci_devices table for the GPU that you removed?
>>
>> Best,
>> -jay
>>
>> p.s. If you really want to get rid of that device, simply remove
>> that record from the pci_devices table. But, again, it *should* be
>> removed automatically...
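For anyone comparing nova.conf against the pci_devices table by hand, the check described in this thread can be sketched roughly as below. This is only an illustration, not Nova's own code: the helper names are hypothetical, the row dictionaries are simplified stand-ins for pci_devices records, and the PCI addresses are made up. Only the whitelist value and the two vendor:product IDs come from the messages above.

```python
import json


def whitelisted_ids(raw_value):
    """Return the set of (vendor_id, product_id) pairs in a whitelist value."""
    parsed = json.loads(raw_value)
    # The thread shows both a JSON list and a single JSON object, so accept both.
    entries = parsed if isinstance(parsed, list) else [parsed]
    return {(e["vendor_id"], e["product_id"]) for e in entries}


def stale_rows(rows, raw_whitelist):
    """Return rows whose vendor:product pair is no longer whitelisted."""
    allowed = whitelisted_ids(raw_whitelist)
    return [r for r in rows if (r["vendor_id"], r["product_id"]) not in allowed]


# Whitelist value after the 1002:68c8 card was removed (taken from the thread).
whitelist = '{ "product_id":"0ff3", "vendor_id":"10de" }'

# Hypothetical, simplified stand-ins for pci_devices rows; addresses are made up.
rows = [
    {"address": "0000:03:00.0", "vendor_id": "10de", "product_id": "0ff3",
     "status": "available"},
    {"address": "0000:04:00.0", "vendor_id": "1002", "product_id": "68c8",
     "status": "available"},
]

for row in stale_rows(rows, whitelist):
    print("stale: {vendor_id}:{product_id} at {address} ({status})".format(**row))
```

Any row this reports would be a candidate for the manual cleanup Jay mentions in his p.s., after confirming that the device is really gone from the host.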
_______________________________________________
Mailing list: http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
Post to     : openstack@lists.openstack.org
Unsubscribe : http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack