Hi there,

Is the information above enough, or do you need any additional items?
Thanks,
Eddie.

2017-07-07 10:49 GMT+08:00 Eddie Yen <missile0...@gmail.com>:

> Sorry,
>
> Here is the renewed nova-compute log after removing "1002:68c8" and
> restarting nova-compute:
> http://paste.openstack.org/show/qUCOX09jyeMydoYHc8Oz/
>
> 2017-07-07 10:37 GMT+08:00 Eddie Yen <missile0...@gmail.com>:
>
>> Hi Jay,
>>
>> Below are a few logs and pieces of information you may want to check.
>>
>> I wrote the GPU information into nova.conf like this:
>>
>> pci_passthrough_whitelist = [{ "product_id":"0ff3", "vendor_id":"10de" },
>> { "product_id":"68c8", "vendor_id":"1002" }]
>>
>> pci_alias = [{ "product_id":"0ff3", "vendor_id":"10de",
>> "device_type":"type-PCI", "name":"k420" }, { "product_id":"68c8",
>> "vendor_id":"1002", "device_type":"type-PCI", "name":"v4800" }]
>>
>> Then I restarted the services.
>>
>> nova-compute log after inserting the new GPU device info into nova.conf
>> and restarting the service:
>> http://paste.openstack.org/show/z015rYGXaxYhVoafKdbx/
>>
>> The strange thing is that the log shows the resource tracker only
>> collecting information about the newly added GPU, not the old one.
>>
>> But if I perform some action on the instance that holds the old GPU, the
>> tracker picks up both GPUs:
>> http://paste.openstack.org/show/614658/
>>
>> The Nova database shows correct information for both GPUs:
>> http://paste.openstack.org/show/8JS0i6BMitjeBVRJTkRo/
>>
>> Now I removed the "1002:68c8" device from nova.conf and from the compute
>> node, and restarted the services.
>>
>> pci_passthrough_whitelist and pci_alias now only keep the "10de:0ff3"
>> GPU info:
>>
>> pci_passthrough_whitelist = { "product_id":"0ff3", "vendor_id":"10de" }
>>
>> pci_alias = { "product_id":"0ff3", "vendor_id":"10de",
>> "device_type":"type-PCI", "name":"k420" }
>>
>> The nova-compute log shows the resource tracker reporting that the node
>> only has the "10de:0ff3" PCI resource:
>> http://paste.openstack.org/show/VjLinsipne5nM8o0TYcJ/
>>
>> But in the Nova database, "1002:68c8" still exists and stays in
>> "Available" status, even though the "deleted" value is not zero:
>> http://paste.openstack.org/show/SnJ8AzJYD6wCo7jslIc2/
>>
>> Many thanks,
>> Eddie.
>>
>> 2017-07-07 9:05 GMT+08:00 Eddie Yen <missile0...@gmail.com>:
>>
>>> Uh, wait.
>>>
>>> Is it possible that it still shows available because another PCI device
>>> now exists at the same address?
>>>
>>> When I removed the GPU card, I replaced it with an SFP+ network card in
>>> the same slot, so when I run lspci the SFP+ card sits at the same
>>> address.
>>>
>>> But that still doesn't make much sense, because the two cards are
>>> definitely not the same VID:PID, and I set the information in nova.conf
>>> by VID:PID.
>>>
>>> I'll try to reproduce this issue and put a log on this list.
>>>
>>> Thanks,
>>>
>>> 2017-07-07 9:01 GMT+08:00 Jay Pipes <jaypi...@gmail.com>:
>>>
>>>> Hmm, very odd indeed. Any way you can save the nova-compute logs from
>>>> when you removed the GPU and restarted the nova-compute service and
>>>> paste those logs to paste.openstack.org? Would be useful in tracking
>>>> down this buggy behaviour...
>>>>
>>>> Best,
>>>> -jay
>>>>
>>>> On 07/06/2017 08:54 PM, Eddie Yen wrote:
>>>>
>>>>> Hi Jay,
>>>>>
>>>>> The status of the "removed" GPU still shows as "Available" in the
>>>>> pci_devices table.
>>>>>
>>>>> 2017-07-07 8:34 GMT+08:00 Jay Pipes <jaypi...@gmail.com>:
>>>>>
>>>>> Hi again, Eddie :) Answer inline...
>>>>>
>>>>> On 07/06/2017 08:14 PM, Eddie Yen wrote:
>>>>>
>>>>>     Hi everyone,
>>>>>
>>>>>     I'm using the OpenStack Mitaka version (deployed from Fuel 9.2).
>>>>>
>>>>>     At present I have two GPU cards of different models installed,
>>>>>     and I wrote their information into pci_alias and
>>>>>     pci_passthrough_whitelist in nova.conf on the Controller and the
>>>>>     Compute node (the node with the GPUs installed), then restarted
>>>>>     nova-api, nova-scheduler, and nova-compute.
>>>>>
>>>>>     When I checked the database, both GPUs were registered in the
>>>>>     pci_devices table.
>>>>>
>>>>>     Now I have removed one of the GPUs from the compute node, removed
>>>>>     its information from nova.conf, and restarted the services.
>>>>>
>>>>>     But when I check the database again, the information for the
>>>>>     removed card still exists in the pci_devices table.
>>>>>
>>>>>     How can I fix this problem?
>>>>>
>>>>> So, when you removed the GPU from the compute node and restarted the
>>>>> nova-compute service, it *should* have noticed you had removed the
>>>>> GPU and marked that PCI device as deleted. At least, according to
>>>>> this code in the PCI manager:
>>>>>
>>>>> https://github.com/openstack/nova/blob/master/nova/pci/manager.py#L168-L183
>>>>>
>>>>> Question for you: what is the value of the status field in the
>>>>> pci_devices table for the GPU that you removed?
>>>>>
>>>>> Best,
>>>>> -jay
>>>>>
>>>>> p.s. If you really want to get rid of that device, simply remove that
>>>>> record from the pci_devices table. But, again, it *should* be removed
>>>>> automatically...
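
For anyone else following this thread, here is my rough reading of the
reconciliation step in the manager.py code Jay linked above, written as a
small standalone Python sketch. This is a paraphrase for discussion only,
not the actual Mitaka source; the class, method, and status names below are
placeholders I made up.

# Simplified, standalone sketch of the reconciliation the PCI tracker is
# expected to do when nova-compute starts (paraphrase, not the real code).

class PciDevice(object):
    def __init__(self, address, vendor_id, product_id, status="available"):
        self.address = address
        self.vendor_id = vendor_id
        self.product_id = product_id
        self.status = status
        self.deleted = False

    def remove(self):
        # In Nova this would flip the pci_devices row to a removed/deleted
        # state instead of just mutating an in-memory object.
        self.status = "removed"
        self.deleted = True


def sync_devices(tracked, reported_addresses):
    """Mark tracked devices that the hypervisor no longer reports.

    tracked: PciDevice objects currently known (i.e. rows in pci_devices).
    reported_addresses: PCI addresses the host currently exposes and that
    pass the pci_passthrough_whitelist.
    """
    for dev in tracked:
        if dev.address in reported_addresses:
            continue
        if dev.status in ("claimed", "allocated"):
            # Still attached to an instance: only warn and defer removal
            # until the instance releases the device.
            print("WARNING: %s is in use, deferring removal" % dev.address)
        else:
            dev.remove()


# Toy run: the 1002:68c8 card was pulled, so only 10de:0ff3 is reported.
devs = [PciDevice("0000:03:00.0", "10de", "0ff3"),
        PciDevice("0000:04:00.0", "1002", "68c8")]
sync_devices(devs, {"0000:03:00.0"})
for d in devs:
    print("%s:%s status=%s deleted=%s"
          % (d.vendor_id, d.product_id, d.status, d.deleted))

The open question from the thread is whether the real code ever reaches the
remove() branch for the pulled card; the nova-compute logs pasted above
should show which path it actually takes.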
_______________________________________________
Mailing list: http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
Post to     : openstack@lists.openstack.org
Unsubscribe : http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack