Unfortunately, Eddie, I'm not entirely sure what is going on with your
situation. According to the code, the non-existing PCI device should be
removed from the pci_devices table when the PCI manager notices the PCI
device is no longer on the local host...
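
To make the expected behaviour concrete, here is a minimal sketch of that kind of reconciliation. It is not nova's actual code: the TrackedDevice class, the reconcile() helper, and the "removed" status string are hypothetical names for illustration, and matching by PCI address is an assumption.

```python
from dataclasses import dataclass


@dataclass
class TrackedDevice:
    """Hypothetical stand-in for one row of the pci_devices table."""
    address: str
    vendor_id: str
    product_id: str
    status: str = "available"


def reconcile(tracked, host_devices, whitelist):
    """Flag tracked devices that are no longer reported as whitelisted.

    tracked      -- list of TrackedDevice (what the database knows about)
    host_devices -- list of dicts describing what the host reports now
    whitelist    -- list of dicts such as {"vendor_id": ..., "product_id": ...}
    """
    # Addresses of host devices that match at least one whitelist rule.
    present = {
        dev["address"]
        for dev in host_devices
        if any(
            all(dev.get(key) == value for key, value in rule.items())
            for rule in whitelist
        )
    }
    for dev in tracked:
        # Assumption: only unassigned devices are flagged in this sketch.
        if dev.address not in present and dev.status == "available":
            dev.status = "removed"
    return tracked
```

In this sketch, a different card sitting at the old address does not keep the old record alive, because only whitelisted host devices count as present.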
On 07/09/2017 08:36 PM, Eddie Yen wrote:
Hi there,
Is the information I sent enough, or do you need anything else?
Thanks,
Eddie.
2017-07-07 10:49 GMT+08:00 Eddie Yen <missile0...@gmail.com>:
Sorry,
Here is the updated nova-compute log, taken after removing "1002:68c8"
and restarting nova-compute.
http://paste.openstack.org/show/qUCOX09jyeMydoYHc8Oz/
2017-07-07 10:37 GMT+08:00 Eddie Yen <missile0...@gmail.com>:
Hi Jay,
Below are a few logs and some information you may want to check.
I wrote the GPU information into nova.conf like this:
pci_passthrough_whitelist = [{ "product_id":"0ff3",
"vendor_id":"10de"}, { "product_id":"68c8", "vendor_id":"1002"}]
pci_alias = [{ "product_id":"0ff3", "vendor_id":"10de",
"device_type":"type-PCI", "name":"k420"}, { "product_id":"68c8",
"vendor_id":"1002", "device_type":"type-PCI", "name":"v4800"}]
Then I restarted the services.
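
For anyone checking their own entries, here is a small sketch of how such values could be parsed and matched against a device. The helper names load_rules() and is_whitelisted() are hypothetical, and accepting either a single JSON object or a list of objects is an assumption based on the two forms used in this thread; it does not reproduce nova's own parser.

```python
import json


def load_rules(raw):
    """Parse a whitelist value given as one JSON object or a list of them."""
    parsed = json.loads(raw)
    return parsed if isinstance(parsed, list) else [parsed]


def is_whitelisted(device, rules):
    """Return True if the device matches every key of at least one rule."""
    return any(
        all(device.get(key) == value for key, value in rule.items())
        for rule in rules
    )
```

For example, is_whitelisted({"vendor_id": "1002", "product_id": "68c8"}, rules) should be true with the two-entry list above and false once only the "10de:0ff3" entry is left.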
nova-compute log after adding the new GPU device info to nova.conf
and restarting the service:
http://paste.openstack.org/show/z015rYGXaxYhVoafKdbx/
Strangely, the log shows that the resource tracker only collects
information for the newly added GPU, not the old one.
But if I perform some action on the instance that uses the old GPU,
the tracker picks up both GPUs.
http://paste.openstack.org/show/614658/
The Nova database shows correct information for both GPUs:
http://paste.openstack.org/show/8JS0i6BMitjeBVRJTkRo/
Next, I removed ID "1002:68c8" from nova.conf and the card from the
compute node, and restarted the services.
pci_passthrough_whitelist and pci_alias now keep only the
"10de:0ff3" GPU info:
pci_passthrough_whitelist = { "product_id":"0ff3",
"vendor_id":"10de" }
pci_alias = { "product_id":"0ff3", "vendor_id":"10de",
"device_type":"type-PCI", "name":"k420" }
The nova-compute log shows the resource tracker reporting that the
node only has the "10de:0ff3" PCI resource:
http://paste.openstack.org/show/VjLinsipne5nM8o0TYcJ/
But in the Nova database, "1002:68c8" still exists and remains in
"Available" status, even though its "deleted" value is non-zero.
http://paste.openstack.org/show/SnJ8AzJYD6wCo7jslIc2/
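
One way to read that row, assuming the usual soft-delete convention in which a non-zero "deleted" column marks a record as deleted regardless of its "status" column, is to filter on deleted = 0. The sketch below uses an in-memory SQLite table with a simplified, hypothetical schema and made-up ids; it is not the real nova schema or data.

```python
import sqlite3

# Simplified stand-in for the pci_devices table (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE pci_devices ("
    "id INTEGER PRIMARY KEY, vendor_id TEXT, product_id TEXT, "
    "status TEXT, deleted INTEGER DEFAULT 0)"
)
conn.executemany(
    "INSERT INTO pci_devices (id, vendor_id, product_id, status, deleted) "
    "VALUES (?, ?, ?, ?, ?)",
    [
        (1, "10de", "0ff3", "available", 0),
        # Non-zero "deleted": treated as soft-deleted in this sketch,
        # even though "status" still reads "available".
        (2, "1002", "68c8", "available", 2),
    ],
)


def active_devices(connection):
    """Return vendor:product strings for rows that are not soft-deleted."""
    rows = connection.execute(
        "SELECT vendor_id, product_id FROM pci_devices "
        "WHERE deleted = 0 ORDER BY id"
    )
    return [f"{vendor}:{product}" for vendor, product in rows]
```

Under that assumption, the "1002:68c8" row would already be ignored by anything that only looks at non-deleted records.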
Many thanks,
Eddie.
2017-07-07 9:05 GMT+08:00 Eddie Yen <missile0...@gmail.com>:
Uh, wait,
Is it possible that the device still shows as available because a
PCI device still exists at the same address?
When I removed the GPU card, I replaced it with an SFP+ network card
in the same slot.
So when I run lspci, the SFP+ card sits at the same address.
But that still doesn't make sense, because the two cards definitely
do not share the same VID:PID,
and I specified the devices by VID:PID in nova.conf.
I'll try to reproduce this issue and post a log to this list.
Thanks,
2017-07-07 9:01 GMT+08:00 Jay Pipes <jaypi...@gmail.com>:
Hmm, very odd indeed. Any way you can save the
nova-compute logs from when you removed the GPU and
restarted the nova-compute service and paste those logs
to paste.openstack.org?
Would be useful in tracking down this buggy behaviour...
Best,
-jay
On 07/06/2017 08:54 PM, Eddie Yen wrote:
Hi Jay,
The status of the "removed" GPU still shows as
"Available" in pci_devices table.
2017-07-07 8:34 GMT+08:00 Jay Pipes <jaypi...@gmail.com>:
Hi again, Eddie :) Answer inline...
On 07/06/2017 08:14 PM, Eddie Yen wrote:
Hi everyone,
I'm using OpenStack Mitaka (deployed from Fuel 9.2).
I installed two different models of GPU card and wrote their
information into pci_alias and pci_passthrough_whitelist in
nova.conf on the Controller and on the Compute node (the node with
the GPUs installed).
Then I restarted nova-api, nova-scheduler, and nova-compute.
When I checked the database, both GPUs were registered in the
pci_devices table.
Next, I removed one of the GPUs from the compute node, removed its
information from nova.conf, and restarted the services.
But when I checked the database again, the information for the
removed card still existed in the pci_devices table.
What can I do to fix this problem?
So, when you removed the GPU from the compute node and restarted the
nova-compute service, it *should* have noticed you had removed the
GPU and marked that PCI device as deleted. At least, according to
this code in the PCI manager:
https://github.com/openstack/nova/blob/master/nova/pci/manager.py#L168-L183
Question for you: what is the value of the
status field in the
pci_devices table for the GPU that you removed?
Best,
-jay
p.s. If you really want to get rid of that
device, simply remove
that record from the pci_devices table. But,
again, it *should* be
removed automatically...
_______________________________________________
Mailing list: http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
Post to : openstack@lists.openstack.org
Unsubscribe : http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack