Oops, I reported this issue on Launchpad just a moment ago. Thanks, Moshe, I'll try this commit.
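
For anyone reading this thread later, the soft-delete behaviour Moshe describes in the quoted message below can be sketched roughly as follows. This is only an illustration using an in-memory SQLite table: the table is reduced to the columns shown in the thread plus a status column, and `soft_delete` and `live_rows` are hypothetical helper names, not Nova's or oslo.db's actual code.

```python
import sqlite3

# Reduced, illustrative version of the pci_devices table: only the columns
# quoted in this thread plus a status column. This is not Nova's real schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE pci_devices (
        id INTEGER PRIMARY KEY,
        created_at TEXT,
        updated_at TEXT,
        deleted_at TEXT,
        deleted INTEGER NOT NULL DEFAULT 0,
        status TEXT
    )
    """
)
conn.executemany(
    "INSERT INTO pci_devices (id, created_at, updated_at, status) "
    "VALUES (?, ?, ?, ?)",
    [
        (2, "2017-06-21 00:56:06", "2017-07-07 02:27:16", "available"),
        (9, "2017-07-07 01:42:48", "2017-07-07 02:13:14", "available"),
    ],
)


def soft_delete(conn, row_id, when):
    """Hypothetical helper: mark a row as deleted instead of removing it."""
    # The row stays in the table. deleted is set to the row's own id and
    # deleted_at records the time. In this sketch no other column changes.
    conn.execute(
        "UPDATE pci_devices SET deleted = id, deleted_at = ? WHERE id = ?",
        (when, row_id),
    )


def live_rows(conn):
    """Hypothetical helper: rows that still count as existing (deleted = 0)."""
    return conn.execute(
        "SELECT id FROM pci_devices WHERE deleted = 0 ORDER BY id"
    ).fetchall()


soft_delete(conn, 9, "2017-07-07 02:13:42")

# A plain SELECT still returns both rows; only the deleted filter hides row 9.
all_rows = conn.execute(
    "SELECT id, deleted, deleted_at FROM pci_devices ORDER BY id"
).fetchall()
```

So a row that is still visible in a raw SELECT is not necessarily a live record; filtering on `deleted = 0` is what separates the two in this sketch.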

2017-07-11 9:13 GMT+08:00 Moshe Levi <mosh...@mellanox.com>:

> Hi Eddie,
>
> Looking at your nova database after the delete, it looks correct to me.
>
> | created_at          | updated_at          | deleted_at          | deleted | id |
> | 2017-06-21 00:56:06 | 2017-07-07 02:27:16 | NULL                | 0       | 2  |
> | 2017-07-07 01:42:48 | 2017-07-07 02:13:14 | 2017-07-07 02:13:42 | 9       | 9  |
>
> See that the second row has a deleted_at timestamp and a non-zero deleted
> value (the id of the row). Nova does a soft delete, which just marks the
> row as deleted but does not actually delete it from the nova pci_devices table.
>
> See [1] and [2].
>
> There is a bug with pci_devices in a scenario where an allocated PCI device
> can be deleted, e.g. if pci.passthrough_whitelist is changed; commit [3]
> tries to resolve it.
>
> [1] - https://github.com/openstack/oslo.db/blob/master/oslo_db/sqlalchemy/models.py#L142-L150
> [2] - https://github.com/openstack/nova/blob/master/nova/db/sqlalchemy/models.py#L1411
> [3] - https://review.openstack.org/#/c/426243/
>
> *From:* Eddie Yen [mailto:missile0...@gmail.com]
> *Sent:* Tuesday, July 11, 2017 3:18 AM
> *To:* Jay Pipes <jaypi...@gmail.com>
> *Cc:* openstack@lists.openstack.org
> *Subject:* Re: [Openstack] [nova] Database not delete PCI info after device is removed from host and nova.conf
>
> Roger that,
>
> I'm going to report this bug on the OpenStack Compute (Nova) Launchpad
> and see what happens.
>
> Anyway, thanks for your help, I really appreciate it.
>
> Eddie.
>
> 2017-07-11 8:12 GMT+08:00 Jay Pipes <jaypi...@gmail.com>:
>
> Unfortunately, Eddie, I'm not entirely sure what is going on with your
> situation. According to the code, the non-existing PCI device should be
> removed from the pci_devices table when the PCI manager notices the PCI
> device is no longer on the local host...
>
> On 07/09/2017 08:36 PM, Eddie Yen wrote:
>
> Hi there,
>
> Is the information enough, or do you need additional items?
>
> Thanks,
> Eddie.
>
> 2017-07-07 10:49 GMT+08:00 Eddie Yen <missile0...@gmail.com>:
>
> Sorry,
>
> Here is the renewed nova-compute log after removing "1002:68c8" and
> restarting nova-compute:
> http://paste.openstack.org/show/qUCOX09jyeMydoYHc8Oz/
>
> 2017-07-07 10:37 GMT+08:00 Eddie Yen <missile0...@gmail.com>:
>
> Hi Jay,
>
> Below are a few logs and some information you may want to check.
>
> I wrote the GPU information into nova.conf like this:
>
> pci_passthrough_whitelist = [{ "product_id":"0ff3", "vendor_id":"10de"}, { "product_id":"68c8", "vendor_id":"1002"}]
>
> pci_alias = [{ "product_id":"0ff3", "vendor_id":"10de", "device_type":"type-PCI", "name":"k420"}, { "product_id":"68c8", "vendor_id":"1002", "device_type":"type-PCI", "name":"v4800"}]
>
> Then I restarted the services.
>
> nova-compute log after adding the new GPU device info to nova.conf and
> restarting the service:
> http://paste.openstack.org/show/z015rYGXaxYhVoafKdbx/
>
> The strange thing is that the log shows the resource tracker collecting
> information only for the newly set up GPU, not for the old one.
>
> But if I perform some action on the instance that has the old GPU, the
> tracker picks up both GPUs.
> http://paste.openstack.org/show/614658/
>
> The Nova database shows correct information for both GPUs:
> http://paste.openstack.org/show/8JS0i6BMitjeBVRJTkRo/
>
> Now I remove ID "1002:68c8" from nova.conf and the card from the compute
> node, and restart the services.
>
> The pci_passthrough_whitelist and pci_alias keep only the "10de:0ff3"
> GPU info.
>
> pci_passthrough_whitelist = { "product_id":"0ff3", "vendor_id":"10de" }
>
> pci_alias = { "product_id":"0ff3", "vendor_id":"10de", "device_type":"type-PCI", "name":"k420" }
>
> The nova-compute log shows the resource tracker reporting that the node
> has only the "10de:0ff3" PCI resource:
> http://paste.openstack.org/show/VjLinsipne5nM8o0TYcJ/
>
> But in the Nova database, "1002:68c8" still exists and stays in
> "Available" status, even though its "deleted" value is non-zero:
> http://paste.openstack.org/show/SnJ8AzJYD6wCo7jslIc2/
>
> Many thanks,
> Eddie.
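
To make the matching question concrete, here is a minimal sketch of filtering devices by vendor and product ID against a whitelist like the ones quoted above. The `matches_whitelist` helper and the device dictionaries (including the PCI addresses and the network card's IDs) are made up for illustration. This is not how Nova's PCI code is actually written; it only shows the idea that an entry keyed on VID:PID should not match a different card, whatever slot it sits in.

```python
import json

# Whitelist after removing "1002:68c8", written here as a JSON list so the
# same helper works for one entry or several.
whitelist_json = '[{"product_id": "0ff3", "vendor_id": "10de"}]'
whitelist = json.loads(whitelist_json)

# Hypothetical devices as a host might report them. The addresses and the
# network card's IDs are invented for this example.
devices = [
    {"address": "0000:03:00.0", "vendor_id": "10de", "product_id": "0ff3"},
    {"address": "0000:04:00.0", "vendor_id": "8086", "product_id": "10fb"},
]


def matches_whitelist(device, whitelist):
    """Hypothetical helper: True if the device satisfies some whitelist entry.

    An entry matches when every key it specifies has the same value on
    the device. The PCI address is not part of these entries, so it plays
    no role in the decision.
    """
    return any(
        all(device.get(key) == value for key, value in entry.items())
        for entry in whitelist
    )


# Only the device whose vendor and product IDs are whitelisted is kept.
allowed = [d["address"] for d in devices if matches_whitelist(d, whitelist)]
```

Under these assumptions, a new card placed in the slot of a removed GPU is rejected, because the decision depends only on the IDs listed in the whitelist entry.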
>
> 2017-07-07 9:05 GMT+08:00 Eddie Yen <missile0...@gmail.com>:
>
> Uh wait,
>
> Is it possible that it still shows as available if a PCI device still
> exists at the same address?
>
> Because when I removed the GPU card, I replaced it with an SFP+ network
> card in the same slot.
> So when I run lspci, the SFP+ card sits at the same address.
>
> But that still doesn't make sense, because these two cards definitely
> don't have the same VID:PID, and I set the information as VID:PID in
> nova.conf.
>
> I'll try to reproduce this issue and post a log to this list.
>
> Thanks,
>
> 2017-07-07 9:01 GMT+08:00 Jay Pipes <jaypi...@gmail.com>:
>
> Hmm, very odd indeed. Any way you can save the nova-compute logs from
> when you removed the GPU and restarted the nova-compute service and paste
> those logs to paste.openstack.org?
> Would be useful in tracking down this buggy behaviour...
>
> Best,
> -jay
>
> On 07/06/2017 08:54 PM, Eddie Yen wrote:
>
> Hi Jay,
>
> The status of the "removed" GPU still shows as "Available" in the
> pci_devices table.
>
> 2017-07-07 8:34 GMT+08:00 Jay Pipes <jaypi...@gmail.com>:
>
> Hi again, Eddie :) Answer inline...
>
> On 07/06/2017 08:14 PM, Eddie Yen wrote:
>
> > Hi everyone,
> >
> > I'm using OpenStack Mitaka (deployed from Fuel 9.2).
> >
> > At present, I have two different models of GPU card installed.
> >
> > I wrote their information into pci_alias and pci_passthrough_whitelist
> > in nova.conf on the Controller and the Compute node (the node with the
> > GPUs installed).
> > Then I restarted nova-api, nova-scheduler, and nova-compute.
> >
> > When I checked the database, both GPUs were registered in the
> > pci_devices table.
> >
> > Now I have removed one of the GPUs from the compute node, removed its
> > information from nova.conf, and restarted the services.
> >
> > But when I check the database again, the information of the removed
> > card still exists in the pci_devices table.
> >
> > What can I do to fix this problem?
>
> So, when you removed the GPU from the compute node and restarted the
> nova-compute service, it *should* have noticed you had removed the GPU
> and marked that PCI device as deleted. At least, according to this code
> in the PCI manager:
>
> https://github.com/openstack/nova/blob/master/nova/pci/manager.py#L168-L183
>
> Question for you: what is the value of the status field in the
> pci_devices table for the GPU that you removed?
>
> Best,
> -jay
>
> p.s. If you really want to get rid of that device, simply remove that
> record from the pci_devices table. But, again, it *should* be removed
> automatically...
>
> _______________________________________________
> Mailing list: http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
> Post to     : openstack@lists.openstack.org
> Unsubscribe : http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack

_______________________________________________
Mailing list: http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
Post to     : openstack@lists.openstack.org
Unsubscribe : http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack