Hi, after trying the patch it seems I would need to modify even more Python code on my OpenStack version: not only the patch from that site, but also nova-conductor and oslo_versionedobjects, because they perform version checks and pci_device.py on my platform only supports object versions up to 1.5.
As a result, I don't think I can take the risk of using this patch; all I can do is wait for an update. Nevertheless, it was still a good experience to learn more about how Nova works internally. Appreciated, Eddie.

2017-07-11 9:22 GMT+08:00 Eddie Yen <missile0...@gmail.com>:
> Oops, I just reported this issue on Launchpad a moment ago.
>
> Thanks Moshe, I'll try this commit.
>
> 2017-07-11 9:13 GMT+08:00 Moshe Levi <mosh...@mellanox.com>:
>>
>> Hi Eddie,
>>
>> Looking at your nova database after the delete, it looks correct to me:
>>
>> | created_at          | updated_at          | deleted_at          | deleted | id |
>> | 2017-06-21 00:56:06 | 2017-07-07 02:27:16 | NULL                | 0       | 2  |
>> | 2017-07-07 01:42:48 | 2017-07-07 02:13:14 | 2017-07-07 02:13:42 | 9       | 9  |
>>
>> Note that the second row has a deleted_at timestamp and a non-zero deleted value (the id of the row). Nova does a soft delete, which just marks the row as deleted rather than actually removing it from the pci_devices table.
>>
>> See [1] and [2].
>>
>> There is a bug with pci_devices in the scenario where an allocated PCI device can be deleted, e.g. if pci.passthrough_whitelist is changed; commit [3] tries to resolve it.
>>
>> [1] https://github.com/openstack/oslo.db/blob/master/oslo_db/sqlalchemy/models.py#L142-L150
>> [2] https://github.com/openstack/nova/blob/master/nova/db/sqlalchemy/models.py#L1411
>> [3] https://review.openstack.org/#/c/426243/
>>
>> *From:* Eddie Yen [mailto:missile0...@gmail.com]
>> *Sent:* Tuesday, July 11, 2017 3:18 AM
>> *To:* Jay Pipes <jaypi...@gmail.com>
>> *Cc:* openstack@lists.openstack.org
>> *Subject:* Re: [Openstack] [nova] Database not delete PCI info after device is removed from host and nova.conf
>>
>> Roger that,
>>
>> I'm going to report this bug on the OpenStack Compute (Nova) Launchpad to see what happens.
>> Anyway, thanks for your help, really appreciate it.
>>
>> Eddie.
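The soft-delete behaviour Moshe describes can be sketched in a few lines of Python. This is a simplified stand-in for the oslo.db SoftDeleteMixin pattern referenced in [1], not the actual SQLAlchemy model:

```python
from datetime import datetime, timezone

class SoftDeleteMixin:
    """Simplified sketch of the soft-delete pattern from oslo.db [1].

    A "deleted" row is never removed from the table: deleting just
    stamps deleted_at and copies the row's id into the deleted column,
    so deleted == 0 means live and deleted == id means soft-deleted.
    """
    deleted = 0
    deleted_at = None

    def soft_delete(self):
        # Mark this object as deleted (the real mixin then saves the
        # row back through the database session).
        self.deleted = self.id
        self.deleted_at = datetime.now(timezone.utc)

class PciDevice(SoftDeleteMixin):
    def __init__(self, id):
        self.id = id

dev = PciDevice(id=9)
dev.soft_delete()
print(dev.deleted)  # 9: matches the second row of the table above
```

This is why the row for the removed GPU is still visible when querying the table directly: only queries that filter on deleted == 0 hide it.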
>> 2017-07-11 8:12 GMT+08:00 Jay Pipes <jaypi...@gmail.com>:
>>
>> Unfortunately, Eddie, I'm not entirely sure what is going on with your situation. According to the code, the non-existing PCI device should be removed from the pci_devices table when the PCI manager notices the PCI device is no longer on the local host...
>>
>> On 07/09/2017 08:36 PM, Eddie Yen wrote:
>>
>> Hi there,
>>
>> Is the information enough, or do you need additional items?
>>
>> Thanks,
>> Eddie.
>>
>> 2017-07-07 10:49 GMT+08:00 Eddie Yen <missile0...@gmail.com>:
>>
>> Sorry, here is the renewed nova-compute log after removing "1002:68c8" and restarting nova-compute:
>> http://paste.openstack.org/show/qUCOX09jyeMydoYHc8Oz/
>>
>> 2017-07-07 10:37 GMT+08:00 Eddie Yen <missile0...@gmail.com>:
>>
>> Hi Jay,
>>
>> Below are a few logs and pieces of information you may want to check.
>>
>> I wrote the GPU information into nova.conf like this:
>> pci_passthrough_whitelist = [{ "product_id":"0ff3", "vendor_id":"10de"}, { "product_id":"68c8", "vendor_id":"1002"}]
>> pci_alias = [{ "product_id":"0ff3", "vendor_id":"10de", "device_type":"type-PCI", "name":"k420"}, { "product_id":"68c8", "vendor_id":"1002", "device_type":"type-PCI", "name":"v4800"}]
>>
>> Then restart the services.
>>
>> nova-compute log after inserting the new GPU device info into nova.conf and restarting the service:
>> http://paste.openstack.org/show/z015rYGXaxYhVoafKdbx/
>>
>> Strangely, the log shows that the resource tracker only collected information about the newly installed GPU, not the old one.
>>
>> But if I perform some action on the instance containing the old GPU, the tracker picks up both GPUs:
>> http://paste.openstack.org/show/614658/
>>
>> The Nova database shows correct information for both GPUs:
>> http://paste.openstack.org/show/8JS0i6BMitjeBVRJTkRo/
>>
>> Now remove ID "1002:68c8" from nova.conf and the compute node, and restart the services.
>> The pci_passthrough_whitelist and pci_alias now keep only the "10de:0ff3" GPU info:
>> pci_passthrough_whitelist = { "product_id":"0ff3", "vendor_id":"10de" }
>> pci_alias = { "product_id":"0ff3", "vendor_id":"10de", "device_type":"type-PCI", "name":"k420" }
>>
>> The nova-compute log shows the resource tracker reporting that the node only has the "10de:0ff3" PCI resource:
>> http://paste.openstack.org/show/VjLinsipne5nM8o0TYcJ/
>>
>> But in the Nova database, "1002:68c8" still exists and stays in "available" status, even though the "deleted" value is non-zero:
>> http://paste.openstack.org/show/SnJ8AzJYD6wCo7jslIc2/
>>
>> Many thanks,
>> Eddie.
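As an aside on the whitelist entries above: they are matched against devices by vendor and product ID, not by PCI address. A toy sketch of that matching (hypothetical IDs for the SFP+ card; this is not Nova's actual whitelist code):

```python
def matches_whitelist(dev, whitelist):
    # Toy sketch: whitelist matching is by vendor/product ID only; the
    # PCI address plays no role, so a different card in the same slot
    # should not be treated as the whitelisted device.
    return any(dev["vendor_id"] == entry["vendor_id"] and
               dev["product_id"] == entry["product_id"]
               for entry in whitelist)

old_whitelist = [{"vendor_id": "1002", "product_id": "68c8"}]  # the removed GPU
gpu = {"vendor_id": "1002", "product_id": "68c8", "address": "0000:05:00.0"}
sfp = {"vendor_id": "8086", "product_id": "10fb", "address": "0000:05:00.0"}  # hypothetical SFP+ IDs

print(matches_whitelist(gpu, old_whitelist))  # True
print(matches_whitelist(sfp, old_whitelist))  # False: same slot, different VID:PID
```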
>> 2017-07-07 9:05 GMT+08:00 Eddie Yen <missile0...@gmail.com>:
>>
>> Wait, is it possible that it still shows available because a PCI device still exists at the same address?
>>
>> When I removed the GPU card, I replaced it with an SFP+ network card in the same slot, so when I run lspci the SFP+ card appears at the same address.
>>
>> But that still doesn't make sense, because these two cards definitely do not have the same VID:PID, and I set the information in nova.conf by VID:PID.
>>
>> I'll try to reproduce this issue and post a log to this list.
>>
>> Thanks,
>>
>> 2017-07-07 9:01 GMT+08:00 Jay Pipes <jaypi...@gmail.com>:
>>
>> Hmm, very odd indeed. Any way you can save the nova-compute logs from when you removed the GPU and restarted the nova-compute service, and paste those logs to paste.openstack.org? That would be useful in tracking down this buggy behaviour...
>>
>> Best,
>> -jay
>>
>> On 07/06/2017 08:54 PM, Eddie Yen wrote:
>>
>> Hi Jay,
>>
>> The status of the "removed" GPU still shows as "Available" in the pci_devices table.
>>
>> 2017-07-07 8:34 GMT+08:00 Jay Pipes <jaypi...@gmail.com>:
>>
>> Hi again, Eddie :) Answer inline...
>> On 07/06/2017 08:14 PM, Eddie Yen wrote:
>>
>> Hi everyone,
>>
>> I'm using the OpenStack Mitaka version (deployed from Fuel 9.2).
>>
>> At present, I have installed two different models of GPU card, and wrote their information into pci_alias and pci_passthrough_whitelist in nova.conf on the Controller and the Compute node (the node with the GPUs installed). Then I restarted nova-api, nova-scheduler, and nova-compute.
>>
>> When I check the database, both GPUs are registered in the pci_devices table.
>>
>> Now I removed one of the GPUs from the compute node, removed its information from nova.conf, and restarted the services.
>>
>> But when I check the database again, the information for the removed card still exists in the pci_devices table.
>>
>> How can I fix this problem?
>>
>> So, when you removed the GPU from the compute node and restarted the nova-compute service, it *should* have noticed you had removed the GPU and marked that PCI device as deleted. At least, according to this code in the PCI manager:
>>
>> https://github.com/openstack/nova/blob/master/nova/pci/manager.py#L168-L183
>>
>> Question for you: what is the value of the status field in the pci_devices table for the GPU that you removed?
>>
>> Best,
>> -jay
>>
>> p.s. If you really want to get rid of that device, simply remove that record from the pci_devices table. But, again, it *should* be removed automatically...
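The PCI-manager behaviour Jay describes (compare the tracked devices against what the hypervisor currently reports, and mark missing ones deleted) can be sketched roughly as follows. This is a hedged approximation of the expected behaviour, not Nova's actual implementation:

```python
def sync_pci_devices(tracked, discovered):
    """Rough sketch of the per-run sync the PCI manager is expected to
    do (see manager.py#L168-L183): any tracked device whose address is
    no longer reported by the hypervisor is marked deleted and dropped
    from the tracked set. Not Nova's actual code."""
    discovered_addresses = {dev["address"] for dev in discovered}
    removed = []
    for dev in list(tracked):
        if dev["address"] not in discovered_addresses:
            dev["status"] = "deleted"  # soft delete, as discussed earlier
            tracked.remove(dev)
            removed.append(dev)
    return removed

tracked = [
    {"address": "0000:05:00.0", "vendor_id": "1002", "product_id": "68c8", "status": "available"},
    {"address": "0000:06:00.0", "vendor_id": "10de", "product_id": "0ff3", "status": "available"},
]
# After the swap, only the remaining whitelisted GPU is reported:
discovered = [{"address": "0000:06:00.0", "vendor_id": "10de", "product_id": "0ff3"}]

gone = sync_pci_devices(tracked, discovered)
print(len(tracked), gone[0]["status"])  # 1 deleted
```

Under this model the "1002:68c8" row should have been marked deleted on the first restart, which is why the behaviour in the logs above looks like a bug.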
>> _______________________________________________
>> Mailing list: http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
>> Post to     : openstack@lists.openstack.org
>> Unsubscribe : http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
_______________________________________________
Mailing list: http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
Post to     : openstack@lists.openstack.org
Unsubscribe : http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack