On Jan 23 13:40, Hannes Reinecke wrote:
> On 1/23/24 11:59, Damien Hedde wrote:
> > Hi all,
> >
> > We are currently looking into hotplugging nvme devices and it is currently
> > not possible:
> > When nvme was introduced 2 years ago, the feature was disabled.
> > > commit cc6fb6bc506e6c47ed604fcb7b7413dff0b7d845
> > > Author: Klaus Jensen
> > > Date: Tue Jul 6 10:48:40 2021 +0200
> > >
> > > hw/nvme: mark nvme-subsys non-hotpluggable
> > > We currently lack the infrastructure to handle subsystem hotplugging,
> > > so disable it.
> >
> > Do someone know what's lacking or anyone have some tips/idea of what we
> > should develop to add the support ?
> >
> Problem is that the object model is messed up. In qemu namespaces are
> attached to controllers, which in turn are children of the PCI device.
> There are subsystems, but these just reference the controller.
>
> So if you hotunplug the PCI device you detach/destroy the controller and
> detach the namespaces from the controller.
> But if you hotplug the PCI device again the NVMe controller will be attached
> to the PCI device, but the namespace are still be detached.
>
> Klaus said he was going to fix that, and I dimly remember some patches
> floating around. But apparently it never went anywhere.
>
> Fundamental problem is that the NVMe hierarchy as per spec is incompatible
> with the qemu object model; qemu requires a strict
> tree model where every object has exactly _one_ parent.
>

A little history might help to nuance this just a bit. And to defend the
current model ;)

When we added support for multiple namespaces we did not consider
subsystem support, so the namespaces would just be associated directly
with a parent controller (in QDev terms, the parent has a bus that the
namespace devices are attached to).

When we added subsystems, where namespaces may be attached to several
controllers, it became necessary to break the controller/namespace
parent/child relationship. The problem was that removing the controller
would take all the bus children with it, causing namespaces to be
removed from other controllers in the subsystem. We fixed this by
reparenting the namespaces to the subsystem device instead.

I think this model fits the NVMe hierarchy as well as possible.
Controllers and namespaces are considered children of the subsystem (as
they are in NVMe).

Now, the claim that namespaces are not re-attached is only partly true.
If the namespaces are 'shared=on', they will be automatically attached
to any new controller attached to the subsystem. However, if they are
private, that is not the case. In NVMe, a private namespace just means a
namespace that can only be attached to a single controller at a time. It
is not entirely unlikely that you have a private namespace that you then
reassign to controller B when controller A is removed.

Now, what we could do is track the last controller identifier that a
private namespace was attached to, and if a controller with the same
identifier is later added to the subsystem, we could reattach the
private namespace to it.

However, broadly, I think the current model does a pretty good job in
supporting experimentation with hotplug, multipath and failover
configurations.
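
To make the reattach idea a bit more concrete, here is a rough sketch in
Python. It is a toy model and not QEMU code: the names Subsystem,
Namespace, last_cntlid and the method names are all made up for
illustration, and it only models the attach bookkeeping described above
(shared namespaces attach to every new controller, private ones only
come back to the controller identifier they were last attached to).

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class Namespace:
    nsid: int
    shared: bool
    # Controller identifiers this namespace is currently attached to.
    attached: Set[int] = field(default_factory=set)
    # Hypothetical field: last controller a private namespace was attached to.
    last_cntlid: Optional[int] = None


class Subsystem:
    """Toy subsystem that owns both controllers and namespaces."""

    def __init__(self) -> None:
        self.controllers: Set[int] = set()
        self.namespaces: Dict[int, Namespace] = {}

    def add_namespace(self, ns: Namespace) -> None:
        self.namespaces[ns.nsid] = ns

    def attach(self, nsid: int, cntlid: int) -> None:
        ns = self.namespaces[nsid]
        if cntlid not in self.controllers:
            raise ValueError("unknown controller")
        # A private namespace may only be attached to one controller at a time.
        if not ns.shared and ns.attached and cntlid not in ns.attached:
            raise ValueError("private namespace is already attached")
        ns.attached.add(cntlid)
        if not ns.shared:
            ns.last_cntlid = cntlid

    def add_controller(self, cntlid: int) -> None:
        self.controllers.add(cntlid)
        for ns in self.namespaces.values():
            if ns.shared:
                # Shared namespaces attach to any new controller.
                ns.attached.add(cntlid)
            elif not ns.attached and ns.last_cntlid == cntlid:
                # Private namespace: reattach only to its previous controller id.
                ns.attached.add(cntlid)

    def remove_controller(self, cntlid: int) -> None:
        # Namespaces belong to the subsystem, so they survive the removal.
        self.controllers.discard(cntlid)
        for ns in self.namespaces.values():
            ns.attached.discard(cntlid)
```

With this bookkeeping, a private namespace that was explicitly moved to
controller B after controller A went away would have its last_cntlid
updated to B, so re-adding A would not steal it back.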