Alex Williamson <alex.william...@redhat.com> writes: > On Thu, 2010-09-23 at 12:43 -0500, Anthony Liguori wrote: >> On 09/22/2010 02:52 PM, Alex Williamson wrote: >> > During a hotplug, the netdev might be removed before the
unplug? >> > connected virtio device. When this happens, the guest might >> > be running cleanup operations that can trigger a segfault in >> > qemu. Avoid one set of these by checking whether the peer >> > device is present before trying to do tap operations. >> > >> > Signed-off-by: Alex Williamson<alex.william...@redhat.com> >> > >> >> Can you explain this scenario a little better? >> >> If nc.peer is NULL when set_features is called, it would seem to me like >> we're in a pretty critical state. I agree that we shouldn't set fault, >> but I wonder if the real bug is that this can happen at all. > > Unfortunately that critical state happens all the time since device_del > does an asynchronous ACPI call into the guest and libvirt isn't blocked > waiting for that to complete and doesn't poll to see if the device goes > away. So it's actually pretty common today that the netdev disappears > before the device. We talked about this in the community call on > Tuesday, and I think Michael is trying to think of a way to solve this, > perhaps by separating the guest releasing the device from the device > removal. > > In the mean time, virtio-net has this hole that seems like it can be > avoided by simply checking some pointers on a slow path. Since the > netdev has already disappeared, attempting to set features on it seems > pointless. The change in the load function is really just a paranoia > check since it followed the same model of calling tap_*() funcs w/o > checking the value of nc.peer. Thanks, I figure we should either make netdev_del fail when the netdev is in use, or make its users cope graciously with the netdev going away (make it look like somebody yanked the cable).