On 10/04/2021 19:25, Michael Brown wrote:
Commit https://github.com/torvalds/linux/commit/1f25657 ("xen-netback:
remove 'hotplug-status' once it has served its purpose") seems to have
introduced a regression that prevents a vif frontend from transitioning
more than once into Connected state.
As far as I can tell:
- The defined vif script (e.g. /etc/xen/scripts/vif-bridge) executes
only once, at domU startup, and sets
backend/vif/<domU>/0/hotplug-status="connected"
- When the frontend first enters Connected state,
drivers/net/xen-netback/xenbus.c's connect() sets up a watch on
"hotplug-status" with the callback function hotplug_status_changed()
- When hotplug_status_changed() is triggered by the watch, it
transitions the backend to Connected state and calls xenbus_rm() to
delete the "hotplug-status" attribute.
If the frontend subsequently disconnects and reconnects (e.g.
transitions through Closed->Initialising->Connected) then:
- Nothing recreates "hotplug-status"
- When the frontend re-enters Connected state, connect() sets up a watch
on "hotplug-status" again
- The callback hotplug_status_changed() is never triggered, and so the
backend device never transitions to Connected state.
That's not how I read it. Given that "hotplug-status" is removed by the
call to hotplug_status_changed() then the next call to connect() should
fail to register the watch and 'have_hotplug_status_watch' should be 0.
Thus backend_switch_state() should not defer the transition to
XenbusStateConnected in any subsequent interaction with the frontend.
Reverting the commit would fix this bug, but would obviously also
reintroduce the race condition that the commit was designed to avoid.
I'm happy to put together a patch, if one of the maintainers could
suggest a sensible design approach.
Are you seeing the watch successfully re-registered even though the node
does not exist? Perhaps there has been a change in xenstore behaviour?
Paul
I'm not a list member, so please CC me directly on replies.
Thanks,
Michael