On 13/04/2021 11:48, Michael Brown wrote:
On 13/04/2021 08:12, Paul Durrant wrote:
If the frontend subsequently disconnects and reconnects (e.g.
transitions through Closed->Initialising->Connected) then:
- Nothing recreates "hotplug-status"
- When the frontend re-enters Connected state, connect() sets up a
watch on "hotplug-status" again
- The callback hotplug_status_changed() is never triggered, and so
the backend device never transitions to Connected state.
That's not how I read it. Given that "hotplug-status" is removed by
the call to hotplug_status_changed() then the next call to connect()
should fail to register the watch and 'have_hotplug_status_watch'
should be 0. Thus backend_switch_state() should not defer the
transition to XenbusStateConnected in any subsequent interaction with
the frontend.
Thank you for the reply. I've tested and confirmed my initial
hypothesis: the call to xenbus_watch_pathfmt() succeeds even if the node
does not exist.
I confirmed this with ftrace using:
    cd /sys/kernel/debug/tracing
    echo function_graph > current_tracer
    echo set_backend_state > set_ftrace_filter
    echo xenbus_watch_pathfmt >> set_ftrace_filter
    echo register_xenbus_watch >> set_ftrace_filter
    echo xenbus_dev_fatal >> set_ftrace_filter
On the second time that the frontend transitions to Connected, this
produced the trace:
    set_backend_state [xen_netback]() {
      register_xenbus_watch();
      register_xenbus_watch();
      xenbus_watch_pathfmt() {
        register_xenbus_watch();
      }
    }
which seems to confirm that the error path in xenbus_watch_path() is
*not* taken, i.e. that the call to register_xenbus_watch() succeeded
even though the node did not exist.
Other observations also seem to confirm this behaviour:
- Running "xenstore ls" in dom0 confirms that on the second frontend
transition to Connected, the frontend state is indeed Connected (4) but
the backend state remains in InitWait (2)
- Running "xenstore watch
/local/domain/0/backend/vif/<domU>/0/hotplug-status" *before* starting
the domU confirms that it is possible to create a watch on a node that
does not (yet) exist, and that the watch *is* notified when the node is
later created.
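
The watch behaviour in the last observation can be illustrated with a toy in-memory model. This is only a sketch: ToyStore, its methods, and the domain id in the path are hypothetical and are not xenstore's API. It models just the two properties observed above, namely that registration does not depend on the node existing, and that creating the node later notifies the watcher.

```python
class ToyStore:
    """Hypothetical in-memory stand-in for a xenstore-like store."""

    def __init__(self):
        self.nodes = {}    # path -> value
        self.watches = {}  # path -> list of callbacks

    def watch(self, path, callback):
        # Registration does not check whether the node exists.
        self.watches.setdefault(path, []).append(callback)
        return True

    def write(self, path, value):
        # Creating or updating a node notifies every watcher of that path.
        self.nodes[path] = value
        for callback in self.watches.get(path, []):
            callback(path, value)


store = ToyStore()
events = []
# Illustrative path only; the domain id 1 is an assumption.
path = "/local/domain/0/backend/vif/1/0/hotplug-status"

# The node does not exist yet, but the watch is still accepted.
registered = store.watch(path, lambda p, v: events.append((p, v)))
events_before_create = list(events)

# Creating the node afterwards triggers the watch.
store.write(path, "connected")
print(registered, events_before_create, events)
```

In this model nothing ever fails at registration time, which is why a caller cannot use the return value of the watch call to infer that the node exists.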
Are you seeing the watch successfully re-registered even though the
node does not exist? Perhaps there has been a change in xenstore
behaviour?
So, the TL;DR is that yes, the watch does successfully register even
though the node does not exist.
From a quick look through the xenstored source, it looks as though the
only check on the node name is the call to is_valid_nodename(), which
seems to perform a syntactic validity check only. I can't immediately
find any commit that would have changed this behaviour.
Ok, so it sounds like this was probably my misunderstanding of xenstore
semantics in the first place (although I'm sure I remember watch
registration failing for non-existent nodes at some point in the past...
that may have been with a non-upstream version of oxenstored though).
Anyway... a reasonable fix would therefore be to read the node first and
only register the watch if it does exist.
Paul
Thanks,
Michael
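
The fix suggested above (read the node first, and register the watch only if it exists) can be sketched with a toy model. This is not the kernel code: ToyStore, the check_first parameter, and the domain id in the path are hypothetical, and the return value of connect() merely stands in for 'have_hotplug_status_watch'.

```python
class ToyStore:
    """Hypothetical in-memory stand-in for a xenstore-like store."""

    def __init__(self):
        self.nodes = {}       # path -> value
        self.watches = set()  # paths with a registered watch

    def read(self, path):
        # Returns None when the node does not exist.
        return self.nodes.get(path)

    def watch(self, path):
        # As observed in the thread: succeeds even for a missing node.
        self.watches.add(path)
        return True


def connect(store, path, check_first):
    """Return the value that have_hotplug_status_watch would take."""
    if check_first and store.read(path) is None:
        # Node is absent: skip the watch so nothing waits on it.
        return False
    return store.watch(path)


# Illustrative path only; the domain id 1 is an assumption.
path = "/local/domain/0/backend/vif/1/0/hotplug-status"

# Reconnect case: "hotplug-status" has been removed and is not recreated.
unguarded = connect(ToyStore(), path, check_first=False)
guarded = connect(ToyStore(), path, check_first=True)

# Case where the node is present: the watch is still registered.
store = ToyStore()
store.nodes[path] = "connected"
with_node = connect(store, path, check_first=True)

print(unguarded, guarded, with_node)
```

Without the check, the flag ends up set on reconnect and the transition to Connected would be deferred on a watch that never fires. With the check, the flag stays clear in that case, so the transition is not deferred.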