Hi Volodymyr,
On 30/11/2020 20:51, Volodymyr Babchuk wrote:
> Oleksandr Tyshchenko writes:
>> From: Oleksandr Tyshchenko <oleksandr_tyshche...@epam.com>
>>
>> This patch adds proper handling of the return value of
>> vcpu_ioreq_handle_completion(), which involves using a loop
>> in leave_hypervisor_to_guest().
>>
>> The reason to use an unbounded loop here is the fact that a vCPU
>> shouldn't continue until an I/O has completed. In Xen's case, if an
>> I/O never completes, it most likely means that something went horribly
>> wrong with the Device Emulator, and it is most likely not safe to
>> continue. So letting the vCPU spin forever if the I/O never completes
>> is safer than letting it continue with the guest in an unclear state,
>> and is the best we can do for now.
>>
>> This wouldn't be an issue for Xen, as do_softirq() would be called on
>> every loop iteration. In case of failure, the guest will crash and the
>> vCPU will be unscheduled.
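
For context, the change described above boils down to something like the
following. This is only a minimal sketch based on the commit message,
assuming the existing Arm helpers check_for_vcpu_work() and
check_for_pcpu_work() in xen/arch/arm/traps.c; the actual patch may be
shaped differently:

/*
 * Minimal sketch of the retry loop in leave_hypervisor_to_guest().
 * check_for_vcpu_work() is assumed to call
 * vcpu_ioreq_handle_completion() and return true while the I/O has not
 * completed yet; check_for_pcpu_work() calls do_softirq(), so the
 * scheduler can still preempt this vCPU on every iteration.
 */
void leave_hypervisor_to_guest(void)
{
    local_irq_disable();

    /* Do not resume the guest until the emulated I/O has completed. */
    while ( check_for_vcpu_work() )
        check_for_pcpu_work();

    /* ... rest of the exit path (vGIC sync, state sync, etc.) ... */
}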
> Why don't you block the vCPU there and unblock it when the response is
> ready?
The vCPU will already block while waiting for the event channel. See the
call to wait_on_xen_event_channel() in the ioreq code. However, you can
still receive a spurious wake-up (e.g. the event channel notification is
received before the I/O has been handled). So you have to loop around and
check whether there is more work to do.
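
To make the spurious wake-up point concrete, the pattern is roughly the
following. This is a simplification of the wait logic in the common
ioreq code, not a copy of it; wait_on_xen_event_channel() and
STATE_IORESP_READY are the real Xen names, the rest is abbreviated:

/*
 * Wait-and-recheck: a wake-up on the event channel is only a hint. The
 * notification can arrive before the emulator has written the response,
 * so re-read the state and block again on a spurious wake-up.
 */
while ( ACCESS_ONCE(p->state) != STATE_IORESP_READY )
    /* Blocks this vCPU; may return before the condition holds. */
    wait_on_xen_event_channel(sv->ioreq_evtchn,
                              ACCESS_ONCE(p->state) == STATE_IORESP_READY);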
> If I got it right, the "client" vCPU will spin in the loop, eating its
> own scheduling budget with no useful work done.
You can't really do much about that if you have a rogue/buggy device model.
> In the worst case, it will prevent the "server" vCPU from running.
How so? Xen will raise the schedule softirq when the I/O is handled. You
would have a pretty buggy system if your "client" vCPU were given a much
higher priority than your "server" vCPU.
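
For completeness, what keeps the spinning vCPU preemptible is that the
loop drains softirqs on every iteration; here is a simplified paraphrase
of the Arm helper (assuming its current shape in xen/arch/arm/traps.c):

/*
 * Simplified: as long as softirqs are pending, service them with
 * interrupts enabled. A pending SCHEDULE_SOFTIRQ makes do_softirq()
 * enter the scheduler, which can context-switch to any runnable vCPU,
 * including the "server" one.
 */
static void check_for_pcpu_work(void)
{
    while ( softirq_pending(smp_processor_id()) )
    {
        local_irq_enable();
        do_softirq();
        local_irq_disable();
    }
}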
Cheers,
--
Julien Grall