Hi,
On 30/09/2020 14:39, Oleksandr wrote:
Hi Julien
On 25.09.20 11:19, Paul Durrant wrote:
-----Original Message-----
From: Julien Grall <jul...@xen.org>
Sent: 24 September 2020 19:01
To: Oleksandr Tyshchenko <olekst...@gmail.com>;
xen-devel@lists.xenproject.org
Cc: Oleksandr Tyshchenko <oleksandr_tyshche...@epam.com>; Andrew
Cooper <andrew.coop...@citrix.com>;
George Dunlap <george.dun...@citrix.com>; Ian Jackson
<ian.jack...@eu.citrix.com>; Jan Beulich
<jbeul...@suse.com>; Stefano Stabellini <sstabell...@kernel.org>; Wei
Liu <w...@xen.org>; Roger Pau
Monné <roger....@citrix.com>; Paul Durrant <p...@xen.org>; Jun
Nakajima <jun.nakaj...@intel.com>;
Kevin Tian <kevin.t...@intel.com>; Tim Deegan <t...@xen.org>; Julien
Grall <julien.gr...@arm.com>
Subject: Re: [PATCH V1 02/16] xen/ioreq: Make x86's IOREQ feature common
On 10/09/2020 21:21, Oleksandr Tyshchenko wrote:
+static bool hvm_wait_for_io(struct hvm_ioreq_vcpu *sv, ioreq_t *p)
+{
+ unsigned int prev_state = STATE_IOREQ_NONE;
+ unsigned int state = p->state;
+ uint64_t data = ~0;
+
+ smp_rmb();
+
+ /*
+ * The only reason we should see this condition be false is
when an
+ * emulator dying races with I/O being requested.
+ */
+ while ( likely(state != STATE_IOREQ_NONE) )
+ {
+ if ( unlikely(state < prev_state) )
+ {
+ gdprintk(XENLOG_ERR, "Weird HVM ioreq state transition
%u -> %u\n",
+ prev_state, state);
+ sv->pending = false;
+ domain_crash(sv->vcpu->domain);
+ return false; /* bail */
+ }
+
+ switch ( prev_state = state )
+ {
+ case STATE_IORESP_READY: /* IORESP_READY -> NONE */
+ p->state = STATE_IOREQ_NONE;
+ data = p->data;
+ break;
+
+ case STATE_IOREQ_READY: /* IOREQ_{READY,INPROCESS} ->
IORESP_READY */
+ case STATE_IOREQ_INPROCESS:
+ wait_on_xen_event_channel(sv->ioreq_evtchn,
+ ({ state = p->state;
+ smp_rmb();
+ state != prev_state; }));
+ continue;
As I pointed out previously [1], this helper was implemented with the
expectation that wait_on_xen_event_channel() will not return if the vCPU
got rescheduled.
However, this assumption doesn't hold on Arm.
I can see two solution:
1) Re-execute the caller
2) Prevent an IOREQ to disappear until the loop finish.
@Paul any opinions?
The ioreq control plane is largely predicated on there being no
pending I/O when the state of a server is modified, and it is assumed
that domain_pause() is sufficient to achieve this. If that assumption
doesn't hold then we need additional synchronization.
I don't think this assumption even hold on x86 because domain_pause()
will not wait for I/O to finish.
On x86, the context switch will reset the stack and therefore
wait_on_xen_event_channel() is not going to return. Instead,
handle_hvm_io_completion() will be called from the tail callback in
context_switch(). get_pending_vcpu() would return NULL as the IOREQ
server disappeared. Although, it is not clear whether the vCPU will
continue to run (or not).
Did I miss anything?
Regarding the fix itself, I am not sure what sort of synchronization we
can do. Are you suggesting to wait for the I/O to complete? If so, how
do we handle the case the IOREQ server died?
May I please clarify whether a concern still stands (with what was said
above) and we need an additional synchronization on Arm?
Yes the concern is still there (See above).
Cheers,
--
Julien Grall