On Thu, Sep 22, 2016 at 10:12 AM, Yu Zhang <yu.c.zh...@linux.intel.com> wrote:
>
> On 9/21/2016 9:04 PM, George Dunlap wrote:
>>
>> On Fri, Sep 9, 2016 at 6:51 AM, Yu Zhang <yu.c.zh...@linux.intel.com> wrote:
>>>>
>>>> On 9/2/2016 6:47 PM, Yu Zhang wrote:
>>>>>
>>>>> A new HVMOP - HVMOP_map_mem_type_to_ioreq_server - is added to
>>>>> let one ioreq server claim/disclaim its responsibility for the
>>>>> handling of guest pages with p2m type p2m_ioreq_server. Users
>>>>> of this HVMOP can specify which kind of operation is supposed
>>>>> to be emulated in a parameter named flags. Currently, this HVMOP
>>>>> only supports the emulation of write operations. It can be
>>>>> further extended to support the emulation of read operations if
>>>>> an ioreq server has such a requirement in the future.
>>>>>
>>>>> For now, we only support one ioreq server for this p2m type, so
>>>>> once an ioreq server has claimed its ownership, subsequent calls
>>>>> of HVMOP_map_mem_type_to_ioreq_server will fail. Users can also
>>>>> disclaim the ownership of guest RAM pages with p2m_ioreq_server
>>>>> by triggering this new HVMOP, with the ioreq server id set to the
>>>>> current owner's and the flags parameter set to 0.
>>>>>
>>>>> Note that both HVMOP_map_mem_type_to_ioreq_server and
>>>>> p2m_ioreq_server are only supported for HVM guests with HAP
>>>>> enabled.
>>>>>
>>>>> Also note that only after one ioreq server claims its ownership
>>>>> of p2m_ioreq_server will the p2m type change to p2m_ioreq_server
>>>>> be allowed.
>>>>>
>>>>> Signed-off-by: Paul Durrant <paul.durr...@citrix.com>
>>>>> Signed-off-by: Yu Zhang <yu.c.zh...@linux.intel.com>
>>>>> Acked-by: Tim Deegan <t...@xen.org>
>>>>> ---
>>>>> Cc: Paul Durrant <paul.durr...@citrix.com>
>>>>> Cc: Jan Beulich <jbeul...@suse.com>
>>>>> Cc: Andrew Cooper <andrew.coop...@citrix.com>
>>>>> Cc: George Dunlap <george.dun...@eu.citrix.com>
>>>>> Cc: Jun Nakajima <jun.nakaj...@intel.com>
>>>>> Cc: Kevin Tian <kevin.t...@intel.com>
>>>>> Cc: Tim Deegan <t...@xen.org>
>>>>>
>>>>> changes in v6:
>>>>> - Clarify logic in hvmemul_do_io().
>>>>> - Use recursive lock for ioreq server lock.
>>>>> - Remove debug print when mapping ioreq server.
>>>>> - Clarify code in ept_p2m_type_to_flags() for consistency.
>>>>> - Remove definition of P2M_IOREQ_HANDLE_WRITE_ACCESS.
>>>>> - Add comments for HVMMEM_ioreq_server to note only changes
>>>>>   to/from HVMMEM_ram_rw are permitted.
>>>>> - Add domain_pause/unpause() in hvm_map_mem_type_to_ioreq_server()
>>>>>   to avoid the race condition when a vm exit happens on a write-
>>>>>   protected page, just to find the ioreq server has been unmapped
>>>>>   already.
>>>>> - Introduce a separate patch to delay the release of the p2m
>>>>>   lock to avoid the race condition.
>>>>> - Introduce a separate patch to handle read-modify-write
>>>>>   operations on a write-protected page.
>>>>>
>>>> Why do we need to do this?  Won't the default case just DTRT if it
>>>> finds that the ioreq server has been unmapped?
>>>
>>> Well, patch 4 will either mark the remaining p2m_ioreq_server entries
>>> as "recalc" or reset them to p2m_ram_rw directly. So my understanding
>>> is that we do not wish to see an EPT violation due to a
>>> p2m_ioreq_server access after the ioreq server is unmapped.
>>> Yet without this domain_pause/unpause() pair, VM accesses may trigger
>>> an EPT violation during the hvmop hypercall
>>> (hvm_map_mem_type_to_ioreq_server), only to find that the ioreq
>>> server is NULL.
>>> Then we would have to provide handlers which just do the copy
>>> to/from actions for the VM. This seems awkward to me.
>>
>> So the race you're worried about is this:
>>
>> 1. Guest fault happens
>> 2. ioreq server calls map_mem_type_to_ioreq_server, unhooking
>> 3. guest finds no ioreq server present
>>
>> I think in that case the easiest thing to do would be to simply
>> assume there was a race and re-execute the instruction.  Is that not
>> possible for some reason?
>>
>>  -George
>
> Thanks for your reply, George. :)
> Two reasons I'd like to use the domain_pause/unpause() to avoid the
> race condition:
>
> 1> As in my previous explanation, in the read-modify-write scenario,
> the ioreq server will be NULL for the read emulation. But in such a
> case, the hypervisor will not discard this trap; instead it is
> supposed to do the copy work for the read access. So it would be
> difficult for the hypervisor to decide whether the ioreq server was
> detached due to a race condition, or whether the ioreq server should
> be NULL because we are emulating the read operation first for a
> read-modify-write instruction.
Wouldn't a patch like the attached work (applied on top of the whole
series)?

 -George
From 4e4b14641cd94c0c9fe64606b329cdbbf9c6a92b Mon Sep 17 00:00:00 2001
From: George Dunlap <george.dun...@citrix.com>
Date: Thu, 22 Sep 2016 12:30:26 +0100
Subject: [PATCH] x86/hvm: Handle both ioreq detach races and
 read-modify-write instructions

Compile-tested only.

Signed-off-by: George Dunlap <george.dun...@citrix.com>
---
 xen/arch/x86/hvm/emulate.c | 68 ++++++++++++++++++++++++++++++++++++++--------
 1 file changed, 56 insertions(+), 12 deletions(-)

diff --git a/xen/arch/x86/hvm/emulate.c b/xen/arch/x86/hvm/emulate.c
index 564c117..120ef86 100644
--- a/xen/arch/x86/hvm/emulate.c
+++ b/xen/arch/x86/hvm/emulate.c
@@ -214,40 +214,84 @@ static int hvmemul_do_io(
         break;
     case X86EMUL_UNHANDLEABLE:
     {
+        /*
+         * Xen isn't emulating the instruction internally, so see if
+         * there's an ioreq server that can handle it.  Rules:
+         *
+         * - PIO and "normal" mmio run through
+         *   hvm_select_ioreq_server() to choose the ioreq server by
+         *   range.  If no server is found, the access is ignored.
+         *
+         * - p2m_ioreq_server accesses are handled by the current
+         *   ioreq server for the domain, but there are some corner
+         *   cases:
+         *
+         *   - If the domain ioreq server is NULL, assume this is a
+         *     race between unhooking the ioreq server and the
+         *     p2m type change, and re-try the instruction.
+         *
+         *   - If the IOREQ_MEM_ACCESS_WRITE flag is not set, treat it
+         *     like a normal PIO or MMIO that doesn't have an ioreq
+         *     server (i.e., by ignoring it).
+         *
+         *   - If the access is a read, this must be part of a
+         *     read-modify-write instruction; emulate the read so
+         *     that we have it.
+         */
+        struct hvm_ioreq_server *s = NULL;
         p2m_type_t p2mt = p2m_invalid;
-
+
         if ( is_mmio )
         {
             unsigned long gmfn = paddr_to_pfn(addr);

             (void) get_gfn_query_unlocked(currd, gmfn, &p2mt);

-            if ( p2mt == p2m_ioreq_server && dir == IOREQ_WRITE )
+            if ( p2mt == p2m_ioreq_server )
             {
                 unsigned int flags;

                 s = p2m_get_ioreq_server(currd, &flags);
+
+                /*
+                 * If p2mt is ioreq_server but ioreq_server is NULL,
+                 * we probably lost a race with p2m_type_change; just
+                 * retry the access.
+                 */
+                if ( s == NULL )
+                {
+                    rc = X86EMUL_RETRY;
+                    vio->io_req.state = STATE_IOREQ_NONE;
+                    break;
+                }
+
+                /*
+                 * This is part of a read-modify-write instruction.
+                 * Emulate the read part so we have the value cached.
+                 */
+                if ( dir == IOREQ_READ )
+                {
+                    rc = hvm_process_io_intercept(&mem_handler, &p);
+                    vio->io_req.state = STATE_IOREQ_NONE;
+                    break;
+                }
+
+                if ( !(flags & XEN_HVMOP_IOREQ_MEM_ACCESS_WRITE) )
+                {
                     s = NULL;
+                }
             }
         }

         if ( !s && p2mt != p2m_ioreq_server )
             s = hvm_select_ioreq_server(currd, &p);

-        /* If there is no suitable backing DM, just ignore accesses */
         if ( !s )
         {
-            /*
-             * For p2m_ioreq_server pages accessed with read-modify-write
-             * instructions, we provide a read handler to copy the data to
-             * the buffer.
-             */
-            if ( p2mt == p2m_ioreq_server )
-                rc = hvm_process_io_intercept(&mem_handler, &p);
-            else
-                rc = hvm_process_io_intercept(&null_handler, &p);
+            /* If there is no suitable backing DM, just ignore accesses */
+            rc = hvm_process_io_intercept(&null_handler, &p);
             vio->io_req.state = STATE_IOREQ_NONE;
         }
         else
--
2.1.4
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel