On Thu, Feb 01, 2018 at 08:03:53PM +0200, Marcel Apfelbaum wrote:
> On 01/02/2018 15:53, Eduardo Habkost wrote:
> > On Thu, Feb 01, 2018 at 02:29:25PM +0200, Marcel Apfelbaum wrote:
> >> On 01/02/2018 14:10, Eduardo Habkost wrote:
> >>> On Thu, Feb 01, 2018 at 07:36:50AM +0200, Marcel Apfelbaum wrote:
> >>>> On 01/02/2018 4:22, Michael S. Tsirkin wrote:
> >>>>> On Wed, Jan 31, 2018 at 09:34:22PM -0200, Eduardo Habkost wrote:
> >>> [...]
> >>>>>> BTW, what's the root cause for requiring HVAs in the buffer?
> >>>>>
> >>>>> It's a side effect of the kernel/userspace API which always wants
> >>>>> a single HVA/len pair to map memory for the application.
> >>>>>
> >>>>
> >>>> Hi Eduardo and Michael,
> >>>>
> >>>>>> Can
> >>>>>> this be fixed?
> >>>>>
> >>>>> I think yes.  It'd need to be a kernel patch for the RDMA subsystem
> >>>>> mapping an s/g list with actual memory. The HVA/len pair would then just
> >>>>> be used to refer to the region, without creating the two mappings.
> >>>>>
> >>>>> Something like splitting the register mr into
> >>>>>
> >>>>> mr = create mr (va/len) - allocate a handle and record the va/len
> >>>>>
> >>>>> addmemory(mr, offset, hva, len) - pin memory
> >>>>>
> >>>>> register mr - pass it to HW
> >>>>>
> >>>>> As a nice side effect we won't burn so much virtual address space.
> >>>>>
> >>>>
> >>>> We would still need a contiguous virtual address space range (for
> >>>> post-send), which we don't have, since guest contiguous virtual
> >>>> address space will always end up as non-contiguous host virtual
> >>>> address space.
> >>>>
> >>>> I am not sure the RDMA HW can handle a large VA with holes.
> >>>
> >>> I'm confused.  Why would the hardware see and care about virtual
> >>> addresses?
> >>
> >> The post-send operations bypass the kernel, and the process
> >> puts GVA addresses in the work request.
> >>
> >>> How exactly does the hardware translate VAs to
> >>> PAs?
> >>
> >> The HW maintains a page-directory-like structure, different from the MMU:
> >> VA -> phys pages
> >>
> >>> What if the process page tables change?
> >>>
> >>
> >> Since the page tables the HW uses are its own, we just need the phys
> >> pages to be pinned.
> >
> > So there's no hardware-imposed requirement that the hardware VAs
> > (mapped by the HW page directory) match the VAs in QEMU
> > address-space, right?
>
> Actually there is. Today it works exactly as you described.

Are you sure there's such a hardware-imposed requirement?

Why would the hardware require VAs to match the ones in the
userspace address-space, if it doesn't use the CPU MMU at all?

-- 
Eduardo
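
To make the question concrete, here is a toy model of the three-step split
quoted above (create mr / addmemory / register mr), together with a
device-side lookup that consults only the region's own table. It is a
sketch under assumptions, not the verbs API and not a description of any
real HCA: the names ToyMR, create_mr, add_memory, register_mr and translate
are hypothetical, the page size is assumed to be 4 KiB, and host VAs stand
in for the pinned physical pages.

```python
from dataclasses import dataclass, field

PAGE_SIZE = 4096  # assumed page size for this toy model


@dataclass
class ToyMR:
    """Hypothetical memory region: a device-visible VA range with its own page table."""
    dev_va: int                                 # base VA the device sees in work requests
    length: int                                 # region length in bytes
    pages: dict = field(default_factory=dict)   # page index -> pinned page address
    registered: bool = False


def create_mr(dev_va, length):
    """Step 1: allocate a handle and record va/len; nothing is pinned yet."""
    if dev_va % PAGE_SIZE or length <= 0 or length % PAGE_SIZE:
        raise ValueError("toy model handles only page-aligned regions")
    return ToyMR(dev_va, length)


def add_memory(mr, offset, hva, length):
    """Step 2: 'pin' a chunk of host memory and attach it at `offset` in the region."""
    if mr.registered:
        raise RuntimeError("region is already registered")
    if offset % PAGE_SIZE or hva % PAGE_SIZE or length % PAGE_SIZE:
        raise ValueError("toy model handles only page-aligned chunks")
    if offset < 0 or length <= 0 or offset + length > mr.length:
        raise ValueError("chunk falls outside the region")
    first = offset // PAGE_SIZE
    for i in range(length // PAGE_SIZE):
        # The host VA is only a stand-in for the pinned physical page.
        mr.pages[first + i] = hva + i * PAGE_SIZE


def register_mr(mr):
    """Step 3: hand the table to the 'HW'; refuse regions that still have holes."""
    holes = [i for i in range(mr.length // PAGE_SIZE) if i not in mr.pages]
    if holes:
        raise ValueError(f"region has unbacked pages: {holes}")
    mr.registered = True


def translate(mr, va):
    """Device-side lookup: uses the region's own table, never the CPU MMU."""
    if not mr.registered:
        raise RuntimeError("region is not registered")
    off = va - mr.dev_va
    if not 0 <= off < mr.length:
        raise ValueError("address outside the region")
    return mr.pages[off // PAGE_SIZE] + off % PAGE_SIZE


# One device-contiguous range backed by two scattered host chunks.
mr = create_mr(dev_va=0x10000000, length=2 * PAGE_SIZE)
add_memory(mr, 0, 0x7F0000200000, PAGE_SIZE)
add_memory(mr, PAGE_SIZE, 0x7F0000A00000, PAGE_SIZE)
register_mr(mr)
print(hex(translate(mr, 0x10000000 + PAGE_SIZE + 16)))
```

In this model the base address recorded in step 1 is just a key into the
region's table, so it does not have to equal any address in the process
address space. Whether real RDMA hardware and the kernel API allow the same
freedom is exactly the open question in this thread.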