On Mon, Nov 19, 2018 at 04:08:33PM +1100, Alexey Kardashevskiy wrote: > > > On 19/11/2018 13:42, David Gibson wrote: > > On Tue, Nov 13, 2018 at 07:31:02PM +1100, Alexey Kardashevskiy wrote: > >> When deciding about the huge DMA window, the typical Linux pseries guest > >> uses the maximum allowed RAM size as the upper limit. We did the same > >> on QEMU side to match that logic. Now we are going to support GPU RAM > >> pass through which is not available at the guest boot time as it requires > >> the guest driver interaction. As the result, the guest requests a smaller > >> window than it should. Therefore the guest needs to be patched to > >> understand this new memory and so does QEMU. > >> > >> Instead of reimplementing here whatever solution we will choose for > >> the guest, this advertises the biggest possible window size limited by > >> 32 bit (as defined by LoPAPR). > >> > >> This seems to be safe as: > >> 1. The guest visible emulated table is allocated in KVM (actual pages > >> are allocated in page fault handler) and QEMU (actual pages are allocated > >> when changed); > >> 2. The hardware table (and corresponding userspace addresses cache) > >> supports sparse allocation and also checks for locked_vm limit so > >> it is unable to cause the host any damage. > >> > >> Signed-off-by: Alexey Kardashevskiy <a...@ozlabs.ru> > > > > This seems like a good idea to me, even without the NPU stuff. It > > always bothered me slightly that we based what's effectively the IOVA > > limit on the guest RAM size which doesn't have any direct connection > > to it. > > > > As long as it doesn't hit the locked memory limits, I don't see any > > reason we should prevent a guest from doing something weird like > > mapping a small bit of RAM over and over in IOVA space, or mapping its > > RAM sparsely in IOVA space. > > > >> --- > >> hw/ppc/spapr_rtas_ddw.c | 19 +++---------------- > >> 1 file changed, 3 insertions(+), 16 deletions(-) > >> > >> diff --git a/hw/ppc/spapr_rtas_ddw.c b/hw/ppc/spapr_rtas_ddw.c > >> index 329feb1..df60351 100644 > >> --- a/hw/ppc/spapr_rtas_ddw.c > >> +++ b/hw/ppc/spapr_rtas_ddw.c > >> @@ -96,9 +96,8 @@ static void rtas_ibm_query_pe_dma_window(PowerPCCPU *cpu, > >> uint32_t nret, target_ulong rets) > >> { > >> sPAPRPHBState *sphb; > >> - uint64_t buid, max_window_size; > >> + uint64_t buid; > >> uint32_t avail, addr, pgmask = 0; > >> - MachineState *machine = MACHINE(spapr); > >> > >> if ((nargs != 3) || (nret != 5)) { > >> goto param_error_exit; > >> @@ -114,27 +113,15 @@ static void rtas_ibm_query_pe_dma_window(PowerPCCPU > >> *cpu, > >> /* Translate page mask to LoPAPR format */ > >> pgmask = spapr_page_mask_to_query_mask(sphb->page_size_mask); > >> > >> - /* > >> - * This is "Largest contiguous block of TCEs allocated specifically > >> - * for (that is, are reserved for) this PE". > >> - * Return the maximum number as maximum supported RAM size was in 4K > >> pages. > >> - */ > >> - if (machine->ram_size == machine->maxram_size) { > >> - max_window_size = machine->ram_size; > >> - } else { > >> - max_window_size = machine->device_memory->base + > >> - memory_region_size(&machine->device_memory->mr); > >> - } > >> - > >> avail = SPAPR_PCI_DMA_MAX_WINDOWS - > >> spapr_phb_get_active_win_num(sphb); > >> > >> rtas_st(rets, 0, RTAS_OUT_SUCCESS); > >> rtas_st(rets, 1, avail); > >> - rtas_st(rets, 2, max_window_size >> SPAPR_TCE_PAGE_SHIFT); > >> + rtas_st(rets, 2, 0xFFFFFFFF); /* Largest contiguous block of TCEs */ > > > > One detail though.. where does this limit actually come from? Is it > > actually a limit imposed by the hardware somewhere, or just because > > the RTAS call doesn't ahve room for anything more? > > I used this limit just because of the parameter size. > > > > > Previously limits would usually have been a power of 2; is it likely > > to matter that now it won't be? > > LoPAPR says it is "Largest contiguous block of TCEs" and no mention of > power of two but ibm,create-pe-dma-window takes a window shift so it is > assumed that windows can still be only power of two. Not sure. The > powernv code will fail if the requested window is not power of two. > > Replacing 0xffffffff with 0x80000000 won't make a difference though here > but probably will make the error clearer...
Yeah, given that the windows have to be a power of 2 in size, I think it makes sense for this to be a power of 2, even if it doesn't strictly have to be. So I think 0x80000000 would be a better option. -- David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson
signature.asc
Description: PGP signature