On Fri, Apr 9, 2021, 01:36 Alexey Kardashevskiy <a...@ozlabs.ru> wrote:
> On 08/04/2021 19:04, Michael Ellerman wrote:
> > Alexey Kardashevskiy <a...@ozlabs.ru> writes:
> >> On 08/04/2021 15:37, Michael Ellerman wrote:
> >>> Leonardo Bras <leobra...@gmail.com> writes:
> >>>> According to LoPAR, the ibm,query-pe-dma-window output named "IO Page
> >>>> Sizes" will let the OS know all possible page sizes that can be used
> >>>> for creating a new DDW.
> >>>>
> >>>> Currently Linux will only try using 3 of the 8 available options:
> >>>> 4K, 64K and 16M. According to LoPAR, the hypervisor may also offer
> >>>> 32M, 64M, 128M, 256M and 16G.
> >>>
> >>> Do we know of any hardware & hypervisor combination that will actually
> >>> give us bigger pages?
> >>
> >> On P8, 16MB host pages and 16MB hardware IOMMU pages worked.
> >>
> >> On P9, a VM's 16MB IOMMU pages worked on top of 2MB host pages + 2MB
> >> hardware IOMMU pages.
> >
> > The current code already tries 16MB though.
> >
> > I'm wondering if we're going to ask for larger sizes that have never
> > been tested and possibly expose bugs. But it sounds like this is mainly
> > targeted at future platforms.
>
> For fun, I tried to pass a PCI device through to a guest with this patch as:
>
> pbuild/qemu-killslof-aiku1904le-ppc64/qemu-system-ppc64 \
> -nodefaults \
> -chardev stdio,id=STDIO0,signal=off,mux=on \
> -device spapr-vty,id=svty0,reg=0x71000110,chardev=STDIO0 \
> -mon id=MON0,chardev=STDIO0,mode=readline \
> -nographic \
> -vga none \
> -enable-kvm \
> -m 16G \
> -kernel ./vmldbg \
> -initrd /home/aik/t/le.cpio \
> -device vfio-pci,id=vfio0001_01_00_0,host=0001:01:00.0 \
> -mem-prealloc \
> -mem-path qemu_hp_1G_node0 \
> -global spapr-pci-host-bridge.pgsz=0xffffff000 \
> -machine cap-cfpc=broken,cap-ccf-assist=off \
> -smp 1,threads=1 \
> -L /home/aik/t/qemu-ppc64-bios/ \
> -trace events=qemu_trace_events \
> -d guest_errors,mmu \
> -chardev socket,id=SOCKET0,server=on,wait=off,path=qemu.mon.1_1_0_0 \
> -mon chardev=SOCKET0,mode=control
>
> The guest created a huge window:
>
> xhci_hcd 0000:00:00.0: ibm,create-pe-dma-window(2027) 0 8000000 20000000
> 22 22 returned 0 (liobn = 0x80000001 starting addr = 8000000 0)
>
> The first "22" is page_shift in hex (16GB pages), the second "22" is
> window_shift (so we have 1 TCE).
>
> On the host side, window#1 was created with 1GB pages:
>
> pci 0001:01 : [PE# fd] Setting up window#1
> 800000000000000..80007ffffffffff pg=40000000
>
> The XHCI seems to be working. Without the patch, 16MB was the maximum.
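
As an aside, here is a minimal standalone sketch (not part of the patch) of
how those two hex values decode, assuming the usual LoPAR convention that
the window spans 2^window_shift bytes mapped in 2^page_shift-byte IOMMU
pages:

#include <stdio.h>

int main(void)
{
	unsigned int page_shift = 0x22;   /* 34 -> 2^34 = 16 GiB IOMMU pages */
	unsigned int window_shift = 0x22; /* 34 -> 2^34 = 16 GiB window */

	/* One TCE maps one IOMMU page, so the TCE count is the window
	 * size divided by the page size. */
	printf("page size: %llu bytes\n", 1ULL << page_shift);
	printf("TCE count: %llu\n", 1ULL << (window_shift - page_shift));
	return 0;
}

This prints a 16 GiB page size and a TCE count of 1, matching the trace
above.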
> >>>> diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
> >>>> index 9fc5217f0c8e..6cda1c92597d 100644
> >>>> --- a/arch/powerpc/platforms/pseries/iommu.c
> >>>> +++ b/arch/powerpc/platforms/pseries/iommu.c
> >>>> @@ -53,6 +53,20 @@ enum {
> >>>>  	DDW_EXT_QUERY_OUT_SIZE = 2
> >>>>  };
> >>>
> >>> A comment saying where the values come from would be good.
> >>>
> >>>> +#define QUERY_DDW_PGSIZE_4K	0x01
> >>>> +#define QUERY_DDW_PGSIZE_64K	0x02
> >>>> +#define QUERY_DDW_PGSIZE_16M	0x04
> >>>> +#define QUERY_DDW_PGSIZE_32M	0x08
> >>>> +#define QUERY_DDW_PGSIZE_64M	0x10
> >>>> +#define QUERY_DDW_PGSIZE_128M	0x20
> >>>> +#define QUERY_DDW_PGSIZE_256M	0x40
> >>>> +#define QUERY_DDW_PGSIZE_16G	0x80
> >>>
> >>> I'm not sure the #defines really gain us much vs just putting the
> >>> literal values in the array below?
> >>
> >> Then someone says "uuuuu magic values" :) I do not mind either way.
> >> Thanks,
> >
> > Yeah, that's true. But #defining them doesn't make them less magic if
> > you only use them in one place :)
>
> Defining them with "QUERY_DDW" in the names kinda tells where they come
> from. We can also grep QEMU for these names to see how the other side
> handles them. Dunno.
>
> btw, the bot complained about __builtin_ctz(SZ_16G), which should be
> __builtin_ctzl(SZ_16G), so we have to ask Leonardo to repost anyway :)

Thanks for testing!

http://patchwork.ozlabs.org/project/linuxppc-dev/patch/20210408201915.174217-1-leobra...@gmail.com/

I sent a v3 a few hours ago, fixing this by using __builtin_ctzll() instead
of __builtin_ctz() for all sizes, and it worked like a charm.

I also reverted to the previous approach of not having the QUERY_DDW defines
for the masks, as Michael suggested. I can go back to the v2 approach if you
decide it's better.

Best regards,
Leonardo Bras
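
For reference, a minimal sketch (illustrative only, not the v3 patch) of why
__builtin_ctz() is the wrong builtin here, assuming SZ_16G is the usual
1ULL << 34 size constant:

#include <stdio.h>

/* Assumed here for illustration; the kernel keeps its size constants
 * in include/linux/sizes.h. */
#define SZ_16G (1ULL << 34)

int main(void)
{
	/* __builtin_ctz() takes an unsigned int, so SZ_16G (only bit 34
	 * set) would be truncated to 0 before the trailing zeros are
	 * counted -- and __builtin_ctz(0) is undefined behaviour.  The
	 * 64-bit variant sees the full value and returns the page shift. */
	printf("__builtin_ctzll(SZ_16G) = %d\n", __builtin_ctzll(SZ_16G));     /* 34 */
	printf("__builtin_ctzll(SZ_16M) = %d\n", __builtin_ctzll(1ULL << 24)); /* 24 */
	return 0;
}

Using the ll variant for every size, as the v3 does, sidesteps the operand
width question entirely.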