On Sat, 21 Apr 2018 09:02:14 +0000 "Wuzongyong (Euler Dept)" <cordius...@huawei.com> wrote:
> > > > > Hi,
> > > > >
> > > > > The qemu process gets stuck when we hot-add a large amount of memory
> > > > > to a virtual machine with a passthrough device.
> > > > > We found it is too slow to pin and map pages in vfio_dma_do_map.
> > > > > Is there any method to improve this process?
> > > >
> > > > At what size do you start to see problems? The time to map a
> > > > section of memory should be directly proportional to the size. As
> > > > the size is increased, it will take longer, but I don't know why
> > > > you'd reach a point of not making forward progress. Is it actually
> > > > stuck or is it just taking longer than you want? Using hugepages
> > > > can certainly help; we still need to pin each PAGE_SIZE page within
> > > > the hugepage, but we'll have larger contiguous regions and therefore
> > > > call iommu_map() less frequently. Please share more data. Thanks,
> > > >
> > > > Alex
> > >
> > > It just takes longer, rather than actually getting stuck.
> > > We found that the problem shows up when we hot-add 16G of memory, and
> > > it takes tens of minutes when we hot-add 1T of memory.
> >
> > Is the stall adding 1TB roughly 64 times the stall adding 16GB, or do we
> > have some inflection in the size vs time curve? There is a cost to
> > pinning and mapping through the IOMMU; perhaps we can improve that, but I
> > don't see how we can eliminate it or how it wouldn't be at least linear
> > in the size of memory added without moving to a page request model, which
> > hardly any hardware currently supports. A workaround might be to
> > incrementally add memory in smaller chunks, which generates a less
> > noticeable stall. Thanks,
> >
> > Alex
>
> I collected part of the report below, recorded by perf while I hot-added
> 24GB of memory:
>
> + 63.41%  0.00%  qemu-kvm  qemu-kvm-2.8.1-25.127  [.] 0xffffffffffc7534a
> + 63.41%  0.00%  qemu-kvm  [kernel.vmlinux]       [k] do_vfs_ioctl
> + 63.41%  0.00%  qemu-kvm  [kernel.vmlinux]       [k] sys_ioctl
> + 63.41%  0.00%  qemu-kvm  libc-2.17.so           [.] __GI___ioctl
> + 63.41%  0.00%  qemu-kvm  qemu-kvm-2.8.1-25.127  [.] 0xffffffffffc71c59
> + 63.10%  0.00%  qemu-kvm  [vfio]                 [k] vfio_fops_unl_ioctl
> + 63.10%  0.00%  qemu-kvm  qemu-kvm-2.8.1-25.127  [.] 0xffffffffffcbbb6a
> + 63.10%  0.02%  qemu-kvm  [vfio_iommu_type1]     [k] vfio_iommu_type1_ioctl
> + 60.67%  0.31%  qemu-kvm  [vfio_iommu_type1]     [k] vfio_pin_pages_remote
> + 60.06%  0.46%  qemu-kvm  [vfio_iommu_type1]     [k] vaddr_get_pfn
> + 59.61%  0.95%  qemu-kvm  [kernel.vmlinux]       [k] get_user_pages_fast
> + 54.28%  0.02%  qemu-kvm  [kernel.vmlinux]       [k] get_user_pages_unlocked
> + 54.24%  0.04%  qemu-kvm  [kernel.vmlinux]       [k] __get_user_pages
> + 54.13%  0.01%  qemu-kvm  [kernel.vmlinux]       [k] handle_mm_fault
> + 54.08%  0.03%  qemu-kvm  [kernel.vmlinux]       [k] do_huge_pmd_anonymous_page
> + 52.09% 52.09%  qemu-kvm  [kernel.vmlinux]       [k] clear_page
> +  9.42%  0.12%  swapper   [kernel.vmlinux]       [k] cpu_startup_entry
> +  9.20%  0.00%  swapper   [kernel.vmlinux]       [k] start_secondary
> +  8.85%  0.02%  swapper   [kernel.vmlinux]       [k] arch_cpu_idle
> +  8.79%  0.07%  swapper   [kernel.vmlinux]       [k] cpuidle_idle_call
> +  6.16%  0.29%  swapper   [kernel.vmlinux]       [k] apic_timer_interrupt
> +  5.73%  0.07%  swapper   [kernel.vmlinux]       [k] smp_apic_timer_interrupt
> +  4.34%  0.99%  qemu-kvm  [kernel.vmlinux]       [k] gup_pud_range
> +  3.56%  0.16%  swapper   [kernel.vmlinux]       [k] local_apic_timer_interrupt
> +  3.32%  0.41%  swapper   [kernel.vmlinux]       [k] hrtimer_interrupt
> +  3.25%  3.21%  qemu-kvm  [kernel.vmlinux]       [k] gup_huge_pmd
> +  2.31%  0.01%  qemu-kvm  [kernel.vmlinux]       [k] iommu_map
> +  2.30%  0.00%  qemu-kvm  [kernel.vmlinux]       [k] intel_iommu_map
>
> It seems that the bottleneck is pinning pages through get_user_pages
> rather than the IOMMU mapping itself.

Sure, the IOMMU mapping is more lightweight than the page pinning, but both
are required; we're pinning the pages for the purpose of IOMMU mapping them.
It also seems the bulk of the time is spent clearing pages, which is
necessary so as not to leak data from the kernel or other users to this
process. Perhaps there are ways to take further advantage of hugepages in
the pinning process, but as far as I'm aware we still need to pin at
PAGE_SIZE granularity rather than per hugepage. Thanks,

Alex

_______________________________________________
vfio-users mailing list
vfio-users@redhat.com
https://www.redhat.com/mailman/listinfo/vfio-users
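
For concreteness, here is a minimal userspace sketch (not QEMU's actual code)
of the path being profiled above, assuming an already-opened and configured
type1 IOMMU container fd; the VFIO group/container setup is omitted, and
map_region_chunked and the 1 GiB chunk size are illustrative choices. Each
VFIO_IOMMU_MAP_DMA ioctl is what reaches vfio_dma_do_map() and
vfio_pin_pages_remote() in the kernel, and the chunked loop shows the
workaround suggested in the thread of registering a large region in smaller
pieces.

/*
 * Minimal sketch: map a large memory region through the VFIO type1 IOMMU
 * in chunks, so each individual ioctl stalls for a bounded time.
 * Container/group setup is omitted; CHUNK_SIZE is purely illustrative.
 */
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/vfio.h>

#define CHUNK_SIZE (1ULL << 30)  /* 1 GiB per ioctl, illustrative */

static int map_region_chunked(int container_fd, void *vaddr,
                              uint64_t iova, uint64_t size)
{
    uint64_t done = 0;

    while (done < size) {
        struct vfio_iommu_type1_dma_map map;
        uint64_t len = size - done;

        if (len > CHUNK_SIZE)
            len = CHUNK_SIZE;

        memset(&map, 0, sizeof(map));
        map.argsz = sizeof(map);
        map.flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE;
        map.vaddr = (uintptr_t)vaddr + done;
        map.iova  = iova + done;
        map.size  = len;

        /* Pins and IOMMU-maps 'len' bytes; a smaller chunk means a
         * shorter individual stall, but the total work is unchanged. */
        if (ioctl(container_fd, VFIO_IOMMU_MAP_DMA, &map) < 0) {
            perror("VFIO_IOMMU_MAP_DMA");
            return -1;
        }
        done += len;
    }
    return 0;
}

As noted in the thread, chunking does not reduce the total pinning, clearing,
and mapping work; it only makes each individual stall shorter and therefore
less noticeable.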