"Aneesh Kumar K.V" <aneesh.ku...@linux.ibm.com> writes:
> On 7/8/20 10:14 AM, Michael Ellerman wrote:
>> "Aneesh Kumar K.V" <aneesh.ku...@linux.ibm.com> writes:
>>> To enable memory unplug without splitting kernel page table
>>> mapping, we force the max mapping size to the LMB size. LMB
>>> size is the unit in which hypervisor will do memory add/remove
>>> operation.
>>>
>>> This implies on pseries system, we now end up mapping
>>
>> Please expand on why it "implies" that for pseries.
>>
>>> memory with 2M page size instead of 1G. To improve
>>> that we want hypervisor to hint the kernel about the hotplug
>>> memory range. This was added that as part of
>> That
>>>
>>> commit b6eca183e23e ("powerpc/kernel: Enables memory
>>> hot-remove after reboot on pseries guests")
>>>
>>> But we still don't do that on PowerVM. Once we get PowerVM
>>
>> I think you mean PowerVM doesn't provide that hint yet?
>>
>> Realistically it won't until P10. So this means we'll always use 2MB on
>> Power9 PowerVM doesn't it?
>>
>> What about KVM?
>>
>> Have you done any benchmarking on the impact of switching the linear
>> mapping to 2MB pages?
>>
>
> The TLB impact should be minimal because with a 256M LMB size partition
> scoped entries are still 2M and hence we end up with TLBs of 2M size.
>
>
>>> updated, we can then force the 2M mapping only to hot-pluggable
>>> memory region using memblock_is_hotpluggable(). Till then
>>> let's depend on LMB size for finding the mapping page size
>>> for linear range.
>>>
>
> updated
>
>
> powerpc/mm/radix: Create separate mappings for hot-plugged memory
>
> To enable memory unplug without splitting kernel page table
> mapping, we force the max mapping size to the LMB size. LMB
> size is the unit in which hypervisor will do memory add/remove
> operation.
>
> Pseries systems supports max LMB size of 256MB. Hence on pseries,
> we now end up mapping memory with 2M page size instead of 1G.
> To improve
> that we want hypervisor to hint the kernel about the hotplug
> memory range. That was added that as part of
>
> commit b6eca18 ("powerpc/kernel: Enables memory
> hot-remove after reboot on pseries guests")
>
> But PowerVM doesn't provide that hint yet. Once we get PowerVM
> updated, we can then force the 2M mapping only to hot-pluggable
> memory region using memblock_is_hotpluggable(). Till then
> let's depend on LMB size for finding the mapping page size
> for linear range.
>
> With this change KVM guest will also be doing linear mapping with
> 2M page size.
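
The sizing rule in the updated commit message (a 256MB LMB caps the
linear mapping at 2M pages, and a 1G mapping needs an LMB of at least
1G) can be sketched in C as follows. This is an illustration only, not
the code from the patch: max_mapping_size() and the SZ_* constants are
hypothetical names, and the sketch assumes 64K, 2M and 1G are the only
candidate mapping sizes.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative size constants (assumed names, not taken from the patch). */
#define SZ_64K	(1UL << 16)
#define SZ_2M	(1UL << 21)
#define SZ_1G	(1UL << 30)

/*
 * Hypothetical helper: return the largest candidate mapping size that
 * does not exceed the memory block (LMB) size, so that one mapping
 * never spans two blocks that can be unplugged independently.
 */
static unsigned long max_mapping_size(unsigned long mem_block_size)
{
	static const unsigned long sizes[] = { SZ_1G, SZ_2M, SZ_64K };
	size_t i;

	assert(mem_block_size != 0);

	for (i = 0; i < sizeof(sizes) / sizeof(sizes[0]); i++) {
		if (sizes[i] <= mem_block_size)
			return sizes[i];
	}

	/* Block smaller than every candidate: fall back to the smallest. */
	return SZ_64K;
}
```

With a 256MB block the helper picks 2M, because 1G does not fit inside
one block; only a block of 1G or more allows a 1G mapping.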
...
>>> @@ -494,17 +544,27 @@ void __init radix__early_init_devtree(void)
>>>  	 * Try to find the available page sizes in the device-tree
>>>  	 */
>>>  	rc = of_scan_flat_dt(radix_dt_scan_page_sizes, NULL);
>>> -	if (rc != 0)  /* Found */
>>> -		goto found;
>>> +	if (rc == 0) {
>>> +		/*
>>> +		 * no page size details found in device tree
>>> +		 * let's assume we have page 4k and 64k support
>>
>> Capitals and punctuation please?
>>
>>> +		 */
>>> +		mmu_psize_defs[MMU_PAGE_4K].shift = 12;
>>> +		mmu_psize_defs[MMU_PAGE_4K].ap = 0x0;
>>> +
>>> +		mmu_psize_defs[MMU_PAGE_64K].shift = 16;
>>> +		mmu_psize_defs[MMU_PAGE_64K].ap = 0x5;
>>> +	}
>>
>> Moving that seems like an unrelated change. It's a reasonable change but
>> I'd rather you did it in a standalone patch.
>>
>
> we needed that change so that we can call radix_memory_block_size() for
> both found and !found case.

But the found and !found cases converge at found:, which is where you
call it. So I don't understand.

But as I said below, it would be even simpler if you worked out the
memory block size first.

cheers

>>>  	/*
>>> -	 * let's assume we have page 4k and 64k support
>>> +	 * Max mapping size used when mapping pages. We don't use
>>> +	 * ppc_md.memory_block_size() here because this get called
>>> +	 * early and we don't have machine probe called yet. Also
>>> +	 * the pseries implementation only check for ibm,lmb-size.
>>> +	 * All hypervisor supporting radix do expose that device
>>> +	 * tree node.
>>>  	 */
>>> -	mmu_psize_defs[MMU_PAGE_4K].shift = 12;
>>> -	mmu_psize_defs[MMU_PAGE_4K].ap = 0x0;
>>> -
>>> -	mmu_psize_defs[MMU_PAGE_64K].shift = 16;
>>> -	mmu_psize_defs[MMU_PAGE_64K].ap = 0x5;
>>> -found:
>>> +	radix_mem_block_size = radix_memory_block_size();
>>
>> If you did that earlier in the function, before
>> radix_dt_scan_page_sizes(), the logic would be simpler.
>>
>>>  	return;
>>>  }
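
One possible reading of the "work out the memory block size first"
suggestion is sketched below as a standalone C model, not as kernel
code. The names stub_memory_block_size(), stub_scan_page_sizes() and
early_init_sketch() are hypothetical stand-ins for
radix_memory_block_size(), the of_scan_flat_dt() call and
radix__early_init_devtree(); the fixed 256MB return value and the
dt_has_sizes flag are assumptions made only so the flow can be
exercised outside the kernel.

```c
#include <assert.h>
#include <stdbool.h>

enum { MMU_PAGE_4K, MMU_PAGE_64K, MMU_PAGE_COUNT };

struct psize_def {
	unsigned int shift;
	unsigned int ap;
};

static struct psize_def psize_defs[MMU_PAGE_COUNT];
static unsigned long mem_block_size;

/* Hypothetical stand-in for radix_memory_block_size(): fixed 256MB. */
static unsigned long stub_memory_block_size(void)
{
	return 256UL << 20;
}

/*
 * Hypothetical stand-in for the device-tree scan. Returns nonzero and
 * fills in the table when page sizes are "found", zero otherwise.
 */
static int stub_scan_page_sizes(bool dt_has_sizes)
{
	if (!dt_has_sizes)
		return 0;

	psize_defs[MMU_PAGE_4K].shift = 12;
	psize_defs[MMU_PAGE_4K].ap = 0x0;
	psize_defs[MMU_PAGE_64K].shift = 16;
	psize_defs[MMU_PAGE_64K].ap = 0x5;
	return 1;
}

static void early_init_sketch(bool dt_has_sizes)
{
	/* Done first, so it holds for both the found and !found cases. */
	mem_block_size = stub_memory_block_size();

	if (stub_scan_page_sizes(dt_has_sizes) != 0)
		return;	/* Found. */

	/*
	 * No page size details found in the device tree.
	 * Assume 4K and 64K page support.
	 */
	psize_defs[MMU_PAGE_4K].shift = 12;
	psize_defs[MMU_PAGE_4K].ap = 0x0;

	psize_defs[MMU_PAGE_64K].shift = 16;
	psize_defs[MMU_PAGE_64K].ap = 0x5;
}
```

With the block size computed up front, the fallback defaults no longer
need to move, and the found case can leave the function early.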