On Mon, Oct 28, 2019 at 9:35 PM Aneesh Kumar K.V <aneesh.ku...@linux.ibm.com> wrote: > > On 10/29/19 4:38 AM, Dan Williams wrote: > > On Mon, Oct 28, 2019 at 2:48 AM Aneesh Kumar K.V > > <aneesh.ku...@linux.ibm.com> wrote: > >> > >> The page size used to map the namespace is arch dependent. For example > >> architectures like ppc64 use 16MB page size for direct-mapping. If the > >> namespace > >> size is not aligned to the mapping page size, we can observe kernel crash > >> during namespace init and destroy. > >> > >> This is due to kernel doing partial map/unmap of the resource range > >> > >> BUG: Unable to handle kernel data access at 0xc001000406000000 > >> Faulting instruction address: 0xc000000000090790 > >> NIP [c000000000090790] arch_add_memory+0xc0/0x130 > >> LR [c000000000090744] arch_add_memory+0x74/0x130 > >> Call Trace: > >> arch_add_memory+0x74/0x130 (unreliable) > >> memremap_pages+0x74c/0xa30 > >> devm_memremap_pages+0x3c/0xa0 > >> pmem_attach_disk+0x188/0x770 > >> nvdimm_bus_probe+0xd8/0x470 > >> really_probe+0x148/0x570 > >> driver_probe_device+0x19c/0x1d0 > >> device_driver_attach+0xcc/0x100 > >> bind_store+0x134/0x1c0 > >> drv_attr_store+0x44/0x60 > >> sysfs_kf_write+0x74/0xc0 > >> kernfs_fop_write+0x1b4/0x290 > >> __vfs_write+0x3c/0x70 > >> vfs_write+0xd0/0x260 > >> ksys_write+0xdc/0x130 > >> system_call+0x5c/0x68 > >> > >> Signed-off-by: Aneesh Kumar K.V <aneesh.ku...@linux.ibm.com> > >> --- > >> arch/arm64/mm/flush.c | 11 +++++++++++ > >> arch/powerpc/lib/pmem.c | 21 +++++++++++++++++++-- > >> arch/x86/mm/pageattr.c | 12 ++++++++++++ > >> include/linux/libnvdimm.h | 1 + > >> 4 files changed, 43 insertions(+), 2 deletions(-) > >> > >> diff --git a/arch/arm64/mm/flush.c b/arch/arm64/mm/flush.c > >> index ac485163a4a7..90c54c600023 100644 > >> --- a/arch/arm64/mm/flush.c > >> +++ b/arch/arm64/mm/flush.c > >> @@ -91,4 +91,15 @@ void arch_invalidate_pmem(void *addr, size_t size) > >> __inval_dcache_area(addr, size); > >> } > >> EXPORT_SYMBOL_GPL(arch_invalidate_pmem); > >> + > >> +unsigned long arch_validate_namespace_size(unsigned int ndr_mappings, > >> unsigned long size) > >> +{ > >> + u32 remainder; > >> + > >> + div_u64_rem(size, PAGE_SIZE * ndr_mappings, &remainder); > >> + if (remainder) > >> + return PAGE_SIZE * ndr_mappings; > >> + return 0; > >> +} > >> +EXPORT_SYMBOL_GPL(arch_validate_namespace_size); > >> #endif > >> diff --git a/arch/powerpc/lib/pmem.c b/arch/powerpc/lib/pmem.c > >> index 377712e85605..2e661a08dae5 100644 > >> --- a/arch/powerpc/lib/pmem.c > >> +++ b/arch/powerpc/lib/pmem.c > >> @@ -17,14 +17,31 @@ void arch_wb_cache_pmem(void *addr, size_t size) > >> unsigned long start = (unsigned long) addr; > >> flush_dcache_range(start, start + size); > >> } > >> -EXPORT_SYMBOL(arch_wb_cache_pmem); > >> +EXPORT_SYMBOL_GPL(arch_wb_cache_pmem); > >> > >> void arch_invalidate_pmem(void *addr, size_t size) > >> { > >> unsigned long start = (unsigned long) addr; > >> flush_dcache_range(start, start + size); > >> } > >> -EXPORT_SYMBOL(arch_invalidate_pmem); > >> +EXPORT_SYMBOL_GPL(arch_invalidate_pmem); > >> + > >> +unsigned long arch_validate_namespace_size(unsigned int ndr_mappings, > >> unsigned long size) > >> +{ > >> + u32 remainder; > >> + unsigned long linear_map_size; > >> + > >> + if (radix_enabled()) > >> + linear_map_size = PAGE_SIZE; > >> + else > >> + linear_map_size = (1UL << > >> mmu_psize_defs[mmu_linear_psize].shift); > > > > This seems more a "supported_alignments" problem, and less a namespace > > size or PAGE_SIZE problem, because if the starting address is > > misaligned this size validation can still succeed when it shouldn't. > > > > > Isn't supported_alignments an indication of how user want the namespace > to be mapped to applications? Ie, with the above restrictions we can > still do both 64K and 16M mapping of the namespace to userspace.
True, for the pfn device and the device-dax mapping size, but I'm suggesting adding another instance of alignment control at the raw namespace level. That would need to be disconnected from the device-dax page mapping granularity. > > Also for supported alignment the huge page mapping is further dependent > on the THP feature. > > The restrictions here are mostly w.r.t the direct-mapping page size used > by some architecture. Right, that's a base requirement for the namespace that can be independent of the user page mapping size for device-dax.