On 10/29/19 11:00 AM, Dan Williams wrote:
On Mon, Oct 28, 2019 at 9:35 PM Aneesh Kumar K.V
<aneesh.ku...@linux.ibm.com> wrote:

On 10/29/19 4:38 AM, Dan Williams wrote:
On Mon, Oct 28, 2019 at 2:48 AM Aneesh Kumar K.V
<aneesh.ku...@linux.ibm.com> wrote:

The page size used to map the namespace is arch dependent. For example
architectures like ppc64 use 16MB page size for direct-mapping. If the namespace
size is not aligned to the mapping page size, we can observe kernel crash
during namespace init and destroy.

This is due to kernel doing partial map/unmap of the resource range

BUG: Unable to handle kernel data access at 0xc001000406000000
Faulting instruction address: 0xc000000000090790
NIP [c000000000090790] arch_add_memory+0xc0/0x130
LR [c000000000090744] arch_add_memory+0x74/0x130
Call Trace:
   arch_add_memory+0x74/0x130 (unreliable)
   memremap_pages+0x74c/0xa30
   devm_memremap_pages+0x3c/0xa0
   pmem_attach_disk+0x188/0x770
   nvdimm_bus_probe+0xd8/0x470
   really_probe+0x148/0x570
   driver_probe_device+0x19c/0x1d0
   device_driver_attach+0xcc/0x100
   bind_store+0x134/0x1c0
   drv_attr_store+0x44/0x60
   sysfs_kf_write+0x74/0xc0
   kernfs_fop_write+0x1b4/0x290
   __vfs_write+0x3c/0x70
   vfs_write+0xd0/0x260
   ksys_write+0xdc/0x130
   system_call+0x5c/0x68

Signed-off-by: Aneesh Kumar K.V <aneesh.ku...@linux.ibm.com>
---
   arch/arm64/mm/flush.c     | 11 +++++++++++
   arch/powerpc/lib/pmem.c   | 21 +++++++++++++++++++--
   arch/x86/mm/pageattr.c    | 12 ++++++++++++
   include/linux/libnvdimm.h |  1 +
   4 files changed, 43 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/mm/flush.c b/arch/arm64/mm/flush.c
index ac485163a4a7..90c54c600023 100644
--- a/arch/arm64/mm/flush.c
+++ b/arch/arm64/mm/flush.c
@@ -91,4 +91,15 @@ void arch_invalidate_pmem(void *addr, size_t size)
          __inval_dcache_area(addr, size);
   }
   EXPORT_SYMBOL_GPL(arch_invalidate_pmem);
+
+unsigned long arch_validate_namespace_size(unsigned int ndr_mappings, unsigned 
long size)
+{
+       u32 remainder;
+
+       div_u64_rem(size, PAGE_SIZE * ndr_mappings, &remainder);
+       if (remainder)
+               return PAGE_SIZE * ndr_mappings;
+       return 0;
+}
+EXPORT_SYMBOL_GPL(arch_validate_namespace_size);
   #endif
diff --git a/arch/powerpc/lib/pmem.c b/arch/powerpc/lib/pmem.c
index 377712e85605..2e661a08dae5 100644
--- a/arch/powerpc/lib/pmem.c
+++ b/arch/powerpc/lib/pmem.c
@@ -17,14 +17,31 @@ void arch_wb_cache_pmem(void *addr, size_t size)
          unsigned long start = (unsigned long) addr;
          flush_dcache_range(start, start + size);
   }
-EXPORT_SYMBOL(arch_wb_cache_pmem);
+EXPORT_SYMBOL_GPL(arch_wb_cache_pmem);

   void arch_invalidate_pmem(void *addr, size_t size)
   {
          unsigned long start = (unsigned long) addr;
          flush_dcache_range(start, start + size);
   }
-EXPORT_SYMBOL(arch_invalidate_pmem);
+EXPORT_SYMBOL_GPL(arch_invalidate_pmem);
+
+unsigned long arch_validate_namespace_size(unsigned int ndr_mappings, unsigned 
long size)
+{
+       u32 remainder;
+       unsigned long linear_map_size;
+
+       if (radix_enabled())
+               linear_map_size = PAGE_SIZE;
+       else
+               linear_map_size = (1UL << 
mmu_psize_defs[mmu_linear_psize].shift);

This seems more a "supported_alignments" problem, and less a namespace
size or PAGE_SIZE problem, because if the starting address is
misaligned this size validation can still succeed when it shouldn't.



Isn't supported_alignments an indication of how user want the namespace
to be mapped to applications?  Ie, with the above restrictions we can
still do both 64K and 16M mapping of the namespace to userspace.

True, for the pfn device and the device-dax mapping size, but I'm
suggesting adding another instance of alignment control at the raw
namespace level. That would need to be disconnected from the
device-dax page mapping granularity.


Can you explain what you mean by raw namespace level ? We don't have multiple values against which we need to check the alignment of namespace start and namespace size.

If you can outline how and where you would like to enforce that check I can start working on it.

-aneesh

Reply via email to