On 2/14/20 10:14 PM, Jeff Moyer wrote:
Dan Williams <dan.j.willi...@intel.com> writes:

On Thu, Feb 13, 2020 at 1:55 PM Jeff Moyer <jmo...@redhat.com> wrote:

Dan Williams <dan.j.willi...@intel.com> writes:

The pmem driver on PowerPC crashes with the following signature when
instantiating misaligned namespaces that map their capacity via
memremap_pages().

     BUG: Unable to handle kernel data access at 0xc001000406000000
     Faulting instruction address: 0xc000000000090790
     NIP [c000000000090790] arch_add_memory+0xc0/0x130
     LR [c000000000090744] arch_add_memory+0x74/0x130
     Call Trace:
      arch_add_memory+0x74/0x130 (unreliable)
      memremap_pages+0x74c/0xa30
      devm_memremap_pages+0x3c/0xa0
      pmem_attach_disk+0x188/0x770
      nvdimm_bus_probe+0xd8/0x470

With the assumption that only memremap_pages() has alignment
constraints, enforce memremap_compat_align() for
pmem_should_map_pages(), nd_pfn, or nd_dax cases.

Reported-by: Aneesh Kumar K.V <aneesh.ku...@linux.ibm.com>
Cc: Jeff Moyer <jmo...@redhat.com>
Reviewed-by: Aneesh Kumar K.V <aneesh.ku...@linux.ibm.com>
Link: https://lore.kernel.org/r/158041477336.3889308.4581652885008605170.st...@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <dan.j.willi...@intel.com>
---
  drivers/nvdimm/namespace_devs.c |   10 ++++++++++
  1 file changed, 10 insertions(+)

diff --git a/drivers/nvdimm/namespace_devs.c b/drivers/nvdimm/namespace_devs.c
index 032dc61725ff..aff1f32fdb4f 100644
--- a/drivers/nvdimm/namespace_devs.c
+++ b/drivers/nvdimm/namespace_devs.c
@@ -1739,6 +1739,16 @@ struct nd_namespace_common *nvdimm_namespace_common_probe(struct device *dev)
               return ERR_PTR(-ENODEV);
       }

+     if (pmem_should_map_pages(dev) || nd_pfn || nd_dax) {
+             struct nd_namespace_io *nsio = to_nd_namespace_io(&ndns->dev);
+             resource_size_t start = nsio->res.start;
+
+             if (!IS_ALIGNED(start | size, memremap_compat_align())) {
+                     dev_dbg(&ndns->dev, "misaligned, unable to map\n");
+                     return ERR_PTR(-EOPNOTSUPP);
+             }
+     }
+
       if (is_namespace_pmem(&ndns->dev)) {
               struct nd_namespace_pmem *nspm;


Actually, I take back my ack.  :) This prevents a previously working
namespace from being successfully probed/setup.

Do you have a test case handy? I can see a potential gap with a
namespace that used internal padding to fix up the alignment.

# ndctl list -v -n namespace0.0
[
   {
     "dev":"namespace0.0",
     "mode":"fsdax",
     "map":"dev",
     "size":52846133248,
     "uuid":"b99f6f6a-2909-4189-9bfa-6eeebd95d40e",
     "raw_uuid":"aff43777-015b-493f-bbf9-7c7b0fe33519",
     "sector_size":512,
     "align":4096,
     "blockdev":"pmem0",
     "numa_node":0
   }
]

# cat /sys/bus/nd/devices/region0/mappings
6

# grep namespace0.0 /proc/iomem
   1860000000-24e0003fff : namespace0.0

The goal of this check is to catch cases that are just going to fail
devm_memremap_pages(), and the expectation is that it could not have
worked before unless it was ported from another platform, or someone
flipped the page-size switch on PowerPC.

On x86, creation and probing of the namespace worked fine before this
patch.  What *doesn't* work is creating another fsdax namespace after
this one.  sector mode namespaces can still be created, though:

[
   {
     "dev":"namespace0.1",
     "mode":"sector",
     "size":53270768640,
     "uuid":"67ea2c74-d4b1-4fc9-9c1a-a7d2a6c2a4a7",
     "sector_size":512,
     "blockdev":"pmem0.1s"
   },

# grep namespace0.1 /proc/iomem
   24e0004000-3160007fff : namespace0.1

I thought we were only going to enforce the alignment for a newly
created namespace?  This should only check whether the alignment
works for the current platform.

The model is a new default 16MB alignment is enforced at creation
time, but if you need to support previously created namespaces then
you can manually trim that alignment requirement to no less than
memremap_compat_align() because that's the point at which
devm_memremap_pages() will start failing or crashing.
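The validation the patch adds boils down to one IS_ALIGNED() test: OR-ing the start and the size together rejects the range if either value is misaligned. A minimal userspace sketch of that check, with a hypothetical 16 MiB stand-in for memremap_compat_align() (the new creation-time default discussed above):

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors the kernel's IS_ALIGNED(): true when x is a multiple of a
 * (a must be a power of two). */
#define IS_ALIGNED(x, a)  (((x) & ((uint64_t)(a) - 1)) == 0)

/* Hypothetical stand-in for memremap_compat_align(): the 16 MiB
 * creation-time default mentioned in this thread. */
#define COMPAT_ALIGN (16ULL << 20)

/* OR-ing start and size lets a single IS_ALIGNED() test reject the
 * range when either the base or the length is misaligned. */
static int range_mappable(uint64_t start, uint64_t size)
{
        return IS_ALIGNED(start | size, COMPAT_ALIGN);
}
```

With the namespace0.0 numbers from below (start 0x1860000000, size 0xc80004000), the size's 0x4000 tail makes the check fail even though the start is aligned.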

The problem is that older kernels did not enforce alignment to
SUBSECTION_SIZE.  We shouldn't prevent those namespaces from being
accessed.  The probe itself will not cause the WARN_ON to trigger.
Creating new namespaces at misaligned addresses could, but you've
altered the free space allocation such that we won't hit that anymore.

If I drop this patch, the probe will still work, and allocating new
namespaces will also work:

# ndctl list
[
   {
     "dev":"namespace0.1",
     "mode":"sector",
     "size":53270768640,
     "uuid":"67ea2c74-d4b1-4fc9-9c1a-a7d2a6c2a4a7",
     "sector_size":512,
     "blockdev":"pmem0.1s"
   },
   {
     "dev":"namespace0.0",
     "mode":"fsdax",
     "map":"dev",
     "size":52846133248,
     "uuid":"b99f6f6a-2909-4189-9bfa-6eeebd95d40e",
     "sector_size":512,
     "align":4096,
     "blockdev":"pmem0"
   }
]
# ndctl create-namespace -m fsdax -s 36g -r 0
{
   "dev":"namespace0.2",
   "mode":"fsdax",
   "map":"dev",
   "size":"35.44 GiB (38.05 GB)",
   "uuid":"7893264c-c7ef-4cbe-95e1-ccf2aff041fb",
   "sector_size":512,
   "align":2097152,
   "blockdev":"pmem0.2"
}

/proc/iomem:

1860000000-d55fffffff : Persistent Memory
   1860000000-24e0003fff : namespace0.0
   24e0004000-3160007fff : namespace0.1
   3162000000-3a61ffffff : namespace0.2
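The ranges above can be checked against the x86 subsection size by hand (assuming the usual SUBSECTION_SHIFT of 21, i.e. 2 MiB, and remembering that /proc/iomem prints inclusive end addresses):

```c
#include <assert.h>
#include <stdint.h>

/* Assumes x86's SUBSECTION_SHIFT of 21, i.e. SUBSECTION_SIZE = 2 MiB. */
#define SUBSECTION_SIZE  (1ULL << 21)
#define IS_ALIGNED(x, a) (((x) & ((uint64_t)(a) - 1)) == 0)

/* /proc/iomem prints inclusive end addresses, so size = end - start + 1.
 * A range maps cleanly only if both its start and size are
 * subsection-aligned. */
static int subsection_aligned(uint64_t start, uint64_t end_inclusive)
{
        uint64_t size = end_inclusive - start + 1;

        return IS_ALIGNED(start | size, SUBSECTION_SIZE);
}
```

Run against the listing: namespace0.0 has an aligned start but a size ending in 0x4000, namespace0.1 starts at a 0x4000 offset, and only the freshly created namespace0.2 (align 2097152) passes on both counts.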

So, maybe the right thing is to make memremap_compat_align return
PAGE_SIZE for x86 instead of SUBSECTION_SIZE?
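One way to express that suggestion is a generic default that architectures with stricter mapping constraints override. In the kernel this would be a __weak symbol; the sketch below fakes the per-arch choice with a flag purely for illustration, and the constants are assumptions, not the patch's actual values:

```c
#include <assert.h>

#define PAGE_SIZE        4096UL
#define SUBSECTION_SIZE  (1UL << 21)

/* Sketch of the suggestion above: the generic helper returns PAGE_SIZE
 * so previously created x86 namespaces keep probing, while an
 * architecture with larger mapping constraints (e.g. PowerPC) supplies
 * a bigger value.  In the kernel the override would be a __weak default
 * symbol replaced by arch code; here the arch choice is faked with a
 * flag for illustration only. */
static unsigned long memremap_compat_align_sketch(int arch_needs_subsection)
{
        return arch_needs_subsection ? SUBSECTION_SIZE : PAGE_SIZE;
}
```

The trade-off, as the rest of the thread discusses, is that a PAGE_SIZE default no longer steers newly created namespaces toward a layout that is portable to platforms with larger compat alignment.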



I did that as part of https://lore.kernel.org/linux-nvdimm/20200120140749.69549-2-aneesh.ku...@linux.ibm.com and applied the subsection restriction only when creating a new namespace:

https://lore.kernel.org/linux-nvdimm/20200120140749.69549-4-aneesh.ku...@linux.ibm.com


But I do agree with the approach that, in order to create a compatible namespace, we need to enforce the maximum possible alignment value across all supported architectures.


On POWER we should still be able to enforce SUBSECTION_SIZE restrictions. We documented that requirement for distributions like SUSE: https://www.suse.com/support/kb/doc/?id=7024300



-aneesh
