On Tue, Aug 23, 2016 at 8:58 PM, Dan Williams <dan.j.willi...@intel.com> wrote: > On Tue, Aug 23, 2016 at 7:53 PM, Dan Williams <dan.j.willi...@intel.com> > wrote: >> On Tue, Aug 23, 2016 at 6:29 PM, Kani, Toshimitsu <toshi.k...@hpe.com> wrote: >>>> On Tue, Aug 23, 2016 at 4:47 PM, Kani, Toshimitsu <toshi.k...@hpe.com> >>>> wrote: >>>> > On Tue, 2016-08-23 at 15:32 -0700, Dan Williams wrote: >>>> >> On Tue, Aug 23, 2016 at 11:43 AM, Toshi Kani <toshi.k...@hpe.com> >>>> >> wrote: >>>> > : >>>> >> I'm not sure about this fix. The point of honoring >>>> >> vmem_altmap_offset() is because a portion of the resource that is >>>> >> passed to devm_memremap_pages() also contains the metadata info >>>> block >>>> >> for the device. The offset says "use everything past this point for >>>> >> pages". This may work for avoiding a crash, but it may corrupt info >>>> >> block metadata in the process. Can you provide more information >>>> >> about the failing scenario to be sure that we are not triggering a >>>> >> fault on an address that is not meant to have a page mapping? I.e. >>>> >> what is the host physical address of the page that caused this fault, >>>> >> and is it valid? >>>> > >>>> > The fault address in question was the 2nd page of an NVDIMM range. I >>>> > assumed this fault address was valid and needed to be handled. Here is >>>> > some info about the base and patched cases. Let me know if you need >>>> > more info. >>>> > >>>> > Base >>>> > ==== >>>> > >>>> > The following NVDIMM range was set to /dev/dax. >>>> >>>> With ndctl create-namespace or manually via sysfs? Specifically I'm >>>> looking for what the 'align' attribute was set to when the >>>> configuration was established. Can you provide a dump of the sysfs >>>> attributes for the /dev/dax parent device? >>> >>> I used the ndctl command below. >>> ndctl create-namespace -f -e namespace0.0 -m dax >>> >>> Here is additional info from my note for the base case. >>> >>> p {struct dev_pagemap} 0xffff88046d0453f0 >>> $3 = { >>> altmap = 0xffff88046d045410, >>> res = 0xffff88046d0453a8, >>> ref = 0xffff88046d0452f0, >>> dev = 0xffff880464790410 >>> } >>> >>> crash> p {struct vmem_altmap} 0xffff88046d045410 >>> $6 = { >>> base_pfn = 0x480000, >>> reserve = 0x2, // PHYS_PFN(SZ_8K) >>> free = 0x101fe, >>> align = 0x1fe, >>> alloc = 0x10000 >>> } >> >> Ah, so, on second look the 0x490200000 data offset looks correct. The >> total size of the address range is 16GB which equates to 256MB needed >> for struct page, plus 2MB more to re-align the data on the next 2MB >> boundary. >> >> The question now is why is the guest faulting on an access to an >> address less than 0x490200000? > > Does the attached patch fix this for you?
Sorry, should be this much simpler patch that also mirrors what driver/nvdimm/pmem.c is doing...
From 3369f0e825c12bb2f17c0f7d3ccecb7c60f645e0 Mon Sep 17 00:00:00 2001 From: Dan Williams <dan.j.willi...@intel.com> Date: Tue, 23 Aug 2016 19:59:31 -0700 Subject: [PATCH] dax: fix device-dax region base The data offset for a dax region needs to account for an altmap reservation in the resource range. Otherwise, device-dax is allowing mappings directly into the memmap or device info-block area, with crash signatures like the following: BUG: unable to handle kernel NULL pointer dereference at 0000000000000008 IP: [<ffffffff811ac851>] get_zone_device_page+0x11/0x30 Call Trace: follow_devmap_pmd+0x298/0x2c0 follow_page_mask+0x275/0x530 __get_user_pages+0xe3/0x750 __gfn_to_pfn_memslot+0x1b2/0x450 [kvm] ? hrtimer_try_to_cancel+0x2c/0x120 ? kvm_read_l1_tsc+0x55/0x60 [kvm] try_async_pf+0x66/0x230 [kvm] ? kvm_host_page_size+0x90/0xa0 [kvm] tdp_page_fault+0x130/0x280 [kvm] kvm_mmu_page_fault+0x5f/0xf0 [kvm] handle_ept_violation+0x94/0x180 [kvm_intel] vmx_handle_exit+0x1d3/0x1440 [kvm_intel] ? atomic_switch_perf_msrs+0x6f/0xa0 [kvm_intel] ? vmx_vcpu_run+0x2d1/0x490 [kvm_intel] kvm_arch_vcpu_ioctl_run+0x81d/0x16a0 [kvm] ? wake_up_q+0x44/0x80 kvm_vcpu_ioctl+0x33c/0x620 [kvm] ? __vfs_write+0x37/0x160 do_vfs_ioctl+0xa2/0x5d0 SyS_ioctl+0x79/0x90 entry_SYSCALL_64_fastpath+0x1a/0xa4 Cc: <sta...@vger.kernel.org> Cc: Andrew Morton <a...@linux-foundation.org> Fixes: ab68f2622136 ("/dev/dax, pmem: direct access to persistent memory") Reported-by: Abhilash Kumar Mulumudi <m.abhilash-ku...@hpe.com> Reported-by: Toshi Kani <toshi.k...@hpe.com> Signed-off-by: Dan Williams <dan.j.willi...@intel.com> --- drivers/dax/pmem.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/dax/pmem.c b/drivers/dax/pmem.c index dfb168568af1..1f01e98c83c7 100644 --- a/drivers/dax/pmem.c +++ b/drivers/dax/pmem.c @@ -116,6 +116,9 @@ static int dax_pmem_probe(struct device *dev) if (rc) return rc; + /* adjust the dax_region resource to the start of data */ + res.start += le64_to_cpu(pfn_sb->dataoff); + nd_region = to_nd_region(dev->parent); dax_region = alloc_dax_region(dev, nd_region->id, &res, le32_to_cpu(pfn_sb->align), addr, PFN_DEV|PFN_MAP); -- 2.5.5