On Tue, Aug 23, 2016 at 8:58 PM, Dan Williams <dan.j.willi...@intel.com> wrote:
> On Tue, Aug 23, 2016 at 7:53 PM, Dan Williams <dan.j.willi...@intel.com> 
> wrote:
>> On Tue, Aug 23, 2016 at 6:29 PM, Kani, Toshimitsu <toshi.k...@hpe.com> wrote:
>>>> On Tue, Aug 23, 2016 at 4:47 PM, Kani, Toshimitsu <toshi.k...@hpe.com>
>>>> wrote:
>>>> > On Tue, 2016-08-23 at 15:32 -0700, Dan Williams wrote:
>>>> >> On Tue, Aug 23, 2016 at 11:43 AM, Toshi Kani <toshi.k...@hpe.com>
>>>> >> wrote:
>>>> >  :
>>>> >> I'm not sure about this fix.  The point of honoring
>>>> >> vmem_altmap_offset() is because a portion of the resource that is
>>>> >> passed to devm_memremap_pages() also contains the metadata info
>>>> block
>>>> >> for the device.  The offset says "use everything past this point for
>>>> >> pages".  This may work for avoiding a crash, but it may corrupt info
>>>> >> block metadata in the process.  Can you provide more information
>>>> >> about the failing scenario to be sure that we are not triggering a
>>>> >> fault on an address that is not meant to have a page mapping?  I.e.
>>>> >> what is the host physical address of the page that caused this fault,
>>>> >> and is it valid?
>>>> >
>>>> > The fault address in question was the 2nd page of an NVDIMM range.  I
>>>> > assumed this fault address was valid and needed to be handled.  Here is
>>>> > some info about the base and patched cases.  Let me know if you need
>>>> > more info.
>>>> >
>>>> > Base
>>>> > ====
>>>> >
>>>> > The following NVDIMM range was set to /dev/dax.
>>>>
>>>> With ndctl create-namespace or manually via sysfs?  Specifically I'm
>>>> looking for what the 'align' attribute was set to when the
>>>> configuration was established.  Can you provide a dump of the sysfs
>>>> attributes for the /dev/dax parent device?
>>>
>>> I used the ndctl command below.
>>> ndctl create-namespace -f -e namespace0.0 -m dax
>>>
>>> Here is additional info from my note for the base case.
>>>
>>> p {struct dev_pagemap} 0xffff88046d0453f0
>>> $3 = {
>>>   altmap = 0xffff88046d045410,
>>>   res = 0xffff88046d0453a8,
>>>   ref = 0xffff88046d0452f0,
>>>   dev = 0xffff880464790410
>>> }
>>>
>>> crash> p {struct vmem_altmap} 0xffff88046d045410
>>> $6 = {
>>>   base_pfn = 0x480000,
>>>   reserve = 0x2,        // PHYS_PFN(SZ_8K)
>>>   free = 0x101fe,
>>>   align = 0x1fe,
>>>   alloc = 0x10000
>>> }
>>
>> Ah, so, on second look the 0x490200000 data offset looks correct.  The
>> total size of the address range is 16GB which equates to 256MB needed
>> for struct page, plus 2MB more to re-align the data on the next 2MB
>> boundary.
>>
>> The question now is why is the guest faulting on an access to an
>> address less than 0x490200000?
>
> Does the attached patch fix this for you?

Sorry, should be this much simpler patch that also mirrors what
driver/nvdimm/pmem.c is doing...
From 3369f0e825c12bb2f17c0f7d3ccecb7c60f645e0 Mon Sep 17 00:00:00 2001
From: Dan Williams <dan.j.willi...@intel.com>
Date: Tue, 23 Aug 2016 19:59:31 -0700
Subject: [PATCH] dax: fix device-dax region base

The data offset for a dax region needs to account for an altmap
reservation in the resource range.  Otherwise, device-dax is allowing
mappings directly into the memmap or device info-block area, with crash
signatures like the following:

 BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
 IP: [<ffffffff811ac851>] get_zone_device_page+0x11/0x30
 Call Trace:
   follow_devmap_pmd+0x298/0x2c0
   follow_page_mask+0x275/0x530
   __get_user_pages+0xe3/0x750
   __gfn_to_pfn_memslot+0x1b2/0x450 [kvm]
   ? hrtimer_try_to_cancel+0x2c/0x120
   ? kvm_read_l1_tsc+0x55/0x60 [kvm]
   try_async_pf+0x66/0x230 [kvm]
   ? kvm_host_page_size+0x90/0xa0 [kvm]
   tdp_page_fault+0x130/0x280 [kvm]
   kvm_mmu_page_fault+0x5f/0xf0 [kvm]
   handle_ept_violation+0x94/0x180 [kvm_intel]
   vmx_handle_exit+0x1d3/0x1440 [kvm_intel]
   ? atomic_switch_perf_msrs+0x6f/0xa0 [kvm_intel]
   ? vmx_vcpu_run+0x2d1/0x490 [kvm_intel]
   kvm_arch_vcpu_ioctl_run+0x81d/0x16a0 [kvm]
   ? wake_up_q+0x44/0x80
   kvm_vcpu_ioctl+0x33c/0x620 [kvm]
   ? __vfs_write+0x37/0x160
   do_vfs_ioctl+0xa2/0x5d0
   SyS_ioctl+0x79/0x90
   entry_SYSCALL_64_fastpath+0x1a/0xa4

Cc: <sta...@vger.kernel.org>
Cc: Andrew Morton <a...@linux-foundation.org>
Fixes: ab68f2622136 ("/dev/dax, pmem: direct access to persistent memory")
Reported-by: Abhilash Kumar Mulumudi <m.abhilash-ku...@hpe.com>
Reported-by: Toshi Kani <toshi.k...@hpe.com>
Signed-off-by: Dan Williams <dan.j.willi...@intel.com>
---
 drivers/dax/pmem.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/dax/pmem.c b/drivers/dax/pmem.c
index dfb168568af1..1f01e98c83c7 100644
--- a/drivers/dax/pmem.c
+++ b/drivers/dax/pmem.c
@@ -116,6 +116,9 @@ static int dax_pmem_probe(struct device *dev)
 	if (rc)
 		return rc;
 
+	/* adjust the dax_region resource to the start of data */
+	res.start += le64_to_cpu(pfn_sb->dataoff);
+
 	nd_region = to_nd_region(dev->parent);
 	dax_region = alloc_dax_region(dev, nd_region->id, &res,
 			le32_to_cpu(pfn_sb->align), addr, PFN_DEV|PFN_MAP);
-- 
2.5.5

Reply via email to