On Tue, Aug 23, 2016 at 6:29 PM, Kani, Toshimitsu <toshi.k...@hpe.com> wrote:
>> On Tue, Aug 23, 2016 at 4:47 PM, Kani, Toshimitsu <toshi.k...@hpe.com>
>> wrote:
>> > On Tue, 2016-08-23 at 15:32 -0700, Dan Williams wrote:
>> >> On Tue, Aug 23, 2016 at 11:43 AM, Toshi Kani <toshi.k...@hpe.com>
>> >> wrote:
>> >  :
>> >> I'm not sure about this fix.  The point of honoring
>> >> vmem_altmap_offset() is because a portion of the resource that is
>> >> passed to devm_memremap_pages() also contains the metadata info
>> >> block for the device.  The offset says "use everything past this
>> >> point for pages".  This may work for avoiding a crash, but it may
>> >> corrupt info block metadata in the process.  Can you provide more
>> >> information about the failing scenario to be sure that we are not
>> >> triggering a fault on an address that is not meant to have a page
>> >> mapping?  I.e. what is the host physical address of the page that
>> >> caused this fault, and is it valid?
>> >
>> > The fault address in question was the 2nd page of an NVDIMM range.  I
>> > assumed this fault address was valid and needed to be handled.  Here is
>> > some info about the base and patched cases.  Let me know if you need
>> > more info.
>> >
>> > Base
>> > ====
>> >
>> > The following NVDIMM range was set to /dev/dax.
>>
>> With ndctl create-namespace or manually via sysfs?  Specifically I'm
>> looking for what the 'align' attribute was set to when the
>> configuration was established.  Can you provide a dump of the sysfs
>> attributes for the /dev/dax parent device?
>
> I used the ndctl command below.
> ndctl create-namespace -f -e namespace0.0 -m dax
>
> Here is additional info from my note for the base case.
>
> p {struct dev_pagemap} 0xffff88046d0453f0
> $3 = {
>   altmap = 0xffff88046d045410,
>   res = 0xffff88046d0453a8,
>   ref = 0xffff88046d0452f0,
>   dev = 0xffff880464790410
> }
>
> crash> p {struct vmem_altmap} 0xffff88046d045410
> $6 = {
>   base_pfn = 0x480000,
>   reserve = 0x2, // PHYS_PFN(SZ_8K)
>   free = 0x101fe,
>   align = 0x1fe,
>   alloc = 0x10000
> }

Ah, so, on second look the 0x490200000 data offset looks correct.  The
total size of the address range is 16GB, which equates to 256MB needed
for struct page, plus 2MB more to re-align the data on the next 2MB
boundary.  The question now is why the guest is faulting on an access
to an address less than 0x490200000.
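
To double-check that arithmetic, here is a minimal sketch in Python (not kernel code) that recomputes the data offset from the vmem_altmap fields in the dump above. It assumes 4KB pages and a 64-byte struct page, and it assumes vmem_altmap_offset() amounts to reserve + free; none of those three things is shown in the dump itself. The fault address is also an assumption: it takes "the 2nd page of an NVDIMM range" to mean one page past base_pfn.

```python
# Assumptions (not taken from the dump): 4KB pages, 64-byte struct page.
PAGE_SHIFT = 12
STRUCT_PAGE_SIZE = 64

# Fields copied from the quoted vmem_altmap dump; all are page frame counts.
base_pfn = 0x480000
reserve = 0x2        # PHYS_PFN(SZ_8K)
free = 0x101fe
align = 0x1fe
alloc = 0x10000


def pfn_to_phys(pfn):
    """Convert a page frame number to a physical byte address."""
    return pfn << PAGE_SHIFT


# Assumed behaviour of vmem_altmap_offset(): pfns to skip before the data area.
offset_pfns = reserve + free
data_start = pfn_to_phys(base_pfn + offset_pfns)

# struct page storage needed to describe a 16GB range.
range_bytes = 16 << 30
memmap_bytes = (range_bytes >> PAGE_SHIFT) * STRUCT_PAGE_SIZE

# Info-block reserve plus alignment padding, in bytes.
pad_bytes = pfn_to_phys(reserve + align)

# Hypothetical fault address: second page of the range, assuming the
# range begins at base_pfn.
fault_addr = pfn_to_phys(base_pfn + 1)

print("data start      :", hex(data_start))
print("memmap size (MB):", memmap_bytes >> 20)
print("pad size (MB)   :", pad_bytes >> 20)
print("fault below data:", fault_addr < data_start)
```

Under those assumptions the numbers line up with the dump: alloc covers the 256MB of struct page, reserve + align covers the remaining 2MB, and a fault on the second page of the range lands well below the start of the data area.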