On Tue, Aug 23, 2016 at 11:43 AM, Toshi Kani <toshi.k...@hpe.com> wrote:
> The following BUG was observed while starting up KVM with an nvdimm
> device used as a memory-backend-file backed by /dev/dax.
>
>  BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
>  IP: [<ffffffff811ac851>] get_zone_device_page+0x11/0x30
>  Call Trace:
>    follow_devmap_pmd+0x298/0x2c0
>    follow_page_mask+0x275/0x530
>    __get_user_pages+0xe3/0x750
>    __gfn_to_pfn_memslot+0x1b2/0x450 [kvm]
>    ? hrtimer_try_to_cancel+0x2c/0x120
>    ? kvm_read_l1_tsc+0x55/0x60 [kvm]
>    try_async_pf+0x66/0x230 [kvm]
>    ? kvm_host_page_size+0x90/0xa0 [kvm]
>    tdp_page_fault+0x130/0x280 [kvm]
>    kvm_mmu_page_fault+0x5f/0xf0 [kvm]
>    handle_ept_violation+0x94/0x180 [kvm_intel]
>    vmx_handle_exit+0x1d3/0x1440 [kvm_intel]
>    ? atomic_switch_perf_msrs+0x6f/0xa0 [kvm_intel]
>    ? vmx_vcpu_run+0x2d1/0x490 [kvm_intel]
>    kvm_arch_vcpu_ioctl_run+0x81d/0x16a0 [kvm]
>    ? wake_up_q+0x44/0x80
>    kvm_vcpu_ioctl+0x33c/0x620 [kvm]
>    ? __vfs_write+0x37/0x160
>    do_vfs_ioctl+0xa2/0x5d0
>    SyS_ioctl+0x79/0x90
>    entry_SYSCALL_64_fastpath+0x1a/0xa4
>
> devm_memremap_pages() calls for_each_device_pfn() to walk through
> all pfns in the page_map.  pfn_first(), however, returns a wrong
> first pfn (it skips past the altmap offset), which leaves page->pgmap
> uninitialized for the skipped pages.
>
> Since arch_add_memory() has already set up direct mappings to the
> NVDIMM range with the altmap, pfn_first() should not modify the start
> pfn.  Change pfn_first() to simply return the pfn of res->start.
>
> Reported-and-tested-by: Abhilash Kumar Mulumudi <m.abhilash-ku...@hpe.com>
> Signed-off-by: Toshi Kani <toshi.k...@hpe.com>
> Cc: Dan Williams <dan.j.willi...@intel.com>
> Cc: Andrew Morton <a...@linux-foundation.org>
> Cc: Ard Biesheuvel <ard.biesheu...@linaro.org>
> Cc: Brian Starkey <brian.star...@arm.com>
> ---
>  kernel/memremap.c |    8 +-------
>  1 file changed, 1 insertion(+), 7 deletions(-)
>
> diff --git a/kernel/memremap.c b/kernel/memremap.c
> index 251d16b..50ea577 100644
> --- a/kernel/memremap.c
> +++ b/kernel/memremap.c
> @@ -210,15 +210,9 @@ static void pgmap_radix_release(struct resource *res)
>
>  static unsigned long pfn_first(struct page_map *page_map)
>  {
> -       struct dev_pagemap *pgmap = &page_map->pgmap;
>         const struct resource *res = &page_map->res;
> -       struct vmem_altmap *altmap = pgmap->altmap;
> -       unsigned long pfn;
>
> -       pfn = res->start >> PAGE_SHIFT;
> -       if (altmap)
> -               pfn += vmem_altmap_offset(altmap);
> -       return pfn;
> +       return res->start >> PAGE_SHIFT;
>  }

I'm not sure about this fix.  The point of honoring
vmem_altmap_offset() is that a portion of the resource passed to
devm_memremap_pages() also contains the metadata info block for the
device.  The offset says "use everything past this point for pages".
This may avoid the crash, but it may corrupt the info block metadata
in the process.  Can you provide more information about the failing
scenario so we can be sure we are not triggering a fault on an address
that is not meant to have a page mapping?  I.e. what is the host
physical address of the page that caused this fault, and is it
valid?
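
For reference, here is a rough sketch of what that offset covers,
using the vmem_altmap fields as I remember them from the 4.8-era
headers (treat the exact names and comments as approximate, not a
verbatim copy):

  /* Sketch only: approximate 4.8-era definitions, reproduced from memory */
  struct vmem_altmap {
          const unsigned long base_pfn;   /* first pfn of the mapping */
          const unsigned long reserve;    /* pfns reserved for driver use (info block) */
          unsigned long free;             /* pfns set aside for memmap (struct page) storage */
          unsigned long align;            /* pfns reserved to satisfy alignment */
          unsigned long alloc;            /* pfns consumed by vmemmap_populate() */
  };

  static unsigned long vmem_altmap_offset(struct vmem_altmap *altmap)
  {
          /* number of pfns from base where pfn_to_page() is valid */
          return altmap->reserve + altmap->free;
  }

In other words, the pfns below base_pfn + vmem_altmap_offset() back
the info block and the struct page storage itself, so treating them
as ordinary device pages risks clobbering that metadata.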
