Re: [REGRESSION] mm: filemap_map_pages NULL pointer dereference

Andrew Morton Fri, 05 Feb 2016 13:59:43 -0800

On Fri, 5 Feb 2016 10:05:02 -0800 Jeremiah Mahler <jmmah...@gmail.com> wrote:


> all,
> 
> On a Lenovo X1 Carbon running -next (20160201+, 20160203+) I have
> experienced several system hangs.  I usually notice it first when
> my browser (Chrome) stops responding but then other programs will stop
> responding as well.  The only fix is a reboot.  It is sporadic but it
> will usually occur once a day.
> 
> In the logs there will be a
> 
>   unable to handle kernel NULL pointer dereference
> 
> message related to filemap_map_pages+0x10d/0x290 (below).
> 
> ------------------------------------------------------------
> ...
> [51985.993033] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
> [51985.993087] IP: [<ffffffff8114a19d>] filemap_map_pages+0x10d/0x290
> [51985.993123] PGD 2c772067 PUD 0 
> [51985.993144] Oops: 0000 [#1] SMP 
> [51985.993166] Modules linked in: ctr ccm cpufreq_conservative cpufreq_stats cpufreq_userspace cpufreq_powersave binfmt_misc i915 arc4 iwldvm mac80211 x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul crc32c_intel iTCO_wdt ghash_clmulni_intel iTCO_vendor_support jitterentropy_rng sha256_generic hmac drbg snd_hda_codec_hdmi aesni_intel snd_hda_codec_realtek aes_x86_64 iwlwifi glue_helper snd_hda_codec_generic i2c_algo_bit lrw drm_kms_helper gf128mul ablk_helper cryptd snd_hda_intel drm psmouse snd_hda_codec cfg80211 pcspkr evdev serio_raw snd_hwdep i2c_i801 snd_hda_core sg snd_pcm mei_me lpc_ich mfd_core mei shpchp snd_timer i2c_core wmi thinkpad_acpi nvram snd battery tpm_tis soundcore ac tpm video button intel_smartconnect btusb btbcm btintel bluetooth rfkill loop ipv6 autofs4
> [51985.993591]  ext4 crc16 mbcache jbd2 sd_mod ahci libahci libata ehci_pci sdhci_pci scsi_mod xhci_pci sdhci xhci_hcd ehci_hcd mmc_core usbcore usb_common thermal
> [51985.993680] CPU: 2 PID: 22993 Comm: chrome Not tainted 4.5.0-rc2-next-20160203+ #11
> [51985.993714] Hardware name: LENOVO 3443CTO/3443CTO, BIOS G6ET59WW (2.03 ) 09/11/2012
> [51985.993760] task: ffff88004bb04dc0 ti: ffff88002a2f8000 task.ti: ffff88002a2f8000
> [51985.993804] RIP: 0010:[<ffffffff8114a19d>]  [<ffffffff8114a19d>] filemap_map_pages+0x10d/0x290
> [51985.993845] RSP: 0000:ffff88002a2fbdf8  EFLAGS: 00010202
> [51985.993874] RAX: 00000007fffffff8 RBX: 0000000000000001 RCX: 0000000000000003
> [51985.993911] RDX: 0000000000000000 RSI: ffffea00005bdd1c RDI: ffffea00005bdd00
> [51985.993948] RBP: ffff8800beff4220 R08: 000000000000007f R09: 0000000000000000
> [51985.993985] R10: 0000000000000000 R11: ffff8800a39382b8 R12: ffff8801182b9440
> [51985.994023] R13: ffff88002a2fbe90 R14: ffff8800be568d80 R15: 0000000000000008
> [51985.994061] FS:  00007f3e20276a40(0000) GS:ffff88011e300000(0000) knlGS:0000000000000000
> [51985.994103] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [51985.994134] CR2: 0000000000000008 CR3: 00000000be6eb000 CR4: 00000000001406e0
> [51985.994172] Stack:
> [51985.994184]  ffff8800beff4228 00007f3e0ae63000 0000000000000001 0000000000000000
> [51985.994229]  0000000000000001 00007f3e0ae63000 ffff8800be568d80 0000000000000054
> [51985.994273]  ffff880000000318 ffff88003584c318 ffff8800840076c0 ffffffff8117b073
> [51985.994318] Call Trace:
> [51985.994336]  [<ffffffff8117b073>] ? handle_mm_fault+0x13b3/0x1790
> [51985.994370]  [<ffffffff810aa7f1>] ? up_write+0x21/0x30
> [51985.994400]  [<ffffffff81059052>] ? __do_page_fault+0x192/0x410
> [51985.994434]  [<ffffffff814ffcc8>] ? page_fault+0x28/0x30
> [51985.994463] Code: 00 00 00 48 8b 54 24 10 49 3b 55 28 74 48 48 8b 44 24 18 83 e8 01 29 d0 49 8d 04 c7 49 39 c7 74 19 49 83 c7 08 48 83 44 24 10 01 <49> 83 3f 00 74 eb 4d 85 ff 0f 85 3b ff ff ff 48 8b 3c 24 48 8d 
> [51985.994656] RIP  [<ffffffff8114a19d>] filemap_map_pages+0x10d/0x290
> [51985.994692]  RSP <ffff88002a2fbdf8>
> [51985.994711] CR2: 0000000000000008
> 
> ...
>
> Referring again to the RIP line from the trace.
> 
> [51985.994656] RIP  [<ffffffff8114a19d>] filemap_map_pages+0x10d/0x290
> 
> jeri@hudson:~/linux-next$ gdb vmlinux
> (gdb) list *0xffffffff8114a19d
> 0xffffffff8114a19d is in filemap_map_pages (include/linux/radix-tree.h:465).
> 460                   unsigned size = radix_tree_chunk_size(iter) - 1;
> 461   
> 462                   while (size--) {
> 463                           slot++;
> 464                           iter->index++;
> 465                           if (likely(*slot))
> 466                                   return slot;
> 467                           if (flags & RADIX_TREE_ITER_CONTIG) {
> 468                                   /* forbid switching to the next chunk */
> 469                                   iter->next_index = 0;
> (gdb)
> 
> Assuming I traced the addresses correctly, this indicates that the
> fault is triggered when the value in the slot pointer is accessed.
> Perhaps slot is being incremented beyond its valid range?

That's super helpful, thanks.

The faulting address was 0x0000000000000008, so radix_tree_next_slot()
was called with slot==NULL.
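
To spell out the arithmetic: slot is a void **, so the slot++ in the gdb
listing above advances it by sizeof(void *), which is 8 on x86_64.
Starting from NULL that lands on 0x8, which is what CR2 (and R15) show in
the oops.  A small userspace sketch of just that step follows.  The helper
name next_slot_addr() is made up for illustration, and it works on
integers because incrementing a real NULL pointer is undefined in C:

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper, not kernel code: returns the address that
 * `slot++` would produce for a `void **slot` holding `slot_addr`.
 */
static uintptr_t next_slot_addr(uintptr_t slot_addr)
{
	/* A void ** steps by one pointer width per increment. */
	return slot_addr + sizeof(void *);
}
```

On a 64-bit build next_slot_addr(0) is 8, i.e. the address that
`if (likely(*slot))` then tries to read.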

And looking at it, I don't see how this code can work at all:


:       radix_tree_for_each_slot(slot, &mapping->page_tree, &iter, vmf->pgoff) {
:               if (iter.index > vmf->max_pgoff)
:                       break;
: repeat:
:               page = radix_tree_deref_slot(slot);
:               if (unlikely(!page))
:                       goto next;
:               if (radix_tree_exception(page)) {
:                       if (radix_tree_deref_retry(page)) {
:                               slot = radix_tree_iter_retry(&iter);

radix_tree_iter_retry() unconditionally returns NULL

:                               continue;

The `continue' then goes and executes the third clause of the
radix_tree_for_each_slot() `for' statement:

             slot = radix_tree_next_slot(slot, iter, 0))

with slot==NULL.  This will dereference 0x8 every time.

:                       }
:                       goto next;
:               }
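
The control flow in question can be modelled in userspace.  This is only
a sketch under simplifying assumptions: model_iter, model_next_slot() and
count_null_steps() are invented names, the loop condition is reduced to a
plain `slot' test, and the stand-in step function checks for NULL and
counts the call where the kernel helper would go ahead and dereference:

```c
#include <stddef.h>

/* Hypothetical stand-ins, not the kernel's radix tree API. */
struct model_iter {
	int null_steps;	/* times the step clause ran with slot == NULL */
};

static void **model_next_slot(void **slot, struct model_iter *iter)
{
	if (!slot) {
		/* The real helper has no such check; it would fault here. */
		iter->null_steps++;
		return NULL;
	}
	return NULL;	/* single-entry model: nothing after the first slot */
}

static int count_null_steps(void)
{
	void *entry = NULL;
	struct model_iter iter = { 0 };
	void **slot;

	for (slot = &entry; slot; slot = model_next_slot(slot, &iter)) {
		slot = NULL;	/* models slot = radix_tree_iter_retry(&iter) */
		continue;	/* runs the third clause before the condition */
	}
	return iter.null_steps;
}
```

count_null_steps() returns 1 here: the step clause is reached with a NULL
slot on the very first retry, before the loop condition gets a chance to
reload it.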
