On Sat, May 20, 2017 at 11:38:50AM +0900, Masami Hiramatsu wrote:
> Hi Luis,
> 
> On Fri, 19 May 2017 19:28:54 +0200
> "Luis R. Rodriguez" <mcg...@kernel.org> wrote:
> 
> > On Fri, May 19, 2017 at 05:40:16PM +0200, Luis R. Rodriguez wrote:
> > > On Fri, May 19, 2017 at 05:08:02AM +0200, Luis R. Rodriguez wrote:
> > > > On Fri, May 19, 2017 at 02:44:14AM +0200, Luis R. Rodriguez wrote:
> > > > > On Wed, May 17, 2017 at 10:53:06AM -0700, Kees Cook wrote:
> > > > > > On Wed, May 17, 2017 at 9:40 AM, Luis R. Rodriguez 
> > > > > > <mcg...@kernel.org> wrote:
> > > > > > > Yes, but I had killed that boot session again, so upon my next 
> > > > > > > boot
> > > > > > > I had a different layout, the ASLR gap was much larger:
> > > > > > >
> > > > > > > ---[ Modules ]---
> > > > > > > 0xffffffffc0000000-0xffffffffc01b0000        1728K                
> > > > > > >                pte
> > > > > > > 0xffffffffc01b0000-0xffffffffc01b1000           4K     RW         
> > > > > > >         GLB x  pte
> > > > > > > 0xffffffffc01b1000-0xffffffffc01b2000           4K                
> > > > > > >                pte
> > > > > > > 0xffffffffc01b2000-0xffffffffc01c6000          80K     ro         
> > > > > > >         GLB x  pte
> > > > > > > 0xffffffffc01c6000-0xffffffffc01cc000          24K     ro         
> > > > > > >         GLB NX pte
> > > > > > > 0xffffffffc01cc000-0xffffffffc01d5000          36K     RW         
> > > > > > >         GLB NX pte
> > > > > > >
> > > > > > > As you can guess if we follow similar pattern the RW hole is the 
> > > > > > > one this boot
> > > > > > > warned about:
> > > > > > >
> > > > > > > [    1.450483] x86/mm: Found insecure W+X mapping at address 
> > > > > > > ffffffffc01b0000/0xffffffffc01b0000
> > > > > > > [    1.451280] ------------[ cut here ]------------
> > > > > > > [    1.451721] WARNING: CPU: 1 PID: 1 at 
> > > > > > > arch/x86/mm/dump_pagetables.c:236 note_page+0x630/0x7e0
> > > > > > > [    1.452499] Modules linked in:
> > > > > > > [    1.452791] CPU: 1 PID: 1 Comm: swapper/0 Not tainted 
> > > > > > > 4.12.0-rc1-next-20170515+ #145
> > > > > > >
> > > > > > > I checked and indeed 0xffffffffc01b2000 is part of a module, it 
> > > > > > > was not the first one
> > > > > > > on the /proc/modules list but then again /proc/modules does not 
> > > > > > > seem to have a specific
> > > > > > > order other than perhaps being pegged into a linked list of 
> > > > > > > modules once they go live,
> > > > > > > and it seems its typically output backwards from when that 
> > > > > > > happened, sorting that
> > > > > > > by address we get:
> > > > > > 
> > > > > > Right, sorry, I'd expect it at the bottom of the list in
> > > > > > /proc/modules, but that's fine, it's there.
> > > > > > 
> > > > > > >
> > > > > > > root@piggy:~# cat /proc/modules | sort -k 6 | head -3
> > > > > > > e1000 143360 0 - Live 0xffffffffc01b2000 (E)
> > > > > > > mbcache 16384 1 ext4, Live 0xffffffffc01d6000 (E)
> > > > > > > scsi_mod 217088 4 sg,sr_mod,sd_mod,libata, Live 
> > > > > > > 0xffffffffc01df000 (E)
> > > > > > >
> > > > > > > And this then seems to be the first module loaded:
> > > > > > >
> > > > > > > e1000 143360 0 - Live 0xffffffffc01b2000 (E)
> > > > > > >
> > > > > > > The output of dmesg seems to confirm this as per the list of 
> > > > > > > modules sorted
> > > > > > > as per above.
> > > > > > >
> > > > > > >> Something touched the module gap and left is RW+x...
> > > > > > >
> > > > > > > Lemme try booting with e1000 renamed to e1000.ko.ignore and see 
> > > > > > > how that goes.
> > > > > > 
> > > > > > Is it possible a module got loaded before e1000 and then unloaded?
> > > > > > That seems odd, but maybe unload isn't cleaning up?
> > > > > > 
> > > > > > >> Are you able to bisect this?
> > > > > > >
> > > > > > > This issue has been present for a while so since I recall this I 
> > > > > > > might be
> > > > > > > able to reduce the number of needed target kernels to bisect. 
> > > > > > > Lemme tinker
> > > > > > > a bit and if no clear culprit comes up then will try bisect.
> > > > > > 
> > > > > > Okay, thanks!
> > > > > 
> > > > > Sorry to report that this issue is present since the feature's 
> > > > > addition. So
> > > > > the issue is there since its addition and is still present today. 
> > > > > *But* it
> > > > > may also be a configuration issue, given I have booted this guest 
> > > > > *without*
> > > > > this issue ...
> > > > > 
> > > > > So:
> > > > > 
> > > > > git checkout -b WX e1a58320a38dfa72be48a0f1a3a92273663ba6db
> > > > > 
> > > > > That boots with the warning. To help debug further I've minimized my 
> > > > > modules
> > > > > to only a few: scsi_mod, e1000, libata.
> > > > > 
> > > > > I suspect at this point this is not the fault of a particular module 
> > > > > but
> > > > > instead just an accounting semantic (>= or <= on an edge) but let's 
> > > > > see.
> > > > > 
> > > > > I now boot on 4.3.0-rc3 on commit (e1a58320a38df ("x86/mm: Warn on W^X
> > > > > mappings") and I with:
> > > > > 
> > > > > [    0.949435] ------------[ cut here ]------------                   
> > > > >           
> > > > > [    0.949992] WARNING: CPU: 2 PID: 1 at 
> > > > > arch/x86/mm/dump_pagetables.c:225 note_page+0x635/0x7e0()
> > > > > [    0.950996] x86/mm: Found insecure W+X mapping at address 
> > > > > ffffffffc0000000/0xffffffffc0000000
> > > > > [    0.951814] Modules linked in:                                     
> > > > >           
> > > > > [    0.952123] CPU: 2 PID: 1 Comm: swapper/0 Not tainted 
> > > > > 4.3.0-rc3-FINAL-TEST-WITH-WX-NOFLOPPY+ #365
> > > > > [    0.952929] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), 
> > > > > BIOS rel-1.10.2-0-g5f4c7b1-prebuilt.qemu-project.org 04/01/2014
> > > > > [    0.954033]  0000000000000000 000000001f722925 ffff88013a5d7d40 
> > > > > ffffffff812ff335
> > > > > [    0.954742]  ffff88013a5d7d88 ffff88013a5d7d78 ffffffff81079be2 
> > > > > ffff88013a5d7e90
> > > > > [    0.955522]  0000000000000000 0000000000000004 0000000000000000 
> > > > > 0000000000000000
> > > > > [    0.956256] Call Trace:                                            
> > > > >           
> > > > > [    0.956496]  [<ffffffff812ff335>] dump_stack+0x44/0x5f             
> > > > >           
> > > > > [    0.956953]  [<ffffffff81079be2>] warn_slowpath_common+0x82/0xc0   
> > > > >           
> > > > > [    0.957519]  [<ffffffff81079c7c>] warn_slowpath_fmt+0x5c/0x80      
> > > > >           
> > > > > [    0.958066]  [<ffffffff8106c155>] note_page+0x635/0x7e0            
> > > > >           
> > > > > [    0.958595]  [<ffffffff8106c5eb>] 
> > > > > ptdump_walk_pgd_level_core+0x2eb/0x410     
> > > > > [    0.959219]  [<ffffffff8106c7b7>] 
> > > > > ptdump_walk_pgd_level_checkwx+0x17/0x20    
> > > > > [    0.959856]  [<ffffffff8106260d>] mark_rodata_ro+0xed/0x100        
> > > > >           
> > > > > [    0.960372]  [<ffffffff815aa7d0>] ? rest_init+0x80/0x80            
> > > > >           
> > > > > [    0.960869]  [<ffffffff815aa7ed>] kernel_init+0x1d/0xe0            
> > > > >           
> > > > > [    0.961358]  [<ffffffff815b798f>] ret_from_fork+0x3f/0x70          
> > > > >           
> > > > > [    0.961900]  [<ffffffff815aa7d0>] ? rest_init+0x80/0x80            
> > > > >           
> > > > > [    0.962389] ---[ end trace 6125ebcb24c9e3d0 ]---                   
> > > > >           
> > > > > [    0.962822] x86/mm: Checked W+X mappings: FAILED, 1 W+X pages 
> > > > > found.         
> > > > >                                                                       
> > > > >           
> > > > >                                                                       
> > > > >           
> > > > > ---[ High Kernel Mapping ]---                                         
> > > > >           
> > > > > 0xffffffff80000000-0xffffffff81000000          16M                    
> > > > >            pmd
> > > > > 0xffffffff81000000-0xffffffff81600000           6M     ro         PSE 
> > > > >     GLB x  pmd
> > > > > 0xffffffff81600000-0xffffffff81a00000           4M     ro         PSE 
> > > > >     GLB NX pmd
> > > > > 0xffffffff81a00000-0xffffffff81c00000           2M     RW             
> > > > >     GLB NX pte
> > > > > 0xffffffff81c00000-0xffffffff82200000           6M     RW         PSE 
> > > > >     GLB NX pmd
> > > > > 0xffffffff82200000-0xffffffff82400000           2M     RW             
> > > > >     GLB NX pte
> > > > > 0xffffffff82400000-0xffffffffc0000000         988M                    
> > > > >            pmd
> > > > > ---[ Modules ]---                                                     
> > > > >           
> > > > > 0xffffffffc0000000-0xffffffffc0001000           4K     RW             
> > > > >     GLB x  pte
> > > > > 0xffffffffc0001000-0xffffffffc0002000           4K                    
> > > > >            pte
> > > > > 0xffffffffc0002000-0xffffffffc0039000         220K     RW             
> > > > >     GLB x  pte
> > > > > 
> > > > > root@piggy:~# cat /proc/modules | sort -k 6 | head -3                 
> > > > >           
> > > > > scsi_mod 221979 4 sg,sd_mod,sr_mod,libata, Live 0xffffffffc0002000 
> > > > > (E)          
> > > > > e1000 127757 0 - Live 0xffffffffc004d000 (E)                          
> > > > >           
> > > > > libata 229931 2 ata_generic,ata_piix, Live 0xffffffffc0076000 (E) 
> > > > > 
> > > > > So that 4K RW seems suspect of getting used for allocation purpose on 
> > > > > edge
> > > > > for a particular reason and it also happens to be on the edge of the 
> > > > > high
> > > > > kernel mapping. Could it be the boundary semantic issue ?
> > > > > 
> > > > > For instance can it be that since 0xffffffffc0002000 is given to the 
> > > > > first
> > > > > module by the allocator, scsi_mod, and since that address is 
> > > > > *technically*
> > > > > part of two boundaries we get a splat ?
> > > > > 
> > > > > 0xffffffffc0001000-0xffffffffc0002000           4K                    
> > > > >            pte
> > > > > 0xffffffffc0002000-0xffffffffc0039000         220K     RW             
> > > > >     GLB x  pte
> > > > 
> > > > Note on the latest linux-next and on the commit that introduced this 
> > > > the config
> > > > and kernel yields only *one* page:
> > > > 
> > > > x86/mm: Checked W+X mappings: FAILED, 1 W+X pages found.
> > > > 
> > > > I believe this is more indications my suspicion might be right.
> > > 
> > > If the following is a legit forced way to get query the kernel to ask it 
> > > who owns a page then perhaps this technique can be used in the future to
> > > figure out who the hell caused this. Catalin, can you confirm? In this
> > > case this is perhaps not a leaked page but I am trying to abuse the
> > > kmemleak debugfs API to query who allocated the page. Is that fine?
> > > 
> > > [    0.916771] WARNING: CPU: 0 PID: 1 at 
> > > arch/x86/mm/dump_pagetables.c:235 note_page+0x63c/0x7e0
> > > [    0.917636] x86/mm: Found insecure W+X mapping at address 
> > > ffffffffc03d5000/0xffffffffc03d5000
> > > [    0.918502] Modules linked in:
> > > [    0.918819] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 
> > > 4.11.0-mcgrof-force-config #340
> > > [    0.919631] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), 
> > > BIOS rel-1.10.2-0-g5f4c7b1-prebuilt.qemu-project.org 04/01/2014
> > > [    0.920011] Call Trace:
> > > [    0.920011]  dump_stack+0x63/0x81
> > > [    0.920011]  __warn+0xcb/0xf0
> > > [    0.920011]  warn_slowpath_fmt+0x5a/0x80
> > > [    0.920011]  note_page+0x63c/0x7e0
> > > [    0.920011]  ptdump_walk_pgd_level_core+0x3b1/0x460
> > > [    0.920011]  ? 0xffffffff86c00000
> > > [    0.920011]  ptdump_walk_pgd_level_checkwx+0x17/0x20
> > > [    0.920011]  mark_rodata_ro+0xf4/0x100
> > > [    0.920011]  ? rest_init+0x80/0x80
> > > [    0.920011]  kernel_init+0x2a/0x100
> > > [    0.920011]  ret_from_fork+0x2c/0x40
> > > [    0.925474] ---[ end trace dca00cd779490a2b ]---
> > > [    0.925959] x86/mm: Checked W+X mappings: FAILED, 1 W+X pages found.
> > > 
> > > echo dump=0xffffffffc03d5000 > /sys/kernel/debug/kmemleak
> > > dmesg | tail
> > > 
> > > [   49.209565] kmemleak: Object 0xffffffffc03d5000 (size 335):
> > > [   49.210814] kmemleak:   comm "swapper/0", pid 1, jiffies 4294892440
> > > [   49.212148] kmemleak:   min_count = 2
> > > [   49.212852] kmemleak:   count = 0
> > > [   49.213363] kmemleak:   flags = 0x1
> > > [   49.213363] kmemleak:   checksum = 0
> > > [   49.213363] kmemleak:   backtrace:
> > > [   49.213363]      kmemleak_alloc+0x4a/0xa0
> > > [   49.213363]      __vmalloc_node_range+0x20a/0x2b0
> > > [   49.213363]      module_alloc+0x67/0xc0
> > > [   49.213363]      arch_ftrace_update_trampoline+0xba/0x260
> > > [   49.213363]      ftrace_startup+0x90/0x210
> > > [   49.213363]      register_ftrace_function+0x4b/0x60
> > > [   49.213363]      arm_kprobe+0x84/0xe0
> > > [   49.213363]      register_kprobe+0x56e/0x5b0
> > > [   49.213363]      init_test_probes+0x61/0x560
> > > [   49.213363]      init_kprobes+0x1e3/0x206
> > > [   49.213363]      do_one_initcall+0x52/0x1a0
> > > [   49.213363]      kernel_init_freeable+0x178/0x200
> > > [   49.213363]      kernel_init+0xe/0x100
> > > [   49.213363]      ret_from_fork+0x2c/0x40
> > > [   49.213363]      0xffffffffffffffff
> > 
> > Aha! And the winner is:
> > 
> > CONFIG_KPROBES_SANITY_TEST
> > 
> > I confirm disabling it on 4.3.0-rc3 and on linux-next next-20170519 avoids 
> > the WARN.
> > I also can confirm using the 'echo dump=mem-area > 
> > /sys/kernel/debug/kmemleak' yields
> > the same trace for both of these kernels.
> > 
> > So -- the above kmemleak hack seems to actually work to seek who owns that 
> > page.
> > 
> > Now to figure out how the hell kernel/test_kprobes.c screws around with 
> > things.
> 
> Ah, that was fixed recently;
> 
> https://marc.info/?l=linux-kernel&m=149076389011850
> 
> Note that this patch depends another patch in the series;
> 
> https://marc.info/?l=linux-kernel&m=149076370111812&w=2

I actually boot tested linux-next tag next-20170519 which carries these
patches and the WARNING still is there. Please note the issue is with
CONFIG_KPROBES_SANITY_TEST enabled.

[    1.025601] x86/mm: Found insecure W+X mapping at address 
ffffffffc01e7000/0xffffffffc01e7000
[    1.026429] ------------[ cut here ]------------
[    1.026885] WARNING: CPU: 1 PID: 1 at arch/x86/mm/dump_pagetables.c:236 
note_page+0x630/0x7e0
[    1.027711] Modules linked in:
[    1.028032] CPU: 1 PID: 1 Comm: swapper/0 Not tainted 
4.12.0-rc1-next-20170519 #151
[    1.028788] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
rel-1.10.2-0-g5f4c7b1-prebuilt.qemu-project.org 04/01/2014
[    1.029928] task: ffff9fd47a5ccc80 task.stack: ffffb6bcc0630000
[    1.030509] RIP: 0010:note_page+0x630/0x7e0
[    1.030917] RSP: 0000:ffffb6bcc0633df0 EFLAGS: 00010286
[    1.031425] RAX: 0000000000000051 RBX: ffffb6bcc0633e88 RCX: ffffffffbb656708
[    1.032132] RDX: 0000000000000000 RSI: 0000000000000096 RDI: 0000000000000246
[    1.032834] RBP: ffffb6bcc0633e28 R08: 203a6d6d2f363878 R09: 0000000000000161
[    1.033539] R10: ffffb6bcc0633dd8 R11: 736e6920646e756f R12: 0000000000000000
[    1.034235] R13: 0000000000000004 R14: 0000000000000000 R15: 0000000000000000
[    1.034927] FS:  0000000000000000(0000) GS:ffff9fd47fc80000(0000) 
knlGS:0000000000000000
[    1.035722] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    1.036290] CR2: ffffb6bcc073c000 CR3: 0000000053209000 CR4: 00000000000006e0
[    1.036839] Call Trace:
[    1.037034]  ptdump_walk_pgd_level_core+0x3e7/0x490
[    1.037367]  ? 0xffffffffbaa00000
[    1.037705]  ptdump_walk_pgd_level_checkwx+0x17/0x20
[    1.038187]  mark_rodata_ro+0xf4/0x100
[    1.038559]  ? rest_init+0x80/0x80
[    1.038890]  kernel_init+0x2f/0x100
[    1.039235]  ret_from_fork+0x2c/0x40
[    1.039582] Code: 48 c7 43 28 00 00 00 00 48 89 43 20 e9 05 fd ff ff 48 8b 
73 10 48 c7 c7 f0 3d 3e bb c6 05 f8 eb bc 00 01 48 89 f2 e8 1d 02 12 00 <0f> ff 
e9 1f fa ff ff 48 8b 70 20 48 c7 c7 3c ba 3e bb e8 06 02
[    1.041416] ---[ end trace e726c1b63e5a81a9 ]---
[    1.041872] x86/mm: Checked W+X mappings: FAILED, 1 W+X pages found.

root@piggy:~# echo dump=0xffffffffc01e7000 > /sys/kernel/debug/kmemleak

On dmesg:

May 23 07:44:51 piggy kernel: kmemleak: Object 0xffffffffc01e7000 (size 335):
May 23 07:44:51 piggy kernel: kmemleak:   comm "swapper/0", pid 1, jiffies 
4294892451
May 23 07:44:51 piggy kernel: kmemleak:   min_count = 2
May 23 07:44:51 piggy kernel: kmemleak:   count = 2
May 23 07:44:51 piggy kernel: kmemleak:   flags = 0x1
May 23 07:44:51 piggy kernel: kmemleak:   checksum = 0
May 23 07:44:51 piggy kernel: kmemleak:   backtrace:
May 23 07:44:51 piggy kernel:      kmemleak_alloc+0x4a/0xa0
May 23 07:44:51 piggy kernel:      __vmalloc_node_range+0x20c/0x2b0
May 23 07:44:51 piggy kernel:      module_alloc+0x67/0xc0
May 23 07:44:51 piggy kernel:      arch_ftrace_update_trampoline+0xc1/0x240
May 23 07:44:51 piggy kernel:      ftrace_startup+0x92/0x210
May 23 07:44:51 piggy kernel:      register_ftrace_function+0x4b/0x60
May 23 07:44:51 piggy kernel:      arm_kprobe+0x84/0xc0
May 23 07:44:51 piggy kernel:      register_kprobe+0x59c/0x5e0
May 23 07:44:51 piggy kernel:      init_test_probes+0x61/0x560
May 23 07:44:51 piggy kernel:      init_kprobes+0x1ea/0x20d
May 23 07:44:51 piggy kernel:      do_one_initcall+0x52/0x1a0
May 23 07:44:51 piggy kernel:      kernel_init_freeable+0x17d/0x205
May 23 07:44:51 piggy kernel:      kernel_init+0xe/0x100
May 23 07:44:51 piggy kernel:      ret_from_fork+0x2c/0x40
May 23 07:44:51 piggy kernel:      0xffffffffffffffff

  Luis

Reply via email to