I've hit this twice today. It's odd, because afaics, none of this code
has really changed in a long time.

        Dave

Oops: 0000 [#1] SMP
CPU: 2 PID: 6743 Comm: trinity-c117 Not tainted 4.15.0-rc4-think+ #2 
RIP: 0010:proc_flush_task+0x8e/0x1b0
RSP: 0018:ffffc9000bbffc40 EFLAGS: 00010286
RAX: 0000000000000001 RBX: 0000000000000001 RCX: 00000000fffffffb
RDX: 0000000000000000 RSI: ffffc9000bbffc50 RDI: 0000000000000000
RBP: ffffc9000bbffc63 R08: 0000000000000000 R09: 0000000000000002
R10: ffffc9000bbffb70 R11: ffffc9000bbffc64 R12: 0000000000000003
R13: 0000000000000000 R14: 0000000000000003 R15: ffff8804c10d7840
FS:  00007f7cb8965700(0000) GS:ffff88050a200000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 00000003e21ae003 CR4: 00000000001606e0
DR0: 00007fb1d6c22000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000600
Call Trace:
 ? release_task+0xaf/0x680
 release_task+0xd2/0x680
 ? wait_consider_task+0xb82/0xce0
 wait_consider_task+0xbe9/0xce0
 ? do_wait+0xe1/0x330
 do_wait+0x151/0x330
 kernel_wait4+0x8d/0x150
 ? task_stopped_code+0x50/0x50
 SYSC_wait4+0x95/0xa0
 ? rcu_read_lock_sched_held+0x6c/0x80
 ? syscall_trace_enter+0x2d7/0x340
 ? do_syscall_64+0x60/0x210
 do_syscall_64+0x60/0x210
 entry_SYSCALL64_slow_path+0x25/0x25
RIP: 0033:0x7f7cb82603aa
RSP: 002b:00007ffd60770bc8 EFLAGS: 00000246
 ORIG_RAX: 000000000000003d
RAX: ffffffffffffffda RBX: 00007f7cb6cd4000 RCX: 00007f7cb82603aa
RDX: 000000000000000b RSI: 00007ffd60770bd0 RDI: 0000000000007cca
RBP: 0000000000007cca R08: 00007f7cb8965700 R09: 00007ffd607c7080
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007ffd60770bd0 R14: 00007f7cb6cd4058 R15: 00000000cccccccd
Code: c1 e2 04 44 8b 60 30 48 8b 40 38 44 8b 34 11 48 c7 c2 60 3a f5 81 44 89 
e1 4c 8b 68 58 e8 4b b4 77 00 89 44 24 14 48 8d 74 24 10 <49> 8b 7d 00 e8 b9 6a 
f9 ff 48 85 c0 74 1a 48 89 c7 48 89 44 24 
RIP: proc_flush_task+0x8e/0x1b0 RSP: ffffc9000bbffc40
CR2: 0000000000000000
---[ end trace 53d67a6481059862 ]---

All code
========
   0:   c1 e2 04                shl    $0x4,%edx
   3:   44 8b 60 30             mov    0x30(%rax),%r12d
   7:   48 8b 40 38             mov    0x38(%rax),%rax
   b:   44 8b 34 11             mov    (%rcx,%rdx,1),%r14d
   f:   48 c7 c2 60 3a f5 81    mov    $0xffffffff81f53a60,%rdx
  16:   44 89 e1                mov    %r12d,%ecx
  19:   4c 8b 68 58             mov    0x58(%rax),%r13
  1d:   e8 4b b4 77 00          callq  0x77b46d
  22:   89 44 24 14             mov    %eax,0x14(%rsp)
  26:   48 8d 74 24 10          lea    0x10(%rsp),%rsi
  2b:*  49 8b 7d 00             mov    0x0(%r13),%rdi     <-- trapping 
instruction
  2f:   e8 b9 6a f9 ff          callq  0xfffffffffff96aed
  34:   48 85 c0                test   %rax,%rax
  37:   74 1a                   je     0x53
  39:   48 89 c7                mov    %rax,%rdi
  3c:   48                      rex.W
  3d:   89                      .byte 0x89
  3e:   44                      rex.R
  3f:   24                      .byte 0x24

Code starting with the faulting instruction
===========================================
   0:   49 8b 7d 00             mov    0x0(%r13),%rdi
   4:   e8 b9 6a f9 ff          callq  0xfffffffffff96ac2
   9:   48 85 c0                test   %rax,%rax
   c:   74 1a                   je     0x28
   e:   48 89 c7                mov    %rax,%rdi
  11:   48                      rex.W
  12:   89                      .byte 0x89
  13:   44                      rex.R
  14:   24                      .byte 0x24



This looks like an inlined part of proc_flush_task_mnt

        dentry = d_hash_and_lookup(mnt->mnt_root, &name);
    4f99:       48 8d 74 24 10          lea    0x10(%rsp),%rsi
    4f9e:       49 8b 7d 00             mov    0x0(%r13),%rdi
    4fa2:       e8 00 00 00 00          callq  4fa7 <proc_flush_task+0x97>

So it looks like this..

3097         for (i = 0; i <= pid->level; i++) {
3098                 upid = &pid->numbers[i];
3099                 proc_flush_task_mnt(upid->ns->proc_mnt, upid->nr,
3100                                         tgid->numbers[i].nr);
3101         }

somehow passed a null upid->ns->proc_mnt down there.

I'll try and narrow down a reproducer tomorrow.

Any obvious recent changes that might explain this, or did I just
finally appease the entropy gods enough to find the right combination of
args to hit this ?

        Dave

Reply via email to