On 02/12/2018 05:03 PM, Dmitry Vyukov wrote: > On Mon, Feb 12, 2018 at 5:00 PM, syzbot > <syzbot+adb03f3f0bb57ce3a...@syzkaller.appspotmail.com> wrote: >> Hello, >> >> syzbot hit the following crash on bpf-next commit >> 617aebe6a97efa539cc4b8a52adccd89596e6be0 (Sun Feb 4 00:25:42 2018 +0000) >> Merge tag 'usercopy-v4.16-rc1' of >> git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux >> >> So far this crash happened 898 times on bpf-next, net-next, upstream. >> C reproducer is attached. >> syzkaller reproducer is attached. >> Raw console output is attached. >> compiler: gcc (GCC) 7.1.1 20170620 >> .config is attached. > > The reproducer first causes several tasks spending minutes at this stack: > > [ 110.762189] NMI backtrace for cpu 2 > [ 110.762206] CPU: 2 PID: 3760 Comm: syz-executor Not tainted 4.15.0+ #96 > [ 110.762210] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), > BIOS Bochs 01/01/2011 > [ 110.762224] RIP: 0010:mutex_spin_on_owner+0x303/0x420 > [ 110.762232] INFO: NMI handler (nmi_cpu_backtrace_handler) took too > long to run: 1.103 msecs > [ 110.762237] RSP: 0018:ffff88005be470e8 EFLAGS: 00000246 > [ 110.762268] RAX: ffff88006ca00000 RBX: 0000000000000000 RCX: > ffffffff81554165 > [ 110.762275] RDX: 0000000000000001 RSI: 1ffffffff0d97884 RDI: > 0000000000000000 > [ 110.762281] RBP: ffff88005be47210 R08: dffffc0000000001 R09: > fffffbfff0db2b75 > [ 110.762286] R10: fffffbfff0db2b74 R11: ffffffff86d95ba7 R12: > ffffffff86d95ba0 > [ 110.762292] R13: ffffed000b7c8e25 R14: dffffc0000000000 R15: > ffff880064691040 > [ 110.762300] FS: 00007f84ed029700(0000) GS:ffff88006cb00000(0000) > knlGS:0000000000000000 > [ 110.762305] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > [ 110.762311] CR2: 00007fd565f7b1b0 CR3: 000000005bddf002 CR4: > 00000000001606e0 > [ 110.762316] Call Trace: > [ 110.762383] __mutex_lock.isra.1+0x97d/0x1440 > [ 110.762659] __mutex_lock_slowpath+0xe/0x10 > [ 110.762668] mutex_lock+0x3e/0x50 > [ 110.762677] pcpu_alloc+0x846/0xfe0 > [ 110.762778] __alloc_percpu_gfp+0x27/0x30 > [ 110.762801] array_map_alloc+0x484/0x690 > [ 110.762832] SyS_bpf+0xa27/0x4770 > [ 110.763190] do_syscall_64+0x297/0x760 > [ 110.763260] entry_SYSCALL_64_after_hwframe+0x21/0x86 > > and later machine dies with: > > [ 191.484308] Kernel panic - not syncing: Out of memory and no > killable processes... > [ 191.484308] > [ 191.485740] CPU: 3 PID: 746 Comm: kworker/3:1 Not tainted 4.15.0+ #96 > [ 191.486761] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), > BIOS Bochs 01/01/2011 > [ 191.488071] Workqueue: events pcpu_balance_workfn > [ 191.488821] Call Trace: > [ 191.489299] dump_stack+0x175/0x225 > [ 191.490590] panic+0x22a/0x4be > [ 191.493061] out_of_memory.cold.31+0x20/0x21 > [ 191.496380] __alloc_pages_slowpath+0x1d98/0x28a0 > [ 191.503616] __alloc_pages_nodemask+0x89c/0xc60 > [ 191.507876] pcpu_populate_chunk+0x1fd/0x9b0 > [ 191.510114] pcpu_balance_workfn+0x1019/0x1450 > [ 191.517804] process_one_work+0x9d5/0x1460 > [ 191.522714] worker_thread+0x1cc/0x1410 > [ 191.529319] kthread+0x304/0x3c0 > > The original message with attachments is here: > https://groups.google.com/d/msg/syzkaller-bugs/Km3xEZu9zzU/rO-7XuwZAgAJ
[ +Dennis, +Tejun ] Looks like we're stuck in percpu allocator with key/value size of 4 bytes each and large number of entries (max_entries) in the reproducer in above link. Could we have some __GFP_NORETRY semantics and let allocations fail instead of triggering OOM killer?