+Alexander

On Sun, Dec 17, 2017 at 08:55:57PM +0800, Fengguang Wu wrote:
> Hello,
>
> FYI this happens in mainline kernel 4.15.0-rc3.
> It looks like a new regression.
>
> It occurs in 4 out of 28 boots.
>
> [ 166.090516] ==================================================================
> [ 166.092419] BUG: KASAN: use-after-free in fib_table_flush+0x76c/0x870:
>     fib_table_flush at net/ipv4/fib_trie.c:1868
> [ 166.092907] Read of size 8 at addr ffff880012fc0b18 by task kworker/u2:3/173
> [ 166.093402]
> [ 166.093528] CPU: 0 PID: 173 Comm: kworker/u2:3 Not tainted 4.15.0-rc3 #31
> [ 166.094018] Workqueue: netns cleanup_net
> [ 166.094298] Call Trace:
> [ 166.094489] print_address_description+0xa6/0x370:
>     print_address_description at mm/kasan/report.c:253
> [ 166.094867] ? fib_table_flush+0x76c/0x870:
>     fib_table_flush at net/ipv4/fib_trie.c:1868
> [ 166.095159] kasan_report+0x226/0x330:
>     kasan_report_error at mm/kasan/report.c:352
>     (inlined by) kasan_report at mm/kasan/report.c:409
> [ 166.095420] fib_table_flush+0x76c/0x870:
>     fib_table_flush at net/ipv4/fib_trie.c:1868
> [ 166.095698] ? fib_table_flush_external+0x5a0/0x5a0:
>     fib_table_flush at net/ipv4/fib_trie.c:1836
> [ 166.096067] ? ip_fib_net_exit+0x94/0x360:
>     ip_fib_net_exit at net/ipv4/fib_frontend.c:1313 (discriminator 16)
> [ 166.096350] ip_fib_net_exit+0x228/0x360:
>     ip_fib_net_exit at net/ipv4/fib_frontend.c:1316
> [ 166.096629] ? ip_fib_net_exit+0x360/0x360:
>     fib_net_exit at net/ipv4/fib_frontend.c:1355
> [ 166.096930] ops_exit_list+0xa8/0x160
> [ 166.097233] cleanup_net+0x414/0x860:
>     cleanup_net at net/core/net_namespace.c:483 (discriminator 9)
> [ 166.097487] ? net_drop_ns+0x80/0x80:
>     cleanup_net at net/core/net_namespace.c:439
> [ 166.097748] ? kvm_sched_clock_read+0x5/0x10:
>     kvm_sched_clock_read at arch/x86/kernel/kvmclock.c:101
> [ 166.098051] ? native_sched_clock_from_tsc+0x40/0x70:
>     __preempt_count_dec_and_test at arch/x86/include/asm/preempt.h:91
>     (inlined by) cyc2ns_read_end at arch/x86/kernel/tsc.c:81
>     (inlined by) cycles_2_ns at arch/x86/kernel/tsc.c:135
>     (inlined by) native_sched_clock_from_tsc at arch/x86/kernel/tsc.c:219
> [ 166.098399] ? sched_clock_cpu+0xf/0x70:
>     sched_clock_cpu at kernel/sched/clock.c:363
> [ 166.098672] ? __lock_acquire+0x3b2/0x1fc0
> [ 166.099054] ? lock_downgrade+0x6a0/0x6a0:
>     lock_release at kernel/locking/lockdep.c:4013
> [ 166.099337] ? lock_acquire+0x117/0x260:
>     get_current at arch/x86/include/asm/current.h:15
>     (inlined by) lock_acquire at kernel/locking/lockdep.c:4006
> [ 166.099609] ? process_one_work+0x70f/0x11c0:
>     process_one_work at kernel/workqueue.c:2087
> [ 166.099938] process_one_work+0x791/0x11c0:
>     process_one_work at kernel/workqueue.c:2118
> [ 166.100229] ? kvm_sched_clock_read+0x5/0x10:
>     kvm_sched_clock_read at arch/x86/kernel/kvmclock.c:101
> [ 166.100532] ? sched_clock+0x2d/0x40:
>     paravirt_sched_clock at arch/x86/include/asm/paravirt.h:174
>     (inlined by) sched_clock at arch/x86/kernel/tsc.c:227
> [ 166.100792] ? cancel_delayed_work_sync+0x20/0x20:
>     process_one_work at kernel/workqueue.c:2014
> [ 166.101123] worker_thread+0xe8/0x1070:
>     __read_once_size at include/linux/compiler.h:183
>     (inlined by) list_empty at include/linux/list.h:203
>     (inlined by) worker_thread at kernel/workqueue.c:2247
> [ 166.101392] ? __kthread_parkme+0x164/0x230:
>     __kthread_parkme at kernel/kthread.c:188
> [ 166.101689] ? process_one_work+0x11c0/0x11c0:
>     worker_thread at kernel/workqueue.c:2189
> [ 166.102006] kthread+0x2fd/0x400:
>     kthread at kernel/kthread.c:238
> [ 166.102240] ? kthread_create_on_node+0xf0/0xf0:
>     kthread at kernel/kthread.c:198
> [ 166.102561] ret_from_fork+0x1f/0x30:
>     ret_from_fork at arch/x86/entry/entry_64.S:447
> [ 166.102855]
> [ 166.102972] Allocated by task 1907:
> [ 166.103235] __kmalloc+0xf6/0x1a0:
>     __kmalloc at mm/slub.c:3765
> [ 166.103475] fib_trie_table+0xe8/0x240:
>     fib_trie_table at net/ipv4/fib_trie.c:2081
> [ 166.103748] fib_net_init+0x1bc/0x570:
>     fib4_rules_init at net/ipv4/fib_frontend.c:59
>     (inlined by) ip_fib_net_init at net/ipv4/fib_frontend.c:1287
>     (inlined by) fib_net_init at net/ipv4/fib_frontend.c:1335
> [ 166.104032] ops_init+0x1c0/0x360:
>     ops_init at net/core/net_namespace.c:119
> [ 166.104269] setup_net+0x23c/0x530:
>     setup_net at net/core/net_namespace.c:296
> [ 166.104512] copy_net_ns+0x170/0x350:
>     copy_net_ns at net/core/net_namespace.c:420
> [ 166.104779] create_new_namespaces+0x343/0x730:
>     create_new_namespaces at kernel/nsproxy.c:107
> [ 166.105091] unshare_nsproxy_namespaces+0xa1/0x150:
>     unshare_nsproxy_namespaces at kernel/nsproxy.c:206 (discriminator 4)
> [ 166.105427] SyS_unshare+0x338/0x6c0
> [ 166.105682] do_syscall_64+0x21f/0xb80:
>     do_syscall_64 at arch/x86/entry/common.c:285
> [ 166.105954] return_from_SYSCALL_64+0x0/0x65:
>     return_from_SYSCALL_64 at arch/x86/entry/entry_64.S:259
> [ 166.106253]
> [ 166.106367] Freed by task 11:
> [ 166.106581] kfree+0x102/0x1d0:
>     slab_free at mm/slub.c:2973
>     (inlined by) kfree at mm/slub.c:3899
> [ 166.106838] rcu_do_batch+0x331/0x7f0:
>     rcu_lock_release at include/linux/rcupdate.h:249
>     (inlined by) __rcu_reclaim at kernel/rcu/rcu.h:196
>     (inlined by) rcu_do_batch at kernel/rcu/tree.c:2758
> [ 166.107102] rcu_cpu_kthread+0x12a/0x160:
>     rcu_preempt_do_callbacks at kernel/rcu/tree_plugin.h:687
>     (inlined by) rcu_kthread_do_work at kernel/rcu/tree_plugin.h:1142
>     (inlined by) rcu_cpu_kthread at kernel/rcu/tree_plugin.h:1184
> [ 166.107381] smpboot_thread_fn+0x3c1/0x820:
>     smpboot_thread_fn at kernel/smpboot.c:164
> [ 166.107669] kthread+0x2fd/0x400:
>     kthread at kernel/kthread.c:238
> [ 166.107928] ret_from_fork+0x1f/0x30:
>     ret_from_fork at arch/x86/entry/entry_64.S:447
> [ 166.108181]
> [ 166.108295] The buggy address belongs to the object at ffff880012fc0ae0
> [ 166.108295] which belongs to the cache kmalloc-64 of size 64
> [ 166.109179] The buggy address is located 56 bytes inside of
> [ 166.109179] 64-byte region [ffff880012fc0ae0, ffff880012fc0b20)

Hi Alexander,

Note that CONFIG_IP_MULTIPLE_TABLES is disabled, so both the main and
local tables are allocated during init and also share the same trie.

I think that what happens is that ip_fib_net_exit() frees the main table
and its trie via an RCU callback which is scheduled before the local
table is iterated over, thus resulting in a use-after-free.

I can reliably trigger the bug by adding synchronize_rcu() at the end of
each iteration of the loop.

The problem goes away if we iterate over the tables in reverse order,
which is symmetric to fib4_rules_init().

What do you think?
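
To make the ordering argument concrete, here is a small userspace model of the teardown loop. It is only a sketch, not kernel code: the struct layouts, the TABLE_HASHSZ/MAIN_INDEX/LOCAL_INDEX constants and the stale_accesses() helper are all hypothetical, and the assumption that the main table sits in the lower-numbered slot is mine. The free is modelled with a flag and takes effect immediately, which corresponds to the synchronize_rcu() experiment rather than to a deferred callback.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the two hash slots; not the kernel's values. */
#define TABLE_HASHSZ 2
#define MAIN_INDEX   0   /* assumption: main table is visited first */
#define LOCAL_INDEX  1

struct trie {
    bool freed;          /* flag instead of a real free(), to avoid actual UB */
};

struct table {
    struct trie *data;   /* trie walked when the table is flushed */
    struct trie own;     /* storage, meaningful only for the owning table */
    bool owns_trie;
};

/* Main table owns the trie; local table only borrows a pointer to it. */
static void tables_init(struct table t[TABLE_HASHSZ])
{
    t[MAIN_INDEX].own.freed = false;
    t[MAIN_INDEX].owns_trie = true;
    t[MAIN_INDEX].data = &t[MAIN_INDEX].own;

    t[LOCAL_INDEX].own.freed = false;
    t[LOCAL_INDEX].owns_trie = false;
    t[LOCAL_INDEX].data = &t[MAIN_INDEX].own;
}

/*
 * Tear down both tables and return how many flushes touched a trie that
 * had already been released. "reverse" selects the iteration order.
 */
static int stale_accesses(bool reverse)
{
    struct table t[TABLE_HASHSZ];
    int stale = 0;

    tables_init(t);

    for (int n = 0; n < TABLE_HASHSZ; n++) {
        int i = reverse ? TABLE_HASHSZ - 1 - n : n;

        /* "Flush": this is the read that KASAN would flag. */
        if (t[i].data->freed)
            stale++;

        /* "Free": releasing the owner also releases the shared trie. */
        if (t[i].owns_trie)
            t[i].data->freed = true;
    }

    return stale;
}
```

In this model stale_accesses(false) reports one stale access (the local table is flushed after its trie is gone), while stale_accesses(true) reports none, because the borrowing table is always torn down before the owner.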