On Fri, 2018-02-02 at 15:30 +0100, Paolo Abeni wrote: > Li Shuang reported an Oops with cls_u32 due to an use-after-free > in u32_destroy_key(). The use-after-free can be triggered with: > > dev=lo > tc qdisc add dev $dev root handle 1: htb default 10 > tc filter add dev $dev parent 1: prio 5 handle 1: protocol ip u32 divisor 256 > tc filter add dev $dev protocol ip parent 1: prio 5 u32 ht 800:: match ip dst\ > 10.0.0.0/8 hashkey mask 0x0000ff00 at 16 link 1: > tc qdisc del dev $dev root > > Which causes the following kasan splat: > > ================================================================== > BUG: KASAN: use-after-free in u32_destroy_key.constprop.21+0x117/0x140 > [cls_u32] > Read of size 4 at addr ffff881b83dae618 by task kworker/u48:5/571 > > CPU: 17 PID: 571 Comm: kworker/u48:5 Not tainted 4.15.0+ #87 > Hardware name: Dell Inc. PowerEdge R730/072T6D, BIOS 2.1.7 06/16/2016 > Workqueue: tc_filter_workqueue u32_delete_key_freepf_work [cls_u32] > Call Trace: > dump_stack+0xd6/0x182 > ? dma_virt_map_sg+0x22e/0x22e > print_address_description+0x73/0x290 > kasan_report+0x277/0x360 > ? u32_destroy_key.constprop.21+0x117/0x140 [cls_u32] > u32_destroy_key.constprop.21+0x117/0x140 [cls_u32] > u32_delete_key_freepf_work+0x1c/0x30 [cls_u32] > process_one_work+0xae0/0x1c80 > ? sched_clock+0x5/0x10 > ? pwq_dec_nr_in_flight+0x3c0/0x3c0 > ? _raw_spin_unlock_irq+0x29/0x40 > ? trace_hardirqs_on_caller+0x381/0x570 > ? _raw_spin_unlock_irq+0x29/0x40 > ? finish_task_switch+0x1e5/0x760 > ? finish_task_switch+0x208/0x760 > ? preempt_notifier_dec+0x20/0x20 > ? __schedule+0x839/0x1ee0 > ? check_noncircular+0x20/0x20 > ? firmware_map_remove+0x73/0x73 > ? find_held_lock+0x39/0x1c0 > ? worker_thread+0x434/0x1820 > ? lock_contended+0xee0/0xee0 > ? lock_release+0x1100/0x1100 > ? init_rescuer.part.16+0x150/0x150 > ? retint_kernel+0x10/0x10 > worker_thread+0x216/0x1820 > ? process_one_work+0x1c80/0x1c80 > ? lock_acquire+0x1a5/0x540 > ? lock_downgrade+0x6b0/0x6b0 > ? sched_clock+0x5/0x10 > ? lock_release+0x1100/0x1100 > ? compat_start_thread+0x80/0x80 > ? do_raw_spin_trylock+0x190/0x190 > ? _raw_spin_unlock_irq+0x29/0x40 > ? trace_hardirqs_on_caller+0x381/0x570 > ? _raw_spin_unlock_irq+0x29/0x40 > ? finish_task_switch+0x1e5/0x760 > ? finish_task_switch+0x208/0x760 > ? preempt_notifier_dec+0x20/0x20 > ? __schedule+0x839/0x1ee0 > ? kmem_cache_alloc_trace+0x143/0x320 > ? firmware_map_remove+0x73/0x73 > ? sched_clock+0x5/0x10 > ? sched_clock_cpu+0x18/0x170 > ? find_held_lock+0x39/0x1c0 > ? schedule+0xf3/0x3b0 > ? lock_downgrade+0x6b0/0x6b0 > ? __schedule+0x1ee0/0x1ee0 > ? do_wait_intr_irq+0x340/0x340 > ? do_raw_spin_trylock+0x190/0x190 > ? _raw_spin_unlock_irqrestore+0x32/0x60 > ? process_one_work+0x1c80/0x1c80 > ? process_one_work+0x1c80/0x1c80 > kthread+0x312/0x3d0 > ? kthread_create_worker_on_cpu+0xc0/0xc0 > ret_from_fork+0x3a/0x50 > > Allocated by task 1688: > kasan_kmalloc+0xa0/0xd0 > __kmalloc+0x162/0x380 > u32_change+0x1220/0x3c9e [cls_u32] > tc_ctl_tfilter+0x1ba6/0x2f80 > rtnetlink_rcv_msg+0x4f0/0x9d0 > netlink_rcv_skb+0x124/0x320 > netlink_unicast+0x430/0x600 > netlink_sendmsg+0x8fa/0xd60 > sock_sendmsg+0xb1/0xe0 > ___sys_sendmsg+0x678/0x980 > __sys_sendmsg+0xc4/0x210 > do_syscall_64+0x232/0x7f0 > return_from_SYSCALL_64+0x0/0x75 > > Freed by task 112: > kasan_slab_free+0x71/0xc0 > kfree+0x114/0x320 > rcu_process_callbacks+0xc3f/0x1600 > __do_softirq+0x2bf/0xc06 > > The buggy address belongs to the object at ffff881b83dae600 > which belongs to the cache kmalloc-4096 of size 4096 > The buggy address is located 24 bytes inside of > 4096-byte region [ffff881b83dae600, ffff881b83daf600) > The buggy address belongs to the page: > page:ffffea006e0f6a00 count:1 mapcount:0 mapping: (null) index:0x0 > compound_mapcount: 0 > flags: 0x17ffffc0008100(slab|head) > raw: 0017ffffc0008100 0000000000000000 0000000000000000 0000000100070007 > raw: dead000000000100 dead000000000200 ffff880187c0e600 0000000000000000 > page dumped because: kasan: bad access detected > > Memory state around the buggy address: > ffff881b83dae500: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc > ffff881b83dae580: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc > >ffff881b83dae600: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb > ^ > ffff881b83dae680: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb > ffff881b83dae700: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb > ================================================================== > > The problem is that the htnode is freed before the linked knodes and the > latter will try to access the first at u32_destroy_key() time. > This change addresses the issue using the htnode refcnt to guarantee > the correct free order. While at it also add a RCU annotation, > to keep sparse happy. > > v1 -> v2: use rtnl_derefence() instead of RCU read locks > > Reported-by: Li Shuang <shu...@redhat.com> > Fixes: c0d378ef1266 ("net_sched: use tcf_queue_work() in u32 filter") > Signed-off-by: Paolo Abeni <pab...@redhat.com> > --- > net/sched/cls_u32.c | 19 +++++++++++++++---- > 1 file changed, 15 insertions(+), 4 deletions(-) > > diff --git a/net/sched/cls_u32.c b/net/sched/cls_u32.c > index 60c892c36a60..10440fbf3c68 100644 > --- a/net/sched/cls_u32.c > +++ b/net/sched/cls_u32.c > @@ -398,10 +398,12 @@ static int u32_init(struct tcf_proto *tp) > static int u32_destroy_key(struct tcf_proto *tp, struct tc_u_knode *n, > bool free_pf) > { > + struct tc_u_hnode *ht = rtnl_dereference(n->ht_down); > + > tcf_exts_destroy(&n->exts); > tcf_exts_put_net(&n->exts); > - if (n->ht_down) > - n->ht_down->refcnt--; > + if (ht && ht->refcnt-- == 0) oops... this ^^^^^^^^^^^^ should be: '--th->refcnt' ...
> > + kfree(ht); #ifdef CONFIG_CLS_U32_PERF > if (free_pf) > free_percpu(n->pf); > @@ -624,7 +626,12 @@ static int u32_destroy_hnode(struct tcf_proto *tp, > struct tc_u_hnode *ht, > idr_destroy(&ht->handle_idr); > idr_remove_ext(&tp_c->handle_idr, ht->handle); > RCU_INIT_POINTER(*hn, ht->next); > - kfree_rcu(ht, rcu); > + > + /* u32_destroy_key() will will later free ht for us, if > + * it's still referenced by some knode > + */ > + if (ht->refcnt == 0) > + kfree_rcu(ht, rcu); > return 0; > } > } > @@ -667,7 +674,11 @@ static void u32_destroy(struct tcf_proto *tp, struct > netlink_ext_ack *extack) > > while ((ht = rtnl_dereference(tp_c->hlist)) != NULL) { > RCU_INIT_POINTER(tp_c->hlist, ht->next); > - kfree_rcu(ht, rcu); > + /* u32_destroy_key() will will later free ht for us, if > + * it's still referenced by some knode > + */ > + if (ht->refcnt == 0) > + kfree_rcu(ht, rcu); And it's probably better merge the previous loop with this one. I'll include the above changes in v3 Paolo