On Tue, 2026-02-24 at 16:16 +0000, Simon Horman wrote:
> This is an AI-generated review of your patch. The human sending this
> email has considered the AI review valid, or at least plausible.
> 
> For local reproduction steps see 
> https://urldefense.com/v3/__https://netdev-ai.bots.linux.dev/ai-local.html__;!!ACWV5N9M2RV99hQ!Ki9pK6KFAOgmcbdBBDA6lG7dXM_7sJiBAcp2aRSeDi8bRG9ACJajfiLdK-YgNvXyOPqrJpC5y7k9fq6huS8$
>  
> ---
> net/rds: Refactor __rds_conn_create for blocking transport cleanup
> 
> The next patch will delegate fanout operations to a background worker,
> which adds cancel_work_sync() to rds_tcp_conn_free(). That function is
> called during connection cleanup and must be allowed to block.
> However, the error path of __rds_conn_create() currently calls
> trans->conn_free() while holding rds_conn_lock (a spinlock) and
> rcu_read_lock, creating an atomic context in which cancel_work_sync()
> cannot sleep.
> 
> Link: 
> https://urldefense.com/v3/__https://ci.syzbot.org/series/1a5ef180-c02c-401d-9df7-670b18570a55__;!!ACWV5N9M2RV99hQ!Ki9pK6KFAOgmcbdBBDA6lG7dXM_7sJiBAcp2aRSeDi8bRG9ACJajfiLdK-YgNvXyOPqrJpC5y7k9noDt20w$
>  
> 
> > diff --git a/net/rds/connection.c b/net/rds/connection.c
> > --- a/net/rds/connection.c
> > +++ b/net/rds/connection.c
> > @@ -170,6 +170,7 @@ static struct rds_connection *__rds_conn_create(struct 
> > net *net,
> >     struct hlist_head *head = rds_conn_bucket(laddr, faddr);
> >     struct rds_transport *loop_trans;
> >     struct rds_conn_path *free_cp = NULL;
> > +   struct rds_transport *free_trans = NULL;
> >     unsigned long flags;
> >     int ret, i;
> >     int npaths = (trans->t_mp_capable ? RDS_MPATH_WORKERS : 1);
> > @@ -305,7 +306,7 @@
> >     if (parent) {
> >             /* Creating passive conn */
> >             if (parent->c_passive) {
> > -                   trans->conn_free(conn->c_path[0].cp_transport_data);
> > +                   free_trans = trans;
> >                     free_cp = conn->c_path;
> >                     kmem_cache_free(rds_conn_slab, conn);
> >                     conn = parent->c_passive;
> 
> [ ... ]
> 
> > @@ -321,18 +322,7 @@
> >             found = rds_conn_lookup(net, head, laddr, faddr, trans,
> >                                     tos, dev_if);
> >             if (found) {
> > -                   struct rds_conn_path *cp;
> > -                   int i;
> > -
> > -                   for (i = 0; i < npaths; i++) {
> > -                           cp = &conn->c_path[i];
> > -                           /* The ->conn_alloc invocation may have
> > -                            * allocated resource for all paths, so all
> > -                            * of them may have to be freed here.
> > -                            */
> > -                           if (cp->cp_transport_data)
> > -                                   trans->conn_free(cp->cp_transport_data);
> > -                   }
> > +                   free_trans = trans;
> >                     free_cp = conn->c_path;
> >                     kmem_cache_free(rds_conn_slab, conn);
> >                     conn = found;
> 
> Does this create a use-after-free in other threads that iterate the
> transport list?
> 
> Looking at the race-loss path when rds_conn_lookup() finds an existing
> connection, the old code called trans->conn_free() before
> kmem_cache_free(conn). Now the patch defers conn_free() to after
> locks are released at the out: label, but kmem_cache_free(conn) still
> executes inside the critical section (line 327 after this change).
> 
> This creates a window where rds_tcp_connection objects remain on
> rds_tcp_conn_list with tc->t_cpath->cp_conn pointing to the
> already-freed conn object.
> 
> The timeline would be:
> 
> CPU 0 (__rds_conn_create race-loss)
>   spin_lock_irqsave(&rds_conn_lock)
>   found = rds_conn_lookup()  // finds existing conn
>   free_trans = trans
>   free_cp = conn->c_path
>   kmem_cache_free(rds_conn_slab, conn)  // conn freed
>   conn = found
>   spin_unlock_irqrestore(&rds_conn_lock)
>   rcu_read_unlock()
>   // ... later at out: label
>   free_trans->conn_free(...)  // removes tc from list, but too late
> 
> CPU 1 (rds_tcp_kill_sock)
>   spin_lock_irq(&rds_tcp_conn_lock)
>   list_for_each_entry_safe(tc, _tc, &rds_tcp_conn_list, t_tcp_node)
>     c_net = read_pnet(&tc->t_cpath->cp_conn->c_net)  // UAF here!
> 
> The commit message states "connections that 'lose' the race are safe to
> clean up outside the critical section since they were never added to
> the hashtable, and therefore are inaccessible to other threads."
> 
> However, while the rds_connection was never added to the RDS hashtable,
> the transport data (tc) WAS added to rds_tcp_conn_list during
> rds_tcp_conn_alloc(). Looking at that function in net/rds/tcp.c:
> 
> rds_tcp_conn_alloc() {
>     // ... allocate tc structures ...
>     spin_lock_irq(&rds_tcp_conn_lock);
>     for (i = 0; i < RDS_MPATH_WORKERS; i++) {
>         tc = conn->c_path[i].cp_transport_data;
>         tc->t_tcp_node_detached = false;
>         list_add_tail(&tc->t_tcp_node, &rds_tcp_conn_list);
>     }
>     spin_unlock_irq(&rds_tcp_conn_lock);
> }
> 
> So the tc objects are accessible to other threads via the transport
> list, even though the conn itself never made it into the RDS hashtable.
> 
> Functions that iterate rds_tcp_conn_list include:
> - rds_tcp_kill_sock() at net/rds/tcp.c:610
> - rds_tcp_destroy_conns()
> - rds_tcp_sysctl_reset()
> 
> All of these dereference tc->t_cpath->cp_conn during iteration.
> 
> In the old code, conn_free() ran BEFORE kmem_cache_free(conn), so the tc
> was removed from the list before conn was freed. No UAF window existed.
> 
> Could the kmem_cache_free(rds_conn_slab, conn) be moved to after
> conn_free() at the out: label, or should conn_free() be called before
> kmem_cache_free() inside the critical section for the race-loss paths?
I think it probably needs to go after the out: label.  Will update.  Thank
you!

Allison

> 
> > @@ -349,9 +339,23 @@
> > 
> >  out:
> >     if (free_cp) {
> > -           for (i = 0; i < npaths; i++)
> > +           for (i = 0; i < npaths; i++) {
> > +                   /*
> > +                    * The trans->conn_alloc call may have allocated
> > +                    * resources for the cp paths, which will need to
> > +                    * be freed before freeing cp itself.  We do this here
> > +                    * after the out: label so it runs outside the
> > +                    * rds_conn_lock spinlock and rcu_read_lock section,
> > +                    * since both destroy_workqueue() and conn_free can
> > +                    * block. The local free_trans pointer is only set in
> > +                    * the race-loss paths where conn_alloc() succeeded.
> > +                    */
> > +                   if (free_trans && free_cp[i].cp_transport_data)
> > +                           free_trans->conn_free
> > +                                   (free_cp[i].cp_transport_data);
> >                     if (free_cp[i].cp_wq != rds_wq)
> >                             destroy_workqueue(free_cp[i].cp_wq);
> > +           }
> >             kfree(free_cp);
> >     }
