This is an AI-generated review of your patch. The human sending this
email has considered the AI review valid, or at least plausible.

For local reproduction steps see https://netdev-ai.bots.linux.dev/ai-local.html
---
net/rds: Refactor __rds_conn_create for blocking transport cleanup

The next patch will delegate fanout operations to a background worker,
which adds a cancel_work_sync() call to rds_tcp_conn_free(). That
function is called during connection cleanup and must be able to block.
However, the error path of __rds_conn_create() currently calls
trans->conn_free() while holding rds_conn_lock (a spinlock) and
rcu_read_lock, an atomic context in which cancel_work_sync() cannot
sleep.

Link: https://ci.syzbot.org/series/1a5ef180-c02c-401d-9df7-670b18570a55

> diff --git a/net/rds/connection.c b/net/rds/connection.c
> --- a/net/rds/connection.c
> +++ b/net/rds/connection.c
> @@ -170,6 +170,7 @@ static struct rds_connection *__rds_conn_create(struct net *net,
>       struct hlist_head *head = rds_conn_bucket(laddr, faddr);
>       struct rds_transport *loop_trans;
>       struct rds_conn_path *free_cp = NULL;
> +     struct rds_transport *free_trans = NULL;
>       unsigned long flags;
>       int ret, i;
>       int npaths = (trans->t_mp_capable ? RDS_MPATH_WORKERS : 1);
> @@ -305,7 +306,7 @@
>       if (parent) {
>               /* Creating passive conn */
>               if (parent->c_passive) {
> -                     trans->conn_free(conn->c_path[0].cp_transport_data);
> +                     free_trans = trans;
>                       free_cp = conn->c_path;
>                       kmem_cache_free(rds_conn_slab, conn);
>                       conn = parent->c_passive;

[ ... ]

> @@ -321,18 +322,7 @@
>               found = rds_conn_lookup(net, head, laddr, faddr, trans,
>                                       tos, dev_if);
>               if (found) {
> -                     struct rds_conn_path *cp;
> -                     int i;
> -
> -                     for (i = 0; i < npaths; i++) {
> -                             cp = &conn->c_path[i];
> -                             /* The ->conn_alloc invocation may have
> -                              * allocated resource for all paths, so all
> -                              * of them may have to be freed here.
> -                              */
> -                             if (cp->cp_transport_data)
> -                                     trans->conn_free(cp->cp_transport_data);
> -                     }
> +                     free_trans = trans;
>                       free_cp = conn->c_path;
>                       kmem_cache_free(rds_conn_slab, conn);
>                       conn = found;

Does this create a use-after-free in other threads that iterate the
transport list?

Looking at the race-loss path where rds_conn_lookup() finds an existing
connection, the old code called trans->conn_free() before
kmem_cache_free(conn). The patch now defers conn_free() until after the
locks are released, at the out: label, but kmem_cache_free(conn) still
executes inside the critical section (line 327 after this change).

This creates a window where rds_tcp_connection objects remain on
rds_tcp_conn_list with tc->t_cpath->cp_conn pointing to the
already-freed conn object.

The timeline would be:

CPU 0 (__rds_conn_create race-loss)
  spin_lock_irqsave(&rds_conn_lock)
  found = rds_conn_lookup()  // finds existing conn
  free_trans = trans
  free_cp = conn->c_path
  kmem_cache_free(rds_conn_slab, conn)  // conn freed
  conn = found
  spin_unlock_irqrestore(&rds_conn_lock)
  rcu_read_unlock()
  // ... later at out: label
  free_trans->conn_free(...)  // removes tc from list, but too late

CPU 1 (rds_tcp_kill_sock)
  spin_lock_irq(&rds_tcp_conn_lock)
  list_for_each_entry_safe(tc, _tc, &rds_tcp_conn_list, t_tcp_node)
    c_net = read_pnet(&tc->t_cpath->cp_conn->c_net)  // UAF here!

The commit message states "connections that 'lose' the race are safe to
clean up outside the critical section since they were never added to
the hashtable, and therefore are inaccessible to other threads."

However, while the rds_connection was never added to the RDS hashtable,
the transport data (tc) WAS added to rds_tcp_conn_list during
rds_tcp_conn_alloc(). Looking at that function in net/rds/tcp.c:

rds_tcp_conn_alloc() {
    /* ... allocate a tc for each path ... */
    spin_lock_irq(&rds_tcp_conn_lock);
    for (i = 0; i < RDS_MPATH_WORKERS; i++) {
        tc = conn->c_path[i].cp_transport_data;
        tc->t_tcp_node_detached = false;
        list_add_tail(&tc->t_tcp_node, &rds_tcp_conn_list);
    }
    spin_unlock_irq(&rds_tcp_conn_lock);
}

So the tc objects are accessible to other threads via the transport
list, even though the conn itself never made it into the RDS hashtable.

Functions that iterate rds_tcp_conn_list include:
- rds_tcp_kill_sock() at net/rds/tcp.c:610
- rds_tcp_destroy_conns()
- rds_tcp_sysctl_reset()

All of these dereference tc->t_cpath->cp_conn during iteration.

In the old code, conn_free() ran BEFORE kmem_cache_free(conn), so the tc
was removed from the list before conn was freed. No UAF window existed.

Could the kmem_cache_free(rds_conn_slab, conn) be moved to after
conn_free() at the out: label, or should conn_free() be called before
kmem_cache_free() inside the critical section for the race-loss paths?

> @@ -349,9 +339,23 @@
>
>  out:
>       if (free_cp) {
> -             for (i = 0; i < npaths; i++)
> +             for (i = 0; i < npaths; i++) {
> +                     /*
> +                      * The trans->conn_alloc call may have allocated
> +                      * resources for the cp paths, which will need to
> +                      * be freed before freeing cp itself.  We do this here
> +                      * after the out: label so it runs outside the
> +                      * rds_conn_lock spinlock and rcu_read_lock section,
> +                      * since both destroy_workqueue() and conn_free can
> +                      * block. The local free_trans pointer is only set in
> +                      * the race-loss paths where conn_alloc() succeeded.
> +                      */
> +                     if (free_trans && free_cp[i].cp_transport_data)
> +                             free_trans->conn_free
> +                                     (free_cp[i].cp_transport_data);
>                       if (free_cp[i].cp_wq != rds_wq)
>                               destroy_workqueue(free_cp[i].cp_wq);
> +             }
>               kfree(free_cp);
>       }
