> -----Original Message-----
> From: Stephen Hemminger <[email protected]>
> Sent: Wednesday, May 27, 2026 3:38 AM
> To: Wei Hu <[email protected]>
> Cc: [email protected]; Long Li <[email protected]>; Wei Hu
> <[email protected]>
> Subject: [EXTERNAL] Re: [PATCH v3 0/2] net/mana: add device reset support
> 
> 
> Warnings
> 
> 3. Secondary process: qsbr does not actually quiesce secondary lcores.
> 
>    rte_rcu_qsbr_thread_register is only called from
>    mana_dev_configure, which the secondary never runs.  The
>    secondary's rx_burst/tx_burst still call thread_online/offline
>    against the shared qsv, but the secondary tids are unregistered,
>    so they are invisible to rte_rcu_qsbr_check in the primary, and
>    the secondary MP handler (mana_mp_reset_enter) does not call
>    qsbr_check at all -- it just sets db_page = NULL and munmaps.
> 
>    The dev_state check at the top of secondary tx_burst is racy:
>    the page can be munmapped after the in-loop read of db_page but
>    before the doorbell write at the bottom.  The "All secondary
>    threads are quiescent" log line in mana_mp_reset_enter is not
>    true.
> 
>    The secondary needs a real reader-side primitive -- its own
>    qsbr with secondary lcore registration, or an rwlock the MP
>    handler takes before munmap.
> 

Thanks for the v3 review, @Stephen. I will send out v4 to incorporate most
of the review comments except for this one. 

The review on this point is not correct. Here I am providing analysis from
 AI and my own test results to show why.

The concern is that "rte_rcu_qsbr_thread_register is only called
from mana_dev_configure, which the secondary never runs", so
secondary tids are unregistered and invisible to rte_rcu_qsbr_check.

This is incorrect. The RCU thread IDs are per-queue, not per-process.

The tid used in rte_rcu_qsbr_thread_online/offline is:

  - RX path: tid = rxq->rxq_idx        (0 to num_queues-1)
  - TX path: tid = num_queues + txq_idx (num_queues to 2*num_queues-1)

These tids are registered once by primary in mana_dev_configure
(mana.c:117), which calls rte_rcu_qsbr_thread_register for IDs
0..2*num_queues-1. The QSV (priv->dev_state_qsv) is allocated
with rte_zmalloc_socket in shared hugepage memory (mana.c:2207).
The queue structures (rxq, txq) also live in shared memory via
dev->data->rx_queues[] / tx_queues[].

When a secondary lcore polls a queue, it calls
rte_rcu_qsbr_thread_online(qsv, tid) with the SAME shared QSV
and SAME queue-based tid. DPDK's fundamental model requires that
a given queue is polled by exactly one lcore at a time (queues
are not thread-safe), so the tid maps 1:1 to a lcore regardless
of which process that lcore belongs to. The tid IS registered ―
registration is a property of the QSV structure, not of the process.

The reset synchronization flow (mana_reset_enter, mana.c:1458-1488):

  1. Primary: dev_state = MANA_DEV_RESET_ENTER (release store)
  2. Primary: rte_rcu_qsbr_start + rte_rcu_qsbr_check ― blocks
     until ALL registered tids have passed through thread_offline
  3. Secondary lcores in rx_burst/tx_burst eventually call
     rte_rcu_qsbr_thread_offline, and on re-entry see
     dev_state != ACTIVE → return 0 without touching db_page
  4. Only after QSBR check passes (ALL tids quiescent, including
     secondary's) does primary proceed to mana_dev_stop (line 1472)
  5. Primary sends MANA_MP_REQ_RESET_ENTER IPC (line 1481)
  6. Secondary MP handler: sets db_page = NULL, munmaps

The alleged race ― "db_page can be munmapped after the in-loop
read but before the doorbell write" ― cannot happen because:

  - db_page is munmapped at step 6, which is AFTER step 2 (QSBR).
  - After QSBR passes, any secondary re-entry into tx_burst/rx_burst
    reads dev_state with acquire ordering (tx.c:214, rx.c:468),
    sees RESET_ENTER, and returns immediately without using the
    local db_page copy.
  - The memory ordering is sound: primary's release store of
    dev_state (step 1) happens-before QSBR start (step 2).
    Secondary's thread_offline (release) synchronizes-with
    primary's QSBR check (acquire). On secondary re-entry,
    thread_online + acquire load of dev_state sees the update.

The secondary MP handler does NOT need its own QSBR check ―
the primary's check at step 2 already covers all data path
threads including secondary's.

Note: The log "All secondary threads are quiescent" in
mana_mp_reset_enter is misleading and has been removed.
The actual quiescence is enforced by the primary's QSBR
check before IPC is sent.

Followings are from my own test. I added the debug log in mana_tx_burst() call.

diff --git a/drivers/net/mana/mana.h b/drivers/net/mana/mana.h
index 115bc722f4..038a3d9e00 100644
--- a/drivers/net/mana/mana.h
+++ b/drivers/net/mana/mana.h
@@ -454,6 +454,9 @@ struct mana_txq {
        struct mana_stats stats;
        unsigned int socket;
        unsigned int txq_idx;
+#if 1
+       unsigned int call_cnt;
+#endif
 };

 struct mana_rxq {
diff --git a/drivers/net/mana/tx.c b/drivers/net/mana/tx.c
index 3b07a8b9a6..3ae825190e 100644
--- a/drivers/net/mana/tx.c
+++ b/drivers/net/mana/tx.c
@@ -210,6 +210,14 @@ mana_tx_burst(void *dpdk_txq, struct rte_mbuf **tx_pkts, 
uint16_t nb_pkts)
        }

        rte_rcu_qsbr_thread_online(dstate_qsv, tid);
+#if 1
+       if (txq->call_cnt++ % 1000000 == 0) {
+               if (rte_eal_process_type() == RTE_PROC_SECONDARY)
+                       DP_LOG(ERR, "secondary tid %u online", tid);
+               else
+                       DP_LOG(ERR, "primary tid %u online", tid);
+       }
+#endif

        if (unlikely(rte_atomic_load_explicit(&priv->dev_state,
                            rte_memory_order_acquire) != MANA_DEV_ACTIVE || 
!db_page)) {


Primary command:
dpdk-testpmd --log-level='.*,8' -l 2-4 ... --proc-type=primary -- --nb-cores 2 
--forward-mode=txonly ... --txq=4 --rxq=4 ... --num-procs=2 --proc-id 0

Secondary command:
dpdk-testpmd --log-level='.*,8' -l 6-8 ... --proc-type=secondary -- --nb-cores 
2 
--forward-mode=txonly ... --txq=4 --rxq=4 ... --num-procs=2 --proc-id 1

Here in both commands, I specify the number of txq and rxq to be 4.

After primary start, the log shows 8 threads total were registered in 
mana_dev_configure 
and their tids:

Configuring Port 0 (socket 0)
MANA_DRIVER: mana_dev_configure(): priv 0x1003cdc80, port 0, dev port 1, 
num_queues: 4
MANA_DRIVER: mana_dev_configure(): Register thread 0x0 for priv 0x1003cdc80, 
port 0
MANA_DRIVER: mana_dev_configure(): Register thread 0x1 for priv 0x1003cdc80, 
port 0
MANA_DRIVER: mana_dev_configure(): Register thread 0x2 for priv 0x1003cdc80, 
port 0
MANA_DRIVER: mana_dev_configure(): Register thread 0x3 for priv 0x1003cdc80, 
port 0
MANA_DRIVER: mana_dev_configure(): Register thread 0x4 for priv 0x1003cdc80, 
port 0
MANA_DRIVER: mana_dev_configure(): Register thread 0x5 for priv 0x1003cdc80, 
port 0
MANA_DRIVER: mana_dev_configure(): Register thread 0x6 for priv 0x1003cdc80, 
port 0
MANA_DRIVER: mana_dev_configure(): Register thread 0x7 for priv 0x1003cdc80, 
port 0
ETHDEV: Port 0 Rx offload RSS_HASH is not requested but enabled

Note tid 0x0 to 0x3 are rx threads, 0x4 to 0x7 are tx threads. Then the primary 
only printed out the tid 4 and 5 sere online:

MANA_DRIVER: primary tid 4 online
MANA_DRIVER: primary tid 5 online

Then I started the secondary and saw tid 6 and 7 were online in its 
mana_tx_burst call:

MANA_DRIVER: secondary tid 7 online
MANA_DRIVER: secondary tid 6 online
MANA_DRIVER: secondary tid 7 online

This simply means primary has registered all threads which would be up in both 
primary and secondary in mana_ddev_configure. Primary only starts half of the 
threads since  --num-procs=2 passed in already tells primary that there will be 
a secondary to start. 
When the secondary starts, it takes the rest two tx and rx threads. All these 
threads 
have already been registered in mana_dev_configure() by calling 
rte_rcu_qsbr_thread_register() in the primary.

So, the review comments on this point is incorrect. All threads are covered and
there is no race at all.

Thanks,
Wei

Reply via email to