On 4/12/2024 5:50 PM, Alan Elder wrote: > > >> -----Original Message----- >> From: Ferruh Yigit <ferruh.yi...@amd.com> >> Sent: Friday, April 12, 2024 6:23 AM >> To: Alan Elder <alan.el...@microsoft.com>; Long Li <lon...@microsoft.com>; >> Andrew Rybchenko <andrew.rybche...@oktetlabs.ru> >> Cc: dev@dpdk.org; stephen <step...@networkplumber.org> >> Subject: Re: [EXTERNAL] Re: [PATCH v3] net/netvsc: fix number Tx queues > Rx >> queues >> >> On 4/11/2024 9:45 PM, Alan Elder wrote: >>>> -----Original Message----- >>>> From: Ferruh Yigit <ferruh.yi...@amd.com> >>>> Sent: Thursday, April 11, 2024 7:38 AM >>>> To: Alan Elder <alan.el...@microsoft.com>; Long Li >>>> <lon...@microsoft.com>; Andrew Rybchenko >>>> <andrew.rybche...@oktetlabs.ru> >>>> Cc: dev@dpdk.org; stephen <step...@networkplumber.org> >>>> Subject: [EXTERNAL] Re: [PATCH v3] net/netvsc: fix number Tx queues > >>>> Rx queues >>>> >>>> On 3/19/2024 2:16 PM, Alan Elder wrote: >>>>> The previous code allowed the number of Tx queues to be set higher >>>>> than the number of Rx queues. If a packet was sent on a Tx queue >>>>> with index >>>>>> = number Rx queues there was a segfault. >>>>> This commit fixes the issue by creating an Rx queue for every Tx >>>>> queue meaning that an event buffer is allocated to handle receiving >>>>> Tx completion messages. >>>>> >>>>> mbuf pool and Rx ring are not allocated for these additional Rx >>>>> queues and RSS configuration ensures that no packets are received on >> them. >>>>> >>>>> Fixes: 4e9c73e96e83 ("net/netvsc: add Hyper-V network device") >>>>> Cc: sthem...@microsoft.com >>>>> Cc: sta...@dpdk.org >>>>> >>>>> Signed-off-by: Alan Elder <alan.el...@microsoft.com> >>>>> >>>> >>>> Hi Alan, >>>> >>>> What is the root cause of the crash, is it in driver scope or application? >>> >>> Hi Ferruh, >>> >>> The root cause of the crash was in the driver - a packet received on a Tx >> queue that had no corresponding Rx queue would cause the dev->data- >>> rx_queues[] array to be accessed past the length of the array. >>> >>> https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgith >>> >> ub.com%2FDPDK%2Fdpdk%2Fblob%2Fmain%2Fdrivers%2Fnet%2Fnetvsc%2Fhn >> _rxtx. >>> >> c%23L1071&data=05%7C02%7Calan.elder%40microsoft.com%7C3985f99c07c1 >> 4a64 >>> >> 99fd08dc5ada98d0%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C6 >> 3848514 >>> >> 2149539930%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjo >> iV2luMzI >>> >> iLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C0%7C%7C%7C&sdata=Y%2F6lr6v2j4Q >> cSm6g0 >>> dTcV%2FEimyfPs0nMBJ0X5s9omAE%3D&reserved=0 >>> >>> >> >> Why there is an access to Rx queue when processing Tx queue? >> >> A backtrace of the crash can help to understand the issue, can you please >> include this in commit log, plus some explanation why crash happens? >> >> Thanks, >> Ferruh > > Hi Ferruh, > > Netvsc slow path needs to handle Tx completion messages (to know when it can > reclaim Tx buffers). Tx completion messages are received on Rx queue, which > is why the Rx queue is accessed as part of transmit processing. > > An example call stack is: > > #6 rte_spinlock_trylock (sl=0x20) at /include/rte_spinlock.h > #7 hn_process_events (hv=, queue_id=2, tx_limit=) at > /drivers/net/netvsc/hn_rxtx.c > #8 hn_xmit_pkts (ptxq=, tx_pkts=, nb_pkts=1) at /drivers/net/netvsc/hn_rxtx.c > > Which leads to the SEGV as 0x20 is not a valid address. > > I'll update the commit messages and resubmit the patch. > >
Hi Alan, Thanks for the detail. 'hn_xmit_pkts()' calls 'hn_process_events()' with the Tx queue_id, but 'hn_process_events()' seems designed for Rx event processing since it uses 'queue_id' to get rxq. Does it help to pass queue type to 'hn_process_events()'? And the patch creates Rx queues for access Tx queues. Are the Tx completion packets needs to be delivered to Rx queue with exact same Tx queue_id by design? Or the new Rx queues created just to prevent the crash, by providing 'rxq->ring_lock' etc? Please also check comments on v4, thanks.