> -----Original Message-----
> From: Ferruh Yigit <ferruh.yi...@amd.com>
> Sent: Friday, April 12, 2024 6:23 AM
> To: Alan Elder <alan.el...@microsoft.com>; Long Li <lon...@microsoft.com>;
> Andrew Rybchenko <andrew.rybche...@oktetlabs.ru>
> Cc: dev@dpdk.org; stephen <step...@networkplumber.org>
> Subject: Re: [EXTERNAL] Re: [PATCH v3] net/netvsc: fix number Tx queues > Rx
> queues
> 
> On 4/11/2024 9:45 PM, Alan Elder wrote:
> >> -----Original Message-----
> >> From: Ferruh Yigit <ferruh.yi...@amd.com>
> >> Sent: Thursday, April 11, 2024 7:38 AM
> >> To: Alan Elder <alan.el...@microsoft.com>; Long Li
> >> <lon...@microsoft.com>; Andrew Rybchenko
> >> <andrew.rybche...@oktetlabs.ru>
> >> Cc: dev@dpdk.org; stephen <step...@networkplumber.org>
> >> Subject: [EXTERNAL] Re: [PATCH v3] net/netvsc: fix number Tx queues >
> >> Rx queues
> >>
> >> On 3/19/2024 2:16 PM, Alan Elder wrote:
> >>> The previous code allowed the number of Tx queues to be set higher
> >>> than the number of Rx queues.  If a packet was sent on a Tx queue
> >>> with index
> >>>> = number Rx queues there was a segfault.
> >>> This commit fixes the issue by creating an Rx queue for every Tx
> >>> queue meaning that an event buffer is allocated to handle receiving
> >>> Tx completion messages.
> >>>
> >>> mbuf pool and Rx ring are not allocated for these additional Rx
> >>> queues and RSS configuration ensures that no packets are received on
> them.
> >>>
> >>> Fixes: 4e9c73e96e83 ("net/netvsc: add Hyper-V network device")
> >>> Cc: sthem...@microsoft.com
> >>> Cc: sta...@dpdk.org
> >>>
> >>> Signed-off-by: Alan Elder <alan.el...@microsoft.com>
> >>>
> >>
> >> Hi Alan,
> >>
> >> What is the root cause of the crash, is it in driver scope or application?
> >
> > Hi Ferruh,
> >
> > The root cause of the crash was in the driver - a packet received on a Tx
> queue that had no corresponding Rx queue would cause the dev->data-
> >rx_queues[] array to be accessed past the length of the array.
> >
> > https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgith
> >
> ub.com%2FDPDK%2Fdpdk%2Fblob%2Fmain%2Fdrivers%2Fnet%2Fnetvsc%2Fhn
> _rxtx.
> >
> c%23L1071&data=05%7C02%7Calan.elder%40microsoft.com%7C3985f99c07c1
> 4a64
> >
> 99fd08dc5ada98d0%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C6
> 3848514
> >
> 2149539930%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjo
> iV2luMzI
> >
> iLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C0%7C%7C%7C&sdata=Y%2F6lr6v2j4Q
> cSm6g0
> > dTcV%2FEimyfPs0nMBJ0X5s9omAE%3D&reserved=0
> >
> >
> 
> Why there is an access to Rx queue when processing Tx queue?
> 
> A backtrace of the crash can help to understand the issue, can you please
> include this in commit log, plus some explanation why crash happens?
> 
> Thanks,
> Ferruh

Hi Ferruh,

Netvsc slow path needs to handle Tx completion messages (to know when it can 
reclaim Tx buffers).  Tx completion messages are received on Rx queue, which is 
why the Rx queue is accessed as part of transmit processing.

An example call stack is:

#6 rte_spinlock_trylock (sl=0x20) at /include/rte_spinlock.h
#7  hn_process_events (hv=, queue_id=2, tx_limit=) at 
/drivers/net/netvsc/hn_rxtx.c
#8  hn_xmit_pkts (ptxq=, tx_pkts=, nb_pkts=1) at /drivers/net/netvsc/hn_rxtx.c

Which leads to the SEGV as 0x20 is not a valid address.

I'll update the commit messages and resubmit the patch.

Thanks,
Alan 

Reply via email to