On 4/12/2024 5:50 PM, Alan Elder wrote:
> 
> 
>> -----Original Message-----
>> From: Ferruh Yigit <ferruh.yi...@amd.com>
>> Sent: Friday, April 12, 2024 6:23 AM
>> To: Alan Elder <alan.el...@microsoft.com>; Long Li <lon...@microsoft.com>;
>> Andrew Rybchenko <andrew.rybche...@oktetlabs.ru>
>> Cc: dev@dpdk.org; stephen <step...@networkplumber.org>
>> Subject: Re: [EXTERNAL] Re: [PATCH v3] net/netvsc: fix number Tx queues > Rx
>> queues
>>
>> On 4/11/2024 9:45 PM, Alan Elder wrote:
>>>> -----Original Message-----
>>>> From: Ferruh Yigit <ferruh.yi...@amd.com>
>>>> Sent: Thursday, April 11, 2024 7:38 AM
>>>> To: Alan Elder <alan.el...@microsoft.com>; Long Li
>>>> <lon...@microsoft.com>; Andrew Rybchenko
>>>> <andrew.rybche...@oktetlabs.ru>
>>>> Cc: dev@dpdk.org; stephen <step...@networkplumber.org>
>>>> Subject: [EXTERNAL] Re: [PATCH v3] net/netvsc: fix number Tx queues >
>>>> Rx queues
>>>>
>>>> On 3/19/2024 2:16 PM, Alan Elder wrote:
>>>>> The previous code allowed the number of Tx queues to be set higher
>>>>> than the number of Rx queues.  If a packet was sent on a Tx queue
>>>>> with index
>>>>>> = number Rx queues there was a segfault.
>>>>> This commit fixes the issue by creating an Rx queue for every Tx
>>>>> queue meaning that an event buffer is allocated to handle receiving
>>>>> Tx completion messages.
>>>>>
>>>>> mbuf pool and Rx ring are not allocated for these additional Rx
>>>>> queues and RSS configuration ensures that no packets are received on
>> them.
>>>>>
>>>>> Fixes: 4e9c73e96e83 ("net/netvsc: add Hyper-V network device")
>>>>> Cc: sthem...@microsoft.com
>>>>> Cc: sta...@dpdk.org
>>>>>
>>>>> Signed-off-by: Alan Elder <alan.el...@microsoft.com>
>>>>>
>>>>
>>>> Hi Alan,
>>>>
>>>> What is the root cause of the crash, is it in driver scope or application?
>>>
>>> Hi Ferruh,
>>>
>>> The root cause of the crash was in the driver - a packet received on a Tx
>> queue that had no corresponding Rx queue would cause the dev->data-
>>> rx_queues[] array to be accessed past the length of the array.
>>>
>>> https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgith
>>>
>> ub.com%2FDPDK%2Fdpdk%2Fblob%2Fmain%2Fdrivers%2Fnet%2Fnetvsc%2Fhn
>> _rxtx.
>>>
>> c%23L1071&data=05%7C02%7Calan.elder%40microsoft.com%7C3985f99c07c1
>> 4a64
>>>
>> 99fd08dc5ada98d0%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C6
>> 3848514
>>>
>> 2149539930%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjo
>> iV2luMzI
>>>
>> iLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C0%7C%7C%7C&sdata=Y%2F6lr6v2j4Q
>> cSm6g0
>>> dTcV%2FEimyfPs0nMBJ0X5s9omAE%3D&reserved=0
>>>
>>>
>>
>> Why there is an access to Rx queue when processing Tx queue?
>>
>> A backtrace of the crash can help to understand the issue, can you please
>> include this in commit log, plus some explanation why crash happens?
>>
>> Thanks,
>> Ferruh
> 
> Hi Ferruh,
> 
> Netvsc slow path needs to handle Tx completion messages (to know when it can 
> reclaim Tx buffers).  Tx completion messages are received on Rx queue, which 
> is why the Rx queue is accessed as part of transmit processing.
> 
> An example call stack is:
> 
> #6 rte_spinlock_trylock (sl=0x20) at /include/rte_spinlock.h
> #7  hn_process_events (hv=, queue_id=2, tx_limit=) at 
> /drivers/net/netvsc/hn_rxtx.c
> #8  hn_xmit_pkts (ptxq=, tx_pkts=, nb_pkts=1) at /drivers/net/netvsc/hn_rxtx.c
> 
> Which leads to the SEGV as 0x20 is not a valid address.
> 
> I'll update the commit messages and resubmit the patch.
> 
> 

Hi Alan,

Thanks for the detail.

'hn_xmit_pkts()' calls 'hn_process_events()' with the Tx queue_id, but
'hn_process_events()' seems designed for Rx event processing since it
uses 'queue_id' to get rxq.
Does it help to pass queue type to 'hn_process_events()'?


And the patch creates Rx queues for access Tx queues. Are the Tx
completion packets needs to be delivered to Rx queue with exact same Tx
queue_id by design?
Or the new Rx queues created just to prevent the crash, by providing
'rxq->ring_lock' etc?

Please also check comments on v4, thanks.

Reply via email to