On 9/26/2024 3:03 PM, Meade, Niall wrote:
>> From: Ferruh Yigit <ferruh.yi...@amd.com>
>> Sent: Thursday, September 26, 2024 12:16 AM
>> To: Meade, Niall <niall.me...@intel.com>; Thomas Monjalon
>> <tho...@monjalon.net>; Andrew Rybchenko <andrew.rybche...@oktetlabs.ru>;
>> Roman Zhukov <roman.zhu...@arknetworks.am>
>> Cc: dev@dpdk.org <dev@dpdk.org>
>> Subject: Re: [PATCH v1] ethdev: fix int overflow in descriptor count logic
> <snip>
>>> The resolution involves upcasting nb_desc to a uint32_t before the
>>> RTE_ALIGN_CEIL macro is applied. This change ensures that the subsequent
>>> call to RTE_ALIGN_FLOOR(nb_desc + (nb_align - 1), nb_align) does not
>>> result in an overflow, as it would when nb_desc is a uint16_t. By using
>>> a uint32_t for these operations, the correct behavior is maintained
>>> without the risk of overflow.
>>>
>>
>> Hi Niall,
>
> Hi Ferruh,
>
>> Thanks for the patch.
>>
>> For the 'RTE_ALIGN_CEIL(val, align)' macro, 'align' should be power of
>> two, as 'desc_lim->nb_align' is uint16_t, max value it can get is 2^15.
>> 'val' should be smaller than or equal to 'align', so '*nb_desc' can be
>> maximum 2^15.
>>
>> So RTE_ALIGN_CEIL(2^15-1, 2^15) = 2^15, I think this should work fine
>> (although I didn't test).
>>
>> And even with your uint32_t cast, I think following will fail:
>> RTE_ALIGN_CEIL(2^16-1, 2^15)
>> (again, not tested).
>>
>
> I tested my code with these values and the behaviour is as expected from
> what I can see.
> At a high level I ran into this issue when passing UINT16_MAX into
> rte_eth_dev_adjust_nb_rx_tx_desc() with the intent of selecting the maximum
> ring descriptor size but the minimum was selected.
>
>> Or maybe I am missing a case, can you please give some actual numbers to
>> show the problem and the fix?
>
> Yes sure! If we take an example of val= (2^16)-1 and align= 32.
> RTE_ALIGN_CEIL(val, align) calls RTE_ALIGN_FLOOR(val + align - 1, align). With
> val as a uint16_t this subsequent macro call results in a wrap around for val
> (originally was the max uint16_t and now we are attempting to add align to
> it). The returned value of RTE_ALIGN_CEIL() in this case is 0. This results in
> nb_desc being set to 0, and later set to the minimum ring descriptor size for
> that NIC with *nb_desc = RTE_MAX(*nb_desc, desc_lim->nb_min).
>
> While this example is an unreasonably large request for a descriptor ring
> size,
> the expected behaviour would be that the descriptor ring size defaults back to
> the maximum possible for that particular NIC, not to the minimum which it
> currently does.
> By introducing a uint32_t, the wrap around in RTE_ALIGN_FLOOR() is avoided,
> keeping the large value of nb_desc_32 which is later set to an appropriate
> size
> in RTE_MIN(*nb_desc_32, desc_lim->nb_max)
>
I see the problem now, thanks.

When value > (2^16 - align), the next aligned value is 2^16, which is
UINT16_MAX + 1 and hence wraps to 0; this is kind of expected.

For the relevant code, assuming 'desc_lim->nb_max' & 'desc_lim->nb_min' are
already aligned to 'desc_lim->nb_align', the following should fix the issue
and seems simpler to me, what do you think:

```
if (desc_lim->nb_max != 0)
	*nb_desc = RTE_MIN(*nb_desc, desc_lim->nb_max);

*nb_desc = RTE_MAX(*nb_desc, desc_lim->nb_min);

if (desc_lim->nb_align != 0)
	*nb_desc = RTE_ALIGN_CEIL(*nb_desc, desc_lim->nb_align);
```

Basically just changing the order of the operations...

It is not easy to see the problem, so can you please give sample values in the
commit log (for '*nb_desc', 'nb_align', 'nb_max' & 'nb_min')? That makes it
much easier to see why the above works.