On Tue, Mar 03, 2026 at 11:56:29AM +0100, Paolo Abeni wrote:
> On 2/27/26 11:15 AM, Dipayaan Roy wrote:
> > On certain systems configured with 4K PAGE_SIZE, utilizing page_pool
> > fragments for RX buffers results in a significant throughput regression.
> > Profiling reveals that this regression correlates with high overhead in the
> > fragment allocation and reference counting paths on these specific
> > platforms, rendering the multi-buffer-per-page strategy counterproductive.
> >
> > To mitigate this, bypass the page_pool fragment path and force a single RX
> > packet per page allocation when all the following conditions are met:
> >   1. The system is configured with a 4K PAGE_SIZE.
> >   2. A processor-specific quirk is detected via SMBIOS Type 4 data.
> >
> > This approach restores expected line-rate performance by ensuring
> > predictable RX refill behavior on affected hardware.
> >
> > There is no behavioral change for systems using larger page sizes
> > (16K/64K), or platforms where this processor-specific quirk do not
> > apply.
> >
> > Signed-off-by: Dipayaan Roy <[email protected]>
> > ---
> >  .../net/ethernet/microsoft/mana/gdma_main.c   | 120 ++++++++++++++++++
> >  drivers/net/ethernet/microsoft/mana/mana_en.c |  23 +++-
> >  include/net/mana/gdma.h                       |  10 ++
> >  3 files changed, 151 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/net/ethernet/microsoft/mana/gdma_main.c b/drivers/net/ethernet/microsoft/mana/gdma_main.c
> > index 0055c231acf6..26bbe736a770 100644
> > --- a/drivers/net/ethernet/microsoft/mana/gdma_main.c
> > +++ b/drivers/net/ethernet/microsoft/mana/gdma_main.c
> > @@ -9,6 +9,7 @@
> >  #include <linux/msi.h>
> >  #include <linux/irqdomain.h>
> >  #include <linux/export.h>
> > +#include <linux/dmi.h>
> >
> >  #include <net/mana/mana.h>
> >  #include <net/mana/hw_channel.h>
> > @@ -1955,6 +1956,115 @@ static bool mana_is_pf(unsigned short dev_id)
> >  	return dev_id == MANA_PF_DEVICE_ID;
> >  }
> >
> > +/*
> > + * Table for Processor Version strings found from SMBIOS Type 4 information,
> > + * for processors that needs to force single RX buffer per page quirk for
> > + * meeting line rate performance with ARM64 + 4K pages.
> > + * Note: These strings are exactly matched with version fetched from SMBIOS.
> > + */
> > +static const char * const mana_single_rxbuf_per_page_quirk_tbl[] = {
> > +	"Cobalt 200",
> > +};
> > +
> > +static const char *smbios_get_string(const struct dmi_header *hdr, u8 idx)
> > +{
> > +	const u8 *start, *end;
> > +	u8 i;
> > +
> > +	/* Indexing starts from 1. */
> > +	if (!idx)
> > +		return NULL;
> > +
> > +	start = (const u8 *)hdr + hdr->length;
> > +	end = start + SMBIOS_STR_AREA_MAX;
> > +
> > +	for (i = 1; i < idx; i++) {
> > +		while (start < end && *start)
> > +			start++;
> > +		if (start < end)
> > +			start++;
> > +		if (start + 1 < end && start[0] == 0 && start[1] == 0)
> > +			return NULL;
> > +	}
> > +
> > +	if (start >= end || *start == 0)
> > +		return NULL;
> > +
> > +	return (const char *)start;
>
> If I read correctly, the above sort of duplicate dmi_decode_table().
>
Yes, it's not exported.

> I think you are better of:
> - use the mana_get_proc_ver_from_smbios() decoder to store the
>   SMBIOS_TYPE4_PROC_VERSION_OFFSET index into gd
> - do a 2nd walk with a different decoder to fetch the string at the
>   specified index.

Sure, will implement the 2nd walk for fetching string in v2.

>
> /P

Thank you, Paolo, for the comments, and apologies for the delay in my
response; I am on-call this week. I will send out v2 with the changes
suggested.

Regards
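
For reference while reworking the string fetch, here is a minimal
userspace sketch of a 1-based lookup in an SMBIOS string-set. It is not
the code planned for v2. The names smbios_string_at(), SMBIOS_HDR_LEN and
sample_rec are hypothetical, and the sample record is a made-up,
shortened layout rather than a real Type 4 structure. The sketch assumes
the caller knows how many bytes of the record are readable (avail).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SMBIOS_HDR_LEN 4	/* type (1) + length (1) + handle (2) */

/*
 * Hypothetical helper: return the idx-th (1-based) string of the
 * string-set that follows the formatted area of 'rec', or NULL if that
 * string does not exist. 'avail' is the number of readable bytes
 * starting at 'rec', so the walk never leaves the buffer.
 */
static const char *smbios_string_at(const uint8_t *rec, size_t avail,
				    uint8_t idx)
{
	const uint8_t *p, *end;
	uint8_t fmt_len;

	/* String index 0 means "no string" in SMBIOS. */
	if (!idx || avail < SMBIOS_HDR_LEN)
		return NULL;

	fmt_len = rec[1];	/* length of the formatted area */
	if (fmt_len < SMBIOS_HDR_LEN || fmt_len > avail)
		return NULL;

	p = rec + fmt_len;
	end = rec + avail;

	/* An empty string marks the end of the string-set. */
	while (p < end && *p) {
		const uint8_t *nul = memchr(p, 0, (size_t)(end - p));

		if (!nul)
			return NULL;	/* unterminated string */
		if (--idx == 0)
			return (const char *)p;
		p = nul + 1;
	}

	return NULL;
}

/* Made-up record: 4-byte header, one string-index field, three strings. */
static const uint8_t sample_rec[] = {
	4, 5, 0x00, 0x10,	/* type, formatted length, handle */
	3,			/* assumed field holding string index 3 */
	'C', 'P', 'U', '0', 0,
	'V', 'e', 'n', 'd', 'o', 'r', 0,
	'C', 'o', 'b', 'a', 'l', 't', ' ', '2', '0', '0', 0,
	0,			/* terminator of the string-set */
};
```

Calling smbios_string_at(sample_rec, sizeof(sample_rec), sample_rec[4])
yields the third string of the sample record. The quoted patch bounds
the walk with SMBIOS_STR_AREA_MAX instead of an explicit length, and the
sketch takes 'avail' only to keep the bounds check visible.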

