RE: [PATCH v3 01/19] mbuf: replace term sanity check
> From: Stephen Hemminger [mailto:step...@networkplumber.org]
> Sent: Friday, 19 May 2023 19.46
>
> Replace rte_mbuf_sanity_check() with rte_mbuf_verify()
> to match the similar macro RTE_VERIFY() in rte_debug.h
>
> The term sanity check is on the Tier 2 list of words
> that should be replaced.
>
> Signed-off-by: Stephen Hemminger

Acked-by: Morten Brørup
Re: [PATCH] eal: fix eal init failure when too many contiguous memsegs under legacy mode
Hi,

On 5/16/2023 1:21 PM, Fengnan Chang wrote:
> Under legacy mode, if the number of contiguous memsegs is greater than
> RTE_MAX_MEMSEG_PER_LIST, EAL init will fail even though another memseg
> list is empty, because only one memseg list is checked in
> remap_needed_hugepages.
>
> For example:
> hugepage configure: 20480 13370 7110
>
> startup log:
> EAL: Detected memory type: socket_id:0 hugepage_sz:2097152
> EAL: Detected memory type: socket_id:1 hugepage_sz:2097152
> EAL: Creating 4 segment lists: n_segs:8192 socket_id:0 hugepage_sz:2097152
> EAL: Creating 4 segment lists: n_segs:8192 socket_id:1 hugepage_sz:2097152
> EAL: Requesting 13370 pages of size 2MB from socket 0
> EAL: Requesting 7110 pages of size 2MB from socket 1
> EAL: Attempting to map 14220M on socket 1
> EAL: Allocated 14220M on socket 1
> EAL: Attempting to map 26740M on socket 0
> EAL: Could not find space for memseg. Please increase 32768 and/or 65536 in configuration.

Unrelated, but this is probably a wrong message: it should have called out
the config options to change, not their values. Sounds like a log message
needs fixing somewhere...

> EAL: Couldn't remap hugepage files into memseg lists
> EAL: FATAL: Cannot init memory
> EAL: Cannot init memory
>
> Signed-off-by: Fengnan Chang
> Signed-off-by: Lin Li
> ---
>  lib/eal/linux/eal_memory.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/lib/eal/linux/eal_memory.c b/lib/eal/linux/eal_memory.c
> index 60fc8cc6ca..36b9e78f5f 100644
> --- a/lib/eal/linux/eal_memory.c
> +++ b/lib/eal/linux/eal_memory.c
> @@ -1001,6 +1001,8 @@ remap_needed_hugepages(struct hugepage_file *hugepages, int n_pages)
>  		if (cur->size == 0)
>  			break;
>
> +		if (cur_page - seg_start_page >= RTE_MAX_MEMSEG_PER_LIST)
> +			new_memseg = 1;

I don't think this is quite right, because technically
`RTE_MAX_MEMSEG_PER_LIST` only applies to the smaller page size segment
lists - larger page size segment lists will hit their limits earlier.
So, while this will work for 2MB pages, it won't work for page sizes whose
segment list length is smaller than the maximum (such as 1GB pages). I
think this solution could be improved upon by trying to break up the
contiguous area instead.

I suspect the core of the issue is not even that we're exceeding the limits
of one memseg list, but that we always attempt to map exactly N pages in
`remap_hugepages`. This leaves large contiguous zones inside memseg lists
unused, because whenever we can't satisfy the current allocation request we
skip to a new memseg list.

For example, suppose we found a large contiguous area that would exceed the
limits of the current memseg list. Sooner or later, this contiguous area
will end, and we'll attempt to remap this virtual area into a memseg list.
Whenever that happens, we call into the remap code, which starts with the
first segment list, attempts to find exactly N free spots, fails to do so,
and skips to the next segment list. Thus, if we keep getting contiguous
areas that are large enough, we will not populate our memseg lists but
instead skip through them, starting a new memseg list every time we need a
large contiguous area: we prioritize having a large contiguous area over
using up all of our memory map.

If, instead, we could break up the allocation - that is, use
`rte_fbarray_find_biggest_free()` instead of `rte_fbarray_find_next_n_free()`,
and keep doing so until we run out of segment lists - we would achieve the
same result your patch does, but it would work for all page sizes, because
we would be targeting the actual issue (under-utilization of memseg lists),
not its symptoms (exceeding segment list limits for large allocations).
This logic could either live inside `remap_hugepages`, or we could just
return the number of pages mapped from `remap_hugepages` and have the
calling code (`remap_needed_hugepages`) try again, this time with a
different start segment reflecting how many pages we actually mapped. IMO
this would be easier to implement, as `remap_hugepages` is overly complex
as it is!

>  		if (cur_page == 0)
>  			new_memseg = 1;
>  		else if (cur->socket_id != prev->socket_id)

--
Thanks,
Anatoly
Re: [v4] net/gve: check driver compatibility
We are not validating anything. This is for our internal analysis and
product requirements.

On Fri, May 19, 2023 at 1:56 PM Stephen Hemminger <step...@networkplumber.org> wrote:
> On Fri, 19 May 2023 13:46:18 -0700
> Rushil Gupta wrote:
>
> > +#include
> >
> >  #include "../gve_logs.h"
> >
> > +#ifdef __linux__
> > +#include
> > +#endif
> > +
> >  typedef uint8_t u8;
> >  typedef uint16_t u16;
> >  typedef uint32_t u32;
> > @@ -73,6 +78,12 @@ typedef rte_iova_t dma_addr_t;
> >
> >  #define msleep(ms) rte_delay_ms(ms)
> >
> > +#define OS_VERSION_STRLEN 128
> > +struct os_version_string {
> > +	char os_version_str1[OS_VERSION_STRLEN];
> > +	char os_version_str2[OS_VERSION_STRLEN];
> > +};
> > +
>
> Not sure this is a good idea. Are you having the host validate
> against DPDK versions? This is a bad idea.
>
> Better to use feature bits like virtio and not be creating
> and validating strings about versions. For example, every minor
> stable release changes this.
[PATCH 1/1] vfio: Make buildable with MUSL runtime
From: Philip Prindeville

pread64() and pwrite64() are declared in in MUSL and other (i.e. not
glibc) C runtimes.

Signed-off-by: Philip Prindeville
---
 drivers/bus/pci/linux/pci_vfio.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/bus/pci/linux/pci_vfio.c b/drivers/bus/pci/linux/pci_vfio.c
index fab3483d9f8108fafaff6d022cd41165d1ec5a61..fe83e1a04ec2e7a891425144be3e0aacecb24f09 100644
--- a/drivers/bus/pci/linux/pci_vfio.c
+++ b/drivers/bus/pci/linux/pci_vfio.c
@@ -2,6 +2,7 @@
  * Copyright(c) 2010-2014 Intel Corporation
  */

+#include
 #include
 #include
 #include
--
2.34.1
Re: [PATCH v4] lib/net: add MPLS insert and strip functionality
Gentle Ping

On Tue, Apr 18, 2023 at 6:18 PM Tanzeel Ahmed wrote:
> Ping.
>
> On Thu, Mar 9, 2023 at 11:35 PM Tanzeel Ahmed wrote:
>> Any updates?
>>
>> On Sat, Feb 25, 2023 at 6:53 PM Tanzeel-inline wrote:
>>> From: Tanzeel Ahmed
>>>
>>> This patch is a new version of [PATCH] lib/net: added push MPLS header
>>> API. I have also added the MPLS strip functionality to address the
>>> question asked in the last patch.
>>>
>>> > You should explain why you add this function.
>>> None of the foundational NICs currently supports MPLS insertion and
>>> stripping; this functionality can help users who rely on MPLS in their
>>> network applications.
>>>
>>> > I'm not sure it should be inline
>>> I did it for performance in high-traffic applications.
>>>
>>> Signed-off-by: Tanzeel Ahmed
>>>
>>> ---
>>> v4:
>>>  * Removed extra void cast.
>>>  * rte_pktmbuf_append/mtod now return void*.
>>>    The memmove result is casted to rte_ether_hdr*.
>>>
>>> v3:
>>>  * fixed patch check failure issue
>>>
>>> v2:
>>>  * marked experimental
>>>  * coding style fixed
>>>  * changed rte_memcpy to memcpy
>>>  * mpls header marked as const in parameter
>>>  * added MPLS stripping functionality
>>> ---
>>>  .mailmap           |  1 +
>>>  lib/net/rte_mpls.h | 97 ++
>>>  2 files changed, 98 insertions(+)
>>>
>>> diff --git a/.mailmap b/.mailmap
>>> index a9f4f28..2af4e0d 100644
>>> --- a/.mailmap
>>> +++ b/.mailmap
>>> @@ -1312,6 +1312,7 @@ Takeshi Yoshimura <t.yoshimura8...@gmail.com>
>>>  Takuya Asada
>>>  Tal Avraham
>>>  Tal Shnaiderman
>>> +Tanzeel Ahmed
>>>  Tao Y Yang
>>>  Tao Zhu
>>>  Taripin Samuel
>>> diff --git a/lib/net/rte_mpls.h b/lib/net/rte_mpls.h
>>> index 51523e7..d7e267f 100644
>>> --- a/lib/net/rte_mpls.h
>>> +++ b/lib/net/rte_mpls.h
>>> @@ -13,6 +13,8 @@
>>>
>>>  #include
>>>  #include
>>> +#include
>>> +#include
>>>
>>>  #ifdef __cplusplus
>>>  extern "C" {
>>> @@ -36,6 +38,101 @@ struct rte_mpls_hdr {
>>>  	uint8_t ttl; /**< Time to live. */
>>>  } __rte_packed;
>>>
>>> +#define RTE_MPLS_HLEN 4 /**< Length of MPLS header. */
>>> +
>>> +/**
>>> + * @warning
>>> + * @b EXPERIMENTAL: this API may change without prior notice.
>>> + *
>>> + * Insert MPLS header into the packet.
>>> + * If it's the first MPLS header to be inserted in the packet,
>>> + * - Updates the ether type.
>>> + * - Sets the MPLS bottom-of-stack bit to 1.
>>> + *
>>> + * @param m
>>> + *   The pointer to the mbuf.
>>> + * @param mp
>>> + *   The pointer to the MPLS header.
>>> + * @return
>>> + *   0 on success, -1 on error
>>> + */
>>> +__rte_experimental
>>> +static inline int
>>> +rte_mpls_push_over_l2(struct rte_mbuf **m, const struct rte_mpls_hdr *mp)
>>> +{
>>> +	void *os, *ns;
>>> +	struct rte_ether_hdr *nh;
>>> +	struct rte_mpls_hdr *mph;
>>> +
>>> +	/* Can't insert header if mbuf is shared */
>>> +	if (!RTE_MBUF_DIRECT(*m) || rte_mbuf_refcnt_read(*m) > 1)
>>> +		return -EINVAL;
>>> +
>>> +	/* Can't insert header if ethernet frame doesn't exist */
>>> +	if (rte_pktmbuf_data_len(*m) < RTE_ETHER_HDR_LEN)
>>> +		return -EINVAL;
>>> +
>>> +	os = rte_pktmbuf_mtod(*m, void *);
>>> +	ns = (void *)rte_pktmbuf_prepend(*m, sizeof(struct rte_mpls_hdr));
>>> +	if (ns == NULL)
>>> +		return -ENOSPC;
>>> +
>>> +	nh = (struct rte_ether_hdr *)memmove(ns, os, RTE_ETHER_HDR_LEN);
>>> +
>>> +	/* Copy the MPLS header after ethernet frame */
>>> +	mph = rte_pktmbuf_mtod_offset(*m, struct rte_mpls_hdr *,
>>> +		sizeof(struct rte_ether_hdr));
>>> +	memcpy(mph, mp, RTE_MPLS_HLEN);
>>> +
>>> +	mph->tag_msb = rte_cpu_to_be_16(mp->tag_msb);
>>> +
>>> +	/* If first MPLS header, update ether type and bottom-of-stack bit */
>>> +	if (nh->ether_type != rte_cpu_to_be_16(RTE_ETHER_TYPE_MPLS)) {
>>> +		nh->ether_type = rte_cpu_to_be_16(RTE_ETHER_TYPE_MPLS);
>>> +		mph->bs = 1;
>>> +	} else {
>>> +		mph->bs = 0;
>>> +	}
>>> +
>>> +	return 0;
>>> +}
>>> +
>>> +/**
>>> + * @warning
>>> + * @b EXPERIMENTAL: this API may change without prior notice.
>>> + *
>>> + * Strips MPLS from the packet. Doesn't update the ether type.
>>> + *
>>> + * @param m
>>> + *   The pointer to the mbuf.
>>> + * @return
>>> + *   0 on success, -1 on error
>>> + */
>>> +__rte_experimental
>>> +static inline int
>>> +rte_mpls_strip_over_l2(struct rte_mbuf *m)
>>> +{
>>> +	struct rte_ether_hdr *eh = rte_pktmbuf_mtod(m, struct rte_ether_hdr *);
>>> +	struct rte_mpls_hdr *mph;
>>> +	bool mpls_exist = true;
>>> +
>>> +	if (eh->ether_type != rte_cpu_to_be_16(RTE_ETHER_TYPE_MPLS))
>>> +		return -1;
>>> +
>>> +	/* Stripping all MPLS header */
>>> +	while (mpls_exist) {
>>> +		mph = rte_pktmbuf_mtod_offset(m, struct rte_mpls_hdr *,
>>> +