RE: [PATCH v3 01/19] mbuf: replace term sanity check

2023-05-20 Thread Morten Brørup
> From: Stephen Hemminger [mailto:step...@networkplumber.org]
> Sent: Friday, 19 May 2023 19.46
> 
> Replace rte_mbuf_sanity_check() with rte_mbuf_verify()
> to match the similar macro RTE_VERIFY() in rte_debug.h
> 
> The term sanity check is on the Tier 2 list of words
> that should be replaced.
> 
> Signed-off-by: Stephen Hemminger 

Acked-by: Morten Brørup 



Re: [PATCH] eal: fix EAL init failure with too many contiguous memsegs under legacy mode

2023-05-20 Thread Burakov, Anatoly

Hi,

On 5/16/2023 1:21 PM, Fengnan Chang wrote:

Under legacy mode, if the number of contiguous memsegs is greater
than RTE_MAX_MEMSEG_PER_LIST, EAL init will fail even though
another memseg list is empty, because only one memseg list is used
for the check in remap_needed_hugepages.

For example:
hugepage configure:
20480
13370
7110

startup log:
EAL: Detected memory type: socket_id:0 hugepage_sz:2097152
EAL: Detected memory type: socket_id:1 hugepage_sz:2097152
EAL: Creating 4 segment lists: n_segs:8192 socket_id:0 hugepage_sz:2097152
EAL: Creating 4 segment lists: n_segs:8192 socket_id:1 hugepage_sz:2097152
EAL: Requesting 13370 pages of size 2MB from socket 0
EAL: Requesting 7110 pages of size 2MB from socket 1
EAL: Attempting to map 14220M on socket 1
EAL: Allocated 14220M on socket 1
EAL: Attempting to map 26740M on socket 0
EAL: Could not find space for memseg. Please increase 32768 and/or 65536 in
configuration.


Unrelated, but this is probably a wrong message: it should've called 
out the config options to change, not their values. Sounds like a log 
message needs fixing somewhere...



EAL: Couldn't remap hugepage files into memseg lists
EAL: FATAL: Cannot init memory
EAL: Cannot init memory

Signed-off-by: Fengnan Chang 
Signed-off-by: Lin Li 
---
  lib/eal/linux/eal_memory.c | 2 ++
  1 file changed, 2 insertions(+)

diff --git a/lib/eal/linux/eal_memory.c b/lib/eal/linux/eal_memory.c
index 60fc8cc6ca..36b9e78f5f 100644
--- a/lib/eal/linux/eal_memory.c
+++ b/lib/eal/linux/eal_memory.c
@@ -1001,6 +1001,8 @@ remap_needed_hugepages(struct hugepage_file *hugepages, int n_pages)
 		if (cur->size == 0)
 			break;
 
+		if (cur_page - seg_start_page >= RTE_MAX_MEMSEG_PER_LIST)
+			new_memseg = 1;


I don't think this is quite right because, technically, 
`RTE_MAX_MEMSEG_PER_LIST` only applies to segment lists for smaller page 
sizes - segment lists for larger page sizes will hit their limits earlier. 
So, while this will work for 2MB pages, it won't work for page sizes 
whose segment list length is smaller than the maximum (such as 1GB pages).


I think this solution could be improved upon by trying to break up the 
contiguous area instead. I suspect the core of the issue is not even the 
fact that we're exceeding limits of one memseg list, but that we're 
always attempting to map exactly N pages in `remap_hugepages`, which 
results in us leaving large contiguous zones inside memseg lists unused 
because we couldn't satisfy current allocation request and skipped to a 
new memseg list.


For example, let's suppose we found a large contiguous area that 
would've exceeded limits of current memseg list. Sooner or later, this 
contiguous area will end, and we'll attempt to remap this virtual area 
into a memseg list. Whenever that happens, we call into the remap code, 
which will start with first segment, attempt to find exactly N number of 
free spots, fail to do so, and skip to the next segment list.


Thus, sooner or later, if we get contiguous areas that are large enough, 
we will not populate our memseg lists but instead skip through them, and 
start with a new memseg list every time we need a large contiguous area. 
We prioritize having a large contiguous area over using up all of our 
memory map.


If, instead, we could break up the allocation - that is, use 
`rte_fbarray_find_biggest_free()` instead of 
`rte_fbarray_find_next_n_free()`, and keep doing it until we run out of 
segment lists, we will achieve the same result your patch does, but have 
it work for all page sizes, because now we would be targeting the actual 
issue (under-utilization of memseg lists), not its symptoms (exceeding 
segment list limits for large allocations).


This logic could either live inside `remap_hugepages`, or we could just 
return the number of pages mapped from `remap_hugepages`, and have the 
calling code (`remap_needed_hugepages`) try again, this time with a 
different start segment, reflecting how many pages we actually mapped. 
IMO this would be easier to implement, as `remap_hugepages` is overly 
complex as it is!



 		if (cur_page == 0)
 			new_memseg = 1;
 		else if (cur->socket_id != prev->socket_id)


--
Thanks,
Anatoly



Re: [v4] net/gve: check driver compatibility

2023-05-20 Thread Rushil Gupta
We are not validating anything.
This is for our internal analysis and product requirements.

On Fri, May 19, 2023 at 1:56 PM Stephen Hemminger <
step...@networkplumber.org> wrote:

> On Fri, 19 May 2023 13:46:18 -0700
> Rushil Gupta  wrote:
>
> > +#include 
> >
> >  #include "../gve_logs.h"
> >
> > +#ifdef __linux__
> > +#include 
> > +#endif
> > +
> >  typedef uint8_t u8;
> >  typedef uint16_t u16;
> >  typedef uint32_t u32;
> > @@ -73,6 +78,12 @@ typedef rte_iova_t dma_addr_t;
> >
> >  #define msleep(ms)   rte_delay_ms(ms)
> >
> > +#define OS_VERSION_STRLEN 128
> > +struct os_version_string {
> > + char os_version_str1[OS_VERSION_STRLEN];
> > + char os_version_str2[OS_VERSION_STRLEN];
> > +};
> > +
>
> Not sure this is a good idea. Are you having the host validate
> against DPDK versions? This is a bad idea.
>
> Better to use feature bits like virtio does, and not be creating
> and validating strings about versions.  For example, every minor
> stable release changes this.
>


[PATCH 1/1] vfio: Make buildable with MUSL runtime

2023-05-20 Thread Philip Prindeville
From: Philip Prindeville 

pread64() and pwrite64() are declared in <unistd.h> in MUSL and
other (i.e. non-glibc) C runtimes.

Signed-off-by: Philip Prindeville 
---
 drivers/bus/pci/linux/pci_vfio.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/bus/pci/linux/pci_vfio.c b/drivers/bus/pci/linux/pci_vfio.c
index 
fab3483d9f8108fafaff6d022cd41165d1ec5a61..fe83e1a04ec2e7a891425144be3e0aacecb24f09
 100644
--- a/drivers/bus/pci/linux/pci_vfio.c
+++ b/drivers/bus/pci/linux/pci_vfio.c
@@ -2,6 +2,7 @@
  * Copyright(c) 2010-2014 Intel Corporation
  */
 
+#include <unistd.h>
 #include 
 #include 
 #include 
-- 
2.34.1



Re: [PATCH v4] lib/net: add MPLS insert and strip functionality

2023-05-20 Thread Tanzeel Ahmed
Gentle Ping

On Tue, Apr 18, 2023 at 6:18 PM Tanzeel Ahmed 
wrote:

> Ping.
>
> On Thu, Mar 9, 2023 at 11:35 PM Tanzeel Ahmed 
> wrote:
>
>> Any updates?
>>
>>
>> On Sat, Feb 25, 2023 at 6:53 PM Tanzeel-inline 
>> wrote:
>>
>>> From: Tanzeel Ahmed 
>>>
>>> This patch is a new version of [PATCH] lib/net: added push MPLS header API.
>>> I have also added the MPLS strip functionality to address the questions
>>> asked on the last patch.
>>>
>>> > You should explain why you add this function.
>>> None of the foundational NICs currently supports MPLS insertion and
>>> stripping; this functionality can help users who rely on MPLS in their
>>> network applications.
>>>
>>> > I'm not sure it should be inline
>>> I did it for performance in high-traffic applications.
>>> Signed-off-by: Tanzeel Ahmed 
>>>
>>> ---
>>> v4:
>>> * Removed extra void cast.
>>> * rte_pktmbuf_append/mtod now return void*.
>>>   The memmove result is casted to rte_ether_hdr*.
>>>
>>> v3:
>>> * fixed patch check failure issue
>>>
>>> v2:
>>> * marked experimental
>>> * coding style fixed
>>> * changed rte_memcpy to memcpy
>>> * mpls header marked as const in parameter
>>> * added MPLS stripping functionality
>>> ---
>>>  .mailmap   |  1 +
>>>  lib/net/rte_mpls.h | 97 ++
>>>  2 files changed, 98 insertions(+)
>>>
>>> diff --git a/.mailmap b/.mailmap
>>> index a9f4f28..2af4e0d 100644
>>> --- a/.mailmap
>>> +++ b/.mailmap
>>> @@ -1312,6 +1312,7 @@ Takeshi Yoshimura  <t.yoshimura8...@gmail.com>
>>>  Takuya Asada 
>>>  Tal Avraham 
>>>  Tal Shnaiderman  
>>> +Tanzeel Ahmed 
>>>  Tao Y Yang 
>>>  Tao Zhu 
>>>  Taripin Samuel 
>>> diff --git a/lib/net/rte_mpls.h b/lib/net/rte_mpls.h
>>> index 51523e7..d7e267f 100644
>>> --- a/lib/net/rte_mpls.h
>>> +++ b/lib/net/rte_mpls.h
>>> @@ -13,6 +13,8 @@
>>>
>>>  #include 
>>>  #include 
>>> +#include 
>>> +#include 
>>>
>>>  #ifdef __cplusplus
>>>  extern "C" {
>>> @@ -36,6 +38,101 @@ struct rte_mpls_hdr {
>>> uint8_t  ttl;   /**< Time to live. */
>>>  } __rte_packed;
>>>
>>> +#define RTE_MPLS_HLEN 4 /**< Length of MPLS header. */
>>> +
>>> +/**
>>> + * @warning
>>> + * @b EXPERIMENTAL: this API may change without prior notice.
>>> + *
>>> + * Insert MPLS header into the packet.
>>> + * If it's first MPLS header to be inserted in the packet,
>>> + *  - Updates the ether type.
>>> + *  - Sets the MPLS bottom-of-stack bit to 1.
>>> + *
>>> + * @param m
>>> + *   The pointer to the mbuf.
>>> + * @param mp
>>> + *   The pointer to the MPLS header.
>>> + * @return
>>> + *   0 on success, negative errno value on error
>>> + */
>>> +__rte_experimental
>>> +static inline int
>>> +rte_mpls_push_over_l2(struct rte_mbuf **m, const struct rte_mpls_hdr *mp)
>>> +{
>>> +   void *os, *ns;
>>> +   struct rte_ether_hdr *nh;
>>> +   struct rte_mpls_hdr *mph;
>>> +
>>> +   /* Can't insert header if mbuf is shared */
>>> +   if (!RTE_MBUF_DIRECT(*m) || rte_mbuf_refcnt_read(*m) > 1)
>>> +   return -EINVAL;
>>> +
>>> +   /* Can't insert header if ethernet frame doesn't exist */
>>> +   if (rte_pktmbuf_data_len(*m) < RTE_ETHER_HDR_LEN)
>>> +   return -EINVAL;
>>> +
>>> +   os = rte_pktmbuf_mtod(*m, void *);
>>> +   ns = (void *)rte_pktmbuf_prepend(*m, sizeof(struct rte_mpls_hdr));
>>> +   if (ns == NULL)
>>> +   return -ENOSPC;
>>> +
>>> +   nh = (struct rte_ether_hdr *)memmove(ns, os, RTE_ETHER_HDR_LEN);
>>> +
>>> +   /* Copy the MPLS header after ethernet frame */
>>> +   mph = rte_pktmbuf_mtod_offset(*m, struct rte_mpls_hdr*,
>>> +   sizeof(struct rte_ether_hdr));
>>> +   memcpy(mph, mp, RTE_MPLS_HLEN);
>>> +
>>> +   mph->tag_msb = rte_cpu_to_be_16(mp->tag_msb);
>>> +
>>> +   /* If first MPLS header, update ether type and bottom-of-stack bit */
>>> +   if (nh->ether_type != rte_cpu_to_be_16(RTE_ETHER_TYPE_MPLS)) {
>>> +   nh->ether_type = rte_cpu_to_be_16(RTE_ETHER_TYPE_MPLS);
>>> +   mph->bs = 1;
>>> +   } else {
>>> +   mph->bs = 0;
>>> +   }
>>> +
>>> +   return 0;
>>> +}
>>> +
>>> +/**
>>> + * @warning
>>> + * @b EXPERIMENTAL: this API may change without prior notice.
>>> + *
>>> + * Strips MPLS from the packet. Doesn't update the ether type
>>> + *
>>> + * @param m
>>> + *   The pointer to the mbuf.
>>> + * @return
>>> + *   0 on success, -1 on error
>>> + */
>>> +__rte_experimental
>>> +static inline int
>>> +rte_mpls_strip_over_l2(struct rte_mbuf *m)
>>> +{
>>> +   struct rte_ether_hdr *eh = rte_pktmbuf_mtod(m, struct rte_ether_hdr *);
>>> +   struct rte_mpls_hdr *mph;
>>> +   bool mpls_exist = true;
>>> +
>>> +   if (eh->ether_type != rte_cpu_to_be_16(RTE_ETHER_TYPE_MPLS))
>>> +   return -1;
>>> +
>>> +   /* Stripping all MPLS header */
>>> +   while (mpls_exist) {
>>> +   mph = rte_pktmbuf_mtod_offset(m, struct rte_mpls_hdr*,
>>> +