Re: [dpdk-dev] [PATCH 1/3] ethdev: add RSS hash level
On 12/7/19 3:59 AM, Ajit Khaparde wrote:
> This patch adds ability to configure RSS hash level in hardware.
> This feature will allow an application to select RSS hash calculation
> on outer or inner headers for tunneled packets.
>
> Signed-off-by: Ajit Khaparde
> ---
>  lib/librte_ethdev/rte_ethdev.h | 27 +++
>  1 file changed, 27 insertions(+)
>
> diff --git a/lib/librte_ethdev/rte_ethdev.h b/lib/librte_ethdev/rte_ethdev.h
> index 18a9defc2..5189bdbab 100644
> --- a/lib/librte_ethdev/rte_ethdev.h
> +++ b/lib/librte_ethdev/rte_ethdev.h
> @@ -444,11 +444,35 @@ struct rte_vlan_filter_conf {
>   * The *rss_hf* field of the *rss_conf* structure indicates the different
>   * types of IPv4/IPv6 packets to which the RSS hashing must be applied.
>   * Supplying an *rss_hf* equal to zero disables the RSS feature.
> + *
> + * The *rss_level* field of the *rss_conf* structure indicates the
> + * Packet encapsulation level RSS hash @p types apply to.
> + *
> + * - @p 0 requests the default behavior. Depending on the packet
> + *   type, it can mean outermost, innermost, anything in between or
> + *   even no RSS.
> + *
> + *   It basically stands for the innermost encapsulation level RSS
> + *   can be performed on according to PMD and device capabilities.
> + *
> + * - @p 1 requests RSS to be performed on the outermost packet
> + *   encapsulation level.
> + *
> + * - @p 2 and subsequent values request RSS to be performed on the
> + *   specified inner packet encapsulation level, from outermost to
> + *   innermost (lower to higher values).
> + *
> + * Support for values other than @p 0 is dependent on the underlying
> + * hardware in use.
> + *
> + * Requesting a specific RSS level on unrecognized traffic results
> + * in undefined behavior.
>   */
>  struct rte_eth_rss_conf {
>      uint8_t *rss_key;    /**< If not NULL, 40-byte hash key. */
>      uint8_t rss_key_len; /**< hash key length in bytes. */
>      uint64_t rss_hf;     /**< Hash functions to apply - see below. */
> +    uint32_t rss_level;  /**< RSS hash level */
>  };

I'm not sure that an offload flag is required in this case.
I think a maximum supported rss_level in dev_info would provide more
information, and a per-queue level does not make sense in this case.
Even if per-queue group control is required, it should be doable
via the rte_flow API RSS action.

Anyway, it looks like an ABI breakage with all the consequences.
In the 64-bit case it is possible to put the field before rss_hf to
avoid the ABI breakage, but it will break the ABI on 32-bit anyway.

>  /*
> @@ -599,6 +623,8 @@ rte_eth_rss_hf_refine(uint64_t rss_hf)
>      ETH_RSS_GENEVE | \
>      ETH_RSS_NVGRE)
>
> +#define ETH_RSS_LEVEL_DEFAULT 0
> +
>  /*
>   * Definitions used for redirection table entry size.
>   * Some RSS RETA sizes may not be supported by some drivers, check the
> @@ -1103,6 +1129,7 @@ struct rte_eth_conf {
>  #define DEV_RX_OFFLOAD_SCTP_CKSUM       0x00020000
>  #define DEV_RX_OFFLOAD_OUTER_UDP_CKSUM  0x00040000
>  #define DEV_RX_OFFLOAD_RSS_HASH         0x00080000
> +#define DEV_RX_OFFLOAD_RSS_LEVEL        0x00100000
>
>  #define DEV_RX_OFFLOAD_CHECKSUM (DEV_RX_OFFLOAD_IPV4_CKSUM | \
>                   DEV_RX_OFFLOAD_UDP_CKSUM | \
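To make the proposed *rss_level* values concrete, here is a minimal sketch in plain C. It assumes only the meanings given in the patch comment (0 = default, 1 = outermost, 2 and up = successively inner levels) and models Andrew's idea of a device-reported maximum level. `max_rss_level`, `rss_level_describe()` and `rss_level_supported()` are hypothetical names for illustration; none of them exist in DPDK.

```c
#include <stdbool.h>
#include <stdint.h>

/* Value proposed by the patch for "default behavior". */
#define ETH_RSS_LEVEL_DEFAULT 0

/* Hypothetical helper: meaning of an rss_level value, following the
 * doc comment in the patch. */
static const char *rss_level_describe(uint32_t level)
{
    if (level == ETH_RSS_LEVEL_DEFAULT)
        return "default (PMD/device dependent)";
    if (level == 1)
        return "outermost";
    return "inner"; /* 2, 3, ...: deeper encapsulation levels */
}

/* Hypothetical check modeled on the dev_info suggestion: the device
 * reports the deepest level it supports (max_rss_level, an invented
 * value) and deeper requests are rejected. Level 0 is always accepted. */
static bool rss_level_supported(uint32_t level, uint32_t max_rss_level)
{
    if (level == ETH_RSS_LEVEL_DEFAULT)
        return true;
    return level <= max_rss_level;
}
```

In this sketch a device reporting `max_rss_level = 0` only accepts the default, which matches the patch text saying that support for non-zero values depends on the hardware.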
[dpdk-dev] CONFIG_RTE_MAX_MEM_MB fails in DPDK18.05
Hello all,

We are currently facing a memory allocation failure in memseg_primary_init(). We configured CONFIG_RTE_MAX_MEM_MB to 512MB and set the number of hugepages for our platform accordingly, but the virtual memory allocation fails. It appears that it is trying to reserve CONFIG_RTE_MAX_MEMSEG_PER_LIST * hugepage size (i.e. 8192 * 2MB = 0x400000000 bytes, or 16GB), and that virtual memory reservation fails.

We also tried changing CONFIG_RTE_MAX_MEMSEG_PER_LIST to 64, with which the virtual memory allocation succeeds for 128MB (64 * 2MB). But 128MB does not seem to be enough, and it causes a PCIe enumeration failure. We are not able to allocate virtual memory beyond 128MB by increasing CONFIG_RTE_MAX_MEMSEG_PER_LIST beyond 64.

Are there any settings (arguments) we need to pass to rte_eal_init() for the virtual memory allocation to succeed? Please advise.

Thanks,
Kamaraj
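The sizes in this report follow from simple arithmetic. The sketch below assumes, as the report describes, that one memseg list reserves CONFIG_RTE_MAX_MEMSEG_PER_LIST * hugepage size of virtual address space; it is not DPDK's own sizing code, and `list_va_bytes()` and `segs_needed()` are hypothetical helpers.

```c
#include <stdint.h>

#define MIB (1ULL << 20)

/* Assumed model from the report: VA reserved for one memseg list is
 * memsegs_per_list * hugepage size. Not DPDK's actual sizing code. */
static uint64_t list_va_bytes(uint64_t memsegs_per_list, uint64_t page_sz)
{
    return memsegs_per_list * page_sz;
}

/* Hugepage-sized segments needed to back mem_mb megabytes, rounded up. */
static uint64_t segs_needed(uint64_t mem_mb, uint64_t page_sz)
{
    uint64_t bytes = mem_mb * MIB;
    return (bytes + page_sz - 1) / page_sz;
}
```

For example, `list_va_bytes(8192, 2 * MIB)` gives 0x400000000 bytes (16GB), `list_va_bytes(64, 2 * MIB)` gives 128MB, and `segs_needed(512, 2 * MIB)` gives 256, i.e. four times the 64 segments that currently reserve successfully.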
Re: [dpdk-dev] eventdev DSW question
On 2019-12-06 23:22, Venky Venkatesh wrote:

To my understanding, per the eventdev API, events are considered in flight from NEW to RELEASE (implicit or explicit).

DSW considers events to be in flight from the time of enqueue (on the source port) to the time of release (on the destination port). This is regardless of whether they are FORWARD or NEW events. The SW event device only counts NEW events toward the maximum number of in-flight events limitation. It will, however, release the atomic context and return the credits at step 4 below. Since there is no centralized point which can backpressure FORWARD events (which I think is how SW avoids being overwhelmed by FORWARDed events), DSW needs to ensure that max_num_events is never exceeded.

Now consider an event (event-1) going through the following stages:
1. NEW from core-3
2. dequeued by core-1
3. FORWARD
4. core-1 does a next dequeue

The event is RELEASEd already at this point.

5. dequeued by core-2
6. RELEASE by core-2, or implicit release on the next dequeue by core-2

The way I understand the DSW implementation, this event would use a credit at step 1 AND at step 3, while releasing one at step 2; right now credits are used by non-release events (i.e. NEW and FORWARD). So if, between step 2 and step 3, another core enqueues a NEW event (event-2), that could use up all the credits in the system and thus make step 3 of event-1 fail.

The NEW events producer ports' new_event_threshold should be significantly lower than the maximum number of in-flight events, and thus leave credits for FORWARDed events.

This, to my knowledge, is not conformant with eventdev. One way to address this is to track the credits for the events which are currently in cores and make those credits available only to FORWARDs, not to NEWs... there are more details, of course.

Hope this explains.
Thanks
-Venky

On Fri, Dec 6, 2019 at 12:37 PM Mattias Rönnblom <mattias.ronnb...@ericsson.com> wrote:

On 2019-12-06 17:32, Venky Venkatesh wrote:

Thanks Mattias for the clarifications. One more question: this time it is about the in-flight accounting for DSW. Here is my understanding: it seems to consider only the events which are *inside the scheduler* as in flight.

Yes, like all event devices, I believe.

I am trying to distinguish them from those which have currently been given to cores by the scheduler. The latter are not considered in flight, since credits are returned (dsw_port_return_credits) as soon as dsw_event_dequeue_burst is called.

A new dequeue means an implicit release of all unreleased events dequeued in the previous call. It's standard Eventdev semantics.

So if these events which are currently in cores do a FORWARD, there is a chance that it can fail. Ideally those FORWARDs should not fail, but with the current design they can, as some NEWs can hog the credits freed up by the events which have been dequeued by cores.

What you do to avoid this situation is set new_event_threshold low enough, so NEW events don't block FORWARDed ones.

Is this design of DSW intentional or an omission? If it is an omission, I can work on a possible fix and run it by you.

This is not really a DSW design, but rather how Eventdev works.
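The credit race and the new_event_threshold workaround can be replayed with a toy model. This is a simplified, hypothetical sketch, not DSW's data structures or code: it assumes a single shared credit pool, NEW enqueues refused at `new_event_threshold`, FORWARD enqueues refused only at the device-wide limit (max_num_events in the thread), and credits returned at dequeue as described above. All type and function names are invented.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified, hypothetical credit pool; not DSW's actual data structures. */
struct credit_pool {
    int32_t max_inflight;        /* device-wide limit (max_num_events) */
    int32_t new_event_threshold; /* NEW enqueues refused at this level */
    int32_t inflight;            /* credits currently taken */
};

/* NEW events are back-pressured at new_event_threshold. */
static bool enqueue_new(struct credit_pool *p)
{
    if (p->inflight >= p->new_event_threshold)
        return false;
    p->inflight++;
    return true;
}

/* FORWARD events are only refused at the device-wide limit. */
static bool enqueue_forward(struct credit_pool *p)
{
    if (p->inflight >= p->max_inflight)
        return false;
    p->inflight++;
    return true;
}

/* Credits go back to the pool when events are dequeued. */
static void return_credits(struct credit_pool *p, int32_t n)
{
    p->inflight -= n;
}

/* Replays the scenario from the thread: event-1 is enqueued as NEW and
 * dequeued by a worker (credit returned), other cores then enqueue NEW
 * events until refused, and finally the worker FORWARDs event-1.
 * Returns whether that FORWARD is accepted. */
static bool forward_survives_new_flood(int32_t max_inflight, int32_t threshold)
{
    struct credit_pool p = { max_inflight, threshold, 0 };

    if (enqueue_new(&p))        /* step 1: NEW from core-3 */
        return_credits(&p, 1);  /* step 2: dequeued by core-1 */
    while (enqueue_new(&p))
        ;                       /* other cores flood NEW events */
    return enqueue_forward(&p); /* step 3: FORWARD by core-1 */
}
```

With the threshold equal to the limit (e.g. 4096/4096) the FORWARD is refused; with a lower threshold (e.g. 3072) it gets through. In this model the headroom is `max_inflight - threshold`, so more simultaneous FORWARDs than that after a NEW flood would still be refused, which is the gap Venky's proposal of reserving credits for in-core events aims at.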
Re: [dpdk-dev] [PATCH 1/3] ethdev: add RSS hash level
On Sat, Dec 7, 2019 at 1:14 AM Andrew Rybchenko wrote:

> On 12/7/19 3:59 AM, Ajit Khaparde wrote:
> > This patch adds ability to configure RSS hash level in hardware.
> > This feature will allow an application to select RSS hash calculation
> > on outer or inner headers for tunneled packets.
> >
> > Signed-off-by: Ajit Khaparde
> > ---
> >  lib/librte_ethdev/rte_ethdev.h | 27 +++
> >  1 file changed, 27 insertions(+)
> >
> > diff --git a/lib/librte_ethdev/rte_ethdev.h b/lib/librte_ethdev/rte_ethdev.h
> > index 18a9defc2..5189bdbab 100644
> > --- a/lib/librte_ethdev/rte_ethdev.h
> > +++ b/lib/librte_ethdev/rte_ethdev.h
> > @@ -444,11 +444,35 @@ struct rte_vlan_filter_conf {
> >   * The *rss_hf* field of the *rss_conf* structure indicates the different
> >   * types of IPv4/IPv6 packets to which the RSS hashing must be applied.
> >   * Supplying an *rss_hf* equal to zero disables the RSS feature.
> > + *
> > + * The *rss_level* field of the *rss_conf* structure indicates the
> > + * Packet encapsulation level RSS hash @p types apply to.
> > + *
> > + * - @p 0 requests the default behavior. Depending on the packet
> > + *   type, it can mean outermost, innermost, anything in between or
> > + *   even no RSS.
> > + *
> > + *   It basically stands for the innermost encapsulation level RSS
> > + *   can be performed on according to PMD and device capabilities.
> > + *
> > + * - @p 1 requests RSS to be performed on the outermost packet
> > + *   encapsulation level.
> > + *
> > + * - @p 2 and subsequent values request RSS to be performed on the
> > + *   specified inner packet encapsulation level, from outermost to
> > + *   innermost (lower to higher values).
> > + *
> > + * Support for values other than @p 0 is dependent on the underlying
> > + * hardware in use.
> > + *
> > + * Requesting a specific RSS level on unrecognized traffic results
> > + * in undefined behavior.
> >   */
> >  struct rte_eth_rss_conf {
> >      uint8_t *rss_key;    /**< If not NULL, 40-byte hash key. */
> >      uint8_t rss_key_len; /**< hash key length in bytes. */
> >      uint64_t rss_hf;     /**< Hash functions to apply - see below. */
> > +    uint32_t rss_level;  /**< RSS hash level */
> >  };
>
> I'm not sure that offload flag is required in this case.
> I think maximum supported rss_level in dev_info will provide more
> information and per-queue level does not make sense
> in this case. Even if per-queue group control is required,
> it should be doable via rte_flow API RSS action.
>

This is device configuration, not flow-specific configuration. Of course, while passing the rss_config, not all the queues may be specified, but that is not new behavior, and it is up to the application anyway.
Are we transitioning device-level configuration to an rte_flow/flow-based scheme?

>
> Anyway, it looks like it is ABI breakage with all consequences.
> In 64-bit case it is possible to put it before rss_hf to avoid
> ABI breakage, but it will break ABI on 32-bit anyway.
>

Right. I sent the proposal for review early, to get it cleaned up and ready when the window opens.

> >
> >  /*
> > @@ -599,6 +623,8 @@ rte_eth_rss_hf_refine(uint64_t rss_hf)
> >      ETH_RSS_GENEVE | \
> >      ETH_RSS_NVGRE)
> >
> > +#define ETH_RSS_LEVEL_DEFAULT 0
> > +
> >  /*
> >   * Definitions used for redirection table entry size.
> >   * Some RSS RETA sizes may not be supported by some drivers, check the
> > @@ -1103,6 +1129,7 @@ struct rte_eth_conf {
> >  #define DEV_RX_OFFLOAD_SCTP_CKSUM       0x00020000
> >  #define DEV_RX_OFFLOAD_OUTER_UDP_CKSUM  0x00040000
> >  #define DEV_RX_OFFLOAD_RSS_HASH         0x00080000
> > +#define DEV_RX_OFFLOAD_RSS_LEVEL        0x00100000
> >
> >  #define DEV_RX_OFFLOAD_CHECKSUM (DEV_RX_OFFLOAD_IPV4_CKSUM | \
> >                   DEV_RX_OFFLOAD_UDP_CKSUM | \
> >
>
Re: [dpdk-dev] [PATCH 1/3] ethdev: add RSS hash level
On Fri, 6 Dec 2019 16:59:17 -0800
Ajit Khaparde wrote:

>   */
>  struct rte_eth_rss_conf {
>      uint8_t *rss_key;    /**< If not NULL, 40-byte hash key. */
>      uint8_t rss_key_len; /**< hash key length in bytes. */
>      uint64_t rss_hf;     /**< Hash functions to apply - see below. */
> +    uint32_t rss_level;  /**< RSS hash level */
>  };
>

This is an API/ABI change, which is not allowed under the current policy. The API/ABI has been frozen since 19.11 until the DPDK 20.11 release.

You need to figure out another way to do this.
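Why the appended field breaks the ABI, and why Andrew's "put it before rss_hf" idea only helps on 64-bit, can be checked with `sizeof`/`offsetof`. The three structs below are local mirrors written for illustration (the names are invented; only the field layout follows the 19.11 struct and the patch), assuming a typical LP64 target such as x86_64 or aarch64.

```c
#include <stddef.h>
#include <stdint.h>

/* Layout of rte_eth_rss_conf as in DPDK 19.11 (before the patch). */
struct rss_conf_v1911 {
    uint8_t *rss_key;
    uint8_t rss_key_len;
    uint64_t rss_hf;
};

/* New field appended at the end, as in the patch. */
struct rss_conf_appended {
    uint8_t *rss_key;
    uint8_t rss_key_len;
    uint64_t rss_hf;
    uint32_t rss_level;
};

/* New field placed in the padding hole before rss_hf. */
struct rss_conf_in_hole {
    uint8_t *rss_key;
    uint8_t rss_key_len;
    uint32_t rss_level;
    uint64_t rss_hf;
};
```

On LP64 the 19.11 layout is 24 bytes with a 7-byte hole after `rss_key_len`; appending grows it to 32 bytes, while the in-hole variant stays at 24 bytes with `rss_hf` at the same offset. On a typical 32-bit target the pointer is 4 bytes, the hole after `rss_key_len` is only 3 bytes, and the new field shifts `rss_hf` either way, which is the 32-bit breakage mentioned in the review.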
Re: [dpdk-dev] [PATCH v3] kernel/linux: fix kernel dir for meson
On 12/04, Bruce Richardson wrote:
>On Wed, Dec 04, 2019 at 10:18:21PM +0800, Ye Xiaolong wrote:
>> On 12/04, Luca Boccassi wrote:
>> >On Tue, 2019-12-03 at 23:59 +0800, Xiaolong Ye wrote:
>> >> kernel_dir option in meson build is equivalent to RTE_KERNELDIR in make
>> >> system, for cross-compilation case, users would specify it as local
>> >> kernel src dir like
>> >>
>> >> //target-arm_glibc/linux-arm/linux-4.19.81/
>> >>
>> >> Current meson build would fail to compile kernel module if user specify
>> >> kernel_dir as above, this patch fixes this issue.
>> >>
>> >> After this change, for normal build case, user can specify
>> >> /lib/modules/ or /lib/modules//build as
>> >> kernel_dir. For cross compilation case, user can specify any directory
>> >> that contains kernel source code as the kernel_dir.
>> >>
>> >> Fixes: 317832f97c16 ("kernel/linux: fix modules install path")
>> >> Cc: sta...@dpdk.org
>> >> Cc: iryz...@nfware.com
>> >>
>> >> Signed-off-by: Xiaolong Ye <xiaolong...@intel.com>
>> >
>> >The convention used by upstream and all distros is that kernel headers
>> >are in /build. Why can't the cross compilation case also
>> >follow this convention, rather than adding complications to the
>>
>> Yes, cross-compilation can follow this convention, but one common case is that
>> users download and put kernel src (the same kernel that's running in the target machine)
>> to one arbitrary dir, he then use this dir as kernel_dir to build kernel modules,
>> it's extra burden for users to create extra build dir to hold the kernel headers.
>>
>
>As part of the build of the kernel, do you not do a "modules_install" step,
>which should set up things correctly for later builds?

Yes, this command helps. But with the make build, users can specify either /lib/modules//build or any kernel source directory as RTE_KERNELDIR, so I think this patch gives users a consistent experience when they migrate from make to meson.

Thanks,
Xiaolong

>
>/Bruce
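The thread does not show the meson logic itself. As a hypothetical illustration only (written in C rather than meson, and not the code from the patch), the directory handling the commit message describes, where both a modules directory and its build/ subdirectory are accepted and a bare kernel source tree is used as-is, could look like this. `resolve_kernel_build_dir()`, `enum kdir_kind` and `KDIR_MAX` are invented names.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/stat.h>

#define KDIR_MAX 4096

enum kdir_kind {
    KDIR_INVALID = -1,     /* not a directory, or result does not fit */
    KDIR_AS_GIVEN = 0,     /* use kernel_dir itself (source tree or .../build) */
    KDIR_BUILD_SUBDIR = 1, /* use kernel_dir/build (modules-directory layout) */
};

static int is_dir(const char *path)
{
    struct stat st;
    return stat(path, &st) == 0 && S_ISDIR(st.st_mode);
}

/* Hypothetical normalization of a user-supplied kernel_dir: prefer a
 * build/ subdirectory when one exists, otherwise use the directory as
 * given (e.g. a cross-compilation kernel source tree). */
static enum kdir_kind resolve_kernel_build_dir(const char *kernel_dir,
                                               char *out, size_t outlen)
{
    char candidate[KDIR_MAX];
    int n;

    if (!is_dir(kernel_dir))
        return KDIR_INVALID;

    n = snprintf(candidate, sizeof(candidate), "%s/build", kernel_dir);
    if (n > 0 && (size_t)n < sizeof(candidate) && is_dir(candidate)) {
        n = snprintf(out, outlen, "%s", candidate);
        return (n >= 0 && (size_t)n < outlen) ? KDIR_BUILD_SUBDIR : KDIR_INVALID;
    }

    n = snprintf(out, outlen, "%s", kernel_dir);
    return (n >= 0 && (size_t)n < outlen) ? KDIR_AS_GIVEN : KDIR_INVALID;
}
```

With this rule, pointing kernel_dir at either the modules directory or its build subdirectory resolves to the same tree, which is the kind of consistency with RTE_KERNELDIR that Xiaolong describes.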
Re: [dpdk-dev] discussion: creating a new class for vdpa driversxiao.w.w...@intel.com
From: Andrew Rybchenko
> On 12/6/19 8:32 AM, Liang, Cunming wrote:
> >
> >
> >> -----Original Message-----
> >> From: Bie, Tiwei
> >> Sent: Friday, December 6, 2019 12:28 PM
> >> To: Matan Azrad
> >> Cc: Wang, Xiao W ; Thomas Monjalon ; maxime.coque...@redhat.com; Wang, Zhihong ;
> >> Yigit, Ferruh ; Shahaf Shuler ; Ori Kam ; dev@dpdk.org; Slava Ovsiienko ;
> >> Asaf Penso ; Olga Shern ; Liang, Cunming
> >> Subject: Re: discussion: creating a new class for vdpa driversxiao.w.w...@intel.com
> >>
> >> On Thu, Dec 05, 2019 at 01:26:36PM +0000, Matan Azrad wrote:
> >>> Hi all
> >>>
> >>> As described in RFC “[RFC] net: new vdpa PMD for Mellanox devices”,
> >>> a new vdpa drivers is going to be added for Mellanox devices –
> >>> mlx5_vdpa
> >>>
> >>> The only vdpa driver now is the IFC driver that is located in net
> >>> directory.
> >>>
> >>> The IFC driver and the new mlx5_vdpa driver provide the vdpa ops and
> >>> not the eth_dev ops.
> >>>
> >>> All the others drivers in net provide the eth-dev ops.
> >>>
> >>> I suggest to create a new class for vdpa drivers, to move IFC to
> >>> this class and to add the mlx5_vdpa to this class too.
> >>>
> >>> Later, all the new drivers that implements the vdpa ops will be
> >>> added to the vdpa class.
> >>
> >> +1. Sounds like a good idea to me.
> > +1
>
> vDPA drivers are vendor-specific and expected to talk to vendor NIC. I.e.
> there are significant chances to share code with network drivers (e.g. base
> driver). Should base driver be moved to drivers/common in this case or is it
> still allows to have vdpa driver in drivers/net together with ethdev driver?

Yes, I think this should be the approach: shared code should be moved to the drivers/common directory.
I think there is a precedent for this in drivers/common, which already holds vendor-specific code shared between crypto and net drivers.
Actually, my plan is to share the mlx5 vdpa code with the mlx5 net code through the drivers/common directory (see RFC).

Matan