[dpdk-dev] [PATCH] ixgbe: Fix an unnecessary check in vf rss
Hi Thomas,

> -----Original Message-----
> From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
> Sent: Tuesday, January 27, 2015 8:13 PM
> To: Ouyang, Changchun
> Cc: dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH] ixgbe: Fix an unnecessary check in vf rss
>
> > To follow up the comments from Wodkowski, PawelX, remove this
> > unnecessary check: check_mq_mode has already checked the queue number
> > at device configure stage. If the queue number of the VF is not correct,
> > it returns an error code and exits, so there is no need to check it again
> > at device start stage (note: pf_host_configure is called at device start
> > stage).
> >
> > This fixes commit 42d2f78abcb77ecb769be4149df550308169ef0f
> >
> > Signed-off-by: Changchun Ouyang
> > Suggested-by: Pawel Wodkowski
>
> Fixes: 42d2f78abcb77 ("configure VF RSS")
>
> Applied

Thanks very much for applying it!

> Changchun, as you are working on ixgbe, maybe you would like to review
> some ixgbe patches from others?

No problem, I will try to do it when my bandwidth allows, :-)

Thanks
Changchun
[dpdk-dev] [PATCH v2] testpmd check return value of rte_eth_dev_vlan_filter()
On 1/28/2015 1:20 AM, Michal Jastrzebski wrote: > This patch modifies testpmd behavior when setting: > rx_vlan add all vf_port (enabling all vlanids > to be passed thru rx filter on VF). > Rx_vlan_all_filter_set() function, > checks if the next vlanid can be enabled by the driver. > Number of vlanids is limited by the NIC and thus the NIC > do not allow to enable more vlanids than it can allocate > in VFTA table. But what about if it is caused by other issue to lead a enable failure? > v2 - fix formatting errors > > Signed-off-by: Michal Jastrzebski > --- > app/test-pmd/config.c | 15 +-- > app/test-pmd/testpmd.h|2 +- > lib/librte_ether/rte_ethdev.c |4 ++-- > 3 files changed, 12 insertions(+), 9 deletions(-) > > diff --git a/app/test-pmd/config.c b/app/test-pmd/config.c > index c40f819..eda737e 100644 > --- a/app/test-pmd/config.c > +++ b/app/test-pmd/config.c > @@ -1643,21 +1643,22 @@ rx_vlan_filter_set(portid_t port_id, int on) > "diag=%d\n", port_id, on, diag); > } > > -void > +int > rx_vft_set(portid_t port_id, uint16_t vlan_id, int on) > { > int diag; > > if (port_id_is_invalid(port_id)) > - return; > + return 1; > if (vlan_id_is_invalid(vlan_id)) > - return; > + return 1; > diag = rte_eth_dev_vlan_filter(port_id, vlan_id, on); > if (diag == 0) > - return; > + return 0; > printf("rte_eth_dev_vlan_filter(port_pi=%d, vlan_id=%d, on=%d) failed " > "diag=%d\n", > port_id, vlan_id, on, diag); > + return -1; > } > > void > @@ -1667,8 +1668,10 @@ rx_vlan_all_filter_set(portid_t port_id, int on) > > if (port_id_is_invalid(port_id)) > return; > - for (vlan_id = 0; vlan_id < 4096; vlan_id++) > - rx_vft_set(port_id, vlan_id, on); > + for (vlan_id = 0; vlan_id < 4096; vlan_id++){ Before "{" you use a Tab? One white space is OK. Thanks, Michael > + if (rx_vft_set(port_id, vlan_id, on)) > + break; > + } > } > > void > diff --git a/app/test-pmd/testpmd.h b/app/test-pmd/testpmd.h > index 8f5e6c7..e0186b9 100644 > --- a/app/test-pmd/testpmd.h > +++ b/app/test-pmd/testpmd.h > @@ -492,7 +492,7 @@ void rx_vlan_strip_set_on_queue(portid_t port_id, > uint16_t queue_id, int on); > > void rx_vlan_filter_set(portid_t port_id, int on); > void rx_vlan_all_filter_set(portid_t port_id, int on); > -void rx_vft_set(portid_t port_id, uint16_t vlan_id, int on); > +int rx_vft_set(portid_t port_id, uint16_t vlan_id, int on); > void vlan_extend_set(portid_t port_id, int on); > void vlan_tpid_set(portid_t port_id, uint16_t tp_id); > void tx_vlan_set(portid_t port_id, uint16_t vlan_id); > diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c > index ea3a1fb..064b5d6 100644 > --- a/lib/librte_ether/rte_ethdev.c > +++ b/lib/librte_ether/rte_ethdev.c > @@ -1519,8 +1519,8 @@ rte_eth_dev_vlan_filter(uint8_t port_id, uint16_t > vlan_id, int on) > return (-EINVAL); > } > FUNC_PTR_OR_ERR_RET(*dev->dev_ops->vlan_filter_set, -ENOTSUP); > - (*dev->dev_ops->vlan_filter_set)(dev, vlan_id, on); > - return (0); > + > + return (*dev->dev_ops->vlan_filter_set)(dev, vlan_id, on); > } > > int
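As an aside, here is a minimal sketch (inside app/test-pmd/config.c) of how the loop could report where it stopped, so a user can at least tell "the NIC ran out of VFTA entries after N ids" apart from an immediate failure. It assumes the rx_vft_set() return values introduced by this patch and is not part of the submitted change:

/* Sketch only: enable all VLAN ids until the driver refuses one, then
 * report how many were actually programmed. Assumes rx_vft_set() now
 * returns non-zero on failure, as introduced by this patch. */
void
rx_vlan_all_filter_set(portid_t port_id, int on)
{
	uint16_t vlan_id;

	if (port_id_is_invalid(port_id))
		return;
	for (vlan_id = 0; vlan_id < 4096; vlan_id++) {
		if (rx_vft_set(port_id, vlan_id, on)) {
			printf("Stopped at vlan_id=%u: the NIC may have run out "
			       "of VFTA entries, or another error occurred "
			       "(see the diag value printed above).\n", vlan_id);
			break;
		}
	}
}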
[dpdk-dev] [PATCH 0/4] DPDK memcpy optimization
> -----Original Message-----
> From: Ananyev, Konstantin
> Sent: Tuesday, January 27, 2015 8:20 PM
> To: Wang, Zhihong; Richardson, Bruce; 'Marc Sune'
> Cc: 'dev at dpdk.org'
> Subject: RE: [dpdk-dev] [PATCH 0/4] DPDK memcpy optimization
>
> [...]
>
> > > > > On Mon, Jan 19, 2015 at 09:53:30AM +0800, zhihong.wang at intel.com wrote:
> > > > > > This patch set optimizes memcpy for DPDK for both SSE and AVX
> > > > > > platforms.
> > > > > > It also extends memcpy test coverage with unaligned cases and
> > > > > > more test points.
> > > > > > Optimization techniques are summarized below:
> > > > > >
> > > > > > 1. Utilize full cache bandwidth
> > > > > >
> > > > > > 2. Enforce aligned stores
> > > > > >
> > > > > > 3. Apply load address alignment based on architecture features
> > > > > >
> > > > > > 4. Make load/store address available as early as possible
> > > > > >
> > > > > > 5. General optimization techniques like inlining, branch
> > > > > > reducing, prefetch pattern access
> > > > > >
> > > > > > Zhihong Wang (4):
> > > > > >   Disabled VTA for memcpy test in app/test/Makefile
> > > > > >   Removed unnecessary test cases in test_memcpy.c
> > > > > >   Extended test coverage in test_memcpy_perf.c
> > > > > >   Optimized memcpy in arch/x86/rte_memcpy.h for both SSE and AVX
> > > > > >   platforms
> > > > > >
> > > > > >  app/test/Makefile                        |   6 +
> > > > > >  app/test/test_memcpy.c                   |  52 +-
> > > > > >  app/test/test_memcpy_perf.c              | 238 +---
> > > > > >  .../common/include/arch/x86/rte_memcpy.h | 664 +++--
> > > > > >  4 files changed, 656 insertions(+), 304 deletions(-)
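To make technique 2 ("enforce aligned stores") concrete, here is a simplified, self-contained sketch of the idea. It is not the rte_memcpy() code from the patch set, just an illustration of aligning the destination first and then streaming the bulk with aligned 16-byte SSE stores while the loads stay unaligned:

#include <stdint.h>
#include <string.h>
#include <emmintrin.h> /* SSE2 */

/*
 * Simplified illustration of "enforce aligned stores": copy a small
 * unaligned head so that the destination becomes 16-byte aligned, then
 * stream the bulk with aligned stores and unaligned loads.
 */
static inline void
copy_aligned_store(uint8_t *dst, const uint8_t *src, size_t n)
{
	/* head: bring dst to a 16-byte boundary */
	size_t head = (uintptr_t)dst & 15;

	if (head) {
		head = 16 - head;
		if (head > n)
			head = n;
		memcpy(dst, src, head);
		dst += head;
		src += head;
		n -= head;
	}

	/* bulk: unaligned loads, aligned stores */
	while (n >= 16) {
		__m128i x = _mm_loadu_si128((const __m128i *)src);
		_mm_store_si128((__m128i *)dst, x);
		dst += 16;
		src += 16;
		n -= 16;
	}

	/* tail */
	if (n)
		memcpy(dst, src, n);
}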
[dpdk-dev] [PATCH v2 04/24] virtio: Add support for Link State interrupt
Hi Stephen, > -Original Message- > From: Stephen Hemminger [mailto:stephen at networkplumber.org] > Sent: Tuesday, January 27, 2015 6:00 PM > To: Xie, Huawei > Cc: Ouyang, Changchun; dev at dpdk.org > Subject: Re: [dpdk-dev] [PATCH v2 04/24] virtio: Add support for Link State > interrupt > > On Tue, 27 Jan 2015 09:04:07 + > "Xie, Huawei" wrote: > > > > -Original Message- > > > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Ouyang > > > Changchun > > > Sent: Tuesday, January 27, 2015 10:36 AM > > > To: dev at dpdk.org > > > Subject: [dpdk-dev] [PATCH v2 04/24] virtio: Add support for Link > > > State interrupt > > > > > > Virtio has link state interrupt which can be used. > > > > > > Signed-off-by: Stephen Hemminger > > > Signed-off-by: Changchun Ouyang > > > --- > > > lib/librte_pmd_virtio/virtio_ethdev.c | 78 > > > +++-- > > > -- > > > lib/librte_pmd_virtio/virtio_pci.c| 22 ++ > > > lib/librte_pmd_virtio/virtio_pci.h| 4 ++ > > > 3 files changed, 86 insertions(+), 18 deletions(-) > > > > > > diff --git a/lib/librte_pmd_virtio/virtio_ethdev.c > > > b/lib/librte_pmd_virtio/virtio_ethdev.c > > > index 5df3b54..ef87ff8 100644 > > > --- a/lib/librte_pmd_virtio/virtio_ethdev.c > > > +++ b/lib/librte_pmd_virtio/virtio_ethdev.c > > > @@ -845,6 +845,34 @@ static int virtio_resource_init(struct > > > rte_pci_device *pci_dev __rte_unused) #endif > > > > > > /* > > > + * Process Virtio Config changed interrupt and call the callback > > > + * if link state changed. > > > + */ > > > +static void > > > +virtio_interrupt_handler(__rte_unused struct rte_intr_handle *handle, > > > + void *param) > > > +{ > > > + struct rte_eth_dev *dev = param; > > > + struct virtio_hw *hw = > > > + VIRTIO_DEV_PRIVATE_TO_HW(dev->data->dev_private); > > > + uint8_t isr; > > > + > > > + /* Read interrupt status which clears interrupt */ > > > + isr = vtpci_isr(hw); > > > + PMD_DRV_LOG(INFO, "interrupt status = %#x", isr); > > > + > > > + if (rte_intr_enable(&dev->pci_dev->intr_handle) < 0) > > > + PMD_DRV_LOG(ERR, "interrupt enable failed"); > > > + > > > > Is it better to put rte_intr_enable after we have handled the interrupt. > > Is there the possibility of interrupt reentrant in uio intr framework? > > The UIO framework handles IRQ's via posix thread that is reading fd, then > calling this code. Therefore it is always single threaded. Even if it is under UIO framework, and always single threaded, How about move rte_intr_enable after the virtio_dev_link_update() and _rte_eth_dev_callback_process is called. This make it more like interrupt handler in linux kernel. What do you think of it? Thanks Changchun
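For clarity, a minimal sketch of the ordering Changchun suggests: handle the config-change event first, then re-arm the interrupt, as a Linux kernel handler would. The names vtpci_isr(), virtio_dev_link_update() and _rte_eth_dev_callback_process() come from the quoted patch and the ethdev library; VIRTIO_PCI_ISR_CONFIG (the config-change ISR bit) and the assumption that virtio_dev_link_update() returns 0 when the link state changed are assumptions here. This would live inside virtio_ethdev.c and is only a sketch, not a drop-in replacement:

static void
virtio_interrupt_handler(__rte_unused struct rte_intr_handle *handle,
			 void *param)
{
	struct rte_eth_dev *dev = param;
	struct virtio_hw *hw =
		VIRTIO_DEV_PRIVATE_TO_HW(dev->data->dev_private);
	uint8_t isr;

	/* Read interrupt status; the read also clears the interrupt. */
	isr = vtpci_isr(hw);
	PMD_DRV_LOG(INFO, "interrupt status = %#x", isr);

	/* Handle the event first ... */
	if (isr & VIRTIO_PCI_ISR_CONFIG) {
		/* assuming link_update() returns 0 when the state changed */
		if (virtio_dev_link_update(dev, 0) == 0)
			_rte_eth_dev_callback_process(dev,
						      RTE_ETH_EVENT_INTR_LSC);
	}

	/* ... and only then re-arm the interrupt, like a kernel handler. */
	if (rte_intr_enable(&dev->pci_dev->intr_handle) < 0)
		PMD_DRV_LOG(ERR, "interrupt enable failed");
}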
[dpdk-dev] [PATCH v2 02/24] virtio: Use weaker barriers
> -Original Message- > From: Xie, Huawei > Sent: Wednesday, January 28, 2015 12:16 AM > To: Stephen Hemminger > Cc: Ouyang, Changchun; dev at dpdk.org > Subject: RE: [dpdk-dev] [PATCH v2 02/24] virtio: Use weaker barriers > > > > > -Original Message- > > From: Stephen Hemminger [mailto:stephen at networkplumber.org] > > Sent: Tuesday, January 27, 2015 5:59 PM > > To: Xie, Huawei > > Cc: Ouyang, Changchun; dev at dpdk.org > > Subject: Re: [dpdk-dev] [PATCH v2 02/24] virtio: Use weaker barriers > > > > > > > I recall our original code is virtio_wmb(). > > > Use store fence to ensure all updates to entries before updating the > index. > > > Why do we need virtio_rmb() here and add virtio_wmb after > > vq_update_avail_idx()? > > > > Store fence is unnecessary, Intel CPU's are cache coherent, please > > read the virtio Linux ring header file for explanation. A full fence > > WMB is more expensive and causes CPU stall > > > > > I mean virtio_wmb rather than virtio_rmb should be used here, and both of > them are defined as compiler barrier. > > The following code is linux virtio driver for adding buffer to vring. > /* Put entry in available array (but don't update avail->idx until they >* do sync). */ > avail = (vq->vring.avail->idx & (vq->vring.num-1)); > vq->vring.avail->ring[avail] = head; > > /* Descriptors and available array need to be set before we expose > the >* new available array entries. */ > virtio_wmb(vq->weak_barriers); > vq->vring.avail->idx++; > Yes, use virtio_wmb is better here, will change it in next version. Thanks Changchun
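To spell out the ordering rule being discussed: every write that fills a ring entry must become visible before the index that publishes it, and on strongly-ordered IA a compiler barrier is enough for that store-store ordering, which is why virtio_wmb() can be weaker than a full sfence. A self-contained sketch of the pattern (not the PMD code itself):

#include <stdint.h>
#include <rte_atomic.h>

/*
 * Generic producer-side publish pattern: fill the entry, order the stores,
 * then bump the index that makes the entry visible to the consumer.
 */
struct avail_ring {
	volatile uint16_t idx;  /* published producer index */
	uint16_t ring[256];     /* descriptor heads */
};

static inline void
publish_entry(struct avail_ring *ar, uint16_t head)
{
	uint16_t slot = ar->idx & 255;

	/* 1. fill the entry (and, in virtio, the descriptors it points to) */
	ar->ring[slot] = head;

	/* 2. ensure the entry is written before the index update;
	 *    on x86 this store->store ordering needs only a compiler barrier */
	rte_compiler_barrier();

	/* 3. publish: the device/consumer may use the entry from now on */
	ar->idx++;
}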
[dpdk-dev] [PATCH v2 00/15] support multi-pthread per core
v2 changes:
  add '-' support for EAL option '--lcores'

The patch series contains enhancements to the EAL and fixes for libraries to
run multiple pthreads (either EAL or non-EAL threads) per physical core. The
two major changes are listed below:

- Extend the core affinity of each EAL thread to 1:n. Each lcore now stands
  for an EAL thread rather than a logical core. The change adds a new EAL
  option to allow a static lcore-to-cpuset assignment. An lcore (EAL thread)
  is then affinitized to a cpuset; the original 1:1 mapping is the special
  case.

- Fix the libraries to allow running on any non-EAL thread. This closes the
  gaps when running the libraries in a non-EAL thread (dynamically created by
  the user). Each fixed library takes care of the case of
  rte_lcore_id() >= RTE_MAX_LCORE.

Thanks a million for the comments from Konstantin, Bruce, Mirek and Stephen
in the RFC review.

Cunming Liang (15):
  eal: add cpuset into per EAL thread lcore_config
  eal: new eal option '--lcores' for cpu assignment
  eal: add support parsing socket_id from cpuset
  eal: new TLS definition and API declaration
  eal: add eal_common_thread.c for common thread API
  eal: add rte_gettid() to acquire unique system tid
  eal: apply affinity of EAL thread by assigned cpuset
  enic: fix re-define freebsd compile complain
  malloc: fix the issue of SOCKET_ID_ANY
  log: fix the gap to support non-EAL thread
  eal: set _lcore_id and _socket_id to (-1) by default
  eal: fix recursive spinlock in non-EAL thread
  mempool: add support to non-EAL thread
  ring: add support to non-EAL thread
  timer: add support to non-EAL thread

 lib/librte_eal/bsdapp/eal/Makefile                 |   1 +
 lib/librte_eal/bsdapp/eal/eal.c                    |  13 +-
 lib/librte_eal/bsdapp/eal/eal_lcore.c              |  14 +
 lib/librte_eal/bsdapp/eal/eal_memory.c             |   2 +
 lib/librte_eal/bsdapp/eal/eal_thread.c             |  76 +++---
 lib/librte_eal/common/eal_common_launch.c          |   1 -
 lib/librte_eal/common/eal_common_log.c             |  17 +-
 lib/librte_eal/common/eal_common_options.c         | 300 -
 lib/librte_eal/common/eal_common_thread.c          | 142 ++
 lib/librte_eal/common/eal_options.h                |   2 +
 lib/librte_eal/common/eal_thread.h                 |  66 +
 .../common/include/generic/rte_spinlock.h          |   4 +-
 lib/librte_eal/common/include/rte_eal.h            |  27 ++
 lib/librte_eal/common/include/rte_lcore.h          |  37 ++-
 lib/librte_eal/common/include/rte_log.h            |   5 +
 lib/librte_eal/linuxapp/eal/Makefile               |   4 +
 lib/librte_eal/linuxapp/eal/eal.c                  |   7 +-
 lib/librte_eal/linuxapp/eal/eal_lcore.c            |  15 ++
 lib/librte_eal/linuxapp/eal/eal_thread.c           |  78 +++---
 lib/librte_malloc/malloc_heap.h                    |   7 +-
 lib/librte_mempool/rte_mempool.h                   |  18 +-
 lib/librte_pmd_enic/enic.h                         |   1 +
 lib/librte_pmd_enic/enic_compat.h                  |   1 +
 lib/librte_ring/rte_ring.h                         |  10 +-
 lib/librte_timer/rte_timer.c                       |  40 ++-
 lib/librte_timer/rte_timer.h                       |   2 +-
 26 files changed, 759 insertions(+), 131 deletions(-)
 create mode 100644 lib/librte_eal/common/eal_common_thread.c

-- 
1.8.1.4
[dpdk-dev] [PATCH v2 01/15] eal: add cpuset into per EAL thread lcore_config
The patch adds 'cpuset' into per-lcore configure 'lcore_config[]', as the lcore no longer always 1:1 pinning with physical cpu. The lcore now stands for a EAL thread rather than a logical cpu. It doesn't change the default behavior of 1:1 mapping, but allows to affinity the EAL thread to multiple cpus. Signed-off-by: Cunming Liang --- lib/librte_eal/bsdapp/eal/eal_lcore.c | 7 +++ lib/librte_eal/bsdapp/eal/eal_memory.c| 2 ++ lib/librte_eal/common/include/rte_lcore.h | 8 lib/librte_eal/linuxapp/eal/Makefile | 1 + lib/librte_eal/linuxapp/eal/eal_lcore.c | 8 5 files changed, 26 insertions(+) diff --git a/lib/librte_eal/bsdapp/eal/eal_lcore.c b/lib/librte_eal/bsdapp/eal/eal_lcore.c index 662f024..72f8ac2 100644 --- a/lib/librte_eal/bsdapp/eal/eal_lcore.c +++ b/lib/librte_eal/bsdapp/eal/eal_lcore.c @@ -76,11 +76,18 @@ rte_eal_cpu_init(void) * ones and enable them by default. */ for (lcore_id = 0; lcore_id < RTE_MAX_LCORE; lcore_id++) { + /* init cpuset for per lcore config */ + CPU_ZERO(&lcore_config[lcore_id].cpuset); + lcore_config[lcore_id].detected = (lcore_id < ncpus); if (lcore_config[lcore_id].detected == 0) { config->lcore_role[lcore_id] = ROLE_OFF; continue; } + + /* By default, lcore 1:1 map to cpu id */ + CPU_SET(lcore_id, &lcore_config[lcore_id].cpuset); + /* By default, each detected core is enabled */ config->lcore_role[lcore_id] = ROLE_RTE; lcore_config[lcore_id].core_id = cpu_core_id(lcore_id); diff --git a/lib/librte_eal/bsdapp/eal/eal_memory.c b/lib/librte_eal/bsdapp/eal/eal_memory.c index 65ee87d..a34d500 100644 --- a/lib/librte_eal/bsdapp/eal/eal_memory.c +++ b/lib/librte_eal/bsdapp/eal/eal_memory.c @@ -45,6 +45,8 @@ #include "eal_internal_cfg.h" #include "eal_filesystem.h" +/* avoid re-defined against with freebsd header */ +#undef PAGE_SIZE #define PAGE_SIZE (sysconf(_SC_PAGESIZE)) /* diff --git a/lib/librte_eal/common/include/rte_lcore.h b/lib/librte_eal/common/include/rte_lcore.h index 49b2c03..4c7d6bb 100644 --- a/lib/librte_eal/common/include/rte_lcore.h +++ b/lib/librte_eal/common/include/rte_lcore.h @@ -50,6 +50,13 @@ extern "C" { #define LCORE_ID_ANY -1/**< Any lcore. */ +#if defined(__linux__) + typedef cpu_set_t rte_cpuset_t; +#elif defined(__FreeBSD__) +#include + typedef cpuset_t rte_cpuset_t; +#endif + /** * Structure storing internal configuration (per-lcore) */ @@ -65,6 +72,7 @@ struct lcore_config { unsigned socket_id;/**< physical socket id for this lcore */ unsigned core_id; /**< core number on socket for this lcore */ int core_index;/**< relative index, starting from 0 */ + rte_cpuset_t cpuset; /**< cpu set which the lcore affinity to */ }; /** diff --git a/lib/librte_eal/linuxapp/eal/Makefile b/lib/librte_eal/linuxapp/eal/Makefile index 72ecf3a..0e9c447 100644 --- a/lib/librte_eal/linuxapp/eal/Makefile +++ b/lib/librte_eal/linuxapp/eal/Makefile @@ -87,6 +87,7 @@ SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_common_dev.c SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_common_options.c CFLAGS_eal.o := -D_GNU_SOURCE +CFLAGS_eal_lcore.o := -D_GNU_SOURCE CFLAGS_eal_thread.o := -D_GNU_SOURCE CFLAGS_eal_log.o := -D_GNU_SOURCE CFLAGS_eal_common_log.o := -D_GNU_SOURCE diff --git a/lib/librte_eal/linuxapp/eal/eal_lcore.c b/lib/librte_eal/linuxapp/eal/eal_lcore.c index c67e0e6..29615f8 100644 --- a/lib/librte_eal/linuxapp/eal/eal_lcore.c +++ b/lib/librte_eal/linuxapp/eal/eal_lcore.c @@ -158,11 +158,19 @@ rte_eal_cpu_init(void) * ones and enable them by default. 
*/ for (lcore_id = 0; lcore_id < RTE_MAX_LCORE; lcore_id++) { + /* init cpuset for per lcore config */ + CPU_ZERO(&lcore_config[lcore_id].cpuset); + + /* in 1:1 mapping, record related cpu detected state */ lcore_config[lcore_id].detected = cpu_detected(lcore_id); if (lcore_config[lcore_id].detected == 0) { config->lcore_role[lcore_id] = ROLE_OFF; continue; } + + /* By default, lcore 1:1 map to cpu id */ + CPU_SET(lcore_id, &lcore_config[lcore_id].cpuset); + /* By default, each detected core is enabled */ config->lcore_role[lcore_id] = ROLE_RTE; lcore_config[lcore_id].core_id = cpu_core_id(lcore_id); -- 1.8.1.4
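For illustration, a small sketch of how an application could dump the per-lcore cpuset added by this patch. It relies only on the exported lcore_config[] array and the standard CPU_* macros (build with -D_GNU_SOURCE); with the default 1:1 mapping each lcore prints a single cpu, and with the '--lcores' option from the next patch it can print several:

/* Sketch: print the cpus each enabled lcore is affinitized to. */
#include <stdio.h>
#include <sched.h>
#include <rte_lcore.h>

static void
dump_lcore_cpusets(void)
{
	unsigned lcore, cpu;

	RTE_LCORE_FOREACH(lcore) {
		printf("lcore %u: cpus", lcore);
		for (cpu = 0; cpu < CPU_SETSIZE; cpu++)
			if (CPU_ISSET(cpu, &lcore_config[lcore].cpuset))
				printf(" %u", cpu);
		printf("\n");
	}
}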
[dpdk-dev] [PATCH v2 02/15] eal: new eal option '--lcores' for cpu assignment
It supports one new eal long option '--lcores' for EAL thread cpuset assignment. The format pattern: --lcores='lcores[@cpus]<,lcores[@cpus]>' lcores, cpus could be a single digit/range or a group. '(' and ')' are necessary if it's a group. If not supply '@cpus', the value of cpus uses the same as lcores. e.g. '1,2@(5-7),(3-5)@(0,2),(0,6),7-8' means starting 9 EAL thread as below lcore 0 runs on cpuset 0x41 (cpu 0,6) lcore 1 runs on cpuset 0x2 (cpu 1) lcore 2 runs on cpuset 0xe0 (cpu 5,6,7) lcore 3,4,5 runs on cpuset 0x5 (cpu 0,2) lcore 6 runs on cpuset 0x41 (cpu 0,6) lcore 7 runs on cpuset 0x80 (cpu 7) lcore 8 runs on cpuset 0x100 (cpu 8) Signed-off-by: Cunming Liang --- lib/librte_eal/common/eal_common_launch.c | 1 - lib/librte_eal/common/eal_common_options.c | 300 - lib/librte_eal/common/eal_options.h| 2 + lib/librte_eal/linuxapp/eal/Makefile | 1 + 4 files changed, 299 insertions(+), 5 deletions(-) diff --git a/lib/librte_eal/common/eal_common_launch.c b/lib/librte_eal/common/eal_common_launch.c index 599f83b..2d732b1 100644 --- a/lib/librte_eal/common/eal_common_launch.c +++ b/lib/librte_eal/common/eal_common_launch.c @@ -117,4 +117,3 @@ rte_eal_mp_wait_lcore(void) rte_eal_wait_lcore(lcore_id); } } - diff --git a/lib/librte_eal/common/eal_common_options.c b/lib/librte_eal/common/eal_common_options.c index 67e02dc..29ebb6f 100644 --- a/lib/librte_eal/common/eal_common_options.c +++ b/lib/librte_eal/common/eal_common_options.c @@ -45,6 +45,7 @@ #include #include #include +#include #include "eal_internal_cfg.h" #include "eal_options.h" @@ -85,6 +86,7 @@ eal_long_options[] = { {OPT_XEN_DOM0, 0, 0, OPT_XEN_DOM0_NUM}, {OPT_CREATE_UIO_DEV, 1, NULL, OPT_CREATE_UIO_DEV_NUM}, {OPT_VFIO_INTR, 1, NULL, OPT_VFIO_INTR_NUM}, + {OPT_LCORES, 1, 0, OPT_LCORES_NUM}, {0, 0, 0, 0} }; @@ -255,9 +257,11 @@ eal_parse_corelist(const char *corelist) if (min == RTE_MAX_LCORE) min = idx; for (idx = min; idx <= max; idx++) { - cfg->lcore_role[idx] = ROLE_RTE; - lcore_config[idx].core_index = count; - count++; + if (cfg->lcore_role[idx] != ROLE_RTE) { + cfg->lcore_role[idx] = ROLE_RTE; + lcore_config[idx].core_index = count; + count++; + } } min = RTE_MAX_LCORE; } else @@ -292,6 +296,279 @@ eal_parse_master_lcore(const char *arg) return 0; } +/* + * Parse elem, the elem could be single number/range or '(' ')' group + * Within group elem, '-' used for a range seperator; + *',' used for a single number. + */ +static int +eal_parse_set(const char *input, uint16_t set[], unsigned num) +{ + unsigned idx; + const char *str = input; + char *end = NULL; + unsigned min, max; + + memset(set, 0, num * sizeof(uint16_t)); + + while (isblank(*str)) + str++; + + /* only digit or left bracket is qulify for start point */ + if ((!isdigit(*str) && *str != '(') || *str == '\0') + return -1; + + /* process single number or single range of number */ + if (*str != '(') { + errno = 0; + idx = strtoul(str, &end, 10); + if (errno || end == NULL || idx >= num) + return -1; + else { + while (isblank(*end)) + end++; + + min = idx; + max = idx; + if (*end == '-') { + /* proccess single - */ + end++; + while (isblank(*end)) + end++; + if (!isdigit(*end)) + return -1; + + errno = 0; + idx = strtoul(end, &end, 10); + if (errno || end == NULL || idx >= num) + return -1; + max = idx; + while (isblank(*end)) + end++; + if (*end != ',' && *end != '\0') + return -1; + } + + if (*end != ',' && *end != '\0' && + *end != '@') + return -1; + + for (idx = RTE_MIN(min, max); +idx <= R
[dpdk-dev] [PATCH v2 03/15] eal: add support parsing socket_id from cpuset
It returns the socket_id if all cpus in the cpuset belongs to the same NUMA node, otherwise it will return SOCKET_ID_ANY. Signed-off-by: Cunming Liang --- lib/librte_eal/bsdapp/eal/eal_lcore.c | 7 + lib/librte_eal/common/eal_thread.h | 52 + lib/librte_eal/linuxapp/eal/eal_lcore.c | 7 + 3 files changed, 66 insertions(+) diff --git a/lib/librte_eal/bsdapp/eal/eal_lcore.c b/lib/librte_eal/bsdapp/eal/eal_lcore.c index 72f8ac2..162fb4f 100644 --- a/lib/librte_eal/bsdapp/eal/eal_lcore.c +++ b/lib/librte_eal/bsdapp/eal/eal_lcore.c @@ -41,6 +41,7 @@ #include #include "eal_private.h" +#include "eal_thread.h" /* No topology information available on FreeBSD including NUMA info */ #define cpu_core_id(X) 0 @@ -112,3 +113,9 @@ rte_eal_cpu_init(void) return 0; } + +unsigned +eal_cpu_socket_id(__rte_unused unsigned cpu_id) +{ + return cpu_socket_id(cpu_id); +} diff --git a/lib/librte_eal/common/eal_thread.h b/lib/librte_eal/common/eal_thread.h index b53b84d..a25ee86 100644 --- a/lib/librte_eal/common/eal_thread.h +++ b/lib/librte_eal/common/eal_thread.h @@ -34,6 +34,10 @@ #ifndef EAL_THREAD_H #define EAL_THREAD_H +#include + +#include + /** * basic loop of thread, called for each thread by eal_init(). * @@ -50,4 +54,52 @@ __attribute__((noreturn)) void *eal_thread_loop(void *arg); */ void eal_thread_init_master(unsigned lcore_id); +/** + * Get the NUMA socket id from cpu id. + * This function is private to EAL. + * + * @param cpu_id + * The logical process id. + * @return + * socket_id or SOCKET_ID_ANY + */ +unsigned eal_cpu_socket_id(unsigned cpu_id); + +/** + * Get the NUMA socket id from cpuset. + * This function is private to EAL. + * + * @param cpusetp + * The point to a valid cpu set. + * @return + * socket_id or SOCKET_ID_ANY + */ +static inline int +eal_cpuset_socket_id(rte_cpuset_t *cpusetp) +{ + unsigned cpu = 0; + int socket_id = SOCKET_ID_ANY; + int sid; + + if (cpusetp == NULL) + return SOCKET_ID_ANY; + + do { + if (!CPU_ISSET(cpu, cpusetp)) + continue; + + if (socket_id == SOCKET_ID_ANY) + socket_id = eal_cpu_socket_id(cpu); + + sid = eal_cpu_socket_id(cpu); + if (socket_id != sid) { + socket_id = SOCKET_ID_ANY; + break; + } + + } while (++cpu < RTE_MAX_LCORE); + + return socket_id; +} + #endif /* EAL_THREAD_H */ diff --git a/lib/librte_eal/linuxapp/eal/eal_lcore.c b/lib/librte_eal/linuxapp/eal/eal_lcore.c index 29615f8..922af6d 100644 --- a/lib/librte_eal/linuxapp/eal/eal_lcore.c +++ b/lib/librte_eal/linuxapp/eal/eal_lcore.c @@ -45,6 +45,7 @@ #include "eal_private.h" #include "eal_filesystem.h" +#include "eal_thread.h" #define SYS_CPU_DIR "/sys/devices/system/cpu/cpu%u" #define CORE_ID_FILE "topology/core_id" @@ -197,3 +198,9 @@ rte_eal_cpu_init(void) return 0; } + +unsigned +eal_cpu_socket_id(unsigned cpu_id) +{ + return cpu_socket_id(cpu_id); +} -- 1.8.1.4
[dpdk-dev] [PATCH v2 04/15] eal: new TLS definition and API declaration
1. add two TLS *_socket_id* and *_cpuset* 2. add two external API rte_thread_set/get_affinity 3. add one internal API eal_thread_dump_affinity Signed-off-by: Cunming Liang --- lib/librte_eal/bsdapp/eal/eal_thread.c| 2 ++ lib/librte_eal/common/eal_thread.h| 14 ++ lib/librte_eal/common/include/rte_lcore.h | 29 +++-- lib/librte_eal/linuxapp/eal/eal_thread.c | 2 ++ 4 files changed, 45 insertions(+), 2 deletions(-) diff --git a/lib/librte_eal/bsdapp/eal/eal_thread.c b/lib/librte_eal/bsdapp/eal/eal_thread.c index ab05368..10220c7 100644 --- a/lib/librte_eal/bsdapp/eal/eal_thread.c +++ b/lib/librte_eal/bsdapp/eal/eal_thread.c @@ -56,6 +56,8 @@ #include "eal_thread.h" RTE_DEFINE_PER_LCORE(unsigned, _lcore_id); +RTE_DEFINE_PER_LCORE(unsigned, _socket_id); +RTE_DEFINE_PER_LCORE(rte_cpuset_t, _cpuset); /* * Send a message to a slave lcore identified by slave_id to call a diff --git a/lib/librte_eal/common/eal_thread.h b/lib/librte_eal/common/eal_thread.h index a25ee86..28edf51 100644 --- a/lib/librte_eal/common/eal_thread.h +++ b/lib/librte_eal/common/eal_thread.h @@ -102,4 +102,18 @@ eal_cpuset_socket_id(rte_cpuset_t *cpusetp) return socket_id; } +/** + * Dump the current pthread cpuset. + * This function is private to EAL. + * + * @param str + * The string buffer the cpuset will dump to. + * @param size + * The string buffer size. + */ +#define CPU_STR_LEN256 +void +eal_thread_dump_affinity(char str[], unsigned size); + + #endif /* EAL_THREAD_H */ diff --git a/lib/librte_eal/common/include/rte_lcore.h b/lib/librte_eal/common/include/rte_lcore.h index 4c7d6bb..facdbdc 100644 --- a/lib/librte_eal/common/include/rte_lcore.h +++ b/lib/librte_eal/common/include/rte_lcore.h @@ -43,6 +43,7 @@ #include #include #include +#include #ifdef __cplusplus extern "C" { @@ -80,7 +81,9 @@ struct lcore_config { */ extern struct lcore_config lcore_config[RTE_MAX_LCORE]; -RTE_DECLARE_PER_LCORE(unsigned, _lcore_id); /**< Per core "core id". */ +RTE_DECLARE_PER_LCORE(unsigned, _lcore_id); /**< Per thread "lcore id". */ +RTE_DECLARE_PER_LCORE(unsigned, _socket_id); /**< Per thread "socket id". */ +RTE_DECLARE_PER_LCORE(rte_cpuset_t, _cpuset); /**< Per thread "cpuset". */ /** * Return the ID of the execution unit we are running on. @@ -146,7 +149,7 @@ rte_lcore_index(int lcore_id) static inline unsigned rte_socket_id(void) { - return lcore_config[rte_lcore_id()].socket_id; + return RTE_PER_LCORE(_socket_id); } /** @@ -229,6 +232,28 @@ rte_get_next_lcore(unsigned i, int skip_master, int wrap) i
[dpdk-dev] [PATCH v2 05/15] eal: add eal_common_thread.c for common thread API
The API works for both EAL thread and none EAL thread. When calling rte_thread_set_affinity, the *_socket_id* and *_cpuset* of calling thread will be updated if the thread successful set the cpu affinity. Signed-off-by: Cunming Liang --- lib/librte_eal/bsdapp/eal/Makefile| 1 + lib/librte_eal/common/eal_common_thread.c | 142 ++ lib/librte_eal/linuxapp/eal/Makefile | 2 + 3 files changed, 145 insertions(+) create mode 100644 lib/librte_eal/common/eal_common_thread.c diff --git a/lib/librte_eal/bsdapp/eal/Makefile b/lib/librte_eal/bsdapp/eal/Makefile index d434882..78406be 100644 --- a/lib/librte_eal/bsdapp/eal/Makefile +++ b/lib/librte_eal/bsdapp/eal/Makefile @@ -73,6 +73,7 @@ SRCS-$(CONFIG_RTE_LIBRTE_EAL_BSDAPP) += eal_common_hexdump.c SRCS-$(CONFIG_RTE_LIBRTE_EAL_BSDAPP) += eal_common_devargs.c SRCS-$(CONFIG_RTE_LIBRTE_EAL_BSDAPP) += eal_common_dev.c SRCS-$(CONFIG_RTE_LIBRTE_EAL_BSDAPP) += eal_common_options.c +SRCS-$(CONFIG_RTE_LIBRTE_EAL_BSDAPP) += eal_common_thread.c CFLAGS_eal.o := -D_GNU_SOURCE #CFLAGS_eal_thread.o := -D_GNU_SOURCE diff --git a/lib/librte_eal/common/eal_common_thread.c b/lib/librte_eal/common/eal_common_thread.c new file mode 100644 index 000..d996690 --- /dev/null +++ b/lib/librte_eal/common/eal_common_thread.c @@ -0,0 +1,142 @@ +/*- + * BSD LICENSE + * + * Copyright(c) 2010-2014 Intel Corporation. All rights reserved. + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * + * * Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * * Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in + * the documentation and/or other materials provided with the + * distribution. + * * Neither the name of Intel Corporation nor the names of its + * contributors may be used to endorse or promote products derived + * from this software without specific prior written permission. + * + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
+ */ + +#include +#include +#include +#include +#include +#include + +#include +#include +#include + +#include "eal_thread.h" + +int +rte_thread_set_affinity(rte_cpuset_t *cpusetp) +{ + int s; + unsigned lcore_id; + pthread_t tid; + + if (!cpusetp) + return -1; + + lcore_id = rte_lcore_id(); + if (lcore_id != (unsigned)LCORE_ID_ANY) { + /* EAL thread */ + tid = lcore_config[lcore_id].thread_id; + + s = pthread_setaffinity_np(tid, sizeof(rte_cpuset_t), cpusetp); + if (s != 0) { + RTE_LOG(ERR, EAL, "pthread_setaffinity_np failed\n"); + return -1; + } + + /* store socket_id in TLS for quick access */ + RTE_PER_LCORE(_socket_id) = + eal_cpuset_socket_id(cpusetp); + + /* store cpuset in TLS for quick access */ + rte_memcpy(&RTE_PER_LCORE(_cpuset), cpusetp, + sizeof(rte_cpuset_t)); + + /* update lcore_config */ + lcore_config[lcore_id].socket_id = RTE_PER_LCORE(_socket_id); + rte_memcpy(&lcore_config[lcore_id].cpuset, cpusetp, + sizeof(rte_cpuset_t)); + } else { + /* none EAL thread */ + tid = pthread_self(); + + s = pthread_setaffinity_np(tid, sizeof(rte_cpuset_t), cpusetp); + if (s != 0) { + RTE_LOG(ERR, EAL, "pthread_setaffinity_np failed\n"); + return -1; + } + + /* store cpuset in TLS for quick access */ + rte_memcpy(&RTE_PER_LCORE(_cpuset), cpusetp, + sizeof(rte_cpuset_t)); + + /* store socket_id in TLS for quick access */ + RTE_PER_LCORE(_socket_id) = +
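A possible usage sketch for the new API from a user-created (non-EAL) pthread, assuming the declarations added in patches 04 and 05. After a successful call, rte_socket_id() in that thread reflects the NUMA node of the chosen cpus (or SOCKET_ID_ANY if they span nodes); error handling is kept minimal and the cpu numbers are arbitrary examples. Build with -D_GNU_SOURCE:

#include <stdio.h>
#include <sched.h>
#include <rte_common.h>
#include <rte_lcore.h>

/* start routine of a user-created pthread (not launched by the EAL) */
static void *
user_thread(__rte_unused void *arg)
{
	rte_cpuset_t cpuset;

	/* pin this thread to cpus 2 and 3 */
	CPU_ZERO(&cpuset);
	CPU_SET(2, &cpuset);
	CPU_SET(3, &cpuset);

	if (rte_thread_set_affinity(&cpuset) < 0) {
		printf("rte_thread_set_affinity failed\n");
		return NULL;
	}

	/* lcore_id stays LCORE_ID_ANY; socket_id now tracks the cpuset */
	printf("non-EAL thread: lcore_id=%u socket_id=%u\n",
	       rte_lcore_id(), rte_socket_id());
	return NULL;
}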
[dpdk-dev] [PATCH v2 07/15] eal: apply affinity of EAL thread by assigned cpuset
EAL threads use assigned cpuset to set core affinity during startup. It keeps 1:1 mapping, if no '--lcores' option is used. Signed-off-by: Cunming Liang --- lib/librte_eal/bsdapp/eal/eal.c | 13 --- lib/librte_eal/bsdapp/eal/eal_thread.c | 63 +- lib/librte_eal/linuxapp/eal/eal.c| 7 +++- lib/librte_eal/linuxapp/eal/eal_thread.c | 67 +++- 4 files changed, 54 insertions(+), 96 deletions(-) diff --git a/lib/librte_eal/bsdapp/eal/eal.c b/lib/librte_eal/bsdapp/eal/eal.c index 69f3c03..98c5a83 100644 --- a/lib/librte_eal/bsdapp/eal/eal.c +++ b/lib/librte_eal/bsdapp/eal/eal.c @@ -432,6 +432,7 @@ rte_eal_init(int argc, char **argv) int i, fctret, ret; pthread_t thread_id; static rte_atomic32_t run_once = RTE_ATOMIC32_INIT(0); + char cpuset[CPU_STR_LEN]; if (!rte_atomic32_test_and_set(&run_once)) return -1; @@ -502,13 +503,17 @@ rte_eal_init(int argc, char **argv) if (rte_eal_pci_init() < 0) rte_panic("Cannot init PCI\n"); - RTE_LOG(DEBUG, EAL, "Master core %u is ready (tid=%p)\n", - rte_config.master_lcore, thread_id); - eal_check_mem_on_local_socket(); rte_eal_mcfg_complete(); + eal_thread_init_master(rte_config.master_lcore); + + eal_thread_dump_affinity(cpuset, CPU_STR_LEN); + + RTE_LOG(DEBUG, EAL, "Master lcore %u is ready (tid=%p;cpuset=[%s])\n", + rte_config.master_lcore, thread_id, cpuset); + if (rte_eal_dev_init() < 0) rte_panic("Cannot init pmd devices\n"); @@ -532,8 +537,6 @@ rte_eal_init(int argc, char **argv) rte_panic("Cannot create thread\n"); } - eal_thread_init_master(rte_config.master_lcore); - /* * Launch a dummy function on all slave lcores, so that master lcore * knows they are all ready when this function returns. diff --git a/lib/librte_eal/bsdapp/eal/eal_thread.c b/lib/librte_eal/bsdapp/eal/eal_thread.c index d0c077b..5b16302 100644 --- a/lib/librte_eal/bsdapp/eal/eal_thread.c +++ b/lib/librte_eal/bsdapp/eal/eal_thread.c @@ -103,55 +103,27 @@ eal_thread_set_affinity(void) { int s; pthread_t thread; - -/* - * According to the section VERSIONS of the CPU_ALLOC man page: - * - * The CPU_ZERO(), CPU_SET(), CPU_CLR(), and CPU_ISSET() macros were added - * in glibc 2.3.3. - * - * CPU_COUNT() first appeared in glibc 2.6. - * - * CPU_AND(), CPU_OR(), CPU_XOR(),CPU_EQUAL(),CPU_ALLOC(), - * CPU_ALLOC_SIZE(), CPU_FREE(), CPU_ZERO_S(), CPU_SET_S(), CPU_CLR_S(), - * CPU_ISSET_S(), CPU_AND_S(), CPU_OR_S(), CPU_XOR_S(), and CPU_EQUAL_S() - * first appeared in glibc 2.7. 
- */ -#if defined(CPU_ALLOC) - size_t size; - cpu_set_t *cpusetp; - - cpusetp = CPU_ALLOC(RTE_MAX_LCORE); - if (cpusetp == NULL) { - RTE_LOG(ERR, EAL, "CPU_ALLOC failed\n"); - return -1; - } - - size = CPU_ALLOC_SIZE(RTE_MAX_LCORE); - CPU_ZERO_S(size, cpusetp); - CPU_SET_S(rte_lcore_id(), size, cpusetp); + unsigned lcore_id = rte_lcore_id(); thread = pthread_self(); - s = pthread_setaffinity_np(thread, size, cpusetp); + s = pthread_setaffinity_np(thread, sizeof(cpuset_t), + &lcore_config[lcore_id].cpuset); if (s != 0) { RTE_LOG(ERR, EAL, "pthread_setaffinity_np failed\n"); - CPU_FREE(cpusetp); return -1; } - CPU_FREE(cpusetp); -#else /* CPU_ALLOC */ - cpuset_t cpuset; - CPU_ZERO( &cpuset ); - CPU_SET( rte_lcore_id(), &cpuset ); + /* acquire system unique id */ + rte_gettid(); + + /* store socket_id in TLS for quick access */ + RTE_PER_LCORE(_socket_id) = + eal_cpuset_socket_id(&lcore_config[lcore_id].cpuset); + + CPU_COPY(&lcore_config[lcore_id].cpuset, &RTE_PER_LCORE(_cpuset)); + + lcore_config[lcore_id].socket_id = RTE_PER_LCORE(_socket_id); - thread = pthread_self(); - s = pthread_setaffinity_np(thread, sizeof( cpuset ), &cpuset); - if (s != 0) { - RTE_LOG(ERR, EAL, "pthread_setaffinity_np failed\n"); - return -1; - } -#endif return 0; } @@ -174,6 +146,7 @@ eal_thread_loop(__attribute__((unused)) void *arg) unsigned lcore_id; pthread_t thread_id; int m2s, s2m; + char cpuset[CPU_STR_LEN]; thread_id = pthread_self(); @@ -185,9 +158,6 @@ eal_thread_loop(__attribute__((unused)) void *arg) if (lcore_id == RTE_MAX_LCORE) rte_panic("cannot retrieve lcore id\n"); - RTE_LOG(DEBUG, EAL, "Core %u is ready (tid=%p)\n", - lcore_id, thread_id); - m2s = lcore_config[lcore_id].pipe_master2slave[0]; s2m = lcore_config[lcore_id].pipe_slave2master[1]; @@ -198,6 +168,11
[dpdk-dev] [PATCH v2 08/15] enic: fix re-define freebsd compile complain
Some macro already been defined by freebsd 'sys/param.h'. Signed-off-by: Cunming Liang --- lib/librte_pmd_enic/enic.h| 1 + lib/librte_pmd_enic/enic_compat.h | 1 + 2 files changed, 2 insertions(+) diff --git a/lib/librte_pmd_enic/enic.h b/lib/librte_pmd_enic/enic.h index c43417c..189c3b9 100644 --- a/lib/librte_pmd_enic/enic.h +++ b/lib/librte_pmd_enic/enic.h @@ -66,6 +66,7 @@ #define ENIC_CALC_IP_CKSUM 1 #define ENIC_CALC_TCP_UDP_CKSUM 2 #define ENIC_MAX_MTU9000 +#undef PAGE_SIZE #define PAGE_SIZE 4096 #define PAGE_ROUND_UP(x) \ unsigned long)(x)) + PAGE_SIZE-1) & (~(PAGE_SIZE-1))) diff --git a/lib/librte_pmd_enic/enic_compat.h b/lib/librte_pmd_enic/enic_compat.h index b1af838..b84c766 100644 --- a/lib/librte_pmd_enic/enic_compat.h +++ b/lib/librte_pmd_enic/enic_compat.h @@ -67,6 +67,7 @@ #define pr_warn(y, args...) dev_warning(0, y, ##args) #define BUG() pr_err("BUG at %s:%d", __func__, __LINE__) +#undef ALIGN #define ALIGN(x, a) __ALIGN_MASK(x, (typeof(x))(a)-1) #define __ALIGN_MASK(x, mask)(((x)+(mask))&~(mask)) #define udelay usleep -- 1.8.1.4
[dpdk-dev] [PATCH v2 11/15] eal: set _lcore_id and _socket_id to (-1) by default
For those none EAL thread, *_lcore_id* shall always be LCORE_ID_ANY. The libraries using *_lcore_id* as index need to take care. *_socket_id* always be SOCKET_ID_ANY unitl the thread changes the affinity by rte_thread_set_affinity() Signed-off-by: Cunming Liang --- lib/librte_eal/bsdapp/eal/eal_thread.c | 4 ++-- lib/librte_eal/linuxapp/eal/eal_thread.c | 4 ++-- 2 files changed, 4 insertions(+), 4 deletions(-) diff --git a/lib/librte_eal/bsdapp/eal/eal_thread.c b/lib/librte_eal/bsdapp/eal/eal_thread.c index 5b16302..2b3c9a8 100644 --- a/lib/librte_eal/bsdapp/eal/eal_thread.c +++ b/lib/librte_eal/bsdapp/eal/eal_thread.c @@ -56,8 +56,8 @@ #include "eal_private.h" #include "eal_thread.h" -RTE_DEFINE_PER_LCORE(unsigned, _lcore_id); -RTE_DEFINE_PER_LCORE(unsigned, _socket_id); +RTE_DEFINE_PER_LCORE(unsigned, _lcore_id) = (unsigned)LCORE_ID_ANY; +RTE_DEFINE_PER_LCORE(unsigned, _socket_id) = (unsigned)SOCKET_ID_ANY; RTE_DEFINE_PER_LCORE(rte_cpuset_t, _cpuset); /* diff --git a/lib/librte_eal/linuxapp/eal/eal_thread.c b/lib/librte_eal/linuxapp/eal/eal_thread.c index 6eb1525..ab94e20 100644 --- a/lib/librte_eal/linuxapp/eal/eal_thread.c +++ b/lib/librte_eal/linuxapp/eal/eal_thread.c @@ -57,8 +57,8 @@ #include "eal_private.h" #include "eal_thread.h" -RTE_DEFINE_PER_LCORE(unsigned, _lcore_id); -RTE_DEFINE_PER_LCORE(unsigned, _socket_id); +RTE_DEFINE_PER_LCORE(unsigned, _lcore_id) = (unsigned)LCORE_ID_ANY; +RTE_DEFINE_PER_LCORE(unsigned, _socket_id) = (unsigned)SOCKET_ID_ANY; RTE_DEFINE_PER_LCORE(rte_cpuset_t, _cpuset); /* -- 1.8.1.4
[dpdk-dev] [PATCH v2 12/15] eal: fix recursive spinlock in non-EAL thread
In a non-EAL thread, lcore_id is always LCORE_ID_ANY, so it cannot be used as a
unique id for the recursive spinlock. Use rte_gettid() instead.

Signed-off-by: Cunming Liang
---
 lib/librte_eal/common/include/generic/rte_spinlock.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/lib/librte_eal/common/include/generic/rte_spinlock.h b/lib/librte_eal/common/include/generic/rte_spinlock.h
index dea885c..c7fb0df 100644
--- a/lib/librte_eal/common/include/generic/rte_spinlock.h
+++ b/lib/librte_eal/common/include/generic/rte_spinlock.h
@@ -179,7 +179,7 @@ static inline void rte_spinlock_recursive_init(rte_spinlock_recursive_t *slr)
  */
 static inline void rte_spinlock_recursive_lock(rte_spinlock_recursive_t *slr)
 {
-	int id = rte_lcore_id();
+	int id = rte_gettid();
 
 	if (slr->user != id) {
 		rte_spinlock_lock(&slr->sl);
@@ -212,7 +212,7 @@ static inline void rte_spinlock_recursive_unlock(rte_spinlock_recursive_t *slr)
  */
 static inline int rte_spinlock_recursive_trylock(rte_spinlock_recursive_t *slr)
 {
-	int id = rte_lcore_id();
+	int id = rte_gettid();
 
 	if (slr->user != id) {
 		if (rte_spinlock_trylock(&slr->sl) == 0)
-- 
1.8.1.4
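For illustration, a minimal sketch of why the change matters: with rte_gettid() as the owner id, the recursive lock behaves correctly whether it is taken from an EAL lcore or from a user-created pthread, whereas rte_lcore_id() is LCORE_ID_ANY for every non-EAL thread and would make them all look like a single owner:

#include <rte_spinlock.h>

static rte_spinlock_recursive_t lock = RTE_SPINLOCK_RECURSIVE_INITIALIZER;

/* Callable from any thread, EAL or not; may also be re-entered. */
static void
touch_shared_state(void)
{
	rte_spinlock_recursive_lock(&lock);
	/* re-entry is safe: the owner check now matches on the system tid */
	rte_spinlock_recursive_lock(&lock);

	/* ... update shared state ... */

	rte_spinlock_recursive_unlock(&lock);
	rte_spinlock_recursive_unlock(&lock);
}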
[dpdk-dev] [PATCH v2 13/15] mempool: add support to non-EAL thread
For non-EAL thread, bypass per lcore cache, directly use ring pool. It allows using rte_mempool in either EAL thread or any user pthread. As in non-EAL thread, it directly rely on rte_ring and it's none preemptive. It doesn't suggest to run multi-pthread/cpu which compete the rte_mempool. It will get bad performance and has critical risk if scheduling policy is RT. Signed-off-by: Cunming Liang --- lib/librte_mempool/rte_mempool.h | 18 +++--- 1 file changed, 11 insertions(+), 7 deletions(-) diff --git a/lib/librte_mempool/rte_mempool.h b/lib/librte_mempool/rte_mempool.h index 3314651..4845f27 100644 --- a/lib/librte_mempool/rte_mempool.h +++ b/lib/librte_mempool/rte_mempool.h @@ -198,10 +198,12 @@ struct rte_mempool { * Number to add to the object-oriented statistics. */ #ifdef RTE_LIBRTE_MEMPOOL_DEBUG -#define __MEMPOOL_STAT_ADD(mp, name, n) do { \ - unsigned __lcore_id = rte_lcore_id(); \ - mp->stats[__lcore_id].name##_objs += n; \ - mp->stats[__lcore_id].name##_bulk += 1; \ +#define __MEMPOOL_STAT_ADD(mp, name, n) do {\ + unsigned __lcore_id = rte_lcore_id(); \ + if (__lcore_id < RTE_MAX_LCORE) { \ + mp->stats[__lcore_id].name##_objs += n; \ + mp->stats[__lcore_id].name##_bulk += 1; \ + } \ } while(0) #else #define __MEMPOOL_STAT_ADD(mp, name, n) do {} while(0) @@ -767,8 +769,9 @@ __mempool_put_bulk(struct rte_mempool *mp, void * const *obj_table, __MEMPOOL_STAT_ADD(mp, put, n); #if RTE_MEMPOOL_CACHE_MAX_SIZE > 0 - /* cache is not enabled or single producer */ - if (unlikely(cache_size == 0 || is_mp == 0)) + /* cache is not enabled or single producer or none EAL thread */ + if (unlikely(cache_size == 0 || is_mp == 0 || +lcore_id >= RTE_MAX_LCORE)) goto ring_enqueue; /* Go straight to ring if put would overflow mem allocated for cache */ @@ -952,7 +955,8 @@ __mempool_get_bulk(struct rte_mempool *mp, void **obj_table, uint32_t cache_size = mp->cache_size; /* cache is not enabled or single consumer */ - if (unlikely(cache_size == 0 || is_mc == 0 || n >= cache_size)) + if (unlikely(cache_size == 0 || is_mc == 0 || +n >= cache_size || lcore_id >= RTE_MAX_LCORE)) goto ring_dequeue; cache = &mp->local_cache[lcore_id]; -- 1.8.1.4
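A small usage sketch, assuming a pool named "MBUF_POOL" created elsewhere by the application: from a non-EAL thread the per-lcore cache is bypassed and each get/put goes straight to the underlying ring, so, as the commit message warns, competing non-EAL threads sharing a core (especially with RT scheduling) should be avoided:

#include <stdio.h>
#include <rte_mempool.h>

static void
non_eal_thread_use_pool(void)
{
	/* "MBUF_POOL" is an example name for an existing pool */
	struct rte_mempool *mp = rte_mempool_lookup("MBUF_POOL");
	void *obj;

	if (mp == NULL)
		return;

	if (rte_mempool_get(mp, &obj) == 0) {
		/* ... use the object; no cache refill/flush happens here,
		 * the object comes directly from the ring ... */
		rte_mempool_put(mp, obj);
	}
}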
[dpdk-dev] [PATCH v2 06/15] eal: add rte_gettid() to acquire unique system tid
The rte_gettid() wraps the linux and freebsd syscall gettid(). It provides a persistent unique thread id for the calling thread. It will save the unique id in TLS on the first time. Signed-off-by: Cunming Liang --- lib/librte_eal/bsdapp/eal/eal_thread.c | 9 + lib/librte_eal/common/include/rte_eal.h | 27 +++ lib/librte_eal/linuxapp/eal/eal_thread.c | 7 +++ 3 files changed, 43 insertions(+) diff --git a/lib/librte_eal/bsdapp/eal/eal_thread.c b/lib/librte_eal/bsdapp/eal/eal_thread.c index 10220c7..d0c077b 100644 --- a/lib/librte_eal/bsdapp/eal/eal_thread.c +++ b/lib/librte_eal/bsdapp/eal/eal_thread.c @@ -39,6 +39,7 @@ #include #include #include +#include #include #include @@ -233,3 +234,11 @@ eal_thread_loop(__attribute__((unused)) void *arg) /* pthread_exit(NULL); */ /* return NULL; */ } + +/* require calling thread tid by gettid() */ +int rte_sys_gettid(void) +{ + long lwpid; + thr_self(&lwpid); + return (int)lwpid; +} diff --git a/lib/librte_eal/common/include/rte_eal.h b/lib/librte_eal/common/include/rte_eal.h index f4ecd2e..8ccdd65 100644 --- a/lib/librte_eal/common/include/rte_eal.h +++ b/lib/librte_eal/common/include/rte_eal.h @@ -41,6 +41,9 @@ */ #include +#include + +#include #ifdef __cplusplus extern "C" { @@ -262,6 +265,30 @@ rte_set_application_usage_hook( rte_usage_hook_t usage_func ); */ int rte_eal_has_hugepages(void); +/** + * A wrap API for syscall gettid. + * + * @return + * On success, returns the thread ID of calling process. + * It always successful. + */ +int rte_sys_gettid(void); + +/** + * Get system unique thread id. + * + * @return + * On success, returns the thread ID of calling process. + * It always successful. + */ +static inline int rte_gettid(void) +{ + static RTE_DEFINE_PER_LCORE(int, _thread_id) = -1; + if (RTE_PER_LCORE(_thread_id) == -1) + RTE_PER_LCORE(_thread_id) = rte_sys_gettid(); + return RTE_PER_LCORE(_thread_id); +} + #ifdef __cplusplus } #endif diff --git a/lib/librte_eal/linuxapp/eal/eal_thread.c b/lib/librte_eal/linuxapp/eal/eal_thread.c index 748a83a..ed20c93 100644 --- a/lib/librte_eal/linuxapp/eal/eal_thread.c +++ b/lib/librte_eal/linuxapp/eal/eal_thread.c @@ -39,6 +39,7 @@ #include #include #include +#include #include #include @@ -233,3 +234,9 @@ eal_thread_loop(__attribute__((unused)) void *arg) /* pthread_exit(NULL); */ /* return NULL; */ } + +/* require calling thread tid by gettid() */ +int rte_sys_gettid(void) +{ + return (int)syscall(SYS_gettid); +} -- 1.8.1.4
[dpdk-dev] [PATCH v2 14/15] ring: add support to non-EAL thread
ring debug stat won't take care non-EAL thread. Signed-off-by: Cunming Liang --- lib/librte_ring/rte_ring.h | 10 ++ 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h index 7cd5f2d..39bacdd 100644 --- a/lib/librte_ring/rte_ring.h +++ b/lib/librte_ring/rte_ring.h @@ -188,10 +188,12 @@ struct rte_ring { * The number to add to the object-oriented statistics. */ #ifdef RTE_LIBRTE_RING_DEBUG -#define __RING_STAT_ADD(r, name, n) do { \ - unsigned __lcore_id = rte_lcore_id(); \ - r->stats[__lcore_id].name##_objs += n; \ - r->stats[__lcore_id].name##_bulk += 1; \ +#define __RING_STAT_ADD(r, name, n) do {\ + unsigned __lcore_id = rte_lcore_id(); \ + if (__lcore_id < RTE_MAX_LCORE) { \ + r->stats[__lcore_id].name##_objs += n; \ + r->stats[__lcore_id].name##_bulk += 1; \ + } \ } while(0) #else #define __RING_STAT_ADD(r, name, n) do {} while(0) -- 1.8.1.4
[dpdk-dev] [PATCH v2 15/15] timer: add support to non-EAL thread
Allow to setup timers only for EAL (lcore) threads (__lcore_id < MAX_LCORE_ID). E.g. ? dynamically created thread will be able to reset/stop timer for lcore thread, but it will be not allowed to setup timer for itself or another non-lcore thread. rte_timer_manage() for non-lcore thread would simply do nothing and return straightway. Signed-off-by: Cunming Liang --- lib/librte_timer/rte_timer.c | 40 +++- lib/librte_timer/rte_timer.h | 2 +- 2 files changed, 32 insertions(+), 10 deletions(-) diff --git a/lib/librte_timer/rte_timer.c b/lib/librte_timer/rte_timer.c index 269a992..601c159 100644 --- a/lib/librte_timer/rte_timer.c +++ b/lib/librte_timer/rte_timer.c @@ -79,9 +79,10 @@ static struct priv_timer priv_timer[RTE_MAX_LCORE]; /* when debug is enabled, store some statistics */ #ifdef RTE_LIBRTE_TIMER_DEBUG -#define __TIMER_STAT_ADD(name, n) do { \ - unsigned __lcore_id = rte_lcore_id(); \ - priv_timer[__lcore_id].stats.name += (n); \ +#define __TIMER_STAT_ADD(name, n) do { \ + unsigned __lcore_id = rte_lcore_id(); \ + if (__lcore_id < RTE_MAX_LCORE) \ + priv_timer[__lcore_id].stats.name += (n); \ } while(0) #else #define __TIMER_STAT_ADD(name, n) do {} while(0) @@ -127,15 +128,26 @@ timer_set_config_state(struct rte_timer *tim, unsigned lcore_id; lcore_id = rte_lcore_id(); + if (lcore_id >= RTE_MAX_LCORE) + lcore_id = LCORE_ID_ANY; /* wait that the timer is in correct status before update, * and mark it as being configured */ while (success == 0) { prev_status.u32 = tim->status.u32; + /* +* prevent race condition of non-EAL threads +* to update the timer. When 'owner == LCORE_ID_ANY', +* it means updated by a non-EAL thread. +*/ + if (lcore_id == (unsigned)LCORE_ID_ANY && + (uint16_t)lcore_id == prev_status.owner) + return -1; + /* timer is running on another core, exit */ if (prev_status.state == RTE_TIMER_RUNNING && - (unsigned)prev_status.owner != lcore_id) + prev_status.owner != (uint16_t)lcore_id) return -1; /* timer is being configured on another core */ @@ -366,9 +378,13 @@ __rte_timer_reset(struct rte_timer *tim, uint64_t expire, /* round robin for tim_lcore */ if (tim_lcore == (unsigned)LCORE_ID_ANY) { - tim_lcore = rte_get_next_lcore(priv_timer[lcore_id].prev_lcore, - 0, 1); - priv_timer[lcore_id].prev_lcore = tim_lcore; + if (lcore_id < RTE_MAX_LCORE) { + tim_lcore = rte_get_next_lcore( + priv_timer[lcore_id].prev_lcore, + 0, 1); + priv_timer[lcore_id].prev_lcore = tim_lcore; + } else + tim_lcore = rte_get_next_lcore(LCORE_ID_ANY, 0, 1); } /* wait that the timer is in correct status before update, @@ -378,7 +394,8 @@ __rte_timer_reset(struct rte_timer *tim, uint64_t expire, return -1; __TIMER_STAT_ADD(reset, 1); - if (prev_status.state == RTE_TIMER_RUNNING) { + if (prev_status.state == RTE_TIMER_RUNNING && + lcore_id < RTE_MAX_LCORE) { priv_timer[lcore_id].updated = 1; } @@ -455,7 +472,8 @@ rte_timer_stop(struct rte_timer *tim) return -1; __TIMER_STAT_ADD(stop, 1); - if (prev_status.state == RTE_TIMER_RUNNING) { + if (prev_status.state == RTE_TIMER_RUNNING && + lcore_id < RTE_MAX_LCORE) { priv_timer[lcore_id].updated = 1; } @@ -499,6 +517,10 @@ void rte_timer_manage(void) uint64_t cur_time; int i, ret; + /* timer manager only runs on EAL thread */ + if (lcore_id >= RTE_MAX_LCORE) + return; + __TIMER_STAT_ADD(manage, 1); /* optimize for the case where per-cpu list is empty */ if (priv_timer[lcore_id].pending_head.sl_next[0] == NULL) diff --git a/lib/librte_timer/rte_timer.h b/lib/librte_timer/rte_timer.h index 4907cf5..5c5df91 100644 --- a/lib/librte_timer/rte_timer.h +++ 
b/lib/librte_timer/rte_timer.h @@ -76,7 +76,7 @@ extern "C" { #define RTE_TIMER_RUNNING 2 /**< State: timer function is running. */ #define RTE_TIMER_CONFIG 3 /**< State: timer is being configured. */ -#define RTE_TIMER_NO_OWNER -1 /**< Timer has no owner. */ +#define RTE_TIMER_NO_OWNER -2 /**< Timer has no owner. */ /** * Timer type: Periodic or single (one-shot). -- 1.8.1.4
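A usage sketch under the rules above: a non-EAL thread may arm a timer, but the callback has to target an EAL lcore (lcore 1 in this hypothetical example), since rte_timer_manage() now returns immediately for non-EAL threads. It assumes lcore 1 periodically calls rte_timer_manage() in its main loop:

#include <stdio.h>
#include <rte_common.h>
#include <rte_lcore.h>
#include <rte_cycles.h>
#include <rte_timer.h>

static struct rte_timer tim;

static void
timer_cb(__rte_unused struct rte_timer *t, __rte_unused void *arg)
{
	printf("timer fired on lcore %u\n", rte_lcore_id());
}

static void
arm_from_non_eal_thread(void)
{
	rte_timer_init(&tim);
	/* one-shot timer, one second from now, serviced by lcore 1 */
	if (rte_timer_reset(&tim, rte_get_timer_hz(), SINGLE,
			    1 /* tim_lcore */, timer_cb, NULL) < 0)
		printf("timer is busy or cannot be set up\n");
}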
[dpdk-dev] [PATCH v2 10/15] log: fix the gap to support non-EAL thread
For those non-EAL thread, *_lcore_id* is invalid and probably larger than RTE_MAX_LCORE. The patch adds the check and allows only EAL thread using EAL per thread log level and log type. Others shares the global log level. Signed-off-by: Cunming Liang --- lib/librte_eal/common/eal_common_log.c | 17 +++-- lib/librte_eal/common/include/rte_log.h | 5 + 2 files changed, 20 insertions(+), 2 deletions(-) diff --git a/lib/librte_eal/common/eal_common_log.c b/lib/librte_eal/common/eal_common_log.c index cf57619..e8dc94a 100644 --- a/lib/librte_eal/common/eal_common_log.c +++ b/lib/librte_eal/common/eal_common_log.c @@ -193,11 +193,20 @@ rte_set_log_type(uint32_t type, int enable) rte_logs.type &= (~type); } +/* Get global log type */ +uint32_t +rte_get_log_type(void) +{ + return rte_logs.type; +} + /* get the current loglevel for the message beeing processed */ int rte_log_cur_msg_loglevel(void) { unsigned lcore_id; lcore_id = rte_lcore_id(); + if (lcore_id >= RTE_MAX_LCORE) + return rte_get_log_level(); return log_cur_msg[lcore_id].loglevel; } @@ -206,6 +215,8 @@ int rte_log_cur_msg_logtype(void) { unsigned lcore_id; lcore_id = rte_lcore_id(); + if (lcore_id >= RTE_MAX_LCORE) + return rte_get_log_type(); return log_cur_msg[lcore_id].logtype; } @@ -265,8 +276,10 @@ rte_vlog(__attribute__((unused)) uint32_t level, /* save loglevel and logtype in a global per-lcore variable */ lcore_id = rte_lcore_id(); - log_cur_msg[lcore_id].loglevel = level; - log_cur_msg[lcore_id].logtype = logtype; + if (lcore_id < RTE_MAX_LCORE) { + log_cur_msg[lcore_id].loglevel = level; + log_cur_msg[lcore_id].logtype = logtype; + } ret = vfprintf(f, format, ap); fflush(f); diff --git a/lib/librte_eal/common/include/rte_log.h b/lib/librte_eal/common/include/rte_log.h index db1ea08..f83a0d9 100644 --- a/lib/librte_eal/common/include/rte_log.h +++ b/lib/librte_eal/common/include/rte_log.h @@ -144,6 +144,11 @@ uint32_t rte_get_log_level(void); void rte_set_log_type(uint32_t type, int enable); /** + * Get the global log type. + */ +uint32_t rte_get_log_type(void); + +/** * Get the current loglevel for the message being processed. * * Before calling the user-defined stream for logging, the log -- 1.8.1.4
[dpdk-dev] [PATCH v2 09/15] malloc: fix the issue of SOCKET_ID_ANY
Add check for rte_socket_id(), avoid get unexpected return like (-1). Signed-off-by: Cunming Liang --- lib/librte_malloc/malloc_heap.h | 7 ++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/lib/librte_malloc/malloc_heap.h b/lib/librte_malloc/malloc_heap.h index b4aec45..a47136d 100644 --- a/lib/librte_malloc/malloc_heap.h +++ b/lib/librte_malloc/malloc_heap.h @@ -44,7 +44,12 @@ extern "C" { static inline unsigned malloc_get_numa_socket(void) { - return rte_socket_id(); + unsigned socket_id = rte_socket_id(); + + if (socket_id == (unsigned)SOCKET_ID_ANY) + return 0; + + return socket_id; } void * -- 1.8.1.4
[dpdk-dev] ACL trie insertion and search
Hi,

We were converting the ACL rule data from host to network byte order [by
mistake] while inserting it into the trie, but while searching we were not
converting the search data to network byte order. Even with that mismatch,
most rules still matched, except in a few scenarios. After correcting the
mistake, all rules match perfectly.

I would have expected the original setup [converting only on insertion] to
either work for all rules or fail for all of them. Could someone clarify why
it behaved this way?

Regards,
Varun
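Whichever convention is used, the essential rule is that the values stored in the rules and the keys used for lookup must be converted identically. A generic sketch of that consistency rule (not tied to librte_acl internals; the helper names are only illustrative):

#include <stdint.h>
#include <rte_byteorder.h>

/*
 * If rule values and search keys use different byte orders, multi-byte
 * fields (e.g. IPv4 addresses) are compared with their bytes reversed and
 * matches become dependent on the particular values involved.
 */
static inline uint32_t
rule_key(uint32_t ipv4_host_order)
{
	return rte_cpu_to_be_32(ipv4_host_order); /* used when building rules */
}

static inline uint32_t
lookup_key(uint32_t ipv4_host_order)
{
	return rte_cpu_to_be_32(ipv4_host_order); /* must match rule_key() */
}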
[dpdk-dev] [PATCH v1 0/5] Interrupt mode for PMD
This patch series introduces low-latency one-shot Rx interrupts into DPDK,
together with an example that switches between polling and interrupt mode.

The DPDK userspace interrupt notification and handling mechanism is based on
UIO, which has the following limitations:

1) It is designed to handle the LSC interrupt only, with an inefficient
   suspended-pthread wakeup procedure (e.g. UIO wakes up the LSC interrupt
   handling thread, which then wakes up the DPDK polling thread). This
   introduces non-deterministic wakeup latency for the DPDK polling thread,
   and therefore packet latency, if it is used to handle Rx interrupts.
2) UIO only supports a single interrupt vector, which has to be shared by the
   LSC interrupt and the interrupts assigned to dedicated Rx queues.

This patchset includes the following features:

1) Enable one-shot Rx queue interrupts in the ixgbe PMD (PF & VF) and the igb
   PMD (PF only).
2) Build on top of the VFIO mechanism instead of UIO, so it can support up to
   64 interrupt vectors for Rx queue interrupts.
3) Have one DPDK polling thread handle each Rx queue interrupt with a
   dedicated VFIO eventfd, which eliminates non-deterministic pthread wakeup
   latency in user space.
4) Demonstrate the interrupt control APIs and a userspace NAPI-like
   polling/interrupt switch algorithm in the l3fwd-power example.

Known limitations:

1) It does not work for UIO, because a single interrupt eventfd shared by the
   LSC and Rx queue interrupt handlers causes a mess.
2) The LSC interrupt is not supported by the VF driver, so it is disabled by
   default in l3fwd-power for now. Feel free to turn it on if you want to
   support both LSC and Rx queue interrupts on a PF.

Danny Zhou (5):
  ethdev: add rx interrupt enable/disable functions
  ixgbe: enable rx queue interrupts for both PF and VF
  igb: enable rx queue interrupts for PF
  eal: add per rx queue interrupt handling based on VFIO
  L3fwd-power: enable one-shot rx interrupt and polling/interrupt mode switch

 examples/l3fwd-power/main.c                         | 170 +++---
 lib/librte_eal/common/include/rte_eal.h             |   9 +
 lib/librte_eal/linuxapp/eal/eal_interrupts.c        | 186 ---
 lib/librte_eal/linuxapp/eal/eal_pci_vfio.c          |  11 +-
 .../linuxapp/eal/include/exec-env/rte_interrupts.h  |   4 +
 lib/librte_ether/rte_ethdev.c                       |  45 +++
 lib/librte_ether/rte_ethdev.h                       |  57
 lib/librte_pmd_e1000/e1000/e1000_hw.h               |   3 +
 lib/librte_pmd_e1000/e1000_ethdev.h                 |   6 +
 lib/librte_pmd_e1000/igb_ethdev.c                   | 265 +--
 lib/librte_pmd_ixgbe/ixgbe_ethdev.c                 | 371 +
 lib/librte_pmd_ixgbe/ixgbe_ethdev.h                 |   9 +
 12 files changed, 1028 insertions(+), 108 deletions(-)

-- 
1.8.1.4
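For orientation, a heavily simplified sketch of the NAPI-like polling/interrupt switch used in l3fwd-power, based on the rte_eth_dev_rx_queue_intr_enable/disable() calls from patch 1/5. wait_rx_interrupt() is a placeholder name standing in for the VFIO/eventfd based blocking call added by the EAL patch, not the real API, and real code re-checks the queue after arming the interrupt to avoid missing packets that arrived meanwhile:

#include <stdint.h>
#include <rte_ethdev.h>

#define IDLE_SPINS_BEFORE_SLEEP 300
#define BURST_SIZE 32

/* placeholder for the VFIO-eventfd based wait added by the EAL patch */
extern void wait_rx_interrupt(uint8_t port, uint16_t queue);

static void
rx_loop(uint8_t port, uint16_t queue)
{
	struct rte_mbuf *pkts[BURST_SIZE];
	uint32_t idle = 0;

	for (;;) {
		uint16_t nb = rte_eth_rx_burst(port, queue, pkts, BURST_SIZE);

		if (nb > 0) {
			idle = 0;
			/* ... process packets ... */
			continue;
		}

		if (++idle < IDLE_SPINS_BEFORE_SLEEP)
			continue; /* keep polling for a while */

		/* idle long enough: arm the one-shot Rx interrupt and sleep */
		rte_eth_dev_rx_queue_intr_enable(port, queue);
		wait_rx_interrupt(port, queue);
		rte_eth_dev_rx_queue_intr_disable(port, queue);
		idle = 0;
	}
}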
[dpdk-dev] [PATCH v1 1/5] ethdev: add rx interrupt enable/disable functions
Signed-off-by: Danny Zhou --- lib/librte_ether/rte_ethdev.c | 45 ++ lib/librte_ether/rte_ethdev.h | 57 +++ 2 files changed, 102 insertions(+) diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c index ea3a1fb..dd66cd9 100644 --- a/lib/librte_ether/rte_ethdev.c +++ b/lib/librte_ether/rte_ethdev.c @@ -2825,6 +2825,51 @@ _rte_eth_dev_callback_process(struct rte_eth_dev *dev, } rte_spinlock_unlock(&rte_eth_dev_cb_lock); } + +int +rte_eth_dev_rx_queue_intr_enable(uint8_t port_id, + uint16_t queue_id) +{ + struct rte_eth_dev *dev; + + if (port_id >= nb_ports) { + PMD_DEBUG_TRACE("Invalid port_id=%d\n", port_id); + return (-ENODEV); + } + + dev = &rte_eth_devices[port_id]; + if (dev == NULL) { + PMD_DEBUG_TRACE("Invalid port device\n"); + return (-ENODEV); + } + + FUNC_PTR_OR_ERR_RET(*dev->dev_ops->rx_queue_intr_enable, -ENOTSUP); + (*dev->dev_ops->rx_queue_intr_enable)(dev, queue_id); + return 0; +} + +int +rte_eth_dev_rx_queue_intr_disable(uint8_t port_id, + uint16_t queue_id) +{ + struct rte_eth_dev *dev; + + if (port_id >= nb_ports) { + PMD_DEBUG_TRACE("Invalid port_id=%d\n", port_id); + return (-ENODEV); + } + + dev = &rte_eth_devices[port_id]; + if (dev == NULL) { + PMD_DEBUG_TRACE("Invalid port device\n"); + return (-ENODEV); + } + + FUNC_PTR_OR_ERR_RET(*dev->dev_ops->rx_queue_intr_disable, -ENOTSUP); + (*dev->dev_ops->rx_queue_intr_disable)(dev, queue_id); + return 0; +} + #ifdef RTE_NIC_BYPASS int rte_eth_dev_bypass_init(uint8_t port_id) { diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h index 1200c1c..c080039 100644 --- a/lib/librte_ether/rte_ethdev.h +++ b/lib/librte_ether/rte_ethdev.h @@ -848,6 +848,8 @@ struct rte_eth_fdir { struct rte_intr_conf { /** enable/disable lsc interrupt. 0 (default) - disable, 1 enable */ uint16_t lsc; + /** enable/disable rxq interrupt. 0 (default) - disable, 1 enable */ + uint16_t rxq; }; /** @@ -1108,6 +1110,14 @@ typedef int (*eth_tx_queue_setup_t)(struct rte_eth_dev *dev, const struct rte_eth_txconf *tx_conf); /**< @internal Setup a transmit queue of an Ethernet device. */ +typedef int (*eth_rx_enable_intr_t)(struct rte_eth_dev *dev, + uint16_t rx_queue_id); +/**< @internal Enable interrupt of a receive queue of an Ethernet device. */ + +typedef int (*eth_rx_disable_intr_t)(struct rte_eth_dev *dev, + uint16_t rx_queue_id); +/**< @internal Disable interrupt of a receive queue of an Ethernet device. */ + typedef void (*eth_queue_release_t)(void *queue); /**< @internal Release memory resources allocated by given RX/TX queue. */ @@ -1444,6 +1454,8 @@ struct eth_dev_ops { eth_queue_start_t tx_queue_start;/**< Start TX for a queue.*/ eth_queue_stop_t tx_queue_stop;/**< Stop TX for a queue.*/ eth_rx_queue_setup_t rx_queue_setup;/**< Set up device RX queue.*/ + eth_rx_enable_intr_t rx_queue_intr_enable; /**< Enable Rx queue interrupt. */ + eth_rx_disable_intr_t rx_queue_intr_disable; /**< Disable Rx queue interrupt.*/ eth_queue_release_trx_queue_release;/**< Release RX queue.*/ eth_rx_queue_count_t rx_queue_count; /**< Get Rx queue count. */ eth_rx_descriptor_done_t rx_descriptor_done; /**< Check rxd DD bit */ @@ -2810,6 +2822,51 @@ void _rte_eth_dev_callback_process(struct rte_eth_dev *dev, enum rte_eth_event_type event); /** + * When there is no rx packet coming in Rx Queue for a long time, we can + * sleep lcore related to RX Queue for power saving, and enable rx interrupt + * to be triggered when rx packect arrives. 
+ * + * The rte_eth_dev_rx_queue_intr_enable() function enables rx queue + * interrupt on specific rx queue of a port. + * + * @param port_id + * The port identifier of the Ethernet device. + * @param queue_id + * The index of the receive queue from which to retrieve input packets. + * The value must be in the range [0, nb_rx_queue - 1] previously supplied + * to rte_eth_dev_configure(). + * @return + * - (0) if successful. + * - (-ENOTSUP) if underlying hardware OR driver doesn't support + * that operation. + * - (-ENODEV) if *port_id* invalid. + */ +int rte_eth_dev_rx_queue_intr_enable(uint8_t port_id, + uint16_t queue_id); + +/** + * When lcore wakes up from rx interrupt indicating packet coming, disable rx + * interrupt and returns to polling mode. + * + * The rte_eth_dev_rx_queue_intr_disable() function di
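For reference, the per-queue interrupt path is only set up by a PMD when the new intr_conf.rxq flag added above is set at configure time. A minimal sketch of the corresponding rte_eth_conf, with the other fields left at illustrative defaults (this is an assumption about typical usage, not part of the patch):

static const struct rte_eth_conf port_conf = {
        .rxmode = {
                .mq_mode = ETH_MQ_RX_RSS,
        },
        .intr_conf = {
                .lsc = 0,   /* link state change interrupt not needed here */
                .rxq = 1,   /* enable per-rx-queue interrupts */
        },
};

/* later: rte_eth_dev_configure(port_id, nb_rxq, nb_txq, &port_conf); */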
[dpdk-dev] [PATCH v1 2/5] ixgbe: enable rx queue interrupts for both PF and VF
Signed-off-by: Danny Zhou --- lib/librte_pmd_ixgbe/ixgbe_ethdev.c | 371 lib/librte_pmd_ixgbe/ixgbe_ethdev.h | 9 + 2 files changed, 380 insertions(+) diff --git a/lib/librte_pmd_ixgbe/ixgbe_ethdev.c b/lib/librte_pmd_ixgbe/ixgbe_ethdev.c index b341dd0..39f883a 100644 --- a/lib/librte_pmd_ixgbe/ixgbe_ethdev.c +++ b/lib/librte_pmd_ixgbe/ixgbe_ethdev.c @@ -60,6 +60,7 @@ #include #include #include +#include #include #include "ixgbe_logs.h" @@ -173,6 +174,7 @@ static int ixgbe_dev_rss_reta_query(struct rte_eth_dev *dev, uint16_t reta_size); static void ixgbe_dev_link_status_print(struct rte_eth_dev *dev); static int ixgbe_dev_lsc_interrupt_setup(struct rte_eth_dev *dev); +static int ixgbe_dev_rxq_interrupt_setup(struct rte_eth_dev *dev); static int ixgbe_dev_interrupt_get_status(struct rte_eth_dev *dev); static int ixgbe_dev_interrupt_action(struct rte_eth_dev *dev); static void ixgbe_dev_interrupt_handler(struct rte_intr_handle *handle, @@ -186,11 +188,14 @@ static void ixgbe_dcb_init(struct ixgbe_hw *hw,struct ixgbe_dcb_config *dcb_conf /* For Virtual Function support */ static int eth_ixgbevf_dev_init(struct eth_driver *eth_drv, struct rte_eth_dev *eth_dev); +static int ixgbevf_dev_interrupt_get_status(struct rte_eth_dev *dev); +static int ixgbevf_dev_interrupt_action(struct rte_eth_dev *dev); static int ixgbevf_dev_configure(struct rte_eth_dev *dev); static int ixgbevf_dev_start(struct rte_eth_dev *dev); static void ixgbevf_dev_stop(struct rte_eth_dev *dev); static void ixgbevf_dev_close(struct rte_eth_dev *dev); static void ixgbevf_intr_disable(struct ixgbe_hw *hw); +static void ixgbevf_intr_enable(struct ixgbe_hw *hw); static void ixgbevf_dev_stats_get(struct rte_eth_dev *dev, struct rte_eth_stats *stats); static void ixgbevf_dev_stats_reset(struct rte_eth_dev *dev); @@ -198,8 +203,15 @@ static int ixgbevf_vlan_filter_set(struct rte_eth_dev *dev, uint16_t vlan_id, int on); static void ixgbevf_vlan_strip_queue_set(struct rte_eth_dev *dev, uint16_t queue, int on); +static void ixgbevf_set_ivar(struct ixgbe_hw *hw, s8 direction, u8 queue, u8 msix_vector); static void ixgbevf_vlan_offload_set(struct rte_eth_dev *dev, int mask); static void ixgbevf_set_vfta_all(struct rte_eth_dev *dev, bool on); +static void ixgbevf_dev_interrupt_handler(struct rte_intr_handle *handle, + void *param); +static int ixgbevf_dev_rx_queue_intr_enable(struct rte_eth_dev *dev, uint16_t queue_id); +static int ixgbevf_dev_rx_queue_intr_disable(struct rte_eth_dev *dev, uint16_t queue_id); +static void ixgbevf_set_ivar(struct ixgbe_hw *hw, s8 direction, u8 queue, u8 msix_vector); +static void ixgbevf_configure_msix(struct ixgbe_hw *hw); /* For Eth VMDQ APIs support */ static int ixgbe_uc_hash_table_set(struct rte_eth_dev *dev, struct @@ -217,6 +229,11 @@ static int ixgbe_mirror_rule_set(struct rte_eth_dev *dev, static int ixgbe_mirror_rule_reset(struct rte_eth_dev *dev, uint8_t rule_id); +static int ixgbe_dev_rx_queue_intr_enable(struct rte_eth_dev *dev, uint16_t queue_id); +static int ixgbe_dev_rx_queue_intr_disable(struct rte_eth_dev *dev, uint16_t queue_id); +static void ixgbe_set_ivar(struct ixgbe_hw *hw, s8 direction, u8 queue, u8 msix_vector); +static void ixgbe_configure_msix(struct ixgbe_hw *hw); + static int ixgbe_set_queue_rate_limit(struct rte_eth_dev *dev, uint16_t queue_idx, uint16_t tx_rate); static int ixgbe_set_vf_rate_limit(struct rte_eth_dev *dev, uint16_t vf, @@ -338,6 +355,8 @@ static struct eth_dev_ops ixgbe_eth_dev_ops = { .tx_queue_start = ixgbe_dev_tx_queue_start, .tx_queue_stop= 
ixgbe_dev_tx_queue_stop, .rx_queue_setup = ixgbe_dev_rx_queue_setup, + .rx_queue_intr_enable = ixgbe_dev_rx_queue_intr_enable, + .rx_queue_intr_disable = ixgbe_dev_rx_queue_intr_disable, .rx_queue_release = ixgbe_dev_rx_queue_release, .rx_queue_count = ixgbe_dev_rx_queue_count, .rx_descriptor_done = ixgbe_dev_rx_descriptor_done, @@ -412,8 +431,11 @@ static struct eth_dev_ops ixgbevf_eth_dev_ops = { .vlan_offload_set = ixgbevf_vlan_offload_set, .rx_queue_setup = ixgbe_dev_rx_queue_setup, .rx_queue_release = ixgbe_dev_rx_queue_release, + .rx_descriptor_done = ixgbe_dev_rx_descriptor_done, .tx_queue_setup = ixgbe_dev_tx_queue_setup, .tx_queue_release = ixgbe_dev_tx_queue_release, + .rx_queue_intr_enable = ixgbevf_dev_rx_queue_intr_enable, + .rx_queue_intr_disable = ixgbevf_dev_rx_queue_intr_disable, .mac_addr_add = ixgbevf_add_mac_addr, .mac_addr_remove = ixgbevf_remove_mac_addr, }; @@ -908,6 +930,9 @@ eth_ixgbe_dev_init(__attribute__((unused)) struct eth_driver *eth_drv, eth_dev->data->port_id,
[dpdk-dev] [PATCH v1 3/5] igb: enable rx queue interrupts for PF
Signed-off-by: Danny Zhou --- lib/librte_pmd_e1000/e1000/e1000_hw.h | 3 + lib/librte_pmd_e1000/e1000_ethdev.h | 6 + lib/librte_pmd_e1000/igb_ethdev.c | 265 ++ 3 files changed, 249 insertions(+), 25 deletions(-) diff --git a/lib/librte_pmd_e1000/e1000/e1000_hw.h b/lib/librte_pmd_e1000/e1000/e1000_hw.h index 4dd92a3..9b999ec 100644 --- a/lib/librte_pmd_e1000/e1000/e1000_hw.h +++ b/lib/librte_pmd_e1000/e1000/e1000_hw.h @@ -780,6 +780,9 @@ struct e1000_mac_info { u16 mta_reg_count; u16 uta_reg_count; + u32 max_rx_queues; + u32 max_tx_queues; + /* Maximum size of the MTA register table in all supported adapters */ #define MAX_MTA_REG 128 u32 mta_shadow[MAX_MTA_REG]; diff --git a/lib/librte_pmd_e1000/e1000_ethdev.h b/lib/librte_pmd_e1000/e1000_ethdev.h index d155e77..713ca11 100644 --- a/lib/librte_pmd_e1000/e1000_ethdev.h +++ b/lib/librte_pmd_e1000/e1000_ethdev.h @@ -34,6 +34,8 @@ #ifndef _E1000_ETHDEV_H_ #define _E1000_ETHDEV_H_ +#include + /* need update link, bit flag */ #define E1000_FLAG_NEED_LINK_UPDATE (uint32_t)(1 << 0) #define E1000_FLAG_MAILBOX (uint32_t)(1 << 1) @@ -105,10 +107,14 @@ #define E1000_FTQF_QUEUE_SHIFT 16 #define E1000_FTQF_QUEUE_ENABLE 0x0100 +/* maximum number of other interrupts besides Rx & Tx interrupts */ +#define E1000_MAX_OTHER_INTR 1 + /* structure for interrupt relative data */ struct e1000_interrupt { uint32_t flags; uint32_t mask; + rte_spinlock_t lock; }; /* local vfta copy */ diff --git a/lib/librte_pmd_e1000/igb_ethdev.c b/lib/librte_pmd_e1000/igb_ethdev.c index 2a268b8..2a9bf00 100644 --- a/lib/librte_pmd_e1000/igb_ethdev.c +++ b/lib/librte_pmd_e1000/igb_ethdev.c @@ -97,6 +97,7 @@ static int eth_igb_flow_ctrl_get(struct rte_eth_dev *dev, static int eth_igb_flow_ctrl_set(struct rte_eth_dev *dev, struct rte_eth_fc_conf *fc_conf); static int eth_igb_lsc_interrupt_setup(struct rte_eth_dev *dev); +static int eth_igb_rxq_interrupt_setup(struct rte_eth_dev *dev); static int eth_igb_interrupt_get_status(struct rte_eth_dev *dev); static int eth_igb_interrupt_action(struct rte_eth_dev *dev); static void eth_igb_interrupt_handler(struct rte_intr_handle *handle, @@ -191,6 +192,12 @@ static int eth_igb_filter_ctrl(struct rte_eth_dev *dev, enum rte_filter_op filter_op, void *arg); +static int eth_igb_rx_queue_intr_enable(struct rte_eth_dev *dev, uint16_t queue_id); +static int eth_igb_rx_queue_intr_disable(struct rte_eth_dev *dev, uint16_t queue_id); +static void eth_igb_assign_vector(struct e1000_hw *hw, s8 direction, u8 queue, u8 msix_vector); +static void eth_igb_configure_msix(struct e1000_hw *hw); +static void eth_igb_write_ivar(struct e1000_hw *hw, u8 msix_vector, u8 index, u8 offset); + /* * Define VF Stats MACRO for Non "cleared on read" register */ @@ -250,6 +257,8 @@ static struct eth_dev_ops eth_igb_ops = { .vlan_tpid_set= eth_igb_vlan_tpid_set, .vlan_offload_set = eth_igb_vlan_offload_set, .rx_queue_setup = eth_igb_rx_queue_setup, + .rx_queue_intr_enable = eth_igb_rx_queue_intr_enable, + .rx_queue_intr_disable = eth_igb_rx_queue_intr_disable, .rx_queue_release = eth_igb_rx_queue_release, .rx_queue_count = eth_igb_rx_queue_count, .rx_descriptor_done = eth_igb_rx_descriptor_done, @@ -592,6 +601,16 @@ eth_igb_dev_init(__attribute__((unused)) struct eth_driver *eth_drv, eth_dev->data->port_id, pci_dev->id.vendor_id, pci_dev->id.device_id); + /* set max interrupt vfio request */ + struct rte_eth_dev_info dev_info; + + memset(&dev_info, 0, sizeof(dev_info)); + eth_igb_infos_get(eth_dev, &dev_info); + + hw->mac.max_rx_queues = dev_info.max_rx_queues; + + 
pci_dev->intr_handle.max_intr = hw->mac.max_rx_queues + E1000_MAX_OTHER_INTR; + rte_intr_callback_register(&(pci_dev->intr_handle), eth_igb_interrupt_handler, (void *)eth_dev); @@ -754,7 +773,7 @@ eth_igb_start(struct rte_eth_dev *dev) { struct e1000_hw *hw = E1000_DEV_PRIVATE_TO_HW(dev->data->dev_private); - int ret, i, mask; + int ret, mask; uint32_t ctrl_ext; PMD_INIT_FUNC_TRACE(); @@ -794,6 +813,9 @@ eth_igb_start(struct rte_eth_dev *dev) /* configure PF module if SRIOV enabled */ igb_pf_host_configure(dev); + /* confiugre msix for sleep until rx interrupt */ + eth_igb_configure_msix(hw); + /* Configure for OS presence */ igb_init_manageability(hw); @@ -821,33 +843,9 @@ eth_igb_start(struct rte_eth_dev *dev) igb_vmdq_vlan_hw_filter_enable(dev); } - /* -* Configure the Interrupt Moderation register (EITR) with the maximum -
[dpdk-dev] [PATCH v1 4/5] eal: add per rx queue interrupt handling based on VFIO
Signed-off-by: Danny Zhou Signed-off-by: Yong Liu --- lib/librte_eal/common/include/rte_eal.h| 9 + lib/librte_eal/linuxapp/eal/eal_interrupts.c | 186 - lib/librte_eal/linuxapp/eal/eal_pci_vfio.c | 11 +- .../linuxapp/eal/include/exec-env/rte_interrupts.h | 4 + 4 files changed, 168 insertions(+), 42 deletions(-) diff --git a/lib/librte_eal/common/include/rte_eal.h b/lib/librte_eal/common/include/rte_eal.h index f4ecd2e..5f31aa5 100644 --- a/lib/librte_eal/common/include/rte_eal.h +++ b/lib/librte_eal/common/include/rte_eal.h @@ -150,6 +150,15 @@ int rte_eal_iopl_init(void); * - On failure, a negative error value. */ int rte_eal_init(int argc, char **argv); + +/** + * @param port_id + * the port id + * @return + * - On success, return 0 + */ +int rte_eal_wait_rx_intr(uint8_t port_id, uint8_t queue_id); + /** * Usage function typedef used by the application usage function. * diff --git a/lib/librte_eal/linuxapp/eal/eal_interrupts.c b/lib/librte_eal/linuxapp/eal/eal_interrupts.c index dc2668a..b120303 100644 --- a/lib/librte_eal/linuxapp/eal/eal_interrupts.c +++ b/lib/librte_eal/linuxapp/eal/eal_interrupts.c @@ -64,6 +64,7 @@ #include #include #include +#include #include "eal_private.h" #include "eal_vfio.h" @@ -127,6 +128,7 @@ static pthread_t intr_thread; #ifdef VFIO_PRESENT #define IRQ_SET_BUF_LEN (sizeof(struct vfio_irq_set) + sizeof(int)) +#define MSIX_IRQ_SET_BUF_LEN (sizeof(struct vfio_irq_set) + sizeof(int) * (VFIO_MAX_QUEUE_ID + 1)) /* enable legacy (INTx) interrupts */ static int @@ -221,7 +223,7 @@ vfio_disable_intx(struct rte_intr_handle *intr_handle) { /* enable MSI-X interrupts */ static int vfio_enable_msi(struct rte_intr_handle *intr_handle) { - int len, ret; + int len, ret, max_intr; char irq_set_buf[IRQ_SET_BUF_LEN]; struct vfio_irq_set *irq_set; int *fd_ptr; @@ -230,12 +232,19 @@ vfio_enable_msi(struct rte_intr_handle *intr_handle) { irq_set = (struct vfio_irq_set *) irq_set_buf; irq_set->argsz = len; - irq_set->count = 1; + if ((!intr_handle->max_intr) || + (intr_handle->max_intr > VFIO_MAX_QUEUE_ID)) + max_intr = VFIO_MAX_QUEUE_ID + 1; + else + max_intr = intr_handle->max_intr; + + irq_set->count = max_intr; irq_set->flags = VFIO_IRQ_SET_DATA_EVENTFD | VFIO_IRQ_SET_ACTION_TRIGGER; irq_set->index = VFIO_PCI_MSI_IRQ_INDEX; irq_set->start = 0; fd_ptr = (int *) &irq_set->data; - *fd_ptr = intr_handle->fd; + memcpy(fd_ptr, intr_handle->queue_fd, sizeof(intr_handle->queue_fd)); + fd_ptr[max_intr - 1] = intr_handle->fd; ret = ioctl(intr_handle->vfio_dev_fd, VFIO_DEVICE_SET_IRQS, irq_set); @@ -244,23 +253,6 @@ vfio_enable_msi(struct rte_intr_handle *intr_handle) { intr_handle->fd); return -1; } - - /* manually trigger interrupt to enable it */ - memset(irq_set, 0, len); - len = sizeof(struct vfio_irq_set); - irq_set->argsz = len; - irq_set->count = 1; - irq_set->flags = VFIO_IRQ_SET_DATA_NONE | VFIO_IRQ_SET_ACTION_TRIGGER; - irq_set->index = VFIO_PCI_MSI_IRQ_INDEX; - irq_set->start = 0; - - ret = ioctl(intr_handle->vfio_dev_fd, VFIO_DEVICE_SET_IRQS, irq_set); - - if (ret) { - RTE_LOG(ERR, EAL, "Error triggering MSI interrupts for fd %d\n", - intr_handle->fd); - return -1; - } return 0; } @@ -292,8 +284,8 @@ vfio_disable_msi(struct rte_intr_handle *intr_handle) { /* enable MSI-X interrupts */ static int vfio_enable_msix(struct rte_intr_handle *intr_handle) { - int len, ret; - char irq_set_buf[IRQ_SET_BUF_LEN]; + int len, ret, max_intr; + char irq_set_buf[MSIX_IRQ_SET_BUF_LEN]; struct vfio_irq_set *irq_set; int *fd_ptr; @@ -301,12 +293,19 @@ vfio_enable_msix(struct rte_intr_handle 
*intr_handle) { irq_set = (struct vfio_irq_set *) irq_set_buf; irq_set->argsz = len; - irq_set->count = 1; + if ((!intr_handle->max_intr) || + (intr_handle->max_intr > VFIO_MAX_QUEUE_ID)) + max_intr = VFIO_MAX_QUEUE_ID + 1; + else + max_intr = intr_handle->max_intr; + + irq_set->count = max_intr; irq_set->flags = VFIO_IRQ_SET_DATA_EVENTFD | VFIO_IRQ_SET_ACTION_TRIGGER; irq_set->index = VFIO_PCI_MSIX_IRQ_INDEX; irq_set->start = 0; fd_ptr = (int *) &irq_set->data; - *fd_ptr = intr_handle->fd; + memcpy(fd_ptr, intr_handle->queue_fd, sizeof(intr_handle->queue_fd)); + fd_ptr[max_intr - 1] = intr_handle->fd; ret = ioctl(intr_handle->vfio_dev_fd, VFIO_DEVICE_SET_IRQS, irq_set); @@ -316,22 +315,6 @@ vfio_enable_msix(struct rte_intr_handle *intr_handle) {
[dpdk-dev] vhost: virtio-net rx-ring stop work after work many hours, bug?
> -Original Message- > From: Linhaifeng [mailto:haifeng.lin at huawei.com] > Sent: Tuesday, January 27, 2015 3:57 PM > To: dpd >> dev at dpdk.org; ms >> Michael S. Tsirkin > Cc: lilijun; liuyongan at huawei.com; Xie, Huawei > Subject: vhost: virtio-net rx-ring stop work after work many hours,bug? > > Hi,all > > I use vhost-user to send data to VM at first it cant work well but after many > hours VM can not receive data but can send data. > > (gdb)p avail_idx > $4 = 2668 > (gdb)p free_entries > $5 = 0 > (gdb)l > /* check that we have enough buffers */ > if (unlikely(count > free_entries)) > count = free_entries; > > if (count == 0){ > int b=0; > if(b) { // when set b=1 to notify guest rx_ring will restart to > work > if (!(vq->avail->flags & VRING_AVAIL_F_NO_INTERRUPT)) { > > eventfd_write(vq->callfd, 1); > } > } > return 0; > } > > some info i print in guest: > > net eth3:vi->num=199 > net eth3:rvq info: num_free=57, used->idx=2668, avail->idx=2668 > net eth3:svq info: num_free=254, used->idx=1644, avail->idx=1644 > > net eth3:vi->num=199 > net eth3:rvq info: num_free=57, used->idx=2668, avail->idx=2668 > net eth3:svq info: num_free=254, used->idx=1645, avail->idx=1645 > > net eth3:vi->num=199 > net eth3:rvq info: num_free=57, used->idx=2668, avail->idx=2668 > net eth3:svq info: num_free=254, used->idx=1646, avail->idx=1646 > > # free > total used free sharedbuffers cached > Mem: 3924100 3372523586848 0 95984 138060 > -/+ buffers/cache: 1032083820892 > Swap: 970748 0 970748 > > I have two questions: > 1.Should we need to notify guest when there is no buffer in vq->avail? > 2.Why virtio_net stop to fill avail? > > Haifeng: Thanks for reporting this issue. It might not be vhost-user specific, because as long vhost-user has received all the vring information correctly, it shares the same code receiving/transmitting packets with vhost-cuse. Are you using latest patch or the old patch? 1 Do you disable merge-able feature support in vhost example? There is an bug in vhost-user feature negotiation which is fixed in latest patch. It could cause guest not receive packets at all. So if you are testing only using linux net device, this isn't the cause. 2.Do you still have the spot? Could you check if there are available descriptors from checking the desc ring or even dump the vring status? Check the notify_on_empty flag Michael mentioned? I find a bug in vhost library when processing three or more chained descriptors. But if you never re-configure eth0 with different features, this isn't the cause. 3. Is this reproduce-able? Next time if you run long hours stability test, could you try to disable guest virtio feature? -device virtio-net-pci,netdev=mynet0,mac=54:00:00:54:00:01,csum=off,gso=off,guest_tso4=off,guest_tso6=off,guest_ecn=off I have run more than ten hours' nightly test many times before, and haven't met this issue. We will check * if there is issue in the vhost code delivering interrupts to guest which cause potential deadlock *if there are places we should but miss delivering interrupts to guest. > > > > > -- > Regards, > Haifeng
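For clarity, this is a cleaned-up sketch of the workaround described in the report above (stripped of the debugging toggle): when the avail ring has no free entries, kick the guest unless it has suppressed interrupts, instead of returning silently. Whether this is the correct fix, or merely hides a missed-notification bug elsewhere, is exactly what the thread is trying to establish; the fragment is meant to sit at the same point in the vhost rx path as the quoted code and uses only the names shown there.

        if (count == 0) {
                /* workaround from the report: notify the guest so the
                 * rx ring restarts, unless interrupts are suppressed */
                if (!(vq->avail->flags & VRING_AVAIL_F_NO_INTERRUPT))
                        eventfd_write(vq->callfd, 1);
                return 0;
        }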
[dpdk-dev] [PATCH v1 5/5] L3fwd-power: enable one-shot rx interrupt and polling/interrupt mode switch
Signed-off-by: Danny Zhou --- examples/l3fwd-power/main.c | 170 +--- 1 file changed, 129 insertions(+), 41 deletions(-) diff --git a/examples/l3fwd-power/main.c b/examples/l3fwd-power/main.c index f6b55b9..e6e4f55 100644 --- a/examples/l3fwd-power/main.c +++ b/examples/l3fwd-power/main.c @@ -75,12 +75,13 @@ #include #include #include +#include #define RTE_LOGTYPE_L3FWD_POWER RTE_LOGTYPE_USER1 #define MAX_PKT_BURST 32 -#define MIN_ZERO_POLL_COUNT 5 +#define MIN_ZERO_POLL_COUNT 10 /* around 100ms at 2 Ghz */ #define TIMER_RESOLUTION_CYCLES 2ULL @@ -188,6 +189,9 @@ struct lcore_rx_queue { #define MAX_TX_QUEUE_PER_PORT RTE_MAX_ETHPORTS #define MAX_RX_QUEUE_PER_PORT 128 +#define MAX_RX_QUEUE_INTERRUPT_PER_PORT 16 + + #define MAX_LCORE_PARAMS 1024 struct lcore_params { uint8_t port_id; @@ -214,7 +218,7 @@ static uint16_t nb_lcore_params = sizeof(lcore_params_array_default) / static struct rte_eth_conf port_conf = { .rxmode = { - .mq_mode= ETH_MQ_RX_RSS, + .mq_mode = ETH_MQ_RX_RSS, .max_rx_pkt_len = ETHER_MAX_LEN, .split_hdr_size = 0, .header_split = 0, /**< Header Split disabled */ @@ -226,11 +230,14 @@ static struct rte_eth_conf port_conf = { .rx_adv_conf = { .rss_conf = { .rss_key = NULL, - .rss_hf = ETH_RSS_IP, + .rss_hf = ETH_RSS_UDP, }, }, .txmode = { - .mq_mode = ETH_DCB_NONE, + .mq_mode = ETH_MQ_TX_NONE, + }, + .intr_conf = { + .rxq = 1, /**< rxq interrupt feature enabled */ }, }; @@ -402,19 +409,22 @@ power_timer_cb(__attribute__((unused)) struct rte_timer *tim, /* accumulate total execution time in us when callback is invoked */ sleep_time_ratio = (float)(stats[lcore_id].sleep_time) / (float)SCALING_PERIOD; - /** * check whether need to scale down frequency a step if it sleep a lot. */ - if (sleep_time_ratio >= SCALING_DOWN_TIME_RATIO_THRESHOLD) - rte_power_freq_down(lcore_id); + if (sleep_time_ratio >= SCALING_DOWN_TIME_RATIO_THRESHOLD) { + if (rte_power_freq_down) + rte_power_freq_down(lcore_id); + } else if ( (unsigned)(stats[lcore_id].nb_rx_processed / - stats[lcore_id].nb_iteration_looped) < MAX_PKT_BURST) + stats[lcore_id].nb_iteration_looped) < MAX_PKT_BURST) { /** * scale down a step if average packet per iteration less * than expectation. 
*/ - rte_power_freq_down(lcore_id); + if (rte_power_freq_down) + rte_power_freq_down(lcore_id); + } /** * initialize another timer according to current frequency to ensure @@ -707,22 +717,20 @@ l3fwd_simple_forward(struct rte_mbuf *m, uint8_t portid, } -#define SLEEP_GEAR1_THRESHOLD100 -#define SLEEP_GEAR2_THRESHOLD1000 +#define MINIMUM_SLEEP_TIME 1 +#define SUSPEND_THRESHOLD 300 static inline uint32_t power_idle_heuristic(uint32_t zero_rx_packet_count) { - /* If zero count is less than 100, use it as the sleep time in us */ - if (zero_rx_packet_count < SLEEP_GEAR1_THRESHOLD) - return zero_rx_packet_count; - /* If zero count is less than 1000, sleep time should be 100 us */ - else if ((zero_rx_packet_count >= SLEEP_GEAR1_THRESHOLD) && - (zero_rx_packet_count < SLEEP_GEAR2_THRESHOLD)) - return SLEEP_GEAR1_THRESHOLD; - /* If zero count is greater than 1000, sleep time should be 1000 us */ - else if (zero_rx_packet_count >= SLEEP_GEAR2_THRESHOLD) - return SLEEP_GEAR2_THRESHOLD; + /* If zero count is less than 100, sleep 1us */ + if (zero_rx_packet_count < SUSPEND_THRESHOLD) + return MINIMUM_SLEEP_TIME; + /* If zero count is less than 1000, sleep 100 us which is the minimum latency + switching from C3/C6 to C0 + */ + else + return SUSPEND_THRESHOLD; return 0; } @@ -762,6 +770,35 @@ power_freq_scaleup_heuristic(unsigned lcore_id, return FREQ_CURRENT; } +/** + * force polling thread sleep until one-shot rx interrupt triggers + * @param port_id + * Port id. + * @param queue_id + * Rx queue id. + * @return + * 0 on success + */ +static int +sleep_until_rx_interrupt(uint8_t port_id, uint8_t queue_id) +{ + /* Enable one-shot rx interrupt */ + rte_eth_dev_rx_queue_intr_enable(port_id, queue_id); + + RTE_LOG(INFO, L3FWD_POWER, + "lcore %u sleeps until interrupt on port%d,rxq%d triggers\n", + rte_lcore
[dpdk-dev] [PATCH v2] maintainers: start a Linux-style file
This MAINTAINERS file is inspired from the Linux one. Almost all files are split into areas in order to identify maintainers of each DPDK area. Note that a maintainer is not a git tree manager. Candidates are welcome to send a patch to sign up for one or several areas. There is a script to check coverage, especially when adding or moving files. Signed-off-by: Thomas Monjalon Acked-by: Neil Horman --- Changes in v2: - add copyright and licence to check-maintainers.sh - minor improvements in the script --- MAINTAINERS | 388 +++ scripts/check-maintainers.sh | 117 + 2 files changed, 505 insertions(+) create mode 100644 MAINTAINERS create mode 100755 scripts/check-maintainers.sh diff --git a/MAINTAINERS b/MAINTAINERS new file mode 100644 index 000..1f7d04a --- /dev/null +++ b/MAINTAINERS @@ -0,0 +1,388 @@ +DPDK Maintainers + + +The intention of this file is to provide a set of names that we can rely on +for helping in patch reviews and questions. +These names are additional recipients for emails sent to dev at dpdk.org. +Please avoid private emails. + +Descriptions of section entries: + + M: Maintainer's Full Name + T: Git tree location. + F: Files and directories with wildcard patterns. + A trailing slash includes all files and subdirectory files. + A wildcard includes all files but not subdirectories. + One pattern per line. Multiple F: lines acceptable. + X: Files and directories exclusion, same rules as F: + K: Keyword regex pattern to match content. + One regex pattern per line. Multiple K: lines acceptable. + + +General Project Administration +-- +M: Thomas Monjalon +T: git://dpdk.org/dpdk +F: MAINTAINERS +F: scripts/check-maintainers.sh + + +Security Issues +--- +M: maintainers at dpdk.org + + +Documentation (with overlaps) +- +F: doc/ + + +Build System + +F: GNUmakefile +F: Makefile +F: config/ +F: mk/ +F: pkg/ +F: scripts/depdirs-rule.sh +F: scripts/gen-build-mk.sh +F: scripts/gen-config-h.sh +F: scripts/relpath.sh + + +Environment Abstraction Layer +- + +EAL API and common code +M: Thomas Monjalon +F: lib/librte_eal/common/* +F: lib/librte_eal/common/include/* +F: lib/librte_eal/common/include/generic/ +F: app/test/test_alarm.c +F: app/test/test_atomic.c +F: app/test/test_byteorder.c +F: app/test/test_common.c +F: app/test/test_cpuflags.c +F: app/test/test_cycles.c +F: app/test/test_debug.c +F: app/test/test_devargs.c +F: app/test/test_eal* +F: app/test/test_errno.c +F: app/test/test_func_reentrancy.c +F: app/test/test_interrupts.c +F: app/test/test_logs.c +F: app/test/test_memcpy* +F: app/test/test_memory.c +F: app/test/test_memzone.c +F: app/test/test_pci.c +F: app/test/test_per_lcore.c +F: app/test/test_prefetch.c +F: app/test/test_rwlock.c +F: app/test/test_spinlock.c +F: app/test/test_string_fns.c +F: app/test/test_tailq.c +F: app/test/test_version.c + +Secondary process +K: RTE_PROC_ +F: doc/guides/prog_guide/multi_proc_support.rst +F: app/test/test_mp_secondary.c +F: examples/multi_process/ +F: doc/guides/sample_app_ug/multi_process.rst + +IBM Power +F: lib/librte_eal/common/include/arch/ppc_64/ + +Intel x86 +F: lib/librte_eal/common/include/arch/x86/ + +Linux EAL (with overlaps) +F: lib/librte_eal/linuxapp/Makefile +F: lib/librte_eal/linuxapp/eal/ +F: doc/guides/linux_gsg/ + +Linux UIO +F: lib/librte_eal/linuxapp/igb_uio/ +F: lib/librte_eal/linuxapp/eal/*uio* + +Linux VFIO +F: lib/librte_eal/linuxapp/eal/*vfio* + +Linux Xen +F: lib/librte_eal/linuxapp/xen_dom0/ +F: lib/librte_eal/linuxapp/eal/*xen* +F: lib/librte_eal/linuxapp/eal/include/exec-env/rte_dom0_common.h +F: 
lib/librte_mempool/rte_dom0_mempool.c +F: lib/librte_pmd_xenvirt/ +F: app/test-pmd/mempool_* +F: examples/vhost_xen/ +F: doc/guides/prog_guide/intel_dpdk_xen_based_packet_switch_sol.rst + +FreeBSD EAL (with overlaps) +F: lib/librte_eal/bsdapp/Makefile +F: lib/librte_eal/bsdapp/eal/ +F: doc/guides/freebsd_gsg/ + +FreeBSD contigmem +F: lib/librte_eal/bsdapp/contigmem/ + +FreeBSD UIO +F: lib/librte_eal/bsdapp/nic_uio/ + + +Core Libraries +-- + +Memory management +F: lib/librte_malloc/ +F: doc/guides/prog_guide/malloc_lib.rst +F: app/test/test_malloc.c +F: lib/librte_mempool/ +F: doc/guides/prog_guide/mempool_lib.rst +F: app/test/test_mempool* +F: app/test/test_func_reentrancy.c + +Ring queue +F: lib/librte_ring/ +F: app/test/test_ring* +F: app/test/test_func_reentrancy.c + +Packet buffer +F: lib/librte_mbuf/ +F: doc/guides/prog_guide/mbuf_lib.rst +F: app/test/test_mbuf.c + +Ethernet API +M: Thomas Monjalon +F: lib/librte_ether/ + + +Drivers +--- + +Link bonding +F: lib/librte_pmd_bond/ +F: doc/guides/prog_guide/link_bonding_poll_mode_drv_lib.rst +F: app/test/test_link_bonding.c + +Linux KNI +F: lib/librte_eal/linuxapp/kni/ +F: lib/librte_kni/ +F: doc/guides/pr
[dpdk-dev] [PATCH] mk: allow application to override clean
Hi Stephen, On 01/23/2015 07:19 AM, stephen at networkplumber.org wrote: > From: Stephen Hemminger > > In some cases application may want to have additional rules > for clean. This can be handled by allowing the double colon > form of rule. > > https://www.gnu.org/software/make/manual/html_node/Double_002dColon.html There is already a way to do that in dpdk makefiles: you can add the following code in your application Makefile, before the line that includes $(RTE_SDK)/mk/rte.app.mk:

POSTCLEAN += my_clean

.PHONY: my_clean
my_clean:
	@echo executed after clean

Regards, Olivier
[dpdk-dev] [PATCH 0/6] Support NVGRE on i40e
Hi Min, On 01/27/2015 06:46 AM, Cao, Min wrote: > Test by: min.cao > Patch name: [dpdk-dev] [PATCH 0/6] Support NVGRE on i40e > Test Flag:Tested-by > Tester name: min.cao at intel.com > Result summary: total 2 cases, 2 passed, 0 failed > > Test Case 1: > Name: nvgre filter > Environment: OS: Fedora20 3.11.10-301.fc20.x86_64 > gcc (GCC) 4.8.2 > CPU: Intel(R) Xeon(R) CPU E5-2680 0 @ 2.70GHz > NIC: Fortville eagle > [...] Just one remark about your test report: it's quite useful to have such reports, showing that a feature is tested. However, I think it would be much better if you provided all the means to reproduce the test (testpmd configuration, scripts to generate packets, ...). For instance, this could help people wanting to implement the same on another PMD to validate it with the same test plan as yours. Regards, Olivier
[dpdk-dev] ACL trie insertion and search
Hi > -Original Message- > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Rapelly, Varun > Sent: Wednesday, January 28, 2015 9:07 AM > To: dev at dpdk.org > Subject: [dpdk-dev] ACL trie insertion and search > > Hi, > > We were converting the acl rule data, from host to network byte order, [by mistake] while inserting into trie. And while searching we > are not converting the search data to n/w byte order. > With the above also rules are matching, except few scenarios. > > After correcting the above mistake, all rules are matching perfectly fine. > > I believe the above[converting while insertion] should also work for all > rules. Please clarify the above. Yes, that's correct. rte_acl_add_rules() expects all fields to be in host byte order, while rte_acl_classify() expects all fields in the input data buffers to be in network byte order. As mentioned in the comments in rte_acl.h. Konstantin > > Regards, > Varun
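To illustrate the convention Konstantin describes, here is a minimal sketch: rule fields are given in host byte order at rte_acl_add_rules() time, while the search data passed to rte_acl_classify() points straight at the packet and is left in network byte order. The single-field rule layout and the matching rte_acl_config used when building the context are assumptions made for the example and are omitted for brevity.

#include <string.h>
#include <rte_acl.h>
#include <rte_ip.h>   /* IPv4() helper used below */

/* one 32-bit field (say, an IPv4 destination address) is enough to show
 * the byte-order convention */
RTE_ACL_RULE_DEF(acl_ipv4_rule, 1);

static void
add_example_rule(struct rte_acl_ctx *ctx)
{
        struct acl_ipv4_rule r;

        memset(&r, 0, sizeof(r));
        r.data.category_mask = 1;
        r.data.priority = 1;
        r.data.userdata = 1;

        /* host byte order at insertion time: 10.0.0.0/8 */
        r.field[0].value.u32 = IPv4(10, 0, 0, 0);
        r.field[0].mask_range.u32 = 8;

        rte_acl_add_rules(ctx, (struct rte_acl_rule *)&r, 1);
}

static void
classify_example(struct rte_acl_ctx *ctx, const uint8_t *ipv4_dst_in_pkt)
{
        /* network byte order at search time: point into the packet,
         * no swapping of the search data */
        const uint8_t *data[1] = { ipv4_dst_in_pkt };
        uint32_t results[1];

        rte_acl_classify(ctx, data, results, 1, 1);
}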
[dpdk-dev] Regarding UDP checksum offload
Hi, I am aware that this topic has been discussed several times before, but I am somehow still stuck with this. I am using dpdk 1.6r1, intel 82599 NIC. I have an mbuf, I have hand-constructed a UDP packet (IPv4) in the data portion, filled the relevant fields of the headers and I do a tx burst. No problems, the destination gets the packet. I filled UDP checksum as zero and there was no checksum offloaded in ol_flags. Now in the same usecase, I want to offload UDP checksum. I am aware that the checksum field in UDP header has to be filled with the pseudo header checksum, I did that, duly added the PKT_TX_UDP_CKSUM flag in ol_flags, did a tx_burst and the packet does not reach the destination. I realized that I have to fill the following fields as well (my packet does not have vlan tag) mbuf->pkt.vlan_macip.f.l2_len mbuf->pkt.vlan_macip.f.l3_len so I filled the l2_len as 14 and l3_len as 20 (IP header with no options) Yet the packet did not reach the destination. So my question is -- am I filling the l2_len and l3_len properly ? Is there anything else to be done before I can get this UDP checksum offload to work properly for me. Regards -Prashant
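For reference, a minimal sketch of the transmit-side setup being described, using the dpdk 1.6 era mbuf fields named in this thread; the pseudo-header sum helper is written out by hand purely for illustration (it is not a library call), and whether the missing piece is here or elsewhere in the setup is what the thread goes on to debug.

#include <stdint.h>
#include <stddef.h>
#include <rte_byteorder.h>
#include <rte_ip.h>
#include <rte_udp.h>
#include <rte_mbuf.h>

/* ones'-complement sum of 16-bit words, folded to 16 bits (no final ~) */
static uint16_t
sum16(const uint16_t *p, size_t len)
{
        uint32_t sum = 0;

        while (len >= 2) {
                sum += *p++;
                len -= 2;
        }
        while (sum >> 16)
                sum = (sum & 0xffff) + (sum >> 16);
        return (uint16_t)sum;
}

/* pseudo-header seed to place in udp->dgram_cksum before offloading */
static uint16_t
ipv4_udp_phdr_sum(const struct ipv4_hdr *ip)
{
        struct {
                uint32_t src_addr;
                uint32_t dst_addr;
                uint8_t  zero;
                uint8_t  proto;
                uint16_t len;
        } __attribute__((__packed__)) psd;

        psd.src_addr = ip->src_addr;
        psd.dst_addr = ip->dst_addr;
        psd.zero = 0;
        psd.proto = ip->next_proto_id;
        psd.len = rte_cpu_to_be_16(
                (uint16_t)(rte_be_to_cpu_16(ip->total_length) -
                           sizeof(struct ipv4_hdr)));

        return sum16((const uint16_t *)&psd, sizeof(psd));
}

static void
prepare_udp_tx_cksum_offload(struct rte_mbuf *m, struct ipv4_hdr *ip,
                             struct udp_hdr *udp)
{
        udp->dgram_cksum = ipv4_udp_phdr_sum(ip);

        /* dpdk 1.6 mbuf layout, as used in this thread */
        m->pkt.vlan_macip.f.l2_len = 14;   /* ethernet header, no vlan */
        m->pkt.vlan_macip.f.l3_len = 20;   /* ipv4 header without options */
        m->ol_flags |= PKT_TX_UDP_CKSUM;
}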
[dpdk-dev] [PATCH v2 0/3] PMD ring MAC management, fix initialization, link up/down
> -Original Message- > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Tomasz Kulasek > Sent: Monday, January 19, 2015 11:57 AM > To: dev at dpdk.org > Subject: [dpdk-dev] [PATCH v2 0/3] PMD ring MAC management, fix > initialization, > link up/down > > Patch split into smaller parts to separate features from previous version. > > Tomasz Kulasek (3): > PMD Ring - Add link up/down functions > PMD Ring - Add MAC addr add/remove functions > PMD Ring - Fix for per device management > > lib/librte_pmd_ring/rte_eth_ring.c | 62 > +--- > 1 file changed, 57 insertions(+), 5 deletions(-) > > -- > 1.7.9.5 Acked-by: Declan Doherty
[dpdk-dev] DPDK testpmd forwarding performace degradation
On Tue, Jan 27, 2015 at 7:21 PM, De Lara Guarch, Pablo < pablo.de.lara.guarch at intel.com> wrote: > > > > On Tue, Jan 27, 2015 at 10:51 AM, Alexander Belyakov > > > wrote: > > > > > > Hi Pablo, > > > > > > On Mon, Jan 26, 2015 at 5:22 PM, De Lara Guarch, Pablo > > > wrote: > > > Hi Alexander, > > > > > > > -Original Message- > > > > From: dev [mailto:dev-bounces at dpdk.org ] On > Behalf Of Alexander > > > Belyakov > > > > Sent: Monday, January 26, 2015 10:18 AM > > > > To: dev at dpdk.org > > > > Subject: [dpdk-dev] DPDK testpmd forwarding performace degradation > > > > > > > > Hello, > > > > > > > > recently I have found a case of significant performance degradation > for our > > > > application (built on top of DPDK, of course). Surprisingly, similar > issue > > > > is easily reproduced with default testpmd. > > > > > > > > To show the case we need simple IPv4 UDP flood with variable UDP > > > payload > > > > size. Saying "packet length" below I mean: Eth header length (14 > bytes) + > > > > IPv4 header length (20 bytes) + UPD header length (8 bytes) + UDP > payload > > > > length (variable) + CRC (4 bytes). Source IP addresses and ports are > > > selected > > > > randomly for each packet. > > > > > > > > I have used DPDK with revisions 1.6.0r2 and 1.7.1. Both show the same > > > issue. > > > > > > > > Follow "Quick start" guide (http://dpdk.org/doc/quick-start) to build > and > > > > run testpmd. Enable testpmd forwarding ("start" command). > > > > > > > > Table below shows measured forwarding performance depending on > > > packet > > > > length: > > > > > > > > No. -- UDP payload length (bytes) -- Packet length (bytes) -- > Forwarding > > > > performance (Mpps) -- Expected theoretical performance (Mpps) > > > > > > > > 1. 0 -- 64 -- 14.8 -- 14.88 > > > > 2. 34 -- 80 -- 12.4 -- 12.5 > > > > 3. 35 -- 81 -- 6.2 -- 12.38 (!) > > > > 4. 40 -- 86 -- 6.6 -- 11.79 > > > > 5. 49 -- 95 -- 7.6 -- 10.87 > > > > 6. 50 -- 96 -- 10.7 -- 10.78 (!) > > > > 7. 60 -- 106 -- 9.4 -- 9.92 > > > > > > > > At line number 3 we have added 1 byte of UDP payload (comparing to > > > > previous > > > > line) and got forwarding performance halved! 6.2 Mpps against 12.38 > Mpps > > > > of > > > > expected theoretical maximum for this packet size. > > > > > > > > That is the issue. > > > > > > > > Significant performance degradation exists up to 50 bytes of UDP > payload > > > > (96 bytes packet length), where it jumps back to theoretical maximum. > > > > > > > > What is happening between 80 and 96 bytes packet length? > > > > > > > > This issue is stable and 100% reproducible. At this point I am not > sure if > > > > it is DPDK or NIC issue. These tests have been performed on Intel(R) > Eth > > > > Svr Bypass Adapter X520-LR2 (X520LR2BP). > > > > > > > > Is anyone aware of such strange behavior? > > > I cannot reproduce the issue using two ports on two different 82599EB > NICs, > > > using 1.7.1 and 1.8.0. > > > I always get either same or better linerate as I increase the packet > size. > > > > > > Thank you for trying to reproduce the issue. > > > > > > Actually, have you tried using 1.8.0? > > > > > > I feel 1.8.0 is little bit immature and might require some post-release > > > patching. Even tespmd from this release is not forwarding packets > properly > > > on my setup. It is up and running without visible errors/warnings, TX/RX > > > counters are ticking but I can not see any packets at the output. > > > > This is strange. 
Without changing anything, forwarding works perfectly > for me > > (so, RTE_LIBRTE_IXGBE_RX_ALLOW_BULK_ALLOC is enabled). > > > > >Please note, both 1.6.0r2 and 1.7.1 releases work (on the same setup) > out-of-the-box just > > > fine with only exception of this mysterious performance drop. > > > So it will take some time to figure out what is wrong with dpdk-1.8.0. > > > Meanwhile we could focus on stable dpdk-1.7.1. > > > > > > Managed to get testpmd from dpdk-1.8.0 to work on my setup. > > > Unfortunately I had to disable RTE_LIBRTE_IXGBE_RX_ALLOW_BULK_ALLOC, > > > it is new comparing to 1.7.1 and somehow breaks testpmd forwarding. By > the > > > way, simply disabling RTE_LIBRTE_IXGBE_RX_ALLOW_BULK_ALLOC in > > > common_linuxapp config file breaks the build - had to make quick'n'dirty > fix > > > in struct igb_rx_queue as well. > > > > > > Anyway, issue is still here. > > > > > > Forwarding 80 bytes packets at 12.4 Mpps. > > > Forwarding 81 bytes packets at 7.2 Mpps. > > > > > > Any ideas? > > > As for X520-LR2 NIC - it is dual port bypass adapter with device id > 155d. I > > > believe it should be treated as 82599EB except bypass feature. I put > bypass > > > mode to "normal" in those tests. > > > > I have used a 82599EB first, and now a X520-SR2. Same results. > > I assume that X520-SR2 and X520-LR2 should give similar results > > (only thing that is changed is the wavelength, but the controller is the > same). > > > It seems I found what was wrong, at least got a hint. My build server mach
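For reference, the "expected theoretical performance" column quoted earlier in this thread follows from standard Ethernet framing overhead: each frame occupies its quoted packet length (CRC already included) plus 8 bytes of preamble/SFD and a 12-byte inter-frame gap on the wire. A small helper reproducing those numbers:

static double
line_rate_mpps(unsigned pkt_len_bytes)
{
        /* 10 Gbit/s divided by the per-frame wire occupancy in bits */
        return 10e9 / ((pkt_len_bytes + 20) * 8.0) / 1e6;
}
/* 64B -> 14.88 Mpps, 80B -> 12.50 Mpps, 81B -> 12.38 Mpps,
 * 86B -> 11.79 Mpps, 96B -> 10.78 Mpps */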
[dpdk-dev] Regarding UDP checksum offload
Hi Prashant, On 01/28/2015 12:25 PM, Prashant Upadhyaya wrote: > Hi, > > I am aware that this topic has been discussed several times before, but I > am somehow still stuck with this. > > I am using dpdk 1.6r1, intel 82599 NIC. > I have an mbuf, I have hand-constructed a UDP packet (IPv4) in the data > portion, filled the relevant fields of the headers and I do a tx burst. No > problems, the destination gets the packet. I filled UDP checksum as zero > and there was no checksum offloaded in ol_flags. > > Now in the same usecase, I want to offload UDP checksum. > I am aware that the checksum field in UDP header has to be filled with the > pseudo header checksum, I did that, duly added the PKT_TX_UDP_CKSUM flag in > ol_flags, did a tx_burst and the packet does not reach the destination. > > I realized that I have to fill the following fields as well (my packet does > not have vlan tag) > mbuf->pkt.vlan_macip.f.l2_len > mbuf->pkt.vlan_macip.f.l3_len > > so I filled the l2_len as 14 and l3_len as 20 (IP header with no options) > Yet the packet did not reach the destination. > > So my question is -- am I filling the l2_len and l3_len properly ? > Is there anything else to be done before I can get this UDP checksum > offload to work properly for me. As far as I remember, this should be working on 1.6r1. When you say "did not reach the destination", do you mean that the packet is not transmitted at all? Or is it transmitted with a wrong checksum? I think you should try to reproduce the issue with the latest DPDK which is known to work with test-pmd (csum forward engine). Regards, Olivier
[dpdk-dev] [PATCH] eal/linux: allow to map BARs with MSI-X tables, around them
On Thu, Jan 22, 2015 at 10:36:11AM +0200, Dan Aloni wrote: > While VFIO doesn't allow us to map complete BARs with MSI-X tables, > it does allow us to map around them in PAGE_SIZE granularity. There > might be adapters that provide their registers in the same BAR > but on a different page. For example, Intel's NVME adapter, though > not a network adapter, provides only one MMIO BAR that contains > the MSI-X table. > > Signed-off-by: Dan Aloni > CC: Anatoly Burakov Has anyone reviewed this yet? I am asking because I am interested in knowing whether someone is aiming to integrate storage controller support into DPDK, and this patch could be instrumental for that. -- Dan Aloni
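An illustrative sketch (not the submitted patch) of the idea: map the BAR in two chunks and leave unmapped only the page(s) holding the MSI-X table, which VFIO refuses to mmap. The fd and bar_off parameters stand in for the VFIO device fd and the BAR's mmap offset within it; both names, and the simplified error handling, are assumptions made for the example.

#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

static void *
map_bar_around_msix(int fd, off_t bar_off, size_t bar_size,
                    size_t tbl_off, size_t tbl_size)
{
        const size_t pg = (size_t)sysconf(_SC_PAGESIZE);
        size_t lo_end = tbl_off & ~(pg - 1);                  /* page holding table start */
        size_t hi_beg = (tbl_off + tbl_size + pg - 1) & ~(pg - 1);
        uint8_t *base;

        /* reserve the whole BAR range so the two chunks stay contiguous */
        base = mmap(NULL, bar_size, PROT_NONE,
                    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (base == MAP_FAILED)
                return NULL;

        /* everything below the MSI-X table pages */
        if (lo_end > 0 &&
            mmap(base, lo_end, PROT_READ | PROT_WRITE,
                 MAP_SHARED | MAP_FIXED, fd, bar_off) == MAP_FAILED)
                return NULL;

        /* everything above the MSI-X table pages */
        if (hi_beg < bar_size &&
            mmap(base + hi_beg, bar_size - hi_beg, PROT_READ | PROT_WRITE,
                 MAP_SHARED | MAP_FIXED, fd, bar_off + hi_beg) == MAP_FAILED)
                return NULL;

        return base;
}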
[dpdk-dev] Pktgen-DPDK rate and traffic inconsistency problem
> -Original Message- > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Alexandre Frigon > Sent: Tuesday, January 27, 2015 8:31 PM > To: dev at dpdk.org > Subject: [dpdk-dev] Pktgen-DPDK rate and traffic inconsistency problem > > Hi all, > > I'm using dpdk 1.8 and pktgen-dpdk 2.8 to generate traffic on a back-to-back > setup both equipped with 82599EB 10-Gigabit NIC. > The problem is when I start it, pktgen indicates 1Mbits/s Tx with 64B > packet > size, but I'm receiving about 15% of it on the other end. > This percentage seems to be proportional with the packet size. > > e.g. > Using nload to read Rx traffic > Pktgen: Tx: 1Mbits/s==> Other end: Rx 1660 Mbits/s > Rate: 100% > Pkt size: 64B > > > e.g 2 > Pktgen: Tx: 1Mbits/s==> Other end: Rx 9385 Mbits/s > Rate: 100% > Pkt size: 1518B > > > Pktgen is started with this command on a Xeon(R) CPU E31270 @ 3.40GHz > ./app/pktgen -c 1f -n 3 --proc-type auto --socket-mem 1024 --file-prefix pg > -- -p > 0x3 -P -N -m "[1:3].0, [2:4].1" From past experience I don't assign more than 1 core per port. It had some race condition issues, and one core is capable of RX or TX at the full 10G rate. Also check that you assign the proper cores/memory for your NICs (the same NUMA node). > > Is there something I'm not configuring correctly or something I have miss? > > Also, the % rate is acting strangely since anything above 50% doesn't change > the Tx rate and anything below is modifying it > e.g Tx: 1Mbits/s 5000Mbits/s > %Rate: >=50% 25% > > Actually I am getting exactly opposite results :) If I set rate to 50% I get MBits/s Rx/Tx : 0/9942 9942/0 9942/9942 For 10%: MBits/s Rx/Tx : 0/1997 1997/0 1997/1997 Which is about 2x set :D Additionally, I am getting this message when "start 0" -> "stop 0" -> "start 0" is issued: PMD: ixgbe_dev_rx_init(): forcing scatter mode So there is definitely something wrong there but I don't know where. Another issue I encountered is that the build system fails when building out-of-tree. Until this is fixed you can try version 2.7.1, which is working for me. Pawel
[dpdk-dev] [PATCH v2 0/6] new ntuple filter replaces 2tuple and 5tuple filters
> -Original Message- > From: Wu, Jingjing > Sent: Thursday, January 22, 2015 7:38 AM > To: dev at dpdk.org > Cc: Wu, Jingjing; De Lara Guarch, Pablo; Cao, Min; Xu, HuilongX > Subject: [PATCH v2 0/6] new ntuple filter replaces 2tuple and 5tuple filters > > v2 changes: > - remove the code which is already applied in patch "Integrate ethertype > filter in igb/ixgbe driver to new API". > - modify commands' description in doc testpmd_funcs.rst. > > The patch set uses filter_ctrl API to replace old 2tuple and 5tuple filter > APIs. > It defines ntuple filter to combine 2tuple and 5tuple types. > It uses new functions and structure to replace old ones in igb/ixgbe driver, > new commands to replace old ones in testpmd, and removes the old APIs. > It removes the filter's index parameters from user interface, only the > filter's key and assigned queue are visible to user. > > Jingjing Wu (6): > ethdev: define ntuple filter type and its structure > ixgbe: ntuple filter functions replace old ones for 5tuple filter > e1000: ntuple filter functions replace old ones for 2tuple and 5tuple > filter > testpmd: new commands for ntuple filter > ethdev: remove old APIs and structures of 5tuple and 2tuple filters > doc: commands changed in testpmd_funcs for 2tuple amd 5tuple filter > > app/test-pmd/cmdline.c | 406 ++--- > app/test-pmd/config.c | 65 --- > doc/guides/testpmd_app_ug/testpmd_funcs.rst | 99 +--- > lib/librte_ether/rte_eth_ctrl.h | 57 ++ > lib/librte_ether/rte_ethdev.c | 116 > lib/librte_ether/rte_ethdev.h | 192 -- > lib/librte_pmd_e1000/e1000_ethdev.h | 69 ++- > lib/librte_pmd_e1000/igb_ethdev.c | 869 +++-- > --- > lib/librte_pmd_ixgbe/ixgbe_ethdev.c | 468 +++ > lib/librte_pmd_ixgbe/ixgbe_ethdev.h | 52 +- > 10 files changed, 1300 insertions(+), 1093 deletions(-) > > -- > 1.9.3 Acked-by: Pablo de Lara Just mind that the last patch (changing the documentation) does not apply properly, as there was another patch (from you I think), that modifies that document. Could you send another version of the last patch? Not sure if that's OK or if it is better to send the full patchset again.
[dpdk-dev] Regarding UDP checksum offload
On Wed, Jan 28, 2015 at 6:32 PM, Olivier MATZ wrote: > Hi Prashant, > > > On 01/28/2015 12:25 PM, Prashant Upadhyaya wrote: > >> Hi, >> >> I am aware that this topic has been discussed several times before, but I >> am somehow still stuck with this. >> >> I am using dpdk 1.6r1, intel 82599 NIC. >> I have an mbuf, I have hand-constructed a UDP packet (IPv4) in the data >> portion, filled the relevant fields of the headers and I do a tx burst. No >> problems, the destination gets the packet. I filled UDP checksum as zero >> and there was no checksum offloaded in ol_flags. >> >> Now in the same usecase, I want to offload UDP checksum. >> I am aware that the checksum field in UDP header has to be filled with the >> pseudo header checksum, I did that, duly added the PKT_TX_UDP_CKSUM flag >> in >> ol_flags, did a tx_burst and the packet does not reach the destination. >> >> I realized that I have to fill the following fields as well (my packet >> does >> not have vlan tag) >> mbuf->pkt.vlan_macip.f.l2_len >> mbuf->pkt.vlan_macip.f.l3_len >> >> so I filled the l2_len as 14 and l3_len as 20 (IP header with no options) >> Yet the packet did not reach the destination. >> >> So my question is -- am I filling the l2_len and l3_len properly ? >> Is there anything else to be done before I can get this UDP checksum >> offload to work properly for me. >> > > > As far as I remember, this should be working on 1.6r1. > When you say "did not reach the destination", do you mean that the > packet is not transmitted at all? Or is it transmitted with a wrong > checksum? > The packet is not transmitted to destination. I cannot see it in tcpdump at wireshark. If I don't do the offload and fill UDP checksum as zero, then destination shows the packet in tcpdump If I don't do the offload and just fill the pseudo header checksum in UDP header (clearly the wrong checksum), then the destination shows the packet in tcpdump and wireshark decodes it to complain of wrong UDP checksum as expected. Let me add further, I am _just_ doing the UDP checksum offload and not the IP hdr checksum offload. I calculate and set IP header checksum by my own code. I hope that this is acceptable and does not interfere with UDP checksum offload > > I think you should try to reproduce the issue with the latest DPDK > which is known to work with test-pmd (csum forward engine). > > Regards, > Olivier > >
[dpdk-dev] [PATCH] eal/linux: allow to map BARs with MSI-X tables, around them
Hi Dan Apologies for not looking at it earlier. > While VFIO doesn't allow us to map complete BARs with MSI-X tables, > it does allow us to map around them in PAGE_SIZE granularity. There > might be adapters that provide their registers in the same BAR > but on a different page. For example, Intel's NVME adapter, though > not a network adapter, provides only one MMIO BAR that contains > the MSI-X table. > > Signed-off-by: Dan Aloni > CC: Anatoly Burakov > --- > lib/librte_eal/linuxapp/eal/eal_pci.c | 5 +- > lib/librte_eal/linuxapp/eal/eal_pci_init.h | 2 +- > lib/librte_eal/linuxapp/eal/eal_pci_uio.c | 4 +- > lib/librte_eal/linuxapp/eal/eal_pci_vfio.c | 99 > +++--- > lib/librte_eal/linuxapp/eal/eal_vfio.h | 8 ++- > 5 files changed, 101 insertions(+), 17 deletions(-) > > diff --git a/lib/librte_eal/linuxapp/eal/eal_pci.c > b/lib/librte_eal/linuxapp/eal/eal_pci.c > index b5f54101e8aa..4a74a9372a15 100644 > --- a/lib/librte_eal/linuxapp/eal/eal_pci.c > +++ b/lib/librte_eal/linuxapp/eal/eal_pci.c > @@ -118,13 +118,14 @@ pci_find_max_end_va(void) > > /* map a particular resource from a file */ > void * > -pci_map_resource(void *requested_addr, int fd, off_t offset, size_t size) > +pci_map_resource(void *requested_addr, int fd, off_t offset, size_t size, > + int additional_flags) > { > void *mapaddr; > > /* Map the PCI memory resource of device */ > mapaddr = mmap(requested_addr, size, PROT_READ | PROT_WRITE, > - MAP_SHARED, fd, offset); > + MAP_SHARED | additional_flags, fd, offset); > if (mapaddr == MAP_FAILED) { > RTE_LOG(ERR, EAL, "%s(): cannot mmap(%d, %p, 0x%lx, > 0x%lx): %s (%p)\n", > __func__, fd, requested_addr, > diff --git a/lib/librte_eal/linuxapp/eal/eal_pci_init.h > b/lib/librte_eal/linuxapp/eal/eal_pci_init.h > index 1070eb88fe0a..0a0853d4c4df 100644 > --- a/lib/librte_eal/linuxapp/eal/eal_pci_init.h > +++ b/lib/librte_eal/linuxapp/eal/eal_pci_init.h > @@ -66,7 +66,7 @@ extern void *pci_map_addr; > void *pci_find_max_end_va(void); > > void *pci_map_resource(void *requested_addr, int fd, off_t offset, > - size_t size); > +size_t size, int additional_flags); > > /* map IGB_UIO resource prototype */ > int pci_uio_map_resource(struct rte_pci_device *dev); > diff --git a/lib/librte_eal/linuxapp/eal/eal_pci_uio.c > b/lib/librte_eal/linuxapp/eal/eal_pci_uio.c > index e53f06b82430..eaa2e36f643e 100644 > --- a/lib/librte_eal/linuxapp/eal/eal_pci_uio.c > +++ b/lib/librte_eal/linuxapp/eal/eal_pci_uio.c > @@ -139,7 +139,7 @@ pci_uio_map_secondary(struct rte_pci_device *dev) > > if (pci_map_resource(uio_res->maps[i].addr, fd, >(off_t)uio_res->maps[i].offset, > - (size_t)uio_res->maps[i].size) > + (size_t)uio_res->maps[i].size, 0) > != uio_res->maps[i].addr) { > RTE_LOG(ERR, EAL, > "Cannot mmap device resource\n"); > @@ -379,7 +379,7 @@ pci_uio_map_resource(struct rte_pci_device *dev) > pci_map_addr = > pci_find_max_end_va(); > > mapaddr = > pci_map_resource(pci_map_addr, fd, (off_t)offset, > - (size_t)maps[j].size); > + (size_t)maps[j].size, 0); > if (mapaddr == MAP_FAILED) > fail = 1; > > diff --git a/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c > b/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c > index 20e097727f80..f6542a1f1464 100644 > --- a/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c > +++ b/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c > @@ -62,6 +62,9 @@ > > #ifdef VFIO_PRESENT > > +#define PAGE_SIZE (sysconf(_SC_PAGESIZE)) > +#define PAGE_MASK (~(PAGE_SIZE - 1)) > + > #define VFIO_DIR "/dev/vfio" > #define VFIO_CONTAINER_PATH "/dev/vfio/vfio" > #define VFIO_GROUP_FMT "/dev/vfio/%u" > @@ -72,10 
+75,12 @@ static struct vfio_config vfio_cfg; > > /* get PCI BAR number where MSI-X interrupts are */ > static int > -pci_vfio_get_msix_bar(int fd, int *msix_bar) > +pci_vfio_get_msix_bar(int fd, int *msix_bar, uint32_t *msix_table_offset, > + uint32_t *msix_table_size) > { > int ret; > uint32_t reg; > + uint16_t flags; > uint8_t cap_id, cap_offset; > > /* read PCI capability pointer from config space */ > @@ -134,7 +139,18 @@ pci_vfio_get_msix_bar(int fd, int *msix_bar) > return -1; > } > > + ret = pread64(fd, &flags, sizeof(flags), > + > VFIO_GET_REGION_ADDR(VFIO_PCI_CONFIG_REGION_INDEX) + > + c
[dpdk-dev] Regarding UDP checksum offload
Hi Prashant, On 01/28/2015 03:57 PM, Prashant Upadhyaya wrote: >>> I am using dpdk 1.6r1, intel 82599 NIC. >>> I have an mbuf, I have hand-constructed a UDP packet (IPv4) in >>> the data >>> portion, filled the relevant fields of the headers and I do a tx >>> burst. No >>> problems, the destination gets the packet. I filled UDP checksum >>> as zero >>> and there was no checksum offloaded in ol_flags. >>> >>> Now in the same usecase, I want to offload UDP checksum. >>> I am aware that the checksum field in UDP header has to be >>> filled with the >>> pseudo header checksum, I did that, duly added the >>> PKT_TX_UDP_CKSUM flag in >>> ol_flags, did a tx_burst and the packet does not reach the >>> destination. >>> >>> I realized that I have to fill the following fields as well (my >>> packet does >>> not have vlan tag) >>> mbuf->pkt.vlan_macip.f.l2_len >>> mbuf->pkt.vlan_macip.f.l3_len >>> >>> so I filled the l2_len as 14 and l3_len as 20 (IP header with no >>> options) >>> Yet the packet did not reach the destination. >>> >>> So my question is -- am I filling the l2_len and l3_len properly ? >>> Is there anything else to be done before I can get this UDP checksum >>> offload to work properly for me. >> >> >> >> As far as I remember, this should be working on 1.6r1. >> When you say "did not reach the destination", do you mean that the >> packet is not transmitted at all? Or is it transmitted with a wrong >> checksum? > > > The packet is not transmitted to destination. I cannot see it in tcpdump > at wireshark. > If I don't do the offload and fill UDP checksum as zero, then > destination shows the packet in tcpdump > If I don't do the offload and just fill the pseudo header checksum in > UDP header (clearly the wrong checksum), then the destination shows the > packet in tcpdump and wireshark decodes it to complain of wrong UDP > checksum as expected. This is strange. I don't see anything obvious in what you are describing. It looks like the packet is dropped in the driver or in the hardware. You can check the device statistics. Another thing you can do is to retry on the latest stable dpdk which is known to work (see csumonly.c in test-pmd). > Let me add further, I am _just_ doing the UDP checksum offload and not > the IP hdr checksum offload. I calculate and set IP header checksum by > my own code. I hope that this is acceptable and does not interfere with > UDP checksum offload This should not be a problem. Regards, Olivier
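A quick way to act on the "check the device statistics" suggestion is to read the port stats right after the failing rte_eth_tx_burst() and see whether the frame was counted as transmitted or as an error; a minimal sketch:

#include <stdio.h>
#include <inttypes.h>
#include <rte_ethdev.h>

static void
dump_tx_stats(uint8_t port_id)
{
        struct rte_eth_stats st;

        rte_eth_stats_get(port_id, &st);
        printf("port %u: opackets=%" PRIu64 " oerrors=%" PRIu64 "\n",
               port_id, st.opackets, st.oerrors);
}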
[dpdk-dev] Process question: reviewing older patches
There's a fairly old KNI patch (http://dpdk.org/dev/patchwork/patch/84/) that I reviewed, but I'm not seeing how to submit my "Reviewed-by" when I don't have any of the emails from the patch in my mail client. I can copy the text from the 'mbox' link in Patchwork into an email, but I'm guessing that may not make the patch toolchain happy. What's the right way to do this? Thanks, Jay
[dpdk-dev] DPDK 1.7.1 error (PANIC in ovdk_vport_phy_port_init(): Cannot init NIC port '0' (Success))
Hi! I have one question to inii NIC port in DPDK 1.7.1. I got the following error/ EAL: PCI device :03:00.0 on NUMA socket -1 EAL: probe driver: 8086:150e rte_igb_pmd EAL: PCI memory mapped at 0x7fa5a5261000 EAL: PCI memory mapped at 0x7fa5a7538000 EAL: PCI device :03:00.1 on NUMA socket -1 EAL: probe driver: 8086:150e rte_igb_pmd EAL: PCI memory mapped at 0x7fa5a51e1000 EAL: PCI memory mapped at 0x7fa5a7534000 EAL: PCI device :03:00.2 on NUMA socket -1 EAL: probe driver: 8086:150e rte_igb_pmd EAL: PCI memory mapped at 0x7fa5a5161000 EAL: PCI memory mapped at 0x7fa5a753 EAL: PCI device :03:00.3 on NUMA socket -1 EAL: probe driver: 8086:150e rte_igb_pmd EAL: PCI memory mapped at 0x7fa5a50e1000 EAL: PCI memory mapped at 0x7fa5a50dd000 EAL: PCI device :06:00.0 on NUMA socket -1 EAL: probe driver: 8086:154d rte_ixgbe_pmd EAL: :06:00.0 not managed by UIO driver, skipping EAL: PCI device :06:00.1 on NUMA socket -1 EAL: probe driver: 8086:154d rte_ixgbe_pmd EAL: :06:00.1 not managed by UIO driver, skipping EAL: :06:00.0 not managed by UIO driver, skipping EAL: :06:00.1 not managed by UIO driver, skipping PANIC in ovdk_vport_phy_port_init(): Cannot init NIC port '0' (Success) 1: [/home/cubiq/sothy/dpdkovs/dpdk-1.7.1/x86_64-ivshmem-linuxapp-gcc/lib/libintel_dpdk.so(rte_dump_stack+0x18) [0x7fa5a75bb768]] Abandon (core dumped) +++ THe above error I got when I run $./datapath/dpdk/ovs-dpdk -c 0x0F -n 4 --proc-type primary --huge-dir /dev/hugepages -- --stats_core=0 --stats_int=5 -p 0x03 I guess it is problem of DPDK to init PCI probe. Any guess or suggestion from error log. I am running Fedora 20, DPDK 1.7.1 and OVS DPDK 1.2./ Best regards Sothy
[dpdk-dev] [PATCH v3 00/18] ACL: New AVX2 classify method and several other enhancements.
> > v3 changes: > > Applied review comments from Thomas: > > - fix spelling errors reported by codespell. > > - split last patch into two: > > first to remove unused macros, > > second to add some comments about ACL internal layout. > > > > v2 changes: > > - When build with the compilers that don't support AVX2 instructions, > > make rte_acl_classify_avx2() do nothing and return an error. > > - Remove unneeded 'ifdef __AVX2__' in acl_run_avx2.*. > > - Reorder order of patches in the set, to keep RTE_LIBRTE_ACL_STANDALONE=y > > always buildable. > > > > This patch series contain several fixes and enhancements for ACL library. > > See complete list below. > > Two main changes that are externally visible: > > - Introduce new classify method: RTE_ACL_CLASSIFY_AVX2. > > It uses AVX2 instructions and 256 bit wide data types > > to perform internal trie traversal. > > That helps to increase classify() throughput. > > This method is selected as default one on CPUs that supports AVX2. > > - Introduce new field in the build config structure: max_size. > > It specifies maximum size that internal RT structure for given context > > can reach. > > The purpose of that is to allow user to decide about space/performance > > trade-off > > (faster classify() vs less space for RT internal structures) > > for each given set of rules. > > > > Konstantin Ananyev (18): > > fix fix compilation issues with RTE_LIBRTE_ACL_STANDALONE=y > > app/test: few small fixes fot test_acl.c > > librte_acl: make data_indexes long enough to survive idle transitions. > > librte_acl: remove build phase heuristsic with negative performance > > effect. > > librte_acl: fix a bug at build phase that can cause matches beeing > > overwirtten. > > librte_acl: introduce DFA nodes compression (group64) for identical > > entries. > > librte_acl: build/gen phase - simplify the way match nodes are > > allocated. > > librte_acl: make scalar RT code to be more similar to vector one. > > librte_acl: a bit of RT code deduplication. > > EAL: introduce rte_ymm and relatives in rte_common_vect.h. > > librte_acl: add AVX2 as new rte_acl_classify() method > > test-acl: add ability to manually select RT method. > > librte_acl: Remove search_sse_2 and relatives. > > libter_acl: move lo/hi dwords shuffle out from calc_addr > > libte_acl: make calc_addr a define to deduplicate the code. > > libte_acl: introduce max_size into rte_acl_config. > > libte_acl: remove unused macros. > > libte_acl: add some comments about ACL internal layout. > > > For the series > Acked-by: Neil Horman Applied Thanks for the big work -- Thomas
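For readers who have not followed the ACL work, a minimal sketch of how the two externally visible changes described in the cover letter would typically be used. The rule and field-definition setup is assumed to exist already, the method-selection call is assumed from the existing rte_acl API, and the 4 MB cap is an arbitrary example value, not something from the series.

#include <rte_acl.h>

static int
build_and_select_avx2(struct rte_acl_ctx *ctx, struct rte_acl_config *cfg)
{
        int ret;

        /* new field from this series: cap the size of the run-time
         * structures, trading some classify() speed for less memory */
        cfg->max_size = 4 * 1024 * 1024;

        ret = rte_acl_build(ctx, cfg);
        if (ret != 0)
                return ret;

        /* RTE_ACL_CLASSIFY_AVX2 is the new method introduced here; it is
         * already the default on CPUs with AVX2, so forcing it explicitly
         * is mainly useful for testing and comparison */
        return rte_acl_set_ctx_classify(ctx, RTE_ACL_CLASSIFY_AVX2);
}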
[dpdk-dev] [PATCH v2 04/24] virtio: Add support for Link State interrupt
On Wed, 28 Jan 2015 03:03:32 + "Ouyang, Changchun" wrote: > Hi Stephen, > > > -Original Message- > > From: Stephen Hemminger [mailto:stephen at networkplumber.org] > > Sent: Tuesday, January 27, 2015 6:00 PM > > To: Xie, Huawei > > Cc: Ouyang, Changchun; dev at dpdk.org > > Subject: Re: [dpdk-dev] [PATCH v2 04/24] virtio: Add support for Link State > > interrupt > > > > On Tue, 27 Jan 2015 09:04:07 + > > "Xie, Huawei" wrote: > > > > > > -Original Message- > > > > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Ouyang > > > > Changchun > > > > Sent: Tuesday, January 27, 2015 10:36 AM > > > > To: dev at dpdk.org > > > > Subject: [dpdk-dev] [PATCH v2 04/24] virtio: Add support for Link > > > > State interrupt > > > > > > > > Virtio has link state interrupt which can be used. > > > > > > > > Signed-off-by: Stephen Hemminger > > > > Signed-off-by: Changchun Ouyang > > > > --- > > > > lib/librte_pmd_virtio/virtio_ethdev.c | 78 > > > > +++-- > > > > -- > > > > lib/librte_pmd_virtio/virtio_pci.c| 22 ++ > > > > lib/librte_pmd_virtio/virtio_pci.h| 4 ++ > > > > 3 files changed, 86 insertions(+), 18 deletions(-) > > > > > > > > diff --git a/lib/librte_pmd_virtio/virtio_ethdev.c > > > > b/lib/librte_pmd_virtio/virtio_ethdev.c > > > > index 5df3b54..ef87ff8 100644 > > > > --- a/lib/librte_pmd_virtio/virtio_ethdev.c > > > > +++ b/lib/librte_pmd_virtio/virtio_ethdev.c > > > > @@ -845,6 +845,34 @@ static int virtio_resource_init(struct > > > > rte_pci_device *pci_dev __rte_unused) #endif > > > > > > > > /* > > > > + * Process Virtio Config changed interrupt and call the callback > > > > + * if link state changed. > > > > + */ > > > > +static void > > > > +virtio_interrupt_handler(__rte_unused struct rte_intr_handle *handle, > > > > +void *param) > > > > +{ > > > > + struct rte_eth_dev *dev = param; > > > > + struct virtio_hw *hw = > > > > + VIRTIO_DEV_PRIVATE_TO_HW(dev->data->dev_private); > > > > + uint8_t isr; > > > > + > > > > + /* Read interrupt status which clears interrupt */ > > > > + isr = vtpci_isr(hw); > > > > + PMD_DRV_LOG(INFO, "interrupt status = %#x", isr); > > > > + > > > > + if (rte_intr_enable(&dev->pci_dev->intr_handle) < 0) > > > > + PMD_DRV_LOG(ERR, "interrupt enable failed"); > > > > + > > > > > > Is it better to put rte_intr_enable after we have handled the interrupt. > > > Is there the possibility of interrupt reentrant in uio intr framework? > > > > The UIO framework handles IRQ's via posix thread that is reading fd, then > > calling this code. Therefore it is always single threaded. > > Even if it is under UIO framework, and always single threaded, > How about move rte_intr_enable after the virtio_dev_link_update() and > _rte_eth_dev_callback_process is called. > This make it more like interrupt handler in linux kernel. > What do you think of it? I ordered the interrupt handling to match what happens in e1000/igb handler. My concern is that interrupt was level (not edge triggered) and another link transisition could occur and be missed.
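For comparison, a sketch of the ordering Changchun suggests above (handle the event first, then re-arm). The link-update part is inferred from the functions named in the discussion, not copied from the applied patch, which keeps the e1000/igb-style ordering for the reason Stephen gives.

/* meant to sit in virtio_ethdev.c next to the handler shown in the diff */
static void
virtio_interrupt_handler_alt(__rte_unused struct rte_intr_handle *handle,
                             void *param)
{
        struct rte_eth_dev *dev = param;
        struct virtio_hw *hw =
                VIRTIO_DEV_PRIVATE_TO_HW(dev->data->dev_private);
        uint8_t isr;

        /* reading the ISR clears it */
        isr = vtpci_isr(hw);
        PMD_DRV_LOG(INFO, "interrupt status = %#x", isr);

        if (isr & VIRTIO_PCI_ISR_CONFIG) {
                /* refresh link status and notify registered callbacks;
                 * a full handler would only notify on an actual change */
                virtio_dev_link_update(dev, 0);
                _rte_eth_dev_callback_process(dev, RTE_ETH_EVENT_INTR_LSC);
        }

        /* re-arm only after the event is processed; the trade-off, as noted
         * above, is that a link transition in this window could be missed */
        if (rte_intr_enable(&dev->pci_dev->intr_handle) < 0)
                PMD_DRV_LOG(ERR, "interrupt enable failed");
}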
[dpdk-dev] Pktgen-DPDK rate and traffic inconsistency problem
Hi Pawel, Thanks for your reply. Sadly, assigning 1 core per port didn't change anything. As for the NUMA nodes, If I understand correctly there is a node for each socket and I'm only using 1 socket with 4 cores and lscpu is showing me only 1 node. I don't think I can do anything about that. Correct me if I'm wrong on this one. I'm definitely going to try a older version and see if it works properly. Thanks for your help Alexandre F. > -Original Message- > From: Wodkowski, PawelX [mailto:pawelx.wodkowski at intel.com] > Sent: Wednesday, January 28, 2015 9:15 AM > To: Alexandre Frigon; dev at dpdk.org; keith.wiles at windriver.com > Subject: RE: Pktgen-DPDK rate and traffic inconsistency problem > > > -Original Message- > > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Alexandre Frigon > > Sent: Tuesday, January 27, 2015 8:31 PM > > To: dev at dpdk.org > > Subject: [dpdk-dev] Pktgen-DPDK rate and traffic inconsistency problem > > > > Hi all, > > > > I'm using dpdk 1.8 and pktgen-dpdk 2.8 to generate traffic on a > > back-to-back setup both equipped with 82599EB 10-Gigabit NIC. > > The problem is when I start it, pktgen indicates 1Mbits/s Tx with > > 64B packet size, but I'm receiving about 15% of it on the other end. > > This percentage seems to be proportional with the packet size. > > > > e.g. > > Using nload to read Rx traffic > > Pktgen: Tx: 1Mbits/s==> Other end: Rx 1660 > Mbits/s > > Rate: 100% > > Pkt size: 64B > > > > > > e.g 2 > > Pktgen: Tx: 1Mbits/s==> Other end: Rx 9385 > Mbits/s > > Rate: 100% > > Pkt size: 1518B > > > > > > Pktgen is started with this command on a Xeon(R) CPU E31270 @ 3.40GHz > > ./app/pktgen -c 1f -n 3 --proc-type auto --socket-mem 1024 > > --file-prefix pg -- -p > > 0x3 -P -N -m "[1:3].0, [2:4].1" > > From past experience I don't assign more than 1 core per port. It had some > race conditions issues and one core I capable to RX or TX full 10G. > Also check if you assign proper cores/memory for your NICs (the same > NUMA node). > > > > > Is there something I'm not configuring correctly or something I have miss? > > > > Also, the % rate is acting strangely since anything above 50% doesn't > > change the Tx rate and anything below is modifying it > > e.g Tx: 1Mbits/s 5000Mbits/s > > %Rate: >=50% 25% > > > > > > Actually I am getting exactly opposite results :) If I set rate to 50% I get > MBits/s Rx/Tx : 0/9942 9942/0 9942/9942 > > For 10%: > MBits/s Rx/Tx : 0/1997 1997/0 1997/1997 > > Which is about 2x set :D > > Additionaly I am getting message when "start 0" -> "stop 0" -> "start 0" is > issued > PMD: ixgbe_dev_rx_init(): forcing scatter mode > > So there is definitely something wrong there but don't know where. > Another issue I encountered is build system that fail when building out-of- > tree. > > Till this is fixed you can try version 2.7.1 that is working for me. > > Pawel
[dpdk-dev] Process question: reviewing older patches
2015-01-28 09:52, Jay Rolette: > There's a fairly old KNI patch (http://dpdk.org/dev/patchwork/patch/84/) > that I reviewed, but I'm not seeing how to submit my "Reviewed-by" when I > don't have any of the emails from the patch in my mail client. > > I can copy the text from the 'mbox' link in Patchwork into an email, but > I'm guessing that may not make the patch toolchain happy. > > What's the right way to do this? I think you should try to open the mbox file with your mail client and reply. In my case, I had to rename it into .mbox (was .patch). Thanks for reviewing -- Thomas
[dpdk-dev] Process question: reviewing older patches
On Wed, Jan 28, 2015 at 09:52:48AM -0600, Jay Rolette wrote: > There's a fairly old KNI patch (http://dpdk.org/dev/patchwork/patch/84/) > that I reviewed, but I'm not seeing how to submit my "Reviewed-by" when I > don't have any of the emails from the patch in my mail client. > > I can copy the text from the 'mbox' link in Patchwork into an email, but > I'm guessing that may not make the patch toolchain happy. > > What's the right way to do this? > Just grab the message id from the patchwork site and put it in the In-Reply-To: header when you respond. You won't have the rest of the conversation in the thread, but your reply will be threaded properly, and patchwork will pick up the ACK. Neil > Thanks, > Jay >
[dpdk-dev] [PATCH v1 4/5] eal: add per rx queue interrupt handling based on VFIO
> -Original Message- > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Danny Zhou > Sent: Wednesday, January 28, 2015 2:51 AM > To: dev at dpdk.org > Subject: [dpdk-dev] [PATCH v1 4/5] eal: add per rx queue interrupt handling > based on VFIO > > Signed-off-by: Danny Zhou > Signed-off-by: Yong Liu > --- > lib/librte_eal/common/include/rte_eal.h| 9 + > lib/librte_eal/linuxapp/eal/eal_interrupts.c | 186 > - > lib/librte_eal/linuxapp/eal/eal_pci_vfio.c | 11 +- > .../linuxapp/eal/include/exec-env/rte_interrupts.h | 4 + > 4 files changed, 168 insertions(+), 42 deletions(-) > > diff --git a/lib/librte_eal/common/include/rte_eal.h > b/lib/librte_eal/common/include/rte_eal.h > index f4ecd2e..5f31aa5 100644 > --- a/lib/librte_eal/common/include/rte_eal.h > +++ b/lib/librte_eal/common/include/rte_eal.h > @@ -150,6 +150,15 @@ int rte_eal_iopl_init(void); > * - On failure, a negative error value. > */ > int rte_eal_init(int argc, char **argv); > + > +/** > + * @param port_id > + * the port id > + * @return > + * - On success, return 0 [LCM] It has changes to return -1. > + */ > +int rte_eal_wait_rx_intr(uint8_t port_id, uint8_t queue_id); > + > /** > * Usage function typedef used by the application usage function. > * > diff --git a/lib/librte_eal/linuxapp/eal/eal_interrupts.c > b/lib/librte_eal/linuxapp/eal/eal_interrupts.c > index dc2668a..b120303 100644 > --- a/lib/librte_eal/linuxapp/eal/eal_interrupts.c > +++ b/lib/librte_eal/linuxapp/eal/eal_interrupts.c > @@ -64,6 +64,7 @@ > #include > #include > #include > +#include > > #include "eal_private.h" > #include "eal_vfio.h" > @@ -127,6 +128,7 @@ static pthread_t intr_thread; > #ifdef VFIO_PRESENT > > #define IRQ_SET_BUF_LEN (sizeof(struct vfio_irq_set) + sizeof(int)) > +#define MSIX_IRQ_SET_BUF_LEN (sizeof(struct vfio_irq_set) + sizeof(int) * > (VFIO_MAX_QUEUE_ID + 1)) > > /* enable legacy (INTx) interrupts */ > static int > @@ -221,7 +223,7 @@ vfio_disable_intx(struct rte_intr_handle *intr_handle) { > /* enable MSI-X interrupts */ > static int > vfio_enable_msi(struct rte_intr_handle *intr_handle) { > - int len, ret; > + int len, ret, max_intr; > char irq_set_buf[IRQ_SET_BUF_LEN]; > struct vfio_irq_set *irq_set; > int *fd_ptr; > @@ -230,12 +232,19 @@ vfio_enable_msi(struct rte_intr_handle *intr_handle) > { > > irq_set = (struct vfio_irq_set *) irq_set_buf; > irq_set->argsz = len; > - irq_set->count = 1; > + if ((!intr_handle->max_intr) || > + (intr_handle->max_intr > VFIO_MAX_QUEUE_ID)) > + max_intr = VFIO_MAX_QUEUE_ID + 1; > + else > + max_intr = intr_handle->max_intr; > + > + irq_set->count = max_intr; > irq_set->flags = VFIO_IRQ_SET_DATA_EVENTFD | > VFIO_IRQ_SET_ACTION_TRIGGER; > irq_set->index = VFIO_PCI_MSI_IRQ_INDEX; > irq_set->start = 0; > fd_ptr = (int *) &irq_set->data; > - *fd_ptr = intr_handle->fd; > + memcpy(fd_ptr, intr_handle->queue_fd, sizeof(intr_handle->queue_fd)); > + fd_ptr[max_intr - 1] = intr_handle->fd; > > ret = ioctl(intr_handle->vfio_dev_fd, VFIO_DEVICE_SET_IRQS, irq_set); > > @@ -244,23 +253,6 @@ vfio_enable_msi(struct rte_intr_handle *intr_handle) { > intr_handle->fd); > return -1; > } > - > - /* manually trigger interrupt to enable it */ > - memset(irq_set, 0, len); > - len = sizeof(struct vfio_irq_set); > - irq_set->argsz = len; > - irq_set->count = 1; > - irq_set->flags = VFIO_IRQ_SET_DATA_NONE | > VFIO_IRQ_SET_ACTION_TRIGGER; > - irq_set->index = VFIO_PCI_MSI_IRQ_INDEX; > - irq_set->start = 0; > - > - ret = ioctl(intr_handle->vfio_dev_fd, VFIO_DEVICE_SET_IRQS, irq_set); > - > - if (ret) { > - RTE_LOG(ERR, EAL, 
"Error triggering MSI interrupts for fd %d\n", > - intr_handle->fd); > - return -1; > - } > return 0; > } > > @@ -292,8 +284,8 @@ vfio_disable_msi(struct rte_intr_handle *intr_handle) { > /* enable MSI-X interrupts */ > static int > vfio_enable_msix(struct rte_intr_handle *intr_handle) { > - int len, ret; > - char irq_set_buf[IRQ_SET_BUF_LEN]; > + int len, ret, max_intr; > + char irq_set_buf[MSIX_IRQ_SET_BUF_LEN]; > struct vfio_irq_set *irq_set; > int *fd_ptr; > > @@ -301,12 +293,19 @@ vfio_enable_msix(struct rte_intr_handle *intr_handle) > { > > irq_set = (struct vfio_irq_set *) irq_set_buf; > irq_set->argsz = len; > - irq_set->count = 1; > + if ((!intr_handle->max_intr) || > + (intr_handle->max_intr > VFIO_MAX_QUEUE_ID)) > + max_intr = VFIO_MAX_QUEUE_ID + 1; > + else > + max_intr = intr_handle->max_intr; > + > + irq_set->count = max_intr; > irq_set->flags = VFIO_IRQ_SET_DATA_EVENTFD | > VFIO_IRQ_SET_
[dpdk-dev] [PATCH v1 5/5] L3fwd-power: enable one-shot rx interrupt and polling/interrupt mode switch
> -Original Message- > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Danny Zhou > Sent: Wednesday, January 28, 2015 2:51 AM > To: dev at dpdk.org > Subject: [dpdk-dev] [PATCH v1 5/5] L3fwd-power: enable one-shot rx interrupt > and polling/interrupt mode switch > > Signed-off-by: Danny Zhou > --- > examples/l3fwd-power/main.c | 170 > +--- > 1 file changed, 129 insertions(+), 41 deletions(-) > > diff --git a/examples/l3fwd-power/main.c b/examples/l3fwd-power/main.c > index f6b55b9..e6e4f55 100644 > --- a/examples/l3fwd-power/main.c > +++ b/examples/l3fwd-power/main.c > @@ -75,12 +75,13 @@ > #include > #include > #include > +#include > > #define RTE_LOGTYPE_L3FWD_POWER RTE_LOGTYPE_USER1 > > #define MAX_PKT_BURST 32 > > -#define MIN_ZERO_POLL_COUNT 5 > +#define MIN_ZERO_POLL_COUNT 10 > > /* around 100ms at 2 Ghz */ > #define TIMER_RESOLUTION_CYCLES 2ULL > @@ -188,6 +189,9 @@ struct lcore_rx_queue { > #define MAX_TX_QUEUE_PER_PORT RTE_MAX_ETHPORTS > #define MAX_RX_QUEUE_PER_PORT 128 > > +#define MAX_RX_QUEUE_INTERRUPT_PER_PORT 16 > + > + > #define MAX_LCORE_PARAMS 1024 > struct lcore_params { > uint8_t port_id; > @@ -214,7 +218,7 @@ static uint16_t nb_lcore_params = > sizeof(lcore_params_array_default) / > > static struct rte_eth_conf port_conf = { > .rxmode = { > - .mq_mode= ETH_MQ_RX_RSS, > + .mq_mode = ETH_MQ_RX_RSS, > .max_rx_pkt_len = ETHER_MAX_LEN, > .split_hdr_size = 0, > .header_split = 0, /**< Header Split disabled */ > @@ -226,11 +230,14 @@ static struct rte_eth_conf port_conf = { > .rx_adv_conf = { > .rss_conf = { > .rss_key = NULL, > - .rss_hf = ETH_RSS_IP, > + .rss_hf = ETH_RSS_UDP, > }, > }, > .txmode = { > - .mq_mode = ETH_DCB_NONE, > + .mq_mode = ETH_MQ_TX_NONE, > + }, > + .intr_conf = { > + .rxq = 1, /**< rxq interrupt feature enabled */ > }, > }; > > @@ -402,19 +409,22 @@ power_timer_cb(__attribute__((unused)) struct > rte_timer *tim, > /* accumulate total execution time in us when callback is invoked */ > sleep_time_ratio = (float)(stats[lcore_id].sleep_time) / > (float)SCALING_PERIOD; > - > /** >* check whether need to scale down frequency a step if it sleep a lot. >*/ > - if (sleep_time_ratio >= SCALING_DOWN_TIME_RATIO_THRESHOLD) > - rte_power_freq_down(lcore_id); > + if (sleep_time_ratio >= SCALING_DOWN_TIME_RATIO_THRESHOLD) { > + if (rte_power_freq_down) > + rte_power_freq_down(lcore_id); > + } > else if ( (unsigned)(stats[lcore_id].nb_rx_processed / > - stats[lcore_id].nb_iteration_looped) < MAX_PKT_BURST) > + stats[lcore_id].nb_iteration_looped) < MAX_PKT_BURST) { > /** >* scale down a step if average packet per iteration less >* than expectation. 
>*/ > - rte_power_freq_down(lcore_id); > + if (rte_power_freq_down) > + rte_power_freq_down(lcore_id); > + } > > /** >* initialize another timer according to current frequency to ensure > @@ -707,22 +717,20 @@ l3fwd_simple_forward(struct rte_mbuf *m, uint8_t > portid, > > } > > -#define SLEEP_GEAR1_THRESHOLD100 > -#define SLEEP_GEAR2_THRESHOLD1000 > +#define MINIMUM_SLEEP_TIME 1 > +#define SUSPEND_THRESHOLD 300 > > static inline uint32_t > power_idle_heuristic(uint32_t zero_rx_packet_count) > { > - /* If zero count is less than 100, use it as the sleep time in us */ > - if (zero_rx_packet_count < SLEEP_GEAR1_THRESHOLD) > - return zero_rx_packet_count; > - /* If zero count is less than 1000, sleep time should be 100 us */ > - else if ((zero_rx_packet_count >= SLEEP_GEAR1_THRESHOLD) && > - (zero_rx_packet_count < SLEEP_GEAR2_THRESHOLD)) > - return SLEEP_GEAR1_THRESHOLD; > - /* If zero count is greater than 1000, sleep time should be 1000 us */ > - else if (zero_rx_packet_count >= SLEEP_GEAR2_THRESHOLD) > - return SLEEP_GEAR2_THRESHOLD; > + /* If zero count is less than 100, sleep 1us */ > + if (zero_rx_packet_count < SUSPEND_THRESHOLD) > + return MINIMUM_SLEEP_TIME; > + /* If zero count is less than 1000, sleep 100 us which is the minimum > latency > + switching from C3/C6 to C0 > + */ > + else > + return SUSPEND_THRESHOLD; > > return 0; > } > @@ -762,6 +770,35 @@ power_freq_scaleup_heuristic(unsigned lcore_id, > return FREQ_CURRENT; > } > > +/** > + * force polling thread sleep until one-shot rx interrupt triggers > + * @param
[dpdk-dev] Process question: reviewing older patches
Thanks Thomas and Neil. Sadly, no joy. While I generally like gmail for my mail, there's not a reasonable way to import the mbox file or to control the message id. If someone else wants to resend the message to the list, I can reply to that. Otherwise, here are the relevant bits from the original patch email: >From patchwork Wed Jul 23 06:45:12 2014 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit Subject: [dpdk-dev] kni: optimizing the rte_kni_rx_burst From: Hemant Agrawal X-Patchwork-Id: 84 Message-Id: <14060979121185-git-send-email-Hemant at freescale.com> To: Date: Wed, 23 Jul 2014 12:15:12 +0530 The current implementation of rte_kni_rx_burst polls the fifo for buffers. Irrespective of success or failure, it allocates the mbuf and try to put them into the alloc_q if the buffers are not added to alloc_q, it frees them. This waste lots of cpu cycles in allocating and freeing the buffers if alloc_q is full. The logic has been changed to: 1. Initially allocand add buffer(burstsize) to alloc_q 2. Add buffers to alloc_q only when you are pulling out the buffers. Signed-off-by: Hemant Agrawal --- lib/librte_kni/rte_kni.c |8 ++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/lib/librte_kni/rte_kni.c b/lib/librte_kni/rte_kni.c index 76feef4..01e85f8 100644 --- a/lib/librte_kni/rte_kni.c +++ b/lib/librte_kni/rte_kni.c @@ -263,6 +263,9 @@ rte_kni_alloc(struct rte_mempool *pktmbuf_pool, ctx->in_use = 1; + /* Allocate mbufs and then put them into alloc_q */ + kni_allocate_mbufs(ctx); + return ctx; fail: @@ -369,8 +372,9 @@ rte_kni_rx_burst(struct rte_kni *kni, struct rte_mbuf **mbufs, unsigned num) { unsigned ret = kni_fifo_get(kni->tx_q, (void **)mbufs, num); - /* Allocate mbufs and then put them into alloc_q */ - kni_allocate_mbufs(kni); + /* If buffers removed, allocate mbufs and then put them into alloc_q */ + if(ret) + kni_allocate_mbufs(kni); return ret; } The patch looks good from a DPDK 1.6r2 viewpoint. We saw the same behavior in our app and ended up avoiding it higher in the stack (in our code). Reviewed-by: Jay Rolette Jay On Wed, Jan 28, 2015 at 10:49 AM, Neil Horman wrote: > On Wed, Jan 28, 2015 at 09:52:48AM -0600, Jay Rolette wrote: > > There's a fairly old KNI patch (http://dpdk.org/dev/patchwork/patch/84/) > > that I reviewed, but I'm not seeing how to submit my "Reviewed-by" when I > > don't have any of the emails from the patch in my mail client. > > > > I can copy the text from the 'mbox' link in Patchwork into an email, but > > I'm guessing that may not make the patch toolchain happy. > > > > What's the right way to do this? > > > Just grab the message id from the patchwork site, and list it in the > envelope > headers in-reply-to: field when you respond. You won't have the rest of > the > conversation field in the thread, but you will respond properly to the > thread, > and patchwork will pick up the ACK > Neil > > > Thanks, > > Jay > > >
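For context, a minimal sketch (assumed, not from the thread) of the kind of forwarding loop where the patch above matters: rte_kni_rx_burst() runs on every iteration, so unconditionally refilling alloc_q burns cycles even when the kernel sent nothing, which is exactly what the change avoids.

#include <rte_ethdev.h>
#include <rte_kni.h>
#include <rte_mbuf.h>

#define PKT_BURST 32

/* forward packets the kernel wrote into the KNI back out of a port */
static void
kni_egress(struct rte_kni *kni, uint8_t port_id)
{
        struct rte_mbuf *pkts[PKT_BURST];
        unsigned nb_rx, nb_tx, i;

        /* with the old code this allocated and freed mbufs even when the
         * tx_q fifo was empty */
        nb_rx = rte_kni_rx_burst(kni, pkts, PKT_BURST);

        nb_tx = rte_eth_tx_burst(port_id, 0, pkts, (uint16_t)nb_rx);
        for (i = nb_tx; i < nb_rx; i++)
                rte_pktmbuf_free(pkts[i]);

        rte_kni_handle_request(kni);
}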
[dpdk-dev] Process question: reviewing older patches
On Wed, Jan 28, 2015 at 02:57:58PM -0600, Jay Rolette wrote: > Thanks Thomas and Neil. Sadly, no joy. While I generally like gmail for my > mail, there's not a reasonable way to import the mbox file or to control > the message id. > Sure there is, you just need to select an appropriate MUA. You can't use the web interface for this. Enable IMAP access to your Gmail account, and set up an MUA like mutt to point to it. Then the mutt client can open the mbox file, or you can fill out the In-Reply-To: header manually. Neil
[dpdk-dev] deadline for 2.0 features proposal
Hello all,

As previously announced when releasing DPDK 1.8.0 (http://dpdk.org/ml/archives/dev/2014-December/010470.html), we are going to apply deadlines to schedule the release cycles. The version 2.0 will integrate only features submitted before end of January (end of this week) and reviewed before 20th February. More details on this page: http://dpdk.org/dev/roadmap#dates

During the 2nd phase ("Review Period"), only pending features, fixes and highly desirable cleanups will be accepted. In case there are some volunteers to clean the code, I maintain a list of cleanups which could be interesting:
- use Rx/Tx defaults in testpmd
- use librte_cfgfile in examples/qos_sched (promised deduplication)
- use rte_eth_dev_atomic_read_link_status in PMDs
- use new assert macros for unit tests
- convert all drivers to new filtering API
- move non-ethernet API from ethdev to EAL
- move RTE_MBUF_DATA_DMA_ADDR from all PMDs to a common place
- move rte_rxmbuf_alloc in API
- move queue_stats_mapping_set to ixgbe
- move rte_cache_aligned at beginning of struct declarations
- detect cache line size
- remove old VMDQ API
- remove old filtering API
- remove Xen ifdefs in memory management
- remove doxygen warnings
- choose between RTE_LIBRTE_*_PMD and RTE_LIBRTE_PMD_*

Thank you
--
Thomas
[dpdk-dev] [PATCH 0/4] DPDK memcpy optimization
On 2015-01-27, 3:22 AM, "Wang, Zhihong" wrote: > > >> -Original Message- >> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of EDMISON, Kelvin >> (Kelvin) >> Sent: Friday, January 23, 2015 2:22 AM >> To: dev at dpdk.org >> Subject: Re: [dpdk-dev] [PATCH 0/4] DPDK memcpy optimization >> >> >> >> On 2015-01-21, 3:54 PM, "Neil Horman" wrote: >> >> >On Wed, Jan 21, 2015 at 11:49:47AM -0800, Stephen Hemminger wrote: >> >> On Wed, 21 Jan 2015 13:26:20 + >> >> Bruce Richardson wrote: >> >> [..trim...] >> >> One issue I have is that as a vendor we need to ship on binary, not >> >>different distributions >> >> for each Intel chip variant. There is some support for multi-chip >> >>version functions >> >> but only in latest Gcc which isn't in Debian stable. And the >>multi-chip >> >>version >> >> of functions is going to be more expensive than inlining. For some >> >>cases, I have >> >> seen that the overhead of fancy instructions looks good but have >>nasty >> >>side effects >> >> like CPU stall and/or increased power consumption which turns of >>turbo >> >>boost. >> >> >> >> >> >> Distro's in general have the same problem with special case >> >>optimizations. >> >> >> >What we really need is to do something like borrow the alternatives >> >mechanism >> >from the kernel so that we can dynamically replace instructions at run >> >time >> >based on cpu flags. That way we could make the choice at run time, and >> >wouldn't >> >have to do alot of special case jumping about. >> >Neil >> >> +1. >> >> I think it should be an anti-requirement that the build machine be the >> exact same chip as the deployment platform. >> >> I like the cpu flag inspection approach. It would help in the case >>where >> DPDK is in a VM and an odd set of CPU flags have been exposed. >> >> If that approach doesn't work though, then perhaps DPDK memcpy could go >> through a benchmarking at app startup time and select the most >>performant >> option out of a set, like mdraid's raid6 implementation does. To give >>an >> example, this is what my systems print out at boot time re: raid6 >> algorithm selection. >> raid6: sse2x13171 MB/s >> raid6: sse2x23925 MB/s >> raid6: sse2x44523 MB/s >> raid6: using algorithm sse2x4 (4523 MB/s) >> >> Regards, >>Kelvin >> > >Thanks for the proposal! > >For DPDK, performance is always the most important concern. We need to >utilize new architecture features to achieve that, so solution per arch >is necessary. >Even a few extra cycles can lead to bad performance if they're in a hot >loop. >For instance, let's assume DPDK takes 60 cycles to process a packet on >average, then 3 more cycles here means 5% performance drop. > >The dynamic solution is doable but with performance penalties, even if it >could be small. Also it may bring extra complexity, which can lead to >unpredictable behaviors and side effects. >For example, the dynamic solution won't have inline unrolling, which can >bring significant performance benefit for small copies with constant >length, like eth_addr. > >We can investigate the VM scenario more. > >Zhihong (John) John, Thanks for taking the time to answer my newbie question. I deeply appreciate the attention paid to performance in DPDK. I have a follow-up though. I'm trying to figure out what requirements this approach creates for the software build environment. 
If we want to build optimized versions for Haswell, Ivy Bridge, Sandy Bridge, etc, does this mean that we must have one of each micro-architecture available for running the builds, or is there a way of cross-compiling for all micro-architectures from just one build environment? Thanks, Kelvin
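To illustrate the run-time dispatch idea discussed in this thread, a sketch (not DPDK code; the variant bodies are stubs and the names are invented) of selecting a copy routine once at startup from CPU flags, so a single binary covers several micro-architectures. It relies on the GCC 4.8+ CPU-detection builtins; the indirect call and the loss of compile-time inlining for constant sizes are exactly the costs mentioned above.

#include <stddef.h>
#include <string.h>

/* stand-ins for real SSE/AVX2 copy implementations */
static void *
memcpy_sse(void *dst, const void *src, size_t n)
{
        return memcpy(dst, src, n);
}

static void *
memcpy_avx2(void *dst, const void *src, size_t n)
{
        return memcpy(dst, src, n);
}

static void *(*dpdk_copy)(void *, const void *, size_t) = memcpy_sse;

/* runs once before main(): pick the widest variant the CPU supports */
__attribute__((constructor))
static void
select_copy(void)
{
        __builtin_cpu_init();
        if (__builtin_cpu_supports("avx2"))
                dpdk_copy = memcpy_avx2;
}

On the build-environment question: building per-micro-architecture variants does not require one build host per chip; the -march= options only need a new enough compiler, and the resulting objects simply cannot run on older CPUs than the one they target.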