Re: [dpdk-dev] Intel 82599ES send packets failed on dpdk-17.05.1 after traffic testing
On 8/22/2017 5:45 AM, zimeiw wrote:
> hi,
> My test env is dell T430 server, NIC is intel 82599ES.
> I integrate my app with dpdk-17.05.1 version. after do some tcp traffic
> testing, app can receive packets from the 82599 NIC, but 82599 NIC send
> packets failed.
> But my app with dpdk-17.02 works well.
>
> If any change with ixgbe driver? or need some additional configuration for
> the NIC?

tx_pkt_prepare added but it has been added on 17.02 which seems working
for your case.

Just to double check, are you using rte_eth_tx_prepare()?

>
> Thanks.
>
> --
>
> Best Regards,
> zimeiw
>
Re: [dpdk-dev] [PATCH 5/5] net/mlx5: remove old MLNX_OFED 3.3 verification
On Thu, Aug 17, 2017 at 03:38:05PM +0100, Ferruh Yigit wrote:
> On 8/1/2017 1:09 PM, Nelio Laranjeiro wrote:
> > This version of MLNX_OFED is no more supported.
>
> Does it make sense to document minimum supported MLNX_OFED version?
>
> >
> > Signed-off-by: Nelio Laranjeiro
> > Acked-by: Shahaf Shuler

Hi Ferruh,

This is already documented in the NIC guides, see section
"20.5. Prerequisites" [1].

Thanks,

[1] http://dpdk.org/doc/guides/nics/mlx5.html

--
Nélio Laranjeiro
6WIND
Re: [dpdk-dev] [PATCH v2 1/2] app/testpmd: support the heavywight mode GRO
On 8/22/2017 2:00 AM, Hu, Jiayu wrote:
> Hi,
>
>> -----Original Message-----
>> From: Yigit, Ferruh
>> Sent: Monday, August 21, 2017 7:04 PM
>> To: Hu, Jiayu; dev@dpdk.org
>> Cc: Ananyev, Konstantin; Tan, Jianfeng; tho...@monjalon.net;
>> Wu, Jingjing; Yao, Lei A
>> Subject: Re: [PATCH v2 1/2] app/testpmd: support the heavywight mode GRO
>>
>> On 8/17/2017 10:08 AM, Jiayu Hu wrote:
>>> The GRO library provides two reassembly modes: lightweight mode and
>>> heavyweight mode. This patch is to support the heavyweight mode in
>>> csum forwarding engine.
>>>
>>> With the command "set port gro (heavymode|lightmode) (on|off)",
>>> users can select the lightweight mode or the heavyweight mode to use.
>>> With the command "set gro flush interval ", users can set the interval of
>>> flushing GROed packets from reassembly tables for the heavyweight mode.
>>> With the command "show port gro", users can display GRO
>>> configuration.
>>>
>>> Signed-off-by: Jiayu Hu
>>
>> <...>
>>
>>> lcoreid_t cpuid_idx;/**< index of logical core in CPU id table */
>>> @@ -434,13 +436,21 @@ extern struct ether_addr peer_eth_addrs[RTE_MAX_ETHPORTS];
>>> extern uint32_t burst_tx_delay_time; /**< Burst tx delay time(us) for mac-retry. */
>>> extern uint32_t burst_tx_retry_num; /**< Burst tx retry number for mac-retry. */
>>>
>>> +#define GRO_HEAVYMODE 0x1
>>> +#define GRO_LIGHTMODE 0x2
>>
>> Why these are not part of the gro library?
>> Is the concept "lightweight mode and heavyweight mode" part of gro
>> library or implemented only in testpmd?
>
> Lightweight mode and heavyweight mode are two reassembly methods we
> provided in the GRO library. For applications, they are just two kinds of
> APIs.
> Applications can select any of them to merge packets.

GRO modes are defined in testpmd, and kept in testpmd variables, library
seems not aware of these modes.

What are these two APIs, rte_gro_reassemble() and
rte_gro_reassemble_burst() ?

Perhaps you can detail what Lightweight mode and heavyweight mode are,
doc also don't have much about it.

This still looks like gro library provides common API and testpmd calls
these API with different parameters and calls these lightweight and
heavyweight, if these modes are common use case, I believe they should
be part of library. If not, instead of saying different gro modes, it
can be presented as different gro usage samples in testpmd.

testpmd good for testing dpdk, and good for providing usage sample for
APIs, but I believe it shouldn't have the concepts coded in it,
libraries should have it, that is what end user uses.

>
> In testpmd, we want to show how to use these two reassembly modes, so
> I define two macros to present them. Users can select which one to use via
> command line.
>
>>
>>> +
>>> #define GRO_DEFAULT_FLOW_NUM 4
>>> #define GRO_DEFAULT_ITEM_NUM_PER_FLOW DEF_PKT_BURST
>>> +
>>> +#define GRO_DEFAULT_FLUSH_INTERVAL 2
>>> +#define GRO_MAX_FLUSH_INTERVAL 4
>>> +
>>> struct gro_status {
>>> struct rte_gro_param param;
>>> uint8_t enable;
>>> };
>>> extern struct gro_status gro_ports[RTE_MAX_ETHPORTS];
>>> +extern uint32_t gro_flush_interval;
>>
>> <...>
Re: [dpdk-dev] Intel 82599ES send packets failed on dpdk-17.05.1 after traffic testing
hi,

no use rte_eth_tx_prepare()?

I set txq_flags value as below, may it cause this issue?

.txq_flags = ~ETH_TXQ_FLAGS_NOXSUMS

--
Best Regards,
zimeiw

At 2017-08-22 15:49:56, "Ferruh Yigit" wrote:
>On 8/22/2017 5:45 AM, zimeiw wrote:
>> hi,
>> My test env is dell T430 server, NIC is intel 82599ES.
>> I integrate my app with dpdk-17.05.1 version. after do some tcp traffic
>> testing, app can receive packets from the 82599 NIC, but 82599 NIC send
>> packets failed.
>> But my app with dpdk-17.02 works well.
>>
>> If any change with ixgbe driver? or need some additional configuration for
>> the NIC?
>
>tx_pkt_prepare added but it has been added on 17.02 which seems working
>for your case.
>
>Just to double check, are you using rte_eth_tx_prepare()?
>
>>
>> Thanks.
>>
>> --
>>
>> Best Regards,
>> zimeiw
>>
>
Re: [dpdk-dev] [PATCH 4/5] net/mlx5: remove multiple drop RSS queues
On Thu, Aug 17, 2017 at 03:38:13PM +0100, Ferruh Yigit wrote:
> On 8/1/2017 1:09 PM, Nelio Laranjeiro wrote:
> > Since MLNX_OFED 4.1 this code is no more useful.
>
> Can you please elaborate what is changed with MLNX_OFED 4.1?

Hardware drop queues are supported since this version, before that, the
flow drop was handled by software queues not polled.

> Also how one can know his MLNX_OFED version?

Currently the user has to download and install it to have a working PMD,
the NIC documentation provides the related information on where to get
this Mellanox OFED from their website. So the user knows which version
of MLNX OFED is installed in his system.

> > Signed-off-by: Nelio Laranjeiro
> > Acked-by: Shahaf Shuler
>
> <...>

Thanks,

--
Nélio Laranjeiro
6WIND
Re: [dpdk-dev] [PATCH 3/5] net/mlx5: fix non working secondary process by removing it
On Thu, Aug 17, 2017 at 03:38:22PM +0100, Ferruh Yigit wrote:
> On 8/1/2017 1:09 PM, Nelio Laranjeiro wrote:
> > Secondary process is a copy/paste of the mlx4 drivers, it was never
> > tested and it even segfault at the secondary process start in the
> > mlx5_pci_probe().
>
> Does this means multi process support for mlx5 broken with this patch?
> If so features file and release notes should be updated if this won't be
> fixed back in this release..

No currently the secondary process in mlx5 is not working at all. This is
only removing a non working code from the PMD.

> And what is special required for mlx5 secondary process support?

There is some work to make this secondary process work correctly, but as
it will be totally different, it does not make sense to keep a non
working code to replace it by a working one.

Concerning the release note and features, this new implementation is
expected in 17.11 same release as this patch.

> > This makes more sense to wipe this non working feature to re-write a
> > working and functional version.
> >
> > Fixes: a48deada651b ("mlx5: allow operation in secondary processes")
> >
> > Signed-off-by: Nelio Laranjeiro
> > Acked-by: Shahaf Shuler
>
> <...>

I'll update the feature documentation in a v2 to remove the
"Multiprocess aware" from the feature list in this patch. It will be
added back with the new implementation.

Thanks,

--
Nélio Laranjeiro
6WIND
Re: [dpdk-dev] [PATCH 2/5] net/mlx5: remove pdentic pragma
On Thu, Aug 17, 2017 at 03:38:33PM +0100, Ferruh Yigit wrote:
> On 8/1/2017 1:09 PM, Nelio Laranjeiro wrote:
> > Those are useless since DPDK headers have been cleaned up.
> >
> > Signed-off-by: Nelio Laranjeiro
> > Acked-by: Shahaf Shuler
>
> <...>
>
> >
> > -/* DPDK headers don't like -pedantic. */
> > -#ifdef PEDANTIC
> > -#pragma GCC diagnostic ignored "-Wpedantic"
> > -#endif
>
> Good to see that they are going, thanks. Is verbs.h one still required?

Yes it is, Adrien is pushing some patch to the OFED community to make it
also pedantic.

Thanks,

--
Nélio Laranjeiro
6WIND
Re: [dpdk-dev] [PATCH v1 00/21] net/mlx5: cleanup for isolated mode
On Fri, Aug 18, 2017 at 02:44:53PM +0100, Ferruh Yigit wrote:
> On 8/2/2017 3:10 PM, Nelio Laranjeiro wrote:
> > This series cleanups the control plane part and the way it uses the
> > different kind of objects (DPDK queues, Verbs Queues, ...). It has
> > three goals:
> >
> > 1. Reduce the memory usage by sharing all possible objects.
> >
> > 2. Leave the configuration to the control plane and the
> >    creation/destruction of queues to the dev_start/dev_stop() to have
> >    a better control on object and easily apply the configuration.
> >
> > 3. Create all flows through the generic flow API, it will also help to
> >    implement a detection collision algorithm as all flows are using
> >    the same service and thus the same kind of object.
>
> Hi Nelio,
>
> Patchset is not applying cleanly, can you please rebase it on top of
> latest tree?

yes,

> And there are some checkpatch warnings for the set.

sure,

> There are two other mlx5 patchsets, what is the dependency between them?

It is mlx5-cleanup first and then this one. I will put clearly the
dependency in cover letter to help.

Thanks,

--
Nélio Laranjeiro
6WIND
Re: [dpdk-dev] Intel 82599ES send packets failed on dpdk-17.05.1 after traffic testing
On 8/22/2017 9:30 AM, zimeiw wrote:
> hi,
>
> no use rte_eth_tx_prepare()?
>
> I set txq_flags value as below, may it cause this issue?
>
> .txq_flags = ~ETH_TXQ_FLAGS_NOXSUMS

So you are enabling all csum offloading, for Tx path this requires
packets to be prepared before sending to the NIC.

Can you please test with rte_eth_tx_prepare()? testpmd has the sample
usage.

Thanks,
ferruh

>
> --
> Best Regards,
> zimeiw
>
>
> At 2017-08-22 15:49:56, "Ferruh Yigit" wrote:
>>On 8/22/2017 5:45 AM, zimeiw wrote:
>>> hi,
>>> My test env is dell T430 server, NIC is intel 82599ES.
>>> I integrate my app with dpdk-17.05.1 version. after do some tcp traffic
>>> testing, app can receive packets from the 82599 NIC, but 82599 NIC send
>>> packets failed.
>>> But my app with dpdk-17.02 works well.
>>>
>>> If any change with ixgbe driver? or need some additional configuration for
>>> the NIC?
>>
>>tx_pkt_prepare added but it has been added on 17.02 which seems working
>>for your case.
>>
>>Just to double check, are you using rte_eth_tx_prepare()?
>>
>>>
>>> Thanks.
>>>
>>> --
>>>
>>> Best Regards,
>>> zimeiw
>>>
>>
>
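Editorial note on what "prepared" can mean here. As far as I understand, with L4 checksum offload enabled Intel NICs such as the 82599 expect the TCP/UDP checksum field to already hold the pseudo-header checksum, and filling it in is one of the fix-ups rte_eth_tx_prepare() does for ixgbe. The standalone sketch below only illustrates that computation for IPv4. The names sum16_be and phdr_cksum_ipv4 are made up for this note and are not DPDK APIs, and the real driver path handles more cases (TSO, IPv6) than shown.

```c
#include <stddef.h>
#include <stdint.h>

/* Add a byte buffer to a running sum as big-endian 16-bit words. */
static uint32_t sum16_be(const uint8_t *p, size_t len, uint32_t sum)
{
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		sum += (uint32_t)p[i] << 8 | p[i + 1];
	if (len & 1)
		sum += (uint32_t)p[len - 1] << 8;
	return sum;
}

/*
 * Hypothetical helper: IPv4 pseudo-header checksum.
 * src/dst hold the address bytes in network order, proto is the L4
 * protocol number, l4_len is the L4 header plus payload length.
 * Returns the folded sum, NOT complemented, as a host-order value.
 */
static uint16_t phdr_cksum_ipv4(const uint8_t src[4], const uint8_t dst[4],
				uint8_t proto, uint16_t l4_len)
{
	uint32_t sum = 0;

	sum = sum16_be(src, 4, sum);
	sum = sum16_be(dst, 4, sum);
	sum += proto;	/* the zero byte and the protocol form one 16-bit word */
	sum += l4_len;

	/* Fold the carries back in (one's-complement addition). */
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}
```

An application that skips the prepare step would have to store such a value (in network byte order) in the L4 checksum field itself before calling the Tx burst function; calling rte_eth_tx_prepare() first, as testpmd does, avoids hand-rolling this.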
Re: [dpdk-dev] [PATCH] vhost: added user callbacks for socket open/close
Hi Jens,

> I'm a little uncertain but my gut feeling is that in this context a
> connection is something between two sockets, not between devices.

What do you mean? This is a unix domain socket connection. DPDK can
create the socket, then the client may connect to it via connect(2).

> I would probably add these callbacks to struct vhost_user_socket.
> This is also where we keep the list of connections.

I get your point. However, it's vhost_device_ops struct that's being set
by the user via rte_vhost_driver_callback_register(). The new_connection
callback is there just to mark the device as *in use, can't be deleted*.
It doesn't transport any connection data.

Regards,
D.
Re: [dpdk-dev] [PATCH 1/7] member: implement main API
On Mon, 2017-08-21 at 17:19 -0700, Yipeng Wang wrote: > Membership library is an extension and generalization of a > traditional > filter (for example Bloom Filter) structure. In general, the > Membership > library is a data structure that provides a "set-summary" and > responds > to set-membership queries of whether a certain element belongs to a > set(s). A membership test for an element will return the set this > element > belongs to or not-found if the element is never inserted into the > set-summary. > > The results of the membership test is not 100% accurate. Certain > false positive or false negative probability could exist. However, > comparing to a "full-blown" complete list of elements, a "set- > summary" > is memory efficient and fast on lookup. > > This patch add the main API definition. > > Signed-off-by: Yipeng Wang > --- > lib/Makefile | 2 + > lib/librte_eal/common/eal_common_log.c | 1 + > lib/librte_eal/common/include/rte_log.h | 1 + > lib/librte_member/Makefile | 48 +++ > lib/librte_member/rte_member.c | 357 + > lib/librte_member/rte_member.h | 518 > +++ > lib/librte_member/rte_member_version.map | 15 + > 7 files changed, 942 insertions(+) > create mode 100644 lib/librte_member/Makefile > create mode 100644 lib/librte_member/rte_member.c > create mode 100644 lib/librte_member/rte_member.h > create mode 100644 lib/librte_member/rte_member_version.map > > diff --git a/lib/librte_member/Makefile b/lib/librte_member/Makefile > new file mode 100644 > index 000..997c825 > --- /dev/null > +++ b/lib/librte_member/Makefile > @@ -0,0 +1,48 @@ > +# BSD LICENSE > +# > +# Copyright(c) 2017 Intel Corporation. All rights reserved. > +# All rights reserved. > +# > +# Redistribution and use in source and binary forms, with or > without > +# modification, are permitted provided that the following > conditions > +# are met: > +# > +# * Redistributions of source code must retain the above > copyright > +# notice, this list of conditions and the following > disclaimer. 
> +# * Redistributions in binary form must reproduce the above > copyright > +# notice, this list of conditions and the following disclaimer > in > +# the documentation and/or other materials provided with the > +# distribution. > +# * Neither the name of Intel Corporation nor the names of its > +# contributors may be used to endorse or promote products > derived > +# from this software without specific prior written > permission. > +# > +# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND > CONTRIBUTORS > +# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT > NOT > +# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND > FITNESS FOR > +# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE > COPYRIGHT > +# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, > INCIDENTAL, > +# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT > +# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF > USE, > +# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND > ON ANY > +# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR > TORT > +# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF > THE USE > +# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH > DAMAGE. > + > +include $(RTE_SDK)/mk/rte.vars.mk > + > +# library name > +LIB = librte_member.a > + > +CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR) -O3 > + This breaks reproducibility as the output directory will be included before the source directory, causing a race - please do something like: CFLAGS := -I$(SRCDIR) $(CFLAGS) CFLAGS += $(WERROR_FLAGS) -O3 > +EXPORT_MAP := rte_member_version.map > + > +LIBABIVER := 1 > + > +# all source are stored in SRCS-y > +SRCS-$(CONFIG_RTE_LIBRTE_MEMBER) += rte_member.c > +# install includes > +SYMLINK-$(CONFIG_RTE_LIBRTE_MEMBER)-include := rte_member.h > + > +include $(RTE_SDK)/mk/rte.lib.mk -- Kind regards, Luca Boccassi
[dpdk-dev] [PATCH v2] nfp: handle packets with length 0 as usual ones
A DPDK app could, whatever the reason, send packets with size 0.
The PMD is not sending those packets, which does make sense,
but the problem is the mbuf is not released either. That leads
to mbufs not being available, because the app trusts the
PMD will do it.

Although this is a problem related to app wrong behaviour, we
should harden the PMD in this regard. Not sending a packet with
size 0 could be problematic, needing special handling inside the
PMD xmit function. It could be a burst of those packets, which can
be easily handled, but it could also be a single packet in a burst,
what is harder to handle.

It would be simpler to just send that kind of packets, which will
likely be dropped by the hw at some point. The main problem is how
the fw/hw handles the DMA, because a dma read to a hypothetical 0x0
address could trigger an IOMMU error. It turns out, it is safe to
send a descriptor with packet size 0 to the hardware: the DMA never
happens, from the PCIe point of view.

v2: remove code for handling zero-length mbuf chained.

Signed-off-by: Alejandro Lucero
---
 drivers/net/nfp/nfp_net.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/net/nfp/nfp_net.c b/drivers/net/nfp/nfp_net.c
index 92b03c4..6f1800c 100644
--- a/drivers/net/nfp/nfp_net.c
+++ b/drivers/net/nfp/nfp_net.c
@@ -2094,7 +2094,7 @@ uint32_t nfp_net_txq_full(struct nfp_net_txq *txq)
 		 */
 		pkt_size = pkt->pkt_len;
 
-		while (pkt_size) {
+		while (pkt) {
 			/* Copying TSO, VLAN and cksum info */
 			*txds = txd;
 
@@ -2126,13 +2126,13 @@ uint32_t nfp_net_txq_full(struct nfp_net_txq *txq)
 				txq->wr_p = 0;
 
 			pkt_size -= dma_size;
-			if (!pkt_size) {
+			if (!pkt_size)
 				/* End of packet */
 				txds->offset_eop |= PCIE_DESC_TX_EOP;
-			} else {
+			else
 				txds->offset_eop &= PCIE_DESC_TX_OFFSET_MASK;
-				pkt = pkt->next;
-			}
+
+			pkt = pkt->next;
 			/* Referencing next free TX descriptor */
 			txds = &txq->txds[txq->wr_p];
 			lmbuf = &txq->txbufs[txq->wr_p].mbuf;
--
1.9.1
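Editorial note: the effect of switching the loop condition from the remaining size to the segment pointer can be seen in a small standalone model. This is only a sketch under simplifying assumptions; struct seg, struct desc and both fill_descs_* functions are hypothetical stand-ins, not the nfp driver's types or code.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for an mbuf segment and a TX descriptor. */
struct seg {
	uint16_t data_len;	/* bytes in this segment */
	uint32_t pkt_len;	/* bytes in the whole packet (first segment only) */
	struct seg *next;
};

struct desc {
	uint16_t dma_len;
	int eop;		/* end-of-packet flag */
};

/* Old loop shape: driven by the remaining packet size. */
static int fill_descs_by_size(struct seg *pkt, struct desc *ring, int ring_sz)
{
	uint32_t pkt_size = pkt->pkt_len;
	int n = 0;

	while (pkt_size && n < ring_sz) {
		ring[n].dma_len = pkt->data_len;
		pkt_size -= pkt->data_len;
		ring[n].eop = (pkt_size == 0);
		if (pkt_size)
			pkt = pkt->next;
		n++;
	}
	return n;	/* 0 for a zero-length packet: no descriptor at all */
}

/* New loop shape: driven by the segment chain. */
static int fill_descs_by_chain(struct seg *pkt, struct desc *ring, int ring_sz)
{
	uint32_t pkt_size = pkt->pkt_len;
	int n = 0;

	while (pkt && n < ring_sz) {
		ring[n].dma_len = pkt->data_len;
		pkt_size -= pkt->data_len;
		ring[n].eop = (pkt_size == 0);
		pkt = pkt->next;
		n++;
	}
	return n;	/* at least 1: even an empty packet gets an EOP descriptor */
}
```

In this model a zero-length packet produces no descriptor with the size-driven loop, which matches the commit message's description of a packet that is neither sent nor released, while the chain-driven loop emits one descriptor of length 0 with the end-of-packet flag set.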
Re: [dpdk-dev] [PATCH] vhost: added user callbacks for socket open/close
On Tue, Aug 22, 2017 at 09:55:19AM +, Stojaczyk, DariuszX wrote:
> Hi Jens,
>
>> I'm a little uncertain but my gut feeling is that in this context a
>> connection is something between two sockets, not between devices.
>
> What do you mean? This is a unix domain socket connection. DPDK can
> create the socket, then the client may connect to it via connect(2).

yes, I get that.

>> I would probably add these callbacks to struct vhost_user_socket.
>> This is also where we keep the list of connections.
>
> I get your point. However, it's vhost_device_ops struct that's being set
> by the user via rte_vhost_driver_callback_register(). The new_connection
> callback is there just to mark the device as *in use, can't be deleted*.
> It doesn't transport any connection data.

You're right, I overlooked that it needs to be set by the user. In this
case your patch is the smallest possible change and looks good to me.

Do we need a documentation change for this?

regards,
Jens
[dpdk-dev] [PATCH] hash: optimize the softrss computation
Use rte_bsf32 and fast bit unset operation to optimize the softrss
computation.
The following measurements shows improvement over the default
softrss computation function.

tuple lens    old(cycles)    new(cycles)
3             1225           337
9             3743           992

Signed-off-by: Yangchao Zhou
---
 lib/librte_hash/rte_thash.h | 22 ++
 1 file changed, 10 insertions(+), 12 deletions(-)

diff --git a/lib/librte_hash/rte_thash.h b/lib/librte_hash/rte_thash.h
index 2fffd61..4fa5e07 100644
--- a/lib/librte_hash/rte_thash.h
+++ b/lib/librte_hash/rte_thash.h
@@ -207,15 +207,14 @@ static inline uint32_t
 rte_softrss(uint32_t *input_tuple, uint32_t input_len,
 		const uint8_t *rss_key)
 {
-	uint32_t i, j, ret = 0;
+	uint32_t i, j, map, ret = 0;
 
 	for (j = 0; j < input_len; j++) {
-		for (i = 0; i < 32; i++) {
-			if (input_tuple[j] & (1 << (31 - i))) {
-				ret ^= rte_cpu_to_be_32(((const uint32_t *)rss_key)[j]) << i |
+		for (map = input_tuple[j]; map; map &= (map - 1)) {
+			i = rte_bsf32(map);
+			ret ^= rte_cpu_to_be_32(((const uint32_t *)rss_key)[j]) << (31 - i) |
 					(uint32_t)((uint64_t)(rte_cpu_to_be_32(((const uint32_t *)rss_key)[j + 1])) >>
-					(32 - i));
-			}
+					(i + 1));
 		}
 	}
 	return ret;
@@ -238,14 +237,13 @@ static inline uint32_t
 rte_softrss_be(uint32_t *input_tuple, uint32_t input_len,
 		const uint8_t *rss_key)
 {
-	uint32_t i, j, ret = 0;
+	uint32_t i, j, map, ret = 0;
 
 	for (j = 0; j < input_len; j++) {
-		for (i = 0; i < 32; i++) {
-			if (input_tuple[j] & (1 << (31 - i))) {
-				ret ^= ((const uint32_t *)rss_key)[j] << i |
-					(uint32_t)((uint64_t)(((const uint32_t *)rss_key)[j + 1]) >> (32 - i));
-			}
+		for (map = input_tuple[j]; map; map &= (map - 1)) {
+			i = rte_bsf32(map);
+			ret ^= ((const uint32_t *)rss_key)[j] << (31 - i) |
+				(uint32_t)((uint64_t)(((const uint32_t *)rss_key)[j + 1]) >> (i + 1));
 		}
 	}
 	return ret;
--
2.7.4
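Editorial note: the two loop shapes in this patch can be compared outside DPDK. The sketch below mirrors the rte_softrss_be() variant (no byte swapping) with plain C. The function names are hypothetical, the key is passed as 32-bit words for simplicity, and __builtin_ctz (GCC/Clang) stands in for rte_bsf32, which to my knowledge wraps the same builtin. The key array must hold at least len + 1 words.

```c
#include <stdint.h>

/* Original formulation: test all 32 bit positions, most significant first. */
static uint32_t softrss_bitloop(const uint32_t *tuple, uint32_t len,
				const uint32_t *key)
{
	uint32_t i, j, ret = 0;

	for (j = 0; j < len; j++) {
		for (i = 0; i < 32; i++) {
			if (tuple[j] & (UINT32_C(1) << (31 - i)))
				ret ^= key[j] << i |
					(uint32_t)((uint64_t)key[j + 1] >> (32 - i));
		}
	}
	return ret;
}

/* Patched formulation: visit only the set bits, least significant first. */
static uint32_t softrss_setbits(const uint32_t *tuple, uint32_t len,
				const uint32_t *key)
{
	uint32_t i, j, map, ret = 0;

	for (j = 0; j < len; j++) {
		/* map &= map - 1 clears the lowest set bit on each pass */
		for (map = tuple[j]; map; map &= (map - 1)) {
			i = (uint32_t)__builtin_ctz(map);
			ret ^= key[j] << (31 - i) |
				(uint32_t)((uint64_t)key[j + 1] >> (i + 1));
		}
	}
	return ret;
}
```

A set bit at position b (counted from the least significant bit) is reached at i = 31 - b in the first loop and at i = b in the second, so the shift pairs (i, 32 - i) and (31 - i, i + 1) are the same values. Only the order of the XORs differs, which does not change the result, and the second loop runs once per set bit instead of 32 times per word.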
Re: [dpdk-dev] [PATCH] vhost: added user callbacks for socket open/close
On Tue, Aug 22, 2017 at 01:58:44PM +0200, Jens Freimann wrote:
> On Tue, Aug 22, 2017 at 09:55:19AM +, Stojaczyk, DariuszX wrote:
> Do we need a documentation change for this?

To answer my own question, I think doc/guides/prog_guide/vhost_lib.rst
needs an update.

regards,
Jens
[dpdk-dev] [PATCH v2] vhost: added user callbacks for socket open/close
When user receives destroy_device signal, he does not know *why* that event happened. He does not differ between socket shutdown and virtio processing pause. User could completely delete device during transition from BIOS to kernel, causing freeze or possibly kernel panic. Instead of changing new_device/destroy_device callbacks and breaking the ABI, a set of new functions new_connection/destroy_connection has been added. Signed-off-by: Dariusz Stojaczyk --- v2: also updated vhost_lib.rst doc/guides/prog_guide/vhost_lib.rst | 15 +-- lib/librte_vhost/rte_vhost.h| 5 - lib/librte_vhost/socket.c | 23 +++ 3 files changed, 36 insertions(+), 7 deletions(-) diff --git a/doc/guides/prog_guide/vhost_lib.rst b/doc/guides/prog_guide/vhost_lib.rst index 5979290..861a0e2 100644 --- a/doc/guides/prog_guide/vhost_lib.rst +++ b/doc/guides/prog_guide/vhost_lib.rst @@ -129,8 +129,7 @@ The following is an overview of some key Vhost API functions: * ``destroy_device(int vid)`` -This callback is invoked when a virtio device shuts down (or when the -vhost connection is broken). +This callback is invoked when a virtio device is paused or shut down. * ``vring_state_changed(int vid, uint16_t queue_id, int enable)`` @@ -143,6 +142,18 @@ The following is an overview of some key Vhost API functions: ``VHOST_F_LOG_ALL`` will be set/cleared at the start/end of live migration, respectively. + * ``new_connection(int vid)`` + +This callback is invoked on new vhost-user socket connection. If DPDK +acts as the server the device should not be deleted before + ``destroy_connection`` callback is received. + + * ``destroy_connection(int vid)`` + +This callback is invoked when vhost-user socket connection is closed. +It indicates that device with id ``vid`` is no longer in use and can be +safely deleted. + * ``rte_vhost_driver_disable/enable_features(path, features))`` This function disables/enables some features. 
For example, it can be used to diff --git a/lib/librte_vhost/rte_vhost.h b/lib/librte_vhost/rte_vhost.h index 8c974eb..8f86167 100644 --- a/lib/librte_vhost/rte_vhost.h +++ b/lib/librte_vhost/rte_vhost.h @@ -107,7 +107,10 @@ struct vhost_device_ops { */ int (*features_changed)(int vid, uint64_t features); - void *reserved[4]; /**< Reserved for future extension */ + int (*new_connection)(int vid); /**< Connect to socket. */ + void (*destroy_connection)(int vid);/**< Disconnect from socket */ + + void *reserved[2]; /**< Reserved for future extension */ }; /** diff --git a/lib/librte_vhost/socket.c b/lib/librte_vhost/socket.c index 41aa3f9..4ab4ff7 100644 --- a/lib/librte_vhost/socket.c +++ b/lib/librte_vhost/socket.c @@ -230,24 +230,36 @@ vhost_user_add_connection(int fd, struct vhost_user_socket *vsocket) RTE_LOG(INFO, VHOST_CONFIG, "new device, handle is %d\n", vid); + if (vsocket->notify_ops->new_connection) { + ret = vsocket->notify_ops->new_connection(vid); + if (ret < 0) { + RTE_LOG(ERR, VHOST_CONFIG, + "failed to add vhost user connection with fd %d\n", + fd); + goto err; + } + } + conn->connfd = fd; conn->vsocket = vsocket; conn->vid = vid; ret = fdset_add(&vhost_user.fdset, fd, vhost_user_read_cb, NULL, conn); if (ret < 0) { - conn->connfd = -1; - free(conn); - close(fd); RTE_LOG(ERR, VHOST_CONFIG, "failed to add fd %d into vhost server fdset\n", fd); - return; + goto err; } pthread_mutex_lock(&vsocket->conn_mutex); TAILQ_INSERT_TAIL(&vsocket->conn_list, conn, next); pthread_mutex_unlock(&vsocket->conn_mutex); + return; + +err: + free(conn); + close(fd); } /* call back when there is new vhost-user connection from client */ @@ -277,6 +289,9 @@ vhost_user_read_cb(int connfd, void *dat, int *remove) *remove = 1; vhost_destroy_device(conn->vid); + if (vsocket->notify_ops->destroy_connection) + vsocket->notify_ops->destroy_connection(conn->vid); + pthread_mutex_lock(&vsocket->conn_mutex); TAILQ_REMOVE(&vsocket->conn_list, conn, next); 
pthread_mutex_unlock(&vsocket->conn_mutex); -- 2.7.4
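Editorial note: a minimal sketch of how an application might use the two new callbacks to decide when a device can be deleted. The callback signatures follow the patch above; everything else (struct app_dev, the app_* function names, APP_MAX_VID) is hypothetical application-side bookkeeping, and the actual registration through struct vhost_device_ops and rte_vhost_driver_callback_register() is left out so that the example compiles on its own.

```c
#include <stdbool.h>

#define APP_MAX_VID 8

/* Hypothetical per-device state kept by the application, indexed by vid. */
struct app_dev {
	bool connected;	/* vhost-user socket connection is open */
	bool running;	/* virtio device is currently started */
};

static struct app_dev app_devs[APP_MAX_VID];

static bool app_vid_ok(int vid)
{
	return vid >= 0 && vid < APP_MAX_VID;
}

/* Would be assigned to .new_connection; a negative return rejects the connection. */
static int app_new_connection(int vid)
{
	if (!app_vid_ok(vid))
		return -1;
	app_devs[vid].connected = true;
	return 0;
}

/* Would be assigned to .destroy_connection. */
static void app_destroy_connection(int vid)
{
	if (app_vid_ok(vid))
		app_devs[vid].connected = false;
}

/* Would be assigned to .new_device. */
static int app_new_device(int vid)
{
	if (!app_vid_ok(vid))
		return -1;
	app_devs[vid].running = true;
	return 0;
}

/* Would be assigned to .destroy_device; may also fire on a pause. */
static void app_destroy_device(int vid)
{
	if (app_vid_ok(vid))
		app_devs[vid].running = false;
}

/* Deletion is keyed on the connection, not on the running state. */
static bool app_can_delete(int vid)
{
	return app_vid_ok(vid) && !app_devs[vid].connected;
}
```

With this bookkeeping a destroy_device during the BIOS-to-kernel transition leaves the device marked as connected, so the application keeps it until destroy_connection arrives.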
[dpdk-dev] [PATCH v3] igb_uio: MSI IRQ mode, irq enable/disable refactored
This patch adds MSI IRQ mode and in a way, that should also work on older kernel versions. The base for my patch was an attempt to do this in cf705bc36c which was later reverted in d8ee82745a. Compilation was tested on Linux 3.2, 4.10 and 4.12. MSI(X) setup was already using pci_alloc_irq_vectors before, but calls to pci_free_irq_vectors were missing and added. Signed-off-by: Markus Theil --- lib/librte_eal/linuxapp/igb_uio/compat.h | 9 +- lib/librte_eal/linuxapp/igb_uio/igb_uio.c | 175 ++ 2 files changed, 135 insertions(+), 49 deletions(-) diff --git a/lib/librte_eal/linuxapp/igb_uio/compat.h b/lib/librte_eal/linuxapp/igb_uio/compat.h index b800a53..8674088 100644 --- a/lib/librte_eal/linuxapp/igb_uio/compat.h +++ b/lib/librte_eal/linuxapp/igb_uio/compat.h @@ -125,5 +125,12 @@ static bool pci_check_and_mask_intx(struct pci_dev *pdev) #endif /* < 3.3.0 */ #if LINUX_VERSION_CODE < KERNEL_VERSION(4, 8, 0) -#define HAVE_PCI_ENABLE_MSIX + +#define HAVE_PCI_ENABLE_MSIX 1 +#define HAVE_PCI_ENABLE_MSI 1 + +#else + +#define HAVE_ALLOC_IRQ_VECTORS 1 + #endif diff --git a/lib/librte_eal/linuxapp/igb_uio/igb_uio.c b/lib/librte_eal/linuxapp/igb_uio/igb_uio.c index 07a19a3..bd94eb4 100644 --- a/lib/librte_eal/linuxapp/igb_uio/igb_uio.c +++ b/lib/librte_eal/linuxapp/igb_uio/igb_uio.c @@ -91,6 +91,7 @@ static struct attribute *dev_attrs[] = { static const struct attribute_group dev_attr_grp = { .attrs = dev_attrs, }; + /* * It masks the msix on/off of generating MSI-X messages. */ @@ -113,6 +114,29 @@ igbuio_msix_mask_irq(struct msi_desc *desc, int32_t state) } } +/* + * It masks the msi on/off of generating MSI messages. 
+ */ +static void +igbuio_msi_mask_irq(struct pci_dev *pdev, struct msi_desc *desc, int32_t state) +{ + u32 mask_bits = desc->masked; + u32 offset = desc->irq - pdev->irq; + u32 mask = 1 << offset; + u32 flag = !!state << offset; + + if (!desc->msi_attrib.maskbit) + return; + + mask_bits &= ~mask; + mask_bits |= flag; + + if (mask_bits != desc->masked) { + pci_write_config_dword(pdev, desc->mask_pos, mask_bits); + desc->masked = mask_bits; + } +} + /** * This is the irqcontrol callback to be registered to uio_info. * It can be used to disable/enable interrupt from user space processes. @@ -146,6 +170,16 @@ igbuio_pci_irqcontrol(struct uio_info *info, s32 irq_state) list_for_each_entry(desc, &pdev->dev.msi_list, list) igbuio_msix_mask_irq(desc, irq_state); #endif + } else if (udev->mode == RTE_INTR_MODE_MSI) { + struct msi_desc *desc; + +#if (LINUX_VERSION_CODE < KERNEL_VERSION(4, 3, 0)) + list_for_each_entry(desc, &pdev->msi_list, list) + igbuio_msi_mask_irq(pdev, desc, irq_state); +#else + list_for_each_entry(desc, &pdev->dev.msi_list, list) + igbuio_msi_mask_irq(pdev, desc, irq_state); +#endif } pci_cfg_access_unlock(pdev); @@ -309,6 +343,91 @@ igbuio_pci_release_iomem(struct uio_info *info) } static int +igbuio_pci_enable_interrupts(struct rte_uio_pci_dev *udev) +{ + int err = 0; +#ifdef HAVE_PCI_ENABLE_MSIX + struct msix_entry msix_entry; +#endif + + switch (igbuio_intr_mode_preferred) { + case RTE_INTR_MODE_MSIX: + /* Only 1 msi-x vector needed */ +#ifdef HAVE_PCI_ENABLE_MSIX + msix_entry.entry = 0; + if (pci_enable_msix(udev->pdev, &msix_entry, 1) == 0) { + dev_dbg(&udev->pdev->dev, "using MSI-X"); + udev->info.irq_flags = IRQF_NO_THREAD; + udev->info.irq = msix_entry.vector; + udev->mode = RTE_INTR_MODE_MSIX; + break; + } +#else + if (pci_alloc_irq_vectors(udev->pdev, 1, 1, PCI_IRQ_MSIX) == 1) { + dev_dbg(&udev->pdev->dev, "using MSI-X"); + udev->info.irq = pci_irq_vector(udev->pdev, 0); + udev->mode = RTE_INTR_MODE_MSIX; + break; + } +#endif + case 
RTE_INTR_MODE_MSI: +#ifdef HAVE_PCI_ENABLE_MSI + if (pci_enable_msi(udev->pdev) == 0) { + dev_dbg(&udev->pdev->dev, "using MSI"); + udev->info.irq_flags = IRQF_NO_THREAD; + udev->info.irq = udev->pdev->irq; + udev->mode = RTE_INTR_MODE_MSI; + break; + } +#else + if (pci_alloc_irq_vectors(udev->pdev, 1, 1, PCI_IRQ_MSI) == 1) { + dev_dbg(&udev->pdev->dev, "using MSI"); + udev->info.irq = pci_irq_vector(udev->pdev, 0); + udev->mode = RTE_INTR_MODE_MSI; + break; + } +#endif +
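Editorial note: the read-modify-write that igbuio_msi_mask_irq() applies to the cached mask word can be tried in isolation. The helper below is a hypothetical user-space sketch of just that bit arithmetic as posted in this version of the patch; it does not touch PCI config space and says nothing about whether a set bit should mean enabled or masked.

```c
#include <stdint.h>

/*
 * Hypothetical helper mirroring the bit arithmetic of the posted patch:
 * clear the bit that belongs to this vector, then set it to the
 * normalized (0 or 1) state. offset must be below 32.
 */
static uint32_t msi_update_mask(uint32_t masked, unsigned int offset, int state)
{
	uint32_t mask = UINT32_C(1) << offset;
	uint32_t flag = (uint32_t)!!state << offset;

	masked &= ~mask;
	masked |= flag;
	return masked;
}
```

In the patch the config-space write is skipped when the computed word equals desc->masked; a caller of this sketch would make the same comparison against the previous value.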
[dpdk-dev] [PATCH 1/3] ethdev: expose Rx hardware timestamp
Added new capability to the list of rx offloads for hw timestamp.

The PMDs that expose this capability will always have it enabled.
But, if the following API got accepted applications can choose
between disable/enable this API.

http://dpdk.org/dev/patchwork/patch/27470/

Signed-off-by: Raslan Darawsheh
---
 lib/librte_ether/rte_ethdev.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index 0adf327..cc5d281 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -907,6 +907,8 @@ struct rte_eth_conf {
 #define DEV_RX_OFFLOAD_QINQ_STRIP 0x0020
 #define DEV_RX_OFFLOAD_OUTER_IPV4_CKSUM 0x0040
 #define DEV_RX_OFFLOAD_MACSEC_STRIP 0x0080
+#define DEV_RX_OFFLOAD_TIMESTAMP 0x0100
+/**< Device puts raw timestamp in mbuf. */
 
 /**
  * TX offload capabilities of a device.
--
2.7.4
[dpdk-dev] [PATCH 2/3] app/testpmd: add Rx timestamp in testpmd
Added new print in case a PMD exposes Rx timestamp.
Also, added a print for timestamp value in rxonly mode in case
the packet was timestamped.

Signed-off-by: Raslan Darawsheh
---
 app/test-pmd/config.c | 3 +++
 app/test-pmd/rxonly.c | 2 ++
 2 files changed, 5 insertions(+)

diff --git a/app/test-pmd/config.c b/app/test-pmd/config.c
index 3ae3e1c..8a5da5d 100644
--- a/app/test-pmd/config.c
+++ b/app/test-pmd/config.c
@@ -598,6 +598,9 @@ port_offload_cap_display(portid_t port_id)
 			printf("off\n");
 	}
 
+	if (dev_info.rx_offload_capa & DEV_RX_OFFLOAD_TIMESTAMP)
+		printf("HW timestamp: on\n");
+
 	if (dev_info.tx_offload_capa & DEV_TX_OFFLOAD_QINQ_INSERT) {
 		printf("Double VLANs insert: ");
 		if (ports[port_id].tx_ol_flags &
diff --git a/app/test-pmd/rxonly.c b/app/test-pmd/rxonly.c
index 5ef0219..f4d35d7 100644
--- a/app/test-pmd/rxonly.c
+++ b/app/test-pmd/rxonly.c
@@ -158,6 +158,8 @@ pkt_burst_receive(struct fwd_stream *fs)
 				printf("hash=0x%x ID=0x%x ",
 				       mb->hash.fdir.hash, mb->hash.fdir.id);
 		}
+		if (ol_flags & PKT_RX_TIMESTAMP)
+			printf(" - timestamp %lu ", mb->timestamp);
 		if (ol_flags & PKT_RX_VLAN_STRIPPED)
 			printf(" - VLAN tci=0x%x", mb->vlan_tci);
 		if (ol_flags & PKT_RX_QINQ_STRIPPED)
--
2.7.4
[dpdk-dev] [PATCH 3/3] net/mlx5: add hardware timestamp
Expose a new capability of Rx hw timestamp and
added new device args to enable it hw_timestamp.
It will add the raw hw timestamp into the packets.

It's expected that it will lower down the performance since using it
will disable the cqe compression, and will add extra checks in the vec
rx path.

Signed-off-by: Raslan Darawsheh
---
 drivers/net/mlx5/mlx5.c | 23 +++
 drivers/net/mlx5/mlx5.h | 1 +
 drivers/net/mlx5/mlx5_ethdev.c | 3 ++-
 drivers/net/mlx5/mlx5_rxq.c | 3 +++
 drivers/net/mlx5/mlx5_rxtx.c | 5 +
 drivers/net/mlx5/mlx5_rxtx.h | 1 +
 drivers/net/mlx5/mlx5_rxtx_vec_sse.c | 14 ++
 7 files changed, 49 insertions(+), 1 deletion(-)

diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index b7e5046..4b3a3ab 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -94,6 +94,9 @@
 /* Device parameter to enable hardware TSO offload. */
 #define MLX5_TSO "tso"
 
+/* Device parameter to enable hardware timestamp offload. */
+#define MLX5_RX_TIMESTAMP "rx_timestamp"
+
 /* Device parameter to enable hardware Tx vector. */
 #define MLX5_TX_VEC_EN "tx_vec_en"
 
@@ -113,6 +116,7 @@ struct mlx5_args {
 	int tso;
 	int tx_vec_en;
 	int rx_vec_en;
+	int hw_timestamp;
 };
 /**
  * Retrieve integer value from environment variable.
@@ -336,6 +340,8 @@ mlx5_args_check(const char *key, const char *val, void *opaque) args->tx_vec_en = !!tmp; } else if (strcmp(MLX5_RX_VEC_EN, key) == 0) { args->rx_vec_en = !!tmp; + } else if (strcmp(MLX5_RX_TIMESTAMP, key) == 0) { + args->hw_timestamp = !!tmp; } else { WARN("%s: unknown parameter", key); return -EINVAL; @@ -367,6 +373,7 @@ mlx5_args(struct mlx5_args *args, struct rte_devargs *devargs) MLX5_TSO, MLX5_TX_VEC_EN, MLX5_RX_VEC_EN, + MLX5_RX_TIMESTAMP, NULL, }; struct rte_kvargs *kvlist; @@ -426,6 +433,8 @@ mlx5_args_assign(struct priv *priv, struct mlx5_args *args) priv->tx_vec_en = args->tx_vec_en; if (args->rx_vec_en != MLX5_ARG_UNSET) priv->rx_vec_en = args->rx_vec_en; + if (args->hw_timestamp != MLX5_ARG_UNSET) + priv->hw_timestamp = args->hw_timestamp; } /** @@ -573,6 +582,7 @@ mlx5_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev) .tso = MLX5_ARG_UNSET, .tx_vec_en = MLX5_ARG_UNSET, .rx_vec_en = MLX5_ARG_UNSET, + .hw_timestamp = MLX5_ARG_UNSET, }; exp_device_attr.comp_mask = @@ -581,6 +591,7 @@ mlx5_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev) IBV_EXP_DEVICE_ATTR_VLAN_OFFLOADS | IBV_EXP_DEVICE_ATTR_RX_PAD_END_ALIGN | IBV_EXP_DEVICE_ATTR_TSO_CAPS | + IBV_EXP_DEVICE_ATTR_WITH_TIMESTAMP_MASK | 0; DEBUG("using port %u (%08" PRIx32 ")", port, test); @@ -662,6 +673,18 @@ mlx5_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev) IBV_EXP_DEVICE_VXLAN_SUPPORT); DEBUG("L2 tunnel checksum offloads are %ssupported", (priv->hw_csum_l2tun ? "" : "not ")); + if (priv->hw_timestamp) { + priv->hw_timestamp = + (exp_device_attr.comp_mask | +IBV_EXP_DEVICE_ATTR_WITH_TIMESTAMP_MASK); + DEBUG("Timestamping offload is %ssupported", + (priv->hw_timestamp ? "" : "not ")); + priv->cqe_comp = (priv->hw_timestamp ? + 0 : priv->cqe_comp); + DEBUG("%s", + (priv->hw_timestamp ? 
+ "cqe compression is disabled" : "")); + } priv->ind_table_max_size = exp_device_attr.rx_hash_caps.max_rwq_indirection_table_size; /* Remove this check once DPDK supports larger/variable diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h index 43c5384..4d19351 100644 --- a/drivers/net/mlx5/mlx5.h +++ b/drivers/net/mlx5/mlx5.h @@ -120,6 +120,7 @@ struct priv { unsigned int allmulti_req:1; /* All multicast mode requested. */ unsigned int hw_csum:1; /* Checksum offload is supported. */ unsigned int hw_csum_l2tun:1; /* Same for L2 tunnels. */ + unsigned int hw_timestamp:1; /* rx timestamp offload is supported. */ unsigned int hw_vlan_strip:1; /* VLAN stripping is supported. */ unsigned int hw_fcs_strip:1; /* FCS stripping is supported
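A side note on the capability test in the probe hunk: whether a device reported a given attribute bit is usually checked with a bitwise AND, because an OR against a non-zero mask always yields a non-zero value. The sketch below only illustrates that difference. SKETCH_ATTR_TIMESTAMP is a hypothetical bit chosen for the example and is not taken from the Verbs headers.

```c
#include <stdint.h>

/* Hypothetical capability bit, for illustration only. */
#define SKETCH_ATTR_TIMESTAMP (1u << 5)

/* AND keeps only the bit of interest: true only if the device set it. */
static int
sketch_has_cap_and(uint32_t comp_mask)
{
	return (comp_mask & SKETCH_ATTR_TIMESTAMP) != 0;
}

/* OR forces the bit on regardless of the input, so this is always true. */
static int
sketch_has_cap_or(uint32_t comp_mask)
{
	return (comp_mask | SKETCH_ATTR_TIMESTAMP) != 0;
}
```

With an empty mask the AND form reports the capability as absent, while the OR form still reports it as present.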
[dpdk-dev] [PATCH] lib/gro: fix typo in .map file
The names of rte_gro_ctx_create() and rte_gro_ctx_destroy() in rte_gro_version.map are incorrect. This patch is to fix this issue. Fixes: e996506a1c07 ("lib/gro: add Generic Receive Offload API framework") Signed-off-by: Jiayu Hu --- lib/librte_gro/rte_gro_version.map | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/lib/librte_gro/rte_gro_version.map b/lib/librte_gro/rte_gro_version.map index bb40bb4..1606b6d 100644 --- a/lib/librte_gro/rte_gro_version.map +++ b/lib/librte_gro/rte_gro_version.map @@ -1,8 +1,8 @@ DPDK_17.08 { global: - rte_gro_ctrl_create; - rte_gro_ctrl_destroy; + rte_gro_ctx_create; + rte_gro_ctx_destroy; rte_gro_get_pkt_count; rte_gro_reassemble; rte_gro_reassemble_burst; -- 2.7.4
[dpdk-dev] [PATCH v2] nfp: support new firmware metadata api
The PMD needs to support both the way firmware metadata has been handled until now and the new API used since NFP NFD 3.0 firmware versions. The new metadata API adds flexibility for working with different metadata types and, mainly, allows different firmware components to add metadata independently. Although this patch supports only one type handled by the PMD, future firmware applications will extend this support. v2: - Explain what the new metadata API is about - Mention from which firmware version the new metadata API is used - Use rte_pktmbuf_mtod - Add a comment describing the metadata API to make the code easier to read - Remove unused define - Add defines in the same header file Signed-off-by: Alejandro Lucero --- drivers/net/nfp/nfp_net.c | 53 +- drivers/net/nfp/nfp_net_ctrl.h | 7 ++ 2 files changed, 54 insertions(+), 6 deletions(-) diff --git a/drivers/net/nfp/nfp_net.c b/drivers/net/nfp/nfp_net.c index 21ae07b..87aaa91 100644 --- a/drivers/net/nfp/nfp_net.c +++ b/drivers/net/nfp/nfp_net.c @@ -1734,6 +1734,8 @@ static void nfp_net_read_mac(struct nfp_net_hw *hw) #define NFP_HASH_OFFSET ((uint8_t *)mbuf->buf_addr + mbuf->data_off - 4) #define NFP_HASH_TYPE_OFFSET ((uint8_t *)mbuf->buf_addr + mbuf->data_off - 8) +#define NFP_DESC_META_LEN(d) (d->rxd.meta_len_dd & PCIE_DESC_RX_META_LEN_MASK) + /* * nfp_net_set_hash - Set mbuf hash data * @@ -1744,18 +1746,57 @@ static void nfp_net_read_mac(struct nfp_net_hw *hw) nfp_net_set_hash(struct nfp_net_rxq *rxq, struct nfp_net_rx_desc *rxd, struct rte_mbuf *mbuf) { - uint32_t hash; - uint32_t hash_type; struct nfp_net_hw *hw = rxq->hw; + uint8_t *meta_offset; + uint32_t meta_info; + uint32_t hash = 0; + uint32_t hash_type = 0; if (!(hw->ctrl & NFP_NET_CFG_CTRL_RSS)) return; - if (!(rxd->rxd.flags & PCIE_DESC_RX_RSS)) - return; + if (NFD_CFG_MAJOR_VERSION_of(hw->ver) <= 3) { + if (!(rxd->rxd.flags & PCIE_DESC_RX_RSS)) + return; - hash = rte_be_to_cpu_32(*(uint32_t *)NFP_HASH_OFFSET); - hash_type = 
rte_be_to_cpu_32(*(uint32_t *)NFP_HASH_TYPE_OFFSET); + hash = rte_be_to_cpu_32(*(uint32_t *)NFP_HASH_OFFSET); + hash_type = rte_be_to_cpu_32(*(uint32_t *)NFP_HASH_TYPE_OFFSET); + + } else if (NFP_DESC_META_LEN(rxd)) { + /* +* new metadata api: +* < 32 bit -> +* mfield type word +* e data field #2 +* t data field #1 +* a data field #0 +* +*packet data +* +* Field type word contains up to 8 4bit field types +* A 4bit field type refers to a data field word +* A data field word can have several 4bit field types +*/ + meta_offset = rte_pktmbuf_mtod(mbuf, uint8_t *); + meta_offset -= NFP_DESC_META_LEN(rxd); + meta_info = rte_be_to_cpu_32(*(uint32_t *)meta_offset); + meta_offset += 4; + /* NFP PMD just supports metadata for hashing */ + switch (meta_info & NFP_NET_META_FIELD_MASK) { + case NFP_NET_META_HASH: + /* next field type is about the hash type */ + meta_info >>= NFP_NET_META_FIELD_SIZE; + /* hash value is in the data field */ + hash = rte_be_to_cpu_32(*(uint32_t *)meta_offset); + hash_type = meta_info & NFP_NET_META_FIELD_MASK; + break; + default: + /* Unsupported metadata can be a performance issue */ + return; + } + } else { + return; + } mbuf->hash.rss = hash; mbuf->ol_flags |= PKT_RX_RSS_HASH; diff --git a/drivers/net/nfp/nfp_net_ctrl.h b/drivers/net/nfp/nfp_net_ctrl.h index 2c50043..c1cba0e 100644 --- a/drivers/net/nfp/nfp_net_ctrl.h +++ b/drivers/net/nfp/nfp_net_ctrl.h @@ -52,6 +52,13 @@ /* Offset in Freelist buffer where packet starts on RX */ #define NFP_NET_RX_OFFSET 32 +/* working with metadata api (NFD version > 3.0) */ +#define NFP_NET_META_FIELD_SIZE 4 +#define NFP_NET_META_FIELD_MASK ((1 << NFP_NET_META_FIELD_SIZE) - 1) + +/* Prepend field types */ +#define NFP_NET_META_HASH 1 /* next field carries hash type */ + /* Hash type pre-pended when a RSS hash was computed */ #define NFP_NET_RSS_NONE0 #define NFP_NET_RSS_IPV41 -- 1.9.1
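The prepend layout described in the patch comment can be sketched as a standalone parser. This is only an illustration under stated assumptions: the SKETCH_* names mirror the field size and hash type values from the patch, but the function names, the byte-wise big-endian read and the minimum-length check are additions made for the example and are not part of the PMD.

```c
#include <stddef.h>
#include <stdint.h>

/* Values mirror the patch; the names are local to this sketch. */
#define SKETCH_META_FIELD_SIZE 4
#define SKETCH_META_FIELD_MASK ((1u << SKETCH_META_FIELD_SIZE) - 1)
#define SKETCH_META_HASH 1

/* Read a big-endian 32-bit word byte by byte (no alignment assumptions). */
static uint32_t
sketch_read_be32(const uint8_t *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/*
 * pkt points at the first byte of packet data; meta_len bytes of metadata
 * sit immediately before it. The first 32-bit word holds 4-bit field
 * types, the following words hold the data fields.
 * Returns 0 and fills hash/hash_type when the first field type is the
 * hash type, -1 otherwise.
 */
static int
sketch_parse_meta(const uint8_t *pkt, size_t meta_len,
		  uint32_t *hash, uint32_t *hash_type)
{
	const uint8_t *meta;
	uint32_t info;

	/* Assumption for this sketch: need the type word plus one data word. */
	if (meta_len < 8)
		return -1;
	meta = pkt - meta_len;
	info = sketch_read_be32(meta);
	if ((info & SKETCH_META_FIELD_MASK) != SKETCH_META_HASH)
		return -1;
	/* The next 4-bit field describes the hash type. */
	info >>= SKETCH_META_FIELD_SIZE;
	*hash_type = info & SKETCH_META_FIELD_MASK;
	/* The hash value is the first data field. */
	*hash = sketch_read_be32(meta + 4);
	return 0;
}
```

For a field type word of 0x00000021 followed by the data word 0xdeadbeef, the sketch reports hash type 2 and hash 0xdeadbeef.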
Re: [dpdk-dev] [PATCH v2 1/2] app/testpmd: support the heavywight mode GRO
Hi Ferruh, On Tue, Aug 22, 2017 at 09:30:32AM +0100, Ferruh Yigit wrote: > On 8/22/2017 2:00 AM, Hu, Jiayu wrote: > > Hi, > >> -Original Message- > >> From: Yigit, Ferruh > >> Sent: Monday, August 21, 2017 7:04 PM > >> To: Hu, Jiayu ; dev@dpdk.org > >> Cc: Ananyev, Konstantin ; Tan, Jianfeng > >> ; tho...@monjalon.net; Wu, Jingjing > >> ; Yao, Lei A > >> Subject: Re: [PATCH v2 1/2] app/testpmd: support the heavywight mode GRO > >> > >> On 8/17/2017 10:08 AM, Jiayu Hu wrote: > >>> The GRO library provides two reassembly modes: lightweight mode and > >>> heavyweight mode. This patch is to support the heavyweight mode in > >>> csum forwarding engine. > >>> > >>> With the command "set port gro (heavymode|lightmode) > >> (on|off)", > >>> users can select the lightweight mode or the heavyweight mode to use. > >> With > >>> the command "set gro flush interval ", users can set the interval of > >>> flushing GROed packets from reassembly tables for the heavyweight mode. > >>> With the command "show port gro", users can display GRO > >>> configuration. > >>> > >>> Signed-off-by: Jiayu Hu > >> > >> <...> > >> > >>> lcoreid_t cpuid_idx;/**< index of logical core in CPU id table */ > >>> @@ -434,13 +436,21 @@ extern struct ether_addr > >> peer_eth_addrs[RTE_MAX_ETHPORTS]; > >>> extern uint32_t burst_tx_delay_time; /**< Burst tx delay time(us) for > >>> mac- > >> retry. */ > >>> extern uint32_t burst_tx_retry_num; /**< Burst tx retry number for mac- > >> retry. */ > >>> > >>> +#define GRO_HEAVYMODE 0x1 > >>> +#define GRO_LIGHTMODE 0x2 > >> > >> Why these are not part of the gro library? > >> Is the concept "lightweight mode and heavyweight mode" part of gro > >> library or implemented only in testpmd? > > > > Lightweight mode and heavyweight mode are two reassembly methods we > > provided in the GRO library. For applications, they are just two kinds of > > APIs. > > Applications can select any of them to merge packets. 
> > GRO modes are defined in testpmd, and kept in testpmd variables, library > seems not aware of these modes. > > What are these two APIs, rte_gro_reassemble() and > rte_gro_reassemble_burst() ? > Perhaps you can detail what Lightweight mode and heavyweight mode are, > doc also don't have much about it. > > This still looks like gro library provides common API and testpmd calls > these API with different parameters and calls these lightweight and > heavyweight, if these modes are common use case, I believe they should Yes, the GRO API doesn't show the concept of 'heavyweight' and 'lightweight'. This concept is only used to describe the supported reassembly modes in the GRO library. > be part of library. If not, instead of saying different gro modes, it > can be presented as different gro usage samples in testpmd. What does "different gro usage samples" mean? Different forwarding engines? > > testpmd good for testing dpdk, and good for providing usage sample for > APIs, but I believe it shouldn't have the concepts coded in it, > libraries should have it, that is what end user uses. > > > > > In testpmd, we want to show how to use these two reassembly modes, so > > I define two macros to present them. Users can select which one to use via > > command line. > > > >> > >>> + > >>> #define GRO_DEFAULT_FLOW_NUM 4 > >>> #define GRO_DEFAULT_ITEM_NUM_PER_FLOW DEF_PKT_BURST > >>> + > >>> +#define GRO_DEFAULT_FLUSH_INTERVAL 2 > >>> +#define GRO_MAX_FLUSH_INTERVAL 4 > >>> + > >>> struct gro_status { > >>> struct rte_gro_param param; > >>> uint8_t enable; > >>> }; > >>> extern struct gro_status gro_ports[RTE_MAX_ETHPORTS]; > >>> +extern uint32_t gro_flush_interval; > >> > >> <...>
Re: [dpdk-dev] [PATCH] lib/gro: fix typo in .map file
On 8/22/2017 2:58 PM, Jiayu Hu wrote: > The names of rte_gro_ctx_create() and rte_gro_ctx_destroy() in > rte_gro_version.map are incorrect. This patch is to fix this issue. > > Fixes: e996506a1c07 ("lib/gro: add Generic Receive Offload API framework") > > Signed-off-by: Jiayu Hu Reviewed-by: Ferruh Yigit
Re: [dpdk-dev] [PATCH v2 1/2] eal: add uevent api for hot plug
a. About the uevent mechanism. As we know, uevents come from kobjects on the kernel side. Every kobject has its own uevent, and a sysfs folder identifies a kobject; bus components such as cpu, usb, pci, pci-express and virtio all have uevents. I agree that it would be best if uevent handling could be integrated in the bus layer. I checked the kernel source code: the uevent code is in lib/kobject_uevent.c, it exists only on Linux, not on BSD, and it supports both uio and vfio. So where should the DPDK uevent code be located? Four options come to mind, and I propose 2) and 4). 1) eal_bus (but uevent handling is a netlink socket and event polling matter, not related to bus behavior) 2) eal_dev (treat it like the kernel's udev, and create a new epoll and uevent handler) 3) add a new file eal_udev.c 4) eal_interrupt (add it to the interrupt epoll and use the interrupt handler). Shreyansh & jblunck & gaetan: since you recently worked on the pci bus layer and are experts on it, I want to ask whether any other bus layer rework you plan would conflict with my proposal, or would require me to modify it to be compatible with the next architecture? If so, please let me know. Thanks. b. About the pci uevent handler. I suggest 2 options: 1) use a common interrupt handler for pci PMDs that an app or the fail-safe PMD can register. 2) use a common uevent handler for pci PMDs that an app or the fail-safe PMD can register. Community, are there any comments on that? Best regards, Jeff Guo -Original Message- From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Guo, Jia Sent: Wednesday, July 5, 2017 5:04 PM To: Thomas Monjalon Cc: dev@dpdk.org; Zhang, Helin ; Wu, Jingjing Subject: Re: [dpdk-dev] [PATCH v2 1/2] eal: add uevent api for hot plug On 7/5/2017 3:32 PM, Thomas Monjalon wrote: > 05/07/2017 05:02, Guo, Jia: >> hi, thomas >> >> >> On 7/5/2017 7:45 AM, Thomas Monjalon wrote: >>> Hi, >>> >>> This is an interesting step for hotplug in DPDK. 
>>> >>> 28/06/2017 13:07, Jeff Guo: + netlink_fd = socket(PF_NETLINK, SOCK_DGRAM, + NETLINK_KOBJECT_UEVENT); >>> It is monitoring the whole system... >>> +int +rte_uevent_get(int fd, struct rte_uevent *uevent) { + int ret; + char buf[RTE_UEVENT_MSG_LEN]; + + memset(uevent, 0, sizeof(struct rte_uevent)); + memset(buf, 0, RTE_UEVENT_MSG_LEN); + + ret = recv(fd, buf, RTE_UEVENT_MSG_LEN - 1, MSG_DONTWAIT); >>> ... and it is read from this function called by one driver. >>> It cannot work without a global dispatch. >> the rte_uevent-connect is called in func of pci_uio_alloc_resource, >> so each socket is created by by each uio device. so i think that >> would not affect each driver isolate to use it. > Ah OK, I missed it. > >>> It must be a global mechanism, probably a service core. >>> The question is also to know whether it should be a mandatory >>> service in DPDK or an optional helper? >> a global mechanism would be good, but so far, include mlx driver, we >> all handle the hot plug event in driver by app's registered callback. >> maybe a better global would be try in the future. but now is would >> work for all pci uio device. > mlx drivers have a special connection to the kernel through the > associated mlx kernel drivers. That's why the PMD handle the events in a > specific way. > > You are adding event handling for UIO. > Now we need also VFIO. > > I am wondering how it could be better integrated in the bus layer. absolutely, hot plug for VFIO must be request more for the live migration, and we plan to add it at next level, when we go thought all uio hot plug feature integration done. so, could i expect an ack if there aren't other concern about uio uevent here. thanks. >> and more, if in pci uio device to use hot plug , i think it might be >> mandatory.
Re: [dpdk-dev] [PATCH] hash: optimize the softrss computation
Hi, 2017-08-22 15:02 GMT+03:00 Yangchao Zhou : > Use rte_bsf32 and fast bit unset operation to optimize the softrss > computation. > The following measurements shows improvement over the default > softrss computation function. > > tuple lens old(cycles) new(cycles) > 3 1225 337 > 9 3743 992 > > Signed-off-by: Yangchao Zhou > --- > lib/librte_hash/rte_thash.h | 22 ++ > 1 file changed, 10 insertions(+), 12 deletions(-) > > diff --git a/lib/librte_hash/rte_thash.h b/lib/librte_hash/rte_thash.h > index 2fffd61..4fa5e07 100644 > --- a/lib/librte_hash/rte_thash.h > +++ b/lib/librte_hash/rte_thash.h > @@ -207,15 +207,14 @@ static inline uint32_t > rte_softrss(uint32_t *input_tuple, uint32_t input_len, > const uint8_t *rss_key) > { > - uint32_t i, j, ret = 0; > + uint32_t i, j, map, ret = 0; > > for (j = 0; j < input_len; j++) { > - for (i = 0; i < 32; i++) { > - if (input_tuple[j] & (1 << (31 - i))) { > - ret ^= rte_cpu_to_be_32(((const uint32_t > *)rss_key)[j]) << i | > + for (map = input_tuple[j]; map; map &= (map - 1)) { > + i = rte_bsf32(map); > + ret ^= rte_cpu_to_be_32(((const uint32_t > *)rss_key)[j]) << (31 - i) | > > (uint32_t)((uint64_t)(rte_cpu_to_be_32(((const > uint32_t *)rss_key)[j + 1])) >> > - (32 - i)); > - } > + (i + 1)); > } > } > return ret; > @@ -238,14 +237,13 @@ static inline uint32_t > rte_softrss_be(uint32_t *input_tuple, uint32_t input_len, > const uint8_t *rss_key) > { > - uint32_t i, j, ret = 0; > + uint32_t i, j, map, ret = 0; > > for (j = 0; j < input_len; j++) { > - for (i = 0; i < 32; i++) { > - if (input_tuple[j] & (1 << (31 - i))) { > - ret ^= ((const uint32_t *)rss_key)[j] << i > | > - (uint32_t)((uint64_t)(((const > uint32_t *)rss_key)[j + 1]) >> (32 - i)); > - } > + for (map = input_tuple[j]; map; map &= (map - 1)) { > + i = rte_bsf32(map); > + ret ^= ((const uint32_t *)rss_key)[j] << (31 - i) | > + (uint32_t)((uint64_t)(((const uint32_t > *)rss_key)[j + 1]) >> (i + 1)); > } > } > return ret; > -- > 2.7.4 > > Looks good for me. Thanks! 
Reviewed-by: Medvedkin Vladimir -- Regards, Vladimir
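The equivalence of the two loop forms in the quoted patch can be checked with a small standalone sketch. It is an illustration only: the key is taken as an array of 32-bit words already in the byte order the hash expects (as in the _be variant), and __builtin_ctz (a GCC/Clang builtin) is used as an assumed stand-in for rte_bsf32. The function names are hypothetical.

```c
#include <stdint.h>

/* Reference form: test all 32 bit positions, most significant first.
 * key must hold at least len + 1 words. */
static uint32_t
sketch_softrss_loop(const uint32_t *tuple, uint32_t len, const uint32_t *key)
{
	uint32_t i, j, ret = 0;

	for (j = 0; j < len; j++) {
		for (i = 0; i < 32; i++) {
			if (tuple[j] & (1u << (31 - i)))
				ret ^= key[j] << i |
					(uint32_t)((uint64_t)key[j + 1] >> (32 - i));
		}
	}
	return ret;
}

/* Optimized form: visit only the set bits. "map &= map - 1" clears the
 * lowest set bit; a bit at index i from the LSB corresponds to position
 * 31 - i in the reference loop, hence the adjusted shift amounts. */
static uint32_t
sketch_softrss_bits(const uint32_t *tuple, uint32_t len, const uint32_t *key)
{
	uint32_t i, j, map, ret = 0;

	for (j = 0; j < len; j++) {
		for (map = tuple[j]; map; map &= (map - 1)) {
			i = (uint32_t)__builtin_ctz(map);
			ret ^= key[j] << (31 - i) |
				(uint32_t)((uint64_t)key[j + 1] >> (i + 1));
		}
	}
	return ret;
}
```

The inner loop of the second form runs once per set bit instead of 32 times per word, which is consistent with the cycle counts reported in the patch.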
Re: [dpdk-dev] [PATCH] net/liquidio: remove FLR request to PF driver
On 8/21/2017 7:17 AM, Shijith Thotton wrote: > igb_uio and vfio-pci does pci reset during open and release of device. > So FLR request to LiquidIO PF driver during init and close in PMD is not > required. > > See commit b58eedfc7dd5 ("igb_uio: issue FLR during open and release of > device file") > > Signed-off-by: Shijith Thotton Applied to dpdk-next-net/master, thanks.
Re: [dpdk-dev] [PATCH] net/ixgbe: add VLAN info in queue info msg to VF
On 8/21/2017 7:21 AM, Wei Dai wrote: > This patch align with PF kerenl driver version 5.1.3 to add the > number of queues to transmit VLAN packets in msg of queue info > to VF. If DCB is enabled, it is the number of DCB traffic > classes. If DCB is not enabled and default VLAN is enabled, > it is 1, For other cases, it is 0. > > Signed-off-by: Wei Dai Applied to dpdk-next-net/master, thanks.
[dpdk-dev] [PATCH v1 0/4] add per-core Turbo Boost capability
Recent generations of the Intel® Xeon® family processors allow Turbo Boost to be enabled/disabled on a per-core basis. This patch set introduces additional API calls to the librte_power library to allow users to enable/disable Turbo Boost on particular cores. Additionally, the use of the library is demonstrated by additions to the vm_power_manager example application, where the new commands have been added to allow the turbo status of cores to be changed dynamically. Extra message types have been added to the virtio-serial channels between the guest_vm_power_manager app and the vm_power_manager apps to demonstrate turbo change requests from a virtual machine. In this case, the guest will send a request to the physical host, which in turn will change the state of the turbo status. Usage Example: -- A VM has been created using 8 CPU cores, and 8 virtio-serial channels have been created as per-core communications channels between the host and the VM. See: http://www.dpdk.org/doc/guides/sample_app_ug/vm_power_management.html for more information on setting up the vm_power applications. 
In the vm_power_manager app on the host, we can query these channels:

vmpower> show_vm ubuntu2
VM: 'ubuntu2', status = ACTIVE
Channels 8
  [0]: /tmp/powermonitor/ubuntu2.0, status = CONNECTED
  [1]: /tmp/powermonitor/ubuntu2.1, status = CONNECTED
  [2]: /tmp/powermonitor/ubuntu2.2, status = CONNECTED
  [3]: /tmp/powermonitor/ubuntu2.3, status = CONNECTED
  [4]: /tmp/powermonitor/ubuntu2.4, status = CONNECTED
  [5]: /tmp/powermonitor/ubuntu2.5, status = CONNECTED
  [6]: /tmp/powermonitor/ubuntu2.6, status = CONNECTED
  [7]: /tmp/powermonitor/ubuntu2.7, status = CONNECTED
Virtual CPU(s): 8
  [0]: Physical CPU Mask 0x10
  [1]: Physical CPU Mask 0x20
  [2]: Physical CPU Mask 0x40
  [3]: Physical CPU Mask 0x80
  [4]: Physical CPU Mask 0x100
  [5]: Physical CPU Mask 0x200
  [6]: Physical CPU Mask 0x400
  [7]: Physical CPU Mask 0x800

Once the VM is up and running, if we exercise all the cores on the guest, we can use turbostat on the host to see the frequencies of the guest cores. In this example, it's cores 20-27:

  19       0    0.01  2500  2500
  20    2498  100.00  2500  2498
  21    2498  100.00  2500  2498
  22    2498  100.00  2500  2498
  23    2498  100.00  2500  2498
  24 *  2498  100.00  2500  2498
  25    2498  100.00  2500  2498
  26    2498  100.00  2500  2498
  27    2498  100.00  2500  2498
  28       0    0.01  2032  2498

We can then issue a command in the vmpower app on the guest:

vmpower(guest)> set_cpu_freq 4 enable_turbo

This command will pass a message down through virtio-serial to the host, which will enable turbo on core 24, the underlying physical core for the guest's 4th lcore_id. We can then see the change by running turbostat on the host:

  19       0    0.01  2500  2496
  20    2498  100.00  2500  2498
  21    2498  100.00  2500  2498
  22    2498  100.00  2500  2498
  23    2498  100.00  2500  2498
  24 *  3297  100.00  3300  2498
  25    2498  100.00  2500  2498
  26    2498  100.00  2500  2498
  27    2498  100.00  2500  2498
  28       0    0.01  1016  2498

Core 24 is now running at 3300MHz, whereas the remainder are still running at 2500MHz.

We can issue a similar command in the vm_power_manager running on the host to disable turbo on that core, but this time we use the physical core id:

vmpower> set_cpu_freq 24 disable_turbo

and we see that turbo is now disabled on that core.

  19       0    0.00  2500  2495
  20    2499  100.00  2500  2499
  21    2499  100.00  2500  2499
  22    2499  100.00  2500  2499
  23    2499  100.00  2500  2499
  24 *  2499  100.00  2500  2499
  25    2499  100.00  2500  2499
  26    2499  100.00  2500  2499
  27    2499  100.00  2500  2499
  28       0    0.01  1000  2499

[1/4] lib/librte_power: add per-core turbo capability
[2/4] examples/vm_power_manager: add per-core turbo
[3/4] examples/vm_power_cli_guest: add per-core turbo
[4/4] lib: limit turbo to particular models of CPU
[dpdk-dev] [PATCH v1 1/4] lib/librte_power: add per-core turbo capability
Adds a new set of APIs to allow per-core turbo enable-disable. Signed-off-by: David Hunt --- lib/librte_power/channel_commands.h | 2 + lib/librte_power/rte_power.c | 9 ++ lib/librte_power/rte_power.h | 41 + lib/librte_power/rte_power_acpi_cpufreq.c | 143 ++ lib/librte_power/rte_power_acpi_cpufreq.h | 40 + lib/librte_power/rte_power_kvm_vm.c | 19 lib/librte_power/rte_power_kvm_vm.h | 35 +++- 7 files changed, 288 insertions(+), 1 deletion(-) diff --git a/lib/librte_power/channel_commands.h b/lib/librte_power/channel_commands.h index 383897b..484085b 100644 --- a/lib/librte_power/channel_commands.h +++ b/lib/librte_power/channel_commands.h @@ -52,6 +52,8 @@ extern "C" { #define CPU_POWER_SCALE_DOWN2 #define CPU_POWER_SCALE_MAX 3 #define CPU_POWER_SCALE_MIN 4 +#define CPU_POWER_ENABLE_TURBO 5 +#define CPU_POWER_DISABLE_TURBO 6 struct channel_packet { uint64_t resource_id; /**< core_num, device */ diff --git a/lib/librte_power/rte_power.c b/lib/librte_power/rte_power.c index 998ed1c..b327a86 100644 --- a/lib/librte_power/rte_power.c +++ b/lib/librte_power/rte_power.c @@ -50,6 +50,9 @@ rte_power_freq_change_t rte_power_freq_up = NULL; rte_power_freq_change_t rte_power_freq_down = NULL; rte_power_freq_change_t rte_power_freq_max = NULL; rte_power_freq_change_t rte_power_freq_min = NULL; +rte_power_freq_change_t rte_power_turbo_status; +rte_power_freq_change_t rte_power_freq_enable_turbo; +rte_power_freq_change_t rte_power_freq_disable_turbo; int rte_power_set_env(enum power_management_env env) @@ -65,6 +68,9 @@ rte_power_set_env(enum power_management_env env) rte_power_freq_down = rte_power_acpi_cpufreq_freq_down; rte_power_freq_min = rte_power_acpi_cpufreq_freq_min; rte_power_freq_max = rte_power_acpi_cpufreq_freq_max; + rte_power_turbo_status = rte_power_acpi_turbo_status; + rte_power_freq_enable_turbo = rte_power_acpi_enable_turbo; + rte_power_freq_disable_turbo = rte_power_acpi_disable_turbo; } else if (env == PM_ENV_KVM_VM) { rte_power_freqs = rte_power_kvm_vm_freqs; 
rte_power_get_freq = rte_power_kvm_vm_get_freq; @@ -73,6 +79,9 @@ rte_power_set_env(enum power_management_env env) rte_power_freq_down = rte_power_kvm_vm_freq_down; rte_power_freq_min = rte_power_kvm_vm_freq_min; rte_power_freq_max = rte_power_kvm_vm_freq_max; + rte_power_turbo_status = rte_power_kvm_vm_turbo_status; + rte_power_freq_enable_turbo = rte_power_kvm_vm_enable_turbo; + rte_power_freq_disable_turbo = rte_power_kvm_vm_disable_turbo; } else { RTE_LOG(ERR, POWER, "Invalid Power Management Environment(%d) set\n", env); diff --git a/lib/librte_power/rte_power.h b/lib/librte_power/rte_power.h index 67e0ec0..b17b7a5 100644 --- a/lib/librte_power/rte_power.h +++ b/lib/librte_power/rte_power.h @@ -236,6 +236,47 @@ extern rte_power_freq_change_t rte_power_freq_max; */ extern rte_power_freq_change_t rte_power_freq_min; +/** + * Query the Turbo Boost status of a specific lcore. + * Review each environments specific documentation for usage.. + * + * @param lcore_id + * lcore id. + * + * @return + * - 1 Turbo Boost is enabled for this lcore. + * - 0 Turbo Boost is disabled for this lcore. + * - Negative on error. + */ +extern rte_power_freq_change_t rte_power_turbo_status; + +/** + * Enable Turbo Boost for this lcore. + * Review each environments specific documentation for usage.. + * + * @param lcore_id + * lcore id. + * + * @return + * - 0 on success. + * - Negative on error. + */ +extern rte_power_freq_change_t rte_power_freq_enable_turbo; + +/** + * Disable Turbo Boost for this lcore. + * Review each environments specific documentation for usage.. + * + * @param lcore_id + * lcore id. + * + * @return + * - 0 on success. + * - Negative on error. 
+ */ +extern rte_power_freq_change_t rte_power_freq_disable_turbo; + + #ifdef __cplusplus } #endif diff --git a/lib/librte_power/rte_power_acpi_cpufreq.c b/lib/librte_power/rte_power_acpi_cpufreq.c index a56c9b5..6695f59 100644 --- a/lib/librte_power/rte_power_acpi_cpufreq.c +++ b/lib/librte_power/rte_power_acpi_cpufreq.c @@ -87,6 +87,14 @@ #define POWER_SYSFILE_SETSPEED \ "/sys/devices/system/cpu/cpu%u/cpufreq/scaling_setspeed" +/* + * MSR related + */ +#define PLATFORM_INFO 0x0CE +#define TURBO_RATIO_LIMIT 0x1AD +#define IA32_PERF_CTL 0x199 +#define CORE_TURBO_DISABLE_BIT ((uint64_t)1<<32) + enum power_state { POWER_IDLE = 0, POWER_ONGOING, @@ -543,3 +551,138 @@ rte_power_acpi_cpufreq_freq_min(unsigned lcore_id) /* Frequencies in the array are from high to low. */ return set_freq_internal(pi, pi->nb_fre
[dpdk-dev] [PATCH v1 2/4] examples/vm_power_manager: add per-core turbo
Add extra commands to command line to allow enable/disable of per-core turbo. Signed-off-by: David Hunt --- examples/vm_power_manager/channel_monitor.c | 12 +++ examples/vm_power_manager/power_manager.c | 36 examples/vm_power_manager/power_manager.h | 52 + examples/vm_power_manager/vm_power_cli.c| 21 4 files changed, 114 insertions(+), 7 deletions(-) diff --git a/examples/vm_power_manager/channel_monitor.c b/examples/vm_power_manager/channel_monitor.c index e7f5cc4..ac40dac 100644 --- a/examples/vm_power_manager/channel_monitor.c +++ b/examples/vm_power_manager/channel_monitor.c @@ -105,6 +105,12 @@ process_request(struct channel_packet *pkt, struct channel_info *chan_info) case(CPU_POWER_SCALE_UP): power_manager_scale_core_up(core_num); break; + case(CPU_POWER_ENABLE_TURBO): + power_manager_enable_turbo_core(core_num); + break; + case(CPU_POWER_DISABLE_TURBO): + power_manager_disable_turbo_core(core_num); + break; default: break; } @@ -122,6 +128,12 @@ process_request(struct channel_packet *pkt, struct channel_info *chan_info) case(CPU_POWER_SCALE_UP): power_manager_scale_mask_up(core_mask); break; + case(CPU_POWER_ENABLE_TURBO): + power_manager_enable_turbo_mask(core_mask); + break; + case(CPU_POWER_DISABLE_TURBO): + power_manager_disable_turbo_mask(core_mask); + break; default: break; } diff --git a/examples/vm_power_manager/power_manager.c b/examples/vm_power_manager/power_manager.c index 2644fce..80705f9 100644 --- a/examples/vm_power_manager/power_manager.c +++ b/examples/vm_power_manager/power_manager.c @@ -216,6 +216,24 @@ power_manager_scale_mask_max(uint64_t core_mask) } int +power_manager_enable_turbo_mask(uint64_t core_mask) +{ + int ret = 0; + + POWER_SCALE_MASK(enable_turbo, core_mask, ret); + return ret; +} + +int +power_manager_disable_turbo_mask(uint64_t core_mask) +{ + int ret = 0; + + POWER_SCALE_MASK(disable_turbo, core_mask, ret); + return ret; +} + +int power_manager_scale_core_up(unsigned core_num) { int ret = 0; @@ -250,3 +268,21 @@ 
power_manager_scale_core_max(unsigned core_num) POWER_SCALE_CORE(max, core_num, ret); return ret; } + +int +power_manager_enable_turbo_core(unsigned int core_num) +{ + int ret = 0; + + POWER_SCALE_CORE(enable_turbo, core_num, ret); + return ret; +} + +int +power_manager_disable_turbo_core(unsigned int core_num) +{ + int ret = 0; + + POWER_SCALE_CORE(disable_turbo, core_num, ret); + return ret; +} diff --git a/examples/vm_power_manager/power_manager.h b/examples/vm_power_manager/power_manager.h index 1b45bab..b74d09b 100644 --- a/examples/vm_power_manager/power_manager.h +++ b/examples/vm_power_manager/power_manager.h @@ -113,6 +113,32 @@ int power_manager_scale_mask_min(uint64_t core_mask); int power_manager_scale_mask_max(uint64_t core_mask); /** + * Enable Turbo Boost on the cores specified in core_mask. + * It is thread-safe. + * + * @param core_mask + * The uint64_t bit-mask of cores to change frequency. + * + * @return + * - 1 on success. + * - Negative on error. + */ +int power_manager_enable_turbo_mask(uint64_t core_mask); + +/** + * Disable Turbo Boost on the cores specified in core_mask. + * It is thread-safe. + * + * @param core_mask + * The uint64_t bit-mask of cores to change frequency. + * + * @return + * - 1 on success. + * - Negative on error. + */ +int power_manager_disable_turbo_mask(uint64_t core_mask); + +/** * Scale up frequency for the core specified by core_num. * It is thread-safe. * @@ -168,6 +194,32 @@ int power_manager_scale_core_min(unsigned core_num); int power_manager_scale_core_max(unsigned core_num); /** + * Enable Turbo Boost for the core specified by core_num. + * It is thread-safe. + * + * @param core_num + * The core number to boost + * + * @return + * - 1 on success. + * - Negative on error. + */ +int power_manager_enable_turbo_core(unsigned int core_num); + +/** + * Disable Turbo Boost for the core specified by core_num. + * It is thread-safe. 
+ * + * @param core_num + * The core number to boost + * + * @return + * - 1 on success. + * - Negative on error. + */ +int power_manager_disable_turbo_core(unsigned int core_num); + +/** * Get the current freuency of the core specified by core_
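The *_mask variants above apply a per-core operation to every core selected in a 64-bit mask. The body of POWER_SCALE_MASK is not shown in this patch, so the following is only a sketch of the general idea under stated assumptions: the function and type names are hypothetical, the return convention (1 on success, negative on error) follows the doc comments above, and the real macro may additionally take locks or check which cores are enabled.

```c
#include <stdint.h>

/* Hypothetical per-core operation: returns 1 on success, negative on error. */
typedef int (*sketch_core_op_t)(unsigned int core_num);

/* Example operation used to observe which cores were visited. */
static uint64_t sketch_visited;

static int
sketch_record_core(unsigned int core_num)
{
	sketch_visited |= UINT64_C(1) << core_num;
	return 1;
}

/*
 * Apply op to every core whose bit is set in core_mask.
 * Returns 1 if every call succeeded, -1 if any call failed.
 */
static int
sketch_for_each_core(uint64_t core_mask, sketch_core_op_t op)
{
	unsigned int core;
	int ret = 1;

	for (core = 0; core < 64; core++) {
		if (!(core_mask & (UINT64_C(1) << core)))
			continue;
		if (op(core) != 1)
			ret = -1;
	}
	return ret;
}
```

Calling sketch_for_each_core(0xFF0, sketch_record_core) visits cores 4 to 11, matching the physical CPU masks 0x10 through 0x800 shown in the cover letter.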
[dpdk-dev] [PATCH v1 3/4] examples/vm_power_cli_guest: add per-core turbo
Add extra commands to guest cli to allow enable/disable of per-core turbo. Includes messages to vm_power_mgr in host. Signed-off-by: David Hunt --- examples/vm_power_manager/guest_cli/vm_power_cli_guest.c | 6 +- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/examples/vm_power_manager/guest_cli/vm_power_cli_guest.c b/examples/vm_power_manager/guest_cli/vm_power_cli_guest.c index 7931135..4e982bd 100644 --- a/examples/vm_power_manager/guest_cli/vm_power_cli_guest.c +++ b/examples/vm_power_manager/guest_cli/vm_power_cli_guest.c @@ -108,6 +108,10 @@ cmd_set_cpu_freq_parsed(void *parsed_result, struct cmdline *cl, ret = rte_power_freq_min(res->lcore_id); else if (!strcmp(res->cmd , "max")) ret = rte_power_freq_max(res->lcore_id); + else if (!strcmp(res->cmd, "enable_turbo")) + ret = rte_power_freq_enable_turbo(res->lcore_id); + else if (!strcmp(res->cmd, "disable_turbo")) + ret = rte_power_freq_disable_turbo(res->lcore_id); if (ret != 1) cmdline_printf(cl, "Error sending message: %s\n", strerror(ret)); } @@ -120,7 +124,7 @@ cmdline_parse_token_string_t cmd_set_cpu_freq_core_num = lcore_id, UINT8); cmdline_parse_token_string_t cmd_set_cpu_freq_cmd_cmd = TOKEN_STRING_INITIALIZER(struct cmd_set_cpu_freq_result, - cmd, "up#down#min#max"); + cmd, "up#down#min#max#enable_turbo#disable_turbo"); cmdline_parse_inst_t cmd_set_cpu_freq_set = { .f = cmd_set_cpu_freq_parsed, -- 2.7.4
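The string dispatch in cmd_set_cpu_freq_parsed() can also be expressed as a table lookup from command name to channel command value. This is a hedged sketch, not the example application's code: the SKETCH_* values for down, max, min, enable_turbo and disable_turbo follow channel_commands.h in patch 1/4, the value 1 for "up" is an assumption (that define is not visible in the quoted hunk), and the table and function names are hypothetical.

```c
#include <stddef.h>
#include <string.h>

#define SKETCH_SCALE_UP      1 /* assumed; not shown in the quoted hunk */
#define SKETCH_SCALE_DOWN    2
#define SKETCH_SCALE_MAX     3
#define SKETCH_SCALE_MIN     4
#define SKETCH_ENABLE_TURBO  5
#define SKETCH_DISABLE_TURBO 6

struct sketch_cmd {
	const char *name;
	int unit;
};

/* Same names as the token string "up#down#min#max#enable_turbo#disable_turbo". */
static const struct sketch_cmd sketch_cmds[] = {
	{ "up", SKETCH_SCALE_UP },
	{ "down", SKETCH_SCALE_DOWN },
	{ "min", SKETCH_SCALE_MIN },
	{ "max", SKETCH_SCALE_MAX },
	{ "enable_turbo", SKETCH_ENABLE_TURBO },
	{ "disable_turbo", SKETCH_DISABLE_TURBO },
};

/* Return the command value for cmd, or -1 when the name is unknown. */
static int
sketch_cmd_to_unit(const char *cmd)
{
	size_t i;

	for (i = 0; i < sizeof(sketch_cmds) / sizeof(sketch_cmds[0]); i++) {
		if (strcmp(cmd, sketch_cmds[i].name) == 0)
			return sketch_cmds[i].unit;
	}
	return -1;
}
```

A table keeps the accepted names and their values in one place, so adding a command touches a single entry instead of another else-if branch.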
[dpdk-dev] [PATCH v1 4/4] lib: limit turbo to particular models of CPU
The per-core turbo functionality is only available on specific models of
CPU, so this patch limits it to those models.

Signed-off-by: David Hunt
---
 lib/librte_power/rte_power_acpi_cpufreq.c | 28
 1 file changed, 28 insertions(+)

diff --git a/lib/librte_power/rte_power_acpi_cpufreq.c b/lib/librte_power/rte_power_acpi_cpufreq.c
index 6695f59..ec8d304 100644
--- a/lib/librte_power/rte_power_acpi_cpufreq.c
+++ b/lib/librte_power/rte_power_acpi_cpufreq.c
@@ -40,6 +40,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 
@@ -554,6 +555,27 @@ rte_power_acpi_cpufreq_freq_min(unsigned lcore_id)
 
 
 static int
+per_core_turbo_supported(void)
+{
+	uint32_t eax, ebx, ecx, edx;
+	int family, model;
+
+	__cpuid(1, eax, ebx, ecx, edx);
+
+	family = (eax >> 8) & 0xf;
+	if (family > 5)
+		model = ((eax >> 4) & 0xf) | ((eax >> 12) & 0xf0);
+	else
+		model = (eax >> 4) & 0xf;
+
+	if (family == 6)
+		if ((model == 63) || (model == 79) || (model == 85))
+			return 1;
+	return 0;
+}
+
+
+static int
 rdmsr(int lcore, int msr, uint64_t *val)
 {
 	char filename[32];
@@ -607,6 +629,8 @@ rte_power_acpi_turbo_status(unsigned int lcore_id)
 	}
 
 #if defined(RTE_ARCH_I686) || defined(RTE_ARCH_X86_64)
+	if (!per_core_turbo_supported())
+		return 0;
 	retval = rdmsr(lcore_id, IA32_PERF_CTL, &val);
 	if (retval)
 		return retval;
@@ -630,6 +654,8 @@ rte_power_acpi_enable_turbo(unsigned int lcore_id)
 	}
 
 #if defined(RTE_ARCH_I686) || defined(RTE_ARCH_X86_64)
+	if (!per_core_turbo_supported())
+		return 0;
 	/*
 	 * The low byte of 1ADh MSR contains max recomended ratio when a small
 	 * number of cores are active. Use this ratio when turbo is enabled.
@@ -663,6 +689,8 @@ rte_power_acpi_disable_turbo(unsigned int lcore_id)
 	}
 
 #if defined(RTE_ARCH_I686) || defined(RTE_ARCH_X86_64)
+	if (!per_core_turbo_supported())
+		return 0;
 	/*
 	 * 0CEh MSR contains max non-turbo ratio in bits 8-15. Use this
 	 * for the freq when turbo is disabled for that core.
-- 
2.7.4
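
For readers who want to try the family/model decode in per_core_turbo_supported() without running CPUID, here is a minimal sketch that takes the EAX value of CPUID leaf 1 as a plain argument. The struct and function names (cpu_sig, decode_cpu_signature, per_core_turbo_model) are hypothetical and not part of the patch; only the bit layout and the model list (63, 79, 85) are taken from it.

```c
#include <stdint.h>

/* Hypothetical container for the decoded signature (not in the patch). */
struct cpu_sig {
	int family;
	int model;
};

/* Decode family and model from the EAX value returned by CPUID leaf 1. */
struct cpu_sig
decode_cpu_signature(uint32_t eax)
{
	struct cpu_sig sig;

	sig.family = (eax >> 8) & 0xf;
	sig.model = (eax >> 4) & 0xf;
	/* Same rule as the patch: fold in the extended model for family > 5. */
	if (sig.family > 5)
		sig.model |= (eax >> 12) & 0xf0;
	return sig;
}

/* Model check as in the patch: family 6, model 63, 79 or 85. */
int
per_core_turbo_model(struct cpu_sig sig)
{
	if (sig.family != 6)
		return 0;
	return sig.model == 63 || sig.model == 79 || sig.model == 85;
}
```

Keeping the decode separate from the CPUID call makes the model list easy to unit-test with fixed EAX values.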
Re: [dpdk-dev] [PATCH v2] nfp: handle packets with length 0 as usual ones
On 8/22/2017 11:41 AM, Alejandro Lucero wrote:
> A DPDK app could, whatever the reason, send packets with size 0.
> The PMD is not sending those packets, which does make sense,
> but the problem is the mbuf is not released either. That leads
> to mbufs not being available, because the app trusts the
> PMD will do it.
>
> Although this is a problem related to app wrong behaviour, we
> should harden the PMD in this regard. Not sending a packet with
> size 0 could be problematic, needing special handling inside the
> PMD xmit function. It could be a burst of those packets, which can
> be easily handled, but it could also be a single packet in a burst,
> which is harder to handle.
>
> It would be simpler to just send that kind of packets, which will
> likely be dropped by the hw at some point. The main problem is how
> the fw/hw handles the DMA, because a dma read to a hypothetical 0x0
> address could trigger an IOMMU error. It turns out, it is safe to
> send a descriptor with packet size 0 to the hardware: the DMA never
> happens, from the PCIe point of view.
>
> Signed-off-by: Alejandro Lucero

Applied to dpdk-next-net/master, thanks.
Re: [dpdk-dev] [PATCH v2] nfp: support new firmware metadata api
On 8/22/2017 3:00 PM, Alejandro Lucero wrote:
> We need to support how firmware metadata was handled until now and also
> the new api, since NFP NFD 3.0 firmware versions. The new metadata api
> adds flexibility for working with different metadata types and, mainly,
> to allow adding metadata from different firmware components independently.
>
> Although this patch just supports one type handled by the PMD, future uses
> regarding firmware apps will extend this support.
>
> Signed-off-by: Alejandro Lucero

Applied to dpdk-next-net/master, thanks.
[dpdk-dev] [PATCH 1/2] app/testpmd: add traffic management forwarding mode
This commit extends the testpmd application with a new forwarding engine
that demonstrates the use of ethdev traffic management APIs and softnic
PMD for QoS traffic management. In this mode, a 5-level hierarchical tree of
the QoS scheduler is built with the help of ethdev TM APIs such as shaper
profile add/delete, shared shaper add/update, node add/delete, hierarchy
commit, etc. The hierarchical tree has the following nodes: root node
(x1, level 0), subport node (x1, level 1), pipe node (x4096, level 2),
tc node (x16384, level 3), queue node (x65536, level 4).

During runtime, each received packet is first classified by mapping the
packet fields information to 5-tuples (HQoS subport, pipe, traffic class,
queue within traffic class, and color) and storing it in the packet mbuf
sched field. After classification, each packet is sent to the softnic port,
which prioritizes the transmission of the received packets and
accordingly sends them on to the output interface.

To enable traffic management mode, the following testpmd command is used:

$ ./testpmd -c c -n 4 --vdev 'net_softnic0,hard_name=:06:00.1,soft_tm=on' -- -i --forward-mode=softnictm

Signed-off-by: Jasvinder Singh
---
 app/test-pmd/Makefile    | 5 +
 app/test-pmd/softnictm.c | 870 +++
 app/test-pmd/testpmd.c   | 11 +
 app/test-pmd/testpmd.h   | 34 ++
 4 files changed, 920 insertions(+)
 create mode 100644 app/test-pmd/softnictm.c

diff --git a/app/test-pmd/Makefile b/app/test-pmd/Makefile
index c36be19..7c3f5e8 100644
--- a/app/test-pmd/Makefile
+++ b/app/test-pmd/Makefile
@@ -57,6 +57,7 @@ SRCS-y += rxonly.c
 SRCS-y += txonly.c
 SRCS-y += csumonly.c
 SRCS-y += icmpecho.c
+SRCS-y += softnictm.c
 SRCS-$(CONFIG_RTE_LIBRTE_IEEE1588) += ieee1588fwd.c
 
 ifeq ($(CONFIG_RTE_BUILD_SHARED_LIB),y)
@@ -81,6 +82,10 @@ ifeq ($(CONFIG_RTE_LIBRTE_PMD_XENVIRT),y)
 LDLIBS += -lrte_pmd_xenvirt
 endif
 
+ifeq ($(CONFIG_RTE_LIBRTE_PMD_SOFTNIC),y)
+LDLIBS += -lrte_pmd_softnic
+endif
+
 endif
 
 CFLAGS_cmdline.o := -D_GNU_SOURCE
diff --git a/app/test-pmd/softnictm.c b/app/test-pmd/softnictm.c
new file mode 100644
index 000..5d1553e
--- /dev/null
+++ b/app/test-pmd/softnictm.c
@@ -0,0 +1,870 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2017 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+#include
+#include
+
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+
+#include "testpmd.h"
+
+#define SUBPORT_NODES_PER_PORT 1
+#define PIPE_NODES_PER_SUBPORT 4096
+#define TC_NODES_PER_PIPE 4
+#define QUEUE_NODES_PER_TC 4
+
+#define NUM_PIPE_NODES \
+	(SUBPORT_NODES_PER_PORT * PIPE_NODES_PER_SUBPORT)
+
+#define NUM_TC_NODES \
+	(NUM_PIPE_NODES * TC_NODES_PER_PIPE)
+
+#define ROOT_NODE_ID 100
+#define SUBPORT_NODES_START_ID 90
+#define PIPE_NODES_START_ID 80
+#define TC_NODES_START_ID 70
+
+#define STATS_MASK_DEFAULT \
+	(RTE_TM_STATS_N_PKTS | \
+	RTE_TM_STATS_N_BYTES | \
+	RTE_TM_STATS_N_PKTS_GREEN_DROPPED | \
+	RTE_TM_STATS_N_BYTES_GREEN_DRO
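
As a cross-check of the per-level node counts given in the commit message, here is a minimal sketch that derives them from the per-parent fan-out values in the #defines of the patch (1 subport per port, 4096 pipes per subport, 4 TCs per pipe, 4 queues per TC). The enum names and tm_nodes_at_level() are hypothetical helpers for illustration, not code from softnictm.c.

```c
#include <stdint.h>

/* Per-parent fan-out, mirroring the #defines in the patch. */
enum {
	SUBPORTS_PER_PORT = 1,
	PIPES_PER_SUBPORT = 4096,
	TCS_PER_PIPE = 4,
	QUEUES_PER_TC = 4
};

/*
 * Number of nodes at a hierarchy level
 * (0 = root, 1 = subport, 2 = pipe, 3 = tc, 4 = queue).
 * Returns 0 for a level outside the 5-level tree.
 */
uint32_t
tm_nodes_at_level(unsigned int level)
{
	static const uint32_t fanout[] = {
		1, SUBPORTS_PER_PORT, PIPES_PER_SUBPORT,
		TCS_PER_PIPE, QUEUES_PER_TC
	};
	uint32_t n = 1;
	unsigned int i;

	if (level >= sizeof(fanout) / sizeof(fanout[0]))
		return 0;
	/* Each level has (nodes at parent level) * (fan-out) nodes. */
	for (i = 0; i <= level; i++)
		n *= fanout[i];
	return n;
}
```

With these fan-outs the tc level works out to 4096 * 4 = 16384 nodes and the queue level to 16384 * 4 = 65536 nodes.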
[dpdk-dev] [PATCH 2/2] app/testpmd: add CLI for tm mode
Add the following CLIs to the testpmd application:
- commands to build the hierarchical tree for the QoS Scheduler.
- commands for runtime update of the hierarchical tree.
- commands to display TM capability information
  (per port, per hierarchy level and per hierarchy node).
- command to set the packet field mask and offset value for classification.
- command to set a traffic class translation table entry.
- stats collection.

Signed-off-by: Jasvinder Singh
---
 app/test-pmd/cmdline.c | 2975
 1 file changed, 2785 insertions(+), 190 deletions(-)

diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
index cd8c358..87a9c7b 100644
--- a/app/test-pmd/cmdline.c
+++ b/app/test-pmd/cmdline.c
@@ -98,6 +98,19 @@
 #ifdef RTE_LIBRTE_BNXT_PMD
 #include
 #endif
+
+#if defined RTE_LIBRTE_PMD_SOFTNIC && defined RTE_LIBRTE_SCHED
+#define SOFTNIC_TM_FEATURE 1
+#else
+#define SOFTNIC_TM_FEATURE 0
+#endif
+
+#ifdef SOFTNIC_TM_FEATURE
+#include
+#include
+#include
+#endif
+
 #include "testpmd.h"
 
 static struct cmdline *testpmd_cl;
@@ -230,6 +243,23 @@ static void cmd_help_long_parsed(void *parsed_result,
 
 			"clear vf stats (port_id) (vf_id)\n"
 			"Reset a VF's statistics.\n\n"
+
+#ifdef SOFTNIC_TM_FEATURE
+			"show port tm cap (port_id)\n"
+			" Display the port TM capability.\n\n"
+
+			"show port tm level cap (port_id) (level_id)\n"
+			" Display the port TM hierarchical level capability.\n\n"
+
+			"show port tm node cap (port_id) (node_id)\n"
+			" Display the port TM node capability.\n\n"
+
+			"show port tm node type (port_id) (node_id)\n"
+			" Display the port TM node type.\n\n"
+
+			"show port tm node stats (port_id) (node_id) (clear)\n"
+			" Display the port TM node stats.\n\n"
+#endif
 		);
 	}
 
@@ -637,6 +667,61 @@ static void cmd_help_long_parsed(void *parsed_result,
 			"ptype mapping update (port_id) (hw_ptype) (sw_ptype)\n"
 			"Update a ptype mapping item on a port\n\n"
 
+#ifdef SOFTNIC_TM_FEATURE
+			"add port tm node shaper profile (port_id) (shaper_profile_id)"
+			" (tb_rate) (tb_size)\n"
+			" Add port tm node private shaper profile.\n\n"
+
+			"del port tm node shaper profile (port_id) (shaper_profile_id)\n"
+			" Delete port tm node private shaper profile.\n\n"
+
+			"add port tm node shared shaper (port_id) (shared_shaper_id)"
+			" (shaper_profile_id)\n"
+			" Add/update port tm node shared shaper.\n\n"
+
+			"del port tm node shared shaper (port_id) (shared_shaper_id)\n"
+			" Delete port tm node shared shaper.\n\n"
+
+			"add port tm node wred profile (port_id) (wred_profile_id)"
+			" (color_g) (min_th_g) (max_th_g) (maxp_inv_g) (wq_log2_g)"
+			" (color_y) (min_th_y) (max_th_y) (maxp_inv_y) (wq_log2_y)"
+			" (color_r) (min_th_r) (max_th_r) (maxp_inv_r) (wq_log2_r)\n"
+			" Add port tm node wred profile.\n\n"
+
+			"del port tm node wred profile (port_id) (wred_profile_id)\n"
+			" Delete port tm node wred profile.\n\n"
+
+			"add port tm nonleaf node shared shaper (port_id) (node_id)"
+			" (parent_node_id) (priority) (weight) (level_id)"
+			" (shaper_profile_id) (shared_shaper_id) (n_shared_shapers)"
+			" (n_sp_priorities)\n"
+			" Add port tm nonleaf node.\n\n"
+
+			"add port tm leaf node shared shaper (port_id) (node_id)"
+			" (parent_node_id) (priority) (weight) (level_id)"
+			" (cman_mode) (wred_profile_id)\n"
+			" Add port tm leaf node.\n\n"
+
+			"del port tm node (port_id) (node_id)\n"
+			" Delete port tm node.\n\n"
+
+#ifdef RTE_SCHED_SUBPORT_TC_OV
+			"set port tm node parent (port_id) (node_id) (parent_node_id)"
+			" (priority) (weight)\n"
+			" Set port tm node parent.\n\n"
+#endif
+			"set port tm node shaper profile (port_id) (node_id)"
+			" (shaper_profile_id)\n"
+			" Set port tm node shaper profile.\n\n"
+
+			"set port tm pktfield (subport|pipe|tc) (port_id) offset"
+
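
A note on the preprocessor guards in this patch: SOFTNIC_TM_FEATURE is always defined (as 1 or 0), and the later blocks are guarded with #ifdef. The sketch below, with hypothetical function names, shows how #ifdef and #if behave differently for a macro defined as 0; whether the guards in the patch were meant to be #if is an assumption on my part and may be worth double-checking.

```c
/* Mirrors the pattern in the patch: the macro is always defined, as 1 or 0. */
#define SOFTNIC_TM_FEATURE 0

/* #ifdef only asks whether the name is defined, so this returns 1 even for value 0. */
int
feature_seen_by_ifdef(void)
{
#ifdef SOFTNIC_TM_FEATURE
	return 1;
#else
	return 0;
#endif
}

/* #if evaluates the value, so this returns 0 when the macro expands to 0. */
int
feature_seen_by_if(void)
{
#if SOFTNIC_TM_FEATURE
	return 1;
#else
	return 0;
#endif
}
```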
Re: [dpdk-dev] [PATCH] igb_uio: MSI IRQ mode, irq enable/disable refactored
On Mon, 21 Aug 2017 19:33:45 +0200
Markus Theil wrote:

> This patch adds MSI IRQ mode and in a way, that should
> also work on older kernel versions. The base for my patch
> was an attempt to do this in cf705bc36c which was later reverted in
> d8ee82745a. Compilation was tested on Linux 3.2, 4.10 and 4.12.
>
> MSI(X) setup was already using pci_alloc_irq_vectors before,
> but calls to pci_free_irq_vectors were missing and added.
>
> Signed-off-by: Markus Theil

I wonder if DPDK should only support N-1 Long Term Stable kernel.org kernels?
That would mean 4.4.83 or later now, and 4.9 or later starting with 18.XX releases.

If enterprise distros want to backport more, that is their prerogative, but upstream
DPDK shouldn't have to worry about it. The current mess with KNI especially is out
of hand.
Re: [dpdk-dev] [PATCH] igb_uio: MSI IRQ mode, irq enable/disable refactored
On 22.08.2017 18:55, Stephen Hemminger wrote:
> On Mon, 21 Aug 2017 19:33:45 +0200
> Markus Theil wrote:
>
>> This patch adds MSI IRQ mode and in a way, that should
>> also work on older kernel versions. The base for my patch
>> was an attempt to do this in cf705bc36c which was later reverted in
>> d8ee82745a. Compilation was tested on Linux 3.2, 4.10 and 4.12.
>>
>> MSI(X) setup was already using pci_alloc_irq_vectors before,
>> but calls to pci_free_irq_vectors were missing and added.
>>
>> Signed-off-by: Markus Theil
> I wonder if DPDK should only N-1 Long Term Stable kernel.org kernels?
> That would mean 4.4.83 or later now, and 4.9 or later starting with 18.XX
> releases.
>
> If enterprise distro's want to backport more, that is their prerogative but
> upstream DPDK shouldn't have to worry about it. The current mess with KNI
> especially is out of hand.

I don't rely on older kernels than N-1 LTS, but the former MSI patch was
reverted because of such an issue. If there is consensus about this,
adapting igb_uio to kernels >= 4.4 will be no problem. I'd then write a
little patch series with that aim.
[dpdk-dev] link state change not consistent (Flaps/stays DOWN/UP for ever)
Hi All,

With the DPDK stable 17.05 branch, we are seeing an issue with the link state of 10G ports on Broadwell. The link status of one of the ports keeps flapping at regular intervals, typically starting a few seconds after the link appeared to settle down. Other behaviors include not honoring a link state change set by the app.

So far this has been observed with ixgbe only; we have not yet checked whether the issue exists across drivers. This issue was not seen with 16.11.2.

Is this a known issue? If yes, are there any patches that can fix it?

A few patches (between 17.05 and 17.08) that we applied without seeing any difference in behavior are listed below.

net/ixgbe: improve link state check on VF
  In current implementation, when checking VF link state, PF state is checked too, although the function has a parameter to tell if PF state checking is needed. But in some scenario, user may not care about the PF state. This patch enables the unused parameter to only check the VF link state.

net/ixgbe: fix LSC interrupt
  If LSC flag is changed to off at last device start, the enable flag is not cleared in HW. This patch fixes it.

net/e1000: fix LSC interrupt
  If LSC flag is changed to off at last device start, the enable flag is not cleared in HW. This patch fixes it.

ethdev: add deferred intermediate device state
  This device state means that the device is managed externally, by whichever party has set this state (PMD or application). Note: this new device state is only an information. The related device structure and operators are still valid and can be used normally. It is however made private by device management helpers within ethdev, making the device invisible to applications.

ethdev: count devices consistently
  Make the rte_eth_dev_count() return the number of available devices even after some are detached by the hotplug API or put in a deferred state.

net/ixgbe: fix Rx/Tx queue interrupt for x550 devices
  x550 devices don't map interrupt vector before enabling Rx/Tx queue interrupt. Because of this interrupt mode is not working for x550 devices.

igb_uio: issue FLR during open and release of device file
  Set UIO info device file operations open and release. Call pci reset function inside open and release to clear device state at start and end. Copied this behaviour from vfio_pci kernel module code. With this patch, it is not mandatory to issue FLR by PMD's during init and close. Bus master enable and disable are added in open and release respectively to take care of device DMA.

ethdev: fix device state on detach
  The device state should be handled by the ethdev layer when possible. Applications should not have to do it. Not setting the state to UNUSED will make the port_id of the device valid for all ethdev API functions, usually resulting in segfault.
[dpdk-dev] [PATCH] i40e: fix i40e_validate_mac_addr to permit multicast addresses
The i40e maintains a single MAC filter table for both unicast and
multicast addresses. The i40e_validate_mac_addr function was preventing
multicast addresses from being added to the table via i40evf_add_mac_addr.

Fixed the issue by removing the multicast address check in
i40e_validate_mac_addr.

Signed-off-by: David Harton
---
 drivers/net/i40e/base/i40e_common.c | 12 +---
 drivers/net/i40e/i40e_ethdev.c      | 3 ++-
 2 files changed, 7 insertions(+), 8 deletions(-)

diff --git a/drivers/net/i40e/base/i40e_common.c b/drivers/net/i40e/base/i40e_common.c
index 900d379..9779854 100644
--- a/drivers/net/i40e/base/i40e_common.c
+++ b/drivers/net/i40e/base/i40e_common.c
@@ -969,10 +969,10 @@ struct i40e_rx_ptype_decoded i40e_ptype_lookup[] = {
 
 
 /**
- * i40e_validate_mac_addr - Validate unicast MAC address
+ * i40e_validate_mac_addr - Validate MAC address
  * @mac_addr: pointer to MAC address
  *
- * Tests a MAC address to ensure it is a valid Individual Address
+ * Tests a MAC address to ensure it is a valid Address
  **/
 enum i40e_status_code i40e_validate_mac_addr(u8 *mac_addr)
 {
@@ -980,13 +980,11 @@ enum i40e_status_code i40e_validate_mac_addr(u8 *mac_addr)
 
 	DEBUGFUNC("i40e_validate_mac_addr");
 
-	/* Broadcast addresses ARE multicast addresses
-	 * Make sure it is not a multicast address
+	/*
 	 * Reject the zero address
 	 */
-	if (I40E_IS_MULTICAST(mac_addr) ||
-	    (mac_addr[0] == 0 && mac_addr[1] == 0 && mac_addr[2] == 0 &&
-	     mac_addr[3] == 0 && mac_addr[4] == 0 && mac_addr[5] == 0))
+	if (mac_addr[0] == 0 && mac_addr[1] == 0 && mac_addr[2] == 0 &&
+	    mac_addr[3] == 0 && mac_addr[4] == 0 && mac_addr[5] == 0)
 		status = I40E_ERR_INVALID_MAC_ADDR;
 
 	return status;
diff --git a/drivers/net/i40e/i40e_ethdev.c b/drivers/net/i40e/i40e_ethdev.c
index 5f26e24..00b6082 100644
--- a/drivers/net/i40e/i40e_ethdev.c
+++ b/drivers/net/i40e/i40e_ethdev.c
@@ -1199,7 +1199,8 @@ eth_i40e_dev_init(struct rte_eth_dev *dev)
 
 	/* Get and check the mac address */
 	i40e_get_mac_addr(hw, hw->mac.addr);
-	if (i40e_validate_mac_addr(hw->mac.addr) != I40E_SUCCESS) {
+	if (i40e_validate_mac_addr(hw->mac.addr) != I40E_SUCCESS ||
+	    I40E_IS_MULTICAST(hw->mac.addr)) {
 		PMD_INIT_LOG(ERR, "mac address is not valid");
 		ret = -EIO;
 		goto err_get_mac_addr;
-- 
2.10.3.dirty
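
To illustrate the split this patch makes between "valid for the filter table" and "valid as the port's own address", here is a minimal standalone sketch. All names (mac_is_multicast, mac_is_zero, mac_valid_for_filter, mac_valid_for_port) are hypothetical and are not the i40e helpers; the multicast test assumes the usual Ethernet convention that group addresses have the low bit of the first octet set.

```c
#include <stdbool.h>
#include <stdint.h>

#define MAC_ADDR_LEN 6

/* Group (multicast/broadcast) addresses have the low bit of octet 0 set. */
bool
mac_is_multicast(const uint8_t mac[MAC_ADDR_LEN])
{
	return (mac[0] & 0x01) != 0;
}

/* True only for 00:00:00:00:00:00. */
bool
mac_is_zero(const uint8_t mac[MAC_ADDR_LEN])
{
	int i;

	for (i = 0; i < MAC_ADDR_LEN; i++)
		if (mac[i] != 0)
			return false;
	return true;
}

/* Filter-table check after the patch: anything but all-zero is accepted. */
bool
mac_valid_for_filter(const uint8_t mac[MAC_ADDR_LEN])
{
	return !mac_is_zero(mac);
}

/* Port-address check after the patch: non-zero and also unicast. */
bool
mac_valid_for_port(const uint8_t mac[MAC_ADDR_LEN])
{
	return mac_valid_for_filter(mac) && !mac_is_multicast(mac);
}
```

Under these assumptions a multicast address passes the filter check but is still rejected as a port address, which matches the extra I40E_IS_MULTICAST() test added in eth_i40e_dev_init().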
[dpdk-dev] [PATCH] net/ixgbe:fix some bugs about rte zmalloc memory may NULL
In ixgbe_flow_create(), ntuple_filter_ptr, ethertype_filter_ptr,
syn_filter_ptr, fdir_rule_ptr and l2_tn_filter_ptr are allocated with
rte_zmalloc(), which may return NULL, so check the return value before
using each pointer.

Signed-off-by: Rongqiang XIE
---
 drivers/net/ixgbe/ixgbe_flow.c | 20
 1 file changed, 20 insertions(+)

diff --git a/drivers/net/ixgbe/ixgbe_flow.c b/drivers/net/ixgbe/ixgbe_flow.c
index d679608..c8645f0 100644
--- a/drivers/net/ixgbe/ixgbe_flow.c
+++ b/drivers/net/ixgbe/ixgbe_flow.c
@@ -2707,6 +2707,10 @@ static inline uint8_t signature_match(const struct rte_flow_item pattern[])
 	if (!ret) {
 		ntuple_filter_ptr = rte_zmalloc("ixgbe_ntuple_filter",
 				sizeof(struct ixgbe_ntuple_filter_ele), 0);
+		if (!ntuple_filter_ptr) {
+			PMD_DRV_LOG(ERR, "failed to allocate memory");
+			goto out;
+		}
 		(void)rte_memcpy(&ntuple_filter_ptr->filter_info,
 			&ntuple_filter,
 			sizeof(struct rte_eth_ntuple_filter));
@@ -2729,6 +2733,10 @@ static inline uint8_t signature_match(const struct rte_flow_item pattern[])
 		ethertype_filter_ptr = rte_zmalloc(
 				"ixgbe_ethertype_filter",
 				sizeof(struct ixgbe_ethertype_filter_ele), 0);
+		if (!ethertype_filter_ptr) {
+			PMD_DRV_LOG(ERR, "failed to allocate memory");
+			goto out;
+		}
 		(void)rte_memcpy(&ethertype_filter_ptr->filter_info,
 			&ethertype_filter,
 			sizeof(struct rte_eth_ethertype_filter));
@@ -2749,6 +2757,10 @@ static inline uint8_t signature_match(const struct rte_flow_item pattern[])
 	if (!ret) {
 		syn_filter_ptr = rte_zmalloc("ixgbe_syn_filter",
 				sizeof(struct ixgbe_eth_syn_filter_ele), 0);
+		if (!syn_filter_ptr) {
+			PMD_DRV_LOG(ERR, "failed to allocate memory");
+			goto out;
+		}
 		(void)rte_memcpy(&syn_filter_ptr->filter_info,
 			&syn_filter,
 			sizeof(struct rte_eth_syn_filter));
@@ -2809,6 +2821,10 @@ static inline uint8_t signature_match(const struct rte_flow_item pattern[])
 	if (!ret) {
 		fdir_rule_ptr = rte_zmalloc("ixgbe_fdir_filter",
 				sizeof(struct ixgbe_fdir_rule_ele), 0);
+		if (!fdir_rule_ptr) {
+			PMD_DRV_LOG(ERR, "failed to allocate memory");
+			goto out;
+		}
 		(void)rte_memcpy(&fdir_rule_ptr->filter_info,
 			&fdir_rule,
 			sizeof(struct ixgbe_fdir_rule));
@@ -2842,6 +2858,10 @@ static inline uint8_t signature_match(const struct rte_flow_item pattern[])
 	if (!ret) {
 		l2_tn_filter_ptr = rte_zmalloc("ixgbe_l2_tn_filter",
 				sizeof(struct ixgbe_eth_l2_tunnel_conf_ele), 0);
+		if (!l2_tn_filter_ptr) {
+			PMD_DRV_LOG(ERR, "failed to allocate memory");
+			goto out;
+		}
 		(void)rte_memcpy(&l2_tn_filter_ptr->filter_info,
 			&l2_tn_filter,
 			sizeof(struct rte_eth_l2_tunnel_conf));
-- 
1.8.3.1
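
The pattern the patch applies five times (allocate, check for NULL, only then copy) can be sketched in isolation as below. This is a hedged illustration, not driver code: the struct types, filter_ele_create() and both allocator functions are hypothetical, and calloc() stands in for rte_zmalloc() so the sketch builds without DPDK. The allocator is passed in only so the failure path can be exercised.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for the driver's filter types. */
struct filter_info {
	int id;
};

struct filter_ele {
	struct filter_info filter_info;
};

/* Zeroing allocator type; the default below stands in for rte_zmalloc(). */
typedef void *(*zalloc_fn)(size_t size);

void *
default_zalloc(size_t size)
{
	return calloc(1, size);
}

/* Always fails; only used to exercise the error path. */
void *
failing_zalloc(size_t size)
{
	(void)size;
	return NULL;
}

/* Returns NULL on allocation failure instead of copying into a NULL pointer. */
struct filter_ele *
filter_ele_create(const struct filter_info *src, zalloc_fn zalloc)
{
	struct filter_ele *ele = zalloc(sizeof(*ele));

	if (ele == NULL)
		return NULL;
	memcpy(&ele->filter_info, src, sizeof(*src));
	return ele;
}
```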
[dpdk-dev] [PATCH] ena: fix init of ena pci_dev info
eth_ena_dev_init() was not initializing all of the common PCI device info
for the rte_eth_dev. Added a call to rte_eth_copy_pci_info() to complete
the init, in particular the driver name.

Signed-off-by: David Harton
---
 drivers/net/ena/ena_ethdev.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/ena/ena_ethdev.c b/drivers/net/ena/ena_ethdev.c
index 80ce1f3..a6c408b 100644
--- a/drivers/net/ena/ena_ethdev.c
+++ b/drivers/net/ena/ena_ethdev.c
@@ -1289,6 +1289,7 @@ static int eth_ena_dev_init(struct rte_eth_dev *eth_dev)
 		return 0;
 
 	pci_dev = RTE_ETH_DEV_TO_PCI(eth_dev);
+	rte_eth_copy_pci_info(eth_dev, pci_dev);
 	adapter->pdev = pci_dev;
 
 	PMD_INIT_LOG(INFO, "Initializing %x:%x:%x.%d",
-- 
1.8.3.1
Re: [dpdk-dev] link state change not consistent (Flaps/stays DOWN/UP for ever)
We are seeing similar behavior with testpmd as well.

1. Keep ports 0/1/2/3 in Up/Up/Down/Down status to start the actual test.
2. The following commands change the link status twice on each port, so we should see the same status as in (1).
3. Testpmd: port stop 0; port start 0; port stop 1; port start 1; port start 2; port stop 2; port start 3; port stop 3
4. Repeat step 2, 3 -5 times and the app gets into a fault state and never recovers (one of the 10G ports goes into the "DOWN" state and never comes UP again).

Thanks
-Avinash

From: Yeddula, Avinash
Sent: Tuesday, August 22, 2017 3:08 PM
To: dev@dpdk.org
Cc: Gajarampalli, Prasanth
Subject: link state change not consistent (Flaps/stays DOWN/UP for ever)

Hi All,

With DPDK Stable 17.05 branch, we are seeing an issue with the link state of 10G ports on Broadwell. Link status of one of the ports keeps flapping at regular intervals, typically starts after few seconds after the link appeared to settle down. Other behaviors include not honoring a link state change set by the app.

So far this is observed with ixgbe only, but have not yet explored if the issue is existing across drivers. Also this issue was not seen with 16.11.2.

Is this a known issue? If yes, are there any patches that can fix it?

Few patches (between 17.05 and 17.08) that were applied and not found any difference in behavior are below.

net/ixgbe: improve link state check on VF
  In current implementation, when checking VF link state, PF state is checked too, although the function has a parameter to tell if PF state checking is needed. But in some scenario, user may not care about the PF state. This patch enables the unused parameter to only check the VF link state.

net/ixgbe: fix LSC interrupt
  If LSC flag is changed to off at last device start, the enable flag is not cleared in HW. This patch fixes it.

net/e1000: fix LSC interrupt
  If LSC flag is changed to off at last device start, the enable flag is not cleared in HW. This patch fixes it.

ethdev: add deferred intermediate device state
  This device state means that the device is managed externally, by whichever party has set this state (PMD or application). Note: this new device state is only an information. The related device structure and operators are still valid and can be used normally. It is however made private by device management helpers within ethdev, making the device invisible to applications.

ethdev: count devices consistently
  Make the rte_eth_dev_count() return the number of available devices even after some are detached by the hotplug API or put in a deferred state.

net/ixgbe: fix Rx/Tx queue interrupt for x550 devices
  x550 devices don't map interrupt vector before enabling Rx/Tx queue interrupt. Because of this interrupt mode is not working for x550 devices.

igb_uio: issue FLR during open and release of device file
  Set UIO info device file operations open and release. Call pci reset function inside open and release to clear device state at start and end. Copied this behaviour from vfio_pci kernel module code. With this patch, it is not mandatory to issue FLR by PMD's during init and close. Bus master enable and disable are added in open and release respectively to take care of device DMA.

ethdev: fix device state on detach
  The device state should be handled by the ethdev layer when possible. Applications should not have to do it. Not setting the state to UNUSED will make the port_id of the device valid for all ethdev API functions, usually resulting in segfault.
[dpdk-dev] [PATCH] ethdev: stop overriding rx_nombuf by rte_eth_stats_get
rte_eth_stats_get() would unconditionally set rx_nombuf even if the
device was setting the value. A check has been added in
rte_eth_stats_get() to leave the device value intact when non-zero.

Signed-off-by: David Harton
---
 lib/librte_ether/rte_ethdev.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
index 0597641..0319e39 100644
--- a/lib/librte_ether/rte_ethdev.c
+++ b/lib/librte_ether/rte_ethdev.c
@@ -1336,8 +1336,11 @@ struct rte_eth_dev *
 	memset(stats, 0, sizeof(*stats));
 
 	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->stats_get, -ENOTSUP);
-	stats->rx_nombuf = dev->data->rx_mbuf_alloc_failed;
 	(*dev->dev_ops->stats_get)(dev, stats);
+	/* only set rx_nombuf if not set by the device */
+	if (!stats->rx_nombuf) {
+		stats->rx_nombuf = dev->data->rx_mbuf_alloc_failed;
+	}
 	return 0;
 }
 
-- 
1.8.3.1
[dpdk-dev] [PATCH] ixgbe: initialize scattered_rx during dev_configure
An application may want to manipulate the MTU settings of a device
without having to start the device first. In order to remove the need to
start the device the ixgbe/ixgbevf drivers need to initialize the
scattered_rx value during dev_configure.

Signed-off-by: David Harton
---
 drivers/net/ixgbe/ixgbe_ethdev.c | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/drivers/net/ixgbe/ixgbe_ethdev.c b/drivers/net/ixgbe/ixgbe_ethdev.c
index 22171d8..e85bdb4 100644
--- a/drivers/net/ixgbe/ixgbe_ethdev.c
+++ b/drivers/net/ixgbe/ixgbe_ethdev.c
@@ -2372,6 +2372,13 @@ static int eth_ixgbevf_pci_remove(struct rte_pci_device *pci_dev)
 	intr->flags |= IXGBE_FLAG_NEED_LINK_UPDATE;
 
 	/*
+	 * Update scattered_rx so we can update MTU immediately
+	 * following configure without having to start the device
+	 */
+	if (dev->data->dev_conf.rxmode.enable_scatter)
+		dev->data->scattered_rx = 1;
+
+	/*
 	 * Initialize to TRUE. If any of Rx queues doesn't meet the bulk
 	 * allocation or vector Rx preconditions we will reset it.
 	 */
@@ -4949,6 +4956,13 @@ static int ixgbevf_dev_xstats_get_names(__rte_unused struct rte_eth_dev *dev,
 #endif
 
 	/*
+	 * Update scattered_rx so we can update MTU immediately
+	 * following configure without having to start the device
+	 */
+	if (dev->data->dev_conf.rxmode.enable_scatter)
+		dev->data->scattered_rx = 1;
+
+	/*
 	 * Initialize to TRUE. If any of Rx queues doesn't meet the bulk
 	 * allocation or vector Rx preconditions we will reset it.
 	 */
-- 
1.8.3.1
[dpdk-dev] [PATCH] lib/lib_eal:add mellanox kernel driver type
When using the bonding function in a Mellanox driver environment, the
find_port_id_by_pci_addr() function is called. Without Mellanox kernel
driver types in enum rte_kernel_driver, the function returns -1 because
kdrv is unknown. Add the Mellanox driver types and fill in kdrv during
the PCI scan to fix this problem.

Signed-off-by: Rongqiang XIE
---
 lib/librte_eal/common/include/rte_dev.h | 2 ++
 lib/librte_eal/linuxapp/eal/eal_pci.c   | 4
 2 files changed, 6 insertions(+)

diff --git a/lib/librte_eal/common/include/rte_dev.h b/lib/librte_eal/common/include/rte_dev.h
index 5386d3a..067ad07 100644
--- a/lib/librte_eal/common/include/rte_dev.h
+++ b/lib/librte_eal/common/include/rte_dev.h
@@ -123,6 +123,8 @@ enum rte_kernel_driver {
 	RTE_KDRV_VFIO,
 	RTE_KDRV_UIO_GENERIC,
 	RTE_KDRV_NIC_UIO,
+	RTE_KDRV_MLX4,
+	RTE_KDRV_MLX5,
 	RTE_KDRV_NONE,
 };
 
diff --git a/lib/librte_eal/linuxapp/eal/eal_pci.c b/lib/librte_eal/linuxapp/eal/eal_pci.c
index 8951ce7..31c8ec1 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci.c
+++ b/lib/librte_eal/linuxapp/eal/eal_pci.c
@@ -349,6 +349,10 @@
 			dev->kdrv = RTE_KDRV_IGB_UIO;
 		else if (!strcmp(driver, "uio_pci_generic"))
 			dev->kdrv = RTE_KDRV_UIO_GENERIC;
+		else if (!strcmp(driver, "mlx4_core"))
+			dev->kdrv = RTE_KDRV_MLX4;
+		else if (!strcmp(driver, "mlx5_core"))
+			dev->kdrv = RTE_KDRV_MLX5;
 		else
 			dev->kdrv = RTE_KDRV_UNKNOWN;
 	} else
-- 
1.8.3.1
[dpdk-dev] [PATCH v2 1/2] ethdev: stop overriding rx_nombuf by rte_eth_stats_get
rte_eth_stats_get() would unconditionally set rx_nombuf even if the
device was setting the value. A check has been added in
rte_eth_stats_get() to leave the device value intact when non-zero.

Signed-off-by: David Harton
---

v2: Fixed braces complaint required by other coding standards.

 lib/librte_ether/rte_ethdev.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
index 0597641..0a1d3b8 100644
--- a/lib/librte_ether/rte_ethdev.c
+++ b/lib/librte_ether/rte_ethdev.c
@@ -1336,8 +1336,11 @@ struct rte_eth_dev *
 	memset(stats, 0, sizeof(*stats));
 
 	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->stats_get, -ENOTSUP);
-	stats->rx_nombuf = dev->data->rx_mbuf_alloc_failed;
 	(*dev->dev_ops->stats_get)(dev, stats);
+	/* only set rx_nombuf if not set by the device */
+	if (!stats->rx_nombuf)
+		stats->rx_nombuf = dev->data->rx_mbuf_alloc_failed;
+
 	return 0;
 }
 
-- 
1.8.3.1
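
The ordering change in this patch (call the driver first, then fall back to the generic counter only if the driver left the field at zero) can be shown with a small standalone sketch. The struct and function names here are hypothetical and much simpler than struct rte_eth_stats; this is an illustration of the fallback rule, not the ethdev code.

```c
#include <stdint.h>

/* Hypothetical, trimmed-down stats structure. */
struct port_stats {
	uint64_t rx_nombuf;
};

/*
 * Keep a driver-supplied rx_nombuf; otherwise fall back to the generic
 * allocation-failure counter kept by the ethdev layer.
 */
void
stats_fill_nombuf(struct port_stats *stats, uint64_t generic_alloc_failed)
{
	if (stats->rx_nombuf == 0)
		stats->rx_nombuf = generic_alloc_failed;
}
```

One consequence of testing for zero is that a driver that genuinely reports 0 cannot be told apart from a driver that never sets the field, so in that case the generic counter is used.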
Re: [dpdk-dev] [PATCH 0/7] Add Membership Library
Hi Stephen,

Thanks for the comments. We will remove the needless return initialization in the next version.

For the unified API concern, we think the current implementation has two benefits:
1. It is easier to extend the library with new types of filters without adding a lot of top-level APIs every time.
2. When users switch between different types in their application, they do not need to switch between function calls. They can reuse the same code for different types.

However, we agree with you that a switch case in the library affects the performance. We did a quick test and here are the results (cycles/operation):

              lookup   lookup_bulk(16 batch)   lookup_multi   lookup_multi_bulk
switch_case   50       36                      57             45
direct call   47       35                      54             45

There is a 3-cycle difference for the non-bulk lookups. I guess if users usually use the bulk version, it may not be a big concern, but single-key lookup is indeed slower. If you think the benefits we mentioned do not outweigh the performance slowdown, we would like to make the change.

We also considered using function pointers to get rid of the switch case, but that would be unfriendly to multi-process usage.

Please comment.

Thanks
Yipeng

> -----Original Message-----
> From: Stephen Hemminger [mailto:step...@networkplumber.org]
> Sent: Monday, August 21, 2017 9:02 PM
> To: Wang, Yipeng1
> Cc: vincent.jar...@6wind.com; Richardson, Bruce; Ananyev, Konstantin;
> tho...@monjalon.net; dev@dpdk.org; Tai, Charlie; Gobriel, Sameh; Wang, Ren
> Subject: Re: [PATCH 0/7] Add Membership Library
>
> On Mon, 21 Aug 2017 17:19:46 -0700
> Yipeng Wang wrote:
>
> > This patch set implements two types of set-summaries, i.e., hash-table
> > based set-summary (HTSS) and Vector Bloom Filter (vBF). HTSS supports
> > both the non-cache and cache modes. The non-cache mode can incur a
> > small chance of false-positives which is the case when the set-summary
> > indicates a key belongs to a given set while actually it is not. The
> > cache mode can also have false-negatives in addition to
> > false-positives. False-negatives means the case when the set-summary
> > indicates a key does not belong to a given set while actually it does.
> > This happens because cache mode allows new key to evict existing keys. vBF
> > only has false-positives similar to the non-cache HTSS.
> > However, one can set the false-positive rate arbitrarily. HTSS's
> > false-positive rate is determined by the hash-table size and the signature
> > size.
>
> I don't think it makes sense to merge two different types of tables in one
> API.
> Especially in DPDK where every cycle counts. You are taking an extra branch on
> each lookup. The user of this API is likely to know exactly what type of
> objects
> and look are desired.
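To make the trade-off in this thread concrete, here is a minimal sketch of a unified entry point that switches on the set-summary type, next to the type-specific functions a "direct call" API would expose. All names (setsum, ht_lookup, vbf_lookup and so on) are hypothetical and the two data structures are toy versions; this is not the proposed library's API or implementation.

```c
#include <stdint.h>

/* Hypothetical names throughout; toy data structures for illustration. */
enum setsum_type { SETSUM_HT, SETSUM_VBF };

struct setsum {
	enum setsum_type type;
	uint32_t sigs[8];   /* HT flavour: one signature per bucket, 0 = empty */
	uint64_t bits;      /* vBF flavour: a single 64-bit Bloom filter */
};

static uint32_t sig_of(uint32_t key)
{
	return (key * 2654435761u) | 1u;   /* never 0, so 0 can mean "empty" */
}

/* Type-specific functions: what a direct-call API would expose. */
static void ht_add(struct setsum *s, uint32_t key)
{
	uint32_t sig = sig_of(key);
	s->sigs[(sig >> 8) % 8] = sig;     /* a new key may evict an old one */
}

static int ht_lookup(const struct setsum *s, uint32_t key)
{
	uint32_t sig = sig_of(key);
	return s->sigs[(sig >> 8) % 8] == sig;
}

static uint64_t vbf_mask(uint32_t key)
{
	uint32_t sig = sig_of(key);
	return (1ULL << (sig >> 26)) | (1ULL << ((sig >> 20) & 63));
}

static void vbf_add(struct setsum *s, uint32_t key)
{
	s->bits |= vbf_mask(key);
}

static int vbf_lookup(const struct setsum *s, uint32_t key)
{
	uint64_t m = vbf_mask(key);
	return (s->bits & m) == m;
}

/* Unified entry points: one extra branch per call, one API for all types. */
static int setsum_add(struct setsum *s, uint32_t key)
{
	switch (s->type) {
	case SETSUM_HT:
		ht_add(s, key);
		return 0;
	case SETSUM_VBF:
		vbf_add(s, key);
		return 0;
	}
	return -1;
}

static int setsum_lookup(const struct setsum *s, uint32_t key)
{
	switch (s->type) {
	case SETSUM_HT:
		return ht_lookup(s, key);
	case SETSUM_VBF:
		return vbf_lookup(s, key);
	}
	return -1;
}
```

An application that knows its type can call ht_lookup() directly and skip the switch; one that wants to swap types later only changes the type field and keeps calling setsum_lookup().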
[dpdk-dev] [PATCH] examples/l2fwd_fork: fix message pool init
The rte_pktmbuf_pool_init and rte_pktmbuf_init callbacks caused memory corruption on the message memory pool, so remove both. In addition, add an assertion on the private data size in rte_pktmbuf_pool_init() to avoid initializing a non-mbuf memory pool.

Fixes: 95e8005a56e8 ("examples/l2fwd_fork: new app")
Cc: Sergio Gonzalez Monroy
Cc: Olivier Matz

Signed-off-by: Xueming Li
---
 examples/multi_process/l2fwd_fork/main.c | 5 +----
 lib/librte_mbuf/rte_mbuf.c               | 2 ++
 2 files changed, 3 insertions(+), 4 deletions(-)

diff --git a/examples/multi_process/l2fwd_fork/main.c b/examples/multi_process/l2fwd_fork/main.c
index f8a626ba7..2e70c2faf 100644
--- a/examples/multi_process/l2fwd_fork/main.c
+++ b/examples/multi_process/l2fwd_fork/main.c
@@ -1204,10 +1204,7 @@ main(int argc, char **argv)
 	message_pool = rte_mempool_create("ms_msg_pool",
 			NB_CORE_MSGBUF * RTE_MAX_LCORE,
 			sizeof(enum l2fwd_cmd), NB_CORE_MSGBUF / 2,
-			0,
-			rte_pktmbuf_pool_init, NULL,
-			rte_pktmbuf_init, NULL,
-			rte_socket_id(), 0);
+			0, NULL, NULL, NULL, NULL, rte_socket_id(), 0);

 	if (message_pool == NULL)
 		rte_exit(EXIT_FAILURE, "Create msg mempool failed\n");
diff --git a/lib/librte_mbuf/rte_mbuf.c b/lib/librte_mbuf/rte_mbuf.c
index 26a62b8e1..aa924fde6 100644
--- a/lib/librte_mbuf/rte_mbuf.c
+++ b/lib/librte_mbuf/rte_mbuf.c
@@ -88,6 +88,8 @@ rte_pktmbuf_pool_init(struct rte_mempool *mp, void *opaque_arg)
 	uint16_t roomsz;

 	RTE_ASSERT(mp->elt_size >= sizeof(struct rte_mbuf));
+	RTE_ASSERT(mp->private_data_size == ((sizeof(*mbp_priv) +
+		RTE_MEMPOOL_ALIGN_MASK) & (~RTE_MEMPOOL_ALIGN_MASK)));

 	/* if no structure is provided, assume no mbuf private area */
 	user_mbp_priv = opaque_arg;
--
2.13.3
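The new assertion compares the pool's private data size with the size of the mbuf pool private structure rounded up to the mempool alignment. A minimal sketch of that round-up follows, assuming a 64-byte alignment (mask 63); the macro and function names here are illustrative and not taken from DPDK headers.

```c
#include <stddef.h>

/* Assumption: 64-byte alignment, so the mask is 63. */
#define ALIGN_MASK_SKETCH ((size_t)63)

/* Round n up to the next multiple of the alignment. */
static size_t align_up_sketch(size_t n)
{
	return (n + ALIGN_MASK_SKETCH) & ~ALIGN_MASK_SKETCH;
}

/*
 * Hypothetical check mirroring the assertion: a pool qualifies as a packet
 * mbuf pool only if its private area has exactly the aligned size of the
 * private structure.
 */
static int private_size_matches(size_t private_data_size, size_t priv_struct_size)
{
	return private_data_size == align_up_sketch(priv_struct_size);
}
```

The message pool in the example application is created with a private data size of 0, so a check of this shape fails for it, which is the misuse the assertion is meant to catch.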
[dpdk-dev] [PATCH] eal/malloc: fix malloc cookie check
From: xuemingl

DPDK uses its own memory management, so few regular memory profiling tools support DPDK today. The malloc cookie check provides a limited memory corruption check, which is better than nothing. This patch fixes the following:
1. Replace the broken generated configuration macro RTE_LIBRTE_MALLOC_DEBUG with RTE_MALLOC_DEBUG.
2. Fix the malloc size calculation when the RTE_MALLOC_DEBUG cookie check is enabled.

From real testing, it IS very helpful for detecting memory corruption. A well-designed DPDK application should quit cleanly and release its resources so that all allocated memory can be checked against the cookies.

Fixes: af75078 ("first public release")
Cc: Sergio Gonzalez Monroy

Signed-off-by: Xueming Li
---
 lib/librte_eal/common/malloc_elem.c |  8 ++++----
 lib/librte_eal/common/malloc_elem.h |  4 ++--
 test/test/test_malloc.c             | 10 +++++-----
 3 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/lib/librte_eal/common/malloc_elem.c b/lib/librte_eal/common/malloc_elem.c
index 150769057..889dffd21 100644
--- a/lib/librte_eal/common/malloc_elem.c
+++ b/lib/librte_eal/common/malloc_elem.c
@@ -275,14 +275,14 @@ malloc_elem_free(struct malloc_elem *elem)
 		return -1;

 	rte_spinlock_lock(&(elem->heap->lock));
-	size_t sz = elem->size - sizeof(*elem);
+	size_t sz = elem->size - sizeof(*elem) - MALLOC_ELEM_TRAILER_LEN;
 	uint8_t *ptr = (uint8_t *)&elem[1];
 	struct malloc_elem *next = RTE_PTR_ADD(elem, elem->size);
 	if (next->state == ELEM_FREE){
 		/* remove from free list, join to this one */
 		elem_free_list_remove(next);
 		join_elem(elem, next);
-		sz += sizeof(*elem);
+		sz += (sizeof(*elem) + MALLOC_ELEM_TRAILER_LEN);
 	}

 	/* check if previous element is free, if so join with it and return,
@@ -291,8 +291,8 @@ malloc_elem_free(struct malloc_elem *elem)
 	if (elem->prev != NULL && elem->prev->state == ELEM_FREE) {
 		elem_free_list_remove(elem->prev);
 		join_elem(elem->prev, elem);
-		sz += sizeof(*elem);
-		ptr -= sizeof(*elem);
+		sz += (sizeof(*elem) + MALLOC_ELEM_TRAILER_LEN);
+		ptr -= (sizeof(*elem) + MALLOC_ELEM_TRAILER_LEN);
 		elem = elem->prev;
 	}
 	malloc_elem_free_list_insert(elem);
diff --git a/lib/librte_eal/common/malloc_elem.h b/lib/librte_eal/common/malloc_elem.h
index f04b2d1e4..ce39129d9 100644
--- a/lib/librte_eal/common/malloc_elem.h
+++ b/lib/librte_eal/common/malloc_elem.h
@@ -53,13 +53,13 @@ struct malloc_elem {
 	volatile enum elem_state state;
 	uint32_t pad;
 	size_t size;
-#ifdef RTE_LIBRTE_MALLOC_DEBUG
+#ifdef RTE_MALLOC_DEBUG
 	uint64_t header_cookie;         /* Cookie marking start of data */
 	                                /* trailer cookie at start + size */
 #endif
 } __rte_cache_aligned;

-#ifndef RTE_LIBRTE_MALLOC_DEBUG
+#ifndef RTE_MALLOC_DEBUG
 static const unsigned MALLOC_ELEM_TRAILER_LEN = 0;

 /* dummy function - just check if pointer is non-null */
diff --git a/test/test/test_malloc.c b/test/test/test_malloc.c
index 013fd4407..5558acda4 100644
--- a/test/test/test_malloc.c
+++ b/test/test/test_malloc.c
@@ -108,7 +108,7 @@ test_align_overlap_per_lcore(__attribute__((unused)) void *arg)
 		}
 		for(j = 0; j < 1000 ; j++) {
 			if( *(char *)p1 != 0) {
-				printf("rte_zmalloc didn't zero"
+				printf("rte_zmalloc didn't zero "
 				       "the allocated memory\n");
 				ret = -1;
 			}
@@ -180,7 +180,7 @@ test_reordered_free_per_lcore(__attribute__((unused)) void *arg)
 		}
 		for(j = 0; j < 1000 ; j++) {
 			if( *(char *)p1 != 0) {
-				printf("rte_zmalloc didn't zero"
+				printf("rte_zmalloc didn't zero "
 				       "the allocated memory\n");
 				ret = -1;
 			}
@@ -293,7 +293,7 @@ test_multi_alloc_statistics(void)
 	struct rte_malloc_socket_stats pre_stats, post_stats ,first_stats, second_stats;
 	size_t size = 2048;
 	int align = 1024;
-#ifndef RTE_LIBRTE_MALLOC_DEBUG
+#ifndef RTE_MALLOC_DEBUG
 	int trailer_size = 0;
 #else
 	int trailer_size = RTE_CACHE_LINE_SIZE;
@@ -623,7 +623,7 @@ test_rte_malloc_validate(void)
 	const size_t request_size = 1024;
 	size_t allocated_size;
 	char *data_ptr = rte_malloc(NULL, request_size, RTE_CACHE_LINE_SIZE);
-#ifdef RTE_LIBRTE_MALLOC_DEBUG
+#ifdef RTE_MALLOC_DEBUG
 	int retval;
 	char *over_write_vals = NULL;
 #endif
@@ -645,7 +645,7 @@ test_rte_malloc_validate(void)
 	if (allocated_size < request_size)
 		err_return();

-#i
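As background for the size fix in this patch, the header/trailer cookie idea can be sketched in isolation. Everything below is an assumption made for illustration: the struct elem_sketch layout, the cookie values and the 8-byte trailer length are made up, and this is not the DPDK malloc code. What it does share with the patch is the arithmetic: the usable data length is the element size minus the header and minus the trailer.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical cookie values and trailer length. */
#define HEADER_COOKIE_SKETCH   0xbadbadbadadd2e55ULL
#define TRAILER_COOKIE_SKETCH  0x1122334455667788ULL
#define TRAILER_LEN_SKETCH     sizeof(uint64_t)

/* Layout in memory: [elem_sketch header][user data][trailer cookie] */
struct elem_sketch {
	size_t size;             /* total bytes: header + data + trailer */
	uint64_t header_cookie;  /* marks the start of the data */
};

static size_t elem_data_len(const struct elem_sketch *e)
{
	/* the trailer must be subtracted too, as in the patched size math */
	return e->size - sizeof(*e) - TRAILER_LEN_SKETCH;
}

static uint8_t *elem_data(struct elem_sketch *e)
{
	return (uint8_t *)&e[1];
}

static void elem_init(struct elem_sketch *e, size_t total)
{
	const uint64_t t = TRAILER_COOKIE_SKETCH;

	e->size = total;
	e->header_cookie = HEADER_COOKIE_SKETCH;
	memcpy((uint8_t *)e + total - TRAILER_LEN_SKETCH, &t, sizeof(t));
}

/* Returns 1 when both cookies are intact, 0 when either was overwritten. */
static int elem_cookies_ok(const struct elem_sketch *e)
{
	uint64_t t;

	memcpy(&t, (const uint8_t *)e + e->size - TRAILER_LEN_SKETCH, sizeof(t));
	return e->header_cookie == HEADER_COOKIE_SKETCH &&
	       t == TRAILER_COOKIE_SKETCH;
}
```

Writing even one byte past elem_data_len() lands in the trailer, so elem_cookies_ok() reports the overrun the next time the element is checked. If the trailer were left out of the length calculation, clearing the data area on free would wipe the cookie itself.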
Re: [dpdk-dev] DPDK qos support for 40G port
Thanks, Cristian

-----Original Message-----
From: Dumitrescu, Cristian [mailto:cristian.dumitre...@intel.com]
Sent: Friday, August 18, 2017 7:24 PM
To: Kevin Yan; dev@dpdk.org
Subject: RE: DPDK qos support for 40G port

Hi Kevin,

> Hi Cristian,
> Sorry to bother again, could you give suggestions/hints of code change
> to support single 40G port? Because in our setup, we will use single
> 40G port (Intel XL710) as the network interface.
>
> Or is there any workaround to bypass the limitation?(we are not
> willing to use 4*10G setup)
>
> Thanks.
>

Probably the easiest thing to do as workaround to support single port of 40GbE rate is to change the code so that each credit is equivalent to 2 bytes instead of one. This is likely to result in some scheduling accuracy loss, but it can be implemented relatively quickly while avoiding complex code changes.

Regards,
Cristian
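The arithmetic behind the suggested workaround can be sketched as follows. The thread does not say where the 40G limitation comes from; the sketch assumes it stems from rates and credits being held in 32-bit fields counted in credits per second, and both function names are hypothetical.

```c
#include <stdint.h>

/*
 * Assumption: the rate is stored in a 32-bit field as credits per second.
 * Returns 1 if the line rate fits for the given credit size, else 0.
 */
static int rate_fits_u32(uint64_t line_rate_bps, uint32_t bytes_per_credit)
{
	uint64_t credits_per_sec = line_rate_bps / 8 / bytes_per_credit;

	return credits_per_sec <= UINT32_MAX;
}

/* Credits charged for one packet, rounded up to whole credits. */
static uint32_t pkt_credits(uint32_t pkt_len, uint32_t bytes_per_credit)
{
	return (pkt_len + bytes_per_credit - 1) / bytes_per_credit;
}
```

Under that assumption, 40 Gb/s is 5,000,000,000 bytes per second, which exceeds the 32-bit maximum of 4,294,967,295 with one byte per credit but fits as 2,500,000,000 with two bytes per credit. The rounding in pkt_credits() also shows the accuracy loss mentioned in the reply: a 65-byte packet is charged 33 credits, that is 66 bytes.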
Re: [dpdk-dev] [PATCH] ena: fix init of ena pci_dev info
Acked-by: Michal Krawczyk

2017-08-23 2:41 GMT+02:00 David Harton:
> eth_ena_dev_init() was not initializing all of the common
> pci dev info for the rte_eth_dev. Added call to
> rte_eth_copy_pci_info() to complete the init particularly
> the driver name.
>
> Signed-off-by: David Harton
> ---
>  drivers/net/ena/ena_ethdev.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/drivers/net/ena/ena_ethdev.c b/drivers/net/ena/ena_ethdev.c
> index 80ce1f3..a6c408b 100644
> --- a/drivers/net/ena/ena_ethdev.c
> +++ b/drivers/net/ena/ena_ethdev.c
> @@ -1289,6 +1289,7 @@ static int eth_ena_dev_init(struct rte_eth_dev *eth_dev)
>  		return 0;
>
>  	pci_dev = RTE_ETH_DEV_TO_PCI(eth_dev);
> +	rte_eth_copy_pci_info(eth_dev, pci_dev);
>  	adapter->pdev = pci_dev;
>
>  	PMD_INIT_LOG(INFO, "Initializing %x:%x:%x.%d",
> --
> 1.8.3.1
Re: [dpdk-dev] [RFC PATCH 0/4] ethdev new offloads API
Hi,

I would like to get some input on the below. This is a big (and important) piece of work which I want to include in 17.11. I need to understand whether the current approach is acceptable before I continue.

Monday, August 7, 2017 1:54 PM, Shahaf Shuler:
> Tx offloads configuration is per queue. Tx offloads are enabled by default,
> and can be disabled using ETH_TXQ_FLAGS_NO* flags.
> This behaviour is not consistent with the Rx side where the Rx offloads
> configuration is per port. Rx offloads are disabled by default and enabled
> according to bit field in rte_eth_rxmode structure.
>
> Moreover, considering more Tx and Rx offloads will be added over time, the
> cost of managing them all inside the PMD will be tremendous, as the PMD
> will need to check the matching for the entire offload set for each mbuf it
> handles.
> In addition, on the current approach each Rx offload added breaks the ABI
> compatibility as it requires to add entries to existing bit-fields.
>
> The RFC address above issues by defining a new offloads API.
> With the new API, Tx and Rx offloads configuration is per queue.
> The offloads are disabled by default. Each offload can be enabled or disabled
> using the existing DEV_TX_OFFLOADS_* or DEV_RX_OFFLOADS_* flags.
> Such API will enable to easily add or remove offloads, without breaking the
> ABI compatibility.
>
> The new API does not have an equivalent for the below Tx flags:
>
> * ETH_TXQ_FLAGS_NOREFCOUNT
> * ETH_TXQ_FLAGS_NOMULTMEMP
>
> The reason is that those flags are not to manage offloads, rather some
> guarantee from application on the way it uses mbufs, therefore could not be
> present as part of DEV_TX_OFFLOADS_*.
> Such flags are useful only for benchmarks, and therefore provide a
> non-realistic performance for DPDK customers using simple benchmarks for
> evaluation.
> Leveraging the work being done in this series to clean up those flags.
>
> In order to provide a smooth transition between the APIs the following
> actions were taken:
> * The old offloads API is kept for the meanwhile.
> * New capabilities were added for PMD to advertize it has moved to the new
>   offloads API.
> * Helper function which copy from old to new API and vice versa were added
>   to ethdev, enabling the PMD to support only one of the APIs, and the
>   application to move to the new API regardless the underlying device and
>   without extra branching.
>
> Shahaf Shuler (4):
>   ethdev: rename Rx and Tx configuration structs
>   ethdev: introduce Rx queue offloads API
>   ethdev: introduce Tx queue offloads API
>   ethdev: add helpers to move to the new offloads API
>
>  lib/librte_ether/rte_ethdev.c | 144
>  lib/librte_ether/rte_ethdev.h | 72
>  2 files changed, 202 insertions(+), 14 deletions(-)
>
> --
> 2.12.0
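For discussion, the per-queue model described in the RFC (everything disabled by default, each offload enabled by setting a flag bit) might look roughly like the sketch below. The flag names, the struct txq_conf_sketch and both helpers are hypothetical and are not the RFC's definitions; they only stand in for the DEV_TX_OFFLOADS_* style of flags the cover letter mentions.

```c
#include <stdint.h>

/* Hypothetical flag names standing in for the real offload flags. */
#define SK_TX_OFFLOAD_VLAN_INSERT  (1ULL << 0)
#define SK_TX_OFFLOAD_IPV4_CKSUM   (1ULL << 1)
#define SK_TX_OFFLOAD_TCP_TSO      (1ULL << 2)

/* Hypothetical per-queue configuration: a zeroed struct enables nothing. */
struct txq_conf_sketch {
	uint64_t offloads;
};

/* Returns 0 if every requested offload is within the device capabilities. */
static int txq_conf_check(const struct txq_conf_sketch *conf, uint64_t capa)
{
	return (conf->offloads & ~capa) == 0 ? 0 : -1;
}

/* Returns 1 if all bits of flag are enabled on this queue. */
static int txq_offload_enabled(const struct txq_conf_sketch *conf, uint64_t flag)
{
	return (conf->offloads & flag) == flag;
}
```

With a single 64-bit mask, adding an offload means defining one more bit rather than adding a member to a bit-field structure, which is the ABI argument made in the cover letter.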