[dpdk-dev] ip_pipeline firewall : fragmented packets filtering
Hi All, Filtering based on TCP/UDP fields such as src/dest port ranges works correctly only on non-fragmented packets, which means reassembly must be done before packets hit the firewall rules table. Packets must also be fragmented before transmission if they are larger than the port MTU. This is currently unsupported; are there any plans for this in the near future? Regards, Shyam
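As a minimal sketch of why port-based rules break on fragments: only the first fragment of an IPv4 datagram carries the TCP/UDP header, so a firewall stage has to detect fragments before it can trust the L4 fields. The helpers below are hypothetical (the `demo_` names and constants are not part of ip_pipeline or any DPDK API) and only assume a pointer to the start of an IPv4 header.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical constants for this sketch; values follow the IPv4 header layout. */
#define DEMO_IPV4_MF_FLAG     0x2000u /* "more fragments" bit */
#define DEMO_IPV4_OFFSET_MASK 0x1fffu /* 13-bit fragment offset */

/* Read the flags + fragment offset field (bytes 6-7, network byte order). */
static uint16_t
demo_ipv4_frag_field(const uint8_t *ip_hdr)
{
	return (uint16_t)((ip_hdr[6] << 8) | ip_hdr[7]);
}

/* True if the packet is any fragment (first, middle or last). */
bool
demo_ipv4_is_fragment(const uint8_t *ip_hdr)
{
	uint16_t frag = demo_ipv4_frag_field(ip_hdr);

	return (frag & (DEMO_IPV4_MF_FLAG | DEMO_IPV4_OFFSET_MASK)) != 0;
}

/* True if the L4 header is present, i.e. the fragment offset is zero. */
bool
demo_ipv4_has_l4_header(const uint8_t *ip_hdr)
{
	return (demo_ipv4_frag_field(ip_hdr) & DEMO_IPV4_OFFSET_MASK) == 0;
}
```

A rule table keyed on ports can only match packets for which `demo_ipv4_has_l4_header()` is true; later fragments would need reassembly (or per-flow state) first.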
Re: [dpdk-dev] [PATCH v3 1/6] net/tap: add MAC address management ops
On Thu, 9 Mar 2017 14:05:51 + Ferruh Yigit wrote: > On 3/7/2017 4:31 PM, Pascal Mazon wrote: > > Set a random MAC address when probing the device, as to not leave an > > empty MAC in pmd->eth_addr. > > > > This MAC will be set on the tap netdevice as soon as it's been > > created using tun_alloc(). As the tap_mac_add() function depend on > > the fd in the first rxq, move code from tun_alloc() to > > tap_setup_queue(), after it's been set. > > > > Signed-off-by: Pascal Mazon > > --- > > doc/guides/nics/features/tap.ini | 1 + > > drivers/net/tap/rte_eth_tap.c| 97 > > ++-- 2 files changed, 85 > > insertions(+), 13 deletions(-) > > > > diff --git a/doc/guides/nics/features/tap.ini > > b/doc/guides/nics/features/tap.ini index f4aca6921ddc..d9b47a003654 > > 100644 --- a/doc/guides/nics/features/tap.ini > > +++ b/doc/guides/nics/features/tap.ini > > @@ -9,6 +9,7 @@ Jumbo frame = Y > > Promiscuous mode = Y > > Allmulticast mode= Y > > Basic stats = Y > > +Unicast MAC filter = Y > > Other kdrv = Y > > ARMv7= Y > > ARMv8= Y > > diff --git a/drivers/net/tap/rte_eth_tap.c > > b/drivers/net/tap/rte_eth_tap.c index ece3a5fcc897..1e46ee36efa2 > > 100644 --- a/drivers/net/tap/rte_eth_tap.c > > +++ b/drivers/net/tap/rte_eth_tap.c > > @@ -63,6 +63,8 @@ > > #define RTE_PMD_TAP_MAX_QUEUES 1 > > #endif > > > > +#define RTE_PMD_TAP_MAX_MAC_ADDRS 1 > > mac_addr_add and mac_addr_remove not really supported, because only > one MAC is supported. For mac_addr_add() all indexes other than 0 > will give an error. So only mac_addr_set is supported. > > For this case what is the benefit of implementing these functions and > claim support, instead of just leaving mac_addr_add and > mac_addr_remove NULL? > Well, I wanted to implement those along with the mac_addr_set, as they all dealt with mac addresses. But you're right, I might as well leave the ops NULL. I'll send a new version reflecting this. > > <...> > > > + if (qid == 0) { > > + /* > > +* tap_setup_queue() is called for both tx and rx. 
> > +* Let's use dev->data->r/tx_queues[qid] to > > determine if init > > +* has already been done. > > +*/ > > + if (dev->data->rx_queues[qid] && > > dev->data->tx_queues[qid]) > > + return fd; > > + > > + tap_mac_set(dev, &pmd->eth_addr); > > What is the reason of changing behavior here? > > Tap devices assigned random MAC by kernel, and previous implementation > was reading that value and using it for DPDK. > > Now kernel assigns a random MAC, and DPDK overwrites it another random > MAC, previous implementation was simpler I think. > > It is OK to move this code tap_setup_queue(), I just missed the > benefit of overwriting with DPDK random MAC? > > <...> As far as I remember, I did it because somewhere the mac_addr_set was checked as part of reconfiguration, which happened before queue setup was done. The default MAC address (dev->data->mac_addrs[0]) got set to 0, and a later call for the default MAC address tried using this MAC address. Or something along those lines. I'll definitely take a closer look at all this for my next version. Regards, Pascal
Re: [dpdk-dev] [PATCH v3 2/6] net/tap: add speed capabilities
On Thu, 9 Mar 2017 16:05:47 + Ferruh Yigit wrote: > On 3/9/2017 2:36 PM, Wiles, Keith wrote: > > > >> On Mar 9, 2017, at 8:18 AM, Yigit, Ferruh > >> wrote: > >> > >> On 3/7/2017 4:31 PM, Pascal Mazon wrote: > >>> Tap PMD is flexible, it supports any speed. > >>> > >>> Signed-off-by: Pascal Mazon > >>> --- > >>> doc/guides/nics/features/tap.ini | 1 + > >>> drivers/net/tap/rte_eth_tap.c| 35 > >>> +++ 2 files changed, 36 > >>> insertions(+) > >>> > >>> diff --git a/doc/guides/nics/features/tap.ini > >>> b/doc/guides/nics/features/tap.ini index > >>> d9b47a003654..dad5a0561087 100644 --- > >>> a/doc/guides/nics/features/tap.ini +++ > >>> b/doc/guides/nics/features/tap.ini @@ -9,6 +9,7 @@ Jumbo > >>> frame = Y Promiscuous mode = Y > >>> Allmulticast mode= Y > >>> Basic stats = Y > >>> +Speed capabilities = Y > >>> Unicast MAC filter = Y > >>> Other kdrv = Y > >>> ARMv7= Y > >>> diff --git a/drivers/net/tap/rte_eth_tap.c > >>> b/drivers/net/tap/rte_eth_tap.c index 1e46ee36efa2..ef525a3f0826 > >>> 100644 --- a/drivers/net/tap/rte_eth_tap.c > >>> +++ b/drivers/net/tap/rte_eth_tap.c > >>> @@ -351,6 +351,40 @@ tap_dev_configure(struct rte_eth_dev *dev > >>> __rte_unused) return 0; > >>> } > >>> > >>> +static uint32_t > >>> +tap_dev_speed_capa(void) > >>> +{ > >>> + uint32_t speed = pmd_link.link_speed; > >> > >> link_speed is already hardcoded into PMD, so there is nothing to > >> detect here. Would it be different if PMD directly return > >> pmd_link.link_speed? > > > > The link speed is passed into the PMD via the command line, which > > means it can change per run. > > Right, I missed that. I'll use switch/case in the next version in any case. But yes, as Keith said speed is a runtime option. Regards, Pascal
Re: [dpdk-dev] [PATCH v3 2/6] net/tap: add speed capabilities
On Fri, 10 Mar 2017 10:03:12 +0100 Pascal Mazon wrote: > On Thu, 9 Mar 2017 16:05:47 + > Ferruh Yigit wrote: > > > On 3/9/2017 2:36 PM, Wiles, Keith wrote: > > > > > >> On Mar 9, 2017, at 8:18 AM, Yigit, Ferruh > > >> wrote: > > >> > > >> On 3/7/2017 4:31 PM, Pascal Mazon wrote: > > >>> Tap PMD is flexible, it supports any speed. > > >>> > > >>> Signed-off-by: Pascal Mazon > > >>> --- > > >>> doc/guides/nics/features/tap.ini | 1 + > > >>> drivers/net/tap/rte_eth_tap.c| 35 > > >>> +++ 2 files changed, 36 > > >>> insertions(+) > > >>> > > >>> diff --git a/doc/guides/nics/features/tap.ini > > >>> b/doc/guides/nics/features/tap.ini index > > >>> d9b47a003654..dad5a0561087 100644 --- > > >>> a/doc/guides/nics/features/tap.ini +++ > > >>> b/doc/guides/nics/features/tap.ini @@ -9,6 +9,7 @@ Jumbo > > >>> frame = Y Promiscuous mode = Y > > >>> Allmulticast mode= Y > > >>> Basic stats = Y > > >>> +Speed capabilities = Y > > >>> Unicast MAC filter = Y > > >>> Other kdrv = Y > > >>> ARMv7= Y > > >>> diff --git a/drivers/net/tap/rte_eth_tap.c > > >>> b/drivers/net/tap/rte_eth_tap.c index 1e46ee36efa2..ef525a3f0826 > > >>> 100644 --- a/drivers/net/tap/rte_eth_tap.c > > >>> +++ b/drivers/net/tap/rte_eth_tap.c > > >>> @@ -351,6 +351,40 @@ tap_dev_configure(struct rte_eth_dev *dev > > >>> __rte_unused) return 0; > > >>> } > > >>> > > >>> +static uint32_t > > >>> +tap_dev_speed_capa(void) > > >>> +{ > > >>> + uint32_t speed = pmd_link.link_speed; > > >> > > >> link_speed is already hardcoded into PMD, so there is nothing to > > >> detect here. Would it be different if PMD directly return > > >> pmd_link.link_speed? > > > > > > The link speed is passed into the PMD via the command line, which > > > means it can change per run. > > > > Right, I missed that. > > I'll use switch/case in the next version in any case. > But yes, as Keith said speed is a runtime option. Sorry, I've sent that mail a little too quick. 
Of course I can't use switch/case as we're not checking exact values matching. > > Regards, > Pascal
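The body of tap_dev_speed_capa() is not shown in the thread, so the following is only a sketch of the kind of logic that rules out switch/case: if each capability is enabled by a threshold comparison rather than an exact match, `case` labels cannot express it in standard C. All names and bit values here are hypothetical and are not the DPDK ETH_LINK_SPEED_* definitions.

```c
#include <stdint.h>

/* Hypothetical capability bits for this sketch only. */
#define DEMO_CAPA_10M  (1u << 0)
#define DEMO_CAPA_100M (1u << 1)
#define DEMO_CAPA_1G   (1u << 2)
#define DEMO_CAPA_10G  (1u << 3)

/*
 * Build a capability mask from a link speed given at run time (in Mbps).
 * Every speed at or below the configured one is advertised, which needs
 * ">=" comparisons rather than exact-value case labels.
 */
uint32_t
demo_speed_capa(uint32_t link_speed_mbps)
{
	uint32_t capa = 0;

	if (link_speed_mbps >= 10)
		capa |= DEMO_CAPA_10M;
	if (link_speed_mbps >= 100)
		capa |= DEMO_CAPA_100M;
	if (link_speed_mbps >= 1000)
		capa |= DEMO_CAPA_1G;
	if (link_speed_mbps >= 10000)
		capa |= DEMO_CAPA_10G;

	return capa;
}
```

With this shape, a non-standard speed such as 2500 Mbps still yields a sensible mask (10M, 100M and 1G), which an exact-match switch would not handle.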
Re: [dpdk-dev] [PATCH v2 00/13] introduce fail-safe PMD
On Thu, Mar 09, 2017 at 09:15:14AM +, Bruce Richardson wrote:
On Wed, Mar 08, 2017 at 11:54:02AM -0500, Neil Horman wrote:
On Wed, Mar 08, 2017 at 04:15:33PM +0100, Gaetan Rivet wrote:
> This PMD intercepts and manages Ethernet device removal events issued by
> slave PMDs and re-initializes them transparently when brought back so that
> existing applications do not need to be modified to benefit from true
> hot-plugging support.
>
> The stacked PMD approach shares many similarities with the bonding PMD but
> with a different purpose. While bonding provides the ability to group
> several links into a single logical device for enhanced throughput and
> supports fail-over at link level, this one manages the sudden disappearance
> of the underlying device; it guarantees applications face a valid device in
> working order at all times.
>

Why not just add this feature to the bonding pmd then? A bond is perfectly capable of handling the trivial case of a single underlying device, and adding an option to make the underlying slave 'persistent' seems both much simpler in terms of implementation and code size than adding an entire new pmd, along with its supporting code.

Neil

@Neil I don't know if you saw my answer to Bruce on the matter [1], it partially addresses your point.

+1
I don't like the idea of having multiple PMDs in DPDK to handle combining multiple other devices into one.
/Bruce

I understand the concern. Let's first put aside for the moment the link grouping, which is only part of the fail-safe PMD function. The fail-safe PMD, at its core, provides an alternative paradigm, a new proposal for hot-plug functionality in a lightweight form-factor from a user standpoint.

The central question that I would like to tackle is this: why should we require our users to declare a bonding device to have hot-plug support?

I took some time to illustrate a few modes of operation:

Fig. 1: Typical link fail-over.
[ASCII diagram lost in extraction. Elements: application; bonding; ixgbe; mlx4. Annotations: "init, conf, Rx/Tx"; "conf, link check, Rx/Tx".]

Fig. 2: Typical automatic hot-plug handling with device fail-over.
[ASCII diagram lost in extraction. Elements: application; fail-safe; ixgbe; mlx4. Annotations: "init, conf, Rx/Tx"; "init, conf, dev check, Rx/Tx".]

Fig. 3: Combination to provide link fail-over with automatic hot-plug handling.
[ASCII diagram lost in extraction. Elements: application; bonding; fail-safe (x2); ixgbe; mlx4. Annotations: "init, conf, Rx/Tx"; "conf, link check, Rx/Tx"; "init, conf, dev check, Rx/Tx".]

Fig. 4: Complex use case with link fail-over at port level and automatic hot-plug handling with device fail-over.
[ASCII diagram lost in extraction. Elements: application; bonding; fail-safe (x2); mlx4 1 port 1; mlx4 2 port 1; mlx4 1 port 2; mlx4 2 port 2. Annotations: "init, conf, Rx/Tx"; "conf, link check, Rx/Tx"; "init, conf, dev check, Rx/Tx".]

1. LSC vs. RMV

A link status change is a valid state for a device. It calls for specific responses, e.g. a link switch in a bonding device, without losing the general configuration of the port.

The removal of a device calls for more than pausing operations and switching an active device. The party responsible for initializing the device should take care of closing it properly. If this party also wants to be able to restore the device if it was plugged back in, it would need to be able to initialize it again and reconfigure its previous state. As
Re: [dpdk-dev] [PATCH v3 3/4] net/tap: add netlink back-end for flow API
On Thu, 9 Mar 2017 15:29:01 + Ferruh Yigit wrote: > On 3/7/2017 4:35 PM, Pascal Mazon wrote: > > Each kernel netdevice may have queueing disciplines set for it, > > which determine how to handle the packet (mostly on egress). That's > > part of the TC (Traffic Control) mechanism. > > This is nice. > qdisc is egress part of the network stack right, is there any ingress > part of it? > qdisc is mainly for egress (can range from 0 to fffe), but there is one qdisc for ingress (). > > > > Through TC, it is possible to set filter rules that match specific > > packets, and act according to what is in the rule. This is a perfect > > candidate to implement the flow API for the tap PMD, as it has an > > associated kernel netdevice automatically. > > > > Each flow API rule will be translated into its TC counterpart. > > What can be use cases here? Well, it can be any use case for rte_flow, such as directing incoming packets to specific queues for the application, dropping them, and any kind of filtering (among those supported, see later patch). > > > > > To leverage TC, it is necessary to communicate with the kernel using > > netlink. This patch introduces a library to help that communication. > > > > What do you think implementing these out of tap PMD? These can be used > by KNI too. > Well, I don't know about KNI, but I think keeping it in the tap PMD, which is currently its sole user, is a good start. There will always be time later to make it more generic for other uses. Regards, Pascal > > Inside netlink.c, functions are generic for any netlink messaging. > > Inside tcmsgs.c, functions are specific to deal with TC rules. > > > > Signed-off-by: Pascal Mazon > > Acked-by: Olga Shern > <...>
Re: [dpdk-dev] [PATCH v3 1/4] net/tap: move private elements to external header
On Thu, 9 Mar 2017 15:28:31 + Ferruh Yigit wrote: > On 3/7/2017 4:35 PM, Pascal Mazon wrote: > > In the next patch, access to struct pmd_internals will be necessary > > in tap_flow.c to store the flows. > > > > Signed-off-by: Pascal Mazon > > Acked-by: Olga Shern > > --- > > drivers/net/tap/Makefile | 1 + > > drivers/net/tap/rte_eth_tap.c | 34 ++-- > > drivers/net/tap/tap.h | 73 > > +++ > > tap.h is a generic name, I think rte_eth_tap.h fits better here. > > <...> I'm ok with that. I'll change it in my next version.
Re: [dpdk-dev] [PATCH v11 5/7] mbuf: add a timestamp to the mbuf for latencystats
> From: Stephen Hemminger [mailto:step...@networkplumber.org] > Cc: dev@dpdk.org; Van Haaren, Harry ; Thomas > Monjalon > ; Pattan, Reshma > Subject: Re: [dpdk-dev] [PATCH v11 5/7] mbuf: add a timestamp to the mbuf for > latencystats > > On Thu, 9 Mar 2017 16:25:32 + > Remy Horton wrote: > > > From: Harry van Haaren > > > > This commit adds a uint64_t to the mbuf struct, > > allowing collection of latency and jitter statistics > > by measuring packet I/O timestamps. This change is > > required by the latencystats library. > > > > Signed-off-by: Reshma Pattan > > Signed-off-by: Harry van Haaren > > --- > > lib/librte_mbuf/rte_mbuf.h | 3 +++ > > 1 file changed, 3 insertions(+) > > > > diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h > > index ce57d47..e0dad6e 100644 > > --- a/lib/librte_mbuf/rte_mbuf.h > > +++ b/lib/librte_mbuf/rte_mbuf.h > > @@ -514,6 +514,9 @@ struct rte_mbuf { > > > > /** Timesync flags for use with IEEE1588. */ > > uint16_t timesync; > > + > > + /** Timestamp for measuring latency. */ > > + uint64_t timestamp; > > } __rte_cache_aligned; > > > > /** > > This creates a hole in the mbuf structure, and won't apply to current > version of mbuf that has priv_size.

This series was previously targeted to 17.02 when the mbuf rework was on the horizon, so placement of the timestamp was not as critical as it is now. Given the mbuf rework[1] is currently in progress, perhaps it is smarter to remove this patch from the patchset and depend on the mbuf rework patchset to add the timestamp instead. The latency stat library should probably also set the new PKT_RX_TIMESTAMP field in the mbuf[2].

[1] mbuf rework http://dpdk.org/dev/patchwork/patch/21601/
[2] mbuf timestamp http://dpdk.org/dev/patchwork/patch/21607/
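To illustrate the "hole" mentioned in the review, here is a small sketch with a hypothetical two-field struct standing in for the tail of struct rte_mbuf (it is not the real layout). Because a uint64_t is normally aligned more strictly than a uint16_t, the compiler inserts padding between the two members.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the end of struct rte_mbuf after the patch. */
struct demo_mbuf_tail {
	uint16_t timesync;  /* 2 bytes */
	uint64_t timestamp; /* aligned to its natural boundary */
};

/* Number of padding bytes the compiler inserted before 'timestamp'. */
size_t
demo_hole_bytes(void)
{
	size_t end_of_timesync =
		offsetof(struct demo_mbuf_tail, timesync) + sizeof(uint16_t);

	return offsetof(struct demo_mbuf_tail, timestamp) - end_of_timesync;
}
```

On common 64-bit ABIs this typically reports 6 bytes of padding, which is space in the mbuf that no field can use unless the members are reordered.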
Re: [dpdk-dev] DPDK17.02 flow classification can not take effect
I am using 16.11 and noticed the same thing: mbuf->packet_type is not populated (it stays 0). On Thu, Mar 9, 2017 at 12:57 PM, Chillance Zen wrote: > previous 16.07 works well , when I examine the rte_mbuf->packet_type,I can > get the well-corresponded packet type. after migrating code to 17.02 ,I > will never get packet_type again, should I switch on something ,or did I > miss any configuration ? > > btw,how could I judge whether the nic support packet type > classification?(since there is no bit indicating this offloading feature in > ol_flags) > > Thanks&Regards > Linc >
Re: [dpdk-dev] [PATCH] devtools: make commits with stable tag outstanding
2017-02-23 10:49, Yuanhan Liu:
> So that, as a stable maintainer while picking commits to a stable release,
> I could pay less attention to those have it and pay more attention to those
> don't have it.

Good idea

> +	stable="-"
> +	git show $id | grep -qi 'Cc: .*sta...@dpdk.org' && stable="S"

Instead of git show, it is preferable to get only the message content:
	git log --format='%b' -1 $id

The regex may miss a Cc: without space, and may match a Cc in the middle of a sentence. I suggest this one:
	grep -qi '^Cc: *sta...@dpdk.org'

The script is written in the "set -e" style, so a statement that returns false must not be left uncaught. It can be done in 2 ways:

1/
	git log --format='%b' -1 $id | grep -qi '^Cc: *sta...@dpdk.org' && stable='S' || stable='-'

2/
	if git log --format='%b' -1 $id | grep -qi '^Cc: *sta...@dpdk.org' ; then
		stable='S'
	else
		stable='-'
	fi

We can also move it into a function in order to keep only the logic in the "main" block:
	stable=$(stable_tag $id)

	# print a marker for stable tag presence
	stable_tag () #
	{
		if git log --format='%b' -1 $id | grep -qi '^Cc: *sta...@dpdk.org' ; then
			echo 'S'
		else
			echo '-'
		fi
	}
Re: [dpdk-dev] FW: Issues with ixgbe and rte_flow
Hi, On Fri, Mar 10, 2017 at 07:12:24AM +, Lu, Wenzhuo wrote: > Some replies in line. > Send it again with off the us...@dpdk.org. Seems I cannot send the mail > successfully with it. I'm removing everyone from the CC list and putting back dev@dpdk.org then, let's not break everyone's DPDK-related spam filters anymore. This is separate from the VLAN item issue mentioned in the same thread, I think this one is related to the ixgbe implementation (sorry Wenzhuo :) More below. [...] > Hi Le Scouarnec, > > > -Original Message- > > From: Le Scouarnec Nicolas [...] > > I also have a side comment which might be more related to the general > > rte_flow API than to the specific implementation in ixgbe. I don't know > > if it is specific to ixgbe's implementation but, as a user, the > > rte_flow_error returned was not very useful for it does return the error > > of the last tried filter-type (L2 tunnel in ixgbe), and not the error of > > the filter-type that my setup should use (flow director). The helpfulness of error messages is entirely a PMD's responsibility since they are not hard-coded into the API. rte_flow is deliberately not aware of the various underlying APIs used by PMDs to implement flow rules. In this case, assuming your rule could only work through flow director, the PMD should have saved and reported the error encountered with that filter type (either by saving it before attempting others, or recognizing that this rule wouldn't work with others and not attempt them). > > I had to change the order in which filters are tried in ixgbe code to > > get a more useful error message. As NICs can have several filter-types, > > It would be be useful to the user if rte_flow_validate/create could > > return the errors for all filter types tried although that would require > > to change the rte_flow API and returning an array of rte_flow_error and > > not a single struct. 
rte_flow_error is a compromise to provide a detailed explanation about the errno value returned by a function, which describes exactly one error (ideally the first error encountered). While returning an array could provide additional details about subsequent errors, I think it would needlessly complicate the API and make it slower without much benefit, given that most (if not all) PMD functions return as soon as one error is detected and also for performance reasons. > It's a good suggestion. I remember we have some discussion about how to > feedback the error to the APP. I think the reason why we don't make it too > complex because it's the first step of generic API. Now we see some feedback > from the users, we can keep optimizing it :) Right. Note ixgbe could append several messages to rte_flow_error.message if necessary as in such cases. Storage for the message is provided by the PMD and can be const, static or dynamic. However I really think the best approach would be to report the most relevant (first) error only. > And about the tpid, ethertype. I have a thought that why we need it as it's > duplicate with the item type. I think the initial design is just following > the IEEE spec to define the structures so we will not miss anything. But why > not do some optimization. For VLAN the tpid must be 0x8100, for IPv4, the > ethertype must be 0x0800. So why bothering let APP provide them and driver > check them? Seems we can just remove these fields from the structures, it can > make things simpler. > > Adrien, as you're the maintainer of rte_flow, any thought about these ideas? > Thanks. Basically I think we must give users the flexibility to provide nonstandard TPIDs as well (there's apparently already a few), so we can't just leave it out entirely. It's really about whether we want to make the inner type part of the VLAN item with TPID outside or keep it as-is. 
Anyway please reply to my previous message if you want to talk about that and let's fork this one to discuss the rte_flow_error issue. -- Adrien Mazarguil 6WIND
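A minimal sketch of the "save the first error before attempting other filter types" approach described above. Everything here is hypothetical: the `demo_` types and parsers do not exist in ixgbe or in the rte_flow API, and the error codes are chosen only to make the two failures distinguishable.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical error record, loosely modelled on the idea of rte_flow_error. */
struct demo_error {
	int code;
	const char *message; /* storage owned by the driver, may be static */
};

/* A filter-type parser: returns 0 on success, -1 and fills *err on failure. */
typedef int (*demo_parser_t)(const void *rule, struct demo_error *err);

/* Two stand-in parsers that both reject the rule, for illustration. */
int
demo_parse_fdir(const void *rule, struct demo_error *err)
{
	(void)rule;
	err->code = EINVAL;
	err->message = "flow director: unsupported mask";
	return -1;
}

int
demo_parse_l2_tunnel(const void *rule, struct demo_error *err)
{
	(void)rule;
	err->code = ENOTSUP;
	err->message = "L2 tunnel: pattern is not a tunnel";
	return -1;
}

/*
 * Try each filter type in order. On total failure, report the error from the
 * first attempt instead of whatever the last attempt happened to leave behind.
 */
int
demo_validate(const void *rule, const demo_parser_t *parsers, size_t n,
	      struct demo_error *err)
{
	struct demo_error first = { ENOTSUP, "no filter type available" };
	size_t i;

	for (i = 0; i < n; i++) {
		struct demo_error cur = { 0, NULL };

		if (parsers[i](rule, &cur) == 0) {
			err->code = 0;
			err->message = NULL;
			return 0;
		}
		if (i == 0)
			first = cur; /* keep the first error encountered */
	}

	*err = first;
	return -1;
}
```

Which attempt counts as "most relevant" is a driver decision; this sketch simply keeps the first one, so the caller sees the flow director message rather than the L2 tunnel one.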
[dpdk-dev] [PATCH 0/3] add support of musl
musl is an alternative libc to glibc. It is an implementation of the userspace portion of the standard library functionality described in the ISO C and POSIX standards, plus common extensions. Some DPDK customers fail to build DPDK with musl. So far, execinfo.h is not supported by musl. In order to build DPDK with musl, references to execinfo.h need to be removed. Currently only backtrace() and backtrace_symbols() from execinfo.h are used, in rte_dump_stack() in lib/librte_eal/linuxapp/eal/eal_debug.c. rte_dump_stack() is only used to get the names of the functions in the call stack for debugging. Wei Dai (3): examples/performance-thread: remove reference to execinfo.h config: add support of musl eal: remove references to execinfo.h for musl config/common_linuxapp | 1 + examples/performance-thread/common/lthread_tls.c | 1 - lib/librte_eal/linuxapp/eal/eal_debug.c | 7 ++- 3 files changed, 7 insertions(+), 2 deletions(-) -- 2.7.4
[dpdk-dev] [PATCH 1/3] examples/performance-thread: remove reference to execinfo.h
Nothing in this file uses anything from execinfo.h, so remove the include. musl does not provide this header, so the include must be removed to support musl. musl is an alternative libc to glibc and provides the standard C/POSIX library and extensions. musl is available from http://www.musl-libc.org Signed-off-by: Wei Dai --- examples/performance-thread/common/lthread_tls.c | 1 - 1 file changed, 1 deletion(-) diff --git a/examples/performance-thread/common/lthread_tls.c b/examples/performance-thread/common/lthread_tls.c index 6876f83..47505f2 100644 --- a/examples/performance-thread/common/lthread_tls.c +++ b/examples/performance-thread/common/lthread_tls.c @@ -42,7 +42,6 @@ #include #include #include -#include #include #include -- 2.7.4
[dpdk-dev] [PATCH 2/3] config: add support of musl
When building DPDK with musl, the macro RTE_LIBC_MUSL needs to be generated in rte_config.h in order to remove some references to execinfo.h, which is not supported by musl for now. More details about musl are available from http://www.musl-libc.org . Signed-off-by: Wei Dai --- config/common_linuxapp | 1 + 1 file changed, 1 insertion(+) diff --git a/config/common_linuxapp b/config/common_linuxapp index 00ebaac..66fb0a3 100644 --- a/config/common_linuxapp +++ b/config/common_linuxapp @@ -46,3 +46,4 @@ CONFIG_RTE_LIBRTE_PMD_TAP=y CONFIG_RTE_LIBRTE_NFP_PMD=y CONFIG_RTE_LIBRTE_POWER=y CONFIG_RTE_VIRTIO_USER=y +CONFIG_RTE_LIBC_MUSL=n -- 2.7.4
[dpdk-dev] [PATCH 3/3] eal: remove references to execinfo.h for musl
execinfo.h is not supported by musl for now, so references to execinfo.h need to be removed to build DPDK with musl. musl is an implementation of the userspace portion of the standard library functionality described in the ISO C and POSIX standards, plus common extensions. More details about musl are available from http://www.musl-libc.org Signed-off-by: Wei Dai --- lib/librte_eal/linuxapp/eal/eal_debug.c | 7 ++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/lib/librte_eal/linuxapp/eal/eal_debug.c b/lib/librte_eal/linuxapp/eal/eal_debug.c index 5fbc17c..d2416ee 100644 --- a/lib/librte_eal/linuxapp/eal/eal_debug.c +++ b/lib/librte_eal/linuxapp/eal/eal_debug.c @@ -31,7 +31,10 @@ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. */ -#include +#ifndef RTE_LIBC_MUSL + #include +#endif + #include #include #include @@ -47,6 +50,7 @@ /* dump the stack of the calling core */ void rte_dump_stack(void) { +#ifndef RTE_LIBC_MUSL void *func[BACKTRACE_SIZE]; char **symb = NULL; int size; @@ -64,6 +68,7 @@ void rte_dump_stack(void) } free(symb); +#endif } /* not implemented in this environment */ -- 2.7.4
Re: [dpdk-dev] [PATCH] examples/ip_fragmentation: fix check of packet type
Hi Wei, > -Original Message- > From: Dai, Wei > Sent: Friday, March 10, 2017 3:28 AM > To: dev@dpdk.org; Ananyev, Konstantin > Cc: Dai, Wei ; sta...@dpdk.org > Subject: [PATCH] examples/ip_fragmentation: fix check of packet type > > The packet_type in mbuf is not correctly filled by ixgbe 82599 NIC. > To use the ether_type in ethernet header to check packet type is > more reliable. > > Fixes: 3c0184cc0c60 ("examples: replace some offload flags with packet type") > Fixes: ab351fe1c95c ("mbuf: remove packet type from offload flags") > > Cc: sta...@dpdk.org > > Reported-by: Fangfang Wei > Signed-off-by: Wei Dai > Tested-by: Fagnfang Wei > --- > examples/ip_fragmentation/main.c | 10 ++ > 1 file changed, 6 insertions(+), 4 deletions(-) > > diff --git a/examples/ip_fragmentation/main.c > b/examples/ip_fragmentation/main.c > index e1e32c6..8612984 100644 > --- a/examples/ip_fragmentation/main.c > +++ b/examples/ip_fragmentation/main.c > @@ -268,6 +268,7 @@ l3fwd_simple_forward(struct rte_mbuf *m, struct > lcore_queue_conf *qconf, > uint32_t i, len, next_hop_ipv4; > uint8_t next_hop_ipv6, port_out, ipv6; > int32_t len2; > + struct ether_hdr *eth_hdr; > > ipv6 = 0; > rxq = &qconf->rx_queue_list[queueid]; > @@ -276,13 +277,14 @@ l3fwd_simple_forward(struct rte_mbuf *m, struct > lcore_queue_conf *qconf, > port_out = port_in; > > /* Remove the Ethernet header and trailer from the input packet */ > + eth_hdr = rte_pktmbuf_mtod(m, struct ether_hdr *); > rte_pktmbuf_adj(m, (uint16_t)sizeof(struct ether_hdr)); > > /* Build transmission burst */ > len = qconf->tx_mbufs[port_out].len; > > /* if this is an IPv4 packet */ > - if (RTE_ETH_IS_IPV4_HDR(m->packet_type)) { > + if (eth_hdr->ether_type == rte_cpu_to_be_16(ETHER_TYPE_IPv4)) { > struct ipv4_hdr *ip_hdr; > uint32_t ip_dst; > /* Read the lookup key (i.e. 
ip_dst) from the input packet */ > @@ -316,7 +318,7 @@ l3fwd_simple_forward(struct rte_mbuf *m, struct > lcore_queue_conf *qconf, > if (unlikely (len2 < 0)) > return; > } > - } else if (RTE_ETH_IS_IPV6_HDR(m->packet_type)) { > + } else if (eth_hdr->ether_type == rte_be_to_cpu_16(ETHER_TYPE_IPv6)) { > /* if this is an IPv6 packet */ > struct ipv6_hdr *ip_hdr; > > @@ -363,8 +365,8 @@ l3fwd_simple_forward(struct rte_mbuf *m, struct > lcore_queue_conf *qconf, > void *d_addr_bytes; > > m = qconf->tx_mbufs[port_out].m_table[i]; > - struct ether_hdr *eth_hdr = (struct ether_hdr *) > - rte_pktmbuf_prepend(m, (uint16_t)sizeof(struct > ether_hdr)); > + eth_hdr = (struct ether_hdr *)rte_pktmbuf_prepend(m, > + (uint16_t)sizeof(struct ether_hdr)); > if (eth_hdr == NULL) { > rte_panic("No headroom in mbuf.\n"); > } Thanks for the fix. Would it be more convenient to do what l3fwd does: Check what ptype capabilities are provided by HW, if no ptype support detected, then install an RX callback? Konstantin > -- > 2.7.4
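For reference, the software check the patch relies on can be sketched without any DPDK types: the EtherType sits in bytes 12-13 of an untagged Ethernet frame, in network byte order. The helpers and constants below are hypothetical names for illustration, and the sketch deliberately ignores VLAN-tagged frames.

```c
#include <stdbool.h>
#include <stdint.h>

/* EtherType values; the DEMO_ names are local to this sketch. */
#define DEMO_ETHER_TYPE_IPV4 0x0800u
#define DEMO_ETHER_TYPE_IPV6 0x86ddu

/* Return the EtherType of an untagged frame in host byte order. */
uint16_t
demo_ether_type(const uint8_t *frame)
{
	return (uint16_t)((frame[12] << 8) | frame[13]);
}

bool
demo_frame_is_ipv4(const uint8_t *frame)
{
	return demo_ether_type(frame) == DEMO_ETHER_TYPE_IPV4;
}

bool
demo_frame_is_ipv6(const uint8_t *frame)
{
	return demo_ether_type(frame) == DEMO_ETHER_TYPE_IPV6;
}
```

Reading the bytes explicitly avoids any byte-order conversion on the constant side, which is where the quoted patch mixes rte_cpu_to_be_16() and rte_be_to_cpu_16() (both perform the same swap on a 16-bit value, so the result is the same, but the first is the one that expresses the intent).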
Re: [dpdk-dev] [PATCH v3 5/6] net/tap: add packet type management
On Thu, 9 Mar 2017 14:26:08 + Ferruh Yigit wrote: > On 3/7/2017 4:31 PM, Pascal Mazon wrote: > > Advertize RTE_PTYPE_UNKNOWN since tap does not report any packet > > type. > > > > Signed-off-by: Pascal Mazon > > --- > > doc/guides/nics/features/tap.ini | 1 + > > drivers/net/tap/rte_eth_tap.c| 15 +++ > > 2 files changed, 16 insertions(+) > > > > diff --git a/doc/guides/nics/features/tap.ini > > b/doc/guides/nics/features/tap.ini index 6aa11874e2bc..7f3f4d661dd7 > > 100644 --- a/doc/guides/nics/features/tap.ini > > +++ b/doc/guides/nics/features/tap.ini > > @@ -13,6 +13,7 @@ MTU update = Y > > Multicast MAC filter = Y > > Speed capabilities = Y > > Unicast MAC filter = Y > > +Packet type parsing = Y > > Other kdrv = Y > > ARMv7= Y > > ARMv8= Y > > diff --git a/drivers/net/tap/rte_eth_tap.c > > b/drivers/net/tap/rte_eth_tap.c index d76f1dc83b03..edb5d2a82f12 > > 100644 --- a/drivers/net/tap/rte_eth_tap.c > > +++ b/drivers/net/tap/rte_eth_tap.c > > @@ -36,6 +36,7 @@ > > #include > > #include > > #include > > +#include > > > > #include > > #include > > @@ -216,6 +217,8 @@ pmd_rx_burst(void *queue, struct rte_mbuf > > **bufs, uint16_t nb_pkts) mbuf->data_len = len; > > mbuf->pkt_len = len; > > mbuf->port = rxq->in_port; > > + mbuf->packet_type = rte_net_get_ptype(mbuf, NULL, > > + > > RTE_PTYPE_ALL_MASK); > > Isn't PMD become inconsistent with this update. It reports > RTE_PTYPE_UNKNOWN, but sets mbuf->packet_type with various ptype. I discussed this briefly with Keith, but my argument was that the librte_net did not provide a list of supported packet types, so I didn't want to have to keep tap supported packet type list in sync with the lib. But it's actually just a few lines to add, I'll do that and also set a comment to mention that it must be in sync with librte_net. > > Do we want software packet type parsing in PMD level? Any change some > users may not interested with this data at all. 
Well, most PMDs support packet type parsing, and among the vdev, I inspired that part from virtio, which does software ptype parsing. An application coded with physical hardware PMD in mind doesn't have to be recoded when switching to a tap PMD. Furthermore, the tap PMD is already using syscalls in its datapath, the cost of software packet parsing won't really change overall performance. Regards, Pascal > > > > > /* account for the receive frame */ > > bufs[num_rx++] = mbuf; > > @@ -769,6 +772,17 @@ tap_mtu_set(struct rte_eth_dev *dev, uint16_t > > mtu) return 0; > > } > > > > +static const uint32_t* > > +tap_dev_supported_ptypes_get(struct rte_eth_dev *dev __rte_unused) > > +{ > > + static const uint32_t ptypes[] = { > > + RTE_PTYPE_UNKNOWN, > > + > > + }; > > + > > + return ptypes; > > +} > > + > > static const struct eth_dev_ops ops = { > > .dev_start = tap_dev_start, > > .dev_stop = tap_dev_stop, > > @@ -793,6 +807,7 @@ static const struct eth_dev_ops ops = { > > .mtu_set= tap_mtu_set, > > .stats_get = tap_stats_get, > > .stats_reset= tap_stats_reset, > > + .dev_supported_ptypes_get = tap_dev_supported_ptypes_get, > > }; > > > > static int > > >
Re: [dpdk-dev] [PATCH 3/3] eal: remove references to execinfo.h for musl
2017-03-10 19:58, Wei Dai: > @@ -47,6 +50,7 @@ > /* dump the stack of the calling core */ > void rte_dump_stack(void) > { > +#ifndef RTE_LIBC_MUSL > void *func[BACKTRACE_SIZE]; > char **symb = NULL; > int size; > @@ -64,6 +68,7 @@ void rte_dump_stack(void) > } > > free(symb); > +#endif > } There are probably other libc implementations not supporting this feature. Instead of calling it "RTE_LIBC_MUSL", it should be something like "ENABLE_BACKTRACE". Then you can add a musl section in the Linux quick start guide.
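A sketch of what a feature-based guard could look like, as opposed to a libc-based one. The macro name DEMO_ENABLE_BACKTRACE and the function are hypothetical (no such option exists in the tree as of this thread); backtrace() and backtrace_symbols() are the glibc calls the original function already uses.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical build option: define it only where execinfo.h is available. */
#ifdef DEMO_ENABLE_BACKTRACE
#include <execinfo.h>
#endif

#define DEMO_BACKTRACE_SIZE 256

/*
 * Print the call stack of the caller to stderr.
 * Returns the number of frames printed, or 0 when backtrace support is
 * compiled out or the symbols could not be resolved.
 */
int
demo_dump_stack(void)
{
#ifdef DEMO_ENABLE_BACKTRACE
	void *func[DEMO_BACKTRACE_SIZE];
	char **symb;
	int size;
	int i;

	size = backtrace(func, DEMO_BACKTRACE_SIZE);
	symb = backtrace_symbols(func, size);
	if (symb == NULL)
		return 0;

	for (i = 0; i < size; i++)
		fprintf(stderr, "%d: [%s]\n", i, symb[i]);

	free(symb);
	return size;
#else
	/* No backtrace facility in this libc: keep the API, do nothing. */
	return 0;
#endif
}
```

Naming the option after the feature means any libc without execinfo.h can simply leave it undefined, without the code having to know which libc it is.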
Re: [dpdk-dev] [PATCH] app/pdump: fix pdump can't find the driver when compiled dpdk to shared libraries
2017-03-03 17:27, zhaozhanxu: > When I compiled dpdk With configuration "CONFIG_RTE_BUILD_SHARED_LIB=y", > I get error message "EAL: no driver found for net_pcap_rx_0" and > "EAL: Driver cannot attach the device (net_pcap_rx_0)" by running pdump. > So I add library librte_pmd_pcap.so. [...] > +ifeq ($(CONFIG_RTE_BUILD_SHARED_LIB),y) > + _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_PCAP) += -lrte_pmd_pcap > +endif The idea of having drivers as shared libraries is to use them as plugins. We are not going to link the applications with every driver. Instead, we load them explicitly with the -d option. Someone should document it in http://dpdk.org/doc/guides/linux_gsg/build_sample_apps.html#running-a-sample-application Any volunteer?
Re: [dpdk-dev] [PATCH] kni: fast data availability check in thread_single loop
2017-01-18 07:11, Jay Rolette: > On Wed, Jan 18, 2017 at 5:05 AM, Sergey Vyazmitinov < > s.vyazmiti...@brain4net.com> wrote: > > > On Thu, Jan 12, 2017 at 12:29 AM, Ferruh Yigit > > wrote: > > > > > On 12/29/2016 11:23 PM, Sergey Vyazmitinov wrote: > > > > This allow to significant reduces packets processing latency. > > > > > > > > Signed-off-by: Sergey Vyazmitinov [...] > > > > --- a/lib/librte_eal/linuxapp/kni/kni_misc.c > > > > +++ b/lib/librte_eal/linuxapp/kni/kni_misc.c > > > > @@ -45,6 +45,7 @@ MODULE_AUTHOR("Intel Corporation"); > > > > MODULE_DESCRIPTION("Kernel Module for managing kni devices"); > > > > > > > > #define KNI_RX_LOOP_NUM 1000 > > > > +#define KNI_RX_DATA_LOOP_NUM 2500 > > > > > > > > #define KNI_MAX_DEVICES 32 > > > > > > > > @@ -129,25 +130,39 @@ static struct pernet_operations kni_net_ops = { > > > > #endif > > > > }; > > > > > > > > -static int > > > > -kni_thread_single(void *data) > > > > +static inline void > > > > +kni_thread_single_rx_data_loop(struct kni_net *knet) > > > > { > > > > - struct kni_net *knet = data; > > > > - int j; > > > > struct kni_dev *dev; > > > > + int i; > > > > > > > > - while (!kthread_should_stop()) { > > > > - down_read(&knet->kni_list_lock); > > > > - for (j = 0; j < KNI_RX_LOOP_NUM; j++) { > > > > - list_for_each_entry(dev, &knet->kni_list_head, > > > list) { > > > > + for (i = 0; i < KNI_RX_DATA_LOOP_NUM; ++i) { > > > > > > When there are multiple KNI interfaces, and lets assume there is traffic > > > too, this will behave like: > > > > > > KNI1x2500 data_packets + KNI2x2500 data_packets KNI10x2500 > > > > > > After data packets, KNI1 resp_packet + KNI2 resp_packets ... > > > > > > Won't this scenario also may cause latency? And perhaps jitter according > > > KNI interface traffic loads? > > > > > > This may be good for some use cases, but not sure if this is good for > > all. > > > > > We can decrease KNI_RX_DATA_LOOP_NUM to some reasonable value. > > I can make test to find lower bound. 
> > Also, the point is in fast check for a new data in interface rx queue. > > May be will be better add some kind of break after several kni_net_rx > > calls. > > Without them loop ends very quickly. > > Anyway, this patch decrease average latency in my case from 4.5ms to > > 0.011ms in ping test with 10 packets. > > > > If you were seeing latency of 4.5ms, then it is more likely a different > issue. > > At the end of the loop where KNI is reading packets from the queue, it > calls *schedule_timeout_interruptible()* with (by default) a 5us timeout. > However, that call just guarantees that the thread will sleep for AT LEAST > 5us. > > For most x86 Linux distros, HZ = 250 in the kernel, which works out to 4ms. > I'm reasonably certain the latency you are seeing is because the KNI thread > is sleeping and not getting woken up like you might expect. > > When you increased the number of loops happening before the sleep, you > increased how long KNI spends before it sleeps and it happened to be long > enough in your particular test to change your average latency. If you ran > your test for a few minutes and built a histogram of ping times, I bet > you'll see ~4ms of latency pop up regularly. > > More details from when I dug into this behavior previously: > http://dpdk.org/ml/archives/dev/2015-June/018858.html No answer in this discussion. Should we close it in patchwork?
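The timer-granularity argument in the quoted reply can be checked with a little arithmetic. The following is a user-space sketch, not kernel code: both helper names are made up for illustration, and the only assumption is that a requested timeout gets rounded up to whole timer ticks of 1/HZ seconds.

```c
/*
 * Hypothetical helper (not a kernel API): round a microsecond timeout
 * up to whole timer ticks for a given tick rate in Hz.
 */
unsigned long
sketch_usecs_to_ticks(unsigned long usecs, unsigned long hz)
{
	unsigned long usecs_per_tick = 1000000UL / hz;

	return (usecs + usecs_per_tick - 1) / usecs_per_tick;
}

/* Hypothetical helper: duration of a number of ticks, in microseconds. */
unsigned long
sketch_ticks_to_usecs(unsigned long ticks, unsigned long hz)
{
	return ticks * (1000000UL / hz);
}
```

With HZ = 250, a 5 us request rounds up to one tick, and one tick is 4000 us, which matches the roughly 4 ms latency described above. With HZ = 1000 the same request would round up to 1000 us.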
Re: [dpdk-dev] [PATCH v2 3/3] app/test: update test code
2017-03-07 04:19, Qi Zhang: > Update the test code to algin with callback function change. I think this patch must be squashed with the change in the header (patch 2). I have not tested but I guess the tests are not compiling between patch 2 and 3.
Re: [dpdk-dev] [PATCH v2 1/3] vfio: keep interrupt source read only
+Cc Anatoly, VFIO maintainer 2017-03-07 04:19, Qi Zhang: > Remove the inappropriate modification on get_max_intr > field that keep the intr_source read only. > > Signed-off-by: Qi Zhang
Re: [dpdk-dev] [PATCH] app/testpmd: Fix typos
2017-02-28 17:20, Mcnamara, John:
> > fixes trivial typos in app/test-pmd/cmdline.c, app/test-pmd/icmpecho.c,
> > app/test-pmd/testpmd.c
> >
> > Signed-off-by: Nirmoy Das
>
>
> Hi Nirmoy,
>
> Thanks for that.
>
> Just a small heads-up since this is your first patch. The subject line should
> be lowercase (apart from known abbreviations) and there is no need to list
> the files in the body since git will record that anyway. Have a look at
> the contribution guidelines for some more pointers:
>
> http://dpdk.org/doc/guides/contributing/patches.html#commit-messages-subject-line
>
> Apart from that, thanks for the patch.
>
> Acked-by: John McNamara

Applied, thanks

Nirmoy, we are looking for reviewers with good English reading skills ;)
Re: [dpdk-dev] [PATCH] kni: fast data availability check in thread_single loop
On Fri, Mar 10, 2017 at 6:59 AM, Thomas Monjalon wrote: > 2017-01-18 07:11, Jay Rolette: > > On Wed, Jan 18, 2017 at 5:05 AM, Sergey Vyazmitinov < > > s.vyazmiti...@brain4net.com> wrote: > > > > > On Thu, Jan 12, 2017 at 12:29 AM, Ferruh Yigit > > > > wrote: > > > > > > > On 12/29/2016 11:23 PM, Sergey Vyazmitinov wrote: > > > > > This allow to significant reduces packets processing latency. > > > > > > > > > > Signed-off-by: Sergey Vyazmitinov > [...] > > > > > --- a/lib/librte_eal/linuxapp/kni/kni_misc.c > > > > > +++ b/lib/librte_eal/linuxapp/kni/kni_misc.c > > > > > @@ -45,6 +45,7 @@ MODULE_AUTHOR("Intel Corporation"); > > > > > MODULE_DESCRIPTION("Kernel Module for managing kni devices"); > > > > > > > > > > #define KNI_RX_LOOP_NUM 1000 > > > > > +#define KNI_RX_DATA_LOOP_NUM 2500 > > > > > > > > > > #define KNI_MAX_DEVICES 32 > > > > > > > > > > @@ -129,25 +130,39 @@ static struct pernet_operations kni_net_ops > = { > > > > > #endif > > > > > }; > > > > > > > > > > -static int > > > > > -kni_thread_single(void *data) > > > > > +static inline void > > > > > +kni_thread_single_rx_data_loop(struct kni_net *knet) > > > > > { > > > > > - struct kni_net *knet = data; > > > > > - int j; > > > > > struct kni_dev *dev; > > > > > + int i; > > > > > > > > > > - while (!kthread_should_stop()) { > > > > > - down_read(&knet->kni_list_lock); > > > > > - for (j = 0; j < KNI_RX_LOOP_NUM; j++) { > > > > > - list_for_each_entry(dev, > &knet->kni_list_head, > > > > list) { > > > > > + for (i = 0; i < KNI_RX_DATA_LOOP_NUM; ++i) { > > > > > > > > When there are multiple KNI interfaces, and lets assume there is > traffic > > > > too, this will behave like: > > > > > > > > KNI1x2500 data_packets + KNI2x2500 data_packets KNI10x2500 > > > > > > > > After data packets, KNI1 resp_packet + KNI2 resp_packets ... > > > > > > > > Won't this scenario also may cause latency? And perhaps jitter > according > > > > KNI interface traffic loads? 
> > > > > > > > This may be good for some use cases, but not sure if this is good for > > > all. > > > > > > > We can decrease KNI_RX_DATA_LOOP_NUM to some reasonable value. > > > I can make test to find lower bound. > > > Also, the point is in fast check for a new data in interface rx queue. > > > May be will be better add some kind of break after several kni_net_rx > > > calls. > > > Without them loop ends very quickly. > > > Anyway, this patch decrease average latency in my case from 4.5ms to > > > 0.011ms in ping test with 10 packets. > > > > > > > If you were seeing latency of 4.5ms, then it is more likely a different > > issue. > > > > At the end of the loop where KNI is reading packets from the queue, it > > calls *schedule_timeout_interruptible()* with (by default) a 5us > timeout. > > However, that call just guarantees that the thread will sleep for AT > LEAST > > 5us. > > > > For most x86 Linux distros, HZ = 250 in the kernel, which works out to > 4ms. > > I'm reasonably certain the latency you are seeing is because the KNI > thread > > is sleeping and not getting woken up like you might expect. > > > > When you increased the number of loops happening before the sleep, you > > increased how long KNI spends before it sleeps and it happened to be long > > enough in your particular test to change your average latency. If you ran > > your test for a few minutes and built a histogram of ping times, I bet > > you'll see ~4ms of latency pop up regularly. > > > > More details from when I dug into this behavior previously: > > http://dpdk.org/ml/archives/dev/2015-June/018858.html > > No answer in this discussion. > Should we close it in patchwork? > I don't believe we should merge the patch. Jay
Re: [dpdk-dev] [PATCH v6 2/2] app/testpmd: fix port stop
2017-02-12 04:34, Wu, Jingjing: > > > > --- a/app/test-pmd/testpmd.c > > > > +++ b/app/test-pmd/testpmd.c > > > > @@ -1490,13 +1490,13 @@ stop_port(portid_t pid) > > > > continue; > > > > } > > > > > > > > + rte_eth_dev_stop(pi); > > > > + > > > > port = &ports[pi]; > > > > if (rte_atomic16_cmpset(&(port->port_status), > > > > RTE_PORT_STARTED, > > > > RTE_PORT_HANDLING) == 0) > > > > continue; > > > > > > > > - rte_eth_dev_stop(pi); > > > > - > > > > > > I don't think this fix is correct to move rte_eth_dev_stop above. > > > > > > We need to make sure rte_eth_dev_start is called in start_port. For > > > vmdq configuration, You just need to change the configuration when > > > port is stopped. > > > > I think the stop_port() function should always stop the port even if the > > port_status is not correct for any reason. > > At present stop_port() returns without stopping the port if the port_status > > is not > > RTE_PORT_STARTED. > > > This is testpmd's design. If you think it is an issue, maybe you need to > prepare patch to optimize it. But for VMDQ configuration, I'd like to make it > independent but not mixed. > > > The VMDq configuration is done whet the port is stopped, however to the > > complete the VMDq configuration the port must be started. > > > > To change minor, we can stop port, then configure VMDQ and then start port. > > You make port started in VMDQ config, the Symmetry of stop/start command is > broken and it is not easy to maintain. Should we close this patch in patchwork?
Re: [dpdk-dev] [PATCH] app/testpmd: add default MAC set cmd
2017-03-03 10:20, Pascal Mazon:
> Signed-off-by: Pascal Mazon
> ---
>  app/test-pmd/cmdline.c | 12 +++++++++---
>  1 file changed, 9 insertions(+), 3 deletions(-)

This patch looks trivial, but I am waiting for Jingjing's approval to be sure.
Re: [dpdk-dev] [PATCH v2 1/3] vfio: keep interrupt source read only
> Remove the inappropriate modification on get_max_intr field that keep the > intr_source read only. > > Signed-off-by: Qi Zhang > --- Acked-by: Anatoly Burakov
Re: [dpdk-dev] [PATCH v3] proc-info: added collectd-format and host-id options.
> > > Extended proc-info application to send DPDK port statistics to STDOUT in > > > the > > > format expected by collectd exec plugin. Added HOST ID option to identify > > > the > > > host DPDK process is running on when multiple instance of DPDK are > > > running in > > > parallel. This is needed for the barometer project in OPNFV. > > > > > > Signed-off-by: Roman Korynkevych > > > > Reviewed-by: Maryam Tahhan > > Acked-by: Harry van Haaren Applied, thanks
[dpdk-dev] [PATCH] net/e1000: advertise offload capabilities for the EM PMD
The hardware offload capabilities are not being advertised for the EM PMD.
Because of this, applications that only enable these features if the device
advertises them will never do so. Normally this is not an issue since normal
packet processing should work even if hardware offload is not available. But,
in older versions of VirtualBox the e1000 device emulation (Intel PRO/1000 MT
Desktop 82540EM) assumes that it should enable VLAN stripping even if the
driver does not request it. This means that any ingress packet that has a
VLAN tag will have that tag stripped. Since the application did not request
VLAN stripping, it is not expecting such packets, so they are not processed
as VLAN packets.

Regardless of the VirtualBox issue, the driver should be advertising
supported capabilities as is done in other drivers.

Signed-off-by: Allain Legacy
---
 drivers/net/e1000/em_ethdev.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/drivers/net/e1000/em_ethdev.c b/drivers/net/e1000/em_ethdev.c
index 4066ef9..e76e34b 100644
--- a/drivers/net/e1000/em_ethdev.c
+++ b/drivers/net/e1000/em_ethdev.c
@@ -1086,6 +1086,16 @@ static int eth_em_set_mc_addr_list(struct rte_eth_dev *dev,
 	dev_info->min_rx_bufsize = 256; /* See BSIZE field of RCTL register. */
 	dev_info->max_rx_pktlen = em_get_max_pktlen(hw);
 	dev_info->max_mac_addrs = hw->mac.rar_entry_count;
+	dev_info->rx_offload_capa =
+		DEV_RX_OFFLOAD_VLAN_STRIP |
+		DEV_RX_OFFLOAD_IPV4_CKSUM |
+		DEV_RX_OFFLOAD_UDP_CKSUM |
+		DEV_RX_OFFLOAD_TCP_CKSUM;
+	dev_info->tx_offload_capa =
+		DEV_TX_OFFLOAD_VLAN_INSERT |
+		DEV_TX_OFFLOAD_IPV4_CKSUM |
+		DEV_TX_OFFLOAD_UDP_CKSUM |
+		DEV_TX_OFFLOAD_TCP_CKSUM;
 
 	/*
 	 * Starting with 631xESB hw supports 2 TX/RX queues per port.
-- 
1.8.3.1
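To illustrate why the missing advertisement matters, here is a rough sketch of the application-side check described in the commit message. It is deliberately standalone: the struct and the SKETCH_* constants are stand-ins invented for this example, not the real rte_ethdev.h definitions, and a real application would read rx_offload_capa from the device info it gets from the ethdev API.

```c
#include <stdint.h>

/*
 * Stand-ins for the DEV_RX_OFFLOAD_* bits; the values are for
 * illustration only and are an assumption of this sketch.
 */
#define SKETCH_RX_OFFLOAD_VLAN_STRIP 0x00000001
#define SKETCH_RX_OFFLOAD_IPV4_CKSUM 0x00000002

/* Stand-in for the rx_offload_capa field reported by a driver. */
struct sketch_dev_info {
	uint32_t rx_offload_capa;
};

/*
 * Return 1 if the application may enable VLAN stripping, 0 otherwise.
 * A driver that never fills in rx_offload_capa always yields 0 here,
 * so a cautious application never turns the feature on.
 */
int
sketch_can_enable_vlan_strip(const struct sketch_dev_info *info)
{
	return (info->rx_offload_capa & SKETCH_RX_OFFLOAD_VLAN_STRIP) != 0;
}
```

An application written this way would skip VLAN stripping on an EM device before the patch and could enable it afterwards.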
Re: [dpdk-dev] [PATCH] examples: optind should be reset to one not zero
2017-03-09 21:11, Wiles, Keith: > > > On Mar 9, 2017, at 2:41 PM, Thomas Monjalon > > wrote: > > > > 2017-02-14 16:09, Keith Wiles: > >> Signed-off-by: Keith Wiles > > > > Please, could explain and describe what was the consequence of this > > wrong reset value? > > You can just reply and I will integrate it in the commit when applying. > > Here is the man page text: > > "The variable optind is the index of the next element to be processed in > argv. The system initializes this value to 1. > The caller can reset it to 1 to restart scanning of the same argv, or when > scanning a new argument vector.” > > The problem I saw with my application was trying to parse the wrong option, > which can happen as DPDK parses the first part of the command line and the > application parses the second part. If you call getopt() multiple times in > the same execution, the behavior is not maintained when using zero for optind. > > > — Do not put the next part in the commit message unless you want — > As a side note it appears MacOS is much more picky about trying to use optind > of zero and not one. I would get a segfault on DPDK running in MacOS and I > assumed Linux/FreeBSD could be fixing optind internally, but it is best to > set the correct value in all cases. > > I hope that helps. Applied with this explanation integrated, thanks.
Re: [dpdk-dev] [PATCH 3/3] eal: remove references to execinfo.h for musl
On Fri, Mar 10, 2017 at 1:40 PM, Thomas Monjalon wrote:
> 2017-03-10 19:58, Wei Dai:
>> @@ -47,6 +50,7 @@
>>  /* dump the stack of the calling core */
>>  void rte_dump_stack(void)
>>  {
>> +#ifndef RTE_LIBC_MUSL
>>  	void *func[BACKTRACE_SIZE];
>>  	char **symb = NULL;
>>  	int size;
>> @@ -64,6 +68,7 @@ void rte_dump_stack(void)
>>  	}
>>
>>  	free(symb);
>> +#endif
>>  }
>
> There are probably other libc implementations not supporting this feature.
> Instead of calling it "RTE_LIBC_MUSL", it should be something like
> "ENABLE_BACKTRACE".
> Then you can add a musl section in the Linux quick start guide.

Also, I would improve the code readability by removing the preprocessor junk
from it: move the rte_dump_stack() function into eal_backtrace.c and make
that file compile conditionally, based on CONFIG_ENABLE_BACKTRACE.
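A minimal sketch of the feature-flag idea discussed in this thread, assuming a hypothetical SKETCH_ENABLE_BACKTRACE switch (the thread only suggests a name like ENABLE_BACKTRACE, and no such option is confirmed to exist). The function name is also invented; it returns the frame count so that the disabled case is easy to see.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical switch; define it only where the libc provides execinfo.h. */
#ifdef SKETCH_ENABLE_BACKTRACE
#include <execinfo.h>
#endif

#define SKETCH_BACKTRACE_SIZE 256

/*
 * Print the call stack of the calling thread to stderr.
 * Returns the number of frames printed, or 0 when backtrace support
 * is compiled out or the symbol lookup fails.
 */
int
sketch_dump_stack(void)
{
#ifdef SKETCH_ENABLE_BACKTRACE
	void *func[SKETCH_BACKTRACE_SIZE];
	char **symb;
	int size;
	int i;

	size = backtrace(func, SKETCH_BACKTRACE_SIZE);
	symb = backtrace_symbols(func, size);
	if (symb == NULL)
		return 0;

	for (i = 0; i < size; i++)
		fprintf(stderr, "%d: [%s]\n", i, symb[i]);

	free(symb);
	return size;
#else
	return 0;
#endif
}
```

Keeping the conditional in one small file, as suggested above, would leave callers free of any libc-specific #ifdef.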
[dpdk-dev] [PATCH v2] eventdev: remove default queue overriding
PMDs that only do a specific type of scheduling cannot provide
CFG_ALL_TYPES, so the Eventdev infrastructure should not demand
that every PMD supports CFG_ALL_TYPES.

By not overriding the default configuration of the queue as
suggested by the PMD, the eventdev_common unit tests can pass
on all PMDs, regardless of their capabilities.

Since RTE_EVENT_QUEUE_CFG_DEFAULT is no longer used by the eventdev layer,
it can be removed now. Applications should use CFG_ALL_TYPES
if they require enqueue of all types to a queue, or specify which
type of queue they require.

The CFG_DEFAULT value is changed to CFG_ALL_TYPES in event/skeleton,
so as not to break the compile.

A capability flag is added that indicates if the underlying PMD
supports creating queues of ALL_TYPES.

Signed-off-by: Harry van Haaren
---
v2:
- added capability flag to indicate if PMD supports ALL_TYPES
---
 drivers/event/skeleton/skeleton_eventdev.c |  2 +-
 lib/librte_eventdev/rte_eventdev.c         |  1 -
 lib/librte_eventdev/rte_eventdev.h         | 13 +++++++------
 3 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/drivers/event/skeleton/skeleton_eventdev.c b/drivers/event/skeleton/skeleton_eventdev.c
index dee0faf..308e28e 100644
--- a/drivers/event/skeleton/skeleton_eventdev.c
+++ b/drivers/event/skeleton/skeleton_eventdev.c
@@ -196,7 +196,7 @@ skeleton_eventdev_queue_def_conf(struct rte_eventdev *dev, uint8_t queue_id,
 
 	queue_conf->nb_atomic_flows = (1ULL << 20);
 	queue_conf->nb_atomic_order_sequences = (1ULL << 20);
-	queue_conf->event_queue_cfg = RTE_EVENT_QUEUE_CFG_DEFAULT;
+	queue_conf->event_queue_cfg = RTE_EVENT_QUEUE_CFG_ALL_TYPES;
 	queue_conf->priority = RTE_EVENT_DEV_PRIORITY_NORMAL;
 }
 
diff --git a/lib/librte_eventdev/rte_eventdev.c b/lib/librte_eventdev/rte_eventdev.c
index 68bfc3b..c32a776 100644
--- a/lib/librte_eventdev/rte_eventdev.c
+++ b/lib/librte_eventdev/rte_eventdev.c
@@ -593,7 +593,6 @@ rte_event_queue_setup(uint8_t dev_id, uint8_t queue_id,
 		RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->queue_def_conf,
 					-ENOTSUP);
 		(*dev->dev_ops->queue_def_conf)(dev, queue_id, &def_conf);
-		def_conf.event_queue_cfg = RTE_EVENT_QUEUE_CFG_DEFAULT;
 		queue_conf = &def_conf;
 	}
 
diff --git a/lib/librte_eventdev/rte_eventdev.h b/lib/librte_eventdev/rte_eventdev.h
index 7073987..4c73a82 100644
--- a/lib/librte_eventdev/rte_eventdev.h
+++ b/lib/librte_eventdev/rte_eventdev.h
@@ -271,6 +271,13 @@ struct rte_mbuf; /* we just use mbuf pointers; no need to include rte_mbuf.h */
  *
  * @see rte_event_schedule(), rte_event_dequeue_burst()
  */
+#define RTE_EVENT_DEV_CAP_QUEUE_ALL_TYPES (1ULL << 3)
+/**< Event device is capable of enqueuing events of any type to any queue.
+ * If this capability is not set, the queue only supports events of the
+ * *RTE_EVENT_QUEUE_CFG_* type that it was created with.
+ *
+ * @see RTE_EVENT_QUEUE_CFG_* values
+ */
 
 /* Event device priority levels */
 #define RTE_EVENT_DEV_PRIORITY_HIGHEST 0
@@ -471,12 +478,6 @@ rte_event_dev_configure(uint8_t dev_id,
 /* Event queue specific APIs */
 
 /* Event queue configuration bitmap flags */
-#define RTE_EVENT_QUEUE_CFG_DEFAULT (0)
-/**< Default value of *event_queue_cfg* when rte_event_queue_setup() invoked
- * with queue_conf == NULL
- *
- * @see rte_event_queue_setup()
- */
 #define RTE_EVENT_QUEUE_CFG_TYPE_MASK (3ULL << 0)
 /**< Mask for event queue schedule type configuration request */
 #define RTE_EVENT_QUEUE_CFG_ALL_TYPES (0ULL << 0)
-- 
2.7.4
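As a rough illustration of how an application might use the new capability flag, the sketch below picks a queue configuration based on the device capabilities. It is standalone on purpose: the SKETCH_* names are stand-ins, the first two values are copied from the patch, and the atomic-only fallback value is an assumption made only for this example.

```c
#include <stdint.h>

/* Values mirrored from the patch above. */
#define SKETCH_CAP_QUEUE_ALL_TYPES   (1ULL << 3)
#define SKETCH_QUEUE_CFG_ALL_TYPES   (0ULL << 0)
/* Assumed fallback type for this sketch; not taken from the patch. */
#define SKETCH_QUEUE_CFG_ATOMIC_ONLY (1ULL << 0)

/*
 * Choose a queue configuration from the capability bits a device reports.
 * Only request ALL_TYPES when the device says it can honour it.
 */
uint32_t
sketch_pick_queue_cfg(uint32_t event_dev_cap)
{
	if (event_dev_cap & SKETCH_CAP_QUEUE_ALL_TYPES)
		return (uint32_t)SKETCH_QUEUE_CFG_ALL_TYPES;

	return (uint32_t)SKETCH_QUEUE_CFG_ATOMIC_ONLY;
}
```

A PMD that does only one kind of scheduling would leave the capability bit clear, and the application would then ask for a specific queue type instead of ALL_TYPES.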
Re: [dpdk-dev] [PATCH v3 01/17] eventdev: fix API docs and test for timeout ticks
> -----Original Message-----
> From: Jerin Jacob [mailto:jerin.ja...@caviumnetworks.com]
> Sent: Monday, March 6, 2017 10:34 AM
> To: Van Haaren, Harry
> Subject: Re: [PATCH v3 01/17] eventdev: fix API docs and test for timeout ticks

> >  	ret = rte_event_dequeue_timeout_ticks(TEST_DEV_ID, 100, &timeout_ticks);
> > -	TEST_ASSERT_SUCCESS(ret, "Fail to get timeout_ticks");
> > +	/* -ENOTSUP is a valid return if timeout is not supported by device */
> > +	if (ret != -ENOTSUP)
> > +		TEST_ASSERT_SUCCESS(ret, "Fail to get timeout_ticks");
>
> Header file change looks good. IMO, In the test case, We can introduce
> TEST_UNSUPPORTED in addition to TEST_SUCCESS and TEST_FAILED to reflect
> the actual status. I guess it will useful for future tests as well.

Adding TEST_UNSUPPORTED requires a larger change to the test infrastructure
than the software eventdev series should make; my preference is to use the
error check solution as above for the v4 patchset. A rework of the testing
infrastructure should be done on mainline DPDK if deemed required, and
enabled on the Eventdev branch after a rebase.
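For comparison, the three-way status suggested in the quoted review might look roughly like the sketch below. The enum and function names are invented for this example and are not part of the DPDK test framework; the only real identifier used is ENOTSUP from errno.h.

```c
#include <errno.h>

/* Hypothetical three-way test status, named after the review suggestion. */
enum sketch_test_status {
	SKETCH_TEST_SUCCESS,
	SKETCH_TEST_FAILED,
	SKETCH_TEST_UNSUPPORTED
};

/*
 * Map an API return code to a test status.
 * -ENOTSUP is reported separately instead of being folded into pass or fail.
 */
enum sketch_test_status
sketch_classify_ret(int ret)
{
	if (ret == 0)
		return SKETCH_TEST_SUCCESS;
	if (ret == -ENOTSUP)
		return SKETCH_TEST_UNSUPPORTED;
	return SKETCH_TEST_FAILED;
}
```

The error check kept for v4 is the simpler variant: it treats -ENOTSUP as acceptable and asserts success for every other return value.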
[dpdk-dev] [RFC] New CLI for DPDK
I would like to request comments on a new CLI design and get any feedback. I have attached the cli.rst text, which is still a work in progress, for your review.

I have also ported the CLI to a version of Pktgen on the ‘dev’ branch of the repo in DPDK.org.

http://dpdk.org/browse/apps/pktgen-dpdk/refs/?h=dev

I would like to submit the CLI library to be used in DPDK, if that seems reasonable to everyone. I need more testing of the API and Pktgen, but I feel it has a simpler design, is easier to understand and will hopefully make it easier for developers to add commands.

As an example I quickly converted testpmd from CMDLINE to CLI (I just added a -I option to select CLI instead) and reduced the test-pmd/cmdline.c file from 12.6K lines to about 4.5K lines. I did not fully test the code, but the commands I did test seem to work.

I do not expect DPDK to convert to the new CLI unless it makes sense, and I am not suggesting replacing the CMDLINE library.

If you play with the new CLI in pktgen and see any problems or want to suggest new features or changes, please let me know.

Comments on the cli.rst text are also welcome, but the cli.rst is not complete. I think this file needs to be broken into two: one to explain the example and another to explain CLI internals.

---

..  BSD LICENSE
    Copyright(c) 2017 Intel Corporation. All rights reserved.
    All rights reserved.

    Redistribution and use in source and binary forms, with or without
    modification, are permitted provided that the following conditions
    are met:

    * Redistributions of source code must retain the above copyright
    notice, this list of conditions and the following disclaimer.
    * Redistributions in binary form must reproduce the above copyright
    notice, this list of conditions and the following disclaimer in
    the documentation and/or other materials provided with the
    distribution.
    * Neither the name of Intel Corporation nor the names of its
    contributors may be used to endorse or promote products derived
    from this software without specific prior written permission.

    THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
    "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
    LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
    A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
    OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
    SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
    LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
    DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
    THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
    (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
    OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

CLI Sample Application
======================

CLI stands for "Command Line Interface".

This chapter describes the CLI sample application that is part of the
Data Plane Development Kit (DPDK). The CLI is a workalike replacement for
the cmdline library in DPDK and has a simpler interface and programming
model.

The primary goal of CLI is to allow the developer to create commands
quickly and with very little compile-time or runtime configuration. It uses
standard Unix*-like constructs which are very familiar to the developer,
allowing the developer to construct a set of commands for development or
deployment of the application.

The CLI design uses a directory-like design instead of a single-level
command line interface, allowing the developer to use a directory-style
solution to control a DPDK application. The directory-style design is
nothing new, but it does have some advantages.

Overview
--------

The CLI sample application is a simple application that demonstrates the
use of the command line interface in the DPDK. This application is a
readline-like interface that can be used to control a DPDK application.

One of the big advantages of CLI over Cmdline is that it is dynamic, which
means nodes or items can be added and removed on the fly. This allows
adding new directories, files or commands as needed, or removing these
items at runtime. The CLI has no global modifiable variable, as the one
global pointer is a thread-based variable, which allows the developer to
have multiple CLI commands per thread if needed.

Another big advantage is that calling the backend function that supports a
command is very familiar to developers, as it is basically just an
argc/argv style command and the developer gets the complete command line.

One other big advantage is the use of MAP structures, which help identify
commands quickly and allow the developer to define new versions of
commands and to identify these new versions using a simple identifier
value. Look at the sample application to see a simple usage.

Another advantage of CLI is how simple it is to add new directories, files and comma
[dpdk-dev] [PATCH 1/2] net/thunderx: fix build issues with 32bit target
Fixes: e438796617dc ("net/thunderx: add PMD skeleton")

Signed-off-by: Jerin Jacob
---
 drivers/net/thunderx/nicvf_struct.h | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/net/thunderx/nicvf_struct.h b/drivers/net/thunderx/nicvf_struct.h
index c900e12..0e4e1dd 100644
--- a/drivers/net/thunderx/nicvf_struct.h
+++ b/drivers/net/thunderx/nicvf_struct.h
@@ -58,8 +58,8 @@ struct nicvf_txq {
 	union sq_entry_t *desc;
 	nicvf_phys_addr_t phys;
 	struct rte_mbuf **txbuffs;
-	uint64_t sq_head;
-	uint64_t sq_door;
+	uintptr_t sq_head;
+	uintptr_t sq_door;
 	struct rte_mempool *pool;
 	struct nicvf *nic;
 	void (*pool_free)(struct nicvf_txq *sq);
@@ -74,8 +74,8 @@ struct nicvf_txq {
 
 struct nicvf_rxq {
 	uint64_t mbuf_phys_off;
-	uint64_t cq_status;
-	uint64_t cq_door;
+	uintptr_t cq_status;
+	uintptr_t cq_door;
 	nicvf_phys_addr_t phys;
 	union cq_entry_t *desc;
 	struct nicvf_rbdr *shared_rbdr;
-- 
2.5.5
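The patch gives no rationale beyond its subject line. One plausible reading, offered here only as an assumption, is that these fields hold addresses, and that converting between a pointer and a uint64_t on a 32-bit target triggers size-mismatch warnings, whereas uintptr_t is the integer type intended to hold a pointer value. The struct and function names below are invented for the sketch.

```c
#include <stdint.h>

/* Hypothetical queue structure that keeps a register address as an integer. */
struct sketch_txq {
	uintptr_t sq_door;	/* wide enough for a pointer on 32- and 64-bit */
};

/* Store a register address; the cast is well-sized on any target. */
void
sketch_set_door(struct sketch_txq *q, volatile uint64_t *reg)
{
	q->sq_door = (uintptr_t)reg;
}

/* Recover the register address from the stored integer. */
volatile uint64_t *
sketch_get_door(const struct sketch_txq *q)
{
	return (volatile uint64_t *)q->sq_door;
}
```

With uint64_t in place of uintptr_t, the cast back to a pointer is the kind of conversion a 32-bit compiler typically flags as a size mismatch.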
[dpdk-dev] [PATCH 2/2] config: enable the thunderx nicvf driver
Enable the ThunderX nicvf PMD in the common config, as it has no build
dependency on any external library or architecture.

Signed-off-by: Jerin Jacob
---
 config/common_base                           |  2 +-
 config/defconfig_arm64-thunderx-linuxapp-gcc | 10 ----------
 doc/guides/nics/thunderx.rst                 |  3 +--
 3 files changed, 2 insertions(+), 13 deletions(-)

diff --git a/config/common_base b/config/common_base
index aeee13e..e48417d 100644
--- a/config/common_base
+++ b/config/common_base
@@ -279,7 +279,7 @@ CONFIG_RTE_LIBRTE_PMD_SZEDATA2_AS=0
 #
 # Compile burst-oriented Cavium Thunderx NICVF PMD driver
 #
-CONFIG_RTE_LIBRTE_THUNDERX_NICVF_PMD=n
+CONFIG_RTE_LIBRTE_THUNDERX_NICVF_PMD=y
 CONFIG_RTE_LIBRTE_THUNDERX_NICVF_DEBUG_INIT=n
 CONFIG_RTE_LIBRTE_THUNDERX_NICVF_DEBUG_RX=n
 CONFIG_RTE_LIBRTE_THUNDERX_NICVF_DEBUG_TX=n
diff --git a/config/defconfig_arm64-thunderx-linuxapp-gcc b/config/defconfig_arm64-thunderx-linuxapp-gcc
index a5b1e24..9818a2e 100644
--- a/config/defconfig_arm64-thunderx-linuxapp-gcc
+++ b/config/defconfig_arm64-thunderx-linuxapp-gcc
@@ -36,13 +36,3 @@ CONFIG_RTE_MACHINE="thunderx"
 CONFIG_RTE_CACHE_LINE_SIZE=128
 CONFIG_RTE_MAX_NUMA_NODES=2
 CONFIG_RTE_MAX_LCORE=96
-
-#
-# Compile Cavium Thunderx NICVF PMD driver
-#
-CONFIG_RTE_LIBRTE_THUNDERX_NICVF_PMD=y
-CONFIG_RTE_LIBRTE_THUNDERX_NICVF_DEBUG_INIT=n
-CONFIG_RTE_LIBRTE_THUNDERX_NICVF_DEBUG_RX=n
-CONFIG_RTE_LIBRTE_THUNDERX_NICVF_DEBUG_TX=n
-CONFIG_RTE_LIBRTE_THUNDERX_NICVF_DEBUG_DRIVER=n
-CONFIG_RTE_LIBRTE_THUNDERX_NICVF_DEBUG_MBOX=n
diff --git a/doc/guides/nics/thunderx.rst b/doc/guides/nics/thunderx.rst
index 1314ee9..a95b701 100644
--- a/doc/guides/nics/thunderx.rst
+++ b/doc/guides/nics/thunderx.rst
@@ -77,9 +77,8 @@ Config File Options
 The following options can be modified in the ``config`` file.
 Please note that enabling debugging options may affect system performance.
 
-- ``CONFIG_RTE_LIBRTE_THUNDERX_NICVF_PMD`` (default ``n``)
+- ``CONFIG_RTE_LIBRTE_THUNDERX_NICVF_PMD`` (default ``y``)
 
-  By default it is enabled only for defconfig_arm64-thunderx-* config.
   Toggle compilation of the ``librte_pmd_thunderx_nicvf`` driver.
 
 - ``CONFIG_RTE_LIBRTE_THUNDERX_NICVF_DEBUG_INIT`` (default ``n``)
-- 
2.5.5
Re: [dpdk-dev] [PATCH v9 04/18] lib: add new distributor code
On Mon, Mar 06, 2017 at 09:10:19AM +, David Hunt wrote:
> This patch includes public header file which will be used once
> we add in the symbol versioning for v20 and v1705 APIs.
>
> Also includes v1702 header file, and code for new

Now 1705.

> burst-capable distributor library. This will be re-named as
> rte_distributor.h later in the patch-set
>
> The new distributor code contains a very similar API to the legacy code,
> but now sends bursts of up to 8 mbufs to each worker. Flow ID's are
> reduced to 15 bits for an optimal flow matching algorithm.
>
> Signed-off-by: David Hunt
> ---
>  lib/librte_distributor/Makefile                  |   1 +
>  lib/librte_distributor/rte_distributor.c         | 628 +++
>  lib/librte_distributor/rte_distributor_private.h |   7 +-
>  lib/librte_distributor/rte_distributor_v1705.h   | 269 ++
>  4 files changed, 904 insertions(+), 1 deletion(-)
>  create mode 100644 lib/librte_distributor/rte_distributor.c
>  create mode 100644 lib/librte_distributor/rte_distributor_v1705.h
>

Minor nit, I think this patch might be squashed into the previous one, to
have new structures and code together.

/Bruce
Re: [dpdk-dev] [RFC] New CLI for DPDK
On Fri, 10 Mar 2017 15:25:31 + "Wiles, Keith" wrote: > I would like to request for comments on a new CLI design and get any > feedback. I have attached the cli.rst text, which is still a work in progress > for you review. > > I have also ported the CLI to a version of Pktgen on the ‘dev’ branch of the > repo in DPDK.org. > > http://dpdk.org/browse/apps/pktgen-dpdk/refs/?h=dev > > I would like to submit the CLI library to be used in DPDK, if that seems > reasonable to everyone. I need more testing of the API and Pktgen, but I feel > it has a simpler design, easier to understand and hopefully make it easier > for developers to add commands. > > As an example I quickly converted over testpmd from CMDLINE to CLI (I just > add a -I option to select CLI instead) and reduced the test-pmd/cmdline.c > file from 12.6K lines to about 4.5K lines. I did not fully test the code, but > the ones I did test seem to work. > > I do not expect DPDK to convert to the new CLI only if it makes sense and I > am not suggesting to replace CMDLINE library. > > If you play with the new CLI in pktgen and see any problems or want to > suggest new features or changes please let me know. > > Comments on the cli.rst text is also welcome, but the cli.rst is not > complete. I think this file needs to be broken into two one to explain the > example and another to explain CLI internals. > It would be great if all DPDK examples used a similar architecture. And having a common infrastructure would help. But not sure it needs to be special. Why should this be DPDK specific? What you are building really ends up being an application framework at some point. Surely, there are lots of others already in open source. Heck even VPP has its own CLI inside.
Re: [dpdk-dev] dpdk 0005-net-bonding-reconfigure-all-slave-queues-every-time.patch issue
On Thu, Mar 9, 2017 at 3:22 AM, Wen Chiu wrote:
> Hi,
>
> 0005-net-bonding-reconfigure-all-slave-queues-every-time.patch is now
> officially in dpdk 17.02.

This is commit 1e2eff64f554 ("net/bonding: reconfigure all slave queues every time").

> But, it caused segmentation fault every time when
> I configured bonding. In slave_configure(), "Setup Tx Queues" logic change
> from for q_id=old_nb_tx_queues to qid=0 which always enters the for loop and
> calls rte_eth_tx_queue_setup. After that, rte_eth_dev_start() is called to
> start the device. In rte_eth_dev_start(), vmxnet3_dev_start() is called
> which calls vmxnet3_dev_rxtx_init(). In vmxnet3_dev_rxtx_init(), after for
> loop for rx_queues; dev->data->tx_queues[0] is override with value like
> 0x121b20600 which is an invalid memory address that caused the fault.
>
> Without this 0005 patch, looks like rte_eth_tx_queue_setup() is never called
> as q_id=old_nb_tx_queues never < nb_tx_queues. So, I suspect the calls to
> queue_setup() somehow makes the queues to be setup incorrectly or
> incompletely which causes the fault.

So did slave_configure() actually return an error?

> Has anyone else encounters the same
> issue?

No, not with the device I tested with.

> Regards,
>
> Wen Chiu
>
Re: [dpdk-dev] [PATCH v9 09/18] lib: add symbol versioning to distributor
On Mon, Mar 06, 2017 at 09:10:24AM +, David Hunt wrote: > Also bumped up the ABI version number in the Makefile > > Signed-off-by: David Hunt > --- > lib/librte_distributor/Makefile| 2 +- > lib/librte_distributor/rte_distributor.c | 57 +++--- > lib/librte_distributor/rte_distributor_v1705.h | 89 > ++ A file named rte_distributor_v1705.h was added in patch 4, then deleted in patch 7, and now added again here. Seems a lot of churn. /Bruce
Re: [dpdk-dev] [RFC] New CLI for DPDK
> On Mar 10, 2017, at 10:06 AM, Stephen Hemminger > wrote: > > On Fri, 10 Mar 2017 15:25:31 + > "Wiles, Keith" wrote: > >> I would like to request for comments on a new CLI design and get any >> feedback. I have attached the cli.rst text, which is still a work in >> progress for you review. >> >> I have also ported the CLI to a version of Pktgen on the ‘dev’ branch of the >> repo in DPDK.org. >> >> http://dpdk.org/browse/apps/pktgen-dpdk/refs/?h=dev >> >> I would like to submit the CLI library to be used in DPDK, if that seems >> reasonable to everyone. I need more testing of the API and Pktgen, but I >> feel it has a simpler design, easier to understand and hopefully make it >> easier for developers to add commands. >> >> As an example I quickly converted over testpmd from CMDLINE to CLI (I just >> add a -I option to select CLI instead) and reduced the test-pmd/cmdline.c >> file from 12.6K lines to about 4.5K lines. I did not fully test the code, >> but the ones I did test seem to work. >> >> I do not expect DPDK to convert to the new CLI only if it makes sense and I >> am not suggesting to replace CMDLINE library. >> >> If you play with the new CLI in pktgen and see any problems or want to >> suggest new features or changes please let me know. >> >> Comments on the cli.rst text is also welcome, but the cli.rst is not >> complete. I think this file needs to be broken into two one to explain the >> example and another to explain CLI internals. >> > > It would be great if all DPDK examples used a similar architecture. And > having a common > infrastructure would help. > > But not sure it needs to be special. Why should this be DPDK specific? > What you are building really ends up being an application > framework at some point. Surely, there are lots of others already in open > source. > > Heck even VPP has its own CLI inside. I have been looking for one for years and never found one that met my needs of easy to use and easy to understand. 
If you can find a better one then please let me know. This code does use some DPDK APIs, but they can be removed to make it standalone, and the first version I did was standalone. Some of the ones I found were similar to cmdline and some took it a step further by trying to do everything one would ever need in a CLI. Those are way too big and difficult to use; then you have the ones that are barely a step above readline or just writing your own. The cmdline interface falls closer to trying to do everything for you, by converting strings into values with structures/macros that are difficult to understand at a glance. IMHO this one is simple and easy to understand. But in truth the cmdline interface in DPDK is difficult to use and to write code for; it takes way too many lines of code to make a simple command. The current cmdline is also not dynamic, which makes it difficult to add features on the fly. With cmdline, all of the commands are at the same level, whereas a directory structure allows the developer to use the directory path he takes to denote a context for the command. As the example of converting test-pmd to use CLI shows, the number of lines dropped from 12.6K to 4.5K. The cmdline code is also not considered to be production quality (from the docs) and I would like to fix that problem for DPDK. Regards, Keith
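As a rough illustration of the directory idea described above: this is a hypothetical sketch, not the API of the proposed CLI library. `struct node`, `lookup()` and the command names are made up for illustration. Commands are stored as leaves of a tree and resolved by path, so the directory a command lives in gives it its context.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical command callback type; not taken from the proposed CLI. */
typedef int (*cmd_fn)(int argc, char **argv);

/* A node is either a directory (fn == NULL) or a command (fn != NULL). */
struct node {
	const char *name;
	cmd_fn fn;
	const struct node *children;	/* array terminated by name == NULL */
};

static int cmd_start(int argc, char **argv) { (void)argc; (void)argv; return 1; }
static int cmd_stats(int argc, char **argv) { (void)argc; (void)argv; return 2; }

/* Example tree: /port/start and /port/stats. */
static const struct node port_cmds[] = {
	{ "start", cmd_start, NULL },
	{ "stats", cmd_stats, NULL },
	{ NULL, NULL, NULL },
};
static const struct node root_dirs[] = {
	{ "port", NULL, port_cmds },
	{ NULL, NULL, NULL },
};
static const struct node root = { "/", NULL, root_dirs };

/* Walk the tree one path component at a time; NULL if nothing matches. */
static const struct node *
lookup(const struct node *dir, const char *path)
{
	const struct node *c;
	size_t len;

	while (*path == '/')
		path++;
	if (*path == '\0')
		return dir;		/* path fully consumed */
	if (dir->children == NULL)
		return NULL;		/* a command has no sub-entries */
	len = strcspn(path, "/");
	for (c = dir->children; c->name != NULL; c++)
		if (strlen(c->name) == len && strncmp(c->name, path, len) == 0)
			return lookup(c, path + len);
	return NULL;
}
```

Adding a command in this model means adding one table entry under the right directory; a dynamic variant would allocate nodes at run time instead of using static tables.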
Re: [dpdk-dev] [PATCH 1/1] net/mlx4: add port parameter
> -Original Message- > From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Gaetan Rivet > Sent: Friday, March 03, 2017 10:40 AM ... > + errno = 0; > + tmp = strtoul(val, NULL, 0); The robustness of the strtoul() could be improved with something like the following to catch non-integer characters following the port number. char *end = NULL; tmp = strtoull(val, &end, 0); if ((val[0] == '\0') || (end == NULL) || (*end != '\0') || (errno != 0)) > + if (errno) { > + WARN("%s: \"%s\" is not a valid integer", key, val); > + return -errno; > + } > + if (strcmp(MLX4_PMD_PORT_KVARG, key) == 0) { > + if (tmp >= MLX4_PMD_MAX_PHYS_PORTS) { > + ERROR("invalid port index %lu (max: %u)", > + tmp, MLX4_PMD_MAX_PHYS_PORTS - 1); > + return -EINVAL; > + } > + conf->active_ports |= 1 << tmp; > + } else { > + WARN("%s: unknown parameter", key); > + return -EINVAL; > + } > + return 0; > +} The usage of strtoul() should be moved to be within the strcmp(MLX4_PMD_PORT_KVARG, key) IF statement. That way the "val" would only be parsed if "key" is "port" and it is expected that "val" is an integer. > + if (mlx4_args(pci_dev->device.devargs, &conf)) { > + ERROR("failed to process device arguments"); > + goto error; > + } It would be helpful for debugging if the error message included the devargs so that we can see what is wrong with the input. > + /* Use all ports when none are defined */ > + if (conf.active_ports == 0) { > + for (i = 0; i < MLX4_PMD_MAX_PHYS_PORTS; i++) > + conf.active_ports |= 1 << i; > + } Rather than use a loop to populate all active fields would a #define with an all ports mask be better suited to this. Or alternatively just change the IF statement below to use the following and avoid the need for this loop altogether: if (conf.active_ports & !(conf.active_ports & (1 << i))) continue;
Re: [dpdk-dev] [PATCH 1/1] net/mlx4: add port parameter
On Fri, 10 Mar 2017 16:24:32 + "Legacy, Allain" wrote: > The robustness of the strtoul() could be improved with something like the > following to catch non-integer characters following the port number. > > char *end = NULL; > tmp = strtoull(val, &end, 0); > if ((val[0] == '\0') || (end == NULL) || (*end != '\0') || (errno != 0)) Extra () not necessary here. Also errno is not set unless the return value is ULLONG_MAX; otherwise it keeps its previous value. Something like: tmp = strtoull(val, &end, 0); if (!*val || *end || (tmp == ULLONG_MAX && errno)) ... If endptr is not NULL, strtoul() stores the address of the first invalid character in *endptr. If there were no digits at all, strtoul() stores the original value of nptr in *endptr (and returns 0). In particular, if *nptr is not '\0' but **endptr is '\0' on return, the entire string is valid. ... RETURN VALUE The strtoul() function returns either the result of the conversion or, if there was a leading minus sign, the negation of the result of the conversion represented as an unsigned value, unless the original (non-negated) value would overflow; in the latter case, strtoul() returns ULONG_MAX and sets errno to ERANGE. Precisely the same holds for strtoull() (with ULLONG_MAX instead of ULONG_MAX).
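The checks discussed in this exchange can be put together in a minimal standalone sketch. `parse_port_index()` is a hypothetical helper written for illustration and is not part of the mlx4 PMD; the `max` parameter stands in for MLX4_PMD_MAX_PHYS_PORTS. It follows the man page text quoted above: the string is valid only if it is non-empty and `*end` is '\0' after the call, and errno is only meaningful when the return value is ULONG_MAX.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/*
 * Hypothetical helper (not mlx4 code): parse a port index below `max`.
 * Returns 0 and stores the value in *out, or a negative errno value.
 */
static int
parse_port_index(const char *val, unsigned long max, unsigned long *out)
{
	char *end = NULL;
	unsigned long tmp;

	if (val == NULL || *val == '\0')
		return -EINVAL;		/* empty input */
	errno = 0;			/* strtoul() never clears errno itself */
	tmp = strtoul(val, &end, 0);
	if (end == val || *end != '\0')
		return -EINVAL;		/* no digits, or trailing characters */
	if (tmp == ULONG_MAX && errno == ERANGE)
		return -ERANGE;		/* value overflowed unsigned long */
	if (tmp >= max)
		return -EINVAL;		/* outside the valid port range */
	*out = tmp;
	return 0;
}
```

With `max` set to 2, inputs such as "1" and "0x1" are accepted, while "1abc", "abc", "" and "7" are rejected. A leading minus sign is accepted by strtoul() itself, but the negated value ends up far above `max` and is rejected by the range check.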
Re: [dpdk-dev] [RFC] New CLI for DPDK
On Fri, 10 Mar 2017 16:22:49 + "Wiles, Keith" wrote: > > On Mar 10, 2017, at 10:06 AM, Stephen Hemminger > > wrote: > > > > On Fri, 10 Mar 2017 15:25:31 + > > "Wiles, Keith" wrote: > > > >> I would like to request for comments on a new CLI design and get any > >> feedback. I have attached the cli.rst text, which is still a work in > >> progress for you review. > >> > >> I have also ported the CLI to a version of Pktgen on the ‘dev’ branch of > >> the repo in DPDK.org. > >> > >> http://dpdk.org/browse/apps/pktgen-dpdk/refs/?h=dev > >> > >> I would like to submit the CLI library to be used in DPDK, if that seems > >> reasonable to everyone. I need more testing of the API and Pktgen, but I > >> feel it has a simpler design, easier to understand and hopefully make it > >> easier for developers to add commands. > >> > >> As an example I quickly converted over testpmd from CMDLINE to CLI (I just > >> add a -I option to select CLI instead) and reduced the test-pmd/cmdline.c > >> file from 12.6K lines to about 4.5K lines. I did not fully test the code, > >> but the ones I did test seem to work. > >> > >> I do not expect DPDK to convert to the new CLI only if it makes sense and > >> I am not suggesting to replace CMDLINE library. > >> > >> If you play with the new CLI in pktgen and see any problems or want to > >> suggest new features or changes please let me know. > >> > >> Comments on the cli.rst text is also welcome, but the cli.rst is not > >> complete. I think this file needs to be broken into two one to explain the > >> example and another to explain CLI internals. > >> > > > > It would be great if all DPDK examples used a similar architecture. And > > having a common > > infrastructure would help. > > > > But not sure it needs to be special. Why should this be DPDK specific? > > What you are building really ends up being an application > > framework at some point. Surely, there are lots of others already in open > > source. 
> > > > Heck even VPP has its own CLI inside. > > I have been looking for one for years and never found one that met my needs > of easy to use and easy to understand. If you can find a better one then > please let me know. > > This code does use some DPDK APIs but they can be removed to make it > standalone and the first version I did was standalone. Some of the ones I > found were similar to cmdline and some took it a step farther by trying to do > everything one would ever need in a CLI. Those are way too big and difficult > to use, then you have the ones that are barely a step above readline or just > writing you own. The cmdline interface falls closer to the trying to do > everything for you, by converting strings into values with structures/macros > difficult to understand at a glance. IMHO this one is simple and easy to > understand. > > But in truth the cmdline interface in DPDK is difficult to use and to write > code for, takes way to many lines of code to make a simple command. The > current Cmdline is also not dynamic, which makes it difficult to add > features on the fly. > > All of the commands are at the same level and using a directory structure > allows the developer to use what directory path he takes to denote a context > for the command. As the example of converting test-pmd to use CLI the number > of lines dropped from 12.6K to 4.5K lines. The cmdline code is also not > consider to be production quality (from the docs) and I would like to fix > that problem for DPDK. > If you look beyond the simple C model there are a lot. TCL http://wanderinghorse.net/computing/shellish/eshell.html Though these may not match what is needed. Agree that the current cmdline() method is not suitable.
Re: [dpdk-dev] [PATCH v9 12/18] examples/distributor: allow for extra stats
On Mon, Mar 06, 2017 at 09:10:27AM +, David Hunt wrote: > This will allow us to see what's going on at various stages > throughout the sample app, with per-second visibility > > Signed-off-by: David Hunt > --- > examples/distributor/main.c | 139 > +++- > 1 file changed, 123 insertions(+), 16 deletions(-) > > diff --git a/examples/distributor/main.c b/examples/distributor/main.c > index cc3bdb0..3657e5d 100644 > --- a/examples/distributor/main.c > +++ b/examples/distributor/main.c > @@ -54,24 +54,53 @@ > > #define RTE_LOGTYPE_DISTRAPP RTE_LOGTYPE_USER1 > > +#define ANSI_COLOR_RED "\x1b[31m" > +#define ANSI_COLOR_RESET "\x1b[0m" > + > /* mask of enabled ports */ > static uint32_t enabled_port_mask; > volatile uint8_t quit_signal; > volatile uint8_t quit_signal_rx; > +volatile uint8_t quit_signal_dist; > > static volatile struct app_stats { > struct { > uint64_t rx_pkts; > uint64_t returned_pkts; > uint64_t enqueued_pkts; > + uint64_t enqdrop_pkts; > } rx __rte_cache_aligned; > + int pad1 __rte_cache_aligned; > + > + struct { > + uint64_t in_pkts; > + uint64_t ret_pkts; > + uint64_t sent_pkts; > + uint64_t enqdrop_pkts; > + } dist __rte_cache_aligned; > + int pad2 __rte_cache_aligned; > > struct { > uint64_t dequeue_pkts; > uint64_t tx_pkts; > + uint64_t enqdrop_pkts; > } tx __rte_cache_aligned; > + int pad3 __rte_cache_aligned; > + > + uint64_t worker_pkts[64] __rte_cache_aligned; > + > + int pad4 __rte_cache_aligned; > + > + uint64_t worker_bursts[64][8] __rte_cache_aligned; > + > + int pad5 __rte_cache_aligned; > + > + uint64_t port_rx_pkts[64] __rte_cache_aligned; > + uint64_t port_tx_pkts[64] __rte_cache_aligned; > } app_stats; > > +struct app_stats prev_app_stats; > + > static const struct rte_eth_conf port_conf_default = { > .rxmode = { > .mq_mode = ETH_MQ_RX_RSS, > @@ -93,6 +122,8 @@ struct output_buffer { > struct rte_mbuf *mbufs[BURST_SIZE]; > }; > > +static void print_stats(void); > + > /* > * Initialises a given port using global settings and with the rx 
buffers > * coming from the mbuf_pool passed as parameter > @@ -378,25 +409,91 @@ static void > print_stats(void) > { > struct rte_eth_stats eth_stats; > - unsigned i; > - > - printf("\nRX thread stats:\n"); > - printf(" - Received:%"PRIu64"\n", app_stats.rx.rx_pkts); > - printf(" - Processed: %"PRIu64"\n", app_stats.rx.returned_pkts); > - printf(" - Enqueued:%"PRIu64"\n", app_stats.rx.enqueued_pkts); > - > - printf("\nTX thread stats:\n"); > - printf(" - Dequeued:%"PRIu64"\n", app_stats.tx.dequeue_pkts); > - printf(" - Transmitted: %"PRIu64"\n", app_stats.tx.tx_pkts); > + unsigned int i, j; > + const unsigned int num_workers = rte_lcore_count() - 4; > > for (i = 0; i < rte_eth_dev_count(); i++) { > rte_eth_stats_get(i, &eth_stats); > - printf("\nPort %u stats:\n", i); > - printf(" - Pkts in: %"PRIu64"\n", eth_stats.ipackets); > - printf(" - Pkts out: %"PRIu64"\n", eth_stats.opackets); > - printf(" - In Errs: %"PRIu64"\n", eth_stats.ierrors); > - printf(" - Out Errs: %"PRIu64"\n", eth_stats.oerrors); > - printf(" - Mbuf Errs: %"PRIu64"\n", eth_stats.rx_nombuf); > + app_stats.port_rx_pkts[i] = eth_stats.ipackets; > + app_stats.port_tx_pkts[i] = eth_stats.opackets; > + } > + > + printf("\n\nRX Thread:\n"); > + for (i = 0; i < rte_eth_dev_count(); i++) { > + printf("Port %u Pktsin : %5.2f\n", i, > + (app_stats.port_rx_pkts[i] - > + prev_app_stats.port_rx_pkts[i])/100.0); > + prev_app_stats.port_rx_pkts[i] = app_stats.port_rx_pkts[i]; > + } > + printf(" - Received:%5.2f\n", > + (app_stats.rx.rx_pkts - > + prev_app_stats.rx.rx_pkts)/100.0); > + printf(" - Returned:%5.2f\n", > + (app_stats.rx.returned_pkts - > + prev_app_stats.rx.returned_pkts)/100.0); > + printf(" - Enqueued:%5.2f\n", > + (app_stats.rx.enqueued_pkts - > + prev_app_stats.rx.enqueued_pkts)/100.0); > + printf(" - Dropped: %s%5.2f%s\n", ANSI_COLOR_RED, > + (app_stats.rx.enqdrop_pkts - > + prev_app_stats.rx.enqdrop_pkts)/100.0, > + ANSI_COLOR_RESET); > + > + printf("Distributor thread:\n"); > + printf(" - In: 
%5.2f\n", > + (app_stats.dist.in_pkts - > + prev_app_stats.dist.in_pkts)/100.0); > + printf(" - Returned:
Re: [dpdk-dev] [PATCH v9 13/18] sample: distributor: wait for ports to come up
On Mon, Mar 06, 2017 at 09:10:28AM +, David Hunt wrote: > On some machines, ports take several seconds to come up. This > patch causes the app to wait. > > Signed-off-by: David Hunt Title prefix should match other patches. /Bruce
Re: [dpdk-dev] [PATCH v9 14/18] examples/distributor: give distributor a core
On Mon, Mar 06, 2017 at 09:10:29AM +, David Hunt wrote: > Signed-off-by: David Hunt > --- Title could do with some rewording - e.g. "make distributor API calls on dedicated core" This also requires an explanation as to why the change is being made. Does it not also need an update to the sample app guide about how the app works? /Bruce
Re: [dpdk-dev] [PATCH v9 15/18] examples/distributor: limit number of Tx rings
On Mon, Mar 06, 2017 at 09:10:30AM +, David Hunt wrote: > Signed-off-by: David Hunt > --- Please explain reason for change. /Bruce
Re: [dpdk-dev] [PATCH] app/testpmd: add default MAC set cmd
> -Original Message- > From: Pascal Mazon [mailto:pascal.ma...@6wind.com] > Sent: Friday, March 3, 2017 3:20 AM > To: Wu, Jingjing > Cc: dev@dpdk.org; Pascal Mazon > Subject: [PATCH] app/testpmd: add default MAC set cmd > > Signed-off-by: Pascal Mazon Acked-by: Jingjing Wu Just don't forget the doc update. Thanks.
Re: [dpdk-dev] [PATCH v9 16/18] examples/distributor: give Rx thread a core
On Mon, Mar 06, 2017 at 09:10:31AM +, David Hunt wrote: > This so that with the increased amount of stats we are counting, > we don't interfere with the rx core. > Where are the stats being counted in the current code and how would they interfere? /Bruce
Re: [dpdk-dev] [PATCH v9 00/18] distributor lib performance enhancements
On Mon, Mar 06, 2017 at 09:10:15AM +, David Hunt wrote: > This patch aims to improve the throughput of the distributor library. > > It uses a similar handshake mechanism to the previous version of > the library, in that bits are used to indicate when packets are ready > to be sent to a worker and ready to be returned from a worker. One main > difference is that instead of sending one packet in a cache line, it makes > use of the 7 free spaces in the same cache line in order to send up to > 8 packets at a time to/from a worker. > > The flow matching algorithm has had significant re-work, and now keeps an > array of inflight flows and an array of backlog flows, and matches incoming > flows to the inflight/backlog flows of all workers so that flow pinning to > workers can be maintained. > > The Flow Match algorithm has both scalar and vector versions, and a > function pointer is used to select the most appropriate function at run time, > depending on the presence of the SSE2 cpu flag. On non-x86 platforms, the > scalar match function is selected, which should still give a good boost > in performance over the non-burst API. > > v9 changes: > * fixed symbol versioning so it will compile on CentOS and RedHat > I've flagged a number of things that could do with being cleaned up in the patchset. However, the idea itself of adding a new burst-mode to improve distributor performance - and using vector matching to further boost it - is a good improvement. Therefore Series-Acked-by: Bruce Richardson
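To make the scalar flow matching described in the cover letter a little more concrete, here is a rough sketch. It is an illustration written for this thread, not the code from the patchset: `struct worker_flows`, `find_worker_for_flow()`, the 32-bit tag type and the use of 0 as an "empty slot" marker are all assumptions. Only the burst size of 8 and the inflight/backlog arrays come from the description above.

```c
#include <stdint.h>

#define BURST 8	/* up to 8 packets per cache line, per the cover letter */

/* Assumed per-worker bookkeeping; a tag of 0 marks an empty slot. */
struct worker_flows {
	uint32_t inflight[BURST];	/* flows the worker is processing now */
	uint32_t backlog[BURST];	/* flows queued behind them */
};

/*
 * Scalar match: return the index of the worker that already owns `tag`,
 * so the new packet can be pinned to it, or -1 if the flow is new.
 */
static int
find_worker_for_flow(const struct worker_flows *w, unsigned int n_workers,
		     uint32_t tag)
{
	unsigned int i, j;

	if (tag == 0)
		return -1;	/* 0 is reserved for "no flow" in this sketch */
	for (i = 0; i < n_workers; i++)
		for (j = 0; j < BURST; j++)
			if (w[i].inflight[j] == tag || w[i].backlog[j] == tag)
				return (int)i;
	return -1;
}
```

A vector version would compare several tags per instruction, and a function pointer chosen once at start-up could select between the two variants, which is the run-time selection the cover letter describes.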
Re: [dpdk-dev] [PATCH v6 2/2] app/testpmd: fix port stop
> > To change minor, we can stop port, then configure VMDQ and then start port. > > > > You make port started in VMDQ config, the Symmetry of stop/start command > is broken and it is not easy to maintain. > > Should we close this patch in patchwork? Yes, I think so. Thanks Jingjing
Re: [dpdk-dev] [PATCH v6 2/2] app/testpmd: fix port stop
> -Original Message- > From: Wu, Jingjing > Sent: Friday, March 10, 2017 4:56 PM > To: Thomas Monjalon ; Iremonger, Bernard > > Cc: dev@dpdk.org > Subject: RE: [dpdk-dev] [PATCH v6 2/2] app/testpmd: fix port stop > > > > To change minor, we can stop port, then configure VMDQ and then start > port. > > > > > > You make port started in VMDQ config, the Symmetry of stop/start > command > > is broken and it is not easy to maintain. > > > > Should we close this patch in patchwork? > > Yes, I think so. > > Thanks > Jingjing Agreed. Regards, Bernard.
Re: [dpdk-dev] [RFC] New CLI for DPDK
> On Mar 10, 2017, at 10:41 AM, Stephen Hemminger > wrote: > > On Fri, 10 Mar 2017 16:22:49 + > "Wiles, Keith" wrote: > >>> On Mar 10, 2017, at 10:06 AM, Stephen Hemminger >>> wrote: >>> >>> On Fri, 10 Mar 2017 15:25:31 + >>> "Wiles, Keith" wrote: >>> I would like to request for comments on a new CLI design and get any feedback. I have attached the cli.rst text, which is still a work in progress for you review. I have also ported the CLI to a version of Pktgen on the ‘dev’ branch of the repo in DPDK.org. http://dpdk.org/browse/apps/pktgen-dpdk/refs/?h=dev I would like to submit the CLI library to be used in DPDK, if that seems reasonable to everyone. I need more testing of the API and Pktgen, but I feel it has a simpler design, easier to understand and hopefully make it easier for developers to add commands. As an example I quickly converted over testpmd from CMDLINE to CLI (I just add a -I option to select CLI instead) and reduced the test-pmd/cmdline.c file from 12.6K lines to about 4.5K lines. I did not fully test the code, but the ones I did test seem to work. I do not expect DPDK to convert to the new CLI only if it makes sense and I am not suggesting to replace CMDLINE library. If you play with the new CLI in pktgen and see any problems or want to suggest new features or changes please let me know. Comments on the cli.rst text is also welcome, but the cli.rst is not complete. I think this file needs to be broken into two one to explain the example and another to explain CLI internals. >>> >>> It would be great if all DPDK examples used a similar architecture. And >>> having a common >>> infrastructure would help. >>> >>> But not sure it needs to be special. Why should this be DPDK specific? >>> What you are building really ends up being an application >>> framework at some point. Surely, there are lots of others already in open >>> source. >>> >>> Heck even VPP has its own CLI inside. 
>> >> I have been looking for one for years and never found one that met my needs >> of easy to use and easy to understand. If you can find a better one then >> please let me know. >> >> This code does use some DPDK APIs but they can be removed to make it >> standalone and the first version I did was standalone. Some of the ones I >> found were similar to cmdline and some took it a step farther by trying to >> do everything one would ever need in a CLI. Those are way too big and >> difficult to use, then you have the ones that are barely a step above >> readline or just writing you own. The cmdline interface falls closer to the >> trying to do everything for you, by converting strings into values with >> structures/macros difficult to understand at a glance. IMHO this one is >> simple and easy to understand. >> >> But in truth the cmdline interface in DPDK is difficult to use and to write >> code for, takes way to many lines of code to make a simple command. The >> current Cmdline is also not dynamic, which makes it difficult to add >> features on the fly. >> >> All of the commands are at the same level and using a directory structure >> allows the developer to use what directory path he takes to denote a context >> for the command. As the example of converting test-pmd to use CLI the number >> of lines dropped from 12.6K to 4.5K lines. The cmdline code is also not >> consider to be production quality (from the docs) and I would like to fix >> that problem for DPDK. >> > > If you look beyond the simple C model there are a lot. > TCL > http://wanderinghorse.net/computing/shellish/eshell.html I have looked at this one, I believe. TCL is a pretty big code base, and I am not a fan of TCL in the first place :-) VxWorks used Tcl, which is a good language for quickly writing code, but it is not very friendly on a small-memory machine in the case of embedded devices. This one is written in C++, and again I am not a fan of C++ for this type of product.
Also not a big C++ coder could be the other reason :-) > > Though these may not match what is needed. Agree that the current cmdline() > method is not suitable. Regards, Keith
Re: [dpdk-dev] [PATCH 1/1] net/mlx4: add port parameter
Hey, thanks for reading.

On Fri, Mar 10, 2017 at 04:24:32PM +, Legacy, Allain wrote:
>> -Original Message-
>> From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Gaetan Rivet
>> Sent: Friday, March 03, 2017 10:40 AM
> ...
>> +	errno = 0;
>> +	tmp = strtoul(val, NULL, 0);
>
> The robustness of the strtoul() could be improved with something like the
> following to catch non-integer characters following the port number.
>
> char *end = NULL;
> tmp = strtoull(val, &end, 0);
> if ((val[0] == '\0') || (end == NULL) || (*end != '\0') || (errno != 0))

Thanks for the suggestion, I'd keep the strtoul though ;). I will look into it for the v2, keeping in mind Stephen's remarks as well.

>> +	if (errno) {
>> +		WARN("%s: \"%s\" is not a valid integer", key, val);
>> +		return -errno;
>> +	}
>> +	if (strcmp(MLX4_PMD_PORT_KVARG, key) == 0) {
>> +		if (tmp >= MLX4_PMD_MAX_PHYS_PORTS) {
>> +			ERROR("invalid port index %lu (max: %u)",
>> +			      tmp, MLX4_PMD_MAX_PHYS_PORTS - 1);
>> +			return -EINVAL;
>> +		}
>> +		conf->active_ports |= 1 << tmp;
>> +	} else {
>> +		WARN("%s: unknown parameter", key);
>> +		return -EINVAL;
>> +	}
>> +	return 0;
>> +}
>
> The usage of strtoul() should be moved to be within the
> strcmp(MLX4_PMD_PORT_KVARG, key) IF statement. That way the "val" would only
> be parsed if "key" is "port" and it is expected that "val" is an integer.

This function was aimed at being generic. If we consider that no other parameter would ever be added, then the strcmp should be scrapped altogether, as this callback is only called upon parsing this parameter in the kvlist in the first place.

But we are in the control path; avoiding a useless strtoul at the price of making the function less useful seems an unnecessary tradeoff to me.

>> +	if (mlx4_args(pci_dev->device.devargs, &conf)) {
>> +		ERROR("failed to process device arguments");
>> +		goto error;
>> +	}
>
> It would be helpful for debugging if the error message included the devargs
> so that we can see what is wrong with the input.

Agreed.

>> +	/* Use all ports when none are defined */
>> +	if (conf.active_ports == 0) {
>> +		for (i = 0; i < MLX4_PMD_MAX_PHYS_PORTS; i++)
>> +			conf.active_ports |= 1 << i;
>> +	}
>
> Rather than use a loop to populate all active fields would a #define with an
> all ports mask be better suited to this. Or alternatively just change the IF
> statement below to use the following and avoid the need for this loop
> altogether:
>
> if (conf.active_ports & !(conf.active_ports & (1 << i)))
> 	continue;

I do not agree with removing this loop. Your second solution will scatter the relevant bits concerning the default value of the active_port configuration option. While being slightly slicker it hides it unnecessarily from the reader.

The first solution might be interesting, however it makes this option dependent on two defines instead of one. If one had to change the default MAX_PHYS_PORT value for mlx4 (however unlikely it might be), then they would have to change the valid ALL_PORTS mask as well. In principle this contradicts DRY[1].

[1]: https://en.wikipedia.org/wiki/Don't_repeat_yourself

--
Gaëtan Rivet
6WIND
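One way to reconcile the two positions in this exchange is to derive the all-ports mask from the port count, so only one value ever needs changing. The sketch below is an illustration under assumptions, not code from the mlx4 PMD: `MAX_PHYS_PORTS` stands in for MLX4_PMD_MAX_PHYS_PORTS (its value of 2 is assumed), and `ALL_PORTS_MASK`, `default_ports_loop()`, `effective_ports()` and `port_is_skipped()` are hypothetical names. `port_is_skipped()` shows one reading of the inline test suggested in the review, written with a logical AND.

```c
#include <stdint.h>

/* Stand-in for MLX4_PMD_MAX_PHYS_PORTS; the value is an assumption. */
#define MAX_PHYS_PORTS 2

/* Derived from the single define above; valid for fewer than 32 ports. */
#define ALL_PORTS_MASK ((UINT32_C(1) << MAX_PHYS_PORTS) - 1)

/* The loop from the patch, kept here for comparison with the mask. */
static uint32_t
default_ports_loop(void)
{
	uint32_t mask = 0;
	unsigned int i;

	for (i = 0; i < MAX_PHYS_PORTS; i++)
		mask |= UINT32_C(1) << i;
	return mask;
}

/* Use all ports when none were requested. */
static uint32_t
effective_ports(uint32_t requested)
{
	return requested != 0 ? requested : ALL_PORTS_MASK;
}

/* Skip port i only if some ports were requested and i is not among them. */
static int
port_is_skipped(uint32_t active, unsigned int i)
{
	return active != 0 && !(active & (UINT32_C(1) << i));
}
```

Because the mask is computed from `MAX_PHYS_PORTS`, changing the port count updates both the loop result and the mask, so the DRY concern raised above does not apply to this variant.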
Re: [dpdk-dev] [PATCH 1/1] net/mlx4: add port parameter
Slight additional remark below.

On Fri, Mar 10, 2017 at 06:11:59PM +0100, Gaëtan Rivet wrote:
> Hey, thanks for reading.
>
> On Fri, Mar 10, 2017 at 04:24:32PM +, Legacy, Allain wrote:
>>> -Original Message-
>>> From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Gaetan Rivet
>>> Sent: Friday, March 03, 2017 10:40 AM
>> ...
>>> +	errno = 0;
>>> +	tmp = strtoul(val, NULL, 0);
>>
>> The robustness of the strtoul() could be improved with something like the
>> following to catch non-integer characters following the port number.
>>
>> char *end = NULL;
>> tmp = strtoull(val, &end, 0);
>> if ((val[0] == '\0') || (end == NULL) || (*end != '\0') || (errno != 0))
>
> Thanks for the suggestion, I'd keep the strtoul though ;). I will look into
> it for the v2, keeping in mind Stephen's remarks as well.
>
>>> +	if (errno) {
>>> +		WARN("%s: \"%s\" is not a valid integer", key, val);
>>> +		return -errno;
>>> +	}
>>> +	if (strcmp(MLX4_PMD_PORT_KVARG, key) == 0) {
>>> +		if (tmp >= MLX4_PMD_MAX_PHYS_PORTS) {
>>> +			ERROR("invalid port index %lu (max: %u)",
>>> +			      tmp, MLX4_PMD_MAX_PHYS_PORTS - 1);
>>> +			return -EINVAL;
>>> +		}
>>> +		conf->active_ports |= 1 << tmp;
>>> +	} else {
>>> +		WARN("%s: unknown parameter", key);
>>> +		return -EINVAL;
>>> +	}
>>> +	return 0;
>>> +}
>>
>> The usage of strtoul() should be moved to be within the
>> strcmp(MLX4_PMD_PORT_KVARG, key) IF statement. That way the "val" would only
>> be parsed if "key" is "port" and it is expected that "val" is an integer.
>
> This function was aimed at being generic. If we consider that no other
> parameter would ever be added, then the strcmp should be scrapped altogether,
> as this callback is only called upon parsing this parameter in the kvlist in
> the first place.
>
> But we are in the control path; avoiding a useless strtoul at the price of
> making the function less useful seems an unnecessary tradeoff to me.
>
>>> +	if (mlx4_args(pci_dev->device.devargs, &conf)) {
>>> +		ERROR("failed to process device arguments");
>>> +		goto error;
>>> +	}
>>
>> It would be helpful for debugging if the error message included the devargs
>> so that we can see what is wrong with the input.
>
> Agreed.

Actually, on second thought, here the devargs that was problematic has already been shown with a warning while it was being parsed.

>>> +	/* Use all ports when none are defined */
>>> +	if (conf.active_ports == 0) {
>>> +		for (i = 0; i < MLX4_PMD_MAX_PHYS_PORTS; i++)
>>> +			conf.active_ports |= 1 << i;
>>> +	}
>>
>> Rather than use a loop to populate all active fields would a #define with an
>> all ports mask be better suited to this. Or alternatively just change the IF
>> statement below to use the following and avoid the need for this loop
>> altogether:
>>
>> if (conf.active_ports & !(conf.active_ports & (1 << i)))
>> 	continue;
>
> I do not agree with removing this loop. Your second solution will scatter the
> relevant bits concerning the default value of the active_port configuration
> option. While being slightly slicker it hides it unnecessarily from the
> reader.
>
> The first solution might be interesting, however it makes this option
> dependent on two defines instead of one. If one had to change the default
> MAX_PHYS_PORT value for mlx4 (however unlikely it might be), then they would
> have to change the valid ALL_PORTS mask as well. In principle this
> contradicts DRY[1].
>
> [1]: https://en.wikipedia.org/wiki/Don't_repeat_yourself
>
> --
> Gaëtan Rivet
> 6WIND

--
Gaëtan Rivet
6WIND
Re: [dpdk-dev] [PATCH v6 00/26] linux/eal: Remove most causes of panic on init
"Richardson, Bruce" writes: >> -Original Message- >> From: Thomas Monjalon [mailto:thomas.monja...@6wind.com] >> Sent: Thursday, March 9, 2017 9:26 AM >> To: Richardson, Bruce >> Cc: Aaron Conole ; dev@dpdk.org; Stephen Hemminger >> >> Subject: Re: [dpdk-dev] [PATCH v6 00/26] linux/eal: Remove most causes of >> panic on init >> >> 2017-03-09 09:11, Bruce Richardson: >> > On Wed, Mar 08, 2017 at 10:58:27PM +0100, Thomas Monjalon wrote: >> > > Hi, >> > > >> > > Thanks for the work. >> > > I think it needs to be completed to have the same behaviour on bsdapp. >> > >> > Ideally, yes, but I also don't think the lack of BSD changes should >> > block the inclusion of this set. In terms of application writers, the >> > apps don't need to be written differently for BSD compared to Linux >> > because of this change. All that is different is that the BSD version >> > will panic rather than return the error code. >> >> So you do not have any issue about having a different behaviour on Linux >> and BSD? >> You are the bsdapp maintainer, so it is your call. > > I would infinitely prefer to have the same behavior. However, so long > as this does not require a user to change their app to be different on > BSD, I don't think lack of BSD support should block improving Linux. > > Aaron - will you be able to do equivalent changes to BSD within the 17.05 > timeframe? I'm going to spend some time next week looking into it. If it looks like it's akin to s/linuxapp/bsdapp/ and everything works, then I'll do it. If I get hung up on anything, I'll probably update with my status. -Aaron
Re: [dpdk-dev] [PATCH v3 2/2] ethdev: add hierarchical scheduler API
> -Original Message- > From: O'Driscoll, Tim > Sent: Wednesday, March 8, 2017 9:52 AM > To: Dumitrescu, Cristian ; Thomas Monjalon > > Cc: dev@dpdk.org; jerin.ja...@caviumnetworks.com; > balasubramanian.manoha...@cavium.com; hemant.agra...@nxp.com; > shreyansh.j...@nxp.com; Wiles, Keith ; Richardson, > Bruce > Subject: RE: [PATCH v3 2/2] ethdev: add hierarchical scheduler API > > > From: Dumitrescu, Cristian > > ... > > > OK I better understand now. > > > You should add this level of explanation in your patch. > > > > > > However I am reluctant to add an API if there is no user. > > > I think we should wait to have at least one existing driver > > implementing > > > this API before integrating it. > > > It was the approach of eventdev which has a dedicated next-tree. > > > > The next-tree solution could work, but IMO is not the best for this > > case, as this is purely driver development. This is just a TX offload > > feature that is well understood, as opposed to a new library with a huge > > design effort required like eventdev. > > > > I think we are reasonably close to get agreement on the API from Cavium, > > Intel and NXP. When this is done, how about including it in DPDK with > > the experimental tag attached to it until several drivers implement it? > > > > From Intel side, there are solid plans to implement it for ixgbe and > > i40e drivers in next DPDK releases, I am CC-ing Tim to confirm this. > > That's correct. We plan to add support for this in the ixgbe and i40e drivers > in > 17.08. Thomas, given Tim's confirmation of Intel's plans to implement this API for the ixgbe and i40e drivers in DPDK release 17.08, are you in favour of including this API in 17.05 with the experimental tag (subject to full API agreement being reached)? IMO this approach has the advantage of showing that API agreement has been reached and driver development is in progress.
Having it in DPDK is also a better way to advertise this API to the developers that would otherwise be unaware about this effort. > > > On > > Cavium and NXP side, Jerin and Hemant can comment on the plans to > > implement this API.
[dpdk-dev] [PATCH v4 01/17] eventdev: increase size of enq deq conf variables
Large port enqueue sizes were not supported as the value it was stored in was a uint8_t. Using uint8_ts to save space in config apis makes no sense - increasing the 3 instances of uint8_t enqueue / dequeue depths to more appropriate values (based on the context around them). Signed-off-by: Harry van Haaren Acked-by: Jerin Jacob --- lib/librte_eventdev/rte_eventdev.h | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/lib/librte_eventdev/rte_eventdev.h b/lib/librte_eventdev/rte_eventdev.h index e2fdf72..4876353 100644 --- a/lib/librte_eventdev/rte_eventdev.h +++ b/lib/librte_eventdev/rte_eventdev.h @@ -426,7 +426,7 @@ struct rte_event_dev_config { * This value cannot exceed the *max_event_queue_flows* which previously * provided in rte_event_dev_info_get() */ - uint8_t nb_event_port_dequeue_depth; + uint32_t nb_event_port_dequeue_depth; /**< Maximum number of events can be dequeued at a time from an * event port by this device. * This value cannot exceed the *max_event_port_dequeue_depth* @@ -637,12 +637,12 @@ struct rte_event_port_conf { * which was previously supplied to rte_event_dev_configure(). * This should be set to '-1' for *open system*. */ - uint8_t dequeue_depth; + uint16_t dequeue_depth; /**< Configure number of bulk dequeues for this event port. * This value cannot exceed the *nb_event_port_dequeue_depth* * which previously supplied to rte_event_dev_configure() */ - uint8_t enqueue_depth; + uint16_t enqueue_depth; /**< Configure number of bulk enqueues for this event port. * This value cannot exceed the *nb_event_port_enqueue_depth* * which previously supplied to rte_event_dev_configure() -- 2.7.4
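As an illustration of the problem the commit message describes, here is a minimal standalone sketch of what happens when a depth larger than 255 is stored in an 8-bit field. The struct and function names are hypothetical stand-ins, not the definitions in rte_eventdev.h; only the field widths follow the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the old (8-bit) and new (16-bit) field widths. */
struct old_port_conf { uint8_t  enqueue_depth; };
struct new_port_conf { uint16_t enqueue_depth; };

/* Store a requested depth in the old 8-bit field and read it back. */
static inline unsigned int
old_store_depth(unsigned int requested)
{
	struct old_port_conf c;

	/* values above 255 are silently reduced modulo 256 */
	c.enqueue_depth = (uint8_t)requested;
	return c.enqueue_depth;
}

/* Store a requested depth in the widened 16-bit field and read it back. */
static inline unsigned int
new_store_depth(unsigned int requested)
{
	struct new_port_conf c;

	assert(requested <= UINT16_MAX);
	c.enqueue_depth = (uint16_t)requested;
	return c.enqueue_depth;
}
```

With these helpers, a requested depth of 300 reads back as 44 from the 8-bit field and as 300 from the 16-bit one.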
[dpdk-dev] [PATCH v4 00/17] next-eventdev: event/sw software eventdev
The following patchset adds a software eventdev implementation to the next-eventdev tree. Note that certain tests in this patchset require the queue overriding patch to pass, but it is not a build-time dependency. The main change is the reworked xstats_reset() API, which now includes a "mode" to select the Device, Port or Queue to reset.

This implementation is based on the previous software eventdev v3 patchset, now with comments addressed:
1) use NULL in test link() functions (Jerin)
2) add CONFIG_RTE_LIBRTE_PMD_SW_EVENTDEV_DEBUG=n to config/common_base (Jerin)
3) xstats_names_get() return values improved
4) Fix event/sw bug in re-configuration of ports
5) Rework QUEUE_CFG_DEFAULT overriding of queue type at eventdev layer
6) Improve port unlinking to use optimized pull_port() if possible
7) Fix leak of credits on re-configuration of the device after usage
8) Improved error handling of unsupported queue types
9) Allow more flexible re-configuration of ports (lb and single-link changes)
10) Handle reconfiguring of ordered queues multiple times without errors

Git log is clean, while checkpatch issues:
2 Errors on Complex Macro (which I believe cannot be fixed)
4 Warnings in the scheduler logic, which reduce readability when fixed.
1 Else after return warning, which cannot be easily removed

This patchset contains the work of multiple developers, please see signoffs on each patch.
Signed-off-by: Harry van Haaren

Bruce Richardson (14):
  eventdev: add APIs for extended stats
  event/sw: add new software-only eventdev driver
  event/sw: add device capabilities function
  event/sw: add configure function
  event/sw: add fns to return default port/queue config
  event/sw: add support for event queues
  event/sw: add support for event ports
  event/sw: add support for linking queues to ports
  event/sw: add worker core functions
  event/sw: add scheduling logic
  event/sw: add start stop and close functions
  event/sw: add dump function for easier debugging
  event/sw: add xstats support
  app/test: add unit tests for SW eventdev driver

Harry van Haaren (3):
  eventdev: increase size of enq deq conf variables
  app/test: eventdev link all queues before start
  test/eventdev: rework timeout ticks test

 app/test/Makefile                             |    5 +-
 app/test/autotest_data.py                     |   26 +
 app/test/test_eventdev.c                      |   12 +-
 app/test/test_eventdev_sw.c                   | 3179 +
 config/common_base                            |    6 +
 drivers/event/Makefile                        |    1 +
 drivers/event/sw/Makefile                     |   69 +
 drivers/event/sw/event_ring.h                 |  185 ++
 drivers/event/sw/iq_ring.h                    |  176 ++
 drivers/event/sw/rte_pmd_evdev_sw_version.map |    3 +
 drivers/event/sw/sw_evdev.c                   |  818 +++
 drivers/event/sw/sw_evdev.h                   |  318 +++
 drivers/event/sw/sw_evdev_scheduler.c         |  602 +
 drivers/event/sw/sw_evdev_worker.c            |  188 ++
 drivers/event/sw/sw_evdev_xstats.c            |  674 ++
 lib/librte_eventdev/rte_eventdev.c            |   83 +
 lib/librte_eventdev/rte_eventdev.h            |  148 +-
 lib/librte_eventdev/rte_eventdev_pmd.h        |   74 +
 lib/librte_eventdev/rte_eventdev_version.map  |    4 +
 mk/rte.app.mk                                 |    1 +
 20 files changed, 6567 insertions(+), 5 deletions(-)
 create mode 100644 app/test/test_eventdev_sw.c
 create mode 100644 drivers/event/sw/Makefile
 create mode 100644 drivers/event/sw/event_ring.h
 create mode 100644 drivers/event/sw/iq_ring.h
 create mode 100644 drivers/event/sw/rte_pmd_evdev_sw_version.map
 create mode 100644 drivers/event/sw/sw_evdev.c
 create mode 100644 drivers/event/sw/sw_evdev.h
 create mode 100644 drivers/event/sw/sw_evdev_scheduler.c
 create mode 100644 drivers/event/sw/sw_evdev_worker.c
 create mode 100644 drivers/event/sw/sw_evdev_xstats.c
--
2.7.4
[dpdk-dev] [PATCH v4 03/17] test/eventdev: rework timeout ticks test
This commit reworks the failing of the timeout_ticks test to pass when -ENOTSUP is returned, as this is a valid return if the PMD does not support waiting on dequeue(). Signed-off-by: Harry van Haaren --- app/test/test_eventdev.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/app/test/test_eventdev.c b/app/test/test_eventdev.c index 324ef5a..2ab51c3 100644 --- a/app/test/test_eventdev.c +++ b/app/test/test_eventdev.c @@ -519,7 +519,9 @@ test_eventdev_timeout_ticks(void) uint64_t timeout_ticks; ret = rte_event_dequeue_timeout_ticks(TEST_DEV_ID, 100, &timeout_ticks); - TEST_ASSERT_SUCCESS(ret, "Fail to get timeout_ticks"); + /* -ENOTSUP is a valid return if timeout is not supported by device */ + if (ret != -ENOTSUP) + TEST_ASSERT_SUCCESS(ret, "Fail to get timeout_ticks"); return TEST_SUCCESS; } -- 2.7.4
[dpdk-dev] [PATCH v4 02/17] app/test: eventdev link all queues before start
The software eventdev can lock up if not all queues are linked to a port. For this reason, the software eventdev fails to start if queues are not linked to anything. This commit creates dummy links from all queues to port 0 in the eventdev setup function and start/stop test, which would otherwise fail due to unlinked queues. Signed-off-by: Harry van Haaren --- app/test/test_eventdev.c | 8 1 file changed, 8 insertions(+) diff --git a/app/test/test_eventdev.c b/app/test/test_eventdev.c index 042a446..324ef5a 100644 --- a/app/test/test_eventdev.c +++ b/app/test/test_eventdev.c @@ -543,6 +543,10 @@ test_eventdev_start_stop(void) TEST_ASSERT_SUCCESS(ret, "Failed to setup port%d", i); } + ret = rte_event_port_link(TEST_DEV_ID, 0, NULL, NULL, 0); + TEST_ASSERT(ret == rte_event_queue_count(TEST_DEV_ID), + "Failed to link port, device %d", TEST_DEV_ID); + ret = rte_event_dev_start(TEST_DEV_ID); TEST_ASSERT_SUCCESS(ret, "Failed to start device%d", TEST_DEV_ID); @@ -569,6 +573,10 @@ eventdev_setup_device(void) TEST_ASSERT_SUCCESS(ret, "Failed to setup port%d", i); } + ret = rte_event_port_link(TEST_DEV_ID, 0, NULL, NULL, 0); + TEST_ASSERT(ret == rte_event_queue_count(TEST_DEV_ID), + "Failed to link port, device %d", TEST_DEV_ID); + ret = rte_event_dev_start(TEST_DEV_ID); TEST_ASSERT_SUCCESS(ret, "Failed to start device%d", TEST_DEV_ID); -- 2.7.4
[dpdk-dev] [PATCH v4 04/17] eventdev: add APIs for extended stats
From: Bruce Richardson Add in APIs for extended stats so that eventdev implementations can report out information on their internal state. The APIs are based on, but not identical to, the equivalent ethdev functions. Signed-off-by: Bruce Richardson Signed-off-by: Harry van Haaren --- v3 -> v4: - xstats API reset allows selection of device/port/queue to reset --- lib/librte_eventdev/rte_eventdev.c | 83 lib/librte_eventdev/rte_eventdev.h | 142 +++ lib/librte_eventdev/rte_eventdev_pmd.h | 74 ++ lib/librte_eventdev/rte_eventdev_version.map | 4 + 4 files changed, 303 insertions(+) diff --git a/lib/librte_eventdev/rte_eventdev.c b/lib/librte_eventdev/rte_eventdev.c index 68bfc3b..0280ef0 100644 --- a/lib/librte_eventdev/rte_eventdev.c +++ b/lib/librte_eventdev/rte_eventdev.c @@ -920,6 +920,89 @@ rte_event_dev_dump(uint8_t dev_id, FILE *f) } +static int +xstats_get_count(uint8_t dev_id, enum rte_event_dev_xstats_mode mode, + uint8_t queue_port_id) +{ + struct rte_eventdev *dev = &rte_eventdevs[dev_id]; + if (dev->dev_ops->xstats_get_names != NULL) + return (*dev->dev_ops->xstats_get_names)(dev, mode, + queue_port_id, + NULL, NULL, 0); + return 0; +} + +int +rte_event_dev_xstats_names_get(uint8_t dev_id, + enum rte_event_dev_xstats_mode mode, uint8_t queue_port_id, + struct rte_event_dev_xstats_name *xstats_names, + unsigned int *ids, unsigned int size) +{ + RTE_EVENTDEV_VALID_DEVID_OR_ERR_RET(dev_id, -ENODEV); + const int cnt_expected_entries = xstats_get_count(dev_id, mode, + queue_port_id); + if (xstats_names == NULL || cnt_expected_entries < 0 || + (int)size < cnt_expected_entries) + return cnt_expected_entries; + + /* dev_id checked above */ + const struct rte_eventdev *dev = &rte_eventdevs[dev_id]; + + if (dev->dev_ops->xstats_get_names != NULL) + return (*dev->dev_ops->xstats_get_names)(dev, mode, + queue_port_id, xstats_names, ids, size); + + return -ENOTSUP; +} + +/* retrieve eventdev extended statistics */ +int +rte_event_dev_xstats_get(uint8_t dev_id, enum 
rte_event_dev_xstats_mode mode, + uint8_t queue_port_id, const unsigned int ids[], + uint64_t values[], unsigned int n) +{ + RTE_EVENTDEV_VALID_DEVID_OR_ERR_RET(dev_id, -ENODEV); + const struct rte_eventdev *dev = &rte_eventdevs[dev_id]; + + /* implemented by the driver */ + if (dev->dev_ops->xstats_get != NULL) + return (*dev->dev_ops->xstats_get)(dev, mode, queue_port_id, + ids, values, n); + return -ENOTSUP; +} + +uint64_t +rte_event_dev_xstats_by_name_get(uint8_t dev_id, const char *name, + unsigned int *id) +{ + RTE_EVENTDEV_VALID_DEVID_OR_ERR_RET(dev_id, 0); + const struct rte_eventdev *dev = &rte_eventdevs[dev_id]; + unsigned int temp = -1; + + if (id != NULL) + *id = (unsigned int)-1; + else + id = &temp; /* ensure driver never gets a NULL value */ + + /* implemented by driver */ + if (dev->dev_ops->xstats_get_by_name != NULL) + return (*dev->dev_ops->xstats_get_by_name)(dev, name, id); + return -ENOTSUP; +} + +int rte_event_dev_xstats_reset(uint8_t dev_id, + enum rte_event_dev_xstats_mode mode, int16_t queue_port_id, + const uint32_t ids[], uint32_t nb_ids) +{ + RTE_EVENTDEV_VALID_DEVID_OR_ERR_RET(dev_id, -EINVAL); + struct rte_eventdev *dev = &rte_eventdevs[dev_id]; + + if (dev->dev_ops->xstats_reset != NULL) + return (*dev->dev_ops->xstats_reset)(dev, mode, queue_port_id, + ids, nb_ids); + return -ENOTSUP; +} + int rte_event_dev_start(uint8_t dev_id) { diff --git a/lib/librte_eventdev/rte_eventdev.h b/lib/librte_eventdev/rte_eventdev.h index 4876353..d462047 100644 --- a/lib/librte_eventdev/rte_eventdev.h +++ b/lib/librte_eventdev/rte_eventdev.h @@ -1405,6 +1405,148 @@ rte_event_port_links_get(uint8_t dev_id, uint8_t port_id, int rte_event_dev_dump(uint8_t dev_id, FILE *f); +/** Maximum name length for extended statistics counters */ +#define RTE_EVENT_DEV_XSTATS_NAME_SIZE 64 + +/** + * Selects the component of the eventdev to retrieve statistics from. 
+ */ +enum rte_event_dev_xstats_mode { + RTE_EVENT_DEV_XSTATS_DEVICE, + RTE_EVENT_DEV_XSTATS_PORT, + RTE_EVENT_DEV_XSTATS_QUEUE, +}; + +/** + * A name-key lookup element for extended statistics. + * + * This structure is used to map between names and ID numbers + * for extended e
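The names_get wrapper in the patch above follows a "query the count, then fetch" convention: a NULL or too-small buffer returns the number of entries required. A minimal standalone sketch of that calling pattern follows. It models the driver callback with a hypothetical mock (mock_get_names, struct xstat_name and the three stat names are invented for illustration) and does not use the real eventdev types.

```c
#include <stddef.h>
#include <stdio.h>

#define NAME_SIZE 64

struct xstat_name { char name[NAME_SIZE]; };

/* Hypothetical driver callback: reports the count if it cannot fill names. */
static int
mock_get_names(struct xstat_name *names, unsigned int *ids, unsigned int size)
{
	static const char *const all[] = { "rx", "tx", "drop" };
	const unsigned int n = sizeof(all) / sizeof(all[0]);
	unsigned int i;

	if (names == NULL || size < n)
		return (int)n;
	for (i = 0; i < n; i++) {
		snprintf(names[i].name, sizeof(names[i].name), "%s", all[i]);
		if (ids != NULL)
			ids[i] = i;
	}
	return (int)n;
}

/* Same shape as the wrapper in the patch: small buffers get the count back. */
static int
names_get(struct xstat_name *names, unsigned int *ids, unsigned int size)
{
	const int expected = mock_get_names(NULL, NULL, 0);

	if (names == NULL || expected < 0 || (int)size < expected)
		return expected;
	return mock_get_names(names, ids, size);
}
```

A caller would typically call names_get(NULL, NULL, 0) first to learn the required array size, allocate, and then call again with the real buffers.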
[dpdk-dev] [PATCH v4 05/17] event/sw: add new software-only eventdev driver
From: Bruce Richardson This adds the minimal changes to allow a SW eventdev implementation to be compiled, linked and created at run time. The eventdev does nothing, but can be created via vdev on commandline, e.g. sudo ./x86_64-native-linuxapp-gcc/app/test --vdev=event_sw0 ... PMD: Creating eventdev sw device event_sw0, numa_node=0, sched_quanta=128 RTE>> Signed-off-by: Bruce Richardson Signed-off-by: Harry van Haaren --- config/common_base| 6 + drivers/event/Makefile| 1 + drivers/event/sw/Makefile | 66 ++ drivers/event/sw/rte_pmd_evdev_sw_version.map | 3 + drivers/event/sw/sw_evdev.c | 177 ++ drivers/event/sw/sw_evdev.h | 148 + mk/rte.app.mk | 1 + 7 files changed, 402 insertions(+) create mode 100644 drivers/event/sw/Makefile create mode 100644 drivers/event/sw/rte_pmd_evdev_sw_version.map create mode 100644 drivers/event/sw/sw_evdev.c create mode 100644 drivers/event/sw/sw_evdev.h diff --git a/config/common_base b/config/common_base index 2538f4a..fc4eb79 100644 --- a/config/common_base +++ b/config/common_base @@ -458,6 +458,12 @@ CONFIG_RTE_LIBRTE_PMD_SKELETON_EVENTDEV=y CONFIG_RTE_LIBRTE_PMD_SKELETON_EVENTDEV_DEBUG=n # +# Compile PMD for software event device +# +CONFIG_RTE_LIBRTE_PMD_SW_EVENTDEV=y +CONFIG_RTE_LIBRTE_PMD_SW_EVENTDEV_DEBUG=n + +# # Compile librte_ring # CONFIG_RTE_LIBRTE_RING=y diff --git a/drivers/event/Makefile b/drivers/event/Makefile index 678279f..353441c 100644 --- a/drivers/event/Makefile +++ b/drivers/event/Makefile @@ -32,5 +32,6 @@ include $(RTE_SDK)/mk/rte.vars.mk DIRS-$(CONFIG_RTE_LIBRTE_PMD_SKELETON_EVENTDEV) += skeleton +DIRS-$(CONFIG_RTE_LIBRTE_PMD_SW_EVENTDEV) += sw include $(RTE_SDK)/mk/rte.subdir.mk diff --git a/drivers/event/sw/Makefile b/drivers/event/sw/Makefile new file mode 100644 index 000..d6836e3 --- /dev/null +++ b/drivers/event/sw/Makefile @@ -0,0 +1,66 @@ +# BSD LICENSE +# +# Copyright(c) 2016-2017 Intel Corporation. All rights reserved. 
+# +# Redistribution and use in source and binary forms, with or without +# modification, are permitted provided that the following conditions +# are met: +# +# * Redistributions of source code must retain the above copyright +# notice, this list of conditions and the following disclaimer. +# * Redistributions in binary form must reproduce the above copyright +# notice, this list of conditions and the following disclaimer in +# the documentation and/or other materials provided with the +# distribution. +# * Neither the name of Intel Corporation nor the names of its +# contributors may be used to endorse or promote products derived +# from this software without specific prior written permission. +# +# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS +# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT +# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR +# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT +# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, +# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT +# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, +# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY +# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT +# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE +# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +include $(RTE_SDK)/mk/rte.vars.mk + + +# library name +LIB = librte_pmd_sw_event.a + +# build flags +CFLAGS += -O3 +CFLAGS += $(WERROR_FLAGS) +# for older GCC versions, allow us to initialize an event using +# designated initializers. 
+ifeq ($(CONFIG_RTE_TOOLCHAIN_GCC),y) +ifeq ($(shell test $(GCC_VERSION) -le 50 && echo 1), 1) +CFLAGS += -Wno-missing-field-initializers +endif +endif + +# library version +LIBABIVER := 1 + +# versioning export map +EXPORT_MAP := rte_pmd_evdev_sw_version.map + +# library source files +SRCS-$(CONFIG_RTE_LIBRTE_PMD_SW_EVENTDEV) += sw_evdev.c + +# export include files +SYMLINK-y-include += + +# library dependencies +DEPDIRS-$(CONFIG_RTE_LIBRTE_PMD_SW_EVENTDEV) += lib/librte_eal +DEPDIRS-$(CONFIG_RTE_LIBRTE_PMD_SW_EVENTDEV) += lib/librte_eventdev +DEPDIRS-$(CONFIG_RTE_LIBRTE_PMD_SW_EVENTDEV) += lib/librte_kvargs +DEPDIRS-$(CONFIG_RTE_LIBRTE_PMD_SW_EVENTDEV) += lib/librte_ring + +include $(RTE_SDK)/mk/rte.lib.mk diff --git a/drivers/event/sw/rte_pmd_evdev_sw_version.map b/drivers/event/sw/rte_pmd_evdev_sw_version.map new file mode 100644 index 000..5352e7e --- /dev/null +++ b/drivers/event/sw/rte_pmd_evdev_sw_version.map @@ -0,0 +1,3 @@ +DPDK_17.05 { + local: *; +}; diff --git a/drivers/event/sw/s
[dpdk-dev] [PATCH v4 06/17] event/sw: add device capabilities function
From: Bruce Richardson Add in the info_get function to return details on the queues, flow, prioritization capabilities, etc. that this device has. Signed-off-by: Bruce Richardson Signed-off-by: Harry van Haaren --- drivers/event/sw/sw_evdev.c | 23 +++ 1 file changed, 23 insertions(+) diff --git a/drivers/event/sw/sw_evdev.c b/drivers/event/sw/sw_evdev.c index 4de9bc1..9d8517a 100644 --- a/drivers/event/sw/sw_evdev.c +++ b/drivers/event/sw/sw_evdev.c @@ -44,6 +44,28 @@ #define SCHED_QUANTA_ARG "sched_quanta" #define CREDIT_QUANTA_ARG "credit_quanta" +static void +sw_info_get(struct rte_eventdev *dev, struct rte_event_dev_info *info) +{ + RTE_SET_USED(dev); + + static const struct rte_event_dev_info evdev_sw_info = { + .driver_name = SW_PMD_NAME, + .max_event_queues = RTE_EVENT_MAX_QUEUES_PER_DEV, + .max_event_queue_flows = SW_QID_NUM_FIDS, + .max_event_queue_priority_levels = SW_Q_PRIORITY_MAX, + .max_event_priority_levels = SW_IQS_MAX, + .max_event_ports = SW_PORTS_MAX, + .max_event_port_dequeue_depth = MAX_SW_CONS_Q_DEPTH, + .max_event_port_enqueue_depth = MAX_SW_PROD_Q_DEPTH, + .max_num_events = SW_INFLIGHT_EVENTS_TOTAL, + .event_dev_cap = (RTE_EVENT_DEV_CAP_QUEUE_QOS | + RTE_EVENT_DEV_CAP_EVENT_QOS), + }; + + *info = evdev_sw_info; +} + static int assign_numa_node(const char *key __rte_unused, const char *value, void *opaque) { @@ -78,6 +100,7 @@ static int sw_probe(const char *name, const char *params) { static const struct rte_eventdev_ops evdev_sw_ops = { + .dev_infos_get = sw_info_get, }; static const char *const args[] = { -- 2.7.4
[dpdk-dev] [PATCH v4 08/17] event/sw: add fns to return default port/queue config
From: Bruce Richardson Signed-off-by: Bruce Richardson --- drivers/event/sw/sw_evdev.c | 32 1 file changed, 32 insertions(+) diff --git a/drivers/event/sw/sw_evdev.c b/drivers/event/sw/sw_evdev.c index 28a2326..d1fa3a7 100644 --- a/drivers/event/sw/sw_evdev.c +++ b/drivers/event/sw/sw_evdev.c @@ -44,6 +44,35 @@ #define SCHED_QUANTA_ARG "sched_quanta" #define CREDIT_QUANTA_ARG "credit_quanta" +static void +sw_queue_def_conf(struct rte_eventdev *dev, uint8_t queue_id, +struct rte_event_queue_conf *conf) +{ + RTE_SET_USED(dev); + RTE_SET_USED(queue_id); + + static const struct rte_event_queue_conf default_conf = { + .nb_atomic_flows = 4096, + .nb_atomic_order_sequences = 1, + .event_queue_cfg = RTE_EVENT_QUEUE_CFG_ATOMIC_ONLY, + .priority = RTE_EVENT_DEV_PRIORITY_NORMAL, + }; + + *conf = default_conf; +} + +static void +sw_port_def_conf(struct rte_eventdev *dev, uint8_t port_id, +struct rte_event_port_conf *port_conf) +{ + RTE_SET_USED(dev); + RTE_SET_USED(port_id); + + port_conf->new_event_threshold = 1024; + port_conf->dequeue_depth = 16; + port_conf->enqueue_depth = 16; +} + static int sw_dev_configure(const struct rte_eventdev *dev) { @@ -116,6 +145,9 @@ sw_probe(const char *name, const char *params) static const struct rte_eventdev_ops evdev_sw_ops = { .dev_configure = sw_dev_configure, .dev_infos_get = sw_info_get, + + .queue_def_conf = sw_queue_def_conf, + .port_def_conf = sw_port_def_conf, }; static const char *const args[] = { -- 2.7.4
[dpdk-dev] [PATCH v4 07/17] event/sw: add configure function
From: Bruce Richardson Signed-off-by: Bruce Richardson Signed-off-by: Harry van Haaren --- drivers/event/sw/sw_evdev.c | 15 +++ drivers/event/sw/sw_evdev.h | 11 +++ 2 files changed, 26 insertions(+) diff --git a/drivers/event/sw/sw_evdev.c b/drivers/event/sw/sw_evdev.c index 9d8517a..28a2326 100644 --- a/drivers/event/sw/sw_evdev.c +++ b/drivers/event/sw/sw_evdev.c @@ -44,6 +44,20 @@ #define SCHED_QUANTA_ARG "sched_quanta" #define CREDIT_QUANTA_ARG "credit_quanta" +static int +sw_dev_configure(const struct rte_eventdev *dev) +{ + struct sw_evdev *sw = sw_pmd_priv(dev); + const struct rte_eventdev_data *data = dev->data; + const struct rte_event_dev_config *conf = &data->dev_conf; + + sw->qid_count = conf->nb_event_queues; + sw->port_count = conf->nb_event_ports; + sw->nb_events_limit = conf->nb_events_limit; + + return 0; +} + static void sw_info_get(struct rte_eventdev *dev, struct rte_event_dev_info *info) { @@ -100,6 +114,7 @@ static int sw_probe(const char *name, const char *params) { static const struct rte_eventdev_ops evdev_sw_ops = { + .dev_configure = sw_dev_configure, .dev_infos_get = sw_info_get, }; diff --git a/drivers/event/sw/sw_evdev.h b/drivers/event/sw/sw_evdev.h index ab315d4..fda57df 100644 --- a/drivers/event/sw/sw_evdev.h +++ b/drivers/event/sw/sw_evdev.h @@ -35,6 +35,7 @@ #include #include +#include #define SW_DEFAULT_CREDIT_QUANTA 32 #define SW_DEFAULT_SCHED_QUANTA 128 @@ -129,7 +130,17 @@ struct sw_qid { struct sw_evdev { struct rte_eventdev_data *data; + uint32_t port_count; + uint32_t qid_count; + + /* +* max events in this instance. Cached here for performance. +* (also available in data->conf.nb_events_limit) +*/ + uint32_t nb_events_limit; + int32_t sched_quanta; + uint32_t credit_update_quanta; }; -- 2.7.4
[dpdk-dev] [PATCH v4 09/17] event/sw: add support for event queues
From: Bruce Richardson Add in the data structures for the event queues, and the eventdev functions to create and destroy those queues. Signed-off-by: Bruce Richardson Signed-off-by: Harry van Haaren --- v3 -> v4: - return -ENOTSUP for CFG_ALL_TYPES, as it is not supported - fix bug when reordered queue is re-configured for second time --- drivers/event/sw/iq_ring.h | 176 drivers/event/sw/sw_evdev.c | 166 + drivers/event/sw/sw_evdev.h | 5 ++ 3 files changed, 347 insertions(+) create mode 100644 drivers/event/sw/iq_ring.h diff --git a/drivers/event/sw/iq_ring.h b/drivers/event/sw/iq_ring.h new file mode 100644 index 000..d480d15 --- /dev/null +++ b/drivers/event/sw/iq_ring.h @@ -0,0 +1,176 @@ +/*- + * BSD LICENSE + * + * Copyright(c) 2016-2017 Intel Corporation. All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * + * * Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * * Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in + * the documentation and/or other materials provided with the + * distribution. + * * Neither the name of Intel Corporation nor the names of its + * contributors may be used to endorse or promote products derived + * from this software without specific prior written permission. + * + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + * A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT + * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + */ + +/* + * Ring structure definitions used for the internal ring buffers of the + * SW eventdev implementation. These are designed for single-core use only. + */ +#ifndef _IQ_RING_ +#define _IQ_RING_ + +#include + +#include +#include +#include +#include + +#define IQ_RING_NAMESIZE 12 +#define QID_IQ_DEPTH 512 +#define QID_IQ_MASK (uint16_t)(QID_IQ_DEPTH - 1) + +struct iq_ring { + char name[IQ_RING_NAMESIZE] __rte_cache_aligned; + uint16_t write_idx; + uint16_t read_idx; + + struct rte_event ring[QID_IQ_DEPTH]; +}; + +#ifndef force_inline +#define force_inline inline __attribute__((always_inline)) +#endif + +static inline struct iq_ring * +iq_ring_create(const char *name, unsigned int socket_id) +{ + struct iq_ring *retval; + + retval = rte_malloc_socket(NULL, sizeof(*retval), 0, socket_id); + if (retval == NULL) + goto end; + + snprintf(retval->name, sizeof(retval->name), "%s", name); + retval->write_idx = retval->read_idx = 0; +end: + return retval; +} + +static inline void +iq_ring_destroy(struct iq_ring *r) +{ + rte_free(r); +} + +static force_inline uint16_t +iq_ring_count(const struct iq_ring *r) +{ + return r->write_idx - r->read_idx; +} + +static force_inline uint16_t +iq_ring_free_count(const struct iq_ring *r) +{ + return QID_IQ_MASK - iq_ring_count(r); +} + +static force_inline uint16_t +iq_ring_enqueue_burst(struct iq_ring *r, struct rte_event *qes, uint16_t nb_qes) +{ + const uint16_t read = r->read_idx; + uint16_t 
write = r->write_idx; + const uint16_t space = read + QID_IQ_MASK - write; + uint16_t i; + + if (space < nb_qes) + nb_qes = space; + + for (i = 0; i < nb_qes; i++, write++) + r->ring[write & QID_IQ_MASK] = qes[i]; + + r->write_idx = write; + + return nb_qes; +} + +static force_inline uint16_t +iq_ring_dequeue_burst(struct iq_ring *r, struct rte_event *qes, uint16_t nb_qes) +{ + uint16_t read = r->read_idx; + const uint16_t write = r->write_idx; + const uint16_t items = write - read; + uint16_t i; + + for (i = 0; i < nb_qes; i++, read++) + qes[i] = r->ring[read & QID_IQ_MASK]; + + if (items < nb_qes) + nb_qes = items; + + r->read_idx += nb_qes; + + return nb_qes; +} + +/* assumes there is space, from a previous dequeue_burst */ +static force_inline uint16_t +iq_ring_put_back(struct iq_
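The iq_ring code above keeps free-running 16-bit read and write indexes and masks them only when addressing a slot. The following standalone sketch shows why the count stays correct when the write index wraps past 65535. It is a simplified model under stated assumptions, not the driver code: mini_ring and its helpers are hypothetical names, the payload is an int rather than struct rte_event, and the dequeue here clamps the count before copying.

```c
#include <stdint.h>

#define DEPTH 8                       /* must be a power of two */
#define MASK  ((uint16_t)(DEPTH - 1)) /* slot mask, also the usable capacity */

struct mini_ring {
	uint16_t write_idx; /* free-running: incremented, never stored masked */
	uint16_t read_idx;
	int slots[DEPTH];
};

static inline uint16_t
mini_count(const struct mini_ring *r)
{
	/* truncation to 16 bits makes the difference correct across wrap */
	return (uint16_t)(r->write_idx - r->read_idx);
}

static inline uint16_t
mini_enqueue(struct mini_ring *r, const int *vals, uint16_t n)
{
	const uint16_t space = (uint16_t)(MASK - mini_count(r));
	uint16_t i;

	if (space < n)
		n = space;
	for (i = 0; i < n; i++)
		r->slots[(r->write_idx + i) & MASK] = vals[i];
	r->write_idx = (uint16_t)(r->write_idx + n);
	return n;
}

static inline uint16_t
mini_dequeue(struct mini_ring *r, int *out, uint16_t n)
{
	const uint16_t items = mini_count(r);
	uint16_t i;

	if (items < n)
		n = items; /* clamp first so only valid slots are copied */
	for (i = 0; i < n; i++)
		out[i] = r->slots[(r->read_idx + i) & MASK];
	r->read_idx = (uint16_t)(r->read_idx + n);
	return n;
}
```

Because 65536 is a multiple of the ring depth, masking a wrapped index still selects the next slot in sequence, so no special case is needed at the wrap point.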
[dpdk-dev] [PATCH v4 10/17] event/sw: add support for event ports
From: Bruce Richardson Add in the data-structures for the ports used by workers to send packets to/from the scheduler. Also add in the functions to create/destroy those ports. Signed-off-by: Bruce Richardson Signed-off-by: Harry van Haaren --- v3 -> v4: - fix port leaking credits on reconfigure --- drivers/event/sw/event_ring.h | 185 ++ drivers/event/sw/sw_evdev.c | 88 drivers/event/sw/sw_evdev.h | 78 ++ 3 files changed, 351 insertions(+) create mode 100644 drivers/event/sw/event_ring.h diff --git a/drivers/event/sw/event_ring.h b/drivers/event/sw/event_ring.h new file mode 100644 index 000..cdaee95 --- /dev/null +++ b/drivers/event/sw/event_ring.h @@ -0,0 +1,185 @@ +/*- + * BSD LICENSE + * + * Copyright(c) 2016-2017 Intel Corporation. All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * + * * Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * * Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in + * the documentation and/or other materials provided with the + * distribution. + * * Neither the name of Intel Corporation nor the names of its + * contributors may be used to endorse or promote products derived + * from this software without specific prior written permission. + * + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + * A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT + * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + */ + +/* + * Generic ring structure for passing events from one core to another. + * + * Used by the software scheduler for the producer and consumer rings for + * each port, i.e. for passing events from worker cores to scheduler and + * vice-versa. Designed for single-producer, single-consumer use with two + * cores working on each ring. + */ + +#ifndef _EVENT_RING_ +#define _EVENT_RING_ + +#include + +#include +#include +#include + +#define QE_RING_NAMESIZE 32 + +struct qe_ring { + char name[QE_RING_NAMESIZE] __rte_cache_aligned; + uint32_t ring_size; /* size of memory block allocated to the ring */ + uint32_t mask; /* mask for read/write values == ring_size -1 */ + uint32_t size; /* actual usable space in the ring */ + volatile uint32_t write_idx __rte_cache_aligned; + volatile uint32_t read_idx __rte_cache_aligned; + + struct rte_event ring[0] __rte_cache_aligned; +}; + +#ifndef force_inline +#define force_inline inline __attribute__((always_inline)) +#endif + +static inline struct qe_ring * +qe_ring_create(const char *name, unsigned int size, unsigned int socket_id) +{ + struct qe_ring *retval; + const uint32_t ring_size = rte_align32pow2(size + 1); + size_t memsize = sizeof(*retval) + + (ring_size * sizeof(retval->ring[0])); + + retval = rte_zmalloc_socket(NULL, memsize, 0, socket_id); + if (retval == NULL) + goto end; + + snprintf(retval->name, sizeof(retval->name), "EVDEV_RG_%s", name); + retval->ring_size = ring_size; + 
retval->mask = ring_size - 1; + retval->size = size; +end: + return retval; +} + +static inline void +qe_ring_destroy(struct qe_ring *r) +{ + rte_free(r); +} + +static force_inline unsigned int +qe_ring_count(const struct qe_ring *r) +{ + return r->write_idx - r->read_idx; +} + +static force_inline unsigned int +qe_ring_free_count(const struct qe_ring *r) +{ + return r->size - qe_ring_count(r); +} + +static force_inline unsigned int +qe_ring_enqueue_burst(struct qe_ring *r, const struct rte_event *qes, + unsigned int nb_qes, uint16_t *free_count) +{ + const uint32_t size = r->size; + const uint32_t mask = r->mask; + const uint32_t read = r->read_idx; + uint32_t write = r->write_idx; + const uint32_t space = read + size - write; + uint32_t i; + + if (space < nb_qes) + nb_qes = space; + + for (i = 0; i < nb_qes; i++, wri
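qe_ring_create() above sizes the backing array as the next power of two at or above size + 1, while keeping the requested size as the usable capacity. A small sketch of that arithmetic follows, assuming a portable stand-in for rte_align32pow2(); next_pow2_u32, struct ring_geometry and ring_geometry_for are hypothetical names used only here.

```c
#include <stdint.h>

/* Portable stand-in for rte_align32pow2(): round up to a power of two. */
static inline uint32_t
next_pow2_u32(uint32_t x)
{
	x--;
	x |= x >> 1;
	x |= x >> 2;
	x |= x >> 4;
	x |= x >> 8;
	x |= x >> 16;
	return x + 1;
}

struct ring_geometry {
	uint32_t ring_size; /* slots allocated */
	uint32_t mask;      /* ring_size - 1, used to wrap indexes */
	uint32_t size;      /* usable slots requested by the caller */
};

static inline struct ring_geometry
ring_geometry_for(uint32_t usable)
{
	struct ring_geometry g;

	g.ring_size = next_pow2_u32(usable + 1);
	g.mask = g.ring_size - 1;
	g.size = usable;
	return g;
}
```

For example, a requested size of 16 yields 32 allocated slots with mask 31, and a requested size of 15 yields 16 slots with mask 15.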
[dpdk-dev] [PATCH v4 11/17] event/sw: add support for linking queues to ports
From: Bruce Richardson Signed-off-by: Bruce Richardson Signed-off-by: Harry van Haaren --- v3 -> v4 - fix port ordered qid count by adding decrement in unlink() - fix port queue unlink count to allow reconfig from lb to single link --- drivers/event/sw/sw_evdev.c | 81 + 1 file changed, 81 insertions(+) diff --git a/drivers/event/sw/sw_evdev.c b/drivers/event/sw/sw_evdev.c index 4b8370d..82ac3bd 100644 --- a/drivers/event/sw/sw_evdev.c +++ b/drivers/event/sw/sw_evdev.c @@ -36,6 +36,7 @@ #include #include #include +#include #include "sw_evdev.h" #include "iq_ring.h" @@ -50,6 +51,84 @@ static void sw_info_get(struct rte_eventdev *dev, struct rte_event_dev_info *info); static int +sw_port_link(struct rte_eventdev *dev, void *port, const uint8_t queues[], + const uint8_t priorities[], uint16_t num) +{ + struct sw_port *p = (void *)port; + struct sw_evdev *sw = sw_pmd_priv(dev); + int i; + + RTE_SET_USED(priorities); + for (i = 0; i < num; i++) { + struct sw_qid *q = &sw->qids[queues[i]]; + + /* check for qid map overflow */ + if (q->cq_num_mapped_cqs >= RTE_DIM(q->cq_map)) + break; + + if (p->is_directed && p->num_qids_mapped > 0) + break; + + if (q->type == SW_SCHED_TYPE_DIRECT) { + /* check directed qids only map to one port */ + if (p->num_qids_mapped > 0) { + rte_errno = -EDQUOT; + break; + } + /* check port only takes a directed flow */ + if (num > 1) { + rte_errno = -EDQUOT; + break; + } + + p->is_directed = 1; + p->num_qids_mapped = 1; + } else if (q->type == RTE_SCHED_TYPE_ORDERED) { + p->num_ordered_qids++; + p->num_qids_mapped++; + } else if (q->type == RTE_SCHED_TYPE_ATOMIC) { + p->num_qids_mapped++; + } + + q->cq_map[q->cq_num_mapped_cqs] = p->id; + rte_smp_wmb(); + q->cq_num_mapped_cqs++; + } + return i; +} + +static int +sw_port_unlink(struct rte_eventdev *dev, void *port, uint8_t queues[], + uint16_t nb_unlinks) +{ + struct sw_port *p = (void *)port; + struct sw_evdev *sw = sw_pmd_priv(dev); + unsigned int i, j; + + int unlinked = 0; + for (i = 0; i < 
nb_unlinks; i++) { + struct sw_qid *q = &sw->qids[queues[i]]; + for (j = 0; j < q->cq_num_mapped_cqs; j++) { + if (q->cq_map[j] == p->id) { + q->cq_map[j] = + q->cq_map[q->cq_num_mapped_cqs - 1]; + rte_smp_wmb(); + q->cq_num_mapped_cqs--; + unlinked++; + + p->num_qids_mapped--; + + if (q->type == RTE_SCHED_TYPE_ORDERED) + p->num_ordered_qids--; + + continue; + } + } + } + return unlinked; +} + +static int sw_port_setup(struct rte_eventdev *dev, uint8_t port_id, const struct rte_event_port_conf *conf) { @@ -402,6 +481,8 @@ sw_probe(const char *name, const char *params) .port_def_conf = sw_port_def_conf, .port_setup = sw_port_setup, .port_release = sw_port_release, + .port_link = sw_port_link, + .port_unlink = sw_port_unlink, }; static const char *const args[] = { -- 2.7.4
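For readers following the unlink logic in this patch, here is a minimal standalone sketch of the "overwrite with the last entry, then shrink" removal that sw_port_unlink() applies to q->cq_map. The struct name qid_map, the capacity CQ_MAP_MAX and both helper functions are hypothetical and exist only for this illustration. The memory barrier (rte_smp_wmb) is left out, and the sketch returns after the first match, whereas the patch uses continue inside the inner loop.

```c
#include <stdint.h>

#define CQ_MAP_MAX 8 /* hypothetical capacity, not the driver's value */

/* Simplified stand-in for struct sw_qid: only the mapping fields. */
struct qid_map {
	uint8_t cq_map[CQ_MAP_MAX];
	uint32_t cq_num_mapped_cqs;
};

/* Append port_id to the map; returns 0 on success, -1 if the map is full. */
static int
qid_map_link(struct qid_map *q, uint8_t port_id)
{
	if (q->cq_num_mapped_cqs >= CQ_MAP_MAX)
		return -1;
	q->cq_map[q->cq_num_mapped_cqs] = port_id;
	q->cq_num_mapped_cqs++;
	return 0;
}

/*
 * Remove the first entry equal to port_id by copying the last entry
 * over it and shrinking the count. Order is not preserved.
 * Returns 1 if an entry was removed, 0 otherwise.
 */
static int
qid_map_unlink(struct qid_map *q, uint8_t port_id)
{
	uint32_t j;

	for (j = 0; j < q->cq_num_mapped_cqs; j++) {
		if (q->cq_map[j] == port_id) {
			q->cq_map[j] = q->cq_map[q->cq_num_mapped_cqs - 1];
			q->cq_num_mapped_cqs--;
			return 1;
		}
	}
	return 0;
}
```

The removal is O(n) in the number of mapped ports and needs no shifting, at the cost of reordering the remaining entries.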
[dpdk-dev] [PATCH v4 12/17] event/sw: add worker core functions
From: Bruce Richardson add the event enqueue, dequeue and release functions to the eventdev. These also include tracking of stats for observability in the load of the scheduler. Internally in the enqueue function, the various types of enqueue operations, to forward an existing event, to send a new event, to drop a previous event, are converted to a series of flags which will be used by the scheduler code to perform the needed actions for that event. Signed-off-by: Bruce Richardson Signed-off-by: Gage Eads Signed-off-by: Harry van Haaren --- drivers/event/sw/Makefile | 1 + drivers/event/sw/sw_evdev.c| 5 + drivers/event/sw/sw_evdev.h| 34 +++ drivers/event/sw/sw_evdev_worker.c | 188 + 4 files changed, 228 insertions(+) create mode 100644 drivers/event/sw/sw_evdev_worker.c diff --git a/drivers/event/sw/Makefile b/drivers/event/sw/Makefile index d6836e3..b6ecd91 100644 --- a/drivers/event/sw/Makefile +++ b/drivers/event/sw/Makefile @@ -53,6 +53,7 @@ EXPORT_MAP := rte_pmd_evdev_sw_version.map # library source files SRCS-$(CONFIG_RTE_LIBRTE_PMD_SW_EVENTDEV) += sw_evdev.c +SRCS-$(CONFIG_RTE_LIBRTE_PMD_SW_EVENTDEV) += sw_evdev_worker.c # export include files SYMLINK-y-include += diff --git a/drivers/event/sw/sw_evdev.c b/drivers/event/sw/sw_evdev.c index 82ac3bd..9b2816d 100644 --- a/drivers/event/sw/sw_evdev.c +++ b/drivers/event/sw/sw_evdev.c @@ -412,6 +412,7 @@ sw_dev_configure(const struct rte_eventdev *dev) sw->qid_count = conf->nb_event_queues; sw->port_count = conf->nb_event_ports; sw->nb_events_limit = conf->nb_events_limit; + rte_atomic32_set(&sw->inflights, 0); return 0; } @@ -550,6 +551,10 @@ sw_probe(const char *name, const char *params) return -EFAULT; } dev->dev_ops = &evdev_sw_ops; + dev->enqueue = sw_event_enqueue; + dev->enqueue_burst = sw_event_enqueue_burst; + dev->dequeue = sw_event_dequeue; + dev->dequeue_burst = sw_event_dequeue_burst; sw = dev->data->dev_private; sw->data = dev->data; diff --git a/drivers/event/sw/sw_evdev.h 
b/drivers/event/sw/sw_evdev.h index 1bedd63..ab372fd 100644 --- a/drivers/event/sw/sw_evdev.h +++ b/drivers/event/sw/sw_evdev.h @@ -55,12 +55,36 @@ #define SCHED_DEQUEUE_BURST_SIZE 32 #define SW_PORT_HIST_LIST (MAX_SW_PROD_Q_DEPTH) /* size of our history list */ +#define NUM_SAMPLES 64 /* how many data points use for average stats */ #define EVENTDEV_NAME_SW_PMD event_sw #define SW_PMD_NAME RTE_STR(event_sw) #define SW_SCHED_TYPE_DIRECT (RTE_SCHED_TYPE_PARALLEL + 1) +enum { + QE_FLAG_VALID_SHIFT = 0, + QE_FLAG_COMPLETE_SHIFT, + QE_FLAG_NOT_EOP_SHIFT, + _QE_FLAG_COUNT +}; + +#define QE_FLAG_VALID(1 << QE_FLAG_VALID_SHIFT)/* for NEW FWD, FRAG */ +#define QE_FLAG_COMPLETE (1 << QE_FLAG_COMPLETE_SHIFT) /* set for FWD, DROP */ +#define QE_FLAG_NOT_EOP (1 << QE_FLAG_NOT_EOP_SHIFT) /* set for FRAG only */ + +static const uint8_t sw_qe_flag_map[] = { + QE_FLAG_VALID /* NEW Event */, + QE_FLAG_VALID | QE_FLAG_COMPLETE /* FWD Event */, + QE_FLAG_COMPLETE /* RELEASE Event */, + + /* Values which can be used for future support for partial +* events, i.e. where one event comes back to the scheduler +* as multiple which need to be tracked together +*/ + QE_FLAG_VALID | QE_FLAG_COMPLETE | QE_FLAG_NOT_EOP, +}; + #ifdef RTE_LIBRTE_PMD_EVDEV_SW_DEBUG #define SW_LOG_INFO(fmt, args...) \ RTE_LOG(INFO, EVENTDEV, "[%s] %s() line %u: " fmt "\n", \ @@ -210,6 +234,8 @@ struct sw_evdev { /* Contains all ports - load balanced and directed */ struct sw_port ports[SW_PORTS_MAX] __rte_cache_aligned; + rte_atomic32_t inflights __rte_cache_aligned; + /* * max events in this instance. Cached here for performance. 
* (also available in data->conf.nb_events_limit) @@ -239,4 +265,12 @@ sw_pmd_priv_const(const struct rte_eventdev *eventdev) return eventdev->data->dev_private; } +uint16_t sw_event_enqueue(void *port, const struct rte_event *ev); +uint16_t sw_event_enqueue_burst(void *port, const struct rte_event ev[], + uint16_t num); + +uint16_t sw_event_dequeue(void *port, struct rte_event *ev, uint64_t wait); +uint16_t sw_event_dequeue_burst(void *port, struct rte_event *ev, uint16_t num, + uint64_t wait); + #endif /* _SW_EVDEV_H_ */ diff --git a/drivers/event/sw/sw_evdev_worker.c b/drivers/event/sw/sw_evdev_worker.c new file mode 100644 index 000..aed1597 --- /dev/null +++ b/drivers/event/sw/sw_evdev_worker.c @@ -0,0 +1,188 @@ +/*- + * BSD LICENSE + * + * Copyright(c) 2016-2017 Intel Corporation. All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted
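As a rough illustration of the op-to-flags conversion described in the commit message, the sketch below rebuilds the first three entries of the flag table from sw_evdev.h and looks an operation up in it. The sketch_* names are hypothetical, and the numeric op codes (NEW = 0, FORWARD = 1, RELEASE = 2) are an assumption taken from the order of the table entries in the patch, not from the eventdev header.

```c
#include <stdint.h>

enum {
	QE_FLAG_VALID_SHIFT = 0,
	QE_FLAG_COMPLETE_SHIFT,
	QE_FLAG_NOT_EOP_SHIFT
};

#define QE_FLAG_VALID    (1 << QE_FLAG_VALID_SHIFT)    /* set for NEW, FWD */
#define QE_FLAG_COMPLETE (1 << QE_FLAG_COMPLETE_SHIFT) /* set for FWD, RELEASE */
#define QE_FLAG_NOT_EOP  (1 << QE_FLAG_NOT_EOP_SHIFT)  /* reserved for FRAG */

/* Hypothetical op codes; the ordering is assumed to match the table below. */
enum sketch_op {
	SKETCH_OP_NEW = 0,
	SKETCH_OP_FORWARD,
	SKETCH_OP_RELEASE
};

static const uint8_t sketch_flag_map[] = {
	QE_FLAG_VALID,                    /* NEW: a new event enters */
	QE_FLAG_VALID | QE_FLAG_COMPLETE, /* FWD: old one done, new one enters */
	QE_FLAG_COMPLETE,                 /* RELEASE: old one done, nothing new */
};

/* Translate an op code into the flag byte the scheduler would inspect. */
static uint8_t
sketch_op_to_flags(enum sketch_op op)
{
	return sketch_flag_map[op];
}

/* Nonzero if the flags carry an event that must be scheduled. */
static int
sketch_adds_event(uint8_t flags)
{
	return (flags & QE_FLAG_VALID) != 0;
}

/* Nonzero if the flags complete a previously dequeued event. */
static int
sketch_completes_event(uint8_t flags)
{
	return (flags & QE_FLAG_COMPLETE) != 0;
}
```

With this encoding the scheduler can test two independent bits instead of switching on the op code.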
[dpdk-dev] [PATCH v4 13/17] event/sw: add scheduling logic
From: Bruce Richardson Add in the scheduling function which takes the events from the producer queues and buffers them before scheduling them to consumer queues. The scheduling logic includes support for atomic, reordered, and parallel scheduling of flows. Signed-off-by: Bruce Richardson Signed-off-by: Gage Eads Signed-off-by: Harry van Haaren --- drivers/event/sw/Makefile | 1 + drivers/event/sw/sw_evdev.c | 1 + drivers/event/sw/sw_evdev.h | 11 + drivers/event/sw/sw_evdev_scheduler.c | 602 ++ 4 files changed, 615 insertions(+) create mode 100644 drivers/event/sw/sw_evdev_scheduler.c diff --git a/drivers/event/sw/Makefile b/drivers/event/sw/Makefile index b6ecd91..a7f5b3d 100644 --- a/drivers/event/sw/Makefile +++ b/drivers/event/sw/Makefile @@ -54,6 +54,7 @@ EXPORT_MAP := rte_pmd_evdev_sw_version.map # library source files SRCS-$(CONFIG_RTE_LIBRTE_PMD_SW_EVENTDEV) += sw_evdev.c SRCS-$(CONFIG_RTE_LIBRTE_PMD_SW_EVENTDEV) += sw_evdev_worker.c +SRCS-$(CONFIG_RTE_LIBRTE_PMD_SW_EVENTDEV) += sw_evdev_scheduler.c # export include files SYMLINK-y-include += diff --git a/drivers/event/sw/sw_evdev.c b/drivers/event/sw/sw_evdev.c index 9b2816d..b1ae2b6 100644 --- a/drivers/event/sw/sw_evdev.c +++ b/drivers/event/sw/sw_evdev.c @@ -555,6 +555,7 @@ sw_probe(const char *name, const char *params) dev->enqueue_burst = sw_event_enqueue_burst; dev->dequeue = sw_event_dequeue; dev->dequeue_burst = sw_event_dequeue_burst; + dev->schedule = sw_event_schedule; sw = dev->data->dev_private; sw->data = dev->data; diff --git a/drivers/event/sw/sw_evdev.h b/drivers/event/sw/sw_evdev.h index ab372fd..7c157c7 100644 --- a/drivers/event/sw/sw_evdev.h +++ b/drivers/event/sw/sw_evdev.h @@ -248,8 +248,18 @@ struct sw_evdev { /* Cache how many packets are in each cq */ uint16_t cq_ring_space[SW_PORTS_MAX] __rte_cache_aligned; + /* Array of pointers to load-balanced QIDs sorted by priority level */ + struct sw_qid *qids_prioritized[RTE_EVENT_MAX_QUEUES_PER_DEV]; + + /* Stats */ + struct sw_point_stats 
stats __rte_cache_aligned; + uint64_t sched_called; int32_t sched_quanta; + uint64_t sched_no_iq_enqueues; + uint64_t sched_no_cq_enqueues; + uint64_t sched_cq_qid_called; + uint8_t started; uint32_t credit_update_quanta; }; @@ -272,5 +282,6 @@ uint16_t sw_event_enqueue_burst(void *port, const struct rte_event ev[], uint16_t sw_event_dequeue(void *port, struct rte_event *ev, uint64_t wait); uint16_t sw_event_dequeue_burst(void *port, struct rte_event *ev, uint16_t num, uint64_t wait); +void sw_event_schedule(struct rte_eventdev *dev); #endif /* _SW_EVDEV_H_ */ diff --git a/drivers/event/sw/sw_evdev_scheduler.c b/drivers/event/sw/sw_evdev_scheduler.c new file mode 100644 index 000..2aecc95 --- /dev/null +++ b/drivers/event/sw/sw_evdev_scheduler.c @@ -0,0 +1,602 @@ +/*- + * BSD LICENSE + * + * Copyright(c) 2016-2017 Intel Corporation. All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * + * * Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * * Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in + * the documentation and/or other materials provided with the + * distribution. + * * Neither the name of Intel Corporation nor the names of its + * contributors may be used to endorse or promote products derived + * from this software without specific prior written permission. + * + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + * A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT + * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + */ + +#include +#include +#include "sw_evdev.h" +#include "iq_ring.h" +#include "event_ring.h" + +#define SW_IQS_MASK (SW_IQS_MAX-1) + +/* Retrieve the highest priority IQ or -1 if no pkts available. Doing the + * CLZ twice is faster than caching the value due to
[dpdk-dev] [PATCH v4 15/17] event/sw: add dump function for easier debugging
From: Bruce Richardson Segfault issue resolved when only partially configured and rte_event_dev_dump() is called before start(), Reported-by: Vipin Varghese Signed-off-by: Bruce Richardson Signed-off-by: Harry van Haaren --- drivers/event/sw/sw_evdev.c | 148 1 file changed, 148 insertions(+) diff --git a/drivers/event/sw/sw_evdev.c b/drivers/event/sw/sw_evdev.c index d4d6d7f..2e43461 100644 --- a/drivers/event/sw/sw_evdev.c +++ b/drivers/event/sw/sw_evdev.c @@ -439,6 +439,153 @@ sw_info_get(struct rte_eventdev *dev, struct rte_event_dev_info *info) *info = evdev_sw_info; } +static void +sw_dump(struct rte_eventdev *dev, FILE *f) +{ + const struct sw_evdev *sw = sw_pmd_priv(dev); + + static const char * const q_type_strings[] = { + "Ordered", "Atomic", "Parallel", "Directed" + }; + uint32_t i; + fprintf(f, "EventDev %s: ports %d, qids %d\n", "todo-fix-name", + sw->port_count, sw->qid_count); + + fprintf(f, "\trx %"PRIu64"\n\tdrop %"PRIu64"\n\ttx %"PRIu64"\n", + sw->stats.rx_pkts, sw->stats.rx_dropped, sw->stats.tx_pkts); + fprintf(f, "\tsched calls: %"PRIu64"\n", sw->sched_called); + fprintf(f, "\tsched cq/qid call: %"PRIu64"\n", sw->sched_cq_qid_called); + fprintf(f, "\tsched no IQ enq: %"PRIu64"\n", sw->sched_no_iq_enqueues); + fprintf(f, "\tsched no CQ enq: %"PRIu64"\n", sw->sched_no_cq_enqueues); + uint32_t inflights = rte_atomic32_read(&sw->inflights); + uint32_t credits = sw->nb_events_limit - inflights; + fprintf(f, "\tinflight %d, credits: %d\n", inflights, credits); + +#define COL_RED "\x1b[31m" +#define COL_RESET "\x1b[0m" + + for (i = 0; i < sw->port_count; i++) { + int max, j; + const struct sw_port *p = &sw->ports[i]; + if (!p->initialized) { + fprintf(f, " %sPort %d not initialized.%s\n", + COL_RED, i, COL_RESET); + continue; + } + fprintf(f, " Port %d %s\n", i, + p->is_directed ? 
" (SingleCons)" : ""); + fprintf(f, "\trx %"PRIu64"\tdrop %"PRIu64"\ttx %"PRIu64 + "\t%sinflight %d%s\n", sw->ports[i].stats.rx_pkts, + sw->ports[i].stats.rx_dropped, + sw->ports[i].stats.tx_pkts, + (p->inflights == p->inflight_max) ? + COL_RED : COL_RESET, + sw->ports[i].inflights, COL_RESET); + + fprintf(f, "\tMax New: %u" + "\tAvg cycles PP: %"PRIu64"\tCredits: %u\n", + sw->ports[i].inflight_max, + sw->ports[i].avg_pkt_ticks, + sw->ports[i].inflight_credits); + fprintf(f, "\tReceive burst distribution:\n"); + float zp_percent = p->zero_polls * 100.0 / p->total_polls; + fprintf(f, zp_percent < 10 ? "\t\t0:%.02f%% " : "\t\t0:%.0f%% ", + zp_percent); + for (max = (int)RTE_DIM(p->poll_buckets); max-- > 0;) + if (p->poll_buckets[max] != 0) + break; + for (j = 0; j <= max; j++) { + if (p->poll_buckets[j] != 0) { + float poll_pc = p->poll_buckets[j] * 100.0 / + p->total_polls; + fprintf(f, "%u-%u:%.02f%% ", + ((j << SW_DEQ_STAT_BUCKET_SHIFT) + 1), + ((j+1) << SW_DEQ_STAT_BUCKET_SHIFT), + poll_pc); + } + } + fprintf(f, "\n"); + + if (p->rx_worker_ring) { + uint64_t used = qe_ring_count(p->rx_worker_ring); + uint64_t space = qe_ring_free_count(p->rx_worker_ring); + const char *col = (space == 0) ? COL_RED : COL_RESET; + fprintf(f, "\t%srx ring used: %4"PRIu64"\tfree: %4" + PRIu64 COL_RESET"\n", col, used, space); + } else + fprintf(f, "\trx ring not initialized.\n"); + + if (p->cq_worker_ring) { + uint64_t used = qe_ring_count(p->cq_worker_ring); + uint64_t space = qe_ring_free_count(p->cq_worker_ring); + const char *col = (space == 0) ? COL_RED : COL_RESET; + fprintf(f, "\t%scq ring used: %4"PRIu64"\tfree: %4" + PRIu64 COL_RESET"\n", col, used, space); + } else + fprintf(f, "\tcq ring not initialized.\n"); + } + +
[dpdk-dev] [PATCH v4 14/17] event/sw: add start stop and close functions
From: Bruce Richardson Signed-off-by: Bruce Richardson Signed-off-by: Harry van Haaren --- drivers/event/sw/sw_evdev.c | 74 + 1 file changed, 74 insertions(+) diff --git a/drivers/event/sw/sw_evdev.c b/drivers/event/sw/sw_evdev.c index b1ae2b6..d4d6d7f 100644 --- a/drivers/event/sw/sw_evdev.c +++ b/drivers/event/sw/sw_evdev.c @@ -440,6 +440,77 @@ sw_info_get(struct rte_eventdev *dev, struct rte_event_dev_info *info) } static int +sw_start(struct rte_eventdev *dev) +{ + unsigned int i, j; + struct sw_evdev *sw = sw_pmd_priv(dev); + /* check all ports are set up */ + for (i = 0; i < sw->port_count; i++) + if (sw->ports[i].rx_worker_ring == NULL) { + printf("%s %d: port %d not configured\n", + __func__, __LINE__, i); + return -1; + } + + /* check all queues are configured and mapped to ports*/ + for (i = 0; i < sw->qid_count; i++) + if (sw->qids[i].iq[0] == NULL || + sw->qids[i].cq_num_mapped_cqs == 0) { + printf("%s %d: queue %d not configured\n", + __func__, __LINE__, i); + return -1; + } + + /* build up our prioritized array of qids */ + /* We don't use qsort here, as if all/multiple entries have the same +* priority, the result is non-deterministic. From "man 3 qsort": +* "If two members compare as equal, their order in the sorted +* array is undefined." 
+*/ + uint32_t qidx = 0; + for (j = 0; j <= RTE_EVENT_DEV_PRIORITY_LOWEST; j++) { + for (i = 0; i < sw->qid_count; i++) { + if (sw->qids[i].priority == j) { + sw->qids_prioritized[qidx] = &sw->qids[i]; + qidx++; + } + } + } + sw->started = 1; + return 0; +} + +static void +sw_stop(struct rte_eventdev *dev) +{ + struct sw_evdev *sw = sw_pmd_priv(dev); + sw->started = 0; +} + +static int +sw_close(struct rte_eventdev *dev) +{ + struct sw_evdev *sw = sw_pmd_priv(dev); + uint32_t i; + + for (i = 0; i < sw->qid_count; i++) + sw_queue_release(dev, i); + sw->qid_count = 0; + + for (i = 0; i < sw->port_count; i++) + sw_port_release(&sw->ports[i]); + sw->port_count = 0; + + memset(&sw->stats, 0, sizeof(sw->stats)); + sw->sched_called = 0; + sw->sched_no_iq_enqueues = 0; + sw->sched_no_cq_enqueues = 0; + sw->sched_cq_qid_called = 0; + + return 0; +} + +static int assign_numa_node(const char *key __rte_unused, const char *value, void *opaque) { int *socket_id = opaque; @@ -475,6 +546,9 @@ sw_probe(const char *name, const char *params) static const struct rte_eventdev_ops evdev_sw_ops = { .dev_configure = sw_dev_configure, .dev_infos_get = sw_info_get, + .dev_close = sw_close, + .dev_start = sw_start, + .dev_stop = sw_stop, .queue_def_conf = sw_queue_def_conf, .queue_setup = sw_queue_setup, -- 2.7.4
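The comment in sw_start() explains why qsort is avoided when building the prioritized queue array. A minimal standalone sketch of that same deterministic ordering follows: walk the priority levels from highest (0) to lowest and collect matching queues in index order, so equal priorities keep their original relative order. The names sketch_qid and sketch_prioritize are hypothetical, and the value 255 for the lowest priority is an assumption standing in for RTE_EVENT_DEV_PRIORITY_LOWEST.

```c
#include <stdint.h>

/* Assumed to mirror RTE_EVENT_DEV_PRIORITY_LOWEST; hypothetical here. */
#define SKETCH_PRIORITY_LOWEST 255

/* Simplified stand-in for struct sw_qid: only the priority field. */
struct sketch_qid {
	uint8_t priority;
};

/*
 * Fill out[] with pointers to qids[] ordered by ascending priority value.
 * Queues with equal priority stay in index order, which qsort() would
 * not guarantee. Returns the number of pointers written (always count).
 */
static uint32_t
sketch_prioritize(struct sketch_qid *qids, uint32_t count,
		struct sketch_qid **out)
{
	uint32_t qidx = 0;
	uint32_t i;
	unsigned int j; /* wider than uint8_t so the loop can terminate */

	for (j = 0; j <= SKETCH_PRIORITY_LOWEST; j++) {
		for (i = 0; i < count; i++) {
			if (qids[i].priority == j) {
				out[qidx] = &qids[i];
				qidx++;
			}
		}
	}
	return qidx;
}
```

The cost is one pass over the queues per priority level, which is acceptable here because it runs once at device start rather than on the datapath.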
[dpdk-dev] [PATCH v4 17/17] app/test: add unit tests for SW eventdev driver
From: Bruce Richardson Since the sw driver is a standalone lookaside device that has no HW requirements, we can provide a set of unit tests that test its functionality across the different queue types and with different input scenarios. This also adds the tests to be automatically run by autotest.py Signed-off-by: Bruce Richardson Signed-off-by: David Hunt Signed-off-by: Harry van Haaren --- v3 -> v4: - add xstats reset test for device/port/queues - add xstats id abuse test - add xstats brute force test - add ordered queue reconfigure test - add port reconfig credits test - add port queue lb unlink to dir relink --- app/test/Makefile |5 +- app/test/autotest_data.py | 26 + app/test/test_eventdev_sw.c | 3179 +++ 3 files changed, 3209 insertions(+), 1 deletion(-) create mode 100644 app/test/test_eventdev_sw.c diff --git a/app/test/Makefile b/app/test/Makefile index a426548..dc92d9c 100644 --- a/app/test/Makefile +++ b/app/test/Makefile @@ -197,7 +197,10 @@ SRCS-$(CONFIG_RTE_LIBRTE_CRYPTODEV) += test_cryptodev_blockcipher.c SRCS-$(CONFIG_RTE_LIBRTE_CRYPTODEV) += test_cryptodev_perf.c SRCS-$(CONFIG_RTE_LIBRTE_CRYPTODEV) += test_cryptodev.c -SRCS-$(CONFIG_RTE_LIBRTE_EVENTDEV) += test_eventdev.c +ifeq ($(CONFIG_RTE_LIBRTE_EVENTDEV),y) +SRCS-y += test_eventdev.c +SRCS-$(CONFIG_RTE_LIBRTE_PMD_SW_EVENTDEV) += test_eventdev_sw.c +endif SRCS-$(CONFIG_RTE_LIBRTE_KVARGS) += test_kvargs.c diff --git a/app/test/autotest_data.py b/app/test/autotest_data.py index 0cd598b..165ed6c 100644 --- a/app/test/autotest_data.py +++ b/app/test/autotest_data.py @@ -346,6 +346,32 @@ def per_sockets(num): non_parallel_test_group_list = [ { +"Prefix":"eventdev", +"Memory":"512", +"Tests": +[ +{ +"Name":"Eventdev common autotest", +"Command": "eventdev_common_autotest", +"Func":default_autotest, +"Report": None, +}, +] +}, +{ +"Prefix":"eventdev_sw", +"Memory":"512", +"Tests": +[ +{ +"Name":"Eventdev sw autotest", +"Command": "eventdev_sw_autotest", +"Func":default_autotest, +"Report": None, +}, +] 
+}, +{ "Prefix":"kni", "Memory":"512", "Tests": diff --git a/app/test/test_eventdev_sw.c b/app/test/test_eventdev_sw.c new file mode 100644 index 000..f449e9e --- /dev/null +++ b/app/test/test_eventdev_sw.c @@ -0,0 +1,3179 @@ +/*- + * BSD LICENSE + * + * Copyright(c) 2016-2017 Intel Corporation. All rights reserved. + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * + * * Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * * Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in + * the documentation and/or other materials provided with the + * distribution. + * * Neither the name of Intel Corporation nor the names of its + * contributors may be used to endorse or promote products derived + * from this software without specific prior written permission. + * + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
+ */ + +#include +#include +#include +#include +#include +#include + +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include +#include "test.h" + +#define MAX_PORTS 16 +#define MAX_QIDS 16 +#define NUM_PACKETS (1<<18) + +static int evdev; + +struct test { + struct rte_mempool *mbuf_pool; + uint8_t port[MAX_PORTS]; + uint8_t qid[MAX_QIDS]; + int nb_qids; +}; + +static struct rte_event release_ev = {.op = RTE_EVENT_OP_RELEASE }; + +static inline struct rte_mbuf * +rte_gen_arp(int portid, struct rte_mempool *mp) +{ + /* +
[dpdk-dev] [PATCH v4 16/17] event/sw: add xstats support
From: Bruce Richardson Add support for xstats to report out on the state of the eventdev. Useful for debugging and for unit tests, as well as observability at runtime and performance tuning of apps to work well with the scheduler. Signed-off-by: Bruce Richardson Signed-off-by: Harry van Haaren --- v3 -> v4: - xstats reset with ID selection implemented --- drivers/event/sw/Makefile | 1 + drivers/event/sw/sw_evdev.c| 8 + drivers/event/sw/sw_evdev.h| 33 +- drivers/event/sw/sw_evdev_xstats.c | 674 + 4 files changed, 715 insertions(+), 1 deletion(-) create mode 100644 drivers/event/sw/sw_evdev_xstats.c diff --git a/drivers/event/sw/Makefile b/drivers/event/sw/Makefile index a7f5b3d..eb0dc4c 100644 --- a/drivers/event/sw/Makefile +++ b/drivers/event/sw/Makefile @@ -55,6 +55,7 @@ EXPORT_MAP := rte_pmd_evdev_sw_version.map SRCS-$(CONFIG_RTE_LIBRTE_PMD_SW_EVENTDEV) += sw_evdev.c SRCS-$(CONFIG_RTE_LIBRTE_PMD_SW_EVENTDEV) += sw_evdev_worker.c SRCS-$(CONFIG_RTE_LIBRTE_PMD_SW_EVENTDEV) += sw_evdev_scheduler.c +SRCS-$(CONFIG_RTE_LIBRTE_PMD_SW_EVENTDEV) += sw_evdev_xstats.c # export include files SYMLINK-y-include += diff --git a/drivers/event/sw/sw_evdev.c b/drivers/event/sw/sw_evdev.c index 2e43461..7d25ab2 100644 --- a/drivers/event/sw/sw_evdev.c +++ b/drivers/event/sw/sw_evdev.c @@ -623,6 +623,8 @@ sw_start(struct rte_eventdev *dev) } } } + if (sw_xstats_init(sw) < 0) + return -1; sw->started = 1; return 0; } @@ -631,6 +633,7 @@ static void sw_stop(struct rte_eventdev *dev) { struct sw_evdev *sw = sw_pmd_priv(dev); + sw_xstats_uninit(sw); sw->started = 0; } @@ -706,6 +709,11 @@ sw_probe(const char *name, const char *params) .port_release = sw_port_release, .port_link = sw_port_link, .port_unlink = sw_port_unlink, + + .xstats_get = sw_xstats_get, + .xstats_get_names = sw_xstats_get_names, + .xstats_get_by_name = sw_xstats_get_by_name, + .xstats_reset = sw_xstats_reset, }; static const char *const args[] = { diff --git a/drivers/event/sw/sw_evdev.h b/drivers/event/sw/sw_evdev.h 
index 7c157c7..61c671d 100644 --- a/drivers/event/sw/sw_evdev.h +++ b/drivers/event/sw/sw_evdev.h @@ -62,6 +62,8 @@ #define SW_SCHED_TYPE_DIRECT (RTE_SCHED_TYPE_PARALLEL + 1) +#define SW_NUM_POLL_BUCKETS (MAX_SW_CONS_Q_DEPTH >> SW_DEQ_STAT_BUCKET_SHIFT) + enum { QE_FLAG_VALID_SHIFT = 0, QE_FLAG_COMPLETE_SHIFT, @@ -203,7 +205,7 @@ struct sw_port { uint64_t avg_pkt_ticks; /* tracks average over NUM_SAMPLES burst */ uint64_t total_polls;/* how many polls were counted in stats */ uint64_t zero_polls; /* tracks polls returning nothing */ - uint32_t poll_buckets[MAX_SW_CONS_Q_DEPTH >> SW_DEQ_STAT_BUCKET_SHIFT]; + uint32_t poll_buckets[SW_NUM_POLL_BUCKETS]; /* bucket values in 4s for shorter reporting */ /* History list structs, containing info on pkts egressed to worker */ @@ -230,6 +232,11 @@ struct sw_evdev { uint32_t port_count; uint32_t qid_count; + uint32_t xstats_count; + struct sw_xstats_entry *xstats; + uint32_t xstats_count_mode_dev; + uint32_t xstats_count_mode_port; + uint32_t xstats_count_mode_queue; /* Contains all ports - load balanced and directed */ struct sw_port ports[SW_PORTS_MAX] __rte_cache_aligned; @@ -261,6 +268,13 @@ struct sw_evdev { uint8_t started; uint32_t credit_update_quanta; + + /* store num stats and offset of the stats for each port */ + uint16_t xstats_count_per_port[SW_PORTS_MAX]; + uint16_t xstats_offset_for_port[SW_PORTS_MAX]; + /* store num stats and offset of the stats for each queue */ + uint16_t xstats_count_per_qid[RTE_EVENT_MAX_QUEUES_PER_DEV]; + uint16_t xstats_offset_for_qid[RTE_EVENT_MAX_QUEUES_PER_DEV]; }; static inline struct sw_evdev * @@ -283,5 +297,22 @@ uint16_t sw_event_dequeue(void *port, struct rte_event *ev, uint64_t wait); uint16_t sw_event_dequeue_burst(void *port, struct rte_event *ev, uint16_t num, uint64_t wait); void sw_event_schedule(struct rte_eventdev *dev); +int sw_xstats_init(struct sw_evdev *dev); +int sw_xstats_uninit(struct sw_evdev *dev); +int sw_xstats_get_names(const struct rte_eventdev *dev, + enum 
rte_event_dev_xstats_mode mode, uint8_t queue_port_id, + struct rte_event_dev_xstats_name *xstats_names, + unsigned int *ids, unsigned int size); +int sw_xstats_get(const struct rte_eventdev *dev, + enum rte_event_dev_xstats_mode mode, uint8_t queue_port_id, + const unsigned int ids[], uint64_t values[], unsigned int n
[dpdk-dev] [PATCH v6 0/3] examples/l3fwd: merge l3fwd-acl code into l3fwd
This patchset merges the l3fwd-acl and l3fwd code into a common directory. It also adds a config-file read option to build the LPM and EM tables.

Ravi Kerur (3):
  examples/l3fwd: merge l3fwd-acl code into l3fwd
  examples/l3fwd: add config file support for lpm
  examples/l3fwd: add config file support for exact

 examples/l3fwd-acl/Makefile       |   56 -
 examples/l3fwd-acl/main.c         | 2079 -
 examples/l3fwd/Makefile           |    2 +-
 examples/l3fwd/l3fwd.h            |   77 ++
 examples/l3fwd/l3fwd_acl.c        | 1033 ++
 examples/l3fwd/l3fwd_acl.h        |  234 +
 examples/l3fwd/l3fwd_acl_scalar.h |  182
 examples/l3fwd/l3fwd_em.c         |  390 +--
 examples/l3fwd/l3fwd_lpm.c        |  323 --
 examples/l3fwd/main.c             |  250 +++--
 10 files changed, 2286 insertions(+), 2340 deletions(-)
 delete mode 100644 examples/l3fwd-acl/Makefile
 delete mode 100644 examples/l3fwd-acl/main.c
 create mode 100644 examples/l3fwd/l3fwd_acl.c
 create mode 100644 examples/l3fwd/l3fwd_acl.h
 create mode 100644 examples/l3fwd/l3fwd_acl_scalar.h

--
2.7.4
[dpdk-dev] [PATCH v6 1/3] examples/l3fwd: merge l3fwd-acl code into l3fwd
Merge l3fwd-acl code into l3fwd, with a '-A' cmdline option to run ACL. Performance-critical ACL inline functions are moved from l3fwd-acl/main.c to l3fwd/l3fwd_acl_scalar.h.

---
v6:
  > Change commit message format.

v5:
  > None.

v4:
  > Initialize rss_hf to IP for LPM, EM and ACL.
  > Update rss_hf with l4 in parse_args for ACL.
  > Fix pending checkpatch code indentation warning.

v3:
  > Fix additional checkpatch coding style issues.

v2:
  > Fix checkpatch errors and warnings related to non-string lines longer than 80 characters.
  > Warnings for the GET_CB_FIELD macro and for strings longer than 80 characters are not fixed.

v1:
  l3fwd-acl changes:
  > Merge common init code in l3fwd-acl and l3fwd into main.c.
  > Move non-critical inline functions to l3fwd_acl.h.
  > Move critical packet processing inline functions to l3fwd_acl_scalar.h.
  > Move l3fwd-acl init code to l3fwd_acl.c.
  > Delete l3fwd-acl directory.

  l3fwd changes:
  > Add '-A' as an option for ACL processing.
  > Merge parsing options from l3fwd-acl and l3fwd. Retain l3fwd-acl definitions.
  > Move specific setup functions (setup_acl, setup_lpm and setup_hash).

Testing:
  > Compiled successfully for x86_64-native-linuxapp-gcc.
  > Tested LPM, EM and ACL basic functionality.
Signed-off-by: Ravi Kerur --- examples/l3fwd-acl/Makefile | 56 - examples/l3fwd-acl/main.c | 2079 - examples/l3fwd/Makefile |2 +- examples/l3fwd/l3fwd.h| 49 + examples/l3fwd/l3fwd_acl.c| 1064 +++ examples/l3fwd/l3fwd_acl.h| 263 + examples/l3fwd/l3fwd_acl_scalar.h | 182 examples/l3fwd/l3fwd_em.c | 14 +- examples/l3fwd/l3fwd_lpm.c| 23 +- examples/l3fwd/main.c | 209 ++-- 10 files changed, 1722 insertions(+), 2219 deletions(-) delete mode 100644 examples/l3fwd-acl/Makefile delete mode 100644 examples/l3fwd-acl/main.c create mode 100644 examples/l3fwd/l3fwd_acl.c create mode 100644 examples/l3fwd/l3fwd_acl.h create mode 100644 examples/l3fwd/l3fwd_acl_scalar.h diff --git a/examples/l3fwd-acl/Makefile b/examples/l3fwd-acl/Makefile deleted file mode 100644 index a3473a8..000 --- a/examples/l3fwd-acl/Makefile +++ /dev/null @@ -1,56 +0,0 @@ -# BSD LICENSE -# -# Copyright(c) 2010-2014 Intel Corporation. All rights reserved. -# All rights reserved. -# -# Redistribution and use in source and binary forms, with or without -# modification, are permitted provided that the following conditions -# are met: -# -# * Redistributions of source code must retain the above copyright -# notice, this list of conditions and the following disclaimer. -# * Redistributions in binary form must reproduce the above copyright -# notice, this list of conditions and the following disclaimer in -# the documentation and/or other materials provided with the -# distribution. -# * Neither the name of Intel Corporation nor the names of its -# contributors may be used to endorse or promote products derived -# from this software without specific prior written permission. -# -# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS -# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT -# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR -# A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT -# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, -# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT -# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, -# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY -# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT -# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE -# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. - -ifeq ($(RTE_SDK),) -$(error "Please define RTE_SDK environment variable") -endif - -# Default target, can be overriden by command line or environment -RTE_TARGET ?= x86_64-native-linuxapp-gcc - -include $(RTE_SDK)/mk/rte.vars.mk - -# binary name -APP = l3fwd-acl - -# all source are stored in SRCS-y -SRCS-y := main.c - -CFLAGS += -O3 -CFLAGS += $(WERROR_FLAGS) - -# workaround for a gcc bug with noreturn attribute -# http://gcc.gnu.org/bugzilla/show_bug.cgi?id=12603 -ifeq ($(CONFIG_RTE_TOOLCHAIN_GCC),y) -CFLAGS_main.o += -Wno-return-type -endif - -include $(RTE_SDK)/mk/rte.extapp.mk diff --git a/examples/l3fwd-acl/main.c b/examples/l3fwd-acl/main.c deleted file mode 100644 index 0e3daad..000 --- a/examples/l3fwd-a
[dpdk-dev] [PATCH v6 2/3] examples/l3fwd: add config file support for lpm
Add support to read from config file to build ipv4 and ipv6 longest prefix match forwarding tables. --- v6: > Change commit message format. v5: > Change is_bypass_line from inline to non-line. v4: > No changes. v3: > Fix additional checkpatch coding style issues. v2: > Fix checkpatch warnings related to code > MACRO GET_CB_FIELD checkpatch warning not fixed v1: > Remove static array configuration of Destination IP, MASK and IF_OUT for LPM and LPM6 config. > Add reading configuration from a file. > Format of configuration file is as follows #LPM route entries Dest-IP/Mask IF_OUT L1.1.1.0/24 0 L2.1.1.0/24 1 L3.1.1.0/24 2 ... #LPM6 route entries Dest-IP/Mask IF_OUT L:::::::/48 0 L2111:::::::/48 1 L3111:::::::/48 2 ... Signed-off-by: Ravi Kerur --- examples/l3fwd/l3fwd.h | 28 + examples/l3fwd/l3fwd_acl.c | 39 +- examples/l3fwd/l3fwd_acl.h | 29 - examples/l3fwd/l3fwd_lpm.c | 308 + examples/l3fwd/main.c | 47 ++- 5 files changed, 332 insertions(+), 119 deletions(-) diff --git a/examples/l3fwd/l3fwd.h b/examples/l3fwd/l3fwd.h index 93e08f6..aa4bd25 100644 --- a/examples/l3fwd/l3fwd.h +++ b/examples/l3fwd/l3fwd.h @@ -94,6 +94,29 @@ #define ACL_LEAD_CHAR ('@') #define ROUTE_LEAD_CHAR('R') #define COMMENT_LEAD_CHAR ('#') +#define LPM_LEAD_CHAR ('L') +#define EM_LEAD_CHAR ('E') + +#defineIPV6_ADDR_LEN 16 +#defineIPV6_ADDR_U16 (IPV6_ADDR_LEN / sizeof(uint16_t)) +#defineIPV6_ADDR_U32 (IPV6_ADDR_LEN / sizeof(uint32_t)) + +#define GET_CB_FIELD(in, fd, base, lim, dlm) do {\ + unsigned long val; \ + char *end; \ + errno = 0; \ + val = strtoul((in), &end, (base)); \ + if (errno != 0 || end[0] != (dlm) || val > (lim)) \ + return -EINVAL; \ + (fd) = (typeof(fd))val; \ + (in) = end + 1; \ +} while (0) + +struct parm_cfg { + const char *rule_ipv4_name; + const char *rule_ipv6_name; + int scalar; +}; struct mbuf_table { uint16_t len; @@ -134,6 +157,8 @@ extern xmm_t val_eth[RTE_MAX_ETHPORTS]; extern struct lcore_conf lcore_conf[RTE_MAX_LCORE]; +extern struct parm_cfg parm_config; + extern 
int numa_on; /**< NUMA is enabled by default. */ /* Send burst of packets on an output interface */ @@ -287,4 +312,7 @@ l3fwd_acl_set_rule_ipv6_name(const char *optarg); void l3fwd_acl_set_rule_ipv4_name(const char *optarg); +int +is_bypass_line(const char *buff); + #endif /* __L3_FWD_H__ */ diff --git a/examples/l3fwd/l3fwd_acl.c b/examples/l3fwd/l3fwd_acl.c index 388b978..66ed23d 100644 --- a/examples/l3fwd/l3fwd_acl.c +++ b/examples/l3fwd/l3fwd_acl.c @@ -147,10 +147,6 @@ struct rte_acl_field_def ipv4_defs[NUM_FIELDS_IPV4] = { }, }; -#defineIPV6_ADDR_LEN 16 -#defineIPV6_ADDR_U16 (IPV6_ADDR_LEN / sizeof(uint16_t)) -#defineIPV6_ADDR_U32 (IPV6_ADDR_LEN / sizeof(uint32_t)) - enum { PROTO_FIELD_IPV6, SRC1_FIELD_IPV6, @@ -297,12 +293,6 @@ static struct { const char cb_port_delim[] = ":"; -static struct { - const char *rule_ipv4_name; - const char *rule_ipv6_name; - int scalar; -} parm_config; - /* * Print and dump ACL/Route rules functions are defined in * following header file. @@ -316,27 +306,6 @@ static struct { #include "l3fwd_acl_scalar.h" /* - * API's called during initialization to setup ACL rules. - */ -void -l3fwd_acl_set_rule_ipv4_name(const char *optarg) -{ - parm_config.rule_ipv4_name = optarg; -} - -void -l3fwd_acl_set_rule_ipv6_name(const char *optarg) -{ - parm_config.rule_ipv6_name = optarg; -} - -void -l3fwd_acl_set_scalar(void) -{ - parm_config.scalar = 1; -} - -/* * Parses IPV6 address, exepcts the following format: * ::::::: (where X - is a hexedecimal digit). */ @@ -566,7 +535,7 @@ parse_cb_ipv4vlan_rule(char *str, struct rte_acl_rule *v, int has_userdata) } static int -add_rules(const char *rule_path, +acl_add_rules(const char *rule_path, struct rte_acl_rule **proute_base, unsigned int *proute_num, struct rte_acl_rule **pacl_base, @@ -764,8 +733,8 @@ setup_acl(const int socket_id __attribute__((unused))) dump_acl_config(); - /* Load rules from the input
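The route-file format described in the commit message (`L<Dest-IP>/<Mask> <IF_OUT>`) can be parsed with the same `strtoul`-and-delimiter check that `GET_CB_FIELD` performs. The following is only a minimal sketch of that idea, not the code from the patch: `struct lpm_route_sketch`, `get_field()` and `parse_lpm_route_line()` are hypothetical names, and the sketch assumes the caller has already stripped the trailing newline and skipped comment lines.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical container for one parsed LPM route; not from the patch. */
struct lpm_route_sketch {
	uint32_t ip;     /* destination prefix, host byte order */
	uint8_t depth;   /* prefix length, 0..32 */
	uint8_t if_out;  /* output port */
};

/*
 * Parse one unsigned field that must be followed by `dlm`, then advance
 * *in past the delimiter. Same checks as GET_CB_FIELD, written as a
 * function so that the error path is an ordinary return value.
 */
static int
get_field(const char **in, unsigned long *out, int base,
	  unsigned long lim, char dlm)
{
	char *end;
	unsigned long val;

	errno = 0;
	val = strtoul(*in, &end, base);
	if (errno != 0 || end == *in || *end != dlm || val > lim)
		return -EINVAL;
	*out = val;
	*in = (dlm == '\0') ? end : end + 1;
	return 0;
}

/* Parse a line such as "L1.1.1.0/24 0". */
static int
parse_lpm_route_line(const char *line, struct lpm_route_sketch *r)
{
	static const char dlm[4] = { '.', '.', '.', '/' };
	unsigned long v;
	uint32_t ip = 0;
	int i;

	if (line == NULL || line[0] != 'L')   /* LPM_LEAD_CHAR */
		return -EINVAL;
	line++;

	/* Four dotted-decimal octets; the last one ends at '/'. */
	for (i = 0; i < 4; i++) {
		if (get_field(&line, &v, 10, UINT8_MAX, dlm[i]) != 0)
			return -EINVAL;
		ip = (ip << 8) | (uint32_t)v;
	}

	/* Prefix length, then the output interface. */
	if (get_field(&line, &v, 10, 32, ' ') != 0)
		return -EINVAL;
	r->depth = (uint8_t)v;
	if (get_field(&line, &v, 10, UINT8_MAX, '\0') != 0)
		return -EINVAL;
	r->if_out = (uint8_t)v;
	r->ip = ip;
	return 0;
}
```

Writing the field parser as a function rather than a macro avoids the hidden `return` inside `GET_CB_FIELD`, which is the construct checkpatch complains about in the v2 changelog.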
[dpdk-dev] [PATCH v6 3/3] examples/l3fwd: add config file support for exact
Add support to read from config file to build ipv4 and ipv6 exact match forwarding tables. --- v6: > Change commit message format v5: > No changes. v4: > No changes. v3: > Fix additional checkpatch coding style issues. v2: > Fix checkpatch warnings. v1: > Remove static array configuration of Dest IP,Src IP, Dest port, Src port, Proto and IF_OUT for EM and EM6 config. > Add reading configuration from a file. > Format of configuration file is as follows #EM route entries, #Dest-IP Src-IP Dest-port Src-port Proto IF_OUT E101.0.0.0 100.10.0.0 101 11 0x06 0 E201.0.0.0 200.20.0.0 102 12 0x06 1 E111.0.0.0 211.30.0.0 101 11 0x06 2 ... #EM6 route entries #Dest-IP Src-IP Dest-port Src-port Proto IF_OUT Efe80::::021e:67ff:fe00: fe80::::021b:21ff:fe91:3805 101 11 0x06 0 Efe90::::021e:67ff:fe00: fe90::::021b:21ff:fe91:3805 102 12 0x06 1 ... Signed-off-by: Ravi Kerur --- examples/l3fwd/l3fwd_em.c | 376 +- 1 file changed, 303 insertions(+), 73 deletions(-) diff --git a/examples/l3fwd/l3fwd_em.c b/examples/l3fwd/l3fwd_em.c index 6fdabf7..cd6b443 100644 --- a/examples/l3fwd/l3fwd_em.c +++ b/examples/l3fwd/l3fwd_em.c @@ -95,8 +95,14 @@ union ipv4_5tuple_host { #define XMM_NUM_IN_IPV6_5TUPLE 3 struct ipv6_5tuple { - uint8_t ip_dst[IPV6_ADDR_LEN]; - uint8_t ip_src[IPV6_ADDR_LEN]; + union { + uint8_t ip_dst[IPV6_ADDR_LEN]; + uint32_t ip32_dst[4]; + }; + union { + uint8_t ip_src[IPV6_ADDR_LEN]; + uint32_t ip32_src[4]; + }; uint16_t port_dst; uint16_t port_src; uint8_t proto; @@ -116,47 +122,24 @@ union ipv6_5tuple_host { xmm_t xmm[XMM_NUM_IN_IPV6_5TUPLE]; }; - - -struct ipv4_l3fwd_em_route { - struct ipv4_5tuple key; - uint8_t if_out; +enum { + CB_FLD_DST_ADDR, + CB_FLD_SRC_ADDR, + CB_FLD_DST_PORT, + CB_FLD_SRC_PORT, + CB_FLD_PROTO, + CB_FLD_IF_OUT, + CB_FLD_MAX }; -struct ipv6_l3fwd_em_route { - struct ipv6_5tuple key; +struct em_rule { + union { + struct ipv4_5tuple v4_key; + struct ipv6_5tuple v6_key; + }; uint8_t if_out; }; -static struct ipv4_l3fwd_em_route 
ipv4_l3fwd_em_route_array[] = { - {{IPv4(101, 0, 0, 0), IPv4(100, 10, 0, 1), 101, 11, IPPROTO_TCP}, 0}, - {{IPv4(201, 0, 0, 0), IPv4(200, 20, 0, 1), 102, 12, IPPROTO_TCP}, 1}, - {{IPv4(111, 0, 0, 0), IPv4(100, 30, 0, 1), 101, 11, IPPROTO_TCP}, 2}, - {{IPv4(211, 0, 0, 0), IPv4(200, 40, 0, 1), 102, 12, IPPROTO_TCP}, 3}, -}; - -static struct ipv6_l3fwd_em_route ipv6_l3fwd_em_route_array[] = { - {{ - {0xfe, 0x80, 0, 0, 0, 0, 0, 0, 0x02, 0x1e, 0x67, 0xff, 0xfe, 0, 0, 0}, - {0xfe, 0x80, 0, 0, 0, 0, 0, 0, 0x02, 0x1b, 0x21, 0xff, 0xfe, 0x91, 0x38, 0x05}, - 101, 11, IPPROTO_TCP}, 0}, - - {{ - {0xfe, 0x90, 0, 0, 0, 0, 0, 0, 0x02, 0x1e, 0x67, 0xff, 0xfe, 0, 0, 0}, - {0xfe, 0x90, 0, 0, 0, 0, 0, 0, 0x02, 0x1b, 0x21, 0xff, 0xfe, 0x91, 0x38, 0x05}, - 102, 12, IPPROTO_TCP}, 1}, - - {{ - {0xfe, 0xa0, 0, 0, 0, 0, 0, 0, 0x02, 0x1e, 0x67, 0xff, 0xfe, 0, 0, 0}, - {0xfe, 0xa0, 0, 0, 0, 0, 0, 0, 0x02, 0x1b, 0x21, 0xff, 0xfe, 0x91, 0x38, 0x05}, - 101, 11, IPPROTO_TCP}, 2}, - - {{ - {0xfe, 0xb0, 0, 0, 0, 0, 0, 0, 0x02, 0x1e, 0x67, 0xff, 0xfe, 0, 0, 0}, - {0xfe, 0xb0, 0, 0, 0, 0, 0, 0, 0x02, 0x1b, 0x21, 0xff, 0xfe, 0x91, 0x38, 0x05}, - 102, 12, IPPROTO_TCP}, 3}, -}; - struct rte_hash *ipv4_l3fwd_em_lookup_struct[NB_SOCKETS]; struct rte_hash *ipv6_l3fwd_em_lookup_struct[NB_SOCKETS]; @@ -233,12 +216,6 @@ ipv6_hash_crc(const void *data, __rte_unused uint32_t data_len, return init_val; } -#define IPV4_L3FWD_EM_NUM_ROUTES \ - (sizeof(ipv4_l3fwd_em_route_array) / sizeof(ipv4_l3fwd_em_route_array[0])) - -#define IPV6_L3FWD_EM_NUM_ROUTES \ - (sizeof(ipv6_l3fwd_em_route_array) / sizeof(ipv6_l3fwd_em_route_array[0])) - static uint8_t ipv4_l3fwd_out_if[L3FWD_HASH_ENTRIES] __rte_cache_aligned; static uint8_t ipv6_l3fwd_out_if[L3FWD_HASH_ENTRIES] __rte_cache_aligned; @@ -338,6 +315,224 @@ em_get_ipv6_dst_port(void *ipv6_hdr, uint8_t portid, void *lookup_struct) #include "l3fwd_em.h" #endif +static int +em_parse_v6_addr(const char *in, const char **end, uint32_t v[IPV6_ADDR_U32], + char dlm) +{ + 
uint32_t addr[IPV6_ADDR_U16]; + + GET_CB_FIELD(in, addr[0], 16, UINT16_MAX, ':'); + GET_CB_FIELD(in, addr[1], 16, UINT16_MAX, ':'); + GET_CB_FIELD(in, addr[2], 16, UINT
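For the IPv4 exact-match entries, the line format from the commit message (`E<Dest-IP> <Src-IP> <Dest-port> <Src-port> <Proto> <IF_OUT>`) can also be read with standard library calls. This is a sketch under stated assumptions, not the parser from the patch (which uses `GET_CB_FIELD`): `struct em_rule_sketch` and `parse_em4_line()` are hypothetical names, and `sscanf` plus POSIX `inet_pton` are swapped in for brevity.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical flat view of one IPv4 exact-match rule. */
struct em_rule_sketch {
	uint32_t ip_dst;    /* host byte order */
	uint32_t ip_src;    /* host byte order */
	uint16_t port_dst;
	uint16_t port_src;
	uint8_t proto;
	uint8_t if_out;
};

/* Parse a line such as "E101.0.0.0 100.10.0.0 101 11 0x06 0". */
static int
parse_em4_line(const char *line, struct em_rule_sketch *r)
{
	char dst[16], src[16];
	unsigned int dport, sport, if_out;
	int proto;
	struct in_addr a_dst, a_src;

	/* %i accepts both decimal and 0x-prefixed protocol numbers. */
	if (line == NULL ||
	    sscanf(line, "E%15s %15s %u %u %i %u",
		   dst, src, &dport, &sport, &proto, &if_out) != 6)
		return -EINVAL;

	/* inet_pton rejects anything that is not a full dotted quad. */
	if (inet_pton(AF_INET, dst, &a_dst) != 1 ||
	    inet_pton(AF_INET, src, &a_src) != 1)
		return -EINVAL;

	if (dport > UINT16_MAX || sport > UINT16_MAX ||
	    proto < 0 || proto > UINT8_MAX || if_out > UINT8_MAX)
		return -EINVAL;

	r->ip_dst = ntohl(a_dst.s_addr);
	r->ip_src = ntohl(a_src.s_addr);
	r->port_dst = (uint16_t)dport;
	r->port_src = (uint16_t)sport;
	r->proto = (uint8_t)proto;
	r->if_out = (uint8_t)if_out;
	return 0;
}
```

The range checks matter because the hash key stores ports in 16 bits and the protocol in 8 bits; a silently truncated value would produce a rule that never matches.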
Re: [dpdk-dev] [PATCH v2 00/13] introduce fail-safe PMD
On Fri, Mar 10, 2017 at 10:13:32AM +0100, Gaëtan Rivet wrote:
> On Thu, Mar 09, 2017 at 09:15:14AM +, Bruce Richardson wrote:
> > On Wed, Mar 08, 2017 at 11:54:02AM -0500, Neil Horman wrote:
> > > On Wed, Mar 08, 2017 at 04:15:33PM +0100, Gaetan Rivet wrote:
> > > > This PMD intercepts and manages Ethernet device removal events issued by
> > > > slave PMDs and re-initializes them transparently when brought back so that
> > > > existing applications do not need to be modified to benefit from true
> > > > hot-plugging support.
> > > >
> > > > The stacked PMD approach shares many similarities with the bonding PMD but
> > > > with a different purpose. While bonding provides the ability to group
> > > > several links into a single logical device for enhanced throughput and
> > > > supports fail-over at link level, this one manages the sudden disappearance
> > > > of the underlying device; it guarantees applications face a valid device in
> > > > working order at all times.
> > > >
> > > Why not just add this feature to the bonding pmd then? A bond is perfectly
> > > capable of handling the trivial case of a single underlying device, and adding
> > > an option to make the underlying slave 'persistent' seems both much simpler in
> > > terms of implementation and code size than adding an entire new pmd, along
> > > with its supporting code.
> > >
> > > Neil
> > >
>
> @Neil
> I don't know if you saw my answer to Bruce on the matter [1], it
> partially addresses your point.
>
I did, and I think it asserts your points, but doesn't really address mine. See below.

> > +1
> > I don't like the idea of having multiple PMDs in DPDK to handle
> > combining multiple other devices into one.
> >
> > /Bruce
>
> I understand the concern. Let's first put aside for the moment the link
> grouping, which is only part of the fail-safe PMD function.
>
> The fail-safe PMD, at its core, provides an alternative paradigm, a new
> proposal for a hot-plug functionality in a lightweight form-factor from a
> user standpoint.
>
Ok, but let's be clear here, you're duplicating a lot of functionality. And for that privilege, there will be an additional 4000 lines of code to maintain.

> The central question that I would like to tackle is this: why should we
> require from our users declaring a bonding device to have hot-plug support?
>
Well, strictly speaking, I suppose we don't have to require it. But by that same token, we don't need to do it in a separate PMD either; there are lots of other options.

> I took some time to illustrate a few modes of operation:
>
> [Fig. 1: application on top of a bonding device over ixgbe and mlx4;
>  the application does init, conf, Rx/Tx, bonding does conf, link check, Rx/Tx]
> Typical link fail-over.
>
> [Fig. 2: application on top of a fail-safe device over ixgbe and mlx4;
>  the application does init, conf, Rx/Tx, fail-safe does init, conf, dev check, Rx/Tx]
> Typical automatic hot-plug handling with device fail-over.
>
> [Fig. 3: application on top of a bonding device over two fail-safe devices,
>  one over ixgbe and one over mlx4]
> Combination to provide link fail-over with automatic hot-plug handling.
>
> [Fig. 4: application on top of a bonding device over two fail-safe devices;
>  the remainder of the diagram is cut off in the archive]
[dpdk-dev] [PATCH v2 0/2] net/mlx5: add enhanced multi-packet send for ConnectX-5
This patchset adds the Enhanced Multi-Packet Send feature, which is newly introduced for the ConnectX-5 family of adapters.

v2:
* Resolves conflicts with other patches in review.
* Improved performance by relocating code segment.
* Changes default values of PMD options.
* Fixed comments in the code.

Yongseok Koh (2):
  net/mlx5: add enhanced multi-packet send for ConnectX-5
  doc: update PMD options for mlx5

 doc/guides/nics/mlx5.rst       |  31 +++-
 drivers/net/mlx5/mlx5.c        |  37 +++-
 drivers/net/mlx5/mlx5.h        |   4 +-
 drivers/net/mlx5/mlx5_defs.h   |   7 +
 drivers/net/mlx5/mlx5_ethdev.c |   6 +-
 drivers/net/mlx5/mlx5_prm.h    |  20 ++
 drivers/net/mlx5/mlx5_rxtx.c   | 410 +
 drivers/net/mlx5/mlx5_rxtx.h   |   7 +-
 drivers/net/mlx5/mlx5_txq.c    |  28 ++-
 9 files changed, 533 insertions(+), 17 deletions(-)

--
2.11.0
[dpdk-dev] [PATCH v2 1/2] net/mlx5: add enhanced multi-packet send for ConnectX-5
ConnectX-5 supports enhanced version of multi-packet send (MPS). An MPS Tx descriptor can carry multiple packets either by including pointers of packets or by inlining packets. Inlining packet data can be helpful to better utilize PCIe bandwidth. In addition, Enhanced MPS supports hybrid mode - mixing inlined packets and pointers in a descriptor. This feature is enabled by default if supported by HW. Signed-off-by: Yongseok Koh --- drivers/net/mlx5/mlx5.c| 37 +++- drivers/net/mlx5/mlx5.h| 4 +- drivers/net/mlx5/mlx5_defs.h | 7 + drivers/net/mlx5/mlx5_ethdev.c | 6 +- drivers/net/mlx5/mlx5_prm.h| 20 ++ drivers/net/mlx5/mlx5_rxtx.c | 410 + drivers/net/mlx5/mlx5_rxtx.h | 7 +- drivers/net/mlx5/mlx5_txq.c| 28 ++- 8 files changed, 506 insertions(+), 13 deletions(-) diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c index 6f42948ab..5293f053e 100644 --- a/drivers/net/mlx5/mlx5.c +++ b/drivers/net/mlx5/mlx5.c @@ -84,6 +84,12 @@ /* Device parameter to enable multi-packet send WQEs. */ #define MLX5_TXQ_MPW_EN "txq_mpw_en" +/* Device parameter to include 2 dsegs in the title WQEBB. */ +#define MLX5_TXQ_MPW_HDR_DSEG_EN "txq_mpw_hdr_dseg_en" + +/* Device parameter to limit the size of inlining packet. */ +#define MLX5_TXQ_MAX_INLINE_LEN "txq_max_inline_len" + /* Device parameter to enable hardware TSO offload. */ #define MLX5_TSO "tso" @@ -292,7 +298,11 @@ mlx5_args_check(const char *key, const char *val, void *opaque) } else if (strcmp(MLX5_TXQS_MIN_INLINE, key) == 0) { priv->txqs_inline = tmp; } else if (strcmp(MLX5_TXQ_MPW_EN, key) == 0) { - priv->mps &= !!tmp; /* Enable MPW only if HW supports */ + priv->mps = !!tmp ? 
priv->mps : MLX5_MPW_DISABLED; + } else if (strcmp(MLX5_TXQ_MPW_HDR_DSEG_EN, key) == 0) { + priv->mpw_hdr_dseg = !!tmp; + } else if (strcmp(MLX5_TXQ_MAX_INLINE_LEN, key) == 0) { + priv->txq_max_inline_len = tmp; } else if (strcmp(MLX5_TSO, key) == 0) { priv->tso = !!tmp; } else { @@ -321,6 +331,8 @@ mlx5_args(struct priv *priv, struct rte_devargs *devargs) MLX5_TXQ_INLINE, MLX5_TXQS_MIN_INLINE, MLX5_TXQ_MPW_EN, + MLX5_TXQ_MPW_HDR_DSEG_EN, + MLX5_TXQ_MAX_INLINE_LEN, MLX5_TSO, NULL, }; @@ -432,24 +444,27 @@ mlx5_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev) switch (pci_dev->id.device_id) { case PCI_DEVICE_ID_MELLANOX_CONNECTX4: tunnel_en = 1; - mps = 0; + mps = MLX5_MPW_DISABLED; break; case PCI_DEVICE_ID_MELLANOX_CONNECTX4LX: + mps = MLX5_MPW; + break; case PCI_DEVICE_ID_MELLANOX_CONNECTX5: case PCI_DEVICE_ID_MELLANOX_CONNECTX5VF: case PCI_DEVICE_ID_MELLANOX_CONNECTX5EX: case PCI_DEVICE_ID_MELLANOX_CONNECTX5EXVF: - mps = 1; tunnel_en = 1; + mps = MLX5_MPW_ENHANCED; break; default: - mps = 0; + mps = MLX5_MPW_DISABLED; } INFO("PCI information matches, using device \"%s\"" -" (SR-IOV: %s, MPS: %s)", +" (SR-IOV: %s, %sMPS: %s)", list[i]->name, sriov ? "true" : "false", -mps ? "true" : "false"); +mps == MLX5_MPW_ENHANCED ? "Enhanced " : "", +mps != MLX5_MPW_DISABLED ? "true" : "false"); attr_ctx = ibv_open_device(list[i]); err = errno; break; @@ -544,6 +559,13 @@ mlx5_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev) priv->pd = pd; priv->mtu = ETHER_MTU; priv->mps = mps; /* Enable MPW by default if supported. */ + /* Set default values for Enhanced MPW, a.k.a MPWv2. */ + if (mps == MLX5_MPW_ENHANCED) { + priv->mpw_hdr_dseg = 0; + priv->txqs_inline = MLX5_EMPW_MIN_TXQS; + priv->txq_max_inline_len = MLX5_EMPW_MAX_INLINE_LEN; + priv->txq_inline = MLX5_WQE_SIZE_MAX - MLX5_WQE_SIZE; + } priv->cqe_comp = 1; /* Enable compression by default. 
*/ priv->tunnel_en = tunnel_en; err = mlx5_args(priv, pci_dev->device.devargs); @@ -611,6 +633,9 @@ mlx5_pci_probe(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev) "with TSO. MPS disabled");
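The change from `priv->mps &= !!tmp` to a conditional follows from `mps` no longer being a boolean. A minimal sketch of why, assuming the three modes are encoded as 0, 1 and 2 (the numeric values and the `*_SKETCH` names are assumptions for illustration; the patch only shows the names `MLX5_MPW_DISABLED`, `MLX5_MPW` and `MLX5_MPW_ENHANCED`):

```c
/* Hypothetical encoding of the three multi-packet send modes. */
enum mpw_mode_sketch {
	MPW_DISABLED_SKETCH = 0,
	MPW_LEGACY_SKETCH = 1,
	MPW_ENHANCED_SKETCH = 2
};

/*
 * Apply the txq_mpw_en devarg on top of what the hardware supports:
 * a nonzero value keeps the hardware-detected mode, zero disables MPW.
 */
static enum mpw_mode_sketch
apply_txq_mpw_en(enum mpw_mode_sketch hw_mode, int devarg_value)
{
	return devarg_value ? hw_mode : MPW_DISABLED_SKETCH;
}

/*
 * The old boolean-style mask, kept only to show the failure mode:
 * with the encoding above, 2 & 1 == 0, so "enable" would turn
 * Enhanced MPW off.
 */
static int
old_mask_style(int hw_mode, int devarg_value)
{
	return hw_mode & !!devarg_value;
}
```

With this encoding, `old_mask_style(MPW_ENHANCED_SKETCH, 1)` evaluates to 0, which is the behaviour the conditional form avoids.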
[dpdk-dev] [PATCH v2 2/2] doc: update PMD options for mlx5
Enhanced multi-packet send mode is newly introduced for ConnectX-5 families of adaptors. Signed-off-by: Yongseok Koh --- doc/guides/nics/mlx5.rst | 31 +++ 1 file changed, 27 insertions(+), 4 deletions(-) diff --git a/doc/guides/nics/mlx5.rst b/doc/guides/nics/mlx5.rst index 41f3a472e..0783aebdd 100644 --- a/doc/guides/nics/mlx5.rst +++ b/doc/guides/nics/mlx5.rst @@ -183,10 +183,17 @@ Run-time configuration - ``txq_mpw_en`` parameter [int] - A nonzero value enables multi-packet send. This feature allows the TX - burst function to pack up to five packets in two descriptors in order to - save PCI bandwidth and improve performance at the cost of a slightly - higher CPU usage. + A nonzero value enables multi-packet send (MPS) for ConnectX-4 Lx and + enhanced multi-packet send (Enhanced MPS) for ConnectX-5. MPS allows the + TX burst function to pack up multiple packets in a single descriptor + session in order to save PCI bandwidth and improve performance at the + cost of a slightly higher CPU usage. When ``txq_inline`` is set along + with ``txq_mpw_en``, TX burst function tries to copy entire packet data + on to TX descriptor instead of including pointer of packet only if there + is enough room remained in the descriptor. ``txq_inline`` sets + per-descriptor space for either pointers or inlined packets. In addition, + Enhanced MPS supports hybrid mode - mixing inlined packets and pointers + in the same descriptor. This option cannot be used in conjunction with ``tso`` below. When ``tso`` is set, ``txq_mpw_en`` is disabled. @@ -194,6 +201,22 @@ Run-time configuration It is currently only supported on the ConnectX-4 Lx and ConnectX-5 families of adapters. Enabled by default. +- ``txq_mpw_hdr_dseg_en`` parameter [int] + + A nonzero value enables including two pointers in the first block of TX + descriptor. This can be used to lessen CPU load for memory copy. + + Effective only when Enhanced MPS is supported. Disabled by default. 
+ +- ``txq_max_inline_len`` parameter [int] + + Maximum size of packet to be inlined. This limits the size of packet to + be inlined. If the size of a packet is larger than configured value, the + packet isn't inlined even though there's enough space remained in the + descriptor. Instead, the packet is included with pointer. + + Effective only when Enhanced MPS is supported. The default value is 256. + - ``tso`` parameter [int] A nonzero value enables hardware TSO. -- 2.11.0
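The documented behaviour of ``txq_max_inline_len`` amounts to a two-part test per packet: the packet must not exceed the configured limit, and it must fit in the space left in the descriptor. The following is an illustrative sketch of that rule only; `should_inline()` and its parameters are hypothetical and do not correspond to a function in the driver, and treating a packet exactly at the limit as inlinable is an assumption.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Decide whether a packet is copied into the TX descriptor (inlined)
 * or referenced by pointer.
 *
 * pkt_len:        length of the packet in bytes
 * room_left:      bytes still free in the current descriptor
 * max_inline_len: value of the txq_max_inline_len parameter
 */
static bool
should_inline(size_t pkt_len, size_t room_left, size_t max_inline_len)
{
	/* Larger than the limit: use a pointer even if there is room. */
	if (pkt_len > max_inline_len)
		return false;
	/* Within the limit: inline only if the descriptor has space. */
	return pkt_len <= room_left;
}
```

With the default limit of 256, a 128-byte packet is inlined when at least 128 bytes remain, while a 300-byte packet is always sent by pointer.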
[dpdk-dev] DPDK support for Emulex OneConnect OCe14000
Hi, I was looking through the dpdk-dev archives to see if there was an answer to my question about support for Emulex. I only see one reference to Emulex and so I wanted to ask -- is there support? I notice that under the list of supported NICs, oce (for Emulex) is listed under 'Attic', but when I click on the oce link, I get a page that references 6Wind. It is my understanding that 6Wind was supposed to develop a PMD for oce back in 2014, but it's unclear to me whether that's included in top-of-tree sources or was ever actually developed. Looking through those sources, I do not see an oce driver. Can someone please clarify what 'Attic' means with regard to 'oce' and whether there is planned support for it or a roadmap for support? Thanks in advance, Jordan Rhody
[dpdk-dev] [PATCH] net/virtio-user: support changing tap interface name
This patch adds a new option 'iface' to change the interface name of tap device with vhost-kernel as backend. Signed-off-by: Wenfeng Liu --- drivers/net/virtio/virtio_user/virtio_user_dev.c | 12 drivers/net/virtio/virtio_user/virtio_user_dev.h | 2 +- drivers/net/virtio/virtio_user_ethdev.c | 24 +--- 3 files changed, 30 insertions(+), 8 deletions(-) diff --git a/drivers/net/virtio/virtio_user/virtio_user_dev.c b/drivers/net/virtio/virtio_user/virtio_user_dev.c index 21ed00d..e7fd65f 100644 --- a/drivers/net/virtio/virtio_user/virtio_user_dev.c +++ b/drivers/net/virtio/virtio_user/virtio_user_dev.c @@ -193,9 +193,6 @@ int virtio_user_stop_device(struct virtio_user_dev *dev) for (i = 0; i < dev->max_queue_pairs; ++i) dev->ops->enable_qp(dev, i, 0); - free(dev->ifname); - dev->ifname = NULL; - return 0; } @@ -268,7 +265,7 @@ int virtio_user_stop_device(struct virtio_user_dev *dev) int virtio_user_dev_init(struct virtio_user_dev *dev, char *path, int queues, -int cq, int queue_size, const char *mac) +int cq, int queue_size, const char *mac, char **ifname) { snprintf(dev->path, PATH_MAX, "%s", path); dev->max_queue_pairs = queues; @@ -277,6 +274,11 @@ int virtio_user_stop_device(struct virtio_user_dev *dev) dev->mac_specified = 0; parse_mac(dev, mac); + if (*ifname) { + dev->ifname = *ifname; + *ifname = NULL; + } + if (virtio_user_dev_setup(dev) < 0) { PMD_INIT_LOG(ERR, "backend set up fails"); return -1; @@ -327,6 +329,8 @@ int virtio_user_stop_device(struct virtio_user_dev *dev) free(dev->vhostfds); free(dev->tapfds); } + + free(dev->ifname); } static uint8_t diff --git a/drivers/net/virtio/virtio_user/virtio_user_dev.h b/drivers/net/virtio/virtio_user/virtio_user_dev.h index 0d39f40..6ecb91e 100644 --- a/drivers/net/virtio/virtio_user/virtio_user_dev.h +++ b/drivers/net/virtio/virtio_user/virtio_user_dev.h @@ -69,7 +69,7 @@ struct virtio_user_dev { int virtio_user_start_device(struct virtio_user_dev *dev); int virtio_user_stop_device(struct virtio_user_dev *dev); int 
virtio_user_dev_init(struct virtio_user_dev *dev, char *path, int queues, -int cq, int queue_size, const char *mac); +int cq, int queue_size, const char *mac, char **ifname); void virtio_user_dev_uninit(struct virtio_user_dev *dev); void virtio_user_handle_cq(struct virtio_user_dev *dev, uint16_t queue_idx); #endif diff --git a/drivers/net/virtio/virtio_user_ethdev.c b/drivers/net/virtio/virtio_user_ethdev.c index 0b226ac..16d1526 100644 --- a/drivers/net/virtio/virtio_user_ethdev.c +++ b/drivers/net/virtio/virtio_user_ethdev.c @@ -243,6 +243,8 @@ VIRTIO_USER_ARG_PATH, #define VIRTIO_USER_ARG_QUEUE_SIZE "queue_size" VIRTIO_USER_ARG_QUEUE_SIZE, +#define VIRTIO_USER_ARG_INTERFACE_NAME "iface" + VIRTIO_USER_ARG_INTERFACE_NAME, NULL }; @@ -259,6 +261,9 @@ *(char **)extra_args = strdup(value); + if (!*(char **)extra_args) + return -ENOMEM; + return 0; } @@ -347,6 +352,7 @@ uint64_t cq = VIRTIO_USER_DEF_CQ_EN; uint64_t queue_size = VIRTIO_USER_DEF_Q_SZ; char *path = NULL; + char *ifname = NULL; char *mac_addr = NULL; int ret = -1; @@ -375,6 +381,15 @@ goto end; } + if (rte_kvargs_count(kvlist, VIRTIO_USER_ARG_INTERFACE_NAME) == 1) { + if (rte_kvargs_process(kvlist, VIRTIO_USER_ARG_INTERFACE_NAME, + &get_string_arg, &ifname) < 0) { + PMD_INIT_LOG(ERR, "error to parse %s", +VIRTIO_USER_ARG_INTERFACE_NAME); + goto end; + } + } + if (rte_kvargs_count(kvlist, VIRTIO_USER_ARG_MAC) == 1) { if (rte_kvargs_process(kvlist, VIRTIO_USER_ARG_MAC, &get_string_arg, &mac_addr) < 0) { @@ -413,7 +428,7 @@ cq = 1; } - if (queues > 1 && cq == 0) { + if (queues > 1 && cq == VIRTIO_USER_DEF_CQ_EN) { PMD_INIT_LOG(ERR, "multi-q requires ctrl-q"); goto end; } @@ -426,7 +441,7 @@ hw = eth_dev->data->dev_private; if (virtio_user_dev_init(hw->virtio_user_dev, path, queues, cq, -queue_size, mac_addr) < 0) { +queue_size, mac_addr, &ifname) < 0) { PMD_INIT_LOG(ERR, "virtio_user_dev_init fails"); virtio_user_eth_dev_free(eth_dev); goto end; @@ -447,6 +462,8 @@ free(path); if (mac_addr) free(mac_addr); 
+ if
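The `char **ifname` parameter lets the init function take ownership of the heap-allocated name: it stores the pointer and clears the caller's copy, so the caller's cleanup path does not free the buffer a second time. A minimal sketch of that hand-over pattern, using hypothetical `dev_sketch`, `dev_init_sketch()` and `dev_uninit_sketch()` names rather than the real virtio-user structures:

```c
#include <stdlib.h>

/* Hypothetical stand-in for the device structure. */
struct dev_sketch {
	char *ifname;   /* owned by the device once init has run */
};

/*
 * Take ownership of *ifname if one was supplied. After this call the
 * caller's pointer is NULL, so an unconditional free() in the caller's
 * exit path is harmless.
 */
static void
dev_init_sketch(struct dev_sketch *dev, char **ifname)
{
	dev->ifname = NULL;
	if (*ifname) {
		dev->ifname = *ifname;
		*ifname = NULL;
	}
}

/* Release the name exactly once, at teardown rather than at stop. */
static void
dev_uninit_sketch(struct dev_sketch *dev)
{
	free(dev->ifname);
	dev->ifname = NULL;
}
```

Freeing in the uninit path instead of the stop path, as the patch does, means the configured name survives a stop/start cycle.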
[dpdk-dev] [PATCH] table: fix hash_ext stats update
Fixed stats double update. Signed-off-by: Aleksey Katargin --- lib/librte_table/rte_table_hash_ext.c | 16 ++-- 1 file changed, 10 insertions(+), 6 deletions(-) diff --git a/lib/librte_table/rte_table_hash_ext.c b/lib/librte_table/rte_table_hash_ext.c index e283a3d..353f930 100644 --- a/lib/librte_table/rte_table_hash_ext.c +++ b/lib/librte_table/rte_table_hash_ext.c @@ -444,7 +444,6 @@ static int rte_table_hash_ext_lookup_unoptimized( uint64_t pkts_mask_out = 0; __rte_unused uint32_t n_pkts_in = __builtin_popcountll(pkts_mask); - RTE_TABLE_HASH_EXT_STATS_PKTS_IN_ADD(t, n_pkts_in); for ( ; pkts_mask; ) { struct bucket *bkt0, *bkt; @@ -490,7 +489,6 @@ static int rte_table_hash_ext_lookup_unoptimized( } *lookup_hit_mask = pkts_mask_out; - RTE_TABLE_HASH_EXT_STATS_PKTS_LOOKUP_MISS(t, n_pkts_in - __builtin_popcountll(pkts_mask_out)); return 0; } @@ -874,9 +872,12 @@ static int rte_table_hash_ext_lookup( RTE_TABLE_HASH_EXT_STATS_PKTS_IN_ADD(t, n_pkts_in); /* Cannot run the pipeline with less than 7 packets */ - if (__builtin_popcountll(pkts_mask) < 7) - return rte_table_hash_ext_lookup_unoptimized(table, pkts, + if (__builtin_popcountll(pkts_mask) < 7) { + status = rte_table_hash_ext_lookup_unoptimized(table, pkts, pkts_mask, lookup_hit_mask, entries, 0); + RTE_TABLE_HASH_EXT_STATS_PKTS_LOOKUP_MISS(t, n_pkts_in - __builtin_popcountll(*lookup_hit_mask)); + return status; + } /* Pipeline stage 0 */ lookup2_stage0(t, g, pkts, pkts_mask, pkt00_index, pkt01_index); @@ -1007,9 +1008,12 @@ static int rte_table_hash_ext_lookup_dosig( RTE_TABLE_HASH_EXT_STATS_PKTS_IN_ADD(t, n_pkts_in); /* Cannot run the pipeline with less than 7 packets */ - if (__builtin_popcountll(pkts_mask) < 7) - return rte_table_hash_ext_lookup_unoptimized(table, pkts, + if (__builtin_popcountll(pkts_mask) < 7) { + status = rte_table_hash_ext_lookup_unoptimized(table, pkts, pkts_mask, lookup_hit_mask, entries, 1); + RTE_TABLE_HASH_EXT_STATS_PKTS_LOOKUP_MISS(t, n_pkts_in - 
__builtin_popcountll(*lookup_hit_mask)); + return status; + } /* Pipeline stage 0 */ lookup2_stage0(t, g, pkts, pkts_mask, pkt00_index, pkt01_index); -- 2.1.4
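The double update came from both the optimized entry point and the unoptimized fallback incrementing the same counters. The sketch below illustrates the fixed arrangement with hypothetical names (`stats_sketch`, `lookup_slow_sketch()`, `lookup_sketch()`); the fallback is reduced to a mask intersection and stands in for the real bucket walk. It relies on the GCC/Clang builtin `__builtin_popcountll`, as the original code does.

```c
#include <stdint.h>

/* Hypothetical counters mirroring the table statistics. */
struct stats_sketch {
	uint64_t n_pkts_in;
	uint64_t n_pkts_lookup_miss;
};

/*
 * Fallback path: computes the hit mask and touches no counters.
 * `table_mask` is a stand-in for "which packets are in the table".
 */
static int
lookup_slow_sketch(uint64_t pkts_mask, uint64_t table_mask,
		   uint64_t *hit_mask)
{
	*hit_mask = pkts_mask & table_mask;
	return 0;
}

/*
 * Entry point: counts input packets once, delegates small bursts to
 * the fallback, then counts misses once from the returned hit mask.
 */
static int
lookup_sketch(struct stats_sketch *s, uint64_t pkts_mask,
	      uint64_t table_mask, uint64_t *hit_mask)
{
	uint32_t n_pkts_in = (uint32_t)__builtin_popcountll(pkts_mask);
	int status;

	s->n_pkts_in += n_pkts_in;
	status = lookup_slow_sketch(pkts_mask, table_mask, hit_mask);
	s->n_pkts_lookup_miss +=
		n_pkts_in - (uint32_t)__builtin_popcountll(*hit_mask);
	return status;
}
```

Keeping all accounting in the caller means the fallback can be reused by several entry points without each reuse inflating the totals.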