[dpdk-dev] Newbie question about distributor library
Hi, the distributor was running as part of the Rx core. Thanks, Reshma > -Original Message- > From: Wang, Shawn [mailto:xingbow at amazon.com] > Sent: Friday, October 10, 2014 10:06 PM > To: Pattan, Reshma; dev at dpdk.org > Subject: RE: Newbie question about distributor library > > 15.5 Mpps is amazing. > Does the RX core run the distributor, or is there another distributor core? > > Thanks. > > From: Pattan, Reshma [reshma.pattan at intel.com] > Sent: Friday, October 10, 2014 12:47 AM > To: Wang, Shawn; dev at dpdk.org > Subject: RE: Newbie question about distributor library > > > -Original Message- > > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Wang, Shawn > > Sent: Thursday, October 9, 2014 7:00 PM > > To: dev at dpdk.org > > Subject: [dpdk-dev] Newbie question about distributor library > > > > Hi: > > > > I am reading the document about the distributor library, which was added in DPDK 1.7. > > The document mentions that packets are dynamically load balanced > > between a set of worker cores. > > So I am wondering: is this the only reason we need the distributor library? > > What else could it give us? > > Do we have any performance numbers for this new library? > > > > Thanks. > > Hi, > > The distributor library only takes care of distributing load to workers based on flow type, > i.e. based on the RSS value of the mbuf (which is calculated from the 5-tuple of the > packet). Packets with the same RSS value will be given to the same worker. Hence > different flows go to different workers. > > 15.5 Mpps was the performance we achieved with 8 worker cores, 1 Rx core > and 1 TX core using the sample application. > > Thanks, > Reshma >
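To illustrate the flow-to-worker mapping described in this thread, here is a minimal C sketch in which packets carrying the same 32-bit flow hash always land on the same worker. It is a simplified stand-in, not the distributor library's actual algorithm or API: the names sketch_pick_worker and sketch_count_per_worker and the plain modulo mapping are assumptions made for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not a DPDK call): map a flow hash to a worker index.
 * Equal hashes always yield the same index, so a flow stays on one worker. */
unsigned sketch_pick_worker(uint32_t flow_hash, unsigned n_workers)
{
    assert(n_workers > 0);
    return flow_hash % n_workers;
}

/* Hypothetical helper: count how many packets of a burst each worker
 * would receive under the mapping above. counts must hold n_workers entries. */
void sketch_count_per_worker(const uint32_t *hashes, size_t n_pkts,
                             unsigned *counts, unsigned n_workers)
{
    size_t i;

    for (i = 0; i < n_workers; i++)
        counts[i] = 0;
    for (i = 0; i < n_pkts; i++)
        counts[sketch_pick_worker(hashes[i], n_workers)]++;
}
```

With 8 workers, hashes 0x10 and 0x18 both map to worker 0, which shows that a static mapping like this one keeps each flow on a single worker but can still place several flows on the same worker.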
[dpdk-dev] [PATCH 0/7] Patches to split architecture specific operations from DPDK
Cyril, Thanks for your comments! You are right. SSE needs to be split out. The current split is not yet complete. I'll continue to contribute. Best Regards! -- Chao Zhu Research Staff Member Cloud Infrastructure and Technology Group IBM China Research Lab Building 19 Zhongguancun Software Park 8 Dongbeiwang West Road, Haidian District, Beijing, PRC. 100193 Tel: +86-10-58748711 Email: bjzhuc at cn.ibm.com From: Cyril Chemparathy To: Chao CH Zhu/China/IBM at IBMCN, Date: 2014/10/07 05:39 Subject:Re: [dpdk-dev] [PATCH 0/7] Patches to split architecture specific operations from DPDK On 9/26/2014 2:33 AM, Chao Zhu wrote: > The set of patches split x86 architecture specific operations from DPDK and put them to the > arch directories of i686 and x86_64 architecture. This will make the adoption of DPDK much easier > on other computer architectures. For a new architecture, just add an architecture specific > directory and necessary building configuration files, then DPDK can support it. Wouldn't the SSE specifics in rte_common.h and rte_common_vect.h need to be similarly split out into architecture specifics? Thanks -- Cyril.
[dpdk-dev] [PATCH 0/7] Patches to split architecture specific operations from DPDK
David, I agree that your idea may be better for the splitting. However, as Bruce said, I think people would like to see the multi-architecture support feature of DPDK first. We can improve it gradually. Do you have any comments? Best Regards! -- Chao Zhu From: Bruce Richardson To: David Marchand Cc: Chao CH Zhu/China/IBM at IBMCN, "dev at dpdk.org" Date: 2014/10/03 21:28 Subject:Re: [dpdk-dev] [PATCH 0/7] Patches to split architecture specific operations from DPDK On Fri, Oct 03, 2014 at 03:21:53PM +0200, David Marchand wrote: > Hello Chao, > > On Fri, Sep 26, 2014 at 11:33 AM, Chao Zhu wrote: > > > The set of patches split x86 architecture specific operations from DPDK > > and put them to the > > arch directories of i686 and x86_64 architecture. This will make the > > adoption of DPDK much easier > > on other computer architectures. For a new architecture, just add an > > architecture specific > > directory and necessary building configuration files, then DPDK can > > support it. > > > > > Here is a different approach for the headers splitting. > > If we are going to support multiple architectures, the best would be to > have a specific header for each arch which implements a common API (no need > for any _arch suffix). > These headers would be located in lib/librte_eal/common/include/arch/$arch/ > rather than lib/librte_eal/common/include/$arch/arch/ (which looks odd to > me). > Makefiles can add some -I for dpdk to build itself (and we can remove those > symlinks from the makefiles). > Makefiles only install the specific headers in RTE_SDK/include for use by > applications. > > For common code and documentation, we can add a "generic" directory in > lib/librte_eal/common/include (or "arch-generic", or "shared" ... any > better idea ?). > DPDK makefiles install the generic headers in RTE_SDK/include/generic. > arch headers (like rte_atomic.h) include the generic one > (). > > These generic headers can be implemented using compiler intrinsics when > possible. 
> They also include the doxygen stuff in a single place. > > > This would look like something like this, for rte_atomic.h : > - in DPDK sources > $ ls lib/librte_eal/common/include/*/rte_atomic.h > lib/librte_eal/common/include/i686/rte_atomic.h > lib/librte_eal/common/include/x86_64/rte_atomic.h > lib/librte_eal/common/include/generic/rte_atomic.h > > - in installed RTE_SDK > $ ls RTE_SDK/include/{,*/}rte_atomic.h > RTE_SDK/include/rte_atomic.h > RTE_SDK/include/generic/rte_atomic.h > > Comments ? > > > I am only focusing on the first patchset at the moment, but if we can find > consensus here, a respin of the two patchsets would be great. > > Thanks. > > -- > David Marchand I would have no objection to such a scheme. However, I'm not seeing much advantage over the existing way of doing things. I think I'd rather see the proposed patch sets merged first and then any additional cleanup done, rather than holding up a worthwhile submission for a bit of tidy-up. /Bruce
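As a rough illustration of the "generic header implemented with compiler intrinsics" idea raised in this thread, the following C sketch implements two atomic operations using the GCC/Clang __atomic builtins. The function names are hypothetical and are not the rte_atomic.h API; under the proposed layout, an architecture-specific header could provide the same functions with hand-written instructions instead. This is a sketch under those assumptions, not the implementation proposed in the patches.

```c
#include <stdint.h>

/* Hypothetical generic fallback: atomically add inc to *v and return the
 * new value, using a compiler builtin rather than arch-specific assembly. */
static inline int32_t
sketch_atomic32_add_return(volatile int32_t *v, int32_t inc)
{
    return __atomic_add_fetch(v, inc, __ATOMIC_SEQ_CST);
}

/* Hypothetical generic fallback: if *dst equals expected, store desired and
 * return 1; otherwise leave *dst unchanged and return 0. */
static inline int
sketch_atomic32_cmpset(volatile uint32_t *dst, uint32_t expected,
                       uint32_t desired)
{
    /* 0 selects the strong variant, which never fails spuriously. */
    return __atomic_compare_exchange_n(dst, &expected, desired, 0,
                                       __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
}
```

Because the builtins are compiler features rather than CPU-specific code, a header written this way builds unchanged on any architecture the compiler supports, which is the portability argument made above.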
[dpdk-dev] [PATCH v2 1/4] app/test: unit test for rx and tx cycles/packet
Hi Neil, I really appreciate your comments. I have added inline replies and will send v3 as soon as we are aligned. BRs, Liang Cunming > -Original Message- > From: Neil Horman [mailto:nhorman at tuxdriver.com] > Sent: Saturday, October 11, 2014 1:52 AM > To: Liang, Cunming > Cc: dev at dpdk.org > Subject: Re: [dpdk-dev] [PATCH v2 1/4] app/test: unit test for rx and tx > cycles/packet > > On Fri, Oct 10, 2014 at 08:29:58PM +0800, Cunming Liang wrote: > > It provides a unit test to measure cycles/packet in NIC loopback mode. > > It simply gives the average cycles of IO used per packet without test > > equipment. > > When doing the test, make sure the link is UP. > > > > Usage Example: > > 1. Run unit test app in interactive mode > > app/test -c f -n 4 -- -i > > 2. Run and wait for the result > > pmd_perf_autotest > > > > There's an option to choose the rx/tx pair, default is vector. > > set_rxtx_mode [vector|scalar|full|hybrid] > > Note: To get accurate scalar fast, please choose 'vector' or 'hybrid' without > INC_VEC=y in config > > > > Signed-off-by: Cunming Liang > > Acked-by: Bruce Richardson > > Notes inline > > > --- > > app/test/Makefile |1 + > > app/test/commands.c | 38 +++ > > app/test/packet_burst_generator.c |4 +- > > app/test/test.h |4 + > > app/test/test_pmd_perf.c| 626 > +++ > > lib/librte_pmd_ixgbe/ixgbe_ethdev.c |6 + > > 6 files changed, 677 insertions(+), 2 deletions(-) > > create mode 100644 app/test/test_pmd_perf.c > > > > diff --git a/app/test/Makefile b/app/test/Makefile > > index 6af6d76..ebfa0ba 100644 > > --- a/app/test/Makefile > > +++ b/app/test/Makefile > > @@ -56,6 +56,7 @@ SRCS-y += test_memzone.c > > > > SRCS-y += test_ring.c > > SRCS-y += test_ring_perf.c > > +SRCS-y += test_pmd_perf.c > > > > ifeq ($(CONFIG_RTE_LIBRTE_TABLE),y) > > SRCS-y += test_table.c > > diff --git a/app/test/commands.c b/app/test/commands.c > > index a9e36b1..f1e746e 100644 > > --- a/app/test/commands.c > > +++ b/app/test/commands.c > > @@ -310,12 +310,50 @@ cmdline_parse_inst_t cmd_quit 
= { > > > > +#define NB_ETHPORTS_USED (1) > > +#define NB_SOCKETS (2) > > +#define MEMPOOL_CACHE_SIZE 250 > > +#define MBUF_SIZE (2048 + sizeof(struct rte_mbuf) + > RTE_PKTMBUF_HEADROOM) > Don't you want to size this in accordance with the amount of data you're sending > (64 Bytes as noted above)? [Liang, Cunming] The case is designed to measure the small-packet IO cost with the normal mbuf size. Even if the size is decreased, it won't save a significant number of cycles. > > > +static void > > +print_ethaddr(const char *name, const struct ether_addr *eth_addr) > > +{ > > + printf("%s%02X:%02X:%02X:%02X:%02X:%02X", name, > > + eth_addr->addr_bytes[0], > > + eth_addr->addr_bytes[1], > > + eth_addr->addr_bytes[2], > > + eth_addr->addr_bytes[3], > > + eth_addr->addr_bytes[4], > > + eth_addr->addr_bytes[5]); > > +} > > + > This was copied from print_ethaddr. Seems like a good candidate for a common > function in rte_ether.h [Liang, Cunming] I agree with you; some of the samples already use the same copy. I'll rework it by adding 'ether_format_addr' in rte_ether.h, which only formats the 48-bit address output, leaving other prints for application customization. > > > > +} > > + > > +static void > > +signal_handler(int signum) > > +{ > > + /* When we receive a USR1 signal, print stats */ > I think you mean SIGUSR2, below, SIGUSR1 tears the test down and exits the > program [Liang, Cunming] Thanks, it's a typo. > > > + if (signum == SIGUSR1) { > SIGINT instead. That's the common practice. [Liang, Cunming] I understand your opinion. My reasons for not using SIGINT are: 1. We unset ISIG in c_lflag of the terminal, so CTRL+C won't trigger SIGINT in the interactive command line. The signal always has to be sent explicitly, whether it is SIGUSR1 or SIGINT. 2. By its semantics, SIGINT is expected to terminate the process. Here I expect to force-stop this test case while keeping the command line alive. After it has stopped, it can run again or start other test cases. So I keep different behavior for SIGINT and SIGUSR1. 3. 
It should rarely be used. I leave this backdoor for automated test control, only for the case of an unexpected timeout. For manual tests, we can easily force-kill the process. > > > + printf("Force Stop!\n"); > > + stop = 1; > > + } > > + if (signum == SIGUSR2) > > + stats_display(0); > > +} > > +/* main processing loop */ > > +static int > > +main_loop(__rte_unused void *args) > > +{ > > +#define PACKET_SIZE 64 > > +#define FRAME_GAP 12 > > +#define MAC_PREAMBLE 8 > > + struct rte_mbuf *pkts_burst[MAX_PKT_BURST]; > > + unsigned lcore_id; > > + unsigned i, portid, nb_rx = 0, nb_tx = 0; > > + struct lcore_conf *conf; > > + uint64_t prev_tsc, cur_tsc; > > + int pkt_per_port; > > + uint64
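The SIGUSR1/SIGUSR2 split discussed in this review can be sketched in plain C as follows. This is not the code from the patch: the names are hypothetical, and as a deliberate variation the handler only sets flags (so that printing can happen later in the main loop) instead of calling a stats function directly from signal context.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Hypothetical flags, polled by the test loop. */
volatile sig_atomic_t sketch_stop_requested;
volatile sig_atomic_t sketch_stats_requested;

/* SIGUSR1 asks the running test case to stop; SIGUSR2 asks for statistics.
 * The handler only records the request, which is async-signal-safe. */
static void sketch_signal_handler(int signum)
{
    if (signum == SIGUSR1)
        sketch_stop_requested = 1;
    else if (signum == SIGUSR2)
        sketch_stats_requested = 1;
}

/* Install the handler for both signals; returns 0 on success, -1 on error. */
int sketch_install_handlers(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = sketch_signal_handler;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, NULL) != 0)
        return -1;
    if (sigaction(SIGUSR2, &sa, NULL) != 0)
        return -1;
    return 0;
}
```

A test loop would check sketch_stop_requested on each iteration and leave the loop when it is set, which stops the test case while keeping the process, and therefore the interactive command line, alive.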
[dpdk-dev] DPDK - VIRTIO performance problems
Hi, I am checking the performance of DPDK in VIRTIO mode running on KVM (Linux ubuntu 3.11.0-15-generic). The maximum throughput I reached was 4Gbps, and then I saw an interesting phenomenon. Every ~2min traffic stopped completely and then immediately came back. This happened in a periodic fashion. I have never seen such a thing in pass-through mode, where I reached much higher rates, of course. Can you please help in resolving this problem in VIRTIO? Thank you, Yan
[dpdk-dev] DPDK - VIRTIO performance problems
On Sun, Oct 12, 2014 at 12:37:37PM +, Yan Freedland wrote: > Every ~2min traffic stopped completely and then immediately came back. This > happened in a periodic fashion. To me it sounds like it could be similar to what I've seen when I ran out of mbufs or ran out of RX / TX descriptor entries. It could be worth checking the error counters on the interfaces with DPDK and with Linux / ethtool to see what might be incrementing during the failed time periods. Matthew.
[dpdk-dev] [PATCH v4 00/10] VM Power Management
Virtual Machine Power Management. The following patches add two DPDK sample applications and an alternate implementation of librte_power for use in virtualized environments. The idea is to provide librte_power functionality from within a VM, to address the lack of MSRs for making frequency changes from within a VM. It is ideally suited for Haswell, which provides per-core frequency scaling. The current librte_power effects frequency changes via the acpi-cpufreq 'userspace' power governor, accessed via sysfs. General Overview (more information in each patch that follows): The VM Power Management solution provides two components: 1) VM: Allows a DPDK application in a VM to reuse the librte_power interface. Each lcore opens a Virtio-Serial endpoint channel to the host, where the re-implementation of librte_power simply forwards the requests for frequency change to a host-based monitor. The host monitor itself uses librte_power. Each lcore channel corresponds to a serial device '/dev/virtio-ports/virtio.serial.port.poweragent.' which is opened in non-blocking mode. While each virtual CPU can be mapped to multiple physical CPUs, it is recommended that each vCPU be mapped to a single core only. 2) Host: The host monitor is managed by a CLI, which allows for adding qemu/KVM virtual machines and associated channels to the monitor, manually changing CPU frequency, inspecting the state of VMs, vCPU to pCPU pinning and managing channels. Host channel endpoints are Virtio-Serial endpoints configured as AF_UNIX file sockets which follow a specific naming convention, i.e. /tmp/powermonitor/., and each channel has a 1:1 mapping to a VM endpoint, i.e. /dev/virtio-ports/virtio.serial.port.poweragent. Host channel endpoints are opened in non-blocking mode and are monitored via epoll. Requests over each channel to change frequency are forwarded to the original librte_power. 
Channels must be manually configured as qemu-kvm command line arguments or libvirt domain definition(xml) e.g. Where multiple channels can be configured by specifying multiple elements, by replacing , . (port number) should be incremented by 1 for each new channel element. More information on Virtio-Serial can be found here: http://fedoraproject.org/wiki/Features/VirtioSerial To enable the Hypervisor creation of channels, the host endpoint directory must be created with qemu permissions: mkdir /tmp/powermonitor chown qemu:qemu /tmp/powermonitor The host application runs on two separate lcores: Core N) CLI: For management of Virtual Machines adding channels to Monitor thread, inspecting state and manually setting CPU frequency [PATCH 02/09] Core N+1) Monitor Thread: An epoll based infinite loop that waits on channel events from VMs and calls the corresponding librte_power functions. A sample application is also provided to run on Virtual Machines, this application provides a CLI to manually set the frequency of a vCPU[PATCH 08/09] The current l3fwd-power sample application can also be run on a VM. Changes in V4: Fixed double free of channel during VM shutdown. Changes in V3: Fixed crash in Guest CLI when host application is not running. Renamed #defines to be more specific to the module they belong Added vCPU pinning via CLI Changes in V2: Runtime selection of librte_power implementations. Updated Unit tests to cover librte_power changes. PATCH[0/3] was sent twice, again as PATCH[0/4] Miscellaneous fixes. Alan Carew (10): Channel Manager and Monitor for VM Power Management(Host). VM Power Management CLI(Host). CPU Frequency Power Management(Host). VM Power Management application and Makefile. VM Power Management CLI(Guest). VM communication channels for VM Power Management(Guest). librte_power common interface for Guest and Host Packet format for VM Power Management(Host and Guest). 
Build system integration for VM Power Management(Guest and Host) VM Power Management Unit Tests app/test/Makefile | 3 +- app/test/autotest_data.py | 26 + app/test/test_power.c | 445 +--- app/test/test_power_acpi_cpufreq.c | 544 ++ app/test/test_power_kvm_vm.c | 308 examples/vm_power_manager/Makefile | 57 ++ examples/vm_power_manager/channel_manager.c| 804 + examples/vm_power_manager/channel_manager.h| 314 examples/vm_power_manager/channel_monitor.c| 231 ++ examples/vm_power_manager/channel_monitor.h| 102 +++ examples/vm_power_manager/guest_cli/Makefile | 56 ++ examples/vm_power_manager/guest_cli/main.c | 87 +++ examples/vm_power_manager/guest_cli/main.h | 52 ++ .../guest_cli/vm_power_cli_guest.c | 155 .../guest_cli/vm_power_cli_guest.h | 55 ++ examples/vm_power_manager/main.c
[dpdk-dev] [PATCH v4 02/10] VM Power Management CLI(Host).
The CLI is used for administering the channel monitor and manager and for manually setting the CPU frequency on the host. It supports the following commands:
add_vm [Mul-choice STRING]: add_vm|rm_vm , add a VM for subsequent operations with the CLI or remove a previously added VM from the VM Power Manager
rm_vm [Mul-choice STRING]: add_vm|rm_vm , add a VM for subsequent operations with the CLI or remove a previously added VM from the VM Power Manager
add_channels [Fixed STRING]: add_channels |all, add communication channels for the specified VM; the virtio channels must be enabled in the VM configuration (qemu/libvirt) and the associated VM must be active. is a comma-separated list of channel numbers to add; using the keyword 'all' will attempt to add all channels for the VM
set_channel_status [Fixed STRING]: set_channel_status |all enabled|disabled, enable or disable the communication channels in list (comma-separated) for the specified VM; alternatively, list can be replaced with the keyword 'all'. Disabled channels will still receive packets on the host, however the commands they specify will be ignored. Set status to 'enabled' to begin processing requests again.
show_vm [Fixed STRING]: show_vm , prints the information on the specified VM(s); the information lists the number of vCPUs, the pinning to pCPU(s) as a bit mask, along with any communication channels associated with each VM
show_cpu_freq_mask [Fixed STRING]: show_cpu_freq_mask , Get the current frequency for each core specified in the mask
set_cpu_freq_mask [Fixed STRING]: set_cpu_freq , Set the current frequency for the cores specified in by scaling each up/down/min/max. 
show_cpu_freq [Fixed STRING]: Get the current frequency for the specified core set_cpu_freq [Fixed STRING]: set_cpu_freq , Set the current frequency for the specified core by scaling up/down/min/max quit [Fixed STRING]: close the application Signed-off-by: Alan Carew --- examples/vm_power_manager/vm_power_cli.c | 669 +++ examples/vm_power_manager/vm_power_cli.h | 47 +++ 2 files changed, 716 insertions(+) create mode 100644 examples/vm_power_manager/vm_power_cli.c create mode 100644 examples/vm_power_manager/vm_power_cli.h diff --git a/examples/vm_power_manager/vm_power_cli.c b/examples/vm_power_manager/vm_power_cli.c new file mode 100644 index 000..e162e88 --- /dev/null +++ b/examples/vm_power_manager/vm_power_cli.c @@ -0,0 +1,669 @@ +/*- + * BSD LICENSE + * + * Copyright(c) 2010-2014 Intel Corporation. All rights reserved. + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * + * * Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * * Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in + * the documentation and/or other materials provided with the + * distribution. + * * Neither the name of Intel Corporation nor the names of its + * contributors may be used to endorse or promote products derived + * from this software without specific prior written permission. + * + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + * A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT + * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + */ + +#include +#include +#include +#include +#include +#include +#include + +#include +#include +#include +#include +#include +#include +#include + +#include "vm_power_cli.h" +#include "channel_manager.h" +#include "channel_monitor.h" +#include "power_manager.h" +#include "channel_commands.h" + +struct cmd_quit_result { + cmdline_fixed_string_t quit; +}; + +static void cmd_quit_parsed(__attribute__((unused)) void *parsed_result, + struct cmdline *cl, + __attribute__((unused)) void *data) +{ + channel_monitor_exit(); + channel_manager_exit(); + power_manager_exit(); + cmdline_quit(cl); +} + +cmdline_parse_token_string_t cmd_quit_quit = + TOKEN_STRING_INITIALIZER(struct cmd_quit_result, qu
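The add_channels and set_channel_status commands described in this patch accept either a comma-separated list of channel numbers or the keyword 'all'. The C sketch below shows one way such an argument could be parsed into a 64-bit mask. It is not the parser from the patch: the function name, the SKETCH_MAX_CHANNELS limit of 64 and the choice to represent 'all' as every bit set are assumptions made for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_MAX_CHANNELS 64 /* assumed limit, one bit per channel */

/* Hypothetical parser: turn "0,2,5" or "all" into a bit mask.
 * Returns 0 on success and -1 on malformed input. */
int sketch_parse_channel_list(const char *list, uint64_t *mask_out)
{
    uint64_t mask = 0;
    const char *p = list;

    assert(mask_out != NULL);
    if (list == NULL || *list == '\0')
        return -1;
    if (strcmp(list, "all") == 0) {
        *mask_out = UINT64_MAX;
        return 0;
    }
    for (;;) {
        char *end;
        unsigned long n;

        /* Each entry must start with a digit: this rejects signs,
         * spaces, empty entries and a trailing comma. */
        if (*p < '0' || *p > '9')
            return -1;
        errno = 0;
        n = strtoul(p, &end, 10);
        if (errno != 0 || n >= SKETCH_MAX_CHANNELS)
            return -1;
        mask |= 1ULL << n;
        if (*end == '\0')
            break;
        if (*end != ',')
            return -1;
        p = end + 1;
    }
    *mask_out = mask;
    return 0;
}
```

For example, the input "0,2,5" yields the mask 0x25, while "1,,2" and "64" are rejected.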
[dpdk-dev] [PATCH v4 01/10] Channel Manager and Monitor for VM Power Management(Host).
The manager is responsible for adding communication channels to the Monitor thread and for tracking and reporting VM state, and it employs the libvirt API for synchronization with the KVM Hypervisor. The manager interacts with the Hypervisor to discover the mapping of virtual CPUs (vCPUs) to the host physical CPUs (pCPUs) and to inspect the VM running state. The manager provides the following functionality to the CLI: 1) Connect to a libvirtd instance, default: qemu:///system 2) Add a VM to an internal list; each VM is identified by a "name" which must correspond to a valid libvirt Domain Name. 3) Add communication channels associated with a VM to the epoll-based Monitor thread. The channels must exist and be in the form of: /tmp/powermonitor/.. Each channel is a Virtio-Serial endpoint configured as an AF_UNIX file socket and opened in non-blocking mode. Each VM can have a maximum of 64 channels associated with it. 4) Disable or re-enable VM communication channels. Channels, once added to the Monitor thread, remain in that thread's control; however, acting on channel requests can be disabled and re-enabled via the CLI. The monitor is an epoll-based infinite loop running in a separate thread that waits on channel events from VMs and calls the corresponding functions. Channel definitions from the manager are registered via the epoll event opaque pointer when calling epoll_ctl(EPOLL_CTL_ADD). This allows for obtaining the channel's file descriptor for reading EPOLLIN events and for mapping the vCPU to the pCPU(s) associated with a request from a particular VM. 
Signed-off-by: Alan Carew --- examples/vm_power_manager/channel_manager.c | 804 examples/vm_power_manager/channel_manager.h | 314 +++ examples/vm_power_manager/channel_monitor.c | 231 examples/vm_power_manager/channel_monitor.h | 102 4 files changed, 1451 insertions(+) create mode 100644 examples/vm_power_manager/channel_manager.c create mode 100644 examples/vm_power_manager/channel_manager.h create mode 100644 examples/vm_power_manager/channel_monitor.c create mode 100644 examples/vm_power_manager/channel_monitor.h diff --git a/examples/vm_power_manager/channel_manager.c b/examples/vm_power_manager/channel_manager.c new file mode 100644 index 000..a14f191 --- /dev/null +++ b/examples/vm_power_manager/channel_manager.c @@ -0,0 +1,804 @@ +/*- + * BSD LICENSE + * + * Copyright(c) 2010-2014 Intel Corporation. All rights reserved. + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * + * * Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * * Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in + * the documentation and/or other materials provided with the + * distribution. + * * Neither the name of Intel Corporation nor the names of its + * contributors may be used to endorse or promote products derived + * from this software without specific prior written permission. + * + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + * A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT + * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + */ + +#include +#include +#include +#include +#include +#include +#include +#include + +#include +#include +#include +#include + +#include +#include +#include +#include +#include +#include +#include + +#include + +#include "channel_manager.h" +#include "channel_commands.h" +#include "channel_monitor.h" + + +#define RTE_LOGTYPE_CHANNEL_MANAGER RTE_LOGTYPE_USER1 + +#define ITERATIVE_BITMASK_CHECK_64(mask_u64b, i) \ + for (i = 0; mask_u64b; mask_u64b &= ~(1ULL << i++)) \ + if ((mask_u64b >> i) & 1) \ + +/* Global pointer to libvirt connection */ +static virConnectPtr global_vir_conn_ptr; + +static unsigned char *global_cpumaps; +static virVcpuInfo *global_vircpuinfo; +static size_t global_maplen; + +static unsigned global_n_host_cpus; + +/* + * Represents a single Virtual Machine + */ +struct virtual_machine_info { + cha
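The registration technique described in this patch, attaching a per-channel structure to the epoll event's opaque pointer, can be sketched in C as follows (Linux only). The struct and function names are hypothetical and much simpler than the channel definitions in the patch; any readable file descriptor, such as one end of an AF_UNIX socket pair, can stand in for a Virtio-Serial endpoint when trying it out.

```c
#include <assert.h>
#include <string.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical, simplified channel record. */
struct channel_sketch {
    int fd;        /* endpoint to read requests from */
    unsigned vcpu; /* vCPU this channel belongs to */
};

/* Register a channel for EPOLLIN, storing the channel itself as the
 * event's opaque pointer. Returns the epoll_ctl() result. */
int sketch_register_channel(int epfd, struct channel_sketch *chan)
{
    struct epoll_event ev;

    assert(chan != NULL);
    memset(&ev, 0, sizeof(ev));
    ev.events = EPOLLIN;
    ev.data.ptr = chan;
    return epoll_ctl(epfd, EPOLL_CTL_ADD, chan->fd, &ev);
}

/* Wait for one readable channel and hand back its record, or NULL on
 * timeout or error. The record gives direct access to fd and vcpu. */
struct channel_sketch *sketch_wait_channel(int epfd, int timeout_ms)
{
    struct epoll_event ev;
    int n = epoll_wait(epfd, &ev, 1, timeout_ms);

    if (n <= 0 || !(ev.events & EPOLLIN))
        return NULL;
    return ev.data.ptr;
}
```

Because the event carries the record itself, the monitor loop does not need a separate lookup table from file descriptor to channel state.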
[dpdk-dev] [PATCH v4 03/10] CPU Frequency Power Management(Host).
A wrapper around librte_power(using ACPI cpufreq), providing locking around the non-threadsafe library, allowing for frequency changes based on core masks and core numbers from both the CLI thread and epoll monitor thread. Signed-off-by: Alan Carew --- examples/vm_power_manager/power_manager.c | 244 ++ examples/vm_power_manager/power_manager.h | 188 +++ 2 files changed, 432 insertions(+) create mode 100644 examples/vm_power_manager/power_manager.c create mode 100644 examples/vm_power_manager/power_manager.h diff --git a/examples/vm_power_manager/power_manager.c b/examples/vm_power_manager/power_manager.c new file mode 100644 index 000..b7b1fca --- /dev/null +++ b/examples/vm_power_manager/power_manager.c @@ -0,0 +1,244 @@ +/*- + * BSD LICENSE + * + * Copyright(c) 2010-2014 Intel Corporation. All rights reserved. + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * + * * Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * * Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in + * the documentation and/or other materials provided with the + * distribution. + * * Neither the name of Intel Corporation nor the names of its + * contributors may be used to endorse or promote products derived + * from this software without specific prior written permission. + * + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + * A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT + * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include + +#include +#include +#include +#include + +#include "power_manager.h" + +#define RTE_LOGTYPE_POWER_MANAGER RTE_LOGTYPE_USER1 + +#define POWER_SCALE_CORE(DIRECTION, core_num , ret) do { \ + if (core_num >= POWER_MGR_MAX_CPUS) \ + return -1; \ + if (!(global_enabled_cpus & (1ULL << core_num))) \ + return -1; \ + rte_spinlock_lock(&global_core_freq_info[core_num].power_sl); \ + ret = rte_power_freq_##DIRECTION(core_num); \ + rte_spinlock_unlock(&global_core_freq_info[core_num].power_sl); \ +} while (0) + +#define POWER_SCALE_MASK(DIRECTION, core_mask, ret) do { \ + int i; \ + for (i = 0; core_mask; core_mask &= ~(1 << i++)) { \ + if ((core_mask >> i) & 1) { \ + if (!(global_enabled_cpus & (1ULL << i))) \ + continue; \ + rte_spinlock_lock(&global_core_freq_info[i].power_sl); \ + if (rte_power_freq_##DIRECTION(i) != 1) \ + ret = -1; \ + rte_spinlock_unlock(&global_core_freq_info[i].power_sl); \ + } \ + } \ +} while (0) + +struct freq_info { + rte_spinlock_t power_sl; + uint32_t freqs[RTE_MAX_LCORE_FREQS]; + unsigned num_freqs; +} __rte_cache_aligned; + +static struct freq_info global_core_freq_info[POWER_MGR_MAX_CPUS]; + +static uint64_t global_enabled_cpus; + +#define SYSFS_CPU_PATH "/sys/devices/system/cpu/cpu%u/topology/core_id" + +static unsigned +set_host_cpus_mask(void) +{ + char path[PATH_MAX]; + unsigned i; + unsigned 
num_cpus = 0; + for (i = 0; i < POWER_MGR_MAX_CPUS; i++) { + snprintf(path, sizeof(path), SYSFS_CPU_PATH, i); + if (access(path, F_OK) == 0) { + global_enabled_cpus |= 1ULL << i; + num_cpus++; + } else + return num_cpus; + } + return num_cpus; +} + +int +power_manager_init(void) +{ + unsigned i, num_cpus; + uint64_t cpu_mask; + int ret = 0; + + num_cpus = set_host_cpus_mask(); + if (num_cpus == 0) { + RTE_LOG(ERR, POWER_MANAGER, "Unable to detected host CPUs, please " + "ensure that sufficient privileges exist to inspect sysfs\n"); + return -1; +
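The mask-based scaling in this patch walks every core whose bit is set in a 64-bit mask. The C sketch below shows that iteration pattern in isolation, with a callback in place of the frequency call. The names are hypothetical and the locking from the patch is omitted. The sketch clears bits with a 64-bit constant (1ULL), since shifting a plain int 1 is not safe for bit positions 31 and above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical callback type: return 0 on success, non-zero on failure. */
typedef int (*sketch_core_fn)(unsigned core, void *arg);

/* Call fn once for each set bit in mask, lowest bit first.
 * Returns 0 if every call succeeded, -1 if any call failed. */
int sketch_for_each_core(uint64_t mask, sketch_core_fn fn, void *arg)
{
    unsigned i;
    int ret = 0;

    assert(fn != NULL);
    /* The loop ends as soon as no set bits remain, so i never reaches 64. */
    for (i = 0; mask != 0; mask &= ~(1ULL << i), i++) {
        if ((mask >> i) & 1) {
            if (fn(i, arg) != 0)
                ret = -1;
        }
    }
    return ret;
}

/* Example callback: record each visited core in a uint64_t accumulator. */
int sketch_record_core(unsigned core, void *arg)
{
    uint64_t *seen = arg;

    *seen |= 1ULL << core;
    return 0;
}
```

Calling sketch_for_each_core with the mask 0x8000000000000005 visits cores 0, 2 and 63, in that order.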
[dpdk-dev] [PATCH v4 04/10] VM Power Management application and Makefile.
For launching CLI thread and Monitor thread and initialising resources. Requires a minimum of two lcores to run, additional cores specified by eal core mask are not used. Signed-off-by: Alan Carew --- examples/vm_power_manager/Makefile | 57 ++ examples/vm_power_manager/main.c | 117 + examples/vm_power_manager/main.h | 52 + 3 files changed, 226 insertions(+) create mode 100644 examples/vm_power_manager/Makefile create mode 100644 examples/vm_power_manager/main.c create mode 100644 examples/vm_power_manager/main.h diff --git a/examples/vm_power_manager/Makefile b/examples/vm_power_manager/Makefile new file mode 100644 index 000..7d6f943 --- /dev/null +++ b/examples/vm_power_manager/Makefile @@ -0,0 +1,57 @@ +# BSD LICENSE +# +# Copyright(c) 2010-2014 Intel Corporation. All rights reserved. +# All rights reserved. +# +# Redistribution and use in source and binary forms, with or without +# modification, are permitted provided that the following conditions +# are met: +# +# * Redistributions of source code must retain the above copyright +# notice, this list of conditions and the following disclaimer. +# * Redistributions in binary form must reproduce the above copyright +# notice, this list of conditions and the following disclaimer in +# the documentation and/or other materials provided with the +# distribution. +# * Neither the name of Intel Corporation nor the names of its +# contributors may be used to endorse or promote products derived +# from this software without specific prior written permission. +# +# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS +# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT +# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR +# A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT +# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, +# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT +# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, +# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY +# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT +# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE +# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +ifeq ($(RTE_SDK),) +$(error "Please define RTE_SDK environment variable") +endif + +# Default target, can be overriden by command line or environment +RTE_TARGET ?= x86_64-default-linuxapp-gcc + +include $(RTE_SDK)/mk/rte.vars.mk + +# binary name +APP = vm_power_mgr + +# all source are stored in SRCS-y +SRCS-y := main.c vm_power_cli.c power_manager.c channel_manager.c +SRCS-y += channel_monitor.c + +CFLAGS += -O3 -lvirt -I$(RTE_SDK)/lib/librte_power/ +CFLAGS += $(WERROR_FLAGS) + +# workaround for a gcc bug with noreturn attribute +# http://gcc.gnu.org/bugzilla/show_bug.cgi?id=12603 +ifeq ($(CONFIG_RTE_TOOLCHAIN_GCC),y) +CFLAGS_main.o += -Wno-return-type +endif + +include $(RTE_SDK)/mk/rte.extapp.mk diff --git a/examples/vm_power_manager/main.c b/examples/vm_power_manager/main.c new file mode 100644 index 000..875274e --- /dev/null +++ b/examples/vm_power_manager/main.c @@ -0,0 +1,117 @@ +/*- + * BSD LICENSE + * + * Copyright(c) 2010-2014 Intel Corporation. All rights reserved. + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * + * * Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. 
+ * * Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in + * the documentation and/or other materials provided with the + * distribution. + * * Neither the name of Intel Corporation nor the names of its + * contributors may be used to endorse or promote products derived + * from this software without specific prior written permission. + * + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + * (INCLUDING NEGLIGENCE OR OTH
[dpdk-dev] [PATCH v4 06/10] VM communication channels for VM Power Management(Guest).
Allows for the opening of Virtio-Serial devices on a VM, where a DPDK application can send packets to the host based monitor. The packet format is specified in channel_commands.h. Each device appears as a serial device in path /dev/virtio-ports/virtio.serial.port.. where each lcore in a DPDK application has exclusive access to a device/channel. Each channel is opened in non-blocking mode; after a successful open, a test packet is sent to the host to ensure the host side is monitoring. Signed-off-by: Alan Carew --- lib/librte_power/guest_channel.c | 162 +++ lib/librte_power/guest_channel.h | 89 + 2 files changed, 251 insertions(+) create mode 100644 lib/librte_power/guest_channel.c create mode 100644 lib/librte_power/guest_channel.h diff --git a/lib/librte_power/guest_channel.c b/lib/librte_power/guest_channel.c new file mode 100644 index 000..2295665 --- /dev/null +++ b/lib/librte_power/guest_channel.c @@ -0,0 +1,162 @@ +/*- + * BSD LICENSE + * + * Copyright(c) 2010-2014 Intel Corporation. All rights reserved. + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * + * * Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * * Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in + * the documentation and/or other materials provided with the + * distribution. + * * Neither the name of Intel Corporation nor the names of its + * contributors may be used to endorse or promote products derived + * from this software without specific prior written permission. 
+ * + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + */ + +#include +#include +#include +#include +#include +#include +#include +#include + + +#include +#include + +#include "guest_channel.h" +#include "channel_commands.h" + +#define RTE_LOGTYPE_GUEST_CHANNEL RTE_LOGTYPE_USER1 + +static int global_fds[RTE_MAX_LCORE]; + +int +guest_channel_host_connect(const char *path, unsigned lcore_id) +{ + int flags, ret; + struct channel_packet pkt; + char fd_path[PATH_MAX]; + int fd = -1; + + if (lcore_id >= RTE_MAX_LCORE) { + RTE_LOG(ERR, GUEST_CHANNEL, "Channel(%u) is out of range 0...%d\n", + lcore_id, RTE_MAX_LCORE-1); + return -1; + } + /* check if path is already open */ + if (global_fds[lcore_id] != 0) { + RTE_LOG(ERR, GUEST_CHANNEL, "Channel(%u) is already open with fd %d\n", + lcore_id, global_fds[lcore_id]); + return -1; + } + + snprintf(fd_path, PATH_MAX, "%s.%u", path, lcore_id); + RTE_LOG(INFO, GUEST_CHANNEL, "Opening channel '%s' for lcore %u\n", + fd_path, lcore_id); + fd = open(fd_path, O_RDWR); + if (fd < 0) { + RTE_LOG(ERR, GUEST_CHANNEL, "Unable to to connect to '%s' with error " + "%s\n", fd_path, strerror(errno)); + return -1; + } + + flags = fcntl(fd, F_GETFL, 0); + if (flags < 0) { + RTE_LOG(ERR, GUEST_CHANNEL, 
"Failed on fcntl get flags for file %s\n", + fd_path); + goto error; + } + + flags |= O_NONBLOCK; + if (fcntl(fd, F_SETFL, flags) < 0) { + RTE_LOG(ERR, GUEST_CHANNEL, "Failed on setting non-blocking mode for " + "file %s", fd_path); + goto error; + } + /* QEMU needs a delay after connection */ + sleep(1); + + /* Send a test packet, this command is ignored by the host, but a successful +* send indicates that the host endpoint is monitoring. +*/ + pkt.command = CPU_POWER_CONNECT; + global_fds[lcore_id] = fd; + ret = guest_channel_send_msg(&pkt, lcore_id); + if (ret != 0) { +
[dpdk-dev] [PATCH v4 05/10] VM Power Management CLI(Guest).
Provides a small sample application(guest_vm_power_mgr) to run on a VM. The application is run by providing a core mask(-c) and number of memory channels(-n). The core mask corresponds to the number of lcore channels to attempt to open. A maximum of 64 channels per VM is allowed. The channels must be monitored by the host. After successful initialisation a CPU frequency command can be sent to the host using: set_cpu_freq . Signed-off-by: Alan Carew --- examples/vm_power_manager/guest_cli/Makefile | 56 examples/vm_power_manager/guest_cli/main.c | 87 examples/vm_power_manager/guest_cli/main.h | 52 +++ .../guest_cli/vm_power_cli_guest.c | 155 + .../guest_cli/vm_power_cli_guest.h | 55 5 files changed, 405 insertions(+) create mode 100644 examples/vm_power_manager/guest_cli/Makefile create mode 100644 examples/vm_power_manager/guest_cli/main.c create mode 100644 examples/vm_power_manager/guest_cli/main.h create mode 100644 examples/vm_power_manager/guest_cli/vm_power_cli_guest.c create mode 100644 examples/vm_power_manager/guest_cli/vm_power_cli_guest.h diff --git a/examples/vm_power_manager/guest_cli/Makefile b/examples/vm_power_manager/guest_cli/Makefile new file mode 100644 index 000..167a7ed --- /dev/null +++ b/examples/vm_power_manager/guest_cli/Makefile @@ -0,0 +1,56 @@ +# BSD LICENSE +# +# Copyright(c) 2010-2014 Intel Corporation. All rights reserved. +# All rights reserved. +# +# Redistribution and use in source and binary forms, with or without +# modification, are permitted provided that the following conditions +# are met: +# +# * Redistributions of source code must retain the above copyright +# notice, this list of conditions and the following disclaimer. +# * Redistributions in binary form must reproduce the above copyright +# notice, this list of conditions and the following disclaimer in +# the documentation and/or other materials provided with the +# distribution. 
+# * Neither the name of Intel Corporation nor the names of its +# contributors may be used to endorse or promote products derived +# from this software without specific prior written permission. +# +# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS +# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT +# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR +# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT +# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, +# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT +# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, +# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY +# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT +# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE +# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +ifeq ($(RTE_SDK),) +$(error "Please define RTE_SDK environment variable") +endif + +# Default target, can be overriden by command line or environment +RTE_TARGET ?= x86_64-default-linuxapp-gcc + +include $(RTE_SDK)/mk/rte.vars.mk + +# binary name +APP = guest_vm_power_mgr + +# all source are stored in SRCS-y +SRCS-y := main.c vm_power_cli_guest.c + +CFLAGS += -O3 -I$(RTE_SDK)/lib/librte_power/ +CFLAGS += $(WERROR_FLAGS) + +# workaround for a gcc bug with noreturn attribute +# http://gcc.gnu.org/bugzilla/show_bug.cgi?id=12603 +ifeq ($(CONFIG_RTE_TOOLCHAIN_GCC),y) +CFLAGS_main.o += -Wno-return-type +endif + +include $(RTE_SDK)/mk/rte.extapp.mk diff --git a/examples/vm_power_manager/guest_cli/main.c b/examples/vm_power_manager/guest_cli/main.c new file mode 100644 index 000..1e4767a --- /dev/null +++ b/examples/vm_power_manager/guest_cli/main.c @@ -0,0 +1,87 @@ +/*- + * BSD LICENSE + * + * Copyright(c) 2010-2014 Intel Corporation. All rights reserved. + * All rights reserved. 
+ * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * + * * Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * * Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in + * the documentation and/or other materials provided with the + * distribution. + * * Neither the name of Intel Corporation nor the names of its + * contributors may be used to endorse or promote products derived + * from this software without specific prior written permission. + * + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + * "AS IS" AND ANY EXPRESS OR IMPL
[dpdk-dev] [PATCH v4 08/10] Packet format for VM Power Management(Host and Guest).
Provides a command packet format for host and guest. Signed-off-by: Alan Carew --- lib/librte_power/channel_commands.h | 77 + 1 file changed, 77 insertions(+) create mode 100644 lib/librte_power/channel_commands.h diff --git a/lib/librte_power/channel_commands.h b/lib/librte_power/channel_commands.h new file mode 100644 index 000..7e78a8b --- /dev/null +++ b/lib/librte_power/channel_commands.h @@ -0,0 +1,77 @@ +/*- + * BSD LICENSE + * + * Copyright(c) 2010-2014 Intel Corporation. All rights reserved. + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * + * * Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * * Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in + * the documentation and/or other materials provided with the + * distribution. + * * Neither the name of Intel Corporation nor the names of its + * contributors may be used to endorse or promote products derived + * from this software without specific prior written permission. + * + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + * A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT + * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + */ + +#ifndef CHANNEL_COMMANDS_H_ +#define CHANNEL_COMMANDS_H_ + +#ifdef __cplusplus +extern "C" { +#endif + +#include + +/* Maximum number of CPUs */ +#define CHANNEL_CMDS_MAX_CPUS 64 +#if CHANNEL_CMDS_MAX_CPUS > 64 +#error Maximum number of cores is 64, overflow is guaranteed to \ + cause problems with VM Power Management +#endif + +/* Maximum number of channels per VM */ +#define CHANNEL_CMDS_MAX_VM_CHANNELS 64 + +/* Maximum number of channels per VM */ +#define CHANNEL_CMDS_MAX_VM_CHANNELS 64 + +/* Valid Commands */ +#define CPU_POWER 1 +#define CPU_POWER_CONNECT 2 + +/* CPU Power Command Scaling */ +#define CPU_POWER_SCALE_UP 1 +#define CPU_POWER_SCALE_DOWN 2 +#define CPU_POWER_SCALE_MAX 3 +#define CPU_POWER_SCALE_MIN 4 + +struct channel_packet { + uint64_t resource_id; /**< core_num, device */ + uint32_t unit; /**< scale down/up/min/max */ + uint32_t command; /**< Power, IO, etc */ +}; + + +#ifdef __cplusplus +} +#endif + +#endif /* CHANNEL_COMMANDS_H_ */ -- 1.9.3
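As a rough illustration of how a guest might fill in the command packet defined in this header, here is a minimal sketch. The struct layout and the macro values are copied from the patch; make_power_request() is a hypothetical helper that is not part of the patch, and the range checks it performs are an assumption about what a caller would want, not something the header enforces.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Values copied from channel_commands.h in the patch. */
#define CHANNEL_CMDS_MAX_CPUS 64
#define CPU_POWER             1
#define CPU_POWER_CONNECT     2
#define CPU_POWER_SCALE_UP    1
#define CPU_POWER_SCALE_DOWN  2
#define CPU_POWER_SCALE_MAX   3
#define CPU_POWER_SCALE_MIN   4

struct channel_packet {
	uint64_t resource_id; /* core_num, device */
	uint32_t unit;        /* scale down/up/min/max */
	uint32_t command;     /* Power, IO, etc */
};

/*
 * Hypothetical helper (not in the patch): fill a CPU_POWER request.
 * Returns 0 on success, -1 if the core or the scaling unit is out of range.
 */
static int
make_power_request(struct channel_packet *pkt, unsigned core, uint32_t unit)
{
	assert(pkt != NULL);
	if (core >= CHANNEL_CMDS_MAX_CPUS)
		return -1;
	if (unit < CPU_POWER_SCALE_UP || unit > CPU_POWER_SCALE_MIN)
		return -1;
	memset(pkt, 0, sizeof(*pkt));
	pkt->resource_id = core;
	pkt->unit = unit;
	pkt->command = CPU_POWER;
	return 0;
}
```

A caller would build the packet with something like make_power_request(&pkt, lcore_id, CPU_POWER_SCALE_UP) and then hand it to the channel send routine from the guest channel patch.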
[dpdk-dev] [PATCH v4 09/10] Build system integration for VM Power Management(Guest and Host)
librte_power now contains both rte_power_acpi_cpufreq and rte_power_kvm_vm implementations. Signed-off-by: Alan Carew --- lib/librte_power/Makefile | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/lib/librte_power/Makefile b/lib/librte_power/Makefile index 6185812..d672a5a 100644 --- a/lib/librte_power/Makefile +++ b/lib/librte_power/Makefile @@ -37,7 +37,8 @@ LIB = librte_power.a CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR) -O3 -fno-strict-aliasing # all source are stored in SRCS-y -SRCS-$(CONFIG_RTE_LIBRTE_POWER) := rte_power.c +SRCS-$(CONFIG_RTE_LIBRTE_POWER) := rte_power.c rte_power_acpi_cpufreq.c +SRCS-$(CONFIG_RTE_LIBRTE_POWER) += rte_power_kvm_vm.c guest_channel.c # install this header file SYMLINK-$(CONFIG_RTE_LIBRTE_POWER)-include := rte_power.h -- 1.9.3
[dpdk-dev] [PATCH v4 07/10] librte_power common interface for Guest and Host
Moved the current librte_power implementation to rte_power_acpi_cpufreq, with renaming of functions only. Added rte_power_kvm_vm implementation to support Power Management from a VM. librte_power now hides the implementation based on the environment used. A new call rte_power_set_env() can explicitly set the environment; if it is not called, auto-detection takes place. rte_power_kvm_vm implements a subset of the librte_power APIs; the following are supported: rte_power_init(unsigned lcore_id) rte_power_exit(unsigned lcore_id) rte_power_freq_up(unsigned lcore_id) rte_power_freq_down(unsigned lcore_id) rte_power_freq_min(unsigned lcore_id) rte_power_freq_max(unsigned lcore_id) The other unsupported APIs return -ENOTSUP. Signed-off-by: Alan Carew --- lib/librte_power/rte_power.c | 540 - lib/librte_power/rte_power.h | 120 +-- lib/librte_power/rte_power_acpi_cpufreq.c | 545 ++ lib/librte_power/rte_power_acpi_cpufreq.h | 192 +++ lib/librte_power/rte_power_common.h | 39 +++ lib/librte_power/rte_power_kvm_vm.c | 135 lib/librte_power/rte_power_kvm_vm.h | 179 ++ 7 files changed, 1248 insertions(+), 502 deletions(-) create mode 100644 lib/librte_power/rte_power_acpi_cpufreq.c create mode 100644 lib/librte_power/rte_power_acpi_cpufreq.h create mode 100644 lib/librte_power/rte_power_common.h create mode 100644 lib/librte_power/rte_power_kvm_vm.c create mode 100644 lib/librte_power/rte_power_kvm_vm.h diff --git a/lib/librte_power/rte_power.c b/lib/librte_power/rte_power.c index 856da9a..998ed1c 100644 --- a/lib/librte_power/rte_power.c +++ b/lib/librte_power/rte_power.c @@ -31,515 +31,113 @@ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. */ -#include -#include -#include -#include -#include -#include -#include -#include -#include - -#include #include #include "rte_power.h" +#include "rte_power_acpi_cpufreq.h" +#include "rte_power_kvm_vm.h" +#include "rte_power_common.h" -#ifdef RTE_LIBRTE_POWER_DEBUG -#define POWER_DEBUG_TRACE(fmt, args...) 
do { \ - RTE_LOG(ERR, POWER, "%s: " fmt, __func__, ## args); \ - } while (0) -#else -#define POWER_DEBUG_TRACE(fmt, args...) -#endif - -#define FOPEN_OR_ERR_RET(f, retval) do { \ - if ((f) == NULL) { \ - RTE_LOG(ERR, POWER, "File not openned\n"); \ - return (retval); \ - } \ -} while(0) - -#define FOPS_OR_NULL_GOTO(ret, label) do { \ - if ((ret) == NULL) { \ - RTE_LOG(ERR, POWER, "fgets returns nothing\n"); \ - goto label; \ - } \ -} while(0) - -#define FOPS_OR_ERR_GOTO(ret, label) do { \ - if ((ret) < 0) { \ - RTE_LOG(ERR, POWER, "File operations failed\n"); \ - goto label; \ - } \ -} while(0) - -#define STR_SIZE 1024 -#define POWER_CONVERT_TO_DECIMAL 10 +enum power_management_env global_default_env = PM_ENV_NOT_SET; -#define POWER_GOVERNOR_USERSPACE "userspace" -#define POWER_SYSFILE_GOVERNOR \ - "/sys/devices/system/cpu/cpu%u/cpufreq/scaling_governor" -#define POWER_SYSFILE_AVAIL_FREQ \ - "/sys/devices/system/cpu/cpu%u/cpufreq/scaling_available_frequencies" -#define POWER_SYSFILE_SETSPEED \ - "/sys/devices/system/cpu/cpu%u/cpufreq/scaling_setspeed" +volatile uint32_t global_env_cfg_status = 0; -enum power_state { - POWER_IDLE = 0, - POWER_ONGOING, - POWER_USED, - POWER_UNKNOWN -}; +/* function pointers */ +rte_power_freqs_t rte_power_freqs = NULL; +rte_power_get_freq_t rte_power_get_freq = NULL; +rte_power_set_freq_t rte_power_set_freq = NULL; +rte_power_freq_change_t rte_power_freq_up = NULL; +rte_power_freq_change_t rte_power_freq_down = NULL; +rte_power_freq_change_t rte_power_freq_max = NULL; +rte_power_freq_change_t rte_power_freq_min = NULL; -/** - * Power info per lcore. 
- */ -struct rte_power_info { - unsigned lcore_id; /**< Logical core id */ - uint32_t freqs[RTE_MAX_LCORE_FREQS]; /**< Frequency array */ - uint32_t nb_freqs; /**< number of available freqs */ - FILE *f; /**< FD of scaling_setspeed */ - char governor_ori[32]; /**< Original governor name */ - uint32_t curr_idx; /**< Freq index in freqs array */ - volatile uint32_t state; /**< Power in use state */ -} __rte_cache_aligned; - -static struct rte_power_info lcore_power_info[RTE_MAX_LCORE]; - -/** - * It is to set specific freq for specific logical core, according to the index - * of supported frequencies. - */ -static int -set_freq_internal(struct rte_power_info *pi, uint32_t idx) +int +rte_power_set_env(enum power_management_env env) { - if (idx >= RTE_MAX_LCORE_FREQS || idx >= pi->nb_freqs) { - RTE_LOG(ERR, POWER, "Invalid frequency index %u, which " -
[dpdk-dev] [PATCH v4 10/10] VM Power Management Unit Tests
Updated the unit tests to cover both librte_power implementations as well as the external API. Signed-off-by: Alan Carew --- app/test/Makefile | 3 +- app/test/autotest_data.py | 26 ++ app/test/test_power.c | 445 +++--- app/test/test_power_acpi_cpufreq.c | 544 + app/test/test_power_kvm_vm.c | 308 + 5 files changed, 917 insertions(+), 409 deletions(-) create mode 100644 app/test/test_power_acpi_cpufreq.c create mode 100644 app/test/test_power_kvm_vm.c diff --git a/app/test/Makefile b/app/test/Makefile index 6af6d76..9417eda 100644 --- a/app/test/Makefile +++ b/app/test/Makefile @@ -119,7 +119,8 @@ endif SRCS-$(CONFIG_RTE_LIBRTE_METER) += test_meter.c SRCS-$(CONFIG_RTE_LIBRTE_KNI) += test_kni.c -SRCS-$(CONFIG_RTE_LIBRTE_POWER) += test_power.c +SRCS-$(CONFIG_RTE_LIBRTE_POWER) += test_power.c test_power_acpi_cpufreq.c +SRCS-$(CONFIG_RTE_LIBRTE_POWER) += test_power_kvm_vm.c SRCS-y += test_common.c SRCS-$(CONFIG_RTE_LIBRTE_IVSHMEM) += test_ivshmem.c diff --git a/app/test/autotest_data.py b/app/test/autotest_data.py index 878c72e..618a946 100644 --- a/app/test/autotest_data.py +++ b/app/test/autotest_data.py @@ -425,6 +425,32 @@ non_parallel_test_group_list = [ ] }, { + "Prefix" : "power_acpi_cpufreq", + "Memory" : all_sockets(512), + "Tests" : + [ + { +"Name" : "Power ACPI cpufreq autotest", +"Command" :"power_acpi_cpufreq_autotest", +"Func" : default_autotest, +"Report" : None, + }, + ] +}, +{ + "Prefix" : "power_kvm_vm", + "Memory" : "512", + "Tests" : + [ + { +"Name" : "Power KVM VM autotest", +"Command" :"power_kvm_vm_autotest", +"Func" : default_autotest, +"Report" : None, + }, + ] +}, +{ "Prefix" : "lpm6", "Memory" : "512", "Tests" : diff --git a/app/test/test_power.c b/app/test/test_power.c index d9eb420..64a2305 100644 --- a/app/test/test_power.c +++ b/app/test/test_power.c @@ -41,437 +41,66 @@ #include -#define TEST_POWER_LCORE_ID 2U -#define TEST_POWER_LCORE_INVALID ((unsigned)RTE_MAX_LCORE) -#define TEST_POWER_FREQS_NUM_MAX ((unsigned)RTE_MAX_LCORE_FREQS) - 
-#define TEST_POWER_SYSFILE_CUR_FREQ \ - "/sys/devices/system/cpu/cpu%u/cpufreq/scaling_cur_freq" - -static uint32_t total_freq_num; -static uint32_t freqs[TEST_POWER_FREQS_NUM_MAX]; - -static int -check_cur_freq(unsigned lcore_id, uint32_t idx) -{ -#define TEST_POWER_CONVERT_TO_DECIMAL 10 - FILE *f; - char fullpath[PATH_MAX]; - char buf[BUFSIZ]; - uint32_t cur_freq; - int ret = -1; - - if (snprintf(fullpath, sizeof(fullpath), - TEST_POWER_SYSFILE_CUR_FREQ, lcore_id) < 0) { - return 0; - } - f = fopen(fullpath, "r"); - if (f == NULL) { - return 0; - } - if (fgets(buf, sizeof(buf), f) == NULL) { - goto fail_get_cur_freq; - } - cur_freq = strtoul(buf, NULL, TEST_POWER_CONVERT_TO_DECIMAL); - ret = (freqs[idx] == cur_freq ? 0 : -1); - -fail_get_cur_freq: - fclose(f); - - return ret; -} - -/* Check rte_power_freqs() */ -static int -check_power_freqs(void) -{ - uint32_t ret; - - total_freq_num = 0; - memset(freqs, 0, sizeof(freqs)); - - /* test with an invalid lcore id */ - ret = rte_power_freqs(TEST_POWER_LCORE_INVALID, freqs, - TEST_POWER_FREQS_NUM_MAX); - if (ret > 0) { - printf("Unexpectedly get available freqs successfully on " - "lcore %u\n", TEST_POWER_LCORE_INVALID); - return -1; - } - - /* test with NULL buffer to save available freqs */ - ret = rte_power_freqs(TEST_POWER_LCORE_ID, NULL, - TEST_POWER_FREQS_NUM_MAX); - if (ret > 0) { - printf("Unexpectedly get available freqs successfully with " - "NULL buffer on lcore %u\n", TEST_POWER_LCORE_ID); - return -1; - } - - /* test of getting zero number of freqs */ - ret = rte_power_freqs(TEST_POWER_LCORE_ID, freqs, 0); - if (ret > 0) { - printf("Unexpectedly get available freqs successfully with " - "zero buffer size on lcore %u\n", TEST_POWER_LCORE_ID); - return -1; - } - - /* test with all valid input parameters */ - ret = rte_power_freqs(TEST_POWER_LCORE_ID, freqs, - TEST_POWER_FREQS_NUM_MAX); - if (ret == 0 || ret > TEST_POWER_FREQS_NUM_MAX) { - printf("Fail to get available fre
[dpdk-dev] [PATCH 1/5] vmxnet3: Fix VLAN Rx stripping
Shouldn't reset vlan_tci to 0 if a valid VLAN tag is stripped. Signed-off-by: Yong Wang --- lib/librte_pmd_vmxnet3/vmxnet3_rxtx.c | 12 1 file changed, 4 insertions(+), 8 deletions(-) diff --git a/lib/librte_pmd_vmxnet3/vmxnet3_rxtx.c b/lib/librte_pmd_vmxnet3/vmxnet3_rxtx.c index 263f9ce..986e5e5 100644 --- a/lib/librte_pmd_vmxnet3/vmxnet3_rxtx.c +++ b/lib/librte_pmd_vmxnet3/vmxnet3_rxtx.c @@ -540,21 +540,19 @@ vmxnet3_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t nb_pkts) /* Check for hardware stripped VLAN tag */ if (rcd->ts) { - PMD_RX_LOG(ERR, "Received packet with vlan ID: %d.", rcd->tci); rxm->ol_flags = PKT_RX_VLAN_PKT; - #ifdef RTE_LIBRTE_VMXNET3_DEBUG_DRIVER VMXNET3_ASSERT(rxm && rte_pktmbuf_mtod(rxm, void *)); #endif /* Copy vlan tag in packet buffer */ - rxm->vlan_tci = rte_le_to_cpu_16( - (uint16_t)rcd->tci); - - } else + rxm->vlan_tci = rte_le_to_cpu_16((uint16_t)rcd->tci); + } else { rxm->ol_flags = 0; + rxm->vlan_tci = 0; + } /* Initialize newly received packet buffer */ rxm->port = rxq->port_id; @@ -563,11 +561,9 @@ vmxnet3_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t nb_pkts) rxm->pkt_len = (uint16_t)rcd->len; rxm->data_len = (uint16_t)rcd->len; rxm->port = rxq->port_id; - rxm->vlan_tci = 0; rxm->data_off = RTE_PKTMBUF_HEADROOM; rx_pkts[nb_rx++] = rxm; - rcd_done: rxq->cmd_ring[ring_idx].next2comp = idx; VMXNET3_INC_RING_IDX_ONLY(rxq->cmd_ring[ring_idx].next2comp, rxq->cmd_ring[ring_idx].size); -- 1.9.1
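The effect of this fix can be sketched outside the driver. The snippet below is only an illustration under stated assumptions: the sketch_* types and SKETCH_RX_VLAN_PKT are hypothetical, simplified stand-ins for the completion descriptor, the mbuf and PKT_RX_VLAN_PKT, and the little-endian conversion done by rte_le_to_cpu_16() in the real code is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the PKT_RX_VLAN_PKT offload flag. */
#define SKETCH_RX_VLAN_PKT 0x1u

/* Hypothetical, simplified Rx completion descriptor. */
struct sketch_rx_comp_desc {
	unsigned ts;  /* non-zero if the device stripped a VLAN tag */
	uint16_t tci; /* tag control information of the stripped tag */
};

/* Hypothetical, simplified mbuf with only the fields touched here. */
struct sketch_mbuf {
	uint16_t ol_flags;
	uint16_t vlan_tci;
};

/*
 * Set the VLAN fields once, inside the branch that knows whether a tag
 * was stripped. There is deliberately no unconditional
 * "m->vlan_tci = 0" afterwards, which is what used to wipe the tag.
 */
static void
sketch_set_vlan(struct sketch_mbuf *m, const struct sketch_rx_comp_desc *rcd)
{
	assert(m != NULL && rcd != NULL);
	if (rcd->ts) {
		m->ol_flags = SKETCH_RX_VLAN_PKT;
		m->vlan_tci = rcd->tci;
	} else {
		m->ol_flags = 0;
		m->vlan_tci = 0;
	}
}
```

With this shape, a tagged completion leaves the TCI in the mbuf, while an untagged one clears both fields.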
[dpdk-dev] [PATCH 2/5] vmxnet3: Add VLAN Tx offload
Signed-off-by: Yong Wang --- lib/librte_pmd_vmxnet3/vmxnet3_rxtx.c | 6 ++ 1 file changed, 6 insertions(+) diff --git a/lib/librte_pmd_vmxnet3/vmxnet3_rxtx.c b/lib/librte_pmd_vmxnet3/vmxnet3_rxtx.c index 986e5e5..0b6363f 100644 --- a/lib/librte_pmd_vmxnet3/vmxnet3_rxtx.c +++ b/lib/librte_pmd_vmxnet3/vmxnet3_rxtx.c @@ -319,6 +319,12 @@ vmxnet3_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, txd->cq = 1; txd->eop = 1; + /* Add VLAN tag if requested */ + if (txm->ol_flags & PKT_TX_VLAN_PKT) { + txd->ti = 1; + txd->tci = rte_cpu_to_le_16(txm->vlan_tci); + } + /* Record current mbuf for freeing it later in tx complete */ #ifdef RTE_LIBRTE_VMXNET3_DEBUG_DRIVER VMXNET3_ASSERT(txm); -- 1.9.1
[dpdk-dev] [PATCH 0/5] vmxnet3 pmd fixes/improvement
This patch series includes various fixes and improvements to the vmxnet3 PMD. Yong Wang (5): vmxnet3: Fix VLAN Rx stripping vmxnet3: Add VLAN Tx offload vmxnet3: Fix dev stop/restart bug vmxnet3: Add rx pkt check offloads vmxnet3: Some perf improvement on the rx path lib/librte_pmd_vmxnet3/vmxnet3_rxtx.c | 310 +- 1 file changed, 195 insertions(+), 115 deletions(-) -- 1.9.1
[dpdk-dev] [PATCH 3/5] vmxnet3: Fix dev stop/restart bug
This change makes vmxnet3 consistent with other PMDs in terms of dev_stop behavior: rather than releasing tx/rx rings, it only resets the ring structures and releases the pending mbufs. Verified with various tests (test-pmd and pktgen) over vmxnet3 that dev stop/restart works fine. Signed-off-by: Yong Wang --- lib/librte_pmd_vmxnet3/vmxnet3_rxtx.c | 78 --- 1 file changed, 73 insertions(+), 5 deletions(-) diff --git a/lib/librte_pmd_vmxnet3/vmxnet3_rxtx.c b/lib/librte_pmd_vmxnet3/vmxnet3_rxtx.c index 0b6363f..2017d4b 100644 --- a/lib/librte_pmd_vmxnet3/vmxnet3_rxtx.c +++ b/lib/librte_pmd_vmxnet3/vmxnet3_rxtx.c @@ -157,7 +157,7 @@ vmxnet3_txq_dump(struct vmxnet3_tx_queue *txq) #endif static inline void -vmxnet3_cmd_ring_release(vmxnet3_cmd_ring_t *ring) +vmxnet3_cmd_ring_release_mbufs(vmxnet3_cmd_ring_t *ring) { while (ring->next2comp != ring->next2fill) { /* No need to worry about tx desc ownership, device is quiesced by now. */ @@ -171,16 +171,23 @@ vmxnet3_cmd_ring_release(vmxnet3_cmd_ring_t *ring) } vmxnet3_cmd_ring_adv_next2comp(ring); } +} + +static void +vmxnet3_cmd_ring_release(vmxnet3_cmd_ring_t *ring) +{ + vmxnet3_cmd_ring_release_mbufs(ring); rte_free(ring->buf_info); ring->buf_info = NULL; } + void vmxnet3_dev_tx_queue_release(void *txq) { vmxnet3_tx_queue_t *tq = txq; - if (txq != NULL) { + if (tq != NULL) { /* Release the cmd_ring */ vmxnet3_cmd_ring_release(&tq->cmd_ring); } @@ -192,13 +199,74 @@ vmxnet3_dev_rx_queue_release(void *rxq) int i; vmxnet3_rx_queue_t *rq = rxq; - if (rxq != NULL) { + if (rq != NULL) { /* Release both the cmd_rings */ for (i = 0; i < VMXNET3_RX_CMDRING_SIZE; i++) vmxnet3_cmd_ring_release(&rq->cmd_ring[i]); } } +static void +vmxnet3_dev_tx_queue_reset(void *txq) +{ + vmxnet3_tx_queue_t *tq = txq; + struct vmxnet3_cmd_ring *ring = &tq->cmd_ring; + struct vmxnet3_comp_ring *comp_ring = &tq->comp_ring; + int size; + + if (tq != NULL) { + /* Release the cmd_ring mbufs */ + vmxnet3_cmd_ring_release_mbufs(&tq->cmd_ring); + } + + /* Tx 
vmxnet rings structure initialization*/ + ring->next2fill = 0; + ring->next2comp = 0; + ring->gen = VMXNET3_INIT_GEN; + comp_ring->next2proc = 0; + comp_ring->gen = VMXNET3_INIT_GEN; + + size = sizeof(struct Vmxnet3_TxDesc) * ring->size; + size += sizeof(struct Vmxnet3_TxCompDesc) * comp_ring->size; + + memset(ring->base, 0, size); +} + +static void +vmxnet3_dev_rx_queue_reset(void *rxq) +{ + int i; + vmxnet3_rx_queue_t *rq = rxq; + struct vmxnet3_cmd_ring *ring0, *ring1; + struct vmxnet3_comp_ring *comp_ring; + int size; + + if (rq != NULL) { + /* Release both the cmd_rings mbufs */ + for (i = 0; i < VMXNET3_RX_CMDRING_SIZE; i++) + vmxnet3_cmd_ring_release_mbufs(&rq->cmd_ring[i]); + } + + ring0 = &rq->cmd_ring[0]; + ring1 = &rq->cmd_ring[1]; + comp_ring = &rq->comp_ring; + + /* Rx vmxnet rings structure initialization */ + ring0->next2fill = 0; + ring1->next2fill = 0; + ring0->next2comp = 0; + ring1->next2comp = 0; + ring0->gen = VMXNET3_INIT_GEN; + ring1->gen = VMXNET3_INIT_GEN; + comp_ring->next2proc = 0; + comp_ring->gen = VMXNET3_INIT_GEN; + + size = sizeof(struct Vmxnet3_RxDesc) * (ring0->size + ring1->size); + size += sizeof(struct Vmxnet3_RxCompDesc) * comp_ring->size; + + memset(ring0->base, 0, size); +} + void vmxnet3_dev_clear_queues(struct rte_eth_dev *dev) { @@ -211,7 +279,7 @@ vmxnet3_dev_clear_queues(struct rte_eth_dev *dev) if (txq != NULL) { txq->stopped = TRUE; - vmxnet3_dev_tx_queue_release(txq); + vmxnet3_dev_tx_queue_reset(txq); } } @@ -220,7 +288,7 @@ vmxnet3_dev_clear_queues(struct rte_eth_dev *dev) if (rxq != NULL) { rxq->stopped = TRUE; - vmxnet3_dev_rx_queue_release(rxq); + vmxnet3_dev_rx_queue_reset(rxq); } } } -- 1.9.1
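The reset-instead-of-release idea in this patch can be shown in isolation. This is a minimal sketch, assuming a single ring whose descriptor memory is kept across stop/start; the sketch_* names and SKETCH_INIT_GEN are hypothetical stand-ins and do not match the driver's real structures, which also cover the completion ring and the pending mbufs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for VMXNET3_INIT_GEN. */
#define SKETCH_INIT_GEN 1

/* Hypothetical descriptor layout, only here to give the ring a payload. */
struct sketch_desc {
	uint64_t addr;
	uint32_t len;
	uint32_t flags;
};

/* Hypothetical, simplified command ring. */
struct sketch_ring {
	struct sketch_desc *base; /* descriptor memory, kept across stop/start */
	uint32_t size;            /* number of descriptors */
	uint32_t next2fill;
	uint32_t next2comp;
	uint8_t gen;
};

/*
 * Return the ring to its initial state without freeing it: indices go
 * back to 0, the generation bit is re-initialised and the descriptor
 * memory is zeroed, so the ring can be reused on the next dev start.
 */
static void
sketch_ring_reset(struct sketch_ring *ring)
{
	if (ring == NULL || ring->base == NULL)
		return;
	assert(ring->size > 0);
	ring->next2fill = 0;
	ring->next2comp = 0;
	ring->gen = SKETCH_INIT_GEN;
	memset(ring->base, 0, sizeof(ring->base[0]) * ring->size);
}
```

The design point is that base and size survive the call, so a later start does not have to allocate the rings again.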
[dpdk-dev] [PATCH 5/5] vmxnet3: Some perf improvement on the rx path
Signed-off-by: Yong Wang
---
 lib/librte_pmd_vmxnet3/vmxnet3_rxtx.c | 242 --
 1 file changed, 116 insertions(+), 126 deletions(-)

diff --git a/lib/librte_pmd_vmxnet3/vmxnet3_rxtx.c b/lib/librte_pmd_vmxnet3/vmxnet3_rxtx.c
index e2fb8a8..4799f4d 100644
--- a/lib/librte_pmd_vmxnet3/vmxnet3_rxtx.c
+++ b/lib/librte_pmd_vmxnet3/vmxnet3_rxtx.c
@@ -451,6 +451,19 @@ vmxnet3_post_rx_bufs(vmxnet3_rx_queue_t *rxq, uint8_t ring_id)
 	uint32_t i = 0, val = 0;
 	struct vmxnet3_cmd_ring *ring = &rxq->cmd_ring[ring_id];
+	if (ring_id == 0) {
+		/* Usually: One HEAD type buf per packet
+		 * val = (ring->next2fill % rxq->hw->bufs_per_pkt) ?
+		 * VMXNET3_RXD_BTYPE_BODY : VMXNET3_RXD_BTYPE_HEAD;
+		 */
+
+		/* We use single packet buffer so all heads here */
+		val = VMXNET3_RXD_BTYPE_HEAD;
+	} else {
+		/* All BODY type buffers for 2nd ring */
+		val = VMXNET3_RXD_BTYPE_BODY;
+	}
+
 	while (vmxnet3_cmd_ring_desc_avail(ring) > 0) {
 		struct Vmxnet3_RxDesc *rxd;
 		struct rte_mbuf *mbuf;
@@ -458,22 +471,9 @@ vmxnet3_post_rx_bufs(vmxnet3_rx_queue_t *rxq, uint8_t ring_id)
 		rxd = (struct Vmxnet3_RxDesc *)(ring->base + ring->next2fill);
-		if (ring->rid == 0) {
-			/* Usually: One HEAD type buf per packet
-			 * val = (ring->next2fill % rxq->hw->bufs_per_pkt) ?
-			 * VMXNET3_RXD_BTYPE_BODY : VMXNET3_RXD_BTYPE_HEAD;
-			 */
-
-			/* We use single packet buffer so all heads here */
-			val = VMXNET3_RXD_BTYPE_HEAD;
-		} else {
-			/* All BODY type buffers for 2nd ring; which won't be used at all by ESXi */
-			val = VMXNET3_RXD_BTYPE_BODY;
-		}
-
 		/* Allocate blank mbuf for the current Rx Descriptor */
 		mbuf = rte_rxmbuf_alloc(rxq->mp);
-		if (mbuf == NULL) {
+		if (unlikely(mbuf == NULL)) {
 			PMD_RX_LOG(ERR, "Error allocating mbuf in %s", __func__);
 			rxq->stats.rx_buf_alloc_failure++;
 			err = ENOMEM;
@@ -536,151 +536,141 @@ vmxnet3_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t nb_pkts)
 	rcd = &rxq->comp_ring.base[rxq->comp_ring.next2proc].rcd;
-	if (rxq->stopped) {
+	if (unlikely(rxq->stopped)) {
 		PMD_RX_LOG(DEBUG, "Rx queue is stopped.");
 		return 0;
 	}
 	while (rcd->gen == rxq->comp_ring.gen) {
-
 		if (nb_rx >= nb_pkts)
 			break;
+
 		idx = rcd->rxdIdx;
 		ring_idx = (uint8_t)((rcd->rqID == rxq->qid1) ? 0 : 1);
 		rxd = (Vmxnet3_RxDesc *)rxq->cmd_ring[ring_idx].base + idx;
 		rbi = rxq->cmd_ring[ring_idx].buf_info + idx;
-		if (rcd->sop != 1 || rcd->eop != 1) {
+		if (unlikely(rcd->sop != 1 || rcd->eop != 1)) {
 			rte_pktmbuf_free_seg(rbi->m);
-
 			PMD_RX_LOG(DEBUG, "Packet spread across multiple buffers\n)");
 			goto rcd_done;
+		}
-		} else {
-
-			PMD_RX_LOG(DEBUG, "rxd idx: %d ring idx: %d.", idx, ring_idx);
+		PMD_RX_LOG(DEBUG, "rxd idx: %d ring idx: %d.", idx, ring_idx);
 #ifdef RTE_LIBRTE_VMXNET3_DEBUG_DRIVER
-			VMXNET3_ASSERT(rcd->len <= rxd->len);
-			VMXNET3_ASSERT(rbi->m);
+		VMXNET3_ASSERT(rcd->len <= rxd->len);
+		VMXNET3_ASSERT(rbi->m);
 #endif
-			if (rcd->len == 0) {
-				PMD_RX_LOG(DEBUG, "Rx buf was skipped. rxring[%d][%d]\n)",
-					   ring_idx, idx);
+		if (unlikely(rcd->len == 0)) {
+			PMD_RX_LOG(DEBUG, "Rx buf was skipped. rxring[%d][%d]\n)",
+				   ring_idx, idx);
 #ifdef RTE_LIBRTE_VMXNET3_DEBUG_DRIVER
-				VMXNET3_ASSERT(rcd->sop && rcd->eop);
+			VMXNET3_ASSERT(rcd->sop && rcd->eop);
 #endif
-				rte_pktmbuf_free_seg(rbi->m);
-
-				goto rcd_done;
-			}
+			rte_pktmbuf_free_seg(rbi->m);
+			goto rcd_done;
+		}
-			/* Assuming a packet is coming in a single packet buffer */
-			if (rxd->btype != VMXNET3_RXD_BTYPE_HEAD) {
-				PMD_RX_LOG(DEBUG,
-					   "Alert : Misbehaving device, incorrect "
-					   " buffer type used. iPacket dropped."
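The rx-path patch above does two things: it marks the error checks as unlikely, and it replaces the nested else branch with early exits. A minimal standalone sketch of the same pattern follows. It assumes a GCC or Clang style `__builtin_expect`; the `ex_unlikely` macro, the `ex_comp` structure and `ex_count_good` are hypothetical names for illustration and are not the driver's own code.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical equivalent of a branch hint macro: tells the compiler the
 * condition is expected to be false, so the common path stays fall-through. */
#if defined(__GNUC__) || defined(__clang__)
#define ex_unlikely(x) __builtin_expect(!!(x), 0)
#else
#define ex_unlikely(x) (x)
#endif

/* Hypothetical completion record: start/end-of-packet bits and a length. */
struct ex_comp {
    uint8_t sop;
    uint8_t eop;
    uint16_t len;
};

/* Count completions that describe a whole packet in a single buffer,
 * stopping once max packets have been accepted. The rare cases are
 * handled first and skipped, so the common case needs no else branch. */
static size_t ex_count_good(const struct ex_comp *c, size_t n, size_t max)
{
    size_t good = 0;

    for (size_t i = 0; i < n; i++) {
        if (good >= max)
            break;

        /* Packet spread across multiple buffers: skip it. */
        if (ex_unlikely(c[i].sop != 1 || c[i].eop != 1))
            continue;

        /* Zero-length completion: the buffer was skipped. */
        if (ex_unlikely(c[i].len == 0))
            continue;

        good++;
    }
    return good;
}
```

The hint changes only code layout, not behavior, so any gain depends on the error cases really being rare in the traffic being measured.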
[dpdk-dev] [PATCH 4/5] vmxnet3: Add rx pkt check offloads
Only supports IPv4 so far.

Signed-off-by: Yong Wang
---
 lib/librte_pmd_vmxnet3/vmxnet3_rxtx.c | 22 +-
 1 file changed, 21 insertions(+), 1 deletion(-)

diff --git a/lib/librte_pmd_vmxnet3/vmxnet3_rxtx.c b/lib/librte_pmd_vmxnet3/vmxnet3_rxtx.c
index 2017d4b..e2fb8a8 100644
--- a/lib/librte_pmd_vmxnet3/vmxnet3_rxtx.c
+++ b/lib/librte_pmd_vmxnet3/vmxnet3_rxtx.c
@@ -65,6 +65,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -614,7 +615,7 @@ vmxnet3_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t nb_pkts)
 			/* Check for hardware stripped VLAN tag */
 			if (rcd->ts) {
-				PMD_RX_LOG(ERR, "Received packet with vlan ID: %d.",
+				PMD_RX_LOG(DEBUG, "Received packet with vlan ID: %d.",
 					   rcd->tci);
 				rxm->ol_flags = PKT_RX_VLAN_PKT;
 #ifdef RTE_LIBRTE_VMXNET3_DEBUG_DRIVER
@@ -637,6 +638,25 @@ vmxnet3_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t nb_pkts)
 			rxm->port = rxq->port_id;
 			rxm->data_off = RTE_PKTMBUF_HEADROOM;
+			/* Check packet types, rx checksum errors, etc. Only support IPv4 so far. */
+			if (rcd->v4) {
+				struct ether_hdr *eth = rte_pktmbuf_mtod(rxm, struct ether_hdr *);
+				struct ipv4_hdr *ip = (struct ipv4_hdr *)(eth + 1);
+
+				if (((ip->version_ihl & 0xf) << 2) > (int)sizeof(struct ipv4_hdr))
+					rxm->ol_flags |= PKT_RX_IPV4_HDR_EXT;
+				else
+					rxm->ol_flags |= PKT_RX_IPV4_HDR;
+
+				if (!rcd->cnc) {
+					if (!rcd->ipc)
+						rxm->ol_flags |= PKT_RX_IP_CKSUM_BAD;
+
+					if ((rcd->tcp || rcd->udp) && !rcd->tuc)
+						rxm->ol_flags |= PKT_RX_L4_CKSUM_BAD;
+				}
+			}
+
 			rx_pkts[nb_rx++] = rxm;
 rcd_done:
 			rxq->cmd_ring[ring_idx].next2comp = idx;
-- 
1.9.1
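The flag logic in the patch above can be exercised in isolation. The sketch below is not the driver code: the `EX_` flag values and the `ex_rx_comp` structure are hypothetical stand-ins for the `PKT_RX_*` flags and the completion descriptor. It assumes the descriptor bits mean what the patch implies: `cnc` set when no checksum was calculated, `ipc` set when the IP checksum is good, and `tuc` set when the TCP/UDP checksum is good. The header length comes from the low nibble of `version_ihl`, which counts 32-bit words, so shifting left by two gives bytes.

```c
#include <stdint.h>

/* Hypothetical flag values, for illustration only. */
#define EX_RX_IPV4_HDR      (1u << 0)
#define EX_RX_IPV4_HDR_EXT  (1u << 1)
#define EX_RX_IP_CKSUM_BAD  (1u << 2)
#define EX_RX_L4_CKSUM_BAD  (1u << 3)

/* An IPv4 header without options is 20 bytes long. */
#define EX_IPV4_MIN_HDR_LEN 20u

/* Hypothetical subset of the completion descriptor bits used by the patch. */
struct ex_rx_comp {
    unsigned v4  : 1; /* packet is IPv4 */
    unsigned cnc : 1; /* checksum not calculated */
    unsigned ipc : 1; /* IP checksum correct */
    unsigned tcp : 1; /* packet is TCP */
    unsigned udp : 1; /* packet is UDP */
    unsigned tuc : 1; /* TCP/UDP checksum correct */
};

/* Map completion bits and the IPv4 version_ihl byte to offload flags. */
static uint32_t ex_rx_offload_flags(const struct ex_rx_comp *rcd,
                                    uint8_t version_ihl)
{
    uint32_t flags = 0;
    unsigned hdr_len;

    if (!rcd->v4)
        return 0;

    /* IHL is in 32-bit words; multiply by 4 to get bytes. */
    hdr_len = (unsigned)(version_ihl & 0x0f) << 2;
    if (hdr_len > EX_IPV4_MIN_HDR_LEN)
        flags |= EX_RX_IPV4_HDR_EXT; /* header carries options */
    else
        flags |= EX_RX_IPV4_HDR;

    /* Only trust the checksum bits when a checksum was calculated. */
    if (!rcd->cnc) {
        if (!rcd->ipc)
            flags |= EX_RX_IP_CKSUM_BAD;
        if ((rcd->tcp || rcd->udp) && !rcd->tuc)
            flags |= EX_RX_L4_CKSUM_BAD;
    }
    return flags;
}
```

For example, a `version_ihl` of 0x45 gives a 20-byte header and the plain IPv4 flag, while 0x46 gives 24 bytes and the extended-header flag.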