[dpdk-dev] Newbie question about distributor library

2014-10-12 Thread Pattan, Reshma
Hi ,

The distributor was running as part of the RX core.

Thanks,
Reshma


> -----Original Message-----
> From: Wang, Shawn [mailto:xingbow at amazon.com]
> Sent: Friday, October 10, 2014 10:06 PM
> To: Pattan, Reshma; dev at dpdk.org
> Subject: RE: Newbie question about distributor library
> 
> 15.5 Mpps is amazing.
> Does the RX core run the distributor, or is there a separate distributor core?
> 
> Thanks.
> 
> From: Pattan, Reshma [reshma.pattan at intel.com]
> Sent: Friday, October 10, 2014 12:47 AM
> To: Wang, Shawn; dev at dpdk.org
> Subject: RE: Newbie question about distributor library
> 
> > -Original Message-
> > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Wang, Shawn
> > Sent: Thursday, October 9, 2014 7:00 PM
> > To: dev at dpdk.org
> > Subject: [dpdk-dev] Newbie question about distributor library
> >
> > Hi:
> >
> > I am reading the documentation about the distributor library, which was
> > added in DPDK 1.7.
> > The document mentions that packets are dynamically load balanced
> > between a set of worker cores.
> > So I am wondering: is this the only reason we need the distributor library?
> > What else could it give us?
> > Do we have any performance numbers for this new library?
> >
> > Thanks.
> 
> Hi,
> 
> The distributor library only takes care of distributing load to workers by
> flow type, i.e. by the RSS value of the mbuf (which is calculated from the
> 5-tuple of the packet). Packets with the same RSS value are given to the
> same worker. Hence different flows go to different workers.
> 
> We achieved 15.5 Mpps with 8 worker cores, 1 RX core and 1 TX core using
> the sample application.
> 
> Thanks,
> Reshma
> 
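The "same RSS value, same worker" property described in the reply can be sketched in a few lines of C. This is only an illustration under stated assumptions, not the librte_distributor implementation: `pick_worker` is a hypothetical helper, the flow tag is assumed to be the 32-bit RSS hash carried in the mbuf, and a fixed modulo replaces the dynamic load balancing that the library documentation describes.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not a DPDK API): map a 32-bit flow tag, for example
 * the RSS hash stored in the mbuf, to one of nb_workers workers.
 * A fixed modulo is the simplest mapping that sends every packet of a flow
 * to the same worker; it does not balance load dynamically.
 */
static unsigned
pick_worker(uint32_t flow_tag, unsigned nb_workers)
{
	assert(nb_workers > 0);
	return flow_tag % nb_workers;
}
```

With 8 workers, as in the quoted measurement, every packet whose tag is 13 would be handed to worker 5, while a flow with tag 16 would go to worker 0.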



[dpdk-dev] [PATCH 0/7] Patches to split architecture specific operations from DPDK

2014-10-12 Thread Chao CH Zhu
Cyril,

Thanks for your comments! You are right. SSE needs to be split out. The 
current split is not complete. I'll continue to contribute.

Best Regards!
--
Chao Zhu
Research Staff Member
Cloud Infrastructure and Technology Group
IBM China Research Lab
Building 19 Zhongguancun Software Park
8 Dongbeiwang West Road, Haidian District,
Beijing, PRC. 100193
Tel: +86-10-58748711
Email: bjzhuc at cn.ibm.com




From:   Cyril Chemparathy 
To: Chao CH Zhu/China/IBM at IBMCN, 
Date:   2014/10/07 05:39
Subject:Re: [dpdk-dev] [PATCH 0/7] Patches to split architecture 
specific operations from DPDK



On 9/26/2014 2:33 AM, Chao Zhu wrote:
> The set of patches split x86 architecture specific operations from DPDK
> and put them to the arch directories of i686 and x86_64 architecture.
> This will make the adoption of DPDK much easier on other computer
> architectures. For a new architecture, just add an architecture specific
> directory and necessary building configuration files, then DPDK can
> support it.

Wouldn't the SSE specifics in rte_common.h and rte_common_vect.h need to 
be similarly split out into architecture specifics?

Thanks
-- Cyril.





[dpdk-dev] [PATCH 0/7] Patches to split architecture specific operations from DPDK

2014-10-12 Thread Chao CH Zhu
David,

I agree that your idea may be better for the splitting. However, as Bruce 
said, I think people would like to see the multi-architecture support 
feature of DPDK first. We can improve it gradually. Do you have some 
comments?

Best Regards!
--
Chao Zhu 




From:   Bruce Richardson 
To: David Marchand 
Cc: Chao CH Zhu/China/IBM at IBMCN, "dev at dpdk.org" 
Date:   2014/10/03 21:28
Subject:Re: [dpdk-dev] [PATCH 0/7] Patches to split architecture 
specific operations from DPDK



On Fri, Oct 03, 2014 at 03:21:53PM +0200, David Marchand wrote:
> Hello Chao,
> 
> On Fri, Sep 26, 2014 at 11:33 AM, Chao Zhu  wrote:
> 
> > > The set of patches split x86 architecture specific operations from DPDK
> > > and put them to the arch directories of i686 and x86_64 architecture.
> > > This will make the adoption of DPDK much easier on other computer
> > > architectures. For a new architecture, just add an architecture specific
> > > directory and necessary building configuration files, then DPDK can
> > > support it.
> >
> >
> Here is a different approach for the headers splitting.
> 
> If we are going to support multiple architectures, the best would be to
> have a specific header for each arch which implements a common API (no
> need for any _arch suffix).
> These headers would be located in
> lib/librte_eal/common/include/arch/$arch/ rather than
> lib/librte_eal/common/include/$arch/arch/ (which looks odd to me).
> Makefiles can add some -I for dpdk to build itself (and we can remove
> those symlinks from the makefiles).
> Makefiles only install the specific headers in RTE_SDK/include for use by
> applications.
> 
> For common code and documentation, we can add a "generic" directory in
> lib/librte_eal/common/include (or "arch-generic", or "shared" ... any
> better idea ?).
> DPDK makefiles installs the generic headers in RTE_SDK/include/generic.
> arch headers (like rte_atomic.h) include the generic one
> ().
> 
> These generic headers can be implemented using compiler intrinsics when
> possible.
> They also include the doxygen stuff in a single place.
> 
> 
> This would look like something like this, for rte_atomic.h :
> - in DPDK sources
> $ ls lib/librte_eal/common/include/*/rte_atomic.h
> lib/librte_eal/common/include/i686/rte_atomic.h
> lib/librte_eal/common/include/x86_64/rte_atomic.h
> lib/librte_eal/common/include/generic/rte_atomic.h
> 
> - in installed RTE_SDK
> $ ls RTE_SDK/include/{,*/}rte_atomic.h
> RTE_SDK/include/rte_atomic.h
> RTE_SDK/include/generic/rte_atomic.h
> 
> Comments ?
> 
> 
> I am only focusing on the first patchset at the moment, but if we can
> find consensus here, a respin of the two patchsets would be great.
> 
> Thanks.
> 
> -- 
> David Marchand


I would have no objection to such a scheme. However, I'm not seeing much 
advantage over the existing way of doing things. I think I'd rather see 
the proposed patch sets merged first and then any additional cleanup done, 
rather than holding up a worthwhile submission for a bit of tidy-up.

/Bruce
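
For readers following the header discussion, the layering David proposes (a documented generic header implemented with compiler intrinsics, which each arch header includes and may override) could look roughly like the sketch below. It is collapsed into one file so that it compiles on its own, and `demo_atomic32_add_return` is a hypothetical name chosen for illustration; it is not one of DPDK's rte_atomic functions.

```c
#include <assert.h>
#include <stdint.h>

/* ---- would live in the generic header: doxygen plus portable fallback ---- */

/**
 * Atomically add inc to *v and return the new value.
 * Hypothetical API used only for this sketch.
 */
static inline int32_t
demo_atomic32_add_return(volatile int32_t *v, int32_t inc)
{
	/* GCC/Clang builtin: the compiler picks the instruction per arch. */
	return __sync_add_and_fetch(v, inc);
}

/*
 * ---- would live in the per-arch header ----
 * It includes the generic header and replaces only the functions that
 * benefit from hand-written code, so applications see a single API with
 * no _arch suffix.
 */
```

The design point is that documentation and a working default exist in exactly one place, and an architecture port starts out functional before any arch-specific tuning is written.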




[dpdk-dev] [PATCH v2 1/4] app/test: unit test for rx and tx cycles/packet

2014-10-12 Thread Liang, Cunming
Hi Neil,

I very much appreciate your comments.
I have added inline replies and will send v3 as soon as we are aligned.

BRs,
Liang Cunming

> -----Original Message-----
> From: Neil Horman [mailto:nhorman at tuxdriver.com]
> Sent: Saturday, October 11, 2014 1:52 AM
> To: Liang, Cunming
> Cc: dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v2 1/4] app/test: unit test for rx and tx 
> cycles/packet
> 
> On Fri, Oct 10, 2014 at 08:29:58PM +0800, Cunming Liang wrote:
> > It provides unit test to measure cycles/packet in NIC loopback mode.
> > It simply gives the average cycles of IO used per packet without test 
> > equipment.
> > When doing the test, make sure the link is UP.
> >
> > Usage Example:
> > 1. Run unit test app in interactive mode
> > app/test -c f -n 4 -- -i
> > 2. Run and wait for the result
> > pmd_perf_autotest
> >
> > There's option to choose rx/tx pair, default is vector.
> > set_rxtx_mode [vector|scalar|full|hybrid]
> > Note: To get accurate scalar fast, please choose 'vector' or 'hybrid' without
> > INC_VEC=y in config
> >
> > Signed-off-by: Cunming Liang 
> > Acked-by: Bruce Richardson 
> 
> Notes inline
> 
> > ---
> >  app/test/Makefile   |1 +
> >  app/test/commands.c |   38 +++
> >  app/test/packet_burst_generator.c   |4 +-
> >  app/test/test.h |4 +
> >  app/test/test_pmd_perf.c|  626
> +++
> >  lib/librte_pmd_ixgbe/ixgbe_ethdev.c |6 +
> >  6 files changed, 677 insertions(+), 2 deletions(-)
> >  create mode 100644 app/test/test_pmd_perf.c
> >
> > diff --git a/app/test/Makefile b/app/test/Makefile
> > index 6af6d76..ebfa0ba 100644
> > --- a/app/test/Makefile
> > +++ b/app/test/Makefile
> > @@ -56,6 +56,7 @@ SRCS-y += test_memzone.c
> >
> >  SRCS-y += test_ring.c
> >  SRCS-y += test_ring_perf.c
> > +SRCS-y += test_pmd_perf.c
> >
> >  ifeq ($(CONFIG_RTE_LIBRTE_TABLE),y)
> >  SRCS-y += test_table.c
> > diff --git a/app/test/commands.c b/app/test/commands.c
> > index a9e36b1..f1e746e 100644
> > --- a/app/test/commands.c
> > +++ b/app/test/commands.c
> > @@ -310,12 +310,50 @@ cmdline_parse_inst_t cmd_quit = {
> >
> > +#define NB_ETHPORTS_USED(1)
> > +#define NB_SOCKETS  (2)
> > +#define MEMPOOL_CACHE_SIZE 250
> > +#define MBUF_SIZE (2048 + sizeof(struct rte_mbuf) + RTE_PKTMBUF_HEADROOM)
> Don't you want to size this in accordance with the amount of data you're sending
> (64 bytes as noted above)?
[Liang, Cunming] The case is designed to measure small-packet IO cost with the
normal mbuf size.
Decreasing the size would not save a significant number of cycles.
> 
> > +static void
> > +print_ethaddr(const char *name, const struct ether_addr *eth_addr)
> > +{
> > +   printf("%s%02X:%02X:%02X:%02X:%02X:%02X", name,
> > +   eth_addr->addr_bytes[0],
> > +   eth_addr->addr_bytes[1],
> > +   eth_addr->addr_bytes[2],
> > +   eth_addr->addr_bytes[3],
> > +   eth_addr->addr_bytes[4],
> > +   eth_addr->addr_bytes[5]);
> > +}
> > +
> This was copied from print_ethaddr.  Seems like a good candidate for a common
> function in rte_ether.h
[Liang, Cunming] Agreed; some of the samples already carry the same copy.
I'll rework it, adding 'ether_format_addr' to rte_ether.h only to format the
48-bit address output, and leaving the other prints to application customization.
> 
> 
> > +}
> > +
> > +static void
> > +signal_handler(int signum)
> > +{
> > +   /* When we receive a USR1 signal, print stats */
> I think you mean SIGUSR2, below, SIGUSR1 tears the test down and exits the
> program
[Liang, Cunming] Thanks, it's a typo.
> 
> > +   if (signum == SIGUSR1) {
> SIGINT instead.  That's the common practice.
[Liang, Cunming] I understand your point.
My reasons for not using SIGINT are:
1. We unset ISIG in c_lflag of the terminal, so CTRL+C won't trigger SIGINT in
  the interactive command line. A signal always has to be sent explicitly,
  whether it is SIGUSR1 or SIGINT.
2. By its semantics, SIGINT is expected to terminate the process.
  Here I want to force-stop this test case but stay alive in the command line.
  After it has stopped, it can be run again, or other test cases can be run.
  So I keep different behavior for SIGINT and SIGUSR1.
3. It should rarely be used.
  I leave this backdoor for automated test control, only for the case of an
  unexpected timeout.
  For manual tests, we can easily force-kill the process.

> 
> > +   printf("Force Stop!\n");
> > +   stop = 1;
> > +   }
> > +   if (signum == SIGUSR2)
> > +   stats_display(0);
> > +}
> > +/* main processing loop */
> > +static int
> > +main_loop(__rte_unused void *args)
> > +{
> > +#define PACKET_SIZE 64
> > +#define FRAME_GAP 12
> > +#define MAC_PREAMBLE 8
> > +   struct rte_mbuf *pkts_burst[MAX_PKT_BURST];
> > +   unsigned lcore_id;
> > +   unsigned i, portid, nb_rx = 0, nb_tx = 0;
> > +   struct lcore_conf *conf;
> > +   uint64_t prev_tsc, cur_tsc;
> > +   int pkt_per_port;
> > +   uint64
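
The SIGUSR1/SIGUSR2 split discussed in this review (SIGUSR1 force-stops the running test case, SIGUSR2 requests a statistics dump) can be sketched as follows. This is a minimal illustration, not the code from the patch: the names are hypothetical, and unlike the quoted handler it only sets flags, on the assumption that the main loop polls them and does the printing itself.

```c
#include <assert.h>
#include <signal.h>

/* Flags polled by the (hypothetical) main loop. */
static volatile sig_atomic_t stop_requested;   /* set on SIGUSR1 */
static volatile sig_atomic_t stats_requested;  /* set on SIGUSR2 */

/* Only async-signal-safe work happens here: record which signal arrived. */
static void
demo_signal_handler(int signum)
{
	if (signum == SIGUSR1)
		stop_requested = 1;
	else if (signum == SIGUSR2)
		stats_requested = 1;
}

/* Install the handler for both signals; returns 0 on success, -1 on error. */
static int
demo_install_handlers(void)
{
	if (signal(SIGUSR1, demo_signal_handler) == SIG_ERR)
		return -1;
	if (signal(SIGUSR2, demo_signal_handler) == SIG_ERR)
		return -1;
	return 0;
}
```

A polling loop would then check `stop_requested` once per burst and leave the test case without terminating the process, which matches the intent described in the reply.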

[dpdk-dev] DPDK - VIRTIO performance problems

2014-10-12 Thread Yan Freedland
Hi

I am checking the performance of DPDK in VIRTIO mode running on KVM (Linux ubuntu 
3.11.0-15-generic).
The maximum throughput I reached was 4 Gbps, and then I saw an interesting 
phenomenon.

Every ~2min traffic stopped completely and then immediately came back. This 
happened in a periodic fashion.

I have never seen such a thing in pass-through mode, where of course I reached 
much higher rates.

Can you please help in resolving this problem in VIRTIO?


Thank you
Yan




[dpdk-dev] DPDK - VIRTIO performance problems

2014-10-12 Thread Matthew Hall
On Sun, Oct 12, 2014 at 12:37:37PM +, Yan Freedland wrote:
> Every ~2min traffic stopped completely and then immediately came back. This 
> happened in a periodic fashion.

To me it sounds like it could be similar to what I've seen when I ran out of 
mbuf's or ran out of RX / TX descriptor entries. It could be worth checking 
the error counters on the interfaces with DPDK and Linux OS / ethtool to see 
what might be incrementing during the failed time periods.

Matthew.
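
One way to apply this suggestion is to take a counter snapshot before a stall and another during it, then see which error counters moved. The sketch below assumes a hypothetical `struct port_counters` filled by the application (for example from the port statistics API or from ethtool output); the field names are illustrative and are not taken from DPDK.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical snapshot of per-port counters. */
struct port_counters {
	uint64_t rx_packets;
	uint64_t rx_errors;
	uint64_t rx_no_mbuf;   /* RX allocation failures (mbuf pool empty) */
	uint64_t tx_errors;
};

/* Bits reported by counters_grew(). */
enum {
	GREW_RX_ERRORS  = 1,
	GREW_RX_NO_MBUF = 2,
	GREW_TX_ERRORS  = 4
};

/* Return a bitmask of the error counters that increased between snapshots. */
static unsigned
counters_grew(const struct port_counters *before,
	      const struct port_counters *after)
{
	unsigned grew = 0;

	if (after->rx_errors > before->rx_errors)
		grew |= GREW_RX_ERRORS;
	if (after->rx_no_mbuf > before->rx_no_mbuf)
		grew |= GREW_RX_NO_MBUF;
	if (after->tx_errors > before->tx_errors)
		grew |= GREW_TX_ERRORS;
	return grew;
}
```

If only the no-mbuf bit is set during the stalled interval, that would point toward mbuf pool exhaustion rather than descriptor ring problems.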


[dpdk-dev] [PATCH v4 00/10] VM Power Management

2014-10-12 Thread Alan Carew
Virtual Machine Power Management.

The following patches add two DPDK sample applications and an alternate
implementation of librte_power for use in virtualized environments.
The idea is to provide librte_power functionality from within a VM to address
the lack of MSRs to facilitate frequency changes from within a VM.
It is ideally suited for Haswell which provides per core frequency scaling.

The current librte_power effects frequency changes via the acpi-cpufreq
'userspace' power governor, accessed via sysfs.
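
As background, the cpufreq sysfs interface exposes per-core files such as scaling_available_frequencies and, under the 'userspace' governor, scaling_setspeed. The following is a rough sketch of the kind of helpers involved, not the librte_power code: `cpufreq_path` and `parse_freqs` are hypothetical names, and the sketch only builds paths and parses a frequency list without touching the hardware.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define CPUFREQ_FILE_FMT "/sys/devices/system/cpu/cpu%u/cpufreq/%s"

/*
 * Build the sysfs path of a cpufreq attribute for one core.
 * Returns 0 on success, -1 if the buffer is too small.
 */
static int
cpufreq_path(char *buf, size_t len, unsigned core, const char *attr)
{
	int n = snprintf(buf, len, CPUFREQ_FILE_FMT, core, attr);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/*
 * Parse a whitespace-separated list of frequencies in kHz, the format of
 * scaling_available_frequencies. Returns the number of values stored.
 */
static unsigned
parse_freqs(const char *line, uint32_t *freqs, unsigned max)
{
	const char *p = line;
	unsigned n = 0;

	while (n < max) {
		char *end;
		unsigned long v = strtoul(p, &end, 10);

		if (end == p)
			break;          /* no more numbers on the line */
		freqs[n++] = (uint32_t)v;
		p = end;
	}
	return n;
}
```

Changing the frequency of a core then amounts to writing one of the parsed values to that core's scaling_setspeed file, which requires sufficient privileges on the host and is not available from inside a guest.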

General Overview:(more information in each patch that follows).
The VM Power Management solution provides two components:

 1)VM: Allows a DPDK application in a VM to reuse the librte_power
 interface. Each lcore opens a Virtio-Serial endpoint channel to the host,
 where the re-implementation of librte_power simply forwards the requests for
 frequency change to a host based monitor. The host monitor itself uses
 librte_power.
 Each lcore channel corresponds to a
 serial device '/dev/virtio-ports/virtio.serial.port.poweragent.'
 which is opened in non-blocking mode.
 While each Virtual CPU can be mapped to multiple physical CPUs it is
 recommended that each vCPU should be mapped to a single core only.

 2)Host: The host monitor is managed by a CLI, it allows for adding qemu/KVM
 virtual machines and associated channels to the monitor, manually changing
 CPU frequency, inspecting the state of VMs, vCPU to pCPU pinning and managing
 channels.
 Host channel endpoints are Virtio-Serial endpoints configured as AF_UNIX file
 sockets which follow a specific naming convention
 i.e /tmp/powermonitor/.,
 each channel has an 1:1 mapping to a VM endpoint
 i.e. /dev/virtio-ports/virtio.serial.port.poweragent.
 Host channel endpoints are opened in non-blocking mode and are monitored via
 epoll.
 Requests over each channel to change frequency are forwarded to the original
 librte_power.

Channels must be manually configured as qemu-kvm command line arguments or
libvirt domain definition(xml) e.g.

 


  
  


Where multiple channels can be configured by specifying multiple 
elements, by replacing , .
(port number) should be incremented by 1 for each new channel element.
More information on Virtio-Serial can be found here:
http://fedoraproject.org/wiki/Features/VirtioSerial
To enable the Hypervisor creation of channels, the host endpoint directory
must be created with qemu permissions:
mkdir /tmp/powermonitor
chown qemu:qemu /tmp/powermonitor

The host application runs on two separate lcores:
Core N) CLI: For management of Virtual Machines adding channels to Monitor
 thread, inspecting state and manually setting CPU frequency [PATCH 02/09]
Core N+1) Monitor Thread: An epoll based infinite loop that waits on channel
 events from VMs and calls the corresponding librte_power functions.

A sample application is also provided to run on Virtual Machines, this
application provides a CLI to manually set the frequency of a 
vCPU[PATCH 08/09]

The current l3fwd-power sample application can also be run on a VM.

Changes in V4:
 Fixed double free of channel during VM shutdown.

Changes in V3:
 Fixed crash in Guest CLI when host application is not running.
 Renamed #defines to be more specific to the module they belong
 Added vCPU pinning via CLI

Changes in V2:
 Runtime selection of librte_power implementations.
 Updated Unit tests to cover librte_power changes.
 PATCH[0/3] was sent twice, again as PATCH[0/4]
 Miscellaneous fixes.

Alan Carew (10):
  Channel Manager and Monitor for VM Power Management(Host).
  VM Power Management CLI(Host).
  CPU Frequency Power Management(Host).
  VM Power Management application and Makefile.
  VM Power Management CLI(Guest).
  VM communication channels for VM Power Management(Guest).
  librte_power common interface for Guest and Host
  Packet format for VM Power Management(Host and Guest).
  Build system integration for VM Power Management(Guest and Host)
  VM Power Management Unit Tests

 app/test/Makefile  |   3 +-
 app/test/autotest_data.py  |  26 +
 app/test/test_power.c  | 445 +---
 app/test/test_power_acpi_cpufreq.c | 544 ++
 app/test/test_power_kvm_vm.c   | 308 
 examples/vm_power_manager/Makefile |  57 ++
 examples/vm_power_manager/channel_manager.c| 804 +
 examples/vm_power_manager/channel_manager.h| 314 
 examples/vm_power_manager/channel_monitor.c| 231 ++
 examples/vm_power_manager/channel_monitor.h| 102 +++
 examples/vm_power_manager/guest_cli/Makefile   |  56 ++
 examples/vm_power_manager/guest_cli/main.c |  87 +++
 examples/vm_power_manager/guest_cli/main.h |  52 ++
 .../guest_cli/vm_power_cli_guest.c | 155 
 .../guest_cli/vm_power_cli_guest.h |  55 ++
 examples/vm_power_manager/main.c  

[dpdk-dev] [PATCH v4 02/10] VM Power Management CLI(Host).

2014-10-12 Thread Alan Carew
The CLI is used for administrating the channel monitor and manager and
manually setting the CPU frequency on the host.

Supports the following commands:
 add_vm [Mul-choice STRING]: add_vm|rm_vm , add a VM for subsequent
  operations with the CLI or remove a previously added VM from the VM Power
  Manager

 rm_vm [Mul-choice STRING]: add_vm|rm_vm , add a VM for subsequent
  operations with the CLI or remove a previously added VM from the VM Power
  Manager

 add_channels [Fixed STRING]: add_channels  |all, add
  communication channels for the specified VM, the virtio channels must be
  enabled in the VM configuration(qemu/libvirt) and the associated VM must be
  active.  is a comma-separated list of channel numbers to add, using the
  keyword 'all' will attempt to add all channels for the VM

 set_channel_status [Fixed STRING]:
  set_channel_status  |all enabled|disabled,  enable or disable
  the communication channels in list(comma-separated) for the specified VM,
  alternatively list can be replaced with keyword 'all'. Disabled channels will
  still receive packets on the host, however the commands they specify will be
  ignored. Set status to 'enabled' to begin processing requests again.

 show_vm [Fixed STRING]: show_vm , prints the information on the
  specified VM(s), the information lists the number of vCPUS, the pinning to
  pCPU(s) as a bit mask, along with any communication channels associated with
  each VM

 show_cpu_freq_mask [Fixed STRING]: show_cpu_freq_mask , Get the current
  frequency for each core specified in the mask

 set_cpu_freq_mask [Fixed STRING]: set_cpu_freq  ,
  Set the current frequency for the cores specified in  by scaling
  each up/down/min/max.

 show_cpu_freq [Fixed STRING]: Get the current frequency for the specified core

 set_cpu_freq [Fixed STRING]: set_cpu_freq  ,
  Set the current frequency for the specified core by scaling up/down/min/max

 quit [Fixed STRING]: close the application
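
Several of the commands above take either a comma-separated list of channel numbers or the keyword 'all'. A minimal sketch of how such an argument could be turned into a bitmask is shown below. It is not the parser from vm_power_cli.c: `parse_channel_list` is a hypothetical helper, and the limit of 64 is taken from the channel manager description, which allows at most 64 channels per VM.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define DEMO_MAX_CHANNELS 64

/*
 * Parse "all" or a comma-separated list such as "0,2,5" into a bitmask
 * with one bit per channel number.
 * Returns 0 on success, -1 on malformed input or an out-of-range number.
 */
static int
parse_channel_list(const char *arg, uint64_t *mask)
{
	const char *p = arg;
	uint64_t m = 0;

	if (strcmp(arg, "all") == 0) {
		*mask = UINT64_MAX;
		return 0;
	}
	while (*p != '\0') {
		char *end;
		unsigned long n;

		if (*p < '0' || *p > '9')
			return -1;      /* rejects signs, spaces, empty items */
		n = strtoul(p, &end, 10);
		if (n >= DEMO_MAX_CHANNELS)
			return -1;
		m |= 1ULL << n;
		if (*end == ',') {
			p = end + 1;
			if (*p == '\0')
				return -1;  /* trailing comma */
		} else if (*end == '\0') {
			p = end;
		} else {
			return -1;      /* unexpected character */
		}
	}
	if (m == 0)
		return -1;              /* empty argument */
	*mask = m;
	return 0;
}
```

A mask keeps the per-VM channel state in a single 64-bit word, so enabling or disabling a list of channels reduces to one OR or AND-NOT operation.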

Signed-off-by: Alan Carew 
---
 examples/vm_power_manager/vm_power_cli.c | 669 +++
 examples/vm_power_manager/vm_power_cli.h |  47 +++
 2 files changed, 716 insertions(+)
 create mode 100644 examples/vm_power_manager/vm_power_cli.c
 create mode 100644 examples/vm_power_manager/vm_power_cli.h

diff --git a/examples/vm_power_manager/vm_power_cli.c 
b/examples/vm_power_manager/vm_power_cli.c
new file mode 100644
index 000..e162e88
--- /dev/null
+++ b/examples/vm_power_manager/vm_power_cli.c
@@ -0,0 +1,669 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ *   notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ *   notice, this list of conditions and the following disclaimer in
+ *   the documentation and/or other materials provided with the
+ *   distribution.
+ * * Neither the name of Intel Corporation nor the names of its
+ *   contributors may be used to endorse or promote products derived
+ *   from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "vm_power_cli.h"
+#include "channel_manager.h"
+#include "channel_monitor.h"
+#include "power_manager.h"
+#include "channel_commands.h"
+
+struct cmd_quit_result {
+   cmdline_fixed_string_t quit;
+};
+
+static void cmd_quit_parsed(__attribute__((unused)) void *parsed_result,
+   struct cmdline *cl,
+   __attribute__((unused)) void *data)
+{
+   channel_monitor_exit();
+   channel_manager_exit();
+   power_manager_exit();
+   cmdline_quit(cl);
+}
+
+cmdline_parse_token_string_t cmd_quit_quit =
+   TOKEN_STRING_INITIALIZER(struct cmd_quit_result, qu

[dpdk-dev] [PATCH v4 01/10] Channel Manager and Monitor for VM Power Management(Host).

2014-10-12 Thread Alan Carew
The manager is responsible for adding communications channels to the Monitor
thread, tracking and reporting VM state and employs the libvirt API for
synchronization with the KVM Hypervisor. The manager interacts with the
Hypervisor to discover the mapping of virtual CPUS(vCPUs) to the host
physical CPUS(pCPUs) and to inspect the VM running state.

The manager provides the following functionality to the CLI:
1) Connect to a libvirtd instance, default: qemu:///system
2) Add a VM to an internal list, each VM is identified by a "name" which must
   correspond to a valid libvirt Domain Name.
3) Add communication channels associated with a VM to the epoll based Monitor
   thread.
   The channels must exist and be in the form of:
   /tmp/powermonitor/.. Each channel is a
   Virtio-Serial endpoint configured as an AF_UNIX file socket and opened in
   non-blocking mode.
   Each VM can have a maximum of 64 channels associated with it.
4) Disable or re-enable VM communication channels. Channels, once added to the
   Monitor thread, remain in that thread's control; however, acting on channel
   requests can be disabled and re-enabled via CLI.

The monitor is an epoll based infinite loop running in a separate thread that
waits on channel events from VMs and calls the corresponding functions. Channel
definitions from the manager are registered via the epoll event opaque pointer
when calling epoll_ctl(EPOLL_CTL_ADD), this allows for obtaining the channels
file descriptor for reading EPOLLIN events and mapping the vCPU to pCPU(s)
associated with a request from a particular VM.
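
The opaque-pointer registration described above can be sketched in a few lines. This is an illustration only: `struct demo_channel` and the two helpers are hypothetical stand-ins, since the structures in channel_manager.h are not reproduced in this message. The epoll calls themselves are the standard Linux ones.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical per-channel record kept by the manager. */
struct demo_channel {
	int fd;          /* channel file descriptor */
	unsigned vcpu;   /* vCPU this channel belongs to */
};

/* Register a channel so that epoll hands the record back on EPOLLIN. */
static int
demo_channel_add(int epfd, struct demo_channel *ch)
{
	struct epoll_event ev;

	ev.events = EPOLLIN;
	ev.data.ptr = ch;   /* opaque pointer returned by epoll_wait() */
	return epoll_ctl(epfd, EPOLL_CTL_ADD, ch->fd, &ev);
}

/* Wait for one readable channel; returns NULL on timeout or error. */
static struct demo_channel *
demo_channel_wait(int epfd, int timeout_ms)
{
	struct epoll_event ev;

	if (epoll_wait(epfd, &ev, 1, timeout_ms) != 1)
		return NULL;
	return ev.data.ptr;
}
```

Because the event carries the record rather than just a descriptor, the monitor can read from `ch->fd` and look up the vCPU in one step, without a separate descriptor-to-channel table.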

Signed-off-by: Alan Carew 
---
 examples/vm_power_manager/channel_manager.c | 804 
 examples/vm_power_manager/channel_manager.h | 314 +++
 examples/vm_power_manager/channel_monitor.c | 231 
 examples/vm_power_manager/channel_monitor.h | 102 
 4 files changed, 1451 insertions(+)
 create mode 100644 examples/vm_power_manager/channel_manager.c
 create mode 100644 examples/vm_power_manager/channel_manager.h
 create mode 100644 examples/vm_power_manager/channel_monitor.c
 create mode 100644 examples/vm_power_manager/channel_monitor.h

diff --git a/examples/vm_power_manager/channel_manager.c 
b/examples/vm_power_manager/channel_manager.c
new file mode 100644
index 000..a14f191
--- /dev/null
+++ b/examples/vm_power_manager/channel_manager.c
@@ -0,0 +1,804 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ *   notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ *   notice, this list of conditions and the following disclaimer in
+ *   the documentation and/or other materials provided with the
+ *   distribution.
+ * * Neither the name of Intel Corporation nor the names of its
+ *   contributors may be used to endorse or promote products derived
+ *   from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+#include 
+#include 
+#include 
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+
+#include "channel_manager.h"
+#include "channel_commands.h"
+#include "channel_monitor.h"
+
+
+#define RTE_LOGTYPE_CHANNEL_MANAGER RTE_LOGTYPE_USER1
+
+#define ITERATIVE_BITMASK_CHECK_64(mask_u64b, i) \
+   for (i = 0; mask_u64b; mask_u64b &= ~(1ULL << i++)) \
+   if ((mask_u64b >> i) & 1) \
+
+/* Global pointer to libvirt connection */
+static virConnectPtr global_vir_conn_ptr;
+
+static unsigned char *global_cpumaps;
+static virVcpuInfo *global_vircpuinfo;
+static size_t global_maplen;
+
+static unsigned global_n_host_cpus;
+
+/*
+ * Represents a single Virtual Machine
+ */
+struct virtual_machine_info {
+   cha

[dpdk-dev] [PATCH v4 03/10] CPU Frequency Power Management(Host).

2014-10-12 Thread Alan Carew
A wrapper around librte_power(using ACPI cpufreq), providing locking around the
non-threadsafe library, allowing for frequency changes based on core masks and
core numbers from both the CLI thread and epoll monitor thread.
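
The idea of the wrapper, one lock per core held around each call into the non-threadsafe library while walking a core mask, can be sketched as below. This is not the code from power_manager.c: a C11 `atomic_flag` stands in for rte_spinlock_t, `demo_scale_fn` stands in for a scaling call such as rte_power_freq_up(), and treating a negative return as failure is an assumption of this sketch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define DEMO_MAX_CPUS 64

/* Stand-in for a per-core frequency scaling call. */
typedef int (*demo_scale_fn)(unsigned core);

/* One spinlock per core. */
static atomic_flag demo_core_lock[DEMO_MAX_CPUS];

/* Put every lock into the unlocked state; call once before use. */
static void
demo_power_init(void)
{
	unsigned i;

	for (i = 0; i < DEMO_MAX_CPUS; i++)
		atomic_flag_clear(&demo_core_lock[i]);
}

/*
 * Call fn for every core set in both core_mask and enabled_mask, holding
 * that core's lock around the call.
 * Returns 0 if no call returned a negative value, -1 otherwise.
 */
static int
demo_scale_mask(uint64_t core_mask, uint64_t enabled_mask, demo_scale_fn fn)
{
	uint64_t todo = core_mask & enabled_mask;
	unsigned i;
	int ret = 0;

	for (i = 0; i < DEMO_MAX_CPUS; i++) {
		if (!((todo >> i) & 1ULL))
			continue;
		while (atomic_flag_test_and_set(&demo_core_lock[i]))
			;   /* spin until the lock is free */
		if (fn(i) < 0)
			ret = -1;
		atomic_flag_clear(&demo_core_lock[i]);
	}
	return ret;
}

/* Example callback: counts calls and pretends that core 5 fails. */
static unsigned demo_calls;

static int
demo_fake_scale(unsigned core)
{
	demo_calls++;
	return core == 5 ? -1 : 1;
}
```

Per-core locks let the CLI thread and the monitor thread change different cores concurrently while still serialising requests that target the same core.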

Signed-off-by: Alan Carew 
---
 examples/vm_power_manager/power_manager.c | 244 ++
 examples/vm_power_manager/power_manager.h | 188 +++
 2 files changed, 432 insertions(+)
 create mode 100644 examples/vm_power_manager/power_manager.c
 create mode 100644 examples/vm_power_manager/power_manager.h

diff --git a/examples/vm_power_manager/power_manager.c 
b/examples/vm_power_manager/power_manager.c
new file mode 100644
index 000..b7b1fca
--- /dev/null
+++ b/examples/vm_power_manager/power_manager.c
@@ -0,0 +1,244 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ *   notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ *   notice, this list of conditions and the following disclaimer in
+ *   the documentation and/or other materials provided with the
+ *   distribution.
+ * * Neither the name of Intel Corporation nor the names of its
+ *   contributors may be used to endorse or promote products derived
+ *   from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+
+#include 
+#include 
+#include 
+#include 
+
+#include "power_manager.h"
+
+#define RTE_LOGTYPE_POWER_MANAGER RTE_LOGTYPE_USER1
+
+#define POWER_SCALE_CORE(DIRECTION, core_num , ret) do { \
+   if (core_num >= POWER_MGR_MAX_CPUS) \
+   return -1; \
+   if (!(global_enabled_cpus & (1ULL << core_num))) \
+   return -1; \
+   rte_spinlock_lock(&global_core_freq_info[core_num].power_sl); \
+   ret = rte_power_freq_##DIRECTION(core_num); \
+   rte_spinlock_unlock(&global_core_freq_info[core_num].power_sl); \
+} while (0)
+
+#define POWER_SCALE_MASK(DIRECTION, core_mask, ret) do { \
+   int i; \
+   for (i = 0; core_mask; core_mask &= ~(1ULL << i++)) { \
+   if ((core_mask >> i) & 1) { \
+   if (!(global_enabled_cpus & (1ULL << i))) \
+   continue; \
+   rte_spinlock_lock(&global_core_freq_info[i].power_sl); \
+   if (rte_power_freq_##DIRECTION(i) != 1) \
+   ret = -1; \
+   rte_spinlock_unlock(&global_core_freq_info[i].power_sl); \
+   } \
+   } \
+} while (0)
+
+struct freq_info {
+   rte_spinlock_t power_sl;
+   uint32_t freqs[RTE_MAX_LCORE_FREQS];
+   unsigned num_freqs;
+} __rte_cache_aligned;
+
+static struct freq_info global_core_freq_info[POWER_MGR_MAX_CPUS];
+
+static uint64_t global_enabled_cpus;
+
+#define SYSFS_CPU_PATH "/sys/devices/system/cpu/cpu%u/topology/core_id"
+
+static unsigned
+set_host_cpus_mask(void)
+{
+   char path[PATH_MAX];
+   unsigned i;
+   unsigned num_cpus = 0;
+   for (i = 0; i < POWER_MGR_MAX_CPUS; i++) {
+   snprintf(path, sizeof(path), SYSFS_CPU_PATH, i);
+   if (access(path, F_OK) == 0) {
+   global_enabled_cpus |= 1ULL << i;
+   num_cpus++;
+   } else
+   return num_cpus;
+   }
+   return num_cpus;
+}
+
+int
+power_manager_init(void)
+{
+   unsigned i, num_cpus;
+   uint64_t cpu_mask;
+   int ret = 0;
+
+   num_cpus = set_host_cpus_mask();
+   if (num_cpus == 0) {
+   RTE_LOG(ERR, POWER_MANAGER, "Unable to detect host CPUs, please "
+   "ensure that sufficient privileges exist to inspect sysfs\n");
+   return -1;
+  

[dpdk-dev] [PATCH v4 04/10] VM Power Management application and Makefile.

2014-10-12 Thread Alan Carew
For launching CLI thread and Monitor thread and initialising
resources.
Requires a minimum of two lcores to run, additional cores specified by eal core
mask are not used.

Signed-off-by: Alan Carew 
---
 examples/vm_power_manager/Makefile |  57 ++
 examples/vm_power_manager/main.c   | 117 +
 examples/vm_power_manager/main.h   |  52 +
 3 files changed, 226 insertions(+)
 create mode 100644 examples/vm_power_manager/Makefile
 create mode 100644 examples/vm_power_manager/main.c
 create mode 100644 examples/vm_power_manager/main.h

diff --git a/examples/vm_power_manager/Makefile b/examples/vm_power_manager/Makefile
new file mode 100644
index 000..7d6f943
--- /dev/null
+++ b/examples/vm_power_manager/Makefile
@@ -0,0 +1,57 @@
+#   BSD LICENSE
+#
+#   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+#   All rights reserved.
+#
+#   Redistribution and use in source and binary forms, with or without
+#   modification, are permitted provided that the following conditions
+#   are met:
+#
+# * Redistributions of source code must retain the above copyright
+#   notice, this list of conditions and the following disclaimer.
+# * Redistributions in binary form must reproduce the above copyright
+#   notice, this list of conditions and the following disclaimer in
+#   the documentation and/or other materials provided with the
+#   distribution.
+# * Neither the name of Intel Corporation nor the names of its
+#   contributors may be used to endorse or promote products derived
+#   from this software without specific prior written permission.
+#
+#   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+#   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+#   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+#   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+#   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+#   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+#   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+#   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+#   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+#   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+#   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+ifeq ($(RTE_SDK),)
+$(error "Please define RTE_SDK environment variable")
+endif
+
+# Default target, can be overridden by command line or environment
+RTE_TARGET ?= x86_64-default-linuxapp-gcc
+
+include $(RTE_SDK)/mk/rte.vars.mk
+
+# binary name
+APP = vm_power_mgr
+
+# all source are stored in SRCS-y
+SRCS-y := main.c vm_power_cli.c power_manager.c channel_manager.c
+SRCS-y += channel_monitor.c
+
+CFLAGS += -O3 -lvirt -I$(RTE_SDK)/lib/librte_power/
+CFLAGS += $(WERROR_FLAGS)
+
+# workaround for a gcc bug with noreturn attribute
+# http://gcc.gnu.org/bugzilla/show_bug.cgi?id=12603
+ifeq ($(CONFIG_RTE_TOOLCHAIN_GCC),y)
+CFLAGS_main.o += -Wno-return-type
+endif
+
+include $(RTE_SDK)/mk/rte.extapp.mk
diff --git a/examples/vm_power_manager/main.c b/examples/vm_power_manager/main.c
new file mode 100644
index 000..875274e
--- /dev/null
+++ b/examples/vm_power_manager/main.c
@@ -0,0 +1,117 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ *   notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ *   notice, this list of conditions and the following disclaimer in
+ *   the documentation and/or other materials provided with the
+ *   distribution.
+ * * Neither the name of Intel Corporation nor the names of its
+ *   contributors may be used to endorse or promote products derived
+ *   from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTH

[dpdk-dev] [PATCH v4 06/10] VM communication channels for VM Power Management(Guest).

2014-10-12 Thread Alan Carew
Allows for the opening of Virtio-Serial devices on a VM, where a DPDK
application can send packets to the host based monitor. The packet format is
specified in channel_commands.h.
Each device appears as a serial device in path
/dev/virtio-ports/virtio.serial.port.. where each lcore
in a DPDK application has exclusive access to a device/channel.
Each channel is opened in non-blocking mode; after a successful open a test
packet is sent to the host to ensure the host side is monitoring.
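
As a rough illustration of the non-blocking open described above, a minimal
sketch follows. The helper name open_nonblocking() is hypothetical and not part
of this patch; it only uses the standard open()/fcntl() calls, and the test
packet step is left out.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the patch): open 'path' read/write and
 * switch the descriptor to non-blocking mode.
 * Returns the descriptor on success, -1 on failure.
 */
static int
open_nonblocking(const char *path)
{
	int fd, flags;

	assert(path != NULL);

	fd = open(path, O_RDWR);
	if (fd < 0)
		return -1;

	/* Read the current file status flags, then add O_NONBLOCK. */
	flags = fcntl(fd, F_GETFL, 0);
	if (flags < 0 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0) {
		close(fd);	/* do not leak the descriptor on error */
		return -1;
	}
	return fd;
}
```

A caller would pass the per-lcore device path and treat -1 as "channel not
available".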

Signed-off-by: Alan Carew 
---
 lib/librte_power/guest_channel.c | 162 +++
 lib/librte_power/guest_channel.h |  89 +
 2 files changed, 251 insertions(+)
 create mode 100644 lib/librte_power/guest_channel.c
 create mode 100644 lib/librte_power/guest_channel.h

diff --git a/lib/librte_power/guest_channel.c b/lib/librte_power/guest_channel.c
new file mode 100644
index 000..2295665
--- /dev/null
+++ b/lib/librte_power/guest_channel.c
@@ -0,0 +1,162 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ *   notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ *   notice, this list of conditions and the following disclaimer in
+ *   the documentation and/or other materials provided with the
+ *   distribution.
+ * * Neither the name of Intel Corporation nor the names of its
+ *   contributors may be used to endorse or promote products derived
+ *   from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+
+#include 
+#include 
+
+#include "guest_channel.h"
+#include "channel_commands.h"
+
+#define RTE_LOGTYPE_GUEST_CHANNEL RTE_LOGTYPE_USER1
+
+static int global_fds[RTE_MAX_LCORE];
+
+int
+guest_channel_host_connect(const char *path, unsigned lcore_id)
+{
+   int flags, ret;
+   struct channel_packet pkt;
+   char fd_path[PATH_MAX];
+   int fd = -1;
+
+   if (lcore_id >= RTE_MAX_LCORE) {
+   RTE_LOG(ERR, GUEST_CHANNEL, "Channel(%u) is out of range 0...%d\n",
+   lcore_id, RTE_MAX_LCORE-1);
+   return -1;
+   }
+   /* check if path is already open */
+   if (global_fds[lcore_id] != 0) {
+   RTE_LOG(ERR, GUEST_CHANNEL, "Channel(%u) is already open with fd %d\n",
+   lcore_id, global_fds[lcore_id]);
+   return -1;
+   }
+
+   snprintf(fd_path, PATH_MAX, "%s.%u", path, lcore_id);
+   RTE_LOG(INFO, GUEST_CHANNEL, "Opening channel '%s' for lcore %u\n",
+   fd_path, lcore_id);
+   fd = open(fd_path, O_RDWR);
+   if (fd < 0) {
+   RTE_LOG(ERR, GUEST_CHANNEL, "Unable to connect to '%s' with error "
+   "%s\n", fd_path, strerror(errno));
+   return -1;
+   }
+
+   flags = fcntl(fd, F_GETFL, 0);
+   if (flags < 0) {
+   RTE_LOG(ERR, GUEST_CHANNEL, "Failed on fcntl get flags for file %s\n",
+   fd_path);
+   goto error;
+   }
+
+   flags |= O_NONBLOCK;
+   if (fcntl(fd, F_SETFL, flags) < 0) {
+   RTE_LOG(ERR, GUEST_CHANNEL, "Failed on setting non-blocking mode for "
+   "file %s\n", fd_path);
+   goto error;
+   }
+   /* QEMU needs a delay after connection */
+   sleep(1);
+
+   /* Send a test packet, this command is ignored by the host, but a successful
+* send indicates that the host endpoint is monitoring.
+*/
+   pkt.command = CPU_POWER_CONNECT;
+   global_fds[lcore_id] = fd;
+   ret = guest_channel_send_msg(&pkt, lcore_id);
+   if (ret != 0) {
+  

[dpdk-dev] [PATCH v4 05/10] VM Power Management CLI(Guest).

2014-10-12 Thread Alan Carew
Provides a small sample application(guest_vm_power_mgr) to run on a VM.
The application is run by providing a core mask(-c) and number of memory
channels(-n). The core mask corresponds to the number of lcore channels to
attempt to open. A maximum of 64 channels per VM is allowed. The channels must
be monitored by the host.
After successful initialisation a CPU frequency command can be sent to the host
using:
set_cpu_freq  .

Signed-off-by: Alan Carew 
---
 examples/vm_power_manager/guest_cli/Makefile   |  56 
 examples/vm_power_manager/guest_cli/main.c |  87 
 examples/vm_power_manager/guest_cli/main.h |  52 +++
 .../guest_cli/vm_power_cli_guest.c | 155 +
 .../guest_cli/vm_power_cli_guest.h |  55 
 5 files changed, 405 insertions(+)
 create mode 100644 examples/vm_power_manager/guest_cli/Makefile
 create mode 100644 examples/vm_power_manager/guest_cli/main.c
 create mode 100644 examples/vm_power_manager/guest_cli/main.h
 create mode 100644 examples/vm_power_manager/guest_cli/vm_power_cli_guest.c
 create mode 100644 examples/vm_power_manager/guest_cli/vm_power_cli_guest.h

diff --git a/examples/vm_power_manager/guest_cli/Makefile b/examples/vm_power_manager/guest_cli/Makefile
new file mode 100644
index 000..167a7ed
--- /dev/null
+++ b/examples/vm_power_manager/guest_cli/Makefile
@@ -0,0 +1,56 @@
+#   BSD LICENSE
+#
+#   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+#   All rights reserved.
+#
+#   Redistribution and use in source and binary forms, with or without
+#   modification, are permitted provided that the following conditions
+#   are met:
+#
+# * Redistributions of source code must retain the above copyright
+#   notice, this list of conditions and the following disclaimer.
+# * Redistributions in binary form must reproduce the above copyright
+#   notice, this list of conditions and the following disclaimer in
+#   the documentation and/or other materials provided with the
+#   distribution.
+# * Neither the name of Intel Corporation nor the names of its
+#   contributors may be used to endorse or promote products derived
+#   from this software without specific prior written permission.
+#
+#   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+#   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+#   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+#   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+#   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+#   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+#   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+#   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+#   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+#   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+#   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+ifeq ($(RTE_SDK),)
+$(error "Please define RTE_SDK environment variable")
+endif
+
+# Default target, can be overridden by command line or environment
+RTE_TARGET ?= x86_64-default-linuxapp-gcc
+
+include $(RTE_SDK)/mk/rte.vars.mk
+
+# binary name
+APP = guest_vm_power_mgr
+
+# all source are stored in SRCS-y
+SRCS-y := main.c vm_power_cli_guest.c
+
+CFLAGS += -O3 -I$(RTE_SDK)/lib/librte_power/
+CFLAGS += $(WERROR_FLAGS)
+
+# workaround for a gcc bug with noreturn attribute
+# http://gcc.gnu.org/bugzilla/show_bug.cgi?id=12603
+ifeq ($(CONFIG_RTE_TOOLCHAIN_GCC),y)
+CFLAGS_main.o += -Wno-return-type
+endif
+
+include $(RTE_SDK)/mk/rte.extapp.mk
diff --git a/examples/vm_power_manager/guest_cli/main.c b/examples/vm_power_manager/guest_cli/main.c
new file mode 100644
index 000..1e4767a
--- /dev/null
+++ b/examples/vm_power_manager/guest_cli/main.c
@@ -0,0 +1,87 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ *   notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ *   notice, this list of conditions and the following disclaimer in
+ *   the documentation and/or other materials provided with the
+ *   distribution.
+ * * Neither the name of Intel Corporation nor the names of its
+ *   contributors may be used to endorse or promote products derived
+ *   from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPL

[dpdk-dev] [PATCH v4 08/10] Packet format for VM Power Management(Host and Guest).

2014-10-12 Thread Alan Carew
Provides a command packet format for host and guest.

Signed-off-by: Alan Carew 
---
 lib/librte_power/channel_commands.h | 77 +
 1 file changed, 77 insertions(+)
 create mode 100644 lib/librte_power/channel_commands.h

diff --git a/lib/librte_power/channel_commands.h b/lib/librte_power/channel_commands.h
new file mode 100644
index 000..7e78a8b
--- /dev/null
+++ b/lib/librte_power/channel_commands.h
@@ -0,0 +1,77 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ *   notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ *   notice, this list of conditions and the following disclaimer in
+ *   the documentation and/or other materials provided with the
+ *   distribution.
+ * * Neither the name of Intel Corporation nor the names of its
+ *   contributors may be used to endorse or promote products derived
+ *   from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef CHANNEL_COMMANDS_H_
+#define CHANNEL_COMMANDS_H_
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#include 
+
+/* Maximum number of CPUs */
+#define CHANNEL_CMDS_MAX_CPUS 64
+#if CHANNEL_CMDS_MAX_CPUS > 64
+#error Maximum number of cores is 64, overflow is guaranteed to \
+   cause problems with VM Power Management
+#endif
+
+/* Maximum number of channels per VM */
+#define CHANNEL_CMDS_MAX_VM_CHANNELS 64
+
+/* Valid Commands */
+#define CPU_POWER   1
+#define CPU_POWER_CONNECT   2
+
+/* CPU Power Command Scaling */
+#define CPU_POWER_SCALE_UP  1
+#define CPU_POWER_SCALE_DOWN 2
+#define CPU_POWER_SCALE_MAX 3
+#define CPU_POWER_SCALE_MIN 4
+
+struct channel_packet {
+   uint64_t resource_id; /**< core_num, device */
+   uint32_t unit;        /**< scale down/up/min/max */
+   uint32_t command;     /**< Power, IO, etc */
+};
+
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* CHANNEL_COMMANDS_H_ */
-- 
1.9.3
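
For readers following the packet format, here is a minimal sketch of how a
guest might fill in a channel_packet for a frequency scale-up request. The
struct and the two constants are copied from the header above; the helper
make_scale_request() is hypothetical, and the mapping of fields (core number in
resource_id, scale direction in unit) is an assumption based on the field
comments in the header.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Copied from channel_commands.h in this patch. */
#define CPU_POWER            1
#define CPU_POWER_SCALE_UP   1

struct channel_packet {
	uint64_t resource_id; /* core_num, device */
	uint32_t unit;        /* scale down/up/min/max */
	uint32_t command;     /* Power, IO, etc */
};

/*
 * Hypothetical helper (not part of the patch): build a CPU_POWER request
 * for one core. 'direction' is one of the CPU_POWER_SCALE_* values.
 */
static struct channel_packet
make_scale_request(unsigned core_num, uint32_t direction)
{
	struct channel_packet pkt;

	/* Zero the whole packet so no uninitialised bytes go on the wire. */
	memset(&pkt, 0, sizeof(pkt));
	pkt.resource_id = core_num;
	pkt.unit = direction;
	pkt.command = CPU_POWER;
	return pkt;
}
```

The packet is a fixed 16 bytes on common ABIs (one 64-bit field followed by two
32-bit fields), so host and guest can exchange it without extra framing.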



[dpdk-dev] [PATCH v4 09/10] Build system integration for VM Power Management(Guest and Host)

2014-10-12 Thread Alan Carew
librte_power now contains both rte_power_acpi_cpufreq and rte_power_kvm_vm
implementations.

Signed-off-by: Alan Carew 
---
 lib/librte_power/Makefile | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/lib/librte_power/Makefile b/lib/librte_power/Makefile
index 6185812..d672a5a 100644
--- a/lib/librte_power/Makefile
+++ b/lib/librte_power/Makefile
@@ -37,7 +37,8 @@ LIB = librte_power.a
 CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR) -O3 -fno-strict-aliasing

 # all source are stored in SRCS-y
-SRCS-$(CONFIG_RTE_LIBRTE_POWER) := rte_power.c
+SRCS-$(CONFIG_RTE_LIBRTE_POWER) := rte_power.c rte_power_acpi_cpufreq.c
+SRCS-$(CONFIG_RTE_LIBRTE_POWER) += rte_power_kvm_vm.c guest_channel.c

 # install this header file
 SYMLINK-$(CONFIG_RTE_LIBRTE_POWER)-include := rte_power.h
-- 
1.9.3



[dpdk-dev] [PATCH v4 07/10] librte_power common interface for Guest and Host

2014-10-12 Thread Alan Carew
Moved the current librte_power implementation to rte_power_acpi_cpufreq, with
renaming of functions only.
Added rte_power_kvm_vm implementation to support Power Management from a VM.

librte_power now hides the implementation based on the environment used.
A new call rte_power_set_env() can explicitly set the environment; if it is not
called then auto-detection takes place.

rte_power_kvm_vm is a subset of the librte_power APIs, the following is supported:
 rte_power_init(unsigned lcore_id)
 rte_power_exit(unsigned lcore_id)
 rte_power_freq_up(unsigned lcore_id)
 rte_power_freq_down(unsigned lcore_id)
 rte_power_freq_min(unsigned lcore_id)
 rte_power_freq_max(unsigned lcore_id)

The other unsupported APIs return -ENOTSUP
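
The dispatch idea described above can be sketched roughly as follows. This is
not the code from the patch: every name here (toy_set_env, toy_freq_up and so
on) is hypothetical, the backends are stubs, and auto-detection is omitted. It
only illustrates selecting an implementation through function pointers and
returning -ENOTSUP for an API that a backend does not support.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical environment identifiers, for illustration only. */
enum toy_env { TOY_ENV_NOT_SET, TOY_ENV_ACPI, TOY_ENV_KVM };

typedef int (*toy_freq_change_t)(unsigned lcore_id);
typedef int (*toy_get_freq_t)(unsigned lcore_id);

/* Records which stub backend handled the last call. */
static enum toy_env last_backend = TOY_ENV_NOT_SET;

static int
acpi_freq_up(unsigned lcore_id)
{
	(void)lcore_id;
	last_backend = TOY_ENV_ACPI;
	return 1;
}

static int
acpi_get_freq(unsigned lcore_id)
{
	(void)lcore_id;
	return 0;
}

static int
kvm_freq_up(unsigned lcore_id)
{
	(void)lcore_id;
	last_backend = TOY_ENV_KVM;
	return 1;
}

/* The VM backend supports only a subset; everything else is -ENOTSUP. */
static int
kvm_get_freq(unsigned lcore_id)
{
	(void)lcore_id;
	return -ENOTSUP;
}

/* Public entry points are function pointers bound at set_env time. */
static toy_freq_change_t toy_freq_up = NULL;
static toy_get_freq_t toy_get_freq = NULL;

static int
toy_set_env(enum toy_env env)
{
	switch (env) {
	case TOY_ENV_ACPI:
		toy_freq_up = acpi_freq_up;
		toy_get_freq = acpi_get_freq;
		break;
	case TOY_ENV_KVM:
		toy_freq_up = kvm_freq_up;
		toy_get_freq = kvm_get_freq;
		break;
	default:
		return -1;	/* unknown or unset environment */
	}
	return 0;
}
```

Callers keep using one set of names regardless of whether they run on the host
or inside a VM; only the binding step differs.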

Signed-off-by: Alan Carew 
---
 lib/librte_power/rte_power.c  | 540 -
 lib/librte_power/rte_power.h  | 120 +--
 lib/librte_power/rte_power_acpi_cpufreq.c | 545 ++
 lib/librte_power/rte_power_acpi_cpufreq.h | 192 +++
 lib/librte_power/rte_power_common.h   |  39 +++
 lib/librte_power/rte_power_kvm_vm.c   | 135 
 lib/librte_power/rte_power_kvm_vm.h   | 179 ++
 7 files changed, 1248 insertions(+), 502 deletions(-)
 create mode 100644 lib/librte_power/rte_power_acpi_cpufreq.c
 create mode 100644 lib/librte_power/rte_power_acpi_cpufreq.h
 create mode 100644 lib/librte_power/rte_power_common.h
 create mode 100644 lib/librte_power/rte_power_kvm_vm.c
 create mode 100644 lib/librte_power/rte_power_kvm_vm.h

diff --git a/lib/librte_power/rte_power.c b/lib/librte_power/rte_power.c
index 856da9a..998ed1c 100644
--- a/lib/librte_power/rte_power.c
+++ b/lib/librte_power/rte_power.c
@@ -31,515 +31,113 @@
  *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
  */

-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-
-#include 
 #include 

 #include "rte_power.h"
+#include "rte_power_acpi_cpufreq.h"
+#include "rte_power_kvm_vm.h"
+#include "rte_power_common.h"

-#ifdef RTE_LIBRTE_POWER_DEBUG
-#define POWER_DEBUG_TRACE(fmt, args...) do { \
-   RTE_LOG(ERR, POWER, "%s: " fmt, __func__, ## args); \
-   } while (0)
-#else
-#define POWER_DEBUG_TRACE(fmt, args...)
-#endif
-
-#define FOPEN_OR_ERR_RET(f, retval) do { \
-   if ((f) == NULL) { \
-   RTE_LOG(ERR, POWER, "File not openned\n"); \
-   return (retval); \
-   } \
-} while(0)
-
-#define FOPS_OR_NULL_GOTO(ret, label) do { \
-   if ((ret) == NULL) { \
-   RTE_LOG(ERR, POWER, "fgets returns nothing\n"); \
-   goto label; \
-   } \
-} while(0)
-
-#define FOPS_OR_ERR_GOTO(ret, label) do { \
-   if ((ret) < 0) { \
-   RTE_LOG(ERR, POWER, "File operations failed\n"); \
-   goto label; \
-   } \
-} while(0)
-
-#define STR_SIZE 1024
-#define POWER_CONVERT_TO_DECIMAL 10
+enum power_management_env global_default_env = PM_ENV_NOT_SET;

-#define POWER_GOVERNOR_USERSPACE "userspace"
-#define POWER_SYSFILE_GOVERNOR   \
-   "/sys/devices/system/cpu/cpu%u/cpufreq/scaling_governor"
-#define POWER_SYSFILE_AVAIL_FREQ \
-   "/sys/devices/system/cpu/cpu%u/cpufreq/scaling_available_frequencies"
-#define POWER_SYSFILE_SETSPEED   \
-   "/sys/devices/system/cpu/cpu%u/cpufreq/scaling_setspeed"
+volatile uint32_t global_env_cfg_status = 0;

-enum power_state {
-   POWER_IDLE = 0,
-   POWER_ONGOING,
-   POWER_USED,
-   POWER_UNKNOWN
-};
+/* function pointers */
+rte_power_freqs_t rte_power_freqs  = NULL;
+rte_power_get_freq_t rte_power_get_freq = NULL;
+rte_power_set_freq_t rte_power_set_freq = NULL;
+rte_power_freq_change_t rte_power_freq_up = NULL;
+rte_power_freq_change_t rte_power_freq_down = NULL;
+rte_power_freq_change_t rte_power_freq_max = NULL;
+rte_power_freq_change_t rte_power_freq_min = NULL;

-/**
- * Power info per lcore.
- */
-struct rte_power_info {
-   unsigned lcore_id;   /**< Logical core id */
-   uint32_t freqs[RTE_MAX_LCORE_FREQS]; /**< Frequency array */
-   uint32_t nb_freqs;   /**< number of available freqs */
-   FILE *f; /**< FD of scaling_setspeed */
-   char governor_ori[32];   /**< Original governor name */
-   uint32_t curr_idx;   /**< Freq index in freqs array */
-   volatile uint32_t state; /**< Power in use state */
-} __rte_cache_aligned;
-
-static struct rte_power_info lcore_power_info[RTE_MAX_LCORE];
-
-/**
- * It is to set specific freq for specific logical core, according to the index
- * of supported frequencies.
- */
-static int
-set_freq_internal(struct rte_power_info *pi, uint32_t idx)
+int
+rte_power_set_env(enum power_management_env env)
 {
-   if (idx >= RTE_MAX_LCORE_FREQS || idx >= pi->nb_freqs) {
-   RTE_LOG(ERR, POWER, "Invalid frequency index %u, which "
-

[dpdk-dev] [PATCH v4 10/10] VM Power Management Unit Tests

2014-10-12 Thread Alan Carew
Updated the unit tests to cover both librte_power implementations as well as
the external API.

Signed-off-by: Alan Carew 
---
 app/test/Makefile  |   3 +-
 app/test/autotest_data.py  |  26 ++
 app/test/test_power.c  | 445 +++---
 app/test/test_power_acpi_cpufreq.c | 544 +
 app/test/test_power_kvm_vm.c   | 308 +
 5 files changed, 917 insertions(+), 409 deletions(-)
 create mode 100644 app/test/test_power_acpi_cpufreq.c
 create mode 100644 app/test/test_power_kvm_vm.c

diff --git a/app/test/Makefile b/app/test/Makefile
index 6af6d76..9417eda 100644
--- a/app/test/Makefile
+++ b/app/test/Makefile
@@ -119,7 +119,8 @@ endif

 SRCS-$(CONFIG_RTE_LIBRTE_METER) += test_meter.c
 SRCS-$(CONFIG_RTE_LIBRTE_KNI) += test_kni.c
-SRCS-$(CONFIG_RTE_LIBRTE_POWER) += test_power.c
+SRCS-$(CONFIG_RTE_LIBRTE_POWER) += test_power.c test_power_acpi_cpufreq.c
+SRCS-$(CONFIG_RTE_LIBRTE_POWER) += test_power_kvm_vm.c
 SRCS-y += test_common.c
 SRCS-$(CONFIG_RTE_LIBRTE_IVSHMEM) += test_ivshmem.c

diff --git a/app/test/autotest_data.py b/app/test/autotest_data.py
index 878c72e..618a946 100644
--- a/app/test/autotest_data.py
+++ b/app/test/autotest_data.py
@@ -425,6 +425,32 @@ non_parallel_test_group_list = [
]
 },
 {
+   "Prefix" :  "power_acpi_cpufreq",
+   "Memory" :  all_sockets(512),
+   "Tests" :
+   [
+   {
+"Name" :   "Power ACPI cpufreq autotest",
+"Command" :"power_acpi_cpufreq_autotest",
+"Func" :   default_autotest,
+"Report" : None,
+   },
+   ]
+},
+{
+   "Prefix" :  "power_kvm_vm",
+   "Memory" :  "512",
+   "Tests" :
+   [
+   {
+"Name" :   "Power KVM VM  autotest",
+"Command" :"power_kvm_vm_autotest",
+"Func" :   default_autotest,
+"Report" : None,
+   },
+   ]
+},
+{
"Prefix" :  "lpm6",
"Memory" :  "512",
"Tests" :
diff --git a/app/test/test_power.c b/app/test/test_power.c
index d9eb420..64a2305 100644
--- a/app/test/test_power.c
+++ b/app/test/test_power.c
@@ -41,437 +41,66 @@

 #include 

-#define TEST_POWER_LCORE_ID  2U
-#define TEST_POWER_LCORE_INVALID ((unsigned)RTE_MAX_LCORE)
-#define TEST_POWER_FREQS_NUM_MAX ((unsigned)RTE_MAX_LCORE_FREQS)
-
-#define TEST_POWER_SYSFILE_CUR_FREQ \
-   "/sys/devices/system/cpu/cpu%u/cpufreq/scaling_cur_freq"
-
-static uint32_t total_freq_num;
-static uint32_t freqs[TEST_POWER_FREQS_NUM_MAX];
-
-static int
-check_cur_freq(unsigned lcore_id, uint32_t idx)
-{
-#define TEST_POWER_CONVERT_TO_DECIMAL 10
-   FILE *f;
-   char fullpath[PATH_MAX];
-   char buf[BUFSIZ];
-   uint32_t cur_freq;
-   int ret = -1;
-
-   if (snprintf(fullpath, sizeof(fullpath),
-   TEST_POWER_SYSFILE_CUR_FREQ, lcore_id) < 0) {
-   return 0;
-   }
-   f = fopen(fullpath, "r");
-   if (f == NULL) {
-   return 0;
-   }
-   if (fgets(buf, sizeof(buf), f) == NULL) {
-   goto fail_get_cur_freq;
-   }
-   cur_freq = strtoul(buf, NULL, TEST_POWER_CONVERT_TO_DECIMAL);
-   ret = (freqs[idx] == cur_freq ? 0 : -1);
-
-fail_get_cur_freq:
-   fclose(f);
-
-   return ret;
-}
-
-/* Check rte_power_freqs() */
-static int
-check_power_freqs(void)
-{
-   uint32_t ret;
-
-   total_freq_num = 0;
-   memset(freqs, 0, sizeof(freqs));
-
-   /* test with an invalid lcore id */
-   ret = rte_power_freqs(TEST_POWER_LCORE_INVALID, freqs,
-   TEST_POWER_FREQS_NUM_MAX);
-   if (ret > 0) {
-   printf("Unexpectedly get available freqs successfully on "
-   "lcore %u\n", TEST_POWER_LCORE_INVALID);
-   return -1;
-   }
-
-   /* test with NULL buffer to save available freqs */
-   ret = rte_power_freqs(TEST_POWER_LCORE_ID, NULL,
-   TEST_POWER_FREQS_NUM_MAX);
-   if (ret > 0) {
-   printf("Unexpectedly get available freqs successfully with "
-   "NULL buffer on lcore %u\n", TEST_POWER_LCORE_ID);
-   return -1;
-   }
-
-   /* test of getting zero number of freqs */
-   ret = rte_power_freqs(TEST_POWER_LCORE_ID, freqs, 0);
-   if (ret > 0) {
-   printf("Unexpectedly get available freqs successfully with "
-   "zero buffer size on lcore %u\n", TEST_POWER_LCORE_ID);
-   return -1;
-   }
-
-   /* test with all valid input parameters */
-   ret = rte_power_freqs(TEST_POWER_LCORE_ID, freqs,
-   TEST_POWER_FREQS_NUM_MAX);
-   if (ret == 0 || ret > TEST_POWER_FREQS_NUM_MAX) {
-   printf("Fail to get available fre

[dpdk-dev] [PATCH 1/5] vmxnet3: Fix VLAN Rx stripping

2014-10-12 Thread Yong Wang
Shouldn't reset vlan_tci to 0 if a valid VLAN tag is stripped.

Signed-off-by: Yong Wang 
---
 lib/librte_pmd_vmxnet3/vmxnet3_rxtx.c | 12 
 1 file changed, 4 insertions(+), 8 deletions(-)

diff --git a/lib/librte_pmd_vmxnet3/vmxnet3_rxtx.c b/lib/librte_pmd_vmxnet3/vmxnet3_rxtx.c
index 263f9ce..986e5e5 100644
--- a/lib/librte_pmd_vmxnet3/vmxnet3_rxtx.c
+++ b/lib/librte_pmd_vmxnet3/vmxnet3_rxtx.c
@@ -540,21 +540,19 @@ vmxnet3_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t nb_pkts)

/* Check for hardware stripped VLAN tag */
if (rcd->ts) {
-
PMD_RX_LOG(ERR, "Received packet with vlan ID: %d.",
   rcd->tci);
rxm->ol_flags = PKT_RX_VLAN_PKT;
-
 #ifdef RTE_LIBRTE_VMXNET3_DEBUG_DRIVER
VMXNET3_ASSERT(rxm &&
   rte_pktmbuf_mtod(rxm, void *));
 #endif
/* Copy vlan tag in packet buffer */
-   rxm->vlan_tci = rte_le_to_cpu_16(
-   (uint16_t)rcd->tci);
-
-   } else
+   rxm->vlan_tci = rte_le_to_cpu_16((uint16_t)rcd->tci);
+   } else {
rxm->ol_flags = 0;
+   rxm->vlan_tci = 0;
+   }

/* Initialize newly received packet buffer */
rxm->port = rxq->port_id;
@@ -563,11 +561,9 @@ vmxnet3_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t nb_pkts)
rxm->pkt_len = (uint16_t)rcd->len;
rxm->data_len = (uint16_t)rcd->len;
rxm->port = rxq->port_id;
-   rxm->vlan_tci = 0;
rxm->data_off = RTE_PKTMBUF_HEADROOM;

rx_pkts[nb_rx++] = rxm;
-
 rcd_done:
rxq->cmd_ring[ring_idx].next2comp = idx;

VMXNET3_INC_RING_IDX_ONLY(rxq->cmd_ring[ring_idx].next2comp, rxq->cmd_ring[ring_idx].size);
-- 
1.9.1
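
To make the ordering problem concrete, here is a simplified sketch of the
fixed logic. The types toy_rx_desc and toy_mbuf and the flag TOY_RX_VLAN_PKT
are stand-ins invented for this example, not the real vmxnet3 or mbuf
definitions, and the little-endian conversion done in the driver is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_RX_VLAN_PKT 0x1	/* stand-in for the VLAN offload flag */

/* Stand-in for the completion descriptor fields used here. */
struct toy_rx_desc {
	uint8_t ts;	/* non-zero if hardware stripped a VLAN tag */
	uint16_t tci;	/* the stripped tag */
};

/* Stand-in for the packet metadata fields used here. */
struct toy_mbuf {
	uint16_t ol_flags;
	uint16_t vlan_tci;
};

/*
 * Set the VLAN metadata exactly once: keep the tag when one was stripped,
 * clear it otherwise. Nothing later in the receive path resets vlan_tci.
 */
static void
toy_fill_vlan(struct toy_mbuf *m, const struct toy_rx_desc *d)
{
	assert(m != NULL && d != NULL);

	if (d->ts) {
		m->ol_flags = TOY_RX_VLAN_PKT;
		m->vlan_tci = d->tci;
	} else {
		m->ol_flags = 0;
		m->vlan_tci = 0;
	}
}
```

Before the fix, the equivalent of the first branch was followed by an
unconditional vlan_tci = 0, so the stripped tag never reached the application.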



[dpdk-dev] [PATCH 2/5] vmxnet3: Add VLAN Tx offload

2014-10-12 Thread Yong Wang
Signed-off-by: Yong Wang 
---
 lib/librte_pmd_vmxnet3/vmxnet3_rxtx.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/lib/librte_pmd_vmxnet3/vmxnet3_rxtx.c b/lib/librte_pmd_vmxnet3/vmxnet3_rxtx.c
index 986e5e5..0b6363f 100644
--- a/lib/librte_pmd_vmxnet3/vmxnet3_rxtx.c
+++ b/lib/librte_pmd_vmxnet3/vmxnet3_rxtx.c
@@ -319,6 +319,12 @@ vmxnet3_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
txd->cq = 1;
txd->eop = 1;

+   /* Add VLAN tag if requested */
+   if (txm->ol_flags & PKT_TX_VLAN_PKT) {
+   txd->ti = 1;
+   txd->tci = rte_cpu_to_le_16(txm->vlan_tci);
+   }
+
/* Record current mbuf for freeing it later in tx complete */
 #ifdef RTE_LIBRTE_VMXNET3_DEBUG_DRIVER
VMXNET3_ASSERT(txm);
-- 
1.9.1



[dpdk-dev] [PATCH 0/5] vmxnet3 pmd fixes/improvement

2014-10-12 Thread Yong Wang
This patch series includes various fixes and improvements to the
vmxnet3 PMD.

Yong Wang (5):
  vmxnet3: Fix VLAN Rx stripping
  vmxnet3: Add VLAN Tx offload
  vmxnet3: Fix dev stop/restart bug
  vmxnet3: Add rx pkt check offloads
  vmxnet3: Some perf improvement on the rx path

 lib/librte_pmd_vmxnet3/vmxnet3_rxtx.c | 310 +-
 1 file changed, 195 insertions(+), 115 deletions(-)

-- 
1.9.1



[dpdk-dev] [PATCH 3/5] vmxnet3: Fix dev stop/restart bug

2014-10-12 Thread Yong Wang
This change makes vmxnet3 consistent with other PMDs in
terms of dev_stop behavior: rather than releasing the tx/rx
rings, it only resets the ring structures and releases the
pending mbufs.

Verified with various tests (test-pmd and pktgen) over
vmxnet3 that dev stop/restart works fine.

Signed-off-by: Yong Wang 
---
 lib/librte_pmd_vmxnet3/vmxnet3_rxtx.c | 78 ---
 1 file changed, 73 insertions(+), 5 deletions(-)

diff --git a/lib/librte_pmd_vmxnet3/vmxnet3_rxtx.c b/lib/librte_pmd_vmxnet3/vmxnet3_rxtx.c
index 0b6363f..2017d4b 100644
--- a/lib/librte_pmd_vmxnet3/vmxnet3_rxtx.c
+++ b/lib/librte_pmd_vmxnet3/vmxnet3_rxtx.c
@@ -157,7 +157,7 @@ vmxnet3_txq_dump(struct vmxnet3_tx_queue *txq)
 #endif

 static inline void
-vmxnet3_cmd_ring_release(vmxnet3_cmd_ring_t *ring)
+vmxnet3_cmd_ring_release_mbufs(vmxnet3_cmd_ring_t *ring)
 {
while (ring->next2comp != ring->next2fill) {
/* No need to worry about tx desc ownership, device is quiesced by now. */
@@ -171,16 +171,23 @@ vmxnet3_cmd_ring_release(vmxnet3_cmd_ring_t *ring)
}
vmxnet3_cmd_ring_adv_next2comp(ring);
}
+}
+
+static void
+vmxnet3_cmd_ring_release(vmxnet3_cmd_ring_t *ring)
+{
+   vmxnet3_cmd_ring_release_mbufs(ring);
rte_free(ring->buf_info);
ring->buf_info = NULL;
 }

+
 void
 vmxnet3_dev_tx_queue_release(void *txq)
 {
vmxnet3_tx_queue_t *tq = txq;

-   if (txq != NULL) {
+   if (tq != NULL) {
/* Release the cmd_ring */
vmxnet3_cmd_ring_release(&tq->cmd_ring);
}
@@ -192,13 +199,74 @@ vmxnet3_dev_rx_queue_release(void *rxq)
int i;
vmxnet3_rx_queue_t *rq = rxq;

-   if (rxq != NULL) {
+   if (rq != NULL) {
/* Release both the cmd_rings */
for (i = 0; i < VMXNET3_RX_CMDRING_SIZE; i++)
vmxnet3_cmd_ring_release(&rq->cmd_ring[i]);
}
 }

+static void
+vmxnet3_dev_tx_queue_reset(void *txq)
+{
+   vmxnet3_tx_queue_t *tq = txq;
+   struct vmxnet3_cmd_ring *ring = &tq->cmd_ring;
+   struct vmxnet3_comp_ring *comp_ring = &tq->comp_ring;
+   int size;
+
+   if (tq != NULL) {
+   /* Release the cmd_ring mbufs */
+   vmxnet3_cmd_ring_release_mbufs(&tq->cmd_ring);
+   }
+
+   /* Tx vmxnet rings structure initialization*/
+   ring->next2fill = 0;
+   ring->next2comp = 0;
+   ring->gen = VMXNET3_INIT_GEN;
+   comp_ring->next2proc = 0;
+   comp_ring->gen = VMXNET3_INIT_GEN;
+
+   size = sizeof(struct Vmxnet3_TxDesc) * ring->size;
+   size += sizeof(struct Vmxnet3_TxCompDesc) * comp_ring->size;
+
+   memset(ring->base, 0, size);
+}
+
+static void
+vmxnet3_dev_rx_queue_reset(void *rxq)
+{
+   int i;
+   vmxnet3_rx_queue_t *rq = rxq;
+   struct vmxnet3_cmd_ring *ring0, *ring1;
+   struct vmxnet3_comp_ring *comp_ring;
+   int size;
+
+   if (rq != NULL) {
+   /* Release both the cmd_rings mbufs */
+   for (i = 0; i < VMXNET3_RX_CMDRING_SIZE; i++)
+   vmxnet3_cmd_ring_release_mbufs(&rq->cmd_ring[i]);
+   }
+
+   ring0 = &rq->cmd_ring[0];
+   ring1 = &rq->cmd_ring[1];
+   comp_ring = &rq->comp_ring;
+
+   /* Rx vmxnet rings structure initialization */
+   ring0->next2fill = 0;
+   ring1->next2fill = 0;
+   ring0->next2comp = 0;
+   ring1->next2comp = 0;
+   ring0->gen = VMXNET3_INIT_GEN;
+   ring1->gen = VMXNET3_INIT_GEN;
+   comp_ring->next2proc = 0;
+   comp_ring->gen = VMXNET3_INIT_GEN;
+
+   size = sizeof(struct Vmxnet3_RxDesc) * (ring0->size + ring1->size);
+   size += sizeof(struct Vmxnet3_RxCompDesc) * comp_ring->size;
+
+   memset(ring0->base, 0, size);
+}
+
 void
 vmxnet3_dev_clear_queues(struct rte_eth_dev *dev)
 {
@@ -211,7 +279,7 @@ vmxnet3_dev_clear_queues(struct rte_eth_dev *dev)

if (txq != NULL) {
txq->stopped = TRUE;
-   vmxnet3_dev_tx_queue_release(txq);
+   vmxnet3_dev_tx_queue_reset(txq);
}
}

@@ -220,7 +288,7 @@ vmxnet3_dev_clear_queues(struct rte_eth_dev *dev)

if (rxq != NULL) {
rxq->stopped = TRUE;
-   vmxnet3_dev_rx_queue_release(rxq);
+   vmxnet3_dev_rx_queue_reset(rxq);
}
}
 }
-- 
1.9.1
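
For readers following the thread, the difference between the reset path used on dev_stop and the full release path can be sketched outside the driver. This is only a toy model under stated assumptions: struct toy_ring, TOY_INIT_GEN and every toy_* function are hypothetical stand-ins, malloc/free replace the mbuf and memzone calls, and unlike the patch the toy release also frees the descriptor memory so that the example does not leak.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for the driver types; not the vmxnet3 definitions. */
struct toy_buf_info { void *m; };      /* m plays the role of a pending mbuf */
struct toy_ring {
	uint64_t *base;                    /* descriptor memory */
	struct toy_buf_info *buf_info;     /* per-descriptor bookkeeping */
	uint32_t size, next2fill, next2comp;
	uint8_t gen;
};

#define TOY_INIT_GEN 1

int toy_ring_init(struct toy_ring *r, uint32_t size)
{
	memset(r, 0, sizeof(*r));
	r->base = calloc(size, sizeof(*r->base));
	r->buf_info = calloc(size, sizeof(*r->buf_info));
	if (r->base == NULL || r->buf_info == NULL) {
		free(r->base);
		free(r->buf_info);
		memset(r, 0, sizeof(*r));
		return -1;
	}
	r->size = size;
	r->gen = TOY_INIT_GEN;
	return 0;
}

/* Post one buffer at next2fill (the caller must not fill the ring completely). */
int toy_ring_post(struct toy_ring *r)
{
	void *m = malloc(64);

	if (m == NULL)
		return -1;
	r->buf_info[r->next2fill].m = m;
	r->base[r->next2fill] = 1;         /* pretend the descriptor is written */
	if (++r->next2fill == r->size)
		r->next2fill = 0;
	return 0;
}

/* Free buffers still owned by the ring, walking next2comp up to next2fill. */
void toy_ring_release_bufs(struct toy_ring *r)
{
	while (r->next2comp != r->next2fill) {
		free(r->buf_info[r->next2comp].m);
		r->buf_info[r->next2comp].m = NULL;
		if (++r->next2comp == r->size)
			r->next2comp = 0;
	}
}

/* dev_stop path: drop pending buffers, rewind indices, keep the allocations. */
void toy_ring_reset(struct toy_ring *r)
{
	toy_ring_release_bufs(r);
	r->next2fill = 0;
	r->next2comp = 0;
	r->gen = TOY_INIT_GEN;
	memset(r->base, 0, sizeof(*r->base) * r->size);
}

/* Queue release path: also give back the bookkeeping (and, in this toy
 * only, the descriptor memory). */
void toy_ring_release(struct toy_ring *r)
{
	toy_ring_release_bufs(r);
	free(r->buf_info);
	r->buf_info = NULL;
	free(r->base);
	r->base = NULL;
}
```

After toy_ring_reset() the ring can be refilled straight away, which is the property that makes a stop followed by a start work; after toy_ring_release() it has to be initialized again.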



[dpdk-dev] [PATCH 5/5] vmxnet3: Some perf improvement on the rx path

2014-10-12 Thread Yong Wang
Signed-off-by: Yong Wang 
---
 lib/librte_pmd_vmxnet3/vmxnet3_rxtx.c | 242 --
 1 file changed, 116 insertions(+), 126 deletions(-)

diff --git a/lib/librte_pmd_vmxnet3/vmxnet3_rxtx.c b/lib/librte_pmd_vmxnet3/vmxnet3_rxtx.c
index e2fb8a8..4799f4d 100644
--- a/lib/librte_pmd_vmxnet3/vmxnet3_rxtx.c
+++ b/lib/librte_pmd_vmxnet3/vmxnet3_rxtx.c
@@ -451,6 +451,19 @@ vmxnet3_post_rx_bufs(vmxnet3_rx_queue_t *rxq, uint8_t ring_id)
uint32_t i = 0, val = 0;
struct vmxnet3_cmd_ring *ring = &rxq->cmd_ring[ring_id];

+   if (ring_id == 0) {
+   /* Usually: One HEAD type buf per packet
+* val = (ring->next2fill % rxq->hw->bufs_per_pkt) ?
+* VMXNET3_RXD_BTYPE_BODY : VMXNET3_RXD_BTYPE_HEAD;
+*/
+
+   /* We use single packet buffer so all heads here */
+   val = VMXNET3_RXD_BTYPE_HEAD;
+   } else {
+   /* All BODY type buffers for 2nd ring */
+   val = VMXNET3_RXD_BTYPE_BODY;
+   }
+
while (vmxnet3_cmd_ring_desc_avail(ring) > 0) {
struct Vmxnet3_RxDesc *rxd;
struct rte_mbuf *mbuf;
@@ -458,22 +471,9 @@ vmxnet3_post_rx_bufs(vmxnet3_rx_queue_t *rxq, uint8_t ring_id)

rxd = (struct Vmxnet3_RxDesc *)(ring->base + ring->next2fill);

-   if (ring->rid == 0) {
-   /* Usually: One HEAD type buf per packet
-* val = (ring->next2fill % rxq->hw->bufs_per_pkt) ?
-* VMXNET3_RXD_BTYPE_BODY : VMXNET3_RXD_BTYPE_HEAD;
-*/
-
-   /* We use single packet buffer so all heads here */
-   val = VMXNET3_RXD_BTYPE_HEAD;
-   } else {
-   /* All BODY type buffers for 2nd ring; which won't be used at all by ESXi */
-   val = VMXNET3_RXD_BTYPE_BODY;
-   }
-
/* Allocate blank mbuf for the current Rx Descriptor */
mbuf = rte_rxmbuf_alloc(rxq->mp);
-   if (mbuf == NULL) {
+   if (unlikely(mbuf == NULL)) {
PMD_RX_LOG(ERR, "Error allocating mbuf in %s", __func__);
rxq->stats.rx_buf_alloc_failure++;
err = ENOMEM;
@@ -536,151 +536,141 @@ vmxnet3_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t nb_pkts)

rcd = &rxq->comp_ring.base[rxq->comp_ring.next2proc].rcd;

-   if (rxq->stopped) {
+   if (unlikely(rxq->stopped)) {
PMD_RX_LOG(DEBUG, "Rx queue is stopped.");
return 0;
}

while (rcd->gen == rxq->comp_ring.gen) {
-
if (nb_rx >= nb_pkts)
break;
+
idx = rcd->rxdIdx;
ring_idx = (uint8_t)((rcd->rqID == rxq->qid1) ? 0 : 1);
rxd = (Vmxnet3_RxDesc *)rxq->cmd_ring[ring_idx].base + idx;
rbi = rxq->cmd_ring[ring_idx].buf_info + idx;

-   if (rcd->sop != 1 || rcd->eop != 1) {
+   if (unlikely(rcd->sop != 1 || rcd->eop != 1)) {
rte_pktmbuf_free_seg(rbi->m);
-
PMD_RX_LOG(DEBUG, "Packet spread across multiple buffers\n)");
goto rcd_done;
+   }

-   } else {
-
-   PMD_RX_LOG(DEBUG, "rxd idx: %d ring idx: %d.", idx, ring_idx);
+   PMD_RX_LOG(DEBUG, "rxd idx: %d ring idx: %d.", idx, ring_idx);

 #ifdef RTE_LIBRTE_VMXNET3_DEBUG_DRIVER
-   VMXNET3_ASSERT(rcd->len <= rxd->len);
-   VMXNET3_ASSERT(rbi->m);
+   VMXNET3_ASSERT(rcd->len <= rxd->len);
+   VMXNET3_ASSERT(rbi->m);
 #endif
-   if (rcd->len == 0) {
-   PMD_RX_LOG(DEBUG, "Rx buf was skipped. rxring[%d][%d]\n)",
-  ring_idx, idx);
+   if (unlikely(rcd->len == 0)) {
+   PMD_RX_LOG(DEBUG, "Rx buf was skipped. rxring[%d][%d]\n)",
+  ring_idx, idx);
 #ifdef RTE_LIBRTE_VMXNET3_DEBUG_DRIVER
-   VMXNET3_ASSERT(rcd->sop && rcd->eop);
+   VMXNET3_ASSERT(rcd->sop && rcd->eop);
 #endif
-   rte_pktmbuf_free_seg(rbi->m);
-
-   goto rcd_done;
-   }
+   rte_pktmbuf_free_seg(rbi->m);
+   goto rcd_done;
+   }

-   /* Assuming a packet is coming in a single packet buffer */
-   if (rxd->btype != VMXNET3_RXD_BTYPE_HEAD) {
-   PMD_RX_LOG(DEBUG,
-  "Alert : Misbehaving device, incorrect "
-  " buffer type used. iPacket dropped."
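
Two ideas visible in the hunks above are hoisting the loop-invariant buffer-type selection out of the fill loop and marking rare branches as unlikely. The following is a minimal sketch of both, not the driver code: struct toy_desc, the TOY_BTYPE_* values, toy_unlikely and the small pool allocator are hypothetical names introduced only for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Branch hint in the spirit of unlikely(); falls back to a no-op elsewhere. */
#if defined(__GNUC__) || defined(__clang__)
#define toy_unlikely(x) __builtin_expect(!!(x), 0)
#else
#define toy_unlikely(x) (x)
#endif

/* Hypothetical buffer types and descriptor layout. */
enum { TOY_BTYPE_HEAD = 0, TOY_BTYPE_BODY = 1 };
struct toy_desc { uint32_t btype; void *buf; };

/* Tiny fixed pool standing in for an mbuf pool; returns NULL when empty. */
#define TOY_POOL_SIZE 4
char toy_pool[TOY_POOL_SIZE][64];
size_t toy_pool_used;

void *toy_pool_alloc(void)
{
	if (toy_pool_used == TOY_POOL_SIZE)
		return NULL;
	return toy_pool[toy_pool_used++];
}

/* Fill up to n descriptors and return how many were actually filled. */
size_t toy_post_bufs(struct toy_desc *descs, size_t n, uint8_t ring_id,
		     void *(*alloc)(void))
{
	/* The buffer type depends only on the ring, so compute it once
	 * instead of once per descriptor. */
	const uint32_t btype = (ring_id == 0) ? TOY_BTYPE_HEAD : TOY_BTYPE_BODY;
	size_t i;

	for (i = 0; i < n; i++) {
		void *buf = alloc();

		/* Allocation failure is the rare case; hint the compiler. */
		if (toy_unlikely(buf == NULL))
			break;
		descs[i].btype = btype;
		descs[i].buf = buf;
	}
	return i;
}
```

Whether either change is measurable depends on the compiler and workload; the sketch only shows the shape of the transformation.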

[dpdk-dev] [PATCH 4/5] vmxnet3: Add rx pkt check offloads

2014-10-12 Thread Yong Wang
Check packet types and rx checksum errors on receive.
Only IPv4 is supported so far.

Signed-off-by: Yong Wang 
---
 lib/librte_pmd_vmxnet3/vmxnet3_rxtx.c | 22 +-
 1 file changed, 21 insertions(+), 1 deletion(-)

diff --git a/lib/librte_pmd_vmxnet3/vmxnet3_rxtx.c b/lib/librte_pmd_vmxnet3/vmxnet3_rxtx.c
index 2017d4b..e2fb8a8 100644
--- a/lib/librte_pmd_vmxnet3/vmxnet3_rxtx.c
+++ b/lib/librte_pmd_vmxnet3/vmxnet3_rxtx.c
@@ -65,6 +65,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -614,7 +615,7 @@ vmxnet3_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t nb_pkts)

/* Check for hardware stripped VLAN tag */
if (rcd->ts) {
-   PMD_RX_LOG(ERR, "Received packet with vlan ID: %d.",
+   PMD_RX_LOG(DEBUG, "Received packet with vlan ID: %d.",
   rcd->tci);
rxm->ol_flags = PKT_RX_VLAN_PKT;
 #ifdef RTE_LIBRTE_VMXNET3_DEBUG_DRIVER
@@ -637,6 +638,25 @@ vmxnet3_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t nb_pkts)
rxm->port = rxq->port_id;
rxm->data_off = RTE_PKTMBUF_HEADROOM;

+   /* Check packet types, rx checksum errors, etc. Only support IPv4 so far. */
+   if (rcd->v4) {
+   struct ether_hdr *eth = rte_pktmbuf_mtod(rxm, struct ether_hdr *);
+   struct ipv4_hdr *ip = (struct ipv4_hdr *)(eth + 1);
+
+   if (((ip->version_ihl & 0xf) << 2) > (int)sizeof(struct ipv4_hdr))
+   rxm->ol_flags |= PKT_RX_IPV4_HDR_EXT;
+   else
+   rxm->ol_flags |= PKT_RX_IPV4_HDR;
+
+   if (!rcd->cnc) {
+   if (!rcd->ipc)
+   rxm->ol_flags |= PKT_RX_IP_CKSUM_BAD;
+
+   if ((rcd->tcp || rcd->udp) && !rcd->tuc)
+   rxm->ol_flags |= PKT_RX_L4_CKSUM_BAD;
+   }
+   }
+
rx_pkts[nb_rx++] = rxm;
 rcd_done:
rxq->cmd_ring[ring_idx].next2comp = idx;
-- 
1.9.1
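
The flag derivation added by this patch can be read in isolation as a small pure function. What follows is a sketch under assumptions rather than the driver code: struct toy_rx_comp models only the completion-descriptor bits the patch tests (v4, cnc, ipc, tcp, udp, tuc), and the TOY_RX_* values are hypothetical placeholders for the PKT_RX_* flags.

```c
#include <stdint.h>

/* Hypothetical flag values standing in for the PKT_RX_* offload flags. */
#define TOY_RX_IPV4_HDR      (1u << 0)
#define TOY_RX_IPV4_HDR_EXT  (1u << 1)
#define TOY_RX_IP_CKSUM_BAD  (1u << 2)
#define TOY_RX_L4_CKSUM_BAD  (1u << 3)

/* Only the completion bits used by the check; layout is illustrative. */
struct toy_rx_comp {
	unsigned v4:1;   /* packet is IPv4 */
	unsigned cnc:1;  /* checksum not calculated */
	unsigned ipc:1;  /* IP checksum correct */
	unsigned tcp:1;  /* packet is TCP */
	unsigned udp:1;  /* packet is UDP */
	unsigned tuc:1;  /* TCP/UDP checksum correct */
};

/* Derive offload flags from the completion bits and the first IPv4 byte. */
uint32_t toy_rx_offload_flags(const struct toy_rx_comp *rcd, uint8_t version_ihl)
{
	uint32_t flags = 0;
	unsigned hdr_len;

	if (!rcd->v4)
		return 0;

	/* IHL counts 32-bit words; more than 20 bytes means IP options. */
	hdr_len = (unsigned)(version_ihl & 0x0f) << 2;
	flags |= (hdr_len > 20) ? TOY_RX_IPV4_HDR_EXT : TOY_RX_IPV4_HDR;

	/* Checksum bits are only meaningful when the device computed them. */
	if (!rcd->cnc) {
		if (!rcd->ipc)
			flags |= TOY_RX_IP_CKSUM_BAD;
		if ((rcd->tcp || rcd->udp) && !rcd->tuc)
			flags |= TOY_RX_L4_CKSUM_BAD;
	}
	return flags;
}
```

Note that with cnc set neither BAD flag is raised, so in this model a packet whose checksum was not calculated is indistinguishable from one that passed.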