[dpdk-dev] [PATCH v3 0/7] add mtu and flow control handlers
2014-06-17 22:30, Ananyev, Konstantin: > > This patchset introduces 3 new ethdev operations: flow control parameters > > retrieval and mtu get/set operations. > > > > David Marchand (3): > > ethdev: add autoneg parameter in flow ctrl accessors > > ethdev: store min rx buffer size > > ethdev: introduce enable_scatter rx mode > > > > Ivan Boule (2): > > ixgbe: add set_mtu to ixgbevf > > app/testpmd: allow to configure mtu > > > > Samuel Gauthier (1): > > ethdev: add mtu accessors > > > > Zijie Pan (1): > > ethdev: retrieve flow control configuration > > Acked-by: Konstantin Ananyev Applied for version 1.7.0. Thanks -- Thomas
[dpdk-dev] Testing memnic for VM to VM transfer
Hi, > Subject: [dpdk-dev] Testing memnic for VM to VM transfer > > Hi everyone: > We are interested in testing the performance of the memnic driver > posted at http://dpdk.org/browse/memnic/refs/. > We want to compare its performance to other techniques to transfer > packets between the guest and the kernel, > predominantly for VM to VM transfers. > > We have downloaded the memnic components and have got it running in a guest > VM. > > The question we hope this group might be able to help with is what would be > the best way to process the packets in the > kernel to get a VM to VM transfer. I think there is no kernel code that works with MEMNIC. The recommended switching software on the host is Intel DPDK vSwitch hosted on 01.org and github. https://github.com/01org/dpdk-ovs/tree/development Intel DPDK vSwitch runs in userspace, not in the kernel. I introduced this mechanism to DPDK vSwitch and the guest drivers are maintained in dpdk.org. thanks, Hiroshi > > A couple of options might be possible > > > 1. Common shared buffer between two VMs. With some utility/code to > switch TX & RX rings between the two VMs. > > VM1 application --- memnic --- common shared memory buffer on the host --- > memnic --- VM2 application > > 2. Special purpose Kernel switching module > > VM1 application --- memnic --- shared memory VM1 --- Kernel switching > module --- shared memory VM2 --- memnic --- > VM2 application > > 3. Existing Kernel switching module > > VM1 application --- memnic --- shared memory VM1 --- existing Kernel > switching module (e.g. OVS/linux Bridge/VETh pair) > --- shared memory VM2 --- memnic --- VM2 application > > Can anyone recommend which approach might be best or easiest? We would like > to avoid writing much (or any) kernel code > so if there are already any open source code or test utilities that provide > one of these options or would be a good starting > point to start from, a pointer would be much appreciated. > > Thanks in advance > > > John Joyce
[dpdk-dev] [PATCH v3] cpu_layout.py: adjust output format to align
Bug: when "core id" is greater than 9, the cpu_layout.py output doesn't align. Socket 0Socket 1 - - Core 9 [4, 16] [10, 22] Core 10 [5, 17] [11, 23] Solution: adjust output format to align based on the maximum length of the "core id" and "processor" Socket 0Socket 1 Core 9 [4, 16] [10, 22] Core 10 [5, 17] [11, 23] Signed-off-by: Shannon Zhao --- tools/cpu_layout.py | 18 -- 1 file changed, 12 insertions(+), 6 deletions(-) diff --git a/tools/cpu_layout.py b/tools/cpu_layout.py index 623fad9..20a409d 100755 --- a/tools/cpu_layout.py +++ b/tools/cpu_layout.py @@ -75,15 +75,21 @@ print "cores = ",cores print "sockets = ", sockets print "" +max_processor_len = len(str(len(cores) * len(sockets) * 2 - 1)) +max_core_map_len = max_processor_len * 2 + len('[, ]') + len('Socket ') +max_core_id_len = len(str(max(cores))) + +print " ".ljust(max_core_id_len + len('Core ')), for s in sockets: - print "\tSocket %s" % s, +print "Socket %s" % str(s).ljust(max_core_map_len - len('Socket ')), print "" +print " ".ljust(max_core_id_len + len('Core ')), for s in sockets: - print "\t-", +print "".ljust(max_core_map_len), print "" for c in cores: - print "Core %s" % c, - for s in sockets: - print "\t", core_map[(s,c)], - print "\n" +print "Core %s" % str(c).ljust(max_core_id_len), +for s in sockets: +print str(core_map[(s,c)]).ljust(max_core_map_len), +print "\n" -- 1.9.0.msysgit.0
[dpdk-dev] Can anyone help me to run the l2fwd-ivshmem example?
Hi, I want to find a zero copy framework from host to VM without any physical NIC device. It seems l2fwd-ivshmem can be used, but I have problems running this example: 1. there is no documentation for this example, not even a simple readme :-( 2. does this example need ovdk? 3. can I use standard qemu to run this example? Does the standard qemu support ivshmem? Best Regards John Gong
[dpdk-dev] Can anyone help me to run the l2fwd-ivshmem example?
Hi, 2014-06-18 15:56, GongJinrong: >I want find a zero copy framework from host to vm without any physical > NIC device, I think memnic is what you want: http://dpdk.org/doc/memnic-pmd > it seems l2fwd-ivshmem can be used, but I have problems to run > this example: >1. there is no document about this example, even a simple readme :-( >2. does this example need ovdk? No >3. can I use standard qemu to run this example? Does the standard qemu > support ivshmem? You should be able to use standard Qemu. -- Thomas
[dpdk-dev] Can anyone help me to run the l2fwd-ivshmem example?
Ok, thanks Thomas, I will try memnic. -Original Message- From: Thomas Monjalon [mailto:thomas.monja...@6wind.com] Sent: Wednesday, June 18, 2014 4:20 PM To: GongJinrong Cc: dev at dpdk.org Subject: Re: [dpdk-dev] Can anyone help me to run the l2fwd-ivshmem example? Hi, 2014-06-18 15:56, GongJinrong: >I want find a zero copy framework from host to vm without any > physical NIC device, I think memnic is what you want: http://dpdk.org/doc/memnic-pmd > it seems l2fwd-ivshmem can be used, but I have problems to run this > example: >1. there is no document about this example, even a simple readme :-( >2. does this example need ovdk? No >3. can I use standard qemu to run this example? Does the standard > qemu support ivshmem? You should be able to use standard Qemu. -- Thomas
[dpdk-dev] [PATCH v2 00/27] Add i40e PMD support
-Original Message- From: Thomas Monjalon [mailto:thomas.monja...@6wind.com] Sent: Wednesday, June 18, 2014 12:28 AM To: Zhang, Helin; Chen, Jing D Cc: dev at dpdk.org Subject: Re: [dpdk-dev] [PATCH v2 00/27] Add i40e PMD support > The 2nd version of the series of patches adds i40e PMD support. > It contains the updated basic shared code, and some other enhancements. > It adds the support of the latest version of firmware. > * Add new PMD driver of i40e in the folder of librte_pmd_i40e > * Add some necessary definitions, changes in rte_mbuf.h and eth_dev > * Add new configurations for i40e > * Add or modify makefiles to support i40e compilation > * Add necessary changes in ixgbe, e1000 and vmxnet3 PMD, as hash flags > have been enlarged from 16 bits to 64 bits to support i40e > * Add necessary changes in example applications and testpmd to use > ETH_RSS_IP to replace all IP hash flags, as i40e introduced more > hash flags. > * Add command in testpmd for port based vlan insertion offload testing > * Add necessary changes in eth_dev to support configuring maximum > packet length of less than 1518 > * Add two sys files in igb_uio to support enabling/disabling > 'Extended Tag' and resetting 'Max Read Request Size', as it has > big impacts on i40e performance > * Add necessary changes in pci to read/write the above two sys files > during probing PCI > > Features/enhancements to be implemented later: > * Set link speed, and physically up/down > * Double VLAN support, flow director, VMDq and DCB > * VLAN insertion/stripping, RSS in VF > > Signed-off-by: Helin Zhang > Signed-off-by: Jing Chen > Acked-by: Cunming Liang > Acked-by: Jijiang Liu > Acked-by: Jingjing Wu > Acked-by: Heqing Zhu > Tested-by: Waterman Cao Applied for version 1.7.0. Some things could be cleaned up later, especially i40e specific flags in generic API must be removed. Please work on a patch for next release. Thanks for the hard work -- Thomas Hi Thomas, Thank you very much for merging the code! Btw, what do you mean by the i40e specific flags in the generic API? Did you mean the new flags defined in rte_mbuf.h? Yes, we are working on some bug fixes and enhancements which might need to be added in the next release. Hopefully we can see them soon. Regards, Helin
[dpdk-dev] Can anyone help me to run the l2fwd-ivshmem example?
> > Hi, > > 2014-06-18 15:56, GongJinrong: > >I want find a zero copy framework from host to vm without any > > physical NIC device, > > I think memnic is what you want: > http://dpdk.org/doc/memnic-pmd > > > it seems l2fwd-ivshmem can be used, but I have problems to run this > > example: > >1. there is no document about this example, even a simple readme :-( > >2. does this example need ovdk? > > No > > >3. can I use standard qemu to run this example? Does the standard > > qemu support ivshmem? > > You should be able to use standard Qemu. Standard QEMU will work for Memnic but not when using DPDK ivshmem. It uses the standard QEMU ivshmem and doesn't use DPDK in the way you would like it to. You should look at the DPDK vSwitch code to see how DPDK ivshmem is used. Basically, in the host you need to identify which objects you want to share with the virtual machine, e.g. rings, memzones. From this, you can generate a command line to pass to QEMU (with a modified ivshmem.c file - we haven't tried to upstream this yet). Then when you start a DPDK application in the guest, each of the objects that you shared from the host is also available in the guest. I presume l2fwd-ivshmem does the same. > > -- > Thomas
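A minimal sketch of the host-side flow described above (pick the rings/memzones to share, then generate a QEMU command line), assuming the librte_ivshmem interface as it exists in DPDK 1.7 (rte_ivshmem_metadata_create / _add_ring / _cmdline_generate); the metadata name "guest_md" and ring name "r0" are made up for this example:

```c
#include <stdio.h>
#include <rte_lcore.h>
#include <rte_ring.h>
#include <rte_ivshmem.h>

/* Sketch only: share one rte_ring with a guest over IVSHMEM.
 * Assumes rte_eal_init() has already been called in the host app. */
static int
share_ring_with_guest(void)
{
	char cmdline[1024];
	struct rte_ring *r = rte_ring_create("r0", 1024, rte_socket_id(), 0);

	if (r == NULL)
		return -1;

	/* group objects under a metadata name, then add the ring to it */
	if (rte_ivshmem_metadata_create("guest_md") < 0 ||
	    rte_ivshmem_metadata_add_ring(r, "guest_md") < 0)
		return -1;

	/* generate the ivshmem device arguments to append to the QEMU
	 * command line (needs the modified ivshmem.c mentioned above) */
	if (rte_ivshmem_metadata_cmdline_generate(cmdline, sizeof(cmdline),
			"guest_md") < 0)
		return -1;

	printf("QEMU args: %s\n", cmdline);
	return 0;
}
```

In the guest, the same ring should then be reachable from a DPDK application via rte_ring_lookup("r0").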
[dpdk-dev] [PATCH] vfio: open VFIO container at startup rather than during init
Hi Thomas, > Subject: Re: [dpdk-dev] [PATCH] vfio: open VFIO container at startup rather > than during init > > > Signed-off-by: Anatoly Burakov > > Please Anatoly, could you provide a text explaining what was broken and > why you fixed it this way? What was broken was that if, for some reason, VFIO is loaded but the user can't initialize it (an example would be wrong permissions, or an unsupported IOMMU type, which is what Bruce seems to be having... which shouldn't happen as far as I know, but there's nothing I can do on DPDK's side to fix this, as it is the kernel reporting the wrong kind of IOMMU type), DPDK would fail to load. The fix makes DPDK simply not try VFIO support at all if the container cannot be opened for some reason. Best regards, Anatoly Burakov DPDK SW Engineer
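A rough sketch of the behaviour described here (not the actual patch): probe the VFIO container once at startup and, if it cannot be opened or the kernel does not report a usable IOMMU type, simply disable VFIO support instead of aborting EAL initialization:

```c
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/vfio.h>

/* Sketch only: returns 1 if the VFIO container looks usable, 0 otherwise. */
static int
vfio_container_usable(void)
{
	int fd = open("/dev/vfio/vfio", O_RDWR);

	if (fd < 0)
		return 0; /* module not loaded, or wrong permissions */

	if (ioctl(fd, VFIO_GET_API_VERSION) != VFIO_API_VERSION ||
	    !ioctl(fd, VFIO_CHECK_EXTENSION, VFIO_TYPE1_IOMMU)) {
		/* kernel reports an API or IOMMU type we can't handle */
		close(fd);
		return 0;
	}

	close(fd);
	return 1; /* VFIO can be used; otherwise fall back to igb_uio */
}
```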
[dpdk-dev] Can anyone help me to run the l2fwd-ivshmem example?
Thanks, Mark, the comment really helps. -Original Message- From: Gray, Mark D [mailto:mark.d.g...@intel.com] Sent: Wednesday, June 18, 2014 4:53 PM To: GongJinrong; 'Thomas Monjalon' Cc: dev at dpdk.org Subject: RE: [dpdk-dev] Can anyone help me to run the l2fwd-ivshmem example? > > Hi, > > 2014-06-18 15:56, GongJinrong: > >I want find a zero copy framework from host to vm without any > > physical NIC device, > > I think memnic is what you want: > http://dpdk.org/doc/memnic-pmd > > > it seems l2fwd-ivshmem can be used, but I have problems to run this > > example: > >1. there is no document about this example, even a simple readme :-( > >2. does this example need ovdk? > > No > > >3. can I use standard qemu to run this example? Does the standard > > qemu support ivshmem? > > You should be able to use standard Qemu. Standard QEMU will work for Memnic but not when using DPDK ivshmem. It uses the standard QEMU ivshmem and doesn't use DPDK in the way you would like it to. You should look at the DPDK vSwitch code to see how DPDK ivshmem is used. Basically, in the host you need to identify which objects you want to share with the virtual machine, e.g. rings, memzones. From this, you can generate a command line to pass to QEMU (with a modified ivshmem.c file - we haven't tried to upstream this yet). Then when you start a DPDK application in the guest, each of the objects that you shared from the host is also available in the guest. I presume l2fwd-ivshmem does the same. > > -- > Thomas
[dpdk-dev] [PATCH v2 00/27] Add i40e PMD support
2014-06-18 08:51, Zhang, Helin: > Thomas Monjalon: > > Some things could be cleaned up later, especially i40e specific flags in > > generic API must be removed. Please work on a patch for next release. > > Btw, what do you mean the i40e specific flags in generic API? Did you mean > the new flags defined in rte_mbuf.h? Yes, we are working on some bug fixes > and enhancements which might need to be added in the next release. I mean that RSS flags are really Intel-specific. If other NIC vendors were to use this RSS API, it could be a problem to address. But we can wait for this one. About mbuf flags, they are all zeros and useless. So a patch is needed to get some space and define new values (as you already explained). And last but not least, the RTE_LIBRTE_I40E_16BYTE_RX_DESC handling is very strange, as can be seen in this commit: http://dpdk.org/browse/dpdk/commit/?id=ac2ece3fb1f5511 -- Thomas
[dpdk-dev] [PATCH 1/9] eal: map shared config into exact same address as primary process
Hi Konstantin, > I think we introduce a race window here. > If secondary process would do first mmap() before rte_config.mem_config->mem_cfg_addr > was properly set by primary process, then it will try to do > second mmap() with wrong address. > I think we need to do second mmap() straight after > rte_eal_mcfg_wait_complete(), or even just inside it. Acknowledged, will respin. Best regards, Anatoly Burakov DPDK SW Engineer
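For reference, the ordering that closes this race (and which v3 of the series ends up using) is: map the shared config read-only at an arbitrary address, wait until the primary process marks it complete, and only then re-map it at the address recorded in mem_cfg_addr. Function names as in the patch; the snippet only illustrates the call order:

```c
/* secondary process startup, sketch of the corrected ordering */
rte_eal_config_attach();                           /* 1st mmap(), read-only, arbitrary address */
rte_eal_mcfg_wait_complete(rte_config.mem_config); /* primary has finished writing mem_cfg_addr */
rte_eal_config_reattach();                         /* 2nd mmap() at the address chosen by the primary */
```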
[dpdk-dev] [PATCH] vfio: open VFIO container at startup rather than during init
Hi Anatoly, I would suggest we add a log message explaining which mechanism is loaded (igb_uio/vfio) and why (e.g. tried vfio first but container could not be opened, so falling back to igb_uio, etc). Regards, Cristian -Original Message- From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Burakov, Anatoly Sent: Wednesday, June 18, 2014 9:57 AM To: Thomas Monjalon Cc: dev at dpdk.org Subject: Re: [dpdk-dev] [PATCH] vfio: open VFIO container at startup rather than during init Hi Thomas, > Subject: Re: [dpdk-dev] [PATCH] vfio: open VFIO container at startup rather > than during init > > > Signed-off-by: Anatoly Burakov > > Please Anatoly, could you provide a text explaining what was broken and > why you fixed it this way? What was broken was if, for some reason, VFIO is loaded but the user can't initialize it (an example would be wrong permissions, or unsupported IOMMU type, which is what Bruce seems to be having... which shouldn't happen as far as I know, but there's nothing I can do on DPDK's side to fix this as this is the kernel reporting wrong kind of IOMMU type), DPDK would fail to load. The fix makes DPDK simply not try VFIO support at all if the container cannot be opened for some reason. Best regards, Anatoly Burakov DPDK SW Engineer
[dpdk-dev] Testing memnic for VM to VM transfer
Hi, Hiroshi I just start to learn DPDK and memnic, in memnic guide, you said "On host, the shared memory must be initialized by an application using memnic", I am not so clear that how to initialize the share memory in host, do you means use posix API or DPDK API to create the share memory?(it seems memnic guest side use rte_mbuf to transfer data), do you have any sample code to demo how to use memnic in host? -Original Message- From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Hiroshi Shimamoto Sent: Wednesday, June 18, 2014 12:02 PM To: John Joyce (joycej); dev at dpdk.org Subject: Re: [dpdk-dev] Testing memnic for VM to VM transfer Hi, > Subject: [dpdk-dev] Testing memnic for VM to VM transfer > > Hi everyone: > We are interested in testing the performance of the memnic driver posted at http://dpdk.org/browse/memnic/refs/. > We want to compare its performance compared to other techniques to > transfer packets between the guest and the kernel, predominately for VM to VM transfers. > > We have downloaded the memnic components and have got it running in a guest VM. > > The question we hope this group might be able to help with is what > would be the best way to processes the packets in the kernel to get a VM to VM transfer. I think there is no kernel code work with MEMNIC. The recommend switching software on the host is Intel DPDK vSwitch hosted on 01.org and github. https://github.com/01org/dpdk-ovs/tree/development Intel DPDK vSwitch runs on userspace not kernel. I introduced this mechanism to DPDK vSwitch and the guest drivers are maintained in dpdk.org. thanks, Hiroshi > > A couple options might be possible > > > 1. Common shared buffer between two VMs. With some utility/code to switch TX & RX rings between the two VMs. > > VM1 application --- memnic --- common shared memory buffer on the > host --- memnic --- VM2 application > > 2. Special purpose Kernel switching module > > VM1 application --- memnic --- shared memory VM1 --- Kernel > switching module --- shared memory VM2 --- memnic --- > VM2 application > > 3. Existing Kernel switching module > > VM1 application --- memnic --- shared memory VM1 --- existing > Kernel switching module (e.g. OVS/linux Bridge/VETh pair) > --- shared memory VM2 --- memnic --- VM2 application > > Can anyone recommend which approach might be best or easiest? We would like to avoid writing much (or any) kernel code > so if there are already any open source code or test utilities that > provide one of these options or would be a good starting point to start from, a pointer would be much appreciated. > > Thanks in advance > > > John Joyce
[dpdk-dev] [PATCH] vfio: open VFIO container at startup rather than during init
Hi Cristian, > I would suggest we add a log message explaining which mechanism is loaded > (igb_uio/vfio) and why (e.g. tried vfio first but container could not be > opened, so falling back to igb_uio, etc). This already happens. If the container could not be loaded for whatever reason, the log message is displayed (it did before, but the previous code didn't account for situations such as the one Bruce was having, hence the patch). If VFIO is loaded and enabled, all drivers will then report either being bound or skipped (e.g. "not bound to VFIO, skipping"). Best regards, Anatoly Burakov DPDK SW Engineer
[dpdk-dev] [memnic PATCH v2 5/5] linux: support MTU change
From: Hiroshi Shimamoto Add the capability to change MTU. On MTU change, remember the corresponding frame size and request new frame size to the host on reset, if the host MEMNIC has that feature. Don't trust framesz of header in general usage, because host might change the value unexpectedly. v2: forgot to update netdev->mtu on change, fix it. Signed-off-by: Hiroshi Shimamoto Reviewed-by: Hayato Momma --- linux/memnic_net.c | 41 ++--- linux/memnic_net.h | 5 + 2 files changed, 43 insertions(+), 3 deletions(-) diff --git a/linux/memnic_net.c b/linux/memnic_net.c index 02b5acc..f92cbd1 100644 --- a/linux/memnic_net.c +++ b/linux/memnic_net.c @@ -31,6 +31,7 @@ #include #include +#include #include "memnic_net.h" #include "memnic.h" @@ -152,19 +153,31 @@ static int memnic_open(struct net_device *netdev) { struct memnic_net *memnic = netdev_priv(netdev); struct memnic_area *nic = memnic->dev->base_addr; + struct memnic_header *hdr = &nic->hdr; struct task_struct *kthread; /* clear stats */ memset(&memnic->stats, 0, sizeof(memnic->stats)); /* invalidate and reset here */ - nic->hdr.valid = 0; + hdr->valid = 0; + + /* setup parameters */ + if (memnic->request.features & MEMNIC_FEAT_FRAME_SIZE) + hdr->framesz = memnic->request.framesz; + hdr->request = memnic->request.features; + smp_wmb(); - nic->hdr.reset = 1; + hdr->reset = 1; + + while (ACCESS_ONCE(hdr->reset)) + schedule_timeout_interruptible(HZ/100); + /* clear index */ memnic->up = 0; memnic->down = 0; memnic->framesz = MEMNIC_MAX_FRAME_LEN; - /* will become valid after reset handling in vswitch */ + if (memnic->request.features & MEMNIC_FEAT_FRAME_SIZE) + memnic->framesz = hdr->framesz; /* already run */ if (memnic->kthread) @@ -260,6 +273,26 @@ static int memnic_set_mac(struct net_device *netdev, void *p) static int memnic_change_mtu(struct net_device *netdev, int new_mtu) { + struct memnic_net *memnic = netdev_priv(netdev); + struct memnic_area *nic = memnic->dev->base_addr; + struct memnic_header *hdr = &nic->hdr; + uint32_t framesz = new_mtu + ETH_HLEN + VLAN_HLEN; + + if (!(hdr->features & MEMNIC_FEAT_FRAME_SIZE)) + return -ENOSYS; + + /* new_mtu less than 68 might cause problem */ + if (new_mtu < 68 || framesz > MEMNIC_MAX_JUMBO_FRAME_LEN) + return -EINVAL; + + printk(KERN_INFO "MEMNIC: Changing MTU from %u to %u\n", + netdev->mtu, new_mtu); + + memnic->request.features |= MEMNIC_FEAT_FRAME_SIZE; + memnic->request.framesz = framesz; + + netdev->mtu = new_mtu; + return 0; } @@ -298,6 +331,8 @@ struct memnic_net *memnic_net_create(struct memnic_dev *dev) memnic->netdev = netdev; memnic->dev = dev; + memnic->framesz = MEMNIC_MAX_FRAME_LEN; + memnic->request.features = 0; netdev->netdev_ops = &memnic_netdev_ops; diff --git a/linux/memnic_net.h b/linux/memnic_net.h index 10c8eed..b6c57ab 100644 --- a/linux/memnic_net.h +++ b/linux/memnic_net.h @@ -44,6 +44,11 @@ struct memnic_net { struct net_device_stats stats; int up, down; uint32_t framesz; + /* request to host */ + struct { + uint32_t features; + uint32_t framesz; + } request; }; struct memnic_net *memnic_net_create(struct memnic_dev *dev); -- 1.8.4
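For context, the host-side switch implementing MEMNIC has to honour this frame-size request when it services the reset; a rough sketch of that reset handling, using only the header fields visible in this patch (illustrative only, not code from memnic or DPDK vSwitch):

```c
/* Illustrative host-side reset handling for the frame size request above. */
static void
host_handle_reset(struct memnic_header *hdr)
{
	if (!hdr->reset)
		return;

	if ((hdr->features & MEMNIC_FEAT_FRAME_SIZE) &&
	    (hdr->request & MEMNIC_FEAT_FRAME_SIZE) &&
	    hdr->framesz <= MEMNIC_MAX_JUMBO_FRAME_LEN) {
		/* keep the frame size the guest asked for */
	} else {
		hdr->framesz = MEMNIC_MAX_FRAME_LEN;
	}

	/* ... reinitialize rings/indices for the new frame size here ... */

	hdr->valid = 1;	/* "will become valid after reset handling in vswitch" */
	hdr->reset = 0;	/* guest polls until reset is cleared */
}
```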
[dpdk-dev] [PATCH] vfio: open VFIO container at startup rather than during init
On Wed, Jun 18, 2014 at 10:26:08AM +, Burakov, Anatoly wrote: > Hi Cristian, > > > I would suggest we add a log message explaining which mechanism is loaded > > (igb_uio/vfio) and why (e.g. tried vfio first but container could not be > > opened, so falling back to igb_uio, etc). > > This already happens. > > If the container could not be loaded for whatever reason, the log message is > displayed (it did before, but the previous code didn't account for situations > such as Bruce was having, hence the patch). If VFIO is loaded and enabled, > all drivers will then report either being bound or skipped (e.g. "not bound > to VFIO, skipping"). I think what Thomas wants is for you to resend the patch with a proper changelog entry added to it, so the commit has an explanation of what was changed, for posterity. Neil > > Best regards, > Anatoly Burakov > DPDK SW Engineer > > > >
[dpdk-dev] vfio detection
Hi Bruce, > > > I have a number of NIC ports which were working correctly yesterday > > > and are bound correctly to the igb_uio driver - and I want to keep > > > using them through the igb_uio driver for now, not vfio. However, > > > whenever I run a dpdk application today, I find that the vfio kernel > > > module is getting loaded each time - even after I manually remove > > > it, and verify that it has been removed by checking lsmod. Is this > > > expected? If so, why are we loading the vfio driver when I just want to > continue using igb_uio which works fine? > > > > Can you elaborate a bit on what do you mean by "loading vfio driver"? > > Do you mean the vfio-pci kernel gets loaded by DPDK? I certainly > > didn't put in any code that would automatically load that driver, and > certainly not binding devices to it. > > The kernel module called just "vfio" is constantly getting reloaded, and there > is always a "/dev/vfio" directory, which triggers the vfio code handling every > time I run dpdk. I can't reproduce this. Please note that VFIO actually consists of three drivers (on an x86 system, that is) - vfio (the core VFIO infrastructure such as containers), vfio_iommu_type1 (support for x86-style IOMMU) and vfio-pci (the generic PCI driver). I have unloaded all three and ran dpdk_nic_bind and testpmd - it worked fine and no VFIO kernel drivers were loaded as a result. Best regards, Anatoly Burakov DPDK SW Engineer
[dpdk-dev] [PATCH] vfio: open VFIO container at startup rather than during init
Hi Neil > I think what Thomas wants is for you to resend the patch with a proper > changelog entry added to it, so the commit has an explination of what was > changed and, for posterity. Got it. Can I also incorporate your changes to error codes as well? Best regards, Anatoly Burakov DPDK SW Engineer
[dpdk-dev] [PATCH] vfio: open VFIO container at startup rather than during init
On Wed, Jun 18, 2014 at 11:02:23AM +, Burakov, Anatoly wrote: > Hi Neil > > > I think what Thomas wants is for you to resend the patch with a proper > > changelog entry added to it, so the commit has an explanation of what was > > changed, for posterity. > > Got it. Can I also incorporate your changes to error codes as well? > If you want to take Bruce's suggestions and incorporate them as well from my thread below, sure, as long as Thomas is ok with it. Neil > Best regards, > Anatoly Burakov > DPDK SW Engineer >
[dpdk-dev] ##freemail## RE: Testing memnic for VM to VM transfer
Hi, > Subject: ##freemail## RE: [dpdk-dev] Testing memnic for VM to VM transfer > > Hi, Hiroshi > >I just start to learn DPDK and memnic, in memnic guide, you said "On > host, the shared memory must be initialized by an application using memnic", > I am not so clear that how to initialize the share memory in host, do you > means use posix API or DPDK API to create the share memory?(it seems memnic > guest side use rte_mbuf to transfer data), do you have any sample code to > demo how to use memnic in host? I don't have simple MEMNIC sample to use it on host. Could you please try DPDK vSwitch and enables MEMNIC vport? DPDK vSwitch must handle packets between physical NIC port and MEMNIC vport exposed to guest with dpdk.org memnic driver. thanks, Hiroshi > > -Original Message- > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Hiroshi Shimamoto > Sent: Wednesday, June 18, 2014 12:02 PM > To: John Joyce (joycej); dev at dpdk.org > Subject: Re: [dpdk-dev] Testing memnic for VM to VM transfer > > Hi, > > > Subject: [dpdk-dev] Testing memnic for VM to VM transfer > > > > Hi everyone: > > We are interested in testing the performance of the memnic driver > posted at http://dpdk.org/browse/memnic/refs/. > > We want to compare its performance compared to other techniques to > > transfer packets between the guest and the kernel, predominately for VM to > VM transfers. > > > > We have downloaded the memnic components and have got it running in a > guest VM. > > > > The question we hope this group might be able to help with is what > > would be the best way to processes the packets in the kernel to get a VM > to VM transfer. > > I think there is no kernel code work with MEMNIC. > The recommend switching software on the host is Intel DPDK vSwitch hosted on > 01.org and github. > https://github.com/01org/dpdk-ovs/tree/development > > Intel DPDK vSwitch runs on userspace not kernel. > > I introduced this mechanism to DPDK vSwitch and the guest drivers are > maintained in dpdk.org. > > thanks, > Hiroshi > > > > > A couple options might be possible > > > > > > 1. Common shared buffer between two VMs. With some utility/code to > switch TX & RX rings between the two VMs. > > > > VM1 application --- memnic --- common shared memory buffer on the > > host --- memnic --- VM2 application > > > > 2. Special purpose Kernel switching module > > > > VM1 application --- memnic --- shared memory VM1 --- Kernel > > switching module --- shared memory VM2 --- memnic --- > > VM2 application > > > > 3. Existing Kernel switching module > > > > VM1 application --- memnic --- shared memory VM1 --- existing > > Kernel switching module (e.g. OVS/linux Bridge/VETh pair) > > --- shared memory VM2 --- memnic --- VM2 application > > > > Can anyone recommend which approach might be best or easiest? We would > like to avoid writing much (or any) kernel code > > so if there are already any open source code or test utilities that > > provide one of these options or would be a good starting point to start > from, a pointer would be much appreciated. > > > > Thanks in advance > > > > > > John Joyce
[dpdk-dev] ##freemail## RE: Testing memnic for VM to VM transfer
Hi, Hiroshi Do you mean I must use DPDK vSwitch in host when I use MEMNIC PMD in guest VM? actually, I just want a channel which can put the data from host to guest quickly. Do you have any idea that how to write a host application to put the data to guest memnic PMD? -Original Message- From: Hiroshi Shimamoto [mailto:h-shimam...@ct.jp.nec.com] Sent: Wednesday, June 18, 2014 7:11 PM To: GongJinrong; 'John Joyce (joycej)'; dev at dpdk.org Subject: RE: ##freemail## RE: [dpdk-dev] Testing memnic for VM to VM transfer Hi, > Subject: ##freemail## RE: [dpdk-dev] Testing memnic for VM to VM > transfer > > Hi, Hiroshi > >I just start to learn DPDK and memnic, in memnic guide, you said > "On host, the shared memory must be initialized by an application > using memnic", I am not so clear that how to initialize the share > memory in host, do you means use posix API or DPDK API to create the > share memory?(it seems memnic guest side use rte_mbuf to transfer > data), do you have any sample code to demo how to use memnic in host? I don't have simple MEMNIC sample to use it on host. Could you please try DPDK vSwitch and enables MEMNIC vport? DPDK vSwitch must handle packets between physical NIC port and MEMNIC vport exposed to guest with dpdk.org memnic driver. thanks, Hiroshi > > -Original Message- > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Hiroshi Shimamoto > Sent: Wednesday, June 18, 2014 12:02 PM > To: John Joyce (joycej); dev at dpdk.org > Subject: Re: [dpdk-dev] Testing memnic for VM to VM transfer > > Hi, > > > Subject: [dpdk-dev] Testing memnic for VM to VM transfer > > > > Hi everyone: > > We are interested in testing the performance of the memnic > > driver > posted at http://dpdk.org/browse/memnic/refs/. > > We want to compare its performance compared to other techniques to > > transfer packets between the guest and the kernel, predominately for > > VM to > VM transfers. > > > > We have downloaded the memnic components and have got it running in > > a > guest VM. > > > > The question we hope this group might be able to help with is what > > would be the best way to processes the packets in the kernel to get > > a VM > to VM transfer. > > I think there is no kernel code work with MEMNIC. > The recommend switching software on the host is Intel DPDK vSwitch > hosted on 01.org and github. > https://github.com/01org/dpdk-ovs/tree/development > > Intel DPDK vSwitch runs on userspace not kernel. > > I introduced this mechanism to DPDK vSwitch and the guest drivers are > maintained in dpdk.org. > > thanks, > Hiroshi > > > > > A couple options might be possible > > > > > > 1. Common shared buffer between two VMs. With some utility/code to > switch TX & RX rings between the two VMs. > > > > VM1 application --- memnic --- common shared memory buffer on the > > host --- memnic --- VM2 application > > > > 2. Special purpose Kernel switching module > > > > VM1 application --- memnic --- shared memory VM1 --- Kernel > > switching module --- shared memory VM2 --- memnic --- > > VM2 application > > > > 3. Existing Kernel switching module > > > > VM1 application --- memnic --- shared memory VM1 --- existing > > Kernel switching module (e.g. OVS/linux Bridge/VETh pair) > > --- shared memory VM2 --- memnic --- VM2 application > > > > Can anyone recommend which approach might be best or easiest? 
We would > like to avoid writing much (or any) kernel code > > so if there are already any open source code or test utilities that > > provide one of these options or would be a good starting point to > > start > from, a pointer would be much appreciated. > > > > Thanks in advance > > > > > > John Joyce
[dpdk-dev] [PATCH v3 0/9] Make DPDK tailqs fully local
This issue was reported by OVS-DPDK project, and the fix should go to upstream DPDK. This is not memnic-related - this is to do with DPDK's rte_ivshmem library. Every DPDK data structure has a corresponding TAILQ reserved for it in the runtime config file. Those TAILQs are fully local to the process, however most data structures contain pointers to next entry in the TAILQ. Since the data structures such as rings are shared in their entirety, those TAILQ pointers are shared as well. Meaning that, after a successful rte_ring creation, the tailq_next pointer of the last ring in the TAILQ will be updated with a pointer to a ring which may not be present in the address space of another process (i.e. a ring that may be host-local or guest-local, and not shared over IVSHMEM). Any successive ring create/lookup on the other side of IVSHMEM will result in trying to dereference an invalid pointer. This patchset fixes this problem by creating a default tailq entry that may be used by any data structure that chooses to use TAILQs. This default TAILQ entry will consist of a tailq_next/tailq_prev pointers, and an opaque pointer to arbitrary data. All TAILQ pointers from data structures themselves will be removed and replaced by those generic TAILQ entries, thus fixing the problem of potentially exposing local address space to shared structures. Technically, only rte_ring structure require modification, because IVSHMEM is only using memzones (which aren't in TAILQs) and rings, but for consistency's sake other TAILQ-based data structures were adapted as well. v2 changes: * fixed race conditions in *_free operations * fixed multiprocess support for malloc heaps * added similar changes for acl * rebased on top of e88b42f818bc1a6d4ce6cb70371b66e37fa34f7d v3 changes: * fixed race reported by Konstantin Ananyev (introduced in v2) Anatoly Burakov (9): eal: map shared config into exact same address as primary process rte_tailq: change rte_dummy to rte_tailq_entry, add data pointer rte_ring: make ring tailq fully local rte_hash: make rte_hash tailq fully local rte_fbk_hash: make rte_fbk_hash tailq fully local rte_mempool: make mempool tailq fully local rte_lpm: make lpm tailq fully local rte_lpm6: make lpm6 tailq fully local rte_acl: make acl tailq fully local app/test/test_tailq.c | 33 +- lib/librte_acl/acl.h | 1 - lib/librte_acl/rte_acl.c | 74 ++- lib/librte_eal/common/eal_common_tailqs.c | 2 +- lib/librte_eal/common/include/rte_eal_memconfig.h | 5 ++ lib/librte_eal/common/include/rte_tailq.h | 9 +-- lib/librte_eal/linuxapp/eal/eal.c | 44 -- lib/librte_eal/linuxapp/eal/eal_ivshmem.c | 17 +- lib/librte_hash/rte_fbk_hash.c| 73 +- lib/librte_hash/rte_fbk_hash.h| 3 - lib/librte_hash/rte_hash.c| 61 --- lib/librte_hash/rte_hash.h| 2 - lib/librte_lpm/rte_lpm.c | 65 lib/librte_lpm/rte_lpm.h | 2 - lib/librte_lpm/rte_lpm6.c | 62 +++ lib/librte_mempool/Makefile | 3 +- lib/librte_mempool/rte_mempool.c | 37 +--- lib/librte_mempool/rte_mempool.h | 2 - lib/librte_ring/Makefile | 4 +- lib/librte_ring/rte_ring.c| 33 +++--- lib/librte_ring/rte_ring.h| 2 - 21 files changed, 415 insertions(+), 119 deletions(-) -- 1.8.1.4
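The heart of the series is the generic entry type described above: the per-process lists no longer link the shared objects themselves, they link small locally allocated entries that merely point at the object. Roughly, this is what patch 2/9 turns rte_dummy into:

```c
/* Allocated locally in each process; only the opaque data pointer
 * refers to the (possibly IVSHMEM-shared) object itself. */
struct rte_tailq_entry {
	TAILQ_ENTRY(rte_tailq_entry) next; /**< tailq linkage, local to the process */
	void *data;                        /**< pointer to the actual object (ring, hash, ...) */
};

/* Lists then hold entries instead of objects, e.g. for rings: */
TAILQ_HEAD(rte_ring_list, rte_tailq_entry);
```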
[dpdk-dev] [PATCH v3 4/9] rte_hash: make rte_hash tailq fully local
Signed-off-by: Anatoly Burakov --- lib/librte_hash/rte_hash.c | 61 +++--- lib/librte_hash/rte_hash.h | 2 -- 2 files changed, 52 insertions(+), 11 deletions(-) diff --git a/lib/librte_hash/rte_hash.c b/lib/librte_hash/rte_hash.c index d4221a8..eea5c01 100644 --- a/lib/librte_hash/rte_hash.c +++ b/lib/librte_hash/rte_hash.c @@ -60,7 +60,7 @@ #include "rte_hash.h" -TAILQ_HEAD(rte_hash_list, rte_hash); +TAILQ_HEAD(rte_hash_list, rte_tailq_entry); /* Macro to enable/disable run-time checking of function parameters */ #if defined(RTE_LIBRTE_HASH_DEBUG) @@ -141,24 +141,29 @@ find_first(uint32_t sig, const uint32_t *sig_bucket, uint32_t num_sigs) struct rte_hash * rte_hash_find_existing(const char *name) { - struct rte_hash *h; + struct rte_hash *h = NULL; + struct rte_tailq_entry *te; struct rte_hash_list *hash_list; /* check that we have an initialised tail queue */ - if ((hash_list = RTE_TAILQ_LOOKUP_BY_IDX(RTE_TAILQ_HASH, rte_hash_list)) == NULL) { + if ((hash_list = + RTE_TAILQ_LOOKUP_BY_IDX(RTE_TAILQ_HASH, rte_hash_list)) == NULL) { rte_errno = E_RTE_NO_TAILQ; return NULL; } rte_rwlock_read_lock(RTE_EAL_TAILQ_RWLOCK); - TAILQ_FOREACH(h, hash_list, next) { + TAILQ_FOREACH(te, hash_list, next) { + h = (struct rte_hash *) te->data; if (strncmp(name, h->name, RTE_HASH_NAMESIZE) == 0) break; } rte_rwlock_read_unlock(RTE_EAL_TAILQ_RWLOCK); - if (h == NULL) + if (te == NULL) { rte_errno = ENOENT; + return NULL; + } return h; } @@ -166,6 +171,7 @@ struct rte_hash * rte_hash_create(const struct rte_hash_parameters *params) { struct rte_hash *h = NULL; + struct rte_tailq_entry *te; uint32_t num_buckets, sig_bucket_size, key_size, hash_tbl_size, sig_tbl_size, key_tbl_size, mem_size; char hash_name[RTE_HASH_NAMESIZE]; @@ -212,17 +218,25 @@ rte_hash_create(const struct rte_hash_parameters *params) rte_rwlock_write_lock(RTE_EAL_TAILQ_RWLOCK); /* guarantee there's no existing */ - TAILQ_FOREACH(h, hash_list, next) { + TAILQ_FOREACH(te, hash_list, next) { + h = (struct rte_hash *) te->data; if (strncmp(params->name, h->name, RTE_HASH_NAMESIZE) == 0) break; } - if (h != NULL) + if (te != NULL) + goto exit; + + te = rte_zmalloc("HASH_TAILQ_ENTRY", sizeof(*te), 0); + if (te == NULL) { + RTE_LOG(ERR, HASH, "tailq entry allocation failed\n"); goto exit; + } h = (struct rte_hash *)rte_zmalloc_socket(hash_name, mem_size, CACHE_LINE_SIZE, params->socket_id); if (h == NULL) { RTE_LOG(ERR, HASH, "memory allocation failed\n"); + rte_free(te); goto exit; } @@ -242,7 +256,9 @@ rte_hash_create(const struct rte_hash_parameters *params) h->hash_func = (params->hash_func == NULL) ? 
DEFAULT_HASH_FUNC : params->hash_func; - TAILQ_INSERT_TAIL(hash_list, h, next); + te->data = (void *) h; + + TAILQ_INSERT_TAIL(hash_list, te, next); exit: rte_rwlock_write_unlock(RTE_EAL_TAILQ_RWLOCK); @@ -253,11 +269,38 @@ exit: void rte_hash_free(struct rte_hash *h) { + struct rte_tailq_entry *te; + struct rte_hash_list *hash_list; + if (h == NULL) return; - RTE_EAL_TAILQ_REMOVE(RTE_TAILQ_HASH, rte_hash_list, h); + /* check that we have an initialised tail queue */ + if ((hash_list = +RTE_TAILQ_LOOKUP_BY_IDX(RTE_TAILQ_HASH, rte_hash_list)) == NULL) { + rte_errno = E_RTE_NO_TAILQ; + return; + } + + rte_rwlock_write_lock(RTE_EAL_TAILQ_RWLOCK); + + /* find out tailq entry */ + TAILQ_FOREACH(te, hash_list, next) { + if (te->data == (void *) h) + break; + } + + if (te == NULL) { + rte_rwlock_write_unlock(RTE_EAL_TAILQ_RWLOCK); + return; + } + + TAILQ_REMOVE(hash_list, te, next); + + rte_rwlock_write_unlock(RTE_EAL_TAILQ_RWLOCK); + rte_free(h); + rte_free(te); } static inline int32_t diff --git a/lib/librte_hash/rte_hash.h b/lib/librte_hash/rte_hash.h index 5228e3a..2ecaf1a 100644 --- a/lib/librte_hash/rte_hash.h +++ b/lib/librte_hash/rte_hash.h @@ -86,8 +86,6 @@ struct rte_hash_parameters { /** A hash table structure. */ struct rte_hash { - TAILQ_ENTRY(rte_hash) next;/**< Next in list. */ - char name[RTE_HASH_NAMESIZE]; /**< Name of the hash. */ uint32_t entries; /**< Total table entries. */ uint32_t buc
[dpdk-dev] [PATCH v3 9/9] rte_acl: make acl tailq fully local
Signed-off-by: Anatoly Burakov --- lib/librte_acl/acl.h | 1 - lib/librte_acl/rte_acl.c | 74 +++- 2 files changed, 60 insertions(+), 15 deletions(-) diff --git a/lib/librte_acl/acl.h b/lib/librte_acl/acl.h index e6d7985..b9d63fd 100644 --- a/lib/librte_acl/acl.h +++ b/lib/librte_acl/acl.h @@ -149,7 +149,6 @@ struct rte_acl_bld_trie { }; struct rte_acl_ctx { - TAILQ_ENTRY(rte_acl_ctx) next;/**< Next in list. */ charname[RTE_ACL_NAMESIZE]; /** Name of the ACL context. */ int32_t socket_id; diff --git a/lib/librte_acl/rte_acl.c b/lib/librte_acl/rte_acl.c index 129a41f..3b47ab6 100644 --- a/lib/librte_acl/rte_acl.c +++ b/lib/librte_acl/rte_acl.c @@ -36,13 +36,14 @@ #defineBIT_SIZEOF(x) (sizeof(x) * CHAR_BIT) -TAILQ_HEAD(rte_acl_list, rte_acl_ctx); +TAILQ_HEAD(rte_acl_list, rte_tailq_entry); struct rte_acl_ctx * rte_acl_find_existing(const char *name) { - struct rte_acl_ctx *ctx; + struct rte_acl_ctx *ctx = NULL; struct rte_acl_list *acl_list; + struct rte_tailq_entry *te; /* check that we have an initialised tail queue */ acl_list = RTE_TAILQ_LOOKUP_BY_IDX(RTE_TAILQ_ACL, rte_acl_list); @@ -52,27 +53,55 @@ rte_acl_find_existing(const char *name) } rte_rwlock_read_lock(RTE_EAL_TAILQ_RWLOCK); - TAILQ_FOREACH(ctx, acl_list, next) { + TAILQ_FOREACH(te, acl_list, next) { + ctx = (struct rte_acl_ctx*) te->data; if (strncmp(name, ctx->name, sizeof(ctx->name)) == 0) break; } rte_rwlock_read_unlock(RTE_EAL_TAILQ_RWLOCK); - if (ctx == NULL) + if (te == NULL) { rte_errno = ENOENT; + return NULL; + } return ctx; } void rte_acl_free(struct rte_acl_ctx *ctx) { + struct rte_acl_list *acl_list; + struct rte_tailq_entry *te; + if (ctx == NULL) return; - RTE_EAL_TAILQ_REMOVE(RTE_TAILQ_ACL, rte_acl_list, ctx); + /* check that we have an initialised tail queue */ + acl_list = RTE_TAILQ_LOOKUP_BY_IDX(RTE_TAILQ_ACL, rte_acl_list); + if (acl_list == NULL) { + rte_errno = E_RTE_NO_TAILQ; + return; + } + + rte_rwlock_write_lock(RTE_EAL_TAILQ_RWLOCK); + + /* find our tailq entry */ + TAILQ_FOREACH(te, acl_list, next) { + if (te->data == (void *) ctx) + break; + } + if (te == NULL) { + rte_rwlock_write_unlock(RTE_EAL_TAILQ_RWLOCK); + return; + } + + TAILQ_REMOVE(acl_list, te, next); + + rte_rwlock_write_unlock(RTE_EAL_TAILQ_RWLOCK); rte_free(ctx->mem); rte_free(ctx); + rte_free(te); } struct rte_acl_ctx * @@ -81,6 +110,7 @@ rte_acl_create(const struct rte_acl_param *param) size_t sz; struct rte_acl_ctx *ctx; struct rte_acl_list *acl_list; + struct rte_tailq_entry *te; char name[sizeof(ctx->name)]; /* check that we have an initialised tail queue */ @@ -105,15 +135,31 @@ rte_acl_create(const struct rte_acl_param *param) rte_rwlock_write_lock(RTE_EAL_TAILQ_RWLOCK); /* if we already have one with that name */ - TAILQ_FOREACH(ctx, acl_list, next) { + TAILQ_FOREACH(te, acl_list, next) { + ctx = (struct rte_acl_ctx*) te->data; if (strncmp(param->name, ctx->name, sizeof(ctx->name)) == 0) break; } /* if ACL with such name doesn't exist, then create a new one. */ - if (ctx == NULL && (ctx = rte_zmalloc_socket(name, sz, CACHE_LINE_SIZE, - param->socket_id)) != NULL) { + if (te == NULL) { + ctx = NULL; + te = rte_zmalloc("ACL_TAILQ_ENTRY", sizeof(*te), 0); + + if (te == NULL) { + RTE_LOG(ERR, ACL, "Cannot allocate tailq entry!\n"); + goto exit; + } + + ctx = rte_zmalloc_socket(name, sz, CACHE_LINE_SIZE, param->socket_id); + if (ctx == NULL) { + RTE_LOG(ERR, ACL, + "allocation of %zu bytes on socket %d for %s failed\n", + sz, param->socket_id, name); + rte_free(te); + goto exit; + } /* init new allocated context. 
*/ ctx->rules = ctx + 1; ctx->max_rules = param->max_rule_num; @@ -121,14 +167,12 @@ rte_acl_create(const struct rte_acl_param *param) ctx->socket_id = param->socket_id; rte_snprintf(ctx->name, sizeof(ctx->name), "%s", param->name); - TAILQ_INSERT_TAIL(acl_list, ctx, next); + te->data = (void *) ctx; - } else if (ctx == NU
[dpdk-dev] [PATCH v3 1/9] eal: map shared config into exact same address as primary process
Shared config is shared across primary and secondary processes. However,when using rte_malloc, the malloc elements keep references to the heap inside themselves. This heap reference might not be referencing a local heap because the heap reference points to the heap of whatever process has allocated that malloc element. Therefore, there can be situations when malloc elements in a given heap actually reference different addresses for the same heap - depending on which process has allocated the element. This can lead to segmentation faults when dealing with malloc elements allocated on the same heap by different processes. To fix this problem, heaps will now have the same addresses across processes. In order to achieve that, a new field in a shared mem_config (a structure that holds the heaps, and which is shared across processes) was added to keep the address of where this config is mapped in the primary process. Secondary process will now map the config in two stages - first, it'll map it into an arbitrary address and read the address the primary process has allocated for the shared config. Then, the config is unmapped and re-mapped using the address previously read. Signed-off-by: Anatoly Burakov --- lib/librte_eal/common/include/rte_eal_memconfig.h | 5 +++ lib/librte_eal/linuxapp/eal/eal.c | 44 --- 2 files changed, 44 insertions(+), 5 deletions(-) diff --git a/lib/librte_eal/common/include/rte_eal_memconfig.h b/lib/librte_eal/common/include/rte_eal_memconfig.h index 30ce6fc..d6359e5 100644 --- a/lib/librte_eal/common/include/rte_eal_memconfig.h +++ b/lib/librte_eal/common/include/rte_eal_memconfig.h @@ -89,6 +89,11 @@ struct rte_mem_config { /* Heaps of Malloc per socket */ struct malloc_heap malloc_heaps[RTE_MAX_NUMA_NODES]; + + /* address of mem_config in primary process. used to map shared config into +* exact same address the primary process maps it. 
+*/ + uint64_t mem_cfg_addr; } __attribute__((__packed__)); diff --git a/lib/librte_eal/linuxapp/eal/eal.c b/lib/librte_eal/linuxapp/eal/eal.c index 6994303..fee375c 100644 --- a/lib/librte_eal/linuxapp/eal/eal.c +++ b/lib/librte_eal/linuxapp/eal/eal.c @@ -239,13 +239,19 @@ rte_eal_config_create(void) } memcpy(rte_mem_cfg_addr, &early_mem_config, sizeof(early_mem_config)); rte_config.mem_config = (struct rte_mem_config *) rte_mem_cfg_addr; + + /* store address of the config in the config itself so that secondary +* processes could later map the config into this exact location */ + rte_config.mem_config->mem_cfg_addr = (uintptr_t) rte_mem_cfg_addr; + } /* attach to an existing shared memory config */ static void rte_eal_config_attach(void) { - void *rte_mem_cfg_addr; + struct rte_mem_config *mem_config; + const char *pathname = eal_runtime_config_path(); if (internal_config.no_shconf) @@ -257,13 +263,40 @@ rte_eal_config_attach(void) rte_panic("Cannot open '%s' for rte_mem_config\n", pathname); } - rte_mem_cfg_addr = mmap(NULL, sizeof(*rte_config.mem_config), - PROT_READ | PROT_WRITE, MAP_SHARED, mem_cfg_fd, 0); + /* map it as read-only first */ + mem_config = (struct rte_mem_config *) mmap(NULL, sizeof(*mem_config), + PROT_READ, MAP_SHARED, mem_cfg_fd, 0); + if (mem_config == MAP_FAILED) + rte_panic("Cannot mmap memory for rte_config\n"); + + rte_config.mem_config = mem_config; +} + +/* reattach the shared config at exact memory location primary process has it */ +static void +rte_eal_config_reattach(void) +{ + struct rte_mem_config *mem_config; + void *rte_mem_cfg_addr; + + if (internal_config.no_shconf) + return; + + /* save the address primary process has mapped shared config to */ + rte_mem_cfg_addr = (void *) (uintptr_t) rte_config.mem_config->mem_cfg_addr; + + /* unmap original config */ + munmap(rte_config.mem_config, sizeof(struct rte_mem_config)); + + /* remap the config at proper address */ + mem_config = (struct rte_mem_config *) mmap(rte_mem_cfg_addr, + sizeof(*mem_config), PROT_READ | PROT_WRITE, MAP_SHARED, + mem_cfg_fd, 0); close(mem_cfg_fd); - if (rte_mem_cfg_addr == MAP_FAILED) + if (mem_config == MAP_FAILED || mem_config != rte_mem_cfg_addr) rte_panic("Cannot mmap memory for rte_config\n"); - rte_config.mem_config = (struct rte_mem_config *) rte_mem_cfg_addr; + rte_config.mem_config = mem_config; } /* Detect if we are a primary or a secondary process */ @@ -301,6 +334,7 @@ rte_config_init(void) case RTE_PROC_SECONDARY: rte_eal_config_attach(); rte_eal_mcfg_wait_complete(rte_config.mem_config); + rte_eal_config_reattach();
[dpdk-dev] [PATCH v3 5/9] rte_fbk_hash: make rte_fbk_hash tailq fully local
Signed-off-by: Anatoly Burakov --- lib/librte_hash/rte_fbk_hash.c | 73 ++ lib/librte_hash/rte_fbk_hash.h | 3 -- 2 files changed, 59 insertions(+), 17 deletions(-) diff --git a/lib/librte_hash/rte_fbk_hash.c b/lib/librte_hash/rte_fbk_hash.c index 4d67554..1356cf4 100644 --- a/lib/librte_hash/rte_fbk_hash.c +++ b/lib/librte_hash/rte_fbk_hash.c @@ -54,7 +54,7 @@ #include "rte_fbk_hash.h" -TAILQ_HEAD(rte_fbk_hash_list, rte_fbk_hash_table); +TAILQ_HEAD(rte_fbk_hash_list, rte_tailq_entry); /** * Performs a lookup for an existing hash table, and returns a pointer to @@ -69,24 +69,29 @@ TAILQ_HEAD(rte_fbk_hash_list, rte_fbk_hash_table); struct rte_fbk_hash_table * rte_fbk_hash_find_existing(const char *name) { - struct rte_fbk_hash_table *h; + struct rte_fbk_hash_table *h = NULL; + struct rte_tailq_entry *te; struct rte_fbk_hash_list *fbk_hash_list; /* check that we have an initialised tail queue */ if ((fbk_hash_list = -RTE_TAILQ_LOOKUP_BY_IDX(RTE_TAILQ_FBK_HASH, rte_fbk_hash_list)) == NULL) { + RTE_TAILQ_LOOKUP_BY_IDX(RTE_TAILQ_FBK_HASH, + rte_fbk_hash_list)) == NULL) { rte_errno = E_RTE_NO_TAILQ; return NULL; } rte_rwlock_read_lock(RTE_EAL_TAILQ_RWLOCK); - TAILQ_FOREACH(h, fbk_hash_list, next) { + TAILQ_FOREACH(te, fbk_hash_list, next) { + h = (struct rte_fbk_hash_table *) te->data; if (strncmp(name, h->name, RTE_FBK_HASH_NAMESIZE) == 0) break; } rte_rwlock_read_unlock(RTE_EAL_TAILQ_RWLOCK); - if (h == NULL) + if (te == NULL) { rte_errno = ENOENT; + return NULL; + } return h; } @@ -104,6 +109,7 @@ struct rte_fbk_hash_table * rte_fbk_hash_create(const struct rte_fbk_hash_params *params) { struct rte_fbk_hash_table *ht = NULL; + struct rte_tailq_entry *te; char hash_name[RTE_FBK_HASH_NAMESIZE]; const uint32_t mem_size = sizeof(*ht) + (sizeof(ht->t[0]) * params->entries); @@ -112,7 +118,8 @@ rte_fbk_hash_create(const struct rte_fbk_hash_params *params) /* check that we have an initialised tail queue */ if ((fbk_hash_list = -RTE_TAILQ_LOOKUP_BY_IDX(RTE_TAILQ_FBK_HASH, rte_fbk_hash_list)) == NULL) { + RTE_TAILQ_LOOKUP_BY_IDX(RTE_TAILQ_FBK_HASH, + rte_fbk_hash_list)) == NULL) { rte_errno = E_RTE_NO_TAILQ; return NULL; } @@ -134,20 +141,28 @@ rte_fbk_hash_create(const struct rte_fbk_hash_params *params) rte_rwlock_write_lock(RTE_EAL_TAILQ_RWLOCK); /* guarantee there's no existing */ - TAILQ_FOREACH(ht, fbk_hash_list, next) { + TAILQ_FOREACH(te, fbk_hash_list, next) { + ht = (struct rte_fbk_hash_table *) te->data; if (strncmp(params->name, ht->name, RTE_FBK_HASH_NAMESIZE) == 0) break; } - if (ht != NULL) + if (te != NULL) goto exit; + te = rte_zmalloc("FBK_HASH_TAILQ_ENTRY", sizeof(*te), 0); + if (te == NULL) { + RTE_LOG(ERR, HASH, "Failed to allocate tailq entry\n"); + goto exit; + } + /* Allocate memory for table. */ - ht = (struct rte_fbk_hash_table *)rte_malloc_socket(hash_name, mem_size, + ht = (struct rte_fbk_hash_table *)rte_zmalloc_socket(hash_name, mem_size, 0, params->socket_id); - if (ht == NULL) + if (ht == NULL) { + RTE_LOG(ERR, HASH, "Failed to allocate fbk hash table\n"); + rte_free(te); goto exit; - - memset(ht, 0, mem_size); + } /* Set up hash table context. 
*/ rte_snprintf(ht->name, sizeof(ht->name), "%s", params->name); @@ -169,7 +184,9 @@ rte_fbk_hash_create(const struct rte_fbk_hash_params *params) ht->init_val = RTE_FBK_HASH_INIT_VAL_DEFAULT; } - TAILQ_INSERT_TAIL(fbk_hash_list, ht, next); + te->data = (void *) ht; + + TAILQ_INSERT_TAIL(fbk_hash_list, te, next); exit: rte_rwlock_write_unlock(RTE_EAL_TAILQ_RWLOCK); @@ -186,10 +203,38 @@ exit: void rte_fbk_hash_free(struct rte_fbk_hash_table *ht) { + struct rte_tailq_entry *te; + struct rte_fbk_hash_list *fbk_hash_list; + if (ht == NULL) return; - RTE_EAL_TAILQ_REMOVE(RTE_TAILQ_FBK_HASH, rte_fbk_hash_list, ht); + /* check that we have an initialised tail queue */ + if ((fbk_hash_list = + RTE_TAILQ_LOOKUP_BY_IDX(RTE_TAILQ_FBK_HASH, + rte_fbk_hash_list)) == NULL) { + rte_errno = E_RTE_NO_TAILQ; + return; + } + + r
[dpdk-dev] [PATCH v3 7/9] rte_lpm: make lpm tailq fully local
Signed-off-by: Anatoly Burakov --- lib/librte_lpm/rte_lpm.c | 65 lib/librte_lpm/rte_lpm.h | 2 -- 2 files changed, 54 insertions(+), 13 deletions(-) diff --git a/lib/librte_lpm/rte_lpm.c b/lib/librte_lpm/rte_lpm.c index 592750e..6a49d43 100644 --- a/lib/librte_lpm/rte_lpm.c +++ b/lib/librte_lpm/rte_lpm.c @@ -56,7 +56,7 @@ #include "rte_lpm.h" -TAILQ_HEAD(rte_lpm_list, rte_lpm); +TAILQ_HEAD(rte_lpm_list, rte_tailq_entry); #define MAX_DEPTH_TBL24 24 @@ -118,24 +118,29 @@ depth_to_range(uint8_t depth) struct rte_lpm * rte_lpm_find_existing(const char *name) { - struct rte_lpm *l; + struct rte_lpm *l = NULL; + struct rte_tailq_entry *te; struct rte_lpm_list *lpm_list; /* check that we have an initialised tail queue */ - if ((lpm_list = RTE_TAILQ_LOOKUP_BY_IDX(RTE_TAILQ_LPM, rte_lpm_list)) == NULL) { + if ((lpm_list = RTE_TAILQ_LOOKUP_BY_IDX(RTE_TAILQ_LPM, + rte_lpm_list)) == NULL) { rte_errno = E_RTE_NO_TAILQ; return NULL; } rte_rwlock_read_lock(RTE_EAL_TAILQ_RWLOCK); - TAILQ_FOREACH(l, lpm_list, next) { + TAILQ_FOREACH(te, lpm_list, next) { + l = (struct rte_lpm *) te->data; if (strncmp(name, l->name, RTE_LPM_NAMESIZE) == 0) break; } rte_rwlock_read_unlock(RTE_EAL_TAILQ_RWLOCK); - if (l == NULL) + if (te == NULL) { rte_errno = ENOENT; + return NULL; + } return l; } @@ -149,12 +154,13 @@ rte_lpm_create(const char *name, int socket_id, int max_rules, { char mem_name[RTE_LPM_NAMESIZE]; struct rte_lpm *lpm = NULL; + struct rte_tailq_entry *te; uint32_t mem_size; struct rte_lpm_list *lpm_list; /* check that we have an initialised tail queue */ - if ((lpm_list = -RTE_TAILQ_LOOKUP_BY_IDX(RTE_TAILQ_LPM, rte_lpm_list)) == NULL) { + if ((lpm_list = RTE_TAILQ_LOOKUP_BY_IDX(RTE_TAILQ_LPM, + rte_lpm_list)) == NULL) { rte_errno = E_RTE_NO_TAILQ; return NULL; } @@ -176,18 +182,27 @@ rte_lpm_create(const char *name, int socket_id, int max_rules, rte_rwlock_write_lock(RTE_EAL_TAILQ_RWLOCK); /* guarantee there's no existing */ - TAILQ_FOREACH(lpm, lpm_list, next) { + TAILQ_FOREACH(te, lpm_list, next) { + lpm = (struct rte_lpm *) te->data; if (strncmp(name, lpm->name, RTE_LPM_NAMESIZE) == 0) break; } - if (lpm != NULL) + if (te != NULL) goto exit; + /* allocate tailq entry */ + te = rte_zmalloc("LPM_TAILQ_ENTRY", sizeof(*te), 0); + if (te == NULL) { + RTE_LOG(ERR, LPM, "Failed to allocate tailq entry\n"); + goto exit; + } + /* Allocate memory to store the LPM data structures. */ lpm = (struct rte_lpm *)rte_zmalloc_socket(mem_name, mem_size, CACHE_LINE_SIZE, socket_id); if (lpm == NULL) { RTE_LOG(ERR, LPM, "LPM memory allocation failed\n"); + rte_free(te); goto exit; } @@ -195,7 +210,9 @@ rte_lpm_create(const char *name, int socket_id, int max_rules, lpm->max_rules = max_rules; rte_snprintf(lpm->name, sizeof(lpm->name), "%s", name); - TAILQ_INSERT_TAIL(lpm_list, lpm, next); + te->data = (void *) lpm; + + TAILQ_INSERT_TAIL(lpm_list, te, next); exit: rte_rwlock_write_unlock(RTE_EAL_TAILQ_RWLOCK); @@ -209,12 +226,38 @@ exit: void rte_lpm_free(struct rte_lpm *lpm) { + struct rte_lpm_list *lpm_list; + struct rte_tailq_entry *te; + /* Check user arguments. 
*/ if (lpm == NULL) return; - RTE_EAL_TAILQ_REMOVE(RTE_TAILQ_LPM, rte_lpm_list, lpm); + /* check that we have an initialised tail queue */ + if ((lpm_list = +RTE_TAILQ_LOOKUP_BY_IDX(RTE_TAILQ_LPM, rte_lpm_list)) == NULL) { + rte_errno = E_RTE_NO_TAILQ; + return; + } + + rte_rwlock_write_lock(RTE_EAL_TAILQ_RWLOCK); + + /* find our tailq entry */ + TAILQ_FOREACH(te, lpm_list, next) { + if (te->data == (void *) lpm) + break; + } + if (te == NULL) { + rte_rwlock_write_unlock(RTE_EAL_TAILQ_RWLOCK); + return; + } + + TAILQ_REMOVE(lpm_list, te, next); + + rte_rwlock_write_unlock(RTE_EAL_TAILQ_RWLOCK); + rte_free(lpm); + rte_free(te); } /* diff --git a/lib/librte_lpm/rte_lpm.h b/lib/librte_lpm/rte_lpm.h index d35565d..308f5ef 100644 --- a/lib/librte_lpm/rte_lpm.h +++ b/lib/librte_lpm/rte_lpm.h @@ -132,8 +132,6 @@ struct rte_lpm_rule_info { /** @internal LPM structure. */ struct rte_lpm
[dpdk-dev] [PATCH v3 8/9] rte_lpm6: make lpm6 tailq fully local
Signed-off-by: Anatoly Burakov --- lib/librte_lpm/rte_lpm6.c | 62 ++- 1 file changed, 51 insertions(+), 11 deletions(-) diff --git a/lib/librte_lpm/rte_lpm6.c b/lib/librte_lpm/rte_lpm6.c index 56c74a1..73b48d0 100644 --- a/lib/librte_lpm/rte_lpm6.c +++ b/lib/librte_lpm/rte_lpm6.c @@ -77,7 +77,7 @@ enum valid_flag { VALID }; -TAILQ_HEAD(rte_lpm6_list, rte_lpm6); +TAILQ_HEAD(rte_lpm6_list, rte_tailq_entry); /** Tbl entry structure. It is the same for both tbl24 and tbl8 */ struct rte_lpm6_tbl_entry { @@ -99,8 +99,6 @@ struct rte_lpm6_rule { /** LPM6 structure. */ struct rte_lpm6 { - TAILQ_ENTRY(rte_lpm6) next; /**< Next in list. */ - /* LPM metadata. */ char name[RTE_LPM6_NAMESIZE];/**< Name of the lpm. */ uint32_t max_rules; /**< Max number of rules. */ @@ -149,6 +147,7 @@ rte_lpm6_create(const char *name, int socket_id, { char mem_name[RTE_LPM6_NAMESIZE]; struct rte_lpm6 *lpm = NULL; + struct rte_tailq_entry *te; uint64_t mem_size, rules_size; struct rte_lpm6_list *lpm_list; @@ -179,12 +178,20 @@ rte_lpm6_create(const char *name, int socket_id, rte_rwlock_write_lock(RTE_EAL_TAILQ_RWLOCK); /* Guarantee there's no existing */ - TAILQ_FOREACH(lpm, lpm_list, next) { + TAILQ_FOREACH(te, lpm_list, next) { + lpm = (struct rte_lpm6 *) te->data; if (strncmp(name, lpm->name, RTE_LPM6_NAMESIZE) == 0) break; } - if (lpm != NULL) + if (te != NULL) + goto exit; + + /* allocate tailq entry */ + te = rte_zmalloc("LPM6_TAILQ_ENTRY", sizeof(*te), 0); + if (te == NULL) { + RTE_LOG(ERR, LPM, "Failed to allocate tailq entry!\n"); goto exit; + } /* Allocate memory to store the LPM data structures. */ lpm = (struct rte_lpm6 *)rte_zmalloc_socket(mem_name, (size_t)mem_size, @@ -192,6 +199,7 @@ rte_lpm6_create(const char *name, int socket_id, if (lpm == NULL) { RTE_LOG(ERR, LPM, "LPM memory allocation failed\n"); + rte_free(te); goto exit; } @@ -201,6 +209,7 @@ rte_lpm6_create(const char *name, int socket_id, if (lpm->rules_tbl == NULL) { RTE_LOG(ERR, LPM, "LPM memory allocation failed\n"); rte_free(lpm); + rte_free(te); goto exit; } @@ -209,7 +218,9 @@ rte_lpm6_create(const char *name, int socket_id, lpm->number_tbl8s = config->number_tbl8s; rte_snprintf(lpm->name, sizeof(lpm->name), "%s", name); - TAILQ_INSERT_TAIL(lpm_list, lpm, next); + te->data = (void *) lpm; + + TAILQ_INSERT_TAIL(lpm_list, te, next); exit: rte_rwlock_write_unlock(RTE_EAL_TAILQ_RWLOCK); @@ -223,7 +234,8 @@ exit: struct rte_lpm6 * rte_lpm6_find_existing(const char *name) { - struct rte_lpm6 *l; + struct rte_lpm6 *l = NULL; + struct rte_tailq_entry *te; struct rte_lpm6_list *lpm_list; /* Check that we have an initialised tail queue */ @@ -234,14 +246,17 @@ rte_lpm6_find_existing(const char *name) } rte_rwlock_read_lock(RTE_EAL_TAILQ_RWLOCK); - TAILQ_FOREACH(l, lpm_list, next) { + TAILQ_FOREACH(te, lpm_list, next) { + l = (struct rte_lpm6 *) te->data; if (strncmp(name, l->name, RTE_LPM6_NAMESIZE) == 0) break; } rte_rwlock_read_unlock(RTE_EAL_TAILQ_RWLOCK); - if (l == NULL) + if (te == NULL) { rte_errno = ENOENT; + return NULL; + } return l; } @@ -252,13 +267,38 @@ rte_lpm6_find_existing(const char *name) void rte_lpm6_free(struct rte_lpm6 *lpm) { + struct rte_lpm6_list *lpm_list; + struct rte_tailq_entry *te; + /* Check user arguments. 
*/ if (lpm == NULL) return; - RTE_EAL_TAILQ_REMOVE(RTE_TAILQ_LPM6, rte_lpm6_list, lpm); - rte_free(lpm->rules_tbl); + /* check that we have an initialised tail queue */ + if ((lpm_list = +RTE_TAILQ_LOOKUP_BY_IDX(RTE_TAILQ_LPM, rte_lpm6_list)) == NULL) { + rte_errno = E_RTE_NO_TAILQ; + return; + } + + rte_rwlock_write_lock(RTE_EAL_TAILQ_RWLOCK); + + /* find our tailq entry */ + TAILQ_FOREACH(te, lpm_list, next) { + if (te->data == (void *) lpm) + break; + } + if (te == NULL) { + rte_rwlock_write_unlock(RTE_EAL_TAILQ_RWLOCK); + return; + } + + TAILQ_REMOVE(lpm_list, te, next); + + rte_rwlock_write_unlock(RTE_EAL_TAILQ_RWLOCK); + rte_free(lpm); + rte_free(te); } /* -- 1.8.1.4
[dpdk-dev] [PATCH v3 3/9] rte_ring: make ring tailq fully local
Signed-off-by: Anatoly Burakov --- lib/librte_eal/linuxapp/eal/eal_ivshmem.c | 17 ++-- lib/librte_ring/Makefile | 4 ++-- lib/librte_ring/rte_ring.c| 33 +++ lib/librte_ring/rte_ring.h| 2 -- 4 files changed, 42 insertions(+), 14 deletions(-) diff --git a/lib/librte_eal/linuxapp/eal/eal_ivshmem.c b/lib/librte_eal/linuxapp/eal/eal_ivshmem.c index 4ad76a7..fa5f4e3 100644 --- a/lib/librte_eal/linuxapp/eal/eal_ivshmem.c +++ b/lib/librte_eal/linuxapp/eal/eal_ivshmem.c @@ -50,6 +50,7 @@ #include #include #include +#include #include #include #include @@ -101,7 +102,7 @@ static int memseg_idx; static int pagesz; /* Tailq heads to add rings to */ -TAILQ_HEAD(rte_ring_list, rte_ring); +TAILQ_HEAD(rte_ring_list, rte_tailq_entry); /* * Utility functions @@ -754,6 +755,7 @@ rte_eal_ivshmem_obj_init(void) struct ivshmem_segment * seg; struct rte_memzone * mz; struct rte_ring * r; + struct rte_tailq_entry *te; unsigned i, ms, idx; uint64_t offset; @@ -808,6 +810,8 @@ rte_eal_ivshmem_obj_init(void) mcfg->memzone_idx++; } + rte_rwlock_write_lock(RTE_EAL_TAILQ_RWLOCK); + /* find rings */ for (i = 0; i < mcfg->memzone_idx; i++) { mz = &mcfg->memzone[i]; @@ -819,10 +823,19 @@ rte_eal_ivshmem_obj_init(void) r = (struct rte_ring*) (mz->addr_64); - TAILQ_INSERT_TAIL(ring_list, r, next); + te = rte_zmalloc("RING_TAILQ_ENTRY", sizeof(*te), 0); + if (te == NULL) { + RTE_LOG(ERR, EAL, "Cannot allocate ring tailq entry!\n"); + return -1; + } + + te->data = (void *) r; + + TAILQ_INSERT_TAIL(ring_list, te, next); RTE_LOG(DEBUG, EAL, "Found ring: '%s' at %p\n", r->name, mz->addr); } + rte_rwlock_write_unlock(RTE_EAL_TAILQ_RWLOCK); #ifdef RTE_LIBRTE_IVSHMEM_DEBUG rte_memzone_dump(stdout); diff --git a/lib/librte_ring/Makefile b/lib/librte_ring/Makefile index 550507d..2380a43 100644 --- a/lib/librte_ring/Makefile +++ b/lib/librte_ring/Makefile @@ -42,7 +42,7 @@ SRCS-$(CONFIG_RTE_LIBRTE_RING) := rte_ring.c # install includes SYMLINK-$(CONFIG_RTE_LIBRTE_RING)-include := rte_ring.h -# this lib needs eal -DEPDIRS-$(CONFIG_RTE_LIBRTE_RING) += lib/librte_eal +# this lib needs eal and rte_malloc +DEPDIRS-$(CONFIG_RTE_LIBRTE_RING) += lib/librte_eal lib/librte_malloc include $(RTE_SDK)/mk/rte.lib.mk diff --git a/lib/librte_ring/rte_ring.c b/lib/librte_ring/rte_ring.c index 2fe4024..d2ff3fe 100644 --- a/lib/librte_ring/rte_ring.c +++ b/lib/librte_ring/rte_ring.c @@ -75,6 +75,7 @@ #include #include #include +#include #include #include #include @@ -89,7 +90,7 @@ #include "rte_ring.h" -TAILQ_HEAD(rte_ring_list, rte_ring); +TAILQ_HEAD(rte_ring_list, rte_tailq_entry); /* true if x is a power of 2 */ #define POWEROF2(x) x)-1) & (x)) == 0) @@ -155,6 +156,7 @@ rte_ring_create(const char *name, unsigned count, int socket_id, { char mz_name[RTE_MEMZONE_NAMESIZE]; struct rte_ring *r; + struct rte_tailq_entry *te; const struct rte_memzone *mz; ssize_t ring_size; int mz_flags = 0; @@ -173,6 +175,13 @@ rte_ring_create(const char *name, unsigned count, int socket_id, return NULL; } + te = rte_zmalloc("RING_TAILQ_ENTRY", sizeof(*te), 0); + if (te == NULL) { + RTE_LOG(ERR, RING, "Cannot reserve memory for tailq\n"); + rte_errno = ENOMEM; + return NULL; + } + rte_snprintf(mz_name, sizeof(mz_name), "%s%s", RTE_RING_MZ_PREFIX, name); rte_rwlock_write_lock(RTE_EAL_TAILQ_RWLOCK); @@ -186,10 +195,14 @@ rte_ring_create(const char *name, unsigned count, int socket_id, /* no need to check return value here, we already checked the * arguments above */ rte_ring_init(r, name, count, flags); - TAILQ_INSERT_TAIL(ring_list, r, next); + + te->data = (void *) r; + + 
TAILQ_INSERT_TAIL(ring_list, te, next); } else { r = NULL; RTE_LOG(ERR, RING, "Cannot reserve memory\n"); + rte_free(te); } rte_rwlock_write_unlock(RTE_EAL_TAILQ_RWLOCK); @@ -272,7 +285,7 @@ rte_ring_dump(FILE *f, const struct rte_ring *r) void rte_ring_list_dump(FILE *f) { - const struct rte_ring *mp; + const struct rte_tailq_entry *te; struct rte_ring_list *ring_list; /* check that we have an initialised tail queue */ @@ -284,8 +297,8 @@ rte_ring_list_dump(FILE *f) rte_rwlock_read_lock(RTE_EAL_TAILQ_RWLOCK); - TAILQ_FOREACH(mp, ring_list, next) { - rte_ring_dump(f, mp); + TAILQ_FOREACH(te, ring_list, next)
[dpdk-dev] [PATCH v3 6/9] rte_mempool: make mempool tailq fully local
Signed-off-by: Anatoly Burakov --- lib/librte_mempool/Makefile | 3 ++- lib/librte_mempool/rte_mempool.c | 37 - lib/librte_mempool/rte_mempool.h | 2 -- 3 files changed, 30 insertions(+), 12 deletions(-) diff --git a/lib/librte_mempool/Makefile b/lib/librte_mempool/Makefile index c79b306..9939e10 100644 --- a/lib/librte_mempool/Makefile +++ b/lib/librte_mempool/Makefile @@ -44,7 +44,8 @@ endif # install includes SYMLINK-$(CONFIG_RTE_LIBRTE_MEMPOOL)-include := rte_mempool.h -# this lib needs eal +# this lib needs eal, rte_ring and rte_malloc DEPDIRS-$(CONFIG_RTE_LIBRTE_MEMPOOL) += lib/librte_eal lib/librte_ring +DEPDIRS-$(CONFIG_RTE_LIBRTE_MEMPOOL) += lib/librte_malloc include $(RTE_SDK)/mk/rte.lib.mk diff --git a/lib/librte_mempool/rte_mempool.c b/lib/librte_mempool/rte_mempool.c index 7eebf7f..736e854 100644 --- a/lib/librte_mempool/rte_mempool.c +++ b/lib/librte_mempool/rte_mempool.c @@ -45,6 +45,7 @@ #include #include #include +#include #include #include #include @@ -60,7 +61,7 @@ #include "rte_mempool.h" -TAILQ_HEAD(rte_mempool_list, rte_mempool); +TAILQ_HEAD(rte_mempool_list, rte_tailq_entry); #define CACHE_FLUSHTHRESH_MULTIPLIER 1.5 @@ -404,6 +405,7 @@ rte_mempool_xmem_create(const char *name, unsigned n, unsigned elt_size, char mz_name[RTE_MEMZONE_NAMESIZE]; char rg_name[RTE_RING_NAMESIZE]; struct rte_mempool *mp = NULL; + struct rte_tailq_entry *te; struct rte_ring *r; const struct rte_memzone *mz; size_t mempool_size; @@ -501,6 +503,13 @@ rte_mempool_xmem_create(const char *name, unsigned n, unsigned elt_size, } } + /* try to allocate tailq entry */ + te = rte_zmalloc("MEMPOOL_TAILQ_ENTRY", sizeof(*te), 0); + if (te == NULL) { + RTE_LOG(ERR, MEMPOOL, "Cannot allocate tailq entry!\n"); + goto exit; + } + /* * If user provided an external memory buffer, then use it to * store mempool objects. 
Otherwise reserve memzone big enough to @@ -527,8 +536,10 @@ rte_mempool_xmem_create(const char *name, unsigned n, unsigned elt_size, * no more memory: in this case we loose previously reserved * space for the as we cannot free it */ - if (mz == NULL) + if (mz == NULL) { + rte_free(te); goto exit; + } if (rte_eal_has_hugepages()) { startaddr = (void*)mz->addr; @@ -587,7 +598,9 @@ rte_mempool_xmem_create(const char *name, unsigned n, unsigned elt_size, mempool_populate(mp, n, 1, obj_init, obj_init_arg); - RTE_EAL_TAILQ_INSERT_TAIL(RTE_TAILQ_MEMPOOL, rte_mempool_list, mp); + te->data = (void *) mp; + + RTE_EAL_TAILQ_INSERT_TAIL(RTE_TAILQ_MEMPOOL, rte_mempool_list, te); exit: rte_rwlock_write_unlock(RTE_EAL_MEMPOOL_RWLOCK); @@ -812,6 +825,7 @@ void rte_mempool_list_dump(FILE *f) { const struct rte_mempool *mp = NULL; + struct rte_tailq_entry *te; struct rte_mempool_list *mempool_list; if ((mempool_list = @@ -822,7 +836,8 @@ rte_mempool_list_dump(FILE *f) rte_rwlock_read_lock(RTE_EAL_MEMPOOL_RWLOCK); - TAILQ_FOREACH(mp, mempool_list, next) { + TAILQ_FOREACH(te, mempool_list, next) { + mp = (struct rte_mempool *) te->data; rte_mempool_dump(f, mp); } @@ -834,6 +849,7 @@ struct rte_mempool * rte_mempool_lookup(const char *name) { struct rte_mempool *mp = NULL; + struct rte_tailq_entry *te; struct rte_mempool_list *mempool_list; if ((mempool_list = @@ -844,15 +860,18 @@ rte_mempool_lookup(const char *name) rte_rwlock_read_lock(RTE_EAL_MEMPOOL_RWLOCK); - TAILQ_FOREACH(mp, mempool_list, next) { + TAILQ_FOREACH(te, mempool_list, next) { + mp = (struct rte_mempool *) te->data; if (strncmp(name, mp->name, RTE_MEMPOOL_NAMESIZE) == 0) break; } rte_rwlock_read_unlock(RTE_EAL_MEMPOOL_RWLOCK); - if (mp == NULL) + if (te == NULL) { rte_errno = ENOENT; + return NULL; + } return mp; } @@ -860,7 +879,7 @@ rte_mempool_lookup(const char *name) void rte_mempool_walk(void (*func)(const struct rte_mempool *, void *), void *arg) { - struct rte_mempool *mp = NULL; + struct rte_tailq_entry *te = NULL; struct rte_mempool_list *mempool_list; if ((mempool_list = @@ -871,8 +890,8 @@ void rte_mempool_walk(void (*func)(const struct rte_mempool *, void *), rte_rwlock_read_lock(RTE_EAL_MEMPOOL_RWLOCK); - TAILQ_FOREACH(mp, mempool_list, next) { - (*func)(mp, arg); + TAILQ_FOREACH(te, mempool_list, next) { + (*func)((struct rte_mempool *) te->data, arg); } rte_rwlock_read_unlock(RTE_EAL_MEMPOO
[dpdk-dev] [PATCH v3 2/9] rte_tailq: change rte_dummy to rte_tailq_entry, add data pointer
Signed-off-by: Anatoly Burakov --- app/test/test_tailq.c | 33 --- lib/librte_eal/common/eal_common_tailqs.c | 2 +- lib/librte_eal/common/include/rte_tailq.h | 9 + 3 files changed, 23 insertions(+), 21 deletions(-) diff --git a/app/test/test_tailq.c b/app/test/test_tailq.c index 67da009..c9b53ee 100644 --- a/app/test/test_tailq.c +++ b/app/test/test_tailq.c @@ -52,16 +52,16 @@ #define DEFAULT_TAILQ (RTE_TAILQ_NUM) -static struct rte_dummy d_elem; +static struct rte_tailq_entry d_elem; static int test_tailq_create(void) { - struct rte_dummy_head *d_head; + struct rte_tailq_entry_head *d_head; unsigned i; /* create a first tailq and check its non-null */ - d_head = RTE_TAILQ_RESERVE_BY_IDX(DEFAULT_TAILQ, rte_dummy_head); + d_head = RTE_TAILQ_RESERVE_BY_IDX(DEFAULT_TAILQ, rte_tailq_entry_head); if (d_head == NULL) do_return("Error allocating dummy_q0\n"); @@ -70,13 +70,14 @@ test_tailq_create(void) TAILQ_INSERT_TAIL(d_head, &d_elem, next); /* try allocating dummy_q0 again, and check for failure */ - if (RTE_TAILQ_RESERVE_BY_IDX(DEFAULT_TAILQ, rte_dummy_head) == NULL) + if (RTE_TAILQ_RESERVE_BY_IDX(DEFAULT_TAILQ, rte_tailq_entry_head) == NULL) do_return("Error, non-null result returned when attemption to " "re-allocate a tailq\n"); /* now fill up the tailq slots available and check we get an error */ for (i = RTE_TAILQ_NUM; i < RTE_MAX_TAILQ; i++){ - if ((d_head = RTE_TAILQ_RESERVE_BY_IDX(i, rte_dummy_head)) == NULL) + if ((d_head = RTE_TAILQ_RESERVE_BY_IDX(i, + rte_tailq_entry_head)) == NULL) break; } @@ -91,10 +92,10 @@ static int test_tailq_lookup(void) { /* run successful test - check result is found */ - struct rte_dummy_head *d_head; - struct rte_dummy *d_ptr; + struct rte_tailq_entry_head *d_head; + struct rte_tailq_entry *d_ptr; - d_head = RTE_TAILQ_LOOKUP_BY_IDX(DEFAULT_TAILQ, rte_dummy_head); + d_head = RTE_TAILQ_LOOKUP_BY_IDX(DEFAULT_TAILQ, rte_tailq_entry_head); if (d_head == NULL) do_return("Error with tailq lookup\n"); @@ -104,7 +105,7 @@ test_tailq_lookup(void) "expected element not found\n"); /* now try a bad/error lookup */ - d_head = RTE_TAILQ_LOOKUP_BY_IDX(RTE_MAX_TAILQ, rte_dummy_head); + d_head = RTE_TAILQ_LOOKUP_BY_IDX(RTE_MAX_TAILQ, rte_tailq_entry_head); if (d_head != NULL) do_return("Error, lookup does not return NULL for bad tailq name\n"); @@ -115,7 +116,7 @@ test_tailq_lookup(void) static int test_tailq_deprecated(void) { - struct rte_dummy_head *d_head; + struct rte_tailq_entry_head *d_head; /* since TAILQ_RESERVE is not able to create new tailqs, * we should find an existing one (IOW, RTE_TAILQ_RESERVE behaves identical @@ -123,29 +124,29 @@ test_tailq_deprecated(void) * * PCI_RESOURCE_LIST tailq is guaranteed to * be present in any DPDK app. 
*/ - d_head = RTE_TAILQ_RESERVE("PCI_RESOURCE_LIST", rte_dummy_head); + d_head = RTE_TAILQ_RESERVE("PCI_RESOURCE_LIST", rte_tailq_entry_head); if (d_head == NULL) do_return("Error finding PCI_RESOURCE_LIST\n"); - d_head = RTE_TAILQ_LOOKUP("PCI_RESOURCE_LIST", rte_dummy_head); + d_head = RTE_TAILQ_LOOKUP("PCI_RESOURCE_LIST", rte_tailq_entry_head); if (d_head == NULL) do_return("Error finding PCI_RESOURCE_LIST\n"); /* try doing that with non-existent names */ - d_head = RTE_TAILQ_RESERVE("random name", rte_dummy_head); + d_head = RTE_TAILQ_RESERVE("random name", rte_tailq_entry_head); if (d_head != NULL) do_return("Non-existent tailq found!\n"); - d_head = RTE_TAILQ_LOOKUP("random name", rte_dummy_head); + d_head = RTE_TAILQ_LOOKUP("random name", rte_tailq_entry_head); if (d_head != NULL) do_return("Non-existent tailq found!\n"); /* try doing the same with NULL names */ - d_head = RTE_TAILQ_RESERVE(NULL, rte_dummy_head); + d_head = RTE_TAILQ_RESERVE(NULL, rte_tailq_entry_head); if (d_head != NULL) do_return("NULL tailq found!\n"); - d_head = RTE_TAILQ_LOOKUP(NULL, rte_dummy_head); + d_head = RTE_TAILQ_LOOKUP(NULL, rte_tailq_entry_head); if (d_head != NULL) do_return("NULL tailq found!\n"); diff --git a/lib/librte_eal/common/eal_common_tailqs.c b/lib/librte_eal/common/eal_common_tailqs.c index f294a58..db9a185 100644 --- a/lib/librte_eal/common/eal_common_tailqs.c +++ b/lib/librte_eal/common/eal_common_tailqs.c @@ -118,7 +118,7 @@ rte_dump_tailq(FILE *f) rte_rwlock_rea
[dpdk-dev] Testing memnic for VM to VM transfer
Hi, > Subject: ##freemail## RE: ##freemail## RE: [dpdk-dev] Testing memnic for VM > to VM transfer > > Hi, Hiroshi > >Do you mean I must use DPDK vSwitch in host when I use MEMNIC PMD in > guest VM? actually, I just want a channel which can put the data from host > to guest quickly. Do you have any idea that how to write a host application > to put the data to guest memnic PMD? Yes, basically I made the MEMNIC interface work with DPDK vSwitch. By the way, you can mmap() the shm which specified as the ivshmem and put the proper data to send a packet to guest PMD. I don't have time to make proper code, but can advise you; please see common/memnic.h and the memory layout. 1) Set magic and version in header on host. 2) Initialize PMD on guest. 3) Check the reset is 1 and set valid to 1, reset to 0 on host. 4) Use uplink area the default block size 4K. Set len and fill ether frame data, then set the status to 2 on host. Guest PMD may receive the packet. Proceed to the next packet block. thanks, Hiroshi > > -Original Message- > From: Hiroshi Shimamoto [mailto:h-shimamoto at ct.jp.nec.com] > Sent: Wednesday, June 18, 2014 7:11 PM > To: GongJinrong; 'John Joyce (joycej)'; dev at dpdk.org > Subject: RE: ##freemail## RE: [dpdk-dev] Testing memnic for VM to VM > transfer > > Hi, > > > Subject: ##freemail## RE: [dpdk-dev] Testing memnic for VM to VM > > transfer > > > > Hi, Hiroshi > > > >I just start to learn DPDK and memnic, in memnic guide, you said > > "On host, the shared memory must be initialized by an application > > using memnic", I am not so clear that how to initialize the share > > memory in host, do you means use posix API or DPDK API to create the > > share memory?(it seems memnic guest side use rte_mbuf to transfer > > data), do you have any sample code to demo how to use memnic in host? > > I don't have simple MEMNIC sample to use it on host. > Could you please try DPDK vSwitch and enables MEMNIC vport? > DPDK vSwitch must handle packets between physical NIC port and MEMNIC vport > exposed to guest with dpdk.org memnic driver. > > thanks, > Hiroshi > > > > > -Original Message- > > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Hiroshi Shimamoto > > Sent: Wednesday, June 18, 2014 12:02 PM > > To: John Joyce (joycej); dev at dpdk.org > > Subject: Re: [dpdk-dev] Testing memnic for VM to VM transfer > > > > Hi, > > > > > Subject: [dpdk-dev] Testing memnic for VM to VM transfer > > > > > > Hi everyone: > > > We are interested in testing the performance of the memnic > > > driver > > posted at http://dpdk.org/browse/memnic/refs/. > > > We want to compare its performance compared to other techniques to > > > transfer packets between the guest and the kernel, predominately for > > > VM to > > VM transfers. > > > > > > We have downloaded the memnic components and have got it running in > > > a > > guest VM. > > > > > > The question we hope this group might be able to help with is what > > > would be the best way to processes the packets in the kernel to get > > > a VM > > to VM transfer. > > > > I think there is no kernel code work with MEMNIC. > > The recommend switching software on the host is Intel DPDK vSwitch > > hosted on 01.org and github. > > https://github.com/01org/dpdk-ovs/tree/development > > > > Intel DPDK vSwitch runs on userspace not kernel. > > > > I introduced this mechanism to DPDK vSwitch and the guest drivers are > > maintained in dpdk.org. > > > > thanks, > > Hiroshi > > > > > > > > A couple options might be possible > > > > > > > > > 1. 
Common shared buffer between two VMs. With some utility/code > to > > switch TX & RX rings between the two VMs. > > > > > > VM1 application --- memnic --- common shared memory buffer on the > > > host --- memnic --- VM2 application > > > > > > 2. Special purpose Kernel switching module > > > > > > VM1 application --- memnic --- shared memory VM1 --- Kernel > > > switching module --- shared memory VM2 --- memnic --- > > > VM2 application > > > > > > 3. Existing Kernel switching module > > > > > > VM1 application --- memnic --- shared memory VM1 --- existing > > > Kernel switching module (e.g. OVS/linux Bridge/VETh pair) > > > --- shared memory VM2 --- memnic --- VM2 application > > > > > > Can anyone recommend which approach might be best or easiest? We would > > like to avoid writing much (or any) kernel code > > > so if there are already any open source code or test utilities that > > > provide one of these options or would be a good starting point to > > > start > > from, a pointer would be much appreciated. > > > > > > Thanks in advance > > > > > > > > > John Joyce
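For readers trying to follow the four steps above, here is a minimal, self-contained sketch of what a host-side writer could look like. The structure layouts, field names, magic value and shm path are placeholders chosen purely for illustration -- the authoritative layout and constants are in common/memnic.h of the memnic repository.

/*
 * Illustrative host-side writer for the four steps above. All names and
 * constants here are stand-ins; see common/memnic.h for the real layout.
 */
#include <fcntl.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define BLOCK_SIZE	4096	/* default uplink block size per step 4 */
#define NUM_BLOCKS	64	/* illustrative ring length */
#define STATUS_FREE	0
#define STATUS_READY	2	/* "set the status to 2" from step 4 */

struct fake_memnic_header {	/* illustrative stand-in for the real header */
	uint32_t magic;
	uint32_t version;
	uint32_t valid;
	uint32_t reset;
};

struct fake_memnic_block {	/* illustrative packet block */
	uint32_t status;
	uint32_t len;
	uint8_t data[BLOCK_SIZE - 8];
};

int main(void)
{
	int fd = open("/dev/shm/ivshmem", O_RDWR);	/* path is an assumption */
	if (fd < 0)
		return 1;

	size_t sz = sizeof(struct fake_memnic_header) +
			(size_t)NUM_BLOCKS * sizeof(struct fake_memnic_block);
	uint8_t *base = mmap(NULL, sz, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (base == MAP_FAILED)
		return 1;

	struct fake_memnic_header *hdr = (void *)base;
	struct fake_memnic_block *up = (void *)(base + sizeof(*hdr));

	/* 1) magic and version so the guest PMD accepts the device */
	hdr->magic = 0x12345678;	/* placeholder value */
	hdr->version = 1;

	/* 3) once the guest PMD has initialised and raised 'reset' */
	if (hdr->reset == 1) {
		hdr->valid = 1;
		hdr->reset = 0;
	}

	/* 4) fill one uplink block with an Ethernet frame and publish it */
	static const uint8_t frame[60] = { 0xff, 0xff, 0xff, 0xff, 0xff, 0xff };
	if (up[0].status == STATUS_FREE) {
		memcpy(up[0].data, frame, sizeof(frame));
		up[0].len = sizeof(frame);
		up[0].status = STATUS_READY;	/* guest may now receive it */
	}

	munmap(base, sz);
	close(fd);
	return 0;
}

In a real writer the status store would also need a write barrier so the guest can never observe STATUS_READY before the frame data has landed in the block.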
[dpdk-dev] Testing memnic for VM to VM transfer
2014-06-18 11:42, Hiroshi Shimamoto: > 2014-06-18 19:26, GongJinrong: > > Do you have any idea that how to write a host application > > to put the data to guest memnic PMD? > > Yes, basically I made the MEMNIC interface work with DPDK vSwitch. > > By the way, you can mmap() the shm which specified as the ivshmem and put > the proper data to send a packet to guest PMD. > I don't have time to make proper code, but can advise you; > please see common/memnic.h and the memory layout. > 1) Set magic and version in header on host. > 2) Initialize PMD on guest. > 3) Check the reset is 1 and set valid to 1, reset to 0 on host. > 4) Use uplink area the default block size 4K. >Set len and fill ether frame data, then set the status to 2 on host. >Guest PMD may receive the packet. >Proceed to the next packet block. Such application should be integrated in memnic repository. I know Olivier wrote one which could be sent on next week. -- Thomas
[dpdk-dev] Testing memnic for VM to VM transfer
Hi, > Subject: Re: [dpdk-dev] Testing memnic for VM to VM transfer > > 2014-06-18 11:42, Hiroshi Shimamoto: > > 2014-06-18 19:26, GongJinrong: > > > Do you have any idea that how to write a host application > > > to put the data to guest memnic PMD? > > > > Yes, basically I made the MEMNIC interface work with DPDK vSwitch. > > > > By the way, you can mmap() the shm which specified as the ivshmem and put > > the proper data to send a packet to guest PMD. > > I don't have time to make proper code, but can advise you; > > please see common/memnic.h and the memory layout. > > 1) Set magic and version in header on host. > > 2) Initialize PMD on guest. > > 3) Check the reset is 1 and set valid to 1, reset to 0 on host. > > 4) Use uplink area the default block size 4K. > >Set len and fill ether frame data, then set the status to 2 on host. > >Guest PMD may receive the packet. > >Proceed to the next packet block. > > Such application should be integrated in memnic repository. > I know Olivier wrote one which could be sent on next week. yeah, I just begin to feel to need such a software in the repository. thanks, Hiroshi > > -- > Thomas
[dpdk-dev] Testing memnic for VM to VM transfer
Thanks guys, I will try. -Original Message- From: Hiroshi Shimamoto [mailto:h-shimam...@ct.jp.nec.com] Sent: Wednesday, June 18, 2014 8:06 PM To: Thomas Monjalon; GongJinrong Cc: dev at dpdk.org; 'John Joyce (joycej)'; Olivier MATZ Subject: RE: [dpdk-dev] Testing memnic for VM to VM transfer Hi, > Subject: Re: [dpdk-dev] Testing memnic for VM to VM transfer > > 2014-06-18 11:42, Hiroshi Shimamoto: > > 2014-06-18 19:26, GongJinrong: > > > Do you have any idea that how to write a host application to put > > > the data to guest memnic PMD? > > > > Yes, basically I made the MEMNIC interface work with DPDK vSwitch. > > > > By the way, you can mmap() the shm which specified as the ivshmem > > and put the proper data to send a packet to guest PMD. > > I don't have time to make proper code, but can advise you; please > > see common/memnic.h and the memory layout. > > 1) Set magic and version in header on host. > > 2) Initialize PMD on guest. > > 3) Check the reset is 1 and set valid to 1, reset to 0 on host. > > 4) Use uplink area the default block size 4K. > >Set len and fill ether frame data, then set the status to 2 on host. > >Guest PMD may receive the packet. > >Proceed to the next packet block. > > Such application should be integrated in memnic repository. > I know Olivier wrote one which could be sent on next week. yeah, I just begin to feel to need such a software in the repository. thanks, Hiroshi > > -- > Thomas
[dpdk-dev] [PATCH v2 2/2] vfio: more verbose error messages
Signed-off-by: Anatoly Burakov CC: Neil Horman --- lib/librte_eal/linuxapp/eal/eal_pci_vfio.c | 48 -- 1 file changed, 32 insertions(+), 16 deletions(-) diff --git a/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c b/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c index 9eb5dcd..bf765b5 100644 --- a/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c +++ b/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c @@ -180,7 +180,8 @@ pci_vfio_setup_dma_maps(int vfio_container_fd) ret = ioctl(vfio_container_fd, VFIO_SET_IOMMU, VFIO_TYPE1_IOMMU); if (ret) { - RTE_LOG(ERR, EAL, " cannot set IOMMU type!\n"); + RTE_LOG(ERR, EAL, " cannot set IOMMU type, " + "error %i (%s)\n", errno, strerror(errno)); return -1; } @@ -201,7 +202,8 @@ pci_vfio_setup_dma_maps(int vfio_container_fd) ret = ioctl(vfio_container_fd, VFIO_IOMMU_MAP_DMA, &dma_map); if (ret) { - RTE_LOG(ERR, EAL, " cannot set up DMA remapping!\n"); + RTE_LOG(ERR, EAL, " cannot set up DMA remapping, " + "error %i (%s)\n", errno, strerror(errno)); return -1; } } @@ -253,7 +255,8 @@ pci_vfio_setup_interrupts(struct rte_pci_device *dev, int vfio_dev_fd) ret = ioctl(vfio_dev_fd, VFIO_DEVICE_GET_IRQ_INFO, &irq); if (ret < 0) { - RTE_LOG(ERR, EAL, " cannot get IRQ info!\n"); + RTE_LOG(ERR, EAL, " cannot get IRQ info, " + "error %i (%s)\n", errno, strerror(errno)); return -1; } @@ -271,7 +274,8 @@ pci_vfio_setup_interrupts(struct rte_pci_device *dev, int vfio_dev_fd) /* set up an eventfd for interrupts */ fd = eventfd(0, 0); if (fd < 0) { - RTE_LOG(ERR, EAL, " cannot set up eventfd!\n"); + RTE_LOG(ERR, EAL, " cannot set up eventfd, " + "error %i (%s)\n", errno, strerror(errno)); return -1; } @@ -313,22 +317,31 @@ pci_vfio_get_container_fd(void) if (internal_config.process_type == RTE_PROC_PRIMARY) { vfio_container_fd = open(VFIO_CONTAINER_PATH, O_RDWR); if (vfio_container_fd < 0) { - RTE_LOG(ERR, EAL, " cannot open VFIO container!\n"); + RTE_LOG(ERR, EAL, " cannot open VFIO container, " + "error %i (%s)\n", errno, strerror(errno)); return -1; } /* check VFIO API version */ ret = ioctl(vfio_container_fd, VFIO_GET_API_VERSION); if (ret != VFIO_API_VERSION) { - RTE_LOG(ERR, EAL, " unknown VFIO API version!\n"); + if (ret < 0) + RTE_LOG(ERR, EAL, " could not get VFIO API version, " + "error %i (%s)\n", errno, strerror(errno)); + else + RTE_LOG(ERR, EAL, " unsupported VFIO API version!\n"); close(vfio_container_fd); return -1; } /* check if we support IOMMU type 1 */ ret = ioctl(vfio_container_fd, VFIO_CHECK_EXTENSION, VFIO_TYPE1_IOMMU); - if (!ret) { - RTE_LOG(ERR, EAL, " unknown IOMMU driver!\n"); + if (ret != 1) { + if (ret < 0) + RTE_LOG(ERR, EAL, " could not get IOMMU type, " + "error %i (%s)\n", errno, strerror(errno)); + else + RTE_LOG(ERR, EAL, " unsupported IOMMU type!\n"); close(vfio_container_fd); return -1; } @@ -564,7 +577,8 @@ pci_vfio_map_resource(struct rte_pci_device *dev) /* check if the group is viable */ ret = ioctl(vfio_group_fd, VFIO_GROUP_GET_STATUS, &group_status); if (ret) { - RTE_LOG(ERR, EAL, " %s cannot get group status!\n", pci_addr); + RTE_LOG(ERR, EAL, " %s cannot get group status, " + "error %i (%s)\n", pci_addr, errno, strerror(errno)); close(vfio_group_fd); clear_current_group(); return -1; @@ -587,8 +601,8 @@ pci_vfio_map_resource(struct rte_pci_device *dev) ret = ioctl(vfio_group_fd, VFIO_GROUP_SET_CONTAINER, &vfio_cfg.vfio_container_fd); if (ret) { - RTE_LOG(E
[dpdk-dev] [PATCH v2 1/2] vfio: open VFIO container at startup rather than during init
Currently, VFIO only checks for being able to access the /dev/vfio directory when initializing VFIO, deferring actual VFIO container initialization to VFIO binding code. This doesn't bode well for when VFIO container cannot be initialized for whatever reason, because it results in unrecoverable error even if the user didn't set up VFIO and didn't even want to use it in the first place. This patch fixes this by moving container initialization into the code that checks if VFIO is available at runtime. Therefore, any issues with the container will be known at initialization stage and VFIO will simply be turned off if container could not be set up. Signed-off-by: Anatoly Burakov Acked-by: Bruce Richardson --- lib/librte_eal/linuxapp/eal/eal_pci_vfio.c | 15 ++- 1 file changed, 2 insertions(+), 13 deletions(-) diff --git a/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c b/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c index 4de6061..9eb5dcd 100644 --- a/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c +++ b/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c @@ -523,17 +523,6 @@ pci_vfio_map_resource(struct rte_pci_device *dev) rte_snprintf(pci_addr, sizeof(pci_addr), PCI_PRI_FMT, loc->domain, loc->bus, loc->devid, loc->function); - /* get container fd (needs to be done only once per initialization) */ - if (vfio_cfg.vfio_container_fd == -1) { - int vfio_container_fd = pci_vfio_get_container_fd(); - if (vfio_container_fd < 0) { - RTE_LOG(ERR, EAL, " %s cannot open VFIO container!\n", pci_addr); - return -1; - } - - vfio_cfg.vfio_container_fd = vfio_container_fd; - } - /* get group number */ iommu_group_no = pci_vfio_get_group_no(pci_addr); @@ -770,10 +759,10 @@ pci_vfio_enable(void) vfio_cfg.vfio_groups[i].fd = -1; vfio_cfg.vfio_groups[i].group_no = -1; } - vfio_cfg.vfio_container_fd = -1; + vfio_cfg.vfio_container_fd = pci_vfio_get_container_fd(); /* check if we have VFIO driver enabled */ - if (access(VFIO_DIR, F_OK) == 0) + if (vfio_cfg.vfio_container_fd != -1) vfio_cfg.vfio_enabled = 1; else RTE_LOG(INFO, EAL, "VFIO driver not loaded or wrong permissions\n"); -- 1.8.1.4
[dpdk-dev] [PATCH v2 0/2] Fix issues with VFIO
This patchset fixes an issue with VFIO where DPDK initialization could fail even if the user didn't want to use VFIO in the first place. Also, more verbose and descriptive error messages were added to VFIO code, for example distinguishing between a failed ioctl() call and an unsupported VFIO API version. Anatoly Burakov (2): vfio: open VFIO container at startup rather than during init vfio: more verbose error messages lib/librte_eal/linuxapp/eal/eal_pci_vfio.c | 63 -- 1 file changed, 34 insertions(+), 29 deletions(-) -- 1.8.1.4
[dpdk-dev] [PATCH v2 0/2] Fix issues with VFIO
On Wed, Jun 18, 2014 at 02:07:17PM +0100, Anatoly Burakov wrote: > This patchset fixes an issue with VFIO where DPDK initialization could > fail even if the user didn't want to use VFIO in the first place. Also, > more verbose and descriptive error messages were added to VFIO code, for > example distinguishing between a failed ioctl() call and an unsupported > VFIO API version. > > Anatoly Burakov (2): > vfio: open VFIO container at startup rather than during init > vfio: more verbose error messages > You still need a changelog entry with each patch (the cover letter doesn't get included with the patches in git). You also probably want to mention that the second patch also fixes a case in which syscall errors are erroneously treated as a success case Neil > lib/librte_eal/linuxapp/eal/eal_pci_vfio.c | 63 > -- > 1 file changed, 34 insertions(+), 29 deletions(-) > > -- > 1.8.1.4 > >
[dpdk-dev] [PATCH v3 0/9] Make DPDK tailqs fully local
> This issue was reported by OVS-DPDK project, and the fix should go to > upstream DPDK. This is not memnic-related - this is to do with > DPDK's rte_ivshmem library. > > Every DPDK data structure has a corresponding TAILQ reserved for it in > the runtime config file. Those TAILQs are fully local to the process, > however most data structures contain pointers to next entry in the > TAILQ. > > Since the data structures such as rings are shared in their entirety, > those TAILQ pointers are shared as well. Meaning that, after a > successful rte_ring creation, the tailq_next pointer of the last > ring in the TAILQ will be updated with a pointer to a ring which may > not be present in the address space of another process (i.e. a ring > that may be host-local or guest-local, and not shared over IVSHMEM). > Any successive ring create/lookup on the other side of IVSHMEM will > result in trying to dereference an invalid pointer. > > This patchset fixes this problem by creating a default tailq entry > that may be used by any data structure that chooses to use TAILQs. > This default TAILQ entry will consist of a tailq_next/tailq_prev > pointers, and an opaque pointer to arbitrary data. All TAILQ > pointers from data structures themselves will be removed and > replaced by those generic TAILQ entries, thus fixing the problem > of potentially exposing local address space to shared structures. > > Technically, only rte_ring structure require modification, because > IVSHMEM is only using memzones (which aren't in TAILQs) and rings, > but for consistency's sake other TAILQ-based data structures were > adapted as well. > > v2 changes: > * fixed race conditions in *_free operations > * fixed multiprocess support for malloc heaps > * added similar changes for acl > * rebased on top of e88b42f818bc1a6d4ce6cb70371b66e37fa34f7d > > v3 changes: > * fixed race reported by Konstantin Ananyev (introduced in v2) > > Anatoly Burakov (9): > eal: map shared config into exact same address as primary process > rte_tailq: change rte_dummy to rte_tailq_entry, add data pointer > rte_ring: make ring tailq fully local > rte_hash: make rte_hash tailq fully local > rte_fbk_hash: make rte_fbk_hash tailq fully local > rte_mempool: make mempool tailq fully local > rte_lpm: make lpm tailq fully local > rte_lpm6: make lpm6 tailq fully local > rte_acl: make acl tailq fully local > > app/test/test_tailq.c | 33 +- > lib/librte_acl/acl.h | 1 - > lib/librte_acl/rte_acl.c | 74 > ++- > lib/librte_eal/common/eal_common_tailqs.c | 2 +- > lib/librte_eal/common/include/rte_eal_memconfig.h | 5 ++ > lib/librte_eal/common/include/rte_tailq.h | 9 +-- > lib/librte_eal/linuxapp/eal/eal.c | 44 -- > lib/librte_eal/linuxapp/eal/eal_ivshmem.c | 17 +- > lib/librte_hash/rte_fbk_hash.c| 73 +- > lib/librte_hash/rte_fbk_hash.h| 3 - > lib/librte_hash/rte_hash.c| 61 --- > lib/librte_hash/rte_hash.h| 2 - > lib/librte_lpm/rte_lpm.c | 65 > lib/librte_lpm/rte_lpm.h | 2 - > lib/librte_lpm/rte_lpm6.c | 62 +++ > lib/librte_mempool/Makefile | 3 +- > lib/librte_mempool/rte_mempool.c | 37 +--- > lib/librte_mempool/rte_mempool.h | 2 - > lib/librte_ring/Makefile | 4 +- > lib/librte_ring/rte_ring.c| 33 +++--- > lib/librte_ring/rte_ring.h| 2 - > 21 files changed, 415 insertions(+), 119 deletions(-) > > -- Acked-by: Konstantin Ananyev
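The mechanism described in the cover letter is easiest to see in a small self-contained model: the list entry carries only the linkage plus an opaque data pointer, so the shared object itself never stores process-local pointers. The names tailq_entry, shared_obj, obj_register and obj_lookup below are invented stand-ins for rte_tailq_entry and the per-library lookup code; only the shape of the technique follows the patches.

#include <stdlib.h>
#include <string.h>
#include <sys/queue.h>

struct tailq_entry {
	TAILQ_ENTRY(tailq_entry) next;	/* process-local linkage */
	void *data;			/* points at the shared object */
};

TAILQ_HEAD(obj_list, tailq_entry);

struct shared_obj {			/* e.g. a ring living in shared memory */
	char name[32];
	/* payload shared over IVSHMEM; no list pointers in here */
};

static int
obj_register(struct obj_list *list, struct shared_obj *obj)
{
	/* the entry lives in local memory, so its pointers never leak
	 * into the shared object */
	struct tailq_entry *te = calloc(1, sizeof(*te));

	if (te == NULL)
		return -1;
	te->data = obj;
	TAILQ_INSERT_TAIL(list, te, next);
	return 0;
}

static struct shared_obj *
obj_lookup(struct obj_list *list, const char *name)
{
	struct tailq_entry *te;
	struct shared_obj *obj;

	TAILQ_FOREACH(te, list, next) {
		obj = te->data;
		if (strncmp(name, obj->name, sizeof(obj->name)) == 0)
			return obj;	/* hand back the shared object */
	}
	return NULL;
}

int main(void)
{
	struct obj_list head = TAILQ_HEAD_INITIALIZER(head);
	static struct shared_obj ring0 = { .name = "ring0" };

	obj_register(&head, &ring0);
	return obj_lookup(&head, "ring0") == &ring0 ? 0 : 1;
}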
[dpdk-dev] [PATCH 01/10] ip_frag: rename RTE_IP_FRAG_ASSERT to IP_FRAG_ASSERT
Signed-off-by: Anatoly Burakov --- lib/librte_ip_frag/ip_frag_common.h | 4 ++-- lib/librte_ip_frag/rte_ipv4_fragmentation.c | 2 +- lib/librte_ip_frag/rte_ipv6_fragmentation.c | 2 +- 3 files changed, 4 insertions(+), 4 deletions(-) diff --git a/lib/librte_ip_frag/ip_frag_common.h b/lib/librte_ip_frag/ip_frag_common.h index ac5cd61..5ad0a0b 100644 --- a/lib/librte_ip_frag/ip_frag_common.h +++ b/lib/librte_ip_frag/ip_frag_common.h @@ -41,14 +41,14 @@ #defineIP_FRAG_LOG(lvl, fmt, args...) RTE_LOG(lvl, USER1, fmt, ##args) -#defineRTE_IP_FRAG_ASSERT(exp) \ +#defineIP_FRAG_ASSERT(exp) \ if (!(exp)){ \ rte_panic("function %s, line%d\tassert \"" #exp "\" failed\n", \ __func__, __LINE__);\ } #else #defineIP_FRAG_LOG(lvl, fmt, args...) do {} while(0) -#define RTE_IP_FRAG_ASSERT(exp)do { } while(0) +#define IP_FRAG_ASSERT(exp)do { } while(0) #endif /* IP_FRAG_DEBUG */ #define IPV4_KEYLEN 1 diff --git a/lib/librte_ip_frag/rte_ipv4_fragmentation.c b/lib/librte_ip_frag/rte_ipv4_fragmentation.c index 3ab665f..9d4e1f7 100644 --- a/lib/librte_ip_frag/rte_ipv4_fragmentation.c +++ b/lib/librte_ip_frag/rte_ipv4_fragmentation.c @@ -107,7 +107,7 @@ rte_ipv4_fragment_packet(struct rte_mbuf *pkt_in, frag_size = (uint16_t)(mtu_size - sizeof(struct ipv4_hdr)); /* Fragment size should be a multiply of 8. */ - RTE_IP_FRAG_ASSERT((frag_size & IPV4_HDR_FO_MASK) == 0); + IP_FRAG_ASSERT((frag_size & IPV4_HDR_FO_MASK) == 0); in_hdr = (struct ipv4_hdr *) pkt_in->pkt.data; flag_offset = rte_cpu_to_be_16(in_hdr->fragment_offset); diff --git a/lib/librte_ip_frag/rte_ipv6_fragmentation.c b/lib/librte_ip_frag/rte_ipv6_fragmentation.c index 6b660c4..fa04991 100644 --- a/lib/librte_ip_frag/rte_ipv6_fragmentation.c +++ b/lib/librte_ip_frag/rte_ipv6_fragmentation.c @@ -118,7 +118,7 @@ rte_ipv6_fragment_packet(struct rte_mbuf *pkt_in, frag_size = (uint16_t)(mtu_size - sizeof(struct ipv6_hdr)); /* Fragment size should be a multiple of 8. */ - RTE_IP_FRAG_ASSERT((frag_size & IPV6_HDR_FO_MASK) == 0); + IP_FRAG_ASSERT((frag_size & IPV6_HDR_FO_MASK) == 0); /* Check that pkts_out is big enough to hold all fragments */ if (unlikely (frag_size * nb_pkts_out < -- 1.8.1.4
[dpdk-dev] [PATCH 04/10] ip_frag: fix stats macro, rename rte_ip_frag_tbl_stat structure
This also makes ip_reassembly sample application statistics to obey the CONFIG_RTE_LIBRTE_IP_FRAG_FRAG_TBL_STATS config option Signed-off-by: Anatoly Burakov --- config/common_bsdapp | 1 + config/common_linuxapp | 1 + examples/ip_reassembly/main.c| 4 ++-- lib/librte_ip_frag/rte_ip_frag.h | 4 ++-- 4 files changed, 6 insertions(+), 4 deletions(-) diff --git a/config/common_bsdapp b/config/common_bsdapp index 989e1da..d5db4ab 100644 --- a/config/common_bsdapp +++ b/config/common_bsdapp @@ -294,6 +294,7 @@ CONFIG_RTE_LIBRTE_NET=y CONFIG_RTE_LIBRTE_IP_FRAG=y CONFIG_RTE_LIBRTE_IP_FRAG_DEBUG=n CONFIG_RTE_LIBRTE_IP_FRAG_MAX_FRAG=4 +CONFIG_RTE_LIBRTE_IP_FRAG_TBL_STAT=n # # Compile librte_meter diff --git a/config/common_linuxapp b/config/common_linuxapp index 5b896c3..5ee10c3 100644 --- a/config/common_linuxapp +++ b/config/common_linuxapp @@ -337,6 +337,7 @@ CONFIG_RTE_LIBRTE_NET=y CONFIG_RTE_LIBRTE_IP_FRAG=y CONFIG_RTE_LIBRTE_IP_FRAG_DEBUG=n CONFIG_RTE_LIBRTE_IP_FRAG_MAX_FRAG=4 +CONFIG_RTE_LIBRTE_IP_FRAG_TBL_STAT=n # # Compile librte_meter diff --git a/examples/ip_reassembly/main.c b/examples/ip_reassembly/main.c index 625d21f..7311b29 100644 --- a/examples/ip_reassembly/main.c +++ b/examples/ip_reassembly/main.c @@ -310,11 +310,11 @@ struct rte_lpm6_config lpm6_config = { static struct rte_lpm *socket_lpm[RTE_MAX_NUMA_NODES]; static struct rte_lpm6 *socket_lpm6[RTE_MAX_NUMA_NODES]; -#ifdef IPV6_FRAG_TBL_STAT +#ifdef RTE_LIBRTE_IP_FRAG_TBL_STAT #define TX_LCORE_STAT_UPDATE(s, f, v) ((s)->f += (v)) #else #define TX_LCORE_STAT_UPDATE(s, f, v) do {} while (0) -#endif /* IPV6_FRAG_TBL_STAT */ +#endif /* RTE_LIBRTE_IP_FRAG_TBL_STAT */ /* * If number of queued packets reached given threahold, then diff --git a/lib/librte_ip_frag/rte_ip_frag.h b/lib/librte_ip_frag/rte_ip_frag.h index 582a52b..84952a1 100644 --- a/lib/librte_ip_frag/rte_ip_frag.h +++ b/lib/librte_ip_frag/rte_ip_frag.h @@ -97,7 +97,7 @@ struct rte_ip_frag_death_row { TAILQ_HEAD(ip_pkt_list, ip_frag_pkt); /**< @internal fragments tailq */ /** fragmentation table statistics */ -struct rte_ip_frag_tbl_stat { +struct ip_frag_tbl_stat { uint64_t find_num; /**< total # of find/insert attempts. */ uint64_t add_num; /**< # of add ops. */ uint64_t del_num; /**< # of del ops. */ @@ -117,7 +117,7 @@ struct rte_ip_frag_tbl { uint32_t nb_buckets; /**< num of associativity lines. */ struct ip_frag_pkt *last; /**< last used entry. */ struct ip_pkt_list lru; /**< LRU list for table entries. */ - struct rte_ip_frag_tbl_stat stat; /**< statistics counters. */ + struct ip_frag_tbl_stat stat; /**< statistics counters. */ struct ip_frag_pkt pkt[0];/**< hash table. */ }; -- 1.8.1.4
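For context, the renamed option is what decides whether the sample application's statistics macro compiles to real counter updates or to nothing. A minimal sketch of that pattern follows; the counter struct is illustrative, only the macro shape matches the diff above.

#include <stdint.h>

#ifdef RTE_LIBRTE_IP_FRAG_TBL_STAT
#define TX_LCORE_STAT_UPDATE(s, f, v)	((s)->f += (v))
#else
#define TX_LCORE_STAT_UPDATE(s, f, v)	do {} while (0)
#endif

struct tx_lcore_stat {		/* illustrative counter block */
	uint64_t send;
	uint64_t drop;
};

static inline void
account_send(struct tx_lcore_stat *st, uint32_t nb_sent, uint32_t nb_drop)
{
	/* both updates vanish entirely when the config option is off */
	TX_LCORE_STAT_UPDATE(st, send, nb_sent);
	TX_LCORE_STAT_UPDATE(st, drop, nb_drop);
}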
[dpdk-dev] [PATCH 06/10] ip_frag: replace memmove with custom copying
Some implementations of memmove may make a copy of src before writing to dst. We avoid that by explicitly copying from src to dst backwards. Signed-off-by: Anatoly Burakov --- lib/librte_ip_frag/rte_ipv6_reassembly.c | 12 +++- 1 file changed, 11 insertions(+), 1 deletion(-) diff --git a/lib/librte_ip_frag/rte_ipv6_reassembly.c b/lib/librte_ip_frag/rte_ipv6_reassembly.c index c622827..3f06960 100644 --- a/lib/librte_ip_frag/rte_ipv6_reassembly.c +++ b/lib/librte_ip_frag/rte_ipv6_reassembly.c @@ -45,6 +45,16 @@ * */ +static inline void +ip_frag_memmove(char *dst, char *src, int len) +{ + int i; + + /* go backwards to make sure we don't overwrite anything important */ + for (i = len - 1; i >= 0; i--) + dst[i] = src[i]; +} + /* * Reassemble fragments into one packet. */ @@ -115,7 +125,7 @@ ipv6_frag_reassemble(const struct ip_frag_pkt *fp) frag_hdr = (struct ipv6_extension_fragment *) (ip_hdr + 1); ip_hdr->proto = frag_hdr->next_header; - memmove(rte_pktmbuf_mtod(m, char*) + sizeof(*frag_hdr), + ip_frag_memmove(rte_pktmbuf_mtod(m, char*) + sizeof(*frag_hdr), rte_pktmbuf_mtod(m, char*), move_len); rte_pktmbuf_adj(m, sizeof(*frag_hdr)); -- 1.8.1.4
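A stand-alone demonstration (not DPDK code) of why the backward loop is safe at this call site: dst sits sizeof(*frag_hdr) bytes above src, the regions overlap, and copying from the last byte down never reads a byte that has already been overwritten.

#include <stdio.h>

static void
copy_backwards(char *dst, const char *src, int len)
{
	int i;

	for (i = len - 1; i >= 0; i--)
		dst[i] = src[i];
}

int main(void)
{
	char buf[16] = "abcdefgh";	/* pretend "abcdefgh" is the L2 header */

	/* shift the 8 header bytes 4 bytes up: source and destination overlap */
	copy_backwards(buf + 4, buf, 8);
	printf("%.12s\n", buf);		/* prints "abcdabcdefgh" */
	return 0;
}

A naive forward byte-by-byte copy would clobber buf[4..7] before reading them; memmove is also required to get overlap right, the commit simply stops relying on how a particular implementation achieves that.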
[dpdk-dev] [PATCH 08/10] ip_fragmentation: small fixes
Adding check for non-existent ports in portmask. Also, making everything NUMA-related depend on lcore sockets, not device sockets. This is because the init_mem() function allocates all data structures based on NUMA nodes of the lcores in the coremask. Therefore, when no cores are on socket 0, but there are devices on socket 0, it may lead to segmentation faults. Signed-off-by: Anatoly Burakov --- examples/ip_fragmentation/main.c | 6 +- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/examples/ip_fragmentation/main.c b/examples/ip_fragmentation/main.c index 02e40a1..3172ad5 100644 --- a/examples/ip_fragmentation/main.c +++ b/examples/ip_fragmentation/main.c @@ -886,6 +886,10 @@ MAIN(int argc, char **argv) if (init_mem() < 0) rte_panic("Cannot initialize memory structures!\n"); + /* check if portmask has non-existent ports */ + if (enabled_port_mask & ~(RTE_LEN2MASK(nb_ports, unsigned))) + rte_exit(EXIT_FAILURE, "Non-existent ports in portmask!\n"); + /* initialize all ports */ for (portid = 0; portid < nb_ports; portid++) { /* skip ports that are not enabled */ @@ -907,7 +911,7 @@ MAIN(int argc, char **argv) qconf = &lcore_queue_conf[rx_lcore_id]; } - socket = rte_eth_dev_socket_id(portid); + socket = (int) rte_lcore_to_socket_id(rx_lcore_id); if (socket == SOCKET_ID_ANY) socket = 0; -- 1.8.1.4
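A toy model of the failure mode the message describes: per-socket tables are only created for sockets that host an enabled lcore, so indexing them by the device's socket can land on an empty slot. Every value below is made up for illustration.

#include <stdio.h>

#define MAX_SOCKETS 2

static void *socket_tbl[MAX_SOCKETS];	/* filled per lcore socket only */

int main(void)
{
	int lcore_socket = 1;	/* all enabled lcores sit on socket 1 */
	int dev_socket = 0;	/* the NIC happens to sit on socket 0 */
	static int table;

	socket_tbl[lcore_socket] = &table;	/* what init_mem() would do */

	/* indexing by the device's socket finds nothing to work with... */
	printf("by device socket: %p\n", socket_tbl[dev_socket]);
	/* ...while the polling lcore's socket is always initialised */
	printf("by lcore socket:  %p\n", socket_tbl[lcore_socket]);
	return 0;
}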
[dpdk-dev] [PATCH 10/10] rte_ip_frag: API header file fix
Signed-off-by: Anatoly Burakov --- lib/librte_ip_frag/rte_ip_frag.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/lib/librte_ip_frag/rte_ip_frag.h b/lib/librte_ip_frag/rte_ip_frag.h index 84952a1..e0936dc 100644 --- a/lib/librte_ip_frag/rte_ip_frag.h +++ b/lib/librte_ip_frag/rte_ip_frag.h @@ -36,9 +36,9 @@ /** * @file - * RTE IPv4 Fragmentation and Reassembly + * RTE IP Fragmentation and Reassembly * - * Implementation of IPv4 packet fragmentation and reassembly. + * Implementation of IP packet fragmentation and reassembly. */ #include -- 1.8.1.4
[dpdk-dev] [PATCH 02/10] ip_frag: fix debug macros
Signed-off-by: Anatoly Burakov --- lib/librte_ip_frag/rte_ipv4_reassembly.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/lib/librte_ip_frag/rte_ipv4_reassembly.c b/lib/librte_ip_frag/rte_ipv4_reassembly.c index cbac413..c14c677 100644 --- a/lib/librte_ip_frag/rte_ipv4_reassembly.c +++ b/lib/librte_ip_frag/rte_ipv4_reassembly.c @@ -145,7 +145,7 @@ rte_ipv4_frag_reassemble_packet(struct rte_ip_frag_tbl *tbl, "tbl: %p, max_cycles: %" PRIu64 ", entry_mask: %#x, " "max_entries: %u, use_entries: %u\n\n", __func__, __LINE__, - mb, tms, key.src_dst, key.id, ip_ofs, ip_len, ip_flag, + mb, tms, key.src_dst[0], key.id, ip_ofs, ip_len, ip_flag, tbl, tbl->max_cycles, tbl->entry_mask, tbl->max_entries, tbl->use_entries); @@ -161,7 +161,7 @@ rte_ipv4_frag_reassemble_packet(struct rte_ip_frag_tbl *tbl, ", total_size: %u, frag_size: %u, last_idx: %u\n\n", __func__, __LINE__, tbl, tbl->max_entries, tbl->use_entries, - fp, fp->key.src_dst, fp->key.id, fp->start, + fp, fp->key.src_dst[0], fp->key.id, fp->start, fp->total_size, fp->frag_size, fp->last_idx); @@ -176,7 +176,7 @@ rte_ipv4_frag_reassemble_packet(struct rte_ip_frag_tbl *tbl, ", total_size: %u, frag_size: %u, last_idx: %u\n\n", __func__, __LINE__, mb, tbl, tbl->max_entries, tbl->use_entries, - fp, fp->key.src_dst, fp->key.id, fp->start, + fp, fp->key.src_dst[0], fp->key.id, fp->start, fp->total_size, fp->frag_size, fp->last_idx); return (mb); -- 1.8.1.4
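The change follows from src_dst being an array in the shared IPv4/IPv6 key: handing the bare array to a "%" PRIx64 conversion passes a pointer rather than a 64-bit value, which is undefined behaviour. A minimal illustration in plain C (not DPDK code):

#include <inttypes.h>
#include <stdio.h>

int main(void)
{
	uint64_t src_dst[4] = { 0x1122334455667788ULL };

	/* correct: the value of the first key word */
	printf("key: %" PRIx64 "\n", src_dst[0]);

	/* wrong (what the old format arguments amounted to once src_dst
	 * became an array): the array decays to a pointer, mismatching
	 * the conversion specifier:
	 *
	 * printf("key: %" PRIx64 "\n", src_dst);
	 */
	return 0;
}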
[dpdk-dev] [PATCH 07/10] ip_frag: fix order of arguments to key compare function
When using the key compare function, the key length of the first argument determines how many key words are compared. However, we are currently passing a key from the fragmentation table as the first argument. The problem is that this key is potentially uninitialized (i.e. contains all zeroes, including the key length), which leads to a nasty bug of comparing only the key ids and not the keys themselves. A safer way would be to take RTE_MAX of the two key lengths, but since this compare is done per packet, every cycle counts, so we simply use the key whose length is guaranteed to be correct because it comes from an actual packet. Signed-off-by: Anatoly Burakov --- lib/librte_ip_frag/ip_frag_internal.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/lib/librte_ip_frag/ip_frag_internal.c b/lib/librte_ip_frag/ip_frag_internal.c index 6203740..a2c645b 100644 --- a/lib/librte_ip_frag/ip_frag_internal.c +++ b/lib/librte_ip_frag/ip_frag_internal.c @@ -346,7 +346,7 @@ ip_frag_lookup(struct rte_ip_frag_tbl *tbl, max_cycles = tbl->max_cycles; assoc = tbl->bucket_entries; - if (tbl->last != NULL && ip_frag_key_cmp(&tbl->last->key, key) == 0) + if (tbl->last != NULL && ip_frag_key_cmp(key, &tbl->last->key) == 0) return (tbl->last); /* different hashing methods for IPv4 and IPv6 */ @@ -378,7 +378,7 @@ ip_frag_lookup(struct rte_ip_frag_tbl *tbl, p1, i, assoc, IPv6_KEY_BYTES(p1[i].key.src_dst), p1[i].key.id, p1[i].start); - if (ip_frag_key_cmp(&p1[i].key, key) == 0) + if (ip_frag_key_cmp(key, &p1[i].key) == 0) return (p1 + i); else if (ip_frag_key_is_empty(&p1[i].key)) empty = (empty == NULL) ? (p1 + i) : empty; @@ -404,7 +404,7 @@ ip_frag_lookup(struct rte_ip_frag_tbl *tbl, p2, i, assoc, IPv6_KEY_BYTES(p2[i].key.src_dst), p2[i].key.id, p2[i].start); - if (ip_frag_key_cmp(&p2[i].key, key) == 0) + if (ip_frag_key_cmp(key, &p2[i].key) == 0) return (p2 + i); else if (ip_frag_key_is_empty(&p2[i].key)) empty = (empty == NULL) ?( p2 + i) : empty; -- 1.8.1.4
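A toy model of the asymmetry described above; the struct and compare function only mimic the shape of ip_frag_key_cmp from ip_frag_common.h and are not the real definitions.

#include <stdint.h>
#include <stdio.h>

struct frag_key {
	uint64_t src_dst[4];	/* 1 word for IPv4 keys, 4 for IPv6 */
	uint32_t id;
	uint32_t key_len;	/* number of valid words in src_dst */
};

static uint64_t
key_cmp(const struct frag_key *k1, const struct frag_key *k2)
{
	uint64_t val = k1->id ^ k2->id;
	uint32_t i;

	/* the first argument's key_len drives the loop */
	for (i = 0; i < k1->key_len; i++)
		val |= k1->src_dst[i] ^ k2->src_dst[i];
	return val;		/* zero means "keys are equal" */
}

int main(void)
{
	struct frag_key pkt = { .src_dst = { 0xaaaa }, .id = 7, .key_len = 1 };
	struct frag_key tbl = { .id = 7, .key_len = 0 };	/* zeroed slot */

	/* empty key first: the addresses are never compared, false match */
	printf("table key first:  %s\n", key_cmp(&tbl, &pkt) ? "no match" : "match");
	/* packet key first: the mismatch in src_dst is detected */
	printf("packet key first: %s\n", key_cmp(&pkt, &tbl) ? "no match" : "match");
	return 0;
}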
[dpdk-dev] [PATCH 00/10] rte_ip_frag: various fixes for lib and examples
This patchset fixes a few issues found during validation, and also does a bunch of renames (so that internally-used data structures aren't starting with rte_) and fixes a few typos. Anatoly Burakov (10): ip_frag: rename RTE_IP_FRAG_ASSERT to IP_FRAG_ASSERT ip_frag: fix debug macros ip_frag: renaming rte_ip_frag_pkt to ip_frag_pkt ip_frag: fix stats macro, rename rte_ip_frag_tbl_stat structure ip_frag: small fix, replace hardcode with a macro ip_frag: replace memmove with custom copying ip_frag: fix order of arguments to key compare function ip_fragmentation: small fixes ip_reassembly: small fixes rte_ip_frag: API header file fix config/common_bsdapp| 1 + config/common_linuxapp | 1 + examples/ip_fragmentation/main.c| 6 +- examples/ip_reassembly/main.c | 16 ++-- lib/librte_ip_frag/ip_frag_common.h | 22 +++--- lib/librte_ip_frag/ip_frag_internal.c | 28 ++-- lib/librte_ip_frag/rte_ip_frag.h| 20 ++-- lib/librte_ip_frag/rte_ipv4_fragmentation.c | 2 +- lib/librte_ip_frag/rte_ipv4_reassembly.c| 10 +- lib/librte_ip_frag/rte_ipv6_fragmentation.c | 2 +- lib/librte_ip_frag/rte_ipv6_reassembly.c| 16 +--- 11 files changed, 72 insertions(+), 52 deletions(-) -- 1.8.1.4
[dpdk-dev] [PATCH 03/10] ip_frag: renaming rte_ip_frag_pkt to ip_frag_pkt
Signed-off-by: Anatoly Burakov --- lib/librte_ip_frag/ip_frag_common.h | 18 +- lib/librte_ip_frag/ip_frag_internal.c| 20 ++-- lib/librte_ip_frag/rte_ip_frag.h | 12 ++-- lib/librte_ip_frag/rte_ipv4_reassembly.c | 4 ++-- lib/librte_ip_frag/rte_ipv6_reassembly.c | 4 ++-- 5 files changed, 29 insertions(+), 29 deletions(-) diff --git a/lib/librte_ip_frag/ip_frag_common.h b/lib/librte_ip_frag/ip_frag_common.h index 5ad0a0b..9df8074 100644 --- a/lib/librte_ip_frag/ip_frag_common.h +++ b/lib/librte_ip_frag/ip_frag_common.h @@ -63,21 +63,21 @@ if (!(exp)) { \ "%08" PRIx64 "%08" PRIx64 "%08" PRIx64 "%08" PRIx64 /* internal functions declarations */ -struct rte_mbuf * ip_frag_process(struct rte_ip_frag_pkt *fp, +struct rte_mbuf * ip_frag_process(struct ip_frag_pkt *fp, struct rte_ip_frag_death_row *dr, struct rte_mbuf *mb, uint16_t ofs, uint16_t len, uint16_t more_frags); -struct rte_ip_frag_pkt * ip_frag_find(struct rte_ip_frag_tbl *tbl, +struct ip_frag_pkt * ip_frag_find(struct rte_ip_frag_tbl *tbl, struct rte_ip_frag_death_row *dr, const struct ip_frag_key *key, uint64_t tms); -struct rte_ip_frag_pkt * ip_frag_lookup(struct rte_ip_frag_tbl *tbl, +struct ip_frag_pkt * ip_frag_lookup(struct rte_ip_frag_tbl *tbl, const struct ip_frag_key *key, uint64_t tms, - struct rte_ip_frag_pkt **free, struct rte_ip_frag_pkt **stale); + struct ip_frag_pkt **free, struct ip_frag_pkt **stale); /* these functions need to be declared here as ip_frag_process relies on them */ -struct rte_mbuf * ipv4_frag_reassemble(const struct rte_ip_frag_pkt *fp); -struct rte_mbuf * ipv6_frag_reassemble(const struct rte_ip_frag_pkt *fp); +struct rte_mbuf * ipv4_frag_reassemble(const struct ip_frag_pkt *fp); +struct rte_mbuf * ipv6_frag_reassemble(const struct ip_frag_pkt *fp); @@ -122,7 +122,7 @@ ip_frag_key_cmp(const struct ip_frag_key * k1, const struct ip_frag_key * k2) /* put fragment on death row */ static inline void -ip_frag_free(struct rte_ip_frag_pkt *fp, struct rte_ip_frag_death_row *dr) +ip_frag_free(struct ip_frag_pkt *fp, struct rte_ip_frag_death_row *dr) { uint32_t i, k; @@ -140,7 +140,7 @@ ip_frag_free(struct rte_ip_frag_pkt *fp, struct rte_ip_frag_death_row *dr) /* if key is empty, mark key as in use */ static inline void -ip_frag_inuse(struct rte_ip_frag_tbl *tbl, const struct rte_ip_frag_pkt *fp) +ip_frag_inuse(struct rte_ip_frag_tbl *tbl, const struct ip_frag_pkt *fp) { if (ip_frag_key_is_empty(&fp->key)) { TAILQ_REMOVE(&tbl->lru, fp, lru); @@ -150,7 +150,7 @@ ip_frag_inuse(struct rte_ip_frag_tbl *tbl, const struct rte_ip_frag_pkt *fp) /* reset the fragment */ static inline void -ip_frag_reset(struct rte_ip_frag_pkt *fp, uint64_t tms) +ip_frag_reset(struct ip_frag_pkt *fp, uint64_t tms) { static const struct ip_frag zero_frag = { .ofs = 0, diff --git a/lib/librte_ip_frag/ip_frag_internal.c b/lib/librte_ip_frag/ip_frag_internal.c index cfcab1b..219221f 100644 --- a/lib/librte_ip_frag/ip_frag_internal.c +++ b/lib/librte_ip_frag/ip_frag_internal.c @@ -54,7 +54,7 @@ /* local frag table helper functions */ static inline void ip_frag_tbl_del(struct rte_ip_frag_tbl *tbl, struct rte_ip_frag_death_row *dr, - struct rte_ip_frag_pkt *fp) + struct ip_frag_pkt *fp) { ip_frag_free(fp, dr); ip_frag_key_invalidate(&fp->key); @@ -64,7 +64,7 @@ ip_frag_tbl_del(struct rte_ip_frag_tbl *tbl, struct rte_ip_frag_death_row *dr, } static inline void -ip_frag_tbl_add(struct rte_ip_frag_tbl *tbl, struct rte_ip_frag_pkt *fp, +ip_frag_tbl_add(struct rte_ip_frag_tbl *tbl, struct ip_frag_pkt *fp, const struct ip_frag_key *key, uint64_t tms) { 
fp->key = key[0]; @@ -76,7 +76,7 @@ ip_frag_tbl_add(struct rte_ip_frag_tbl *tbl, struct rte_ip_frag_pkt *fp, static inline void ip_frag_tbl_reuse(struct rte_ip_frag_tbl *tbl, struct rte_ip_frag_death_row *dr, - struct rte_ip_frag_pkt *fp, uint64_t tms) + struct ip_frag_pkt *fp, uint64_t tms) { ip_frag_free(fp, dr); ip_frag_reset(fp, tms); @@ -137,7 +137,7 @@ ipv6_frag_hash(const struct ip_frag_key *key, uint32_t *v1, uint32_t *v2) } struct rte_mbuf * -ip_frag_process(struct rte_ip_frag_pkt *fp, struct rte_ip_frag_death_row *dr, +ip_frag_process(struct ip_frag_pkt *fp, struct rte_ip_frag_death_row *dr, struct rte_mbuf *mb, uint16_t ofs, uint16_t len, uint16_t more_frags) { uint32_t idx; @@ -268,11 +268,11 @@ ip_frag_process(struct rte_ip_frag_pkt *fp, struct rte_ip_frag_death_row *dr, * If such entry is not present, then allocate a new one. * If the entry is stale, then free and reuse it. */ -struct rte_ip_frag_pkt * +struct ip_frag_pkt * ip_frag_find(struct rte_ip_frag_tbl *tbl, struct rte_ip_frag_death_row *d
[dpdk-dev] [PATCH 05/10] ip_frag: small fix, replace hardcode with a macro
Signed-off-by: Anatoly Burakov --- lib/librte_ip_frag/ip_frag_internal.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/lib/librte_ip_frag/ip_frag_internal.c b/lib/librte_ip_frag/ip_frag_internal.c index 219221f..6203740 100644 --- a/lib/librte_ip_frag/ip_frag_internal.c +++ b/lib/librte_ip_frag/ip_frag_internal.c @@ -350,7 +350,7 @@ ip_frag_lookup(struct rte_ip_frag_tbl *tbl, return (tbl->last); /* different hashing methods for IPv4 and IPv6 */ - if (key->key_len == 1) + if (key->key_len == IPV4_KEYLEN) ipv4_frag_hash(key, &sig1, &sig2); else ipv6_frag_hash(key, &sig1, &sig2); -- 1.8.1.4
[dpdk-dev] [PATCH 09/10] ip_reassembly: small fixes
Adding check for non-existent ports in portmask. Also, making everything NUMA-related depend on lcore sockets, not device sockets. This is because the init_mem() function allocates all data structures based on NUMA nodes of the lcores in the coremask. Therefore, when no cores are on socket 0, but there are devices on socket 0, it may lead to segmentation faults. Also, making ip_reassembly eat up a bit less memory. Signed-off-by: Anatoly Burakov --- examples/ip_reassembly/main.c | 12 1 file changed, 8 insertions(+), 4 deletions(-) diff --git a/examples/ip_reassembly/main.c b/examples/ip_reassembly/main.c index 7311b29..1b60e41 100644 --- a/examples/ip_reassembly/main.c +++ b/examples/ip_reassembly/main.c @@ -942,15 +942,15 @@ setup_queue_tbl(struct rx_queue *rxq, uint32_t lcore, uint32_t queue) } /* -* At any given moment up to +* At any given moment up to * mbufs could be stored int the fragment table. * Plus, each TX queue can hold up to packets. */ - nb_mbuf = 2 * RTE_MAX(max_flow_num, 2UL * MAX_PKT_BURST) * MAX_FRAG_NUM; + nb_mbuf = RTE_MAX(max_flow_num, 2UL * MAX_PKT_BURST) * MAX_FRAG_NUM; nb_mbuf *= (port_conf.rxmode.max_rx_pkt_len + BUF_SIZE - 1) / BUF_SIZE; - nb_mbuf += RTE_TEST_RX_DESC_DEFAULT + RTE_TEST_TX_DESC_DEFAULT; nb_mbuf *= 2; /* ipv4 and ipv6 */ + nb_mbuf += RTE_TEST_RX_DESC_DEFAULT + RTE_TEST_TX_DESC_DEFAULT; nb_mbuf = RTE_MAX(nb_mbuf, (uint32_t)NB_MBUF); @@ -1093,6 +1093,10 @@ MAIN(int argc, char **argv) if (init_mem() < 0) rte_panic("Cannot initialize memory structures!\n"); + /* check if portmask has non-existent ports */ + if (enabled_port_mask & ~(RTE_LEN2MASK(nb_ports, unsigned))) + rte_exit(EXIT_FAILURE, "Non-existent ports in portmask!\n"); + /* initialize all ports */ for (portid = 0; portid < nb_ports; portid++) { /* skip ports that are not enabled */ @@ -1114,7 +1118,7 @@ MAIN(int argc, char **argv) qconf = &lcore_queue_conf[rx_lcore_id]; } - socket = rte_eth_dev_socket_id(portid); + socket = rte_lcore_to_socket_id(portid); if (socket == SOCKET_ID_ANY) socket = 0; -- 1.8.1.4
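To make the reordered mbuf arithmetic concrete, here is a stand-alone calculation with placeholder constants (the real values come from the sample's configuration); it ignores the final clamp against NB_MBUF.

#include <stdint.h>
#include <stdio.h>

#define MAX_PKT_BURST	32
#define MAX_FRAG_NUM	4
#define MAX_RX_PKT_LEN	9600	/* assumed jumbo frame size */
#define BUF_SIZE	2048
#define NB_RXD		128
#define NB_TXD		512

int main(void)
{
	uint32_t max_flow_num = 0x1000;
	uint32_t flows = max_flow_num > 2 * MAX_PKT_BURST ?
			max_flow_num : 2 * MAX_PKT_BURST;
	uint32_t frags = (MAX_RX_PKT_LEN + BUF_SIZE - 1) / BUF_SIZE;

	/* old: 2 * flows * mbufs-per-flow, plus descriptors, all doubled */
	uint32_t old_cnt = 2 * flows * MAX_FRAG_NUM * frags;
	old_cnt = (old_cnt + NB_RXD + NB_TXD) * 2;

	/* new: flows * mbufs-per-flow, doubled for v4+v6, descriptors added once */
	uint32_t new_cnt = flows * MAX_FRAG_NUM * frags * 2;
	new_cnt += NB_RXD + NB_TXD;

	printf("old: %u mbufs, new: %u mbufs\n", old_cnt, new_cnt);
	return 0;
}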
[dpdk-dev] [PATCH v3 1/2] vfio: open VFIO container at startup rather than during init
Currently, VFIO only checks for being able to access the /dev/vfio directory when initializing VFIO, deferring actual VFIO container initialization to VFIO binding code. This doesn't bode well for when VFIO container cannot be initialized for whatever reason, because it results in unrecoverable error even if the user didn't set up VFIO and didn't even want to use it in the first place. This patch fixes this by moving container initialization into the code that checks if VFIO is available at runtime. Therefore, any issues with the container will be known at initialization stage and VFIO will simply be turned off if container could not be set up. Signed-off-by: Anatoly Burakov --- lib/librte_eal/linuxapp/eal/eal_pci_vfio.c | 15 ++- 1 file changed, 2 insertions(+), 13 deletions(-) diff --git a/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c b/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c index 4de6061..9eb5dcd 100644 --- a/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c +++ b/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c @@ -523,17 +523,6 @@ pci_vfio_map_resource(struct rte_pci_device *dev) rte_snprintf(pci_addr, sizeof(pci_addr), PCI_PRI_FMT, loc->domain, loc->bus, loc->devid, loc->function); - /* get container fd (needs to be done only once per initialization) */ - if (vfio_cfg.vfio_container_fd == -1) { - int vfio_container_fd = pci_vfio_get_container_fd(); - if (vfio_container_fd < 0) { - RTE_LOG(ERR, EAL, " %s cannot open VFIO container!\n", pci_addr); - return -1; - } - - vfio_cfg.vfio_container_fd = vfio_container_fd; - } - /* get group number */ iommu_group_no = pci_vfio_get_group_no(pci_addr); @@ -770,10 +759,10 @@ pci_vfio_enable(void) vfio_cfg.vfio_groups[i].fd = -1; vfio_cfg.vfio_groups[i].group_no = -1; } - vfio_cfg.vfio_container_fd = -1; + vfio_cfg.vfio_container_fd = pci_vfio_get_container_fd(); /* check if we have VFIO driver enabled */ - if (access(VFIO_DIR, F_OK) == 0) + if (vfio_cfg.vfio_container_fd != -1) vfio_cfg.vfio_enabled = 1; else RTE_LOG(INFO, EAL, "VFIO driver not loaded or wrong permissions\n"); -- 1.8.1.4
[dpdk-dev] [PATCH v3 2/2] vfio: more verbose error messages
also, making VFIO code distinguish between actual unexpected values and ioctl() failures, providing appropriate error messages. Signed-off-by: Anatoly Burakov --- lib/librte_eal/linuxapp/eal/eal_pci_vfio.c | 48 -- 1 file changed, 32 insertions(+), 16 deletions(-) diff --git a/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c b/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c index 9eb5dcd..bf765b5 100644 --- a/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c +++ b/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c @@ -180,7 +180,8 @@ pci_vfio_setup_dma_maps(int vfio_container_fd) ret = ioctl(vfio_container_fd, VFIO_SET_IOMMU, VFIO_TYPE1_IOMMU); if (ret) { - RTE_LOG(ERR, EAL, " cannot set IOMMU type!\n"); + RTE_LOG(ERR, EAL, " cannot set IOMMU type, " + "error %i (%s)\n", errno, strerror(errno)); return -1; } @@ -201,7 +202,8 @@ pci_vfio_setup_dma_maps(int vfio_container_fd) ret = ioctl(vfio_container_fd, VFIO_IOMMU_MAP_DMA, &dma_map); if (ret) { - RTE_LOG(ERR, EAL, " cannot set up DMA remapping!\n"); + RTE_LOG(ERR, EAL, " cannot set up DMA remapping, " + "error %i (%s)\n", errno, strerror(errno)); return -1; } } @@ -253,7 +255,8 @@ pci_vfio_setup_interrupts(struct rte_pci_device *dev, int vfio_dev_fd) ret = ioctl(vfio_dev_fd, VFIO_DEVICE_GET_IRQ_INFO, &irq); if (ret < 0) { - RTE_LOG(ERR, EAL, " cannot get IRQ info!\n"); + RTE_LOG(ERR, EAL, " cannot get IRQ info, " + "error %i (%s)\n", errno, strerror(errno)); return -1; } @@ -271,7 +274,8 @@ pci_vfio_setup_interrupts(struct rte_pci_device *dev, int vfio_dev_fd) /* set up an eventfd for interrupts */ fd = eventfd(0, 0); if (fd < 0) { - RTE_LOG(ERR, EAL, " cannot set up eventfd!\n"); + RTE_LOG(ERR, EAL, " cannot set up eventfd, " + "error %i (%s)\n", errno, strerror(errno)); return -1; } @@ -313,22 +317,31 @@ pci_vfio_get_container_fd(void) if (internal_config.process_type == RTE_PROC_PRIMARY) { vfio_container_fd = open(VFIO_CONTAINER_PATH, O_RDWR); if (vfio_container_fd < 0) { - RTE_LOG(ERR, EAL, " cannot open VFIO container!\n"); + RTE_LOG(ERR, EAL, " cannot open VFIO container, " + "error %i (%s)\n", errno, strerror(errno)); return -1; } /* check VFIO API version */ ret = ioctl(vfio_container_fd, VFIO_GET_API_VERSION); if (ret != VFIO_API_VERSION) { - RTE_LOG(ERR, EAL, " unknown VFIO API version!\n"); + if (ret < 0) + RTE_LOG(ERR, EAL, " could not get VFIO API version, " + "error %i (%s)\n", errno, strerror(errno)); + else + RTE_LOG(ERR, EAL, " unsupported VFIO API version!\n"); close(vfio_container_fd); return -1; } /* check if we support IOMMU type 1 */ ret = ioctl(vfio_container_fd, VFIO_CHECK_EXTENSION, VFIO_TYPE1_IOMMU); - if (!ret) { - RTE_LOG(ERR, EAL, " unknown IOMMU driver!\n"); + if (ret != 1) { + if (ret < 0) + RTE_LOG(ERR, EAL, " could not get IOMMU type, " + "error %i (%s)\n", errno, strerror(errno)); + else + RTE_LOG(ERR, EAL, " unsupported IOMMU type!\n"); close(vfio_container_fd); return -1; } @@ -564,7 +577,8 @@ pci_vfio_map_resource(struct rte_pci_device *dev) /* check if the group is viable */ ret = ioctl(vfio_group_fd, VFIO_GROUP_GET_STATUS, &group_status); if (ret) { - RTE_LOG(ERR, EAL, " %s cannot get group status!\n", pci_addr); + RTE_LOG(ERR, EAL, " %s cannot get group status, " + "error %i (%s)\n", pci_addr, errno, strerror(errno)); close(vfio_group_fd); clear_current_group(); return -1; @@ -587,8 +601,8 @@ pci_vfio_map_resource(struct rte_pci_device *dev) ret = ioctl(vfio_group_fd, VFIO_GROUP_SET_CONTAINER,
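The pattern applied throughout the patch above is worth spelling out: a negative return from ioctl() means the call itself failed and errno explains why, while a non-negative but unexpected value means the kernel answered with something this code does not support. A compilable sketch of that distinction, reusing the VFIO API version check from the diff; the helper name and exact message wording are mine, not an excerpt from the EAL.

#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/vfio.h>

/* return 0 if the container speaks the VFIO API version we were built
 * against, -1 otherwise, telling the two failure modes apart */
int
check_vfio_api_version(int vfio_container_fd)
{
	int ret = ioctl(vfio_container_fd, VFIO_GET_API_VERSION);

	if (ret == VFIO_API_VERSION)
		return 0;

	if (ret < 0)
		/* the ioctl() call itself failed */
		fprintf(stderr, "could not get VFIO API version, error %i (%s)\n",
			errno, strerror(errno));
	else
		/* the call succeeded, but the kernel reported a version
		 * this code was not written for */
		fprintf(stderr, "unsupported VFIO API version %i\n", ret);
	return -1;
}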
[dpdk-dev] [PATCH v3 0/2] Fix issues with VFIO
This patchset fixes an issue with VFIO where DPDK initialization could fail even if the user didn't want to use VFIO in the first place. Also, more verbose and descriptive error messages were added to VFIO code, for example distinguishing between a failed ioctl() call and an unsupported VFIO API version. Anatoly Burakov (2): vfio: open VFIO container at startup rather than during init vfio: more verbose error messages lib/librte_eal/linuxapp/eal/eal_pci_vfio.c | 63 -- 1 file changed, 34 insertions(+), 29 deletions(-) -- 1.8.1.4
[dpdk-dev] [PATCH 1/2] Patch for Qemu wrapper for US-VHost to ensure Qemu process ends when VM is shutdown.
Signed-off-by: Claire Murphy --- examples/vhost/libvirt/qemu-wrap.py | 31 +++ 1 files changed, 27 insertions(+), 4 deletions(-) diff --git a/examples/vhost/libvirt/qemu-wrap.py b/examples/vhost/libvirt/qemu-wrap.py index e2d68a0..bfe668a 100755 --- a/examples/vhost/libvirt/qemu-wrap.py +++ b/examples/vhost/libvirt/qemu-wrap.py @@ -76,6 +76,7 @@ #"/dev/ptmx", "/dev/kvm", "/dev/kqemu", #"/dev/rtc", "/dev/hpet", "/dev/net/tun", #"/dev/-", +#"/dev/hugepages" #] # # 4.b) Disable SELinux or set to permissive mode @@ -161,6 +162,8 @@ hugetlbfs_dir = "" # import sys, os, subprocess +import time +import signal #List of open userspace vhost file descriptors @@ -174,6 +177,18 @@ vhost_flags = [ "csum=off", "guest_ecn=off" ] +#String of the path to the Qemu process pid +qemu_pid = "/tmp/%d-qemu.pid" % os.getpid() + +# +# Signal haldler to kill Qemu subprocess +# +def kill_qemu_process(signum, stack): +pidfile = open(qemu_pid, 'r') +pid = int(pidfile.read()) +os.killpg(pid, signal.SIGTERM) +pidfile.close() + # # Find the system hugefile mount point. @@ -274,13 +289,13 @@ def main(): emul_call = '' mem_prealloc_set = 0 mem_path_set = 0 -num = 0; +num = 0 #parse the parameters while (num < num_cmd_args): arg = sys.argv[num] - #Check netdev +1 parameter for vhostfd +#Check netdev +1 parameter for vhostfd if arg == '-netdev': num_vhost_devs = len(fd_list) new_args.append(arg) @@ -333,7 +348,6 @@ def main(): emul_call += mp emul_call += " " - #add user options for opt in emul_opts_user: emul_call += opt @@ -353,13 +367,22 @@ def main(): emul_call+=str(arg) emul_call+= " " +emul_call += "-pidfile %s " % qemu_pid #Call QEMU -subprocess.call(emul_call, shell=True) +process = subprocess.Popen(emul_call, shell=True, preexec_fn=os.setsid) + +for sig in [signal.SIGTERM, signal.SIGINT, signal.SIGHUP, signal.SIGQUIT]: +signal.signal(sig, kill_qemu_process) +process.wait() #Close usvhost files for fd in fd_list: os.close(fd) +#Cleanup temporary files +if os.access(qemu_pid, os.F_OK): +os.remove(qemu_pid) + if __name__ == "__main__": -- 1.7.0.7
[dpdk-dev] [PATCH 0/2] *** SUBJECT HERE ***
*** BLURB HERE *** Claire Murphy (2): Patch for Qemu wrapper for US-VHost to ensure Qemu process ends when VM is shutdown. Patch to allow live migration of a VM with US-VHost. examples/vhost/libvirt/qemu-wrap.py | 31 +++ examples/vhost/vhost-net-cdev.c | 18 ++ examples/vhost/virtio-net.c |8 +++- 3 files changed, 52 insertions(+), 5 deletions(-)
[dpdk-dev] [PATCH 2/2] Patch to allow live migration of a VM with US-VHost.
Signed-off-by: Claire Murphy --- examples/vhost/vhost-net-cdev.c | 18 ++ examples/vhost/virtio-net.c |8 +++- 2 files changed, 25 insertions(+), 1 deletions(-) diff --git a/examples/vhost/vhost-net-cdev.c b/examples/vhost/vhost-net-cdev.c index ef42e88..e942df0 100644 --- a/examples/vhost/vhost-net-cdev.c +++ b/examples/vhost/vhost-net-cdev.c @@ -275,6 +275,24 @@ vhost_net_ioctl(fuse_req_t req, int cmd, void *arg, VHOST_IOCTL_R(struct vhost_vring_file, file, ops->set_vring_call); break; + case VHOST_SET_VRING_ERR: + RTE_LOG(ERR, CONFIG, "(%"PRIu64") IOCTL: VHOST_SET_VRING_ERR cmd=%d Un-Supported\n", ctx.fh,cmd); + result = -1; + fuse_reply_ioctl(req, result, NULL, 0); + break; + + case VHOST_SET_LOG_BASE: + RTE_LOG(ERR, CONFIG, "(%"PRIu64") IOCTL: VHOST_SET_LOG_BASE cmd=%d Un-Supported\n", ctx.fh,cmd); + result = 0; + fuse_reply_ioctl(req, result, NULL, 0); + break; + + case VHOST_SET_LOG_FD: + RTE_LOG(ERR, CONFIG, "(%"PRIu64") IOCTL: VHOST_SET_LOG_FD cmd=%d Un-Supported \n", ctx.fh,cmd); + result = -1; + fuse_reply_ioctl(req, result, NULL, 0); + break; + default: RTE_LOG(ERR, VHOST_CONFIG, "(%"PRIu64") IOCTL: DOESN NOT EXIST\n", ctx.fh); result = -1; diff --git a/examples/vhost/virtio-net.c b/examples/vhost/virtio-net.c index 9be959f..3cf650d 100644 --- a/examples/vhost/virtio-net.c +++ b/examples/vhost/virtio-net.c @@ -573,7 +573,13 @@ set_features(struct vhost_device_ctx ctx, uint64_t *pu) dev = get_device(ctx); if (dev == NULL) return -1; - if (*pu & ~VHOST_FEATURES) + + /* +* We mask the VHOST_F_LOG_ALL feature bit here as it is enabled by default +* during migration in QEMU even if we have it disabled as a feature in +* userspace vhost. +*/ + if (*pu & ~(VHOST_FEATURES | (1ULL << VHOST_F_LOG_ALL))) return -1; /* Store the negotiated feature list for the device. */ -- 1.7.0.7
[dpdk-dev] [PATCH v5 2/6] Support for unique interface naming of pmds
Adding support to rte_eth_dev_data structure to support unique name identifier for ethdevs to support adding slave ethdevs (specifically virtual devices which have no public unique identifier) to a link bonding device. This changes the API rte_eth_dev_allocate() to require a const char *name when allocating a ethdev, which also verifies that the name is unique and hasn?t been already used by an existed allocated rte_eth_dev. Also contains updates to virtual pmd?s to now call the API with a name parameter. Signed-off-by: Declan Doherty --- lib/librte_ether/rte_ethdev.c| 32 +++-- lib/librte_ether/rte_ethdev.h|7 +- lib/librte_pmd_pcap/rte_eth_pcap.c | 22 ++-- lib/librte_pmd_ring/rte_eth_ring.c | 32 +++-- lib/librte_pmd_ring/rte_eth_ring.h |3 +- lib/librte_pmd_xenvirt/rte_eth_xenvirt.c |2 +- 6 files changed, 66 insertions(+), 32 deletions(-) diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c index 7256841..d938603 100644 --- a/lib/librte_ether/rte_ethdev.c +++ b/lib/librte_ether/rte_ethdev.c @@ -65,6 +65,7 @@ #include #include #include +#include #include "rte_ether.h" #include "rte_ethdev.h" @@ -153,21 +154,40 @@ rte_eth_dev_data_alloc(void) RTE_MAX_ETHPORTS * sizeof(*rte_eth_dev_data)); } +static struct rte_eth_dev * +rte_eth_dev_allocated(const char *name) +{ + unsigned i; + + for (i = 0; i < nb_ports; i++) { + if (strcmp(rte_eth_devices[i].data->name, name) == 0) + return &rte_eth_devices[i]; + } + return NULL; +} + struct rte_eth_dev * -rte_eth_dev_allocate(void) +rte_eth_dev_allocate(const char *name) { struct rte_eth_dev *eth_dev; if (nb_ports == RTE_MAX_ETHPORTS) { - PMD_DEBUG_TRACE("Reached maximum number of ethernet ports\n"); + PMD_DEBUG_TRACE("Reached maximum number of Ethernet ports\n"); return NULL; } if (rte_eth_dev_data == NULL) rte_eth_dev_data_alloc(); + if (rte_eth_dev_allocated(name) != NULL) { + PMD_DEBUG_TRACE("Ethernet Device with name %s already allocated!\n"); + return NULL; + } + eth_dev = &rte_eth_devices[nb_ports]; eth_dev->data = &rte_eth_dev_data[nb_ports]; + rte_snprintf(eth_dev->data->name, sizeof(eth_dev->data->name), + "%s", name); eth_dev->data->port_id = nb_ports++; return eth_dev; } @@ -178,11 +198,17 @@ rte_eth_dev_init(struct rte_pci_driver *pci_drv, { struct eth_driver*eth_drv; struct rte_eth_dev *eth_dev; + char ethdev_name[RTE_ETH_NAME_MAX_LEN]; + int diag; eth_drv = (struct eth_driver *)pci_drv; - eth_dev = rte_eth_dev_allocate(); + /* Create unique Ethernet device name using PCI address */ + rte_snprintf(ethdev_name, RTE_ETH_NAME_MAX_LEN, "%d:%d.%d", + pci_dev->addr.bus, pci_dev->addr.devid, pci_dev->addr.function); + + eth_dev = rte_eth_dev_allocate(ethdev_name); if (eth_dev == NULL) return -ENOMEM; diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h index 2406e45..50df654 100644 --- a/lib/librte_ether/rte_ethdev.h +++ b/lib/librte_ether/rte_ethdev.h @@ -1497,6 +1497,8 @@ struct rte_eth_dev_sriov { }; #define RTE_ETH_DEV_SRIOV(dev) ((dev)->data->sriov) +#define RTE_ETH_NAME_MAX_LEN (32) + /** * @internal * The data part, with no function pointers, associated with each ethernet device. @@ -1505,6 +1507,8 @@ struct rte_eth_dev_sriov { * processes in a multi-process configuration. */ struct rte_eth_dev_data { + char name[RTE_ETH_NAME_MAX_LEN]; /**< Unique identifier name */ + void **rx_queues; /**< Array of pointers to RX queues. */ void **tx_queues; /**< Array of pointers to TX queues. */ uint16_t nb_rx_queues; /**< Number of RX queues. 
*/ @@ -1560,10 +1564,11 @@ extern uint8_t rte_eth_dev_count(void); * Allocates a new ethdev slot for an ethernet device and returns the pointer * to that slot for the driver to use. * + * @param nameUnique identifier name for each Ethernet device * @return * - Slot in the rte_dev_devices array for a new device; */ -struct rte_eth_dev *rte_eth_dev_allocate(void); +struct rte_eth_dev *rte_eth_dev_allocate(const char *name); struct eth_driver; /** diff --git a/lib/librte_pmd_pcap/rte_eth_pcap.c b/lib/librte_pmd_pcap/rte_eth_pcap.c index b3dbbda..12b7e0c 100644 --- a/lib/librte_pmd_pcap/rte_eth_pcap.c +++ b/lib/librte_pmd_pcap/rte_eth_pcap.c @@ -534,7 +534,7 @@ open_tx_iface(const char *key __rte_unused, const char *value, void *extra_args) static int -rte_pmd_init_internals(const unsigned nb_rx_queues, +rte_pmd_init_internals(const char *name, const unsigned nb_rx_
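To make the new contract concrete, a short sketch of how a virtual PMD's create path is expected to use the renamed allocator after this patch. The helper and its error reporting are illustrative only; the key points, taken from the diff, are that the name must be unique and that NULL is returned when it is not (or when no free slot remains).

#include <errno.h>
#include <rte_ethdev.h>

/* illustrative helper, not part of any PMD */
int
example_vdev_create(const char *name)
{
	struct rte_eth_dev *eth_dev;

	/* fails (returns NULL) if the name is already taken or the
	 * ethdev table is full */
	eth_dev = rte_eth_dev_allocate(name);
	if (eth_dev == NULL)
		return -ENOMEM;

	/* ... fill in dev_ops, rx/tx burst functions and dev->data ... */
	return eth_dev->data->port_id;
}

Physical devices get their unique name for free: rte_eth_dev_init() now derives it from the PCI address ("bus:devid.function"), as the rte_ethdev.c hunk above shows.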
[dpdk-dev] [PATCH v5 1/6] Link Bonding Library (lib/librte_pmd_bond)
Initial release with support for Mode 0 - Round Robin Mode 1 - Active Backup Mode 2 - Balance -> Supports 3 transmit polices (layer 2, layer 2+3, layer 3+4) Mode 3 - Broadcast Signed-off-by: Declan Doherty --- config/common_bsdapp |5 + config/common_linuxapp |5 + lib/Makefile |1 + lib/librte_pmd_bond/Makefile | 32 + lib/librte_pmd_bond/rte_eth_bond.c | 2148 lib/librte_pmd_bond/rte_eth_bond.h | 255 + mk/rte.app.mk |5 + 7 files changed, 2451 insertions(+), 0 deletions(-) create mode 100644 lib/librte_pmd_bond/Makefile create mode 100644 lib/librte_pmd_bond/rte_eth_bond.c create mode 100644 lib/librte_pmd_bond/rte_eth_bond.h diff --git a/config/common_bsdapp b/config/common_bsdapp index 989e1da..214398b 100644 --- a/config/common_bsdapp +++ b/config/common_bsdapp @@ -206,6 +206,11 @@ CONFIG_RTE_PMD_RING_MAX_TX_RINGS=16 CONFIG_RTE_LIBRTE_PMD_PCAP=y # +# Compile link bonding pmd library +# +CONFIG_RTE_LIBRTE_PMD_BOND=y + +# # Do prefetch of packet data within PMD driver receive function # CONFIG_RTE_PMD_PACKET_PREFETCH=y diff --git a/config/common_linuxapp b/config/common_linuxapp index 5b896c3..2bf90df 100644 --- a/config/common_linuxapp +++ b/config/common_linuxapp @@ -244,6 +244,11 @@ CONFIG_RTE_PMD_RING_MAX_TX_RINGS=16 CONFIG_RTE_LIBRTE_PMD_PCAP=n # +# Compile link bonding pmd library +# +CONFIG_RTE_LIBRTE_PMD_BOND=y + +# # Compile Xen PMD # CONFIG_RTE_LIBRTE_PMD_XENVIRT=n diff --git a/lib/Makefile b/lib/Makefile index c58c0c9..88e875f 100644 --- a/lib/Makefile +++ b/lib/Makefile @@ -49,6 +49,7 @@ DIRS-$(CONFIG_RTE_LIBRTE_PMD_PCAP) += librte_pmd_pcap DIRS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += librte_pmd_virtio DIRS-$(CONFIG_RTE_LIBRTE_VMXNET3_PMD) += librte_pmd_vmxnet3 DIRS-$(CONFIG_RTE_LIBRTE_PMD_XENVIRT) += librte_pmd_xenvirt +DIRS-$(CONFIG_RTE_LIBRTE_PMD_BOND) += librte_pmd_bond DIRS-$(CONFIG_RTE_LIBRTE_HASH) += librte_hash DIRS-$(CONFIG_RTE_LIBRTE_LPM) += librte_lpm DIRS-$(CONFIG_RTE_LIBRTE_ACL) += librte_acl diff --git a/lib/librte_pmd_bond/Makefile b/lib/librte_pmd_bond/Makefile new file mode 100644 index 000..51f6159 --- /dev/null +++ b/lib/librte_pmd_bond/Makefile @@ -0,0 +1,32 @@ +# + +include $(RTE_SDK)/mk/rte.vars.mk + +# +# library name +# +LIB = librte_pmd_bond.a + +CFLAGS += -O3 +CFLAGS += $(WERROR_FLAGS) + +# +# all source are stored in SRCS-y +# +SRCS-$(CONFIG_RTE_LIBRTE_PMD_BOND) += rte_eth_bond.c + + +# +# Export include files +# +SYMLINK-y-include += rte_eth_bond.h + + +# this lib depends upon: +DEPDIRS-$(CONFIG_RTE_LIBRTE_PMD_BOND) += lib/librte_mbuf +DEPDIRS-$(CONFIG_RTE_LIBRTE_PMD_BOND) += lib/librte_ether +DEPDIRS-$(CONFIG_RTE_LIBRTE_PMD_BOND) += lib/librte_malloc +DEPDIRS-$(CONFIG_RTE_LIBRTE_PMD_BOND) += lib/librte_eal +DEPDIRS-$(CONFIG_RTE_LIBRTE_PMD_BOND) += lib/librte_kvargs + +include $(RTE_SDK)/mk/rte.lib.mk diff --git a/lib/librte_pmd_bond/rte_eth_bond.c b/lib/librte_pmd_bond/rte_eth_bond.c new file mode 100644 index 000..69e7bae --- /dev/null +++ b/lib/librte_pmd_bond/rte_eth_bond.c @@ -0,0 +1,2148 @@ +/*- + * BSD LICENSE + * + * Copyright(c) 2010-2014 Intel Corporation. All rights reserved. + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * + * * Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. 
+ * * Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in + * the documentation and/or other materials provided with the + * distribution. + * * Neither the name of Intel Corporation nor the names of its + * contributors may be used to endorse or promote products derived + * from this software without specific prior written permission. + * + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + */ + +#include +#include + +#include +#include +#include +#include +#include +#include +#include +#includ
[dpdk-dev] [PATCH v5 6/6] Link Bonding Library doxygen additions
Signed-off-by: Declan Doherty --- doc/doxy-api-index.md |1 + doc/doxy-api.conf |1 + 2 files changed, 2 insertions(+), 0 deletions(-) diff --git a/doc/doxy-api-index.md b/doc/doxy-api-index.md index 7b26e98..ee3ad4f 100644 --- a/doc/doxy-api-index.md +++ b/doc/doxy-api-index.md @@ -36,6 +36,7 @@ API {#index} There are many libraries, so their headers may be grouped by topics: - **device**: + [bond] (@ref rte_eth_bond.h), [ethdev] (@ref rte_ethdev.h), [devargs](@ref rte_devargs.h), [KNI](@ref rte_kni.h), diff --git a/doc/doxy-api.conf b/doc/doxy-api.conf index f380d9a..b15a340 100644 --- a/doc/doxy-api.conf +++ b/doc/doxy-api.conf @@ -30,6 +30,7 @@ PROJECT_NAME= DPDK INPUT = doc/doxy-api-index.md \ + lib/librte_pmd_bond \ lib/librte_eal/common/include \ lib/librte_acl \ lib/librte_distributor \ -- 1.7.0.7
[dpdk-dev] [PATCH v5 4/6] Link bonding Unit Tests
Including: - code to generate packet bursts for testing rx and tx functionality of bonded device - virtual/stubbed out ethdev for use as slave ethdev in testing Signed-off-by: Declan Doherty --- app/test/Makefile |4 +- app/test/commands.c |7 + app/test/packet_burst_generator.c | 287 +++ app/test/packet_burst_generator.h | 78 + app/test/test.h |1 + app/test/test_link_bonding.c | 3958 + app/test/virtual_pmd.c| 574 ++ app/test/virtual_pmd.h| 74 + 8 files changed, 4982 insertions(+), 1 deletions(-) create mode 100644 app/test/packet_burst_generator.c create mode 100644 app/test/packet_burst_generator.h create mode 100644 app/test/test_link_bonding.c create mode 100644 app/test/virtual_pmd.c create mode 100644 app/test/virtual_pmd.h diff --git a/app/test/Makefile b/app/test/Makefile index 9c52460..643f1b9 100644 --- a/app/test/Makefile +++ b/app/test/Makefile @@ -102,7 +102,9 @@ SRCS-$(CONFIG_RTE_APP_TEST) += test_ivshmem.c SRCS-$(CONFIG_RTE_APP_TEST) += test_distributor.c SRCS-$(CONFIG_RTE_APP_TEST) += test_distributor_perf.c SRCS-$(CONFIG_RTE_APP_TEST) += test_devargs.c - +SRCS-$(CONFIG_RTE_APP_TEST) += virtual_pmd.c +SRCS-$(CONFIG_RTE_APP_TEST) += packet_burst_generator.c +SRCS-$(CONFIG_RTE_APP_TEST) += test_link_bonding.c ifeq ($(CONFIG_RTE_APP_TEST),y) SRCS-$(CONFIG_RTE_LIBRTE_ACL) += test_acl.c SRCS-$(CONFIG_RTE_LIBRTE_PMD_RING) += test_pmd_ring.c diff --git a/app/test/commands.c b/app/test/commands.c index c9dc085..5f23420 100644 --- a/app/test/commands.c +++ b/app/test/commands.c @@ -159,6 +159,10 @@ static void cmd_autotest_parsed(void *parsed_result, ret = test_timer(); if (!strcmp(res->autotest, "timer_perf_autotest")) ret = test_timer_perf(); +#ifdef RTE_LIBRTE_PMD_BOND + if (!strcmp(res->autotest, "link_bonding_autotest")) + ret = test_link_bonding(); +#endif if (!strcmp(res->autotest, "mempool_autotest")) ret = test_mempool(); if (!strcmp(res->autotest, "mempool_perf_autotest")) @@ -227,6 +231,9 @@ cmdline_parse_token_string_t cmd_autotest_autotest = "alarm_autotest#interrupt_autotest#" "version_autotest#eal_fs_autotest#" "cmdline_autotest#func_reentrancy_autotest#" +#ifdef RTE_LIBRTE_PMD_BOND + "link_bonding_autotest#" +#endif "mempool_perf_autotest#hash_perf_autotest#" "memcpy_perf_autotest#ring_perf_autotest#" "red_autotest#meter_autotest#sched_autotest#" diff --git a/app/test/packet_burst_generator.c b/app/test/packet_burst_generator.c new file mode 100644 index 000..5d539f1 --- /dev/null +++ b/app/test/packet_burst_generator.c @@ -0,0 +1,287 @@ +/*- + * BSD LICENSE + * + * Copyright(c) 2010-2014 Intel Corporation. All rights reserved. + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * + * * Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * * Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in + * the documentation and/or other materials provided with the + * distribution. + * * Neither the name of Intel Corporation nor the names of its + * contributors may be used to endorse or promote products derived + * from this software without specific prior written permission. 
+ * + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + */ + +#include +#include + +#include "packet_burst_generator.h" + +#define UDP_SRC_PORT 1024 +#define UDP_DST_PORT 1024 + + +#define IP_DEFTTL 64 /* from RFC 1340. */ +#define IP_VERSION 0x40 +#define IP_HDRLEN 0x05 /* default IP header length == five 32-bits words. */ +#define IP_VHL_DEF (IP_VERSION | IP_HDRLEN) + +static void +copy_buf_to
[dpdk-dev] [PATCH v5 0/6] Link Bonding Library
This patch contains the initial release of the Link Bonding PMD Library Supporting bonding modes: 0 - Round Robin 1 - Active Backup 2 - Balance (Supporting 3 transmission policies) layer 2, layer 2+3, layer 3+4 3 - Broadcast Version 5 of patch set: Contains changes to EAL code to allow initialisation of Bonded devices from application startup options. rte_eal_init now calls rte_eal_pci_probe between calling rte_eal_dev_init with PRE and POST PCI probe flags. This gets around polluting the eal pci code with references to link bonding devices. Also rte_eal_pci_probe can now be called multiple times and will not try to re-initialize the driver if one already exists. This means that existing applications which currently call rte_eal_pci_probe will not be affected by this change. Patch Set Description: 0001 - librte_pmd_bond + makefile changes 0002 - librte_ether changes to support unique naming of pmds 0003 - librte_eal changes to support bonding device initialization 0004 - link bonding unit test suite 0005 - testpmd link bonding support changes 0006 - doxygen additions Declan Doherty (6): Link Bonding Library (lib/librte_pmd_bond) Support for unique interface naming of pmds EAL support for link bonding device initialization Link bonding Unit Tests testpmd link bonding additions Link Bonding Library doxygen additions app/test-pmd/cmdline.c | 579 app/test-pmd/config.c |4 +- app/test-pmd/parameters.c |3 + app/test-pmd/testpmd.c | 40 +- app/test-pmd/testpmd.h |2 + app/test/Makefile |4 +- app/test/commands.c |7 + app/test/packet_burst_generator.c | 287 ++ app/test/packet_burst_generator.h | 78 + app/test/test.h |1 + app/test/test_link_bonding.c| 3958 +++ app/test/virtual_pmd.c | 574 app/test/virtual_pmd.h | 74 + config/common_bsdapp|5 + config/common_linuxapp |5 + doc/doxy-api-index.md |1 + doc/doxy-api.conf |1 + lib/Makefile|1 + lib/librte_eal/bsdapp/eal/eal.c | 10 +- lib/librte_eal/common/eal_common_dev.c | 58 +- lib/librte_eal/common/eal_common_pci.c |3 + lib/librte_eal/common/include/eal_private.h |7 - lib/librte_eal/common/include/rte_dev.h | 13 +- lib/librte_eal/linuxapp/eal/eal.c | 11 +- lib/librte_ether/rte_ethdev.c | 32 +- lib/librte_ether/rte_ethdev.h |7 +- lib/librte_pmd_bond/Makefile| 32 + lib/librte_pmd_bond/rte_eth_bond.c | 2148 +++ lib/librte_pmd_bond/rte_eth_bond.h | 255 ++ lib/librte_pmd_pcap/rte_eth_pcap.c | 22 +- lib/librte_pmd_ring/rte_eth_ring.c | 32 +- lib/librte_pmd_ring/rte_eth_ring.h |3 +- lib/librte_pmd_xenvirt/rte_eth_xenvirt.c|2 +- mk/rte.app.mk |5 + 34 files changed, 8193 insertions(+), 71 deletions(-) create mode 100644 app/test/packet_burst_generator.c create mode 100644 app/test/packet_burst_generator.h create mode 100644 app/test/test_link_bonding.c create mode 100644 app/test/virtual_pmd.c create mode 100644 app/test/virtual_pmd.h create mode 100644 lib/librte_pmd_bond/Makefile create mode 100644 lib/librte_pmd_bond/rte_eth_bond.c create mode 100644 lib/librte_pmd_bond/rte_eth_bond.h
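As a usage illustration of the library described above, a hedged sketch of creating and populating a bonded device from application code. rte_eth_bond_mode_set() appears in the testpmd patch later in this series; the names and signatures of the create and slave-add calls are my assumptions about the rte_eth_bond.h API and should be checked against that header before use.

#include <stdint.h>
#include <rte_eth_bond.h>

/* sketch: create a bonded device in mode 1 (active backup) on socket 0
 * and attach two already-initialised ports as slaves; signatures other
 * than rte_eth_bond_mode_set() are assumptions, see above */
int
example_setup_bond(uint8_t slave0, uint8_t slave1)
{
	int bond_port;

	bond_port = rte_eth_bond_create("eth_bond0", 1 /* active backup */, 0);
	if (bond_port < 0)
		return -1;

	if (rte_eth_bond_slave_add(bond_port, slave0) != 0 ||
	    rte_eth_bond_slave_add(bond_port, slave1) != 0)
		return -1;

	/* the mode can also be changed at runtime, as testpmd does */
	return rte_eth_bond_mode_set(bond_port, 1);
}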
[dpdk-dev] [PATCH v5 3/6] EAL support for link bonding device initialization
Updating functionality in EAL to support adding link bonding devices via ?vdev option. Link bonding devices will be initialized after all physical devices have been probed and initialized. Signed-off-by: Declan Doherty --- lib/librte_eal/bsdapp/eal/eal.c | 10 - lib/librte_eal/common/eal_common_dev.c | 58 ++ lib/librte_eal/common/eal_common_pci.c |3 + lib/librte_eal/common/include/eal_private.h |7 --- lib/librte_eal/common/include/rte_dev.h | 13 +- lib/librte_eal/linuxapp/eal/eal.c | 11 +- 6 files changed, 73 insertions(+), 29 deletions(-) diff --git a/lib/librte_eal/bsdapp/eal/eal.c b/lib/librte_eal/bsdapp/eal/eal.c index a1f014f..c53f63e 100644 --- a/lib/librte_eal/bsdapp/eal/eal.c +++ b/lib/librte_eal/bsdapp/eal/eal.c @@ -874,7 +874,7 @@ rte_eal_init(int argc, char **argv) rte_eal_mcfg_complete(); - if (rte_eal_dev_init() < 0) + if (rte_eal_dev_init(PMD_INIT_PRE_PCI_PROBE) < 0) rte_panic("Cannot init pmd devices\n"); RTE_LCORE_FOREACH_SLAVE(i) { @@ -906,6 +906,14 @@ rte_eal_init(int argc, char **argv) rte_eal_mp_remote_launch(sync_func, NULL, SKIP_MASTER); rte_eal_mp_wait_lcore(); + /* Probe & Initialize PCI devices */ + if (rte_eal_pci_probe()) + rte_panic("Cannot probe PCI\n"); + + /* Initialize any outstanding devices */ + if (rte_eal_dev_init(PMD_INIT_POST_PCI_PROBE) < 0) + rte_panic("Cannot init pmd devices\n"); + return fctret; } diff --git a/lib/librte_eal/common/eal_common_dev.c b/lib/librte_eal/common/eal_common_dev.c index eae5656..8e80093 100644 --- a/lib/librte_eal/common/eal_common_dev.c +++ b/lib/librte_eal/common/eal_common_dev.c @@ -62,7 +62,7 @@ rte_eal_driver_unregister(struct rte_driver *driver) } int -rte_eal_dev_init(void) +rte_eal_dev_init(uint8_t init_pri) { struct rte_devargs *devargs; struct rte_driver *driver; @@ -80,30 +80,52 @@ rte_eal_dev_init(void) continue; TAILQ_FOREACH(driver, &dev_driver_list, next) { - if (driver->type != PMD_VDEV) - continue; + /* RTE_DEVTYPE_VIRTUAL can only be a virtual or bonded device, +* virtual devices are initialized pre PCI probing and bonded +* device are post pci probing */ + if ((driver->type == PMD_VDEV && init_pri == + PMD_INIT_PRE_PCI_PROBE) || + (driver->type == PMD_BDEV && init_pri == + PMD_INIT_POST_PCI_PROBE)) { - /* search a driver prefix in virtual device name */ - if (!strncmp(driver->name, devargs->virtual.drv_name, - strlen(driver->name))) { - driver->init(devargs->virtual.drv_name, - devargs->args); - break; + /* search a driver prefix in virtual device name */ + if (!strncmp(driver->name, devargs->virtual.drv_name, + strlen(driver->name))) { + printf("init (%u) %s\n", init_pri, devargs->virtual.drv_name); + driver->init(devargs->virtual.drv_name, + devargs->args); + break; + } } } - if (driver == NULL) { - rte_panic("no driver found for %s\n", - devargs->virtual.drv_name); + /* If initializing pre PCI probe, then we don't expect a bonded driver +* to be found */ + if (init_pri == PMD_INIT_PRE_PCI_PROBE && + strncmp(RTE_PMD_BOND, devargs->virtual.drv_name, + strlen(RTE_PMD_BOND)) != 0) { + if (driver == NULL) { + rte_panic("no driver found for virtual device %s\n", + devargs->virtual.drv_name); + } + } else if (init_pri == PMD_INIT_POST_PCI_PROBE && + strncmp(RTE_PMD_BOND, devargs->virtual.drv_name, + strlen(RTE_PMD_BOND)) == 0) { + if (driver == NULL) { + rte_panic("no driver found for bonded device %s\n", + devargs->virtual.drv_name); + } } } - /* Once the vdevs
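For context, this is roughly what the registration side looks like for a driver that wants the post-PCI-probe treatment introduced above: it registers with the new PMD_BDEV type, and its init callback keeps the (name, args) signature used by the loop in eal_common_dev.c. The structure fields and the registration macro below follow my reading of rte_dev.h and are an assumption-laden sketch, not an excerpt from the bonding PMD.

#include <rte_dev.h>

/* hypothetical init callback; the real bonding init lives in librte_pmd_bond */
static int
bond_example_init(const char *name, const char *params)
{
	(void)name;
	(void)params;
	/* parse params, create the bonded ethdev, add slaves, ... */
	return 0;
}

static struct rte_driver bond_example_drv = {
	.name = "eth_bond",   /* matched as a prefix of the virtual device name */
	.type = PMD_BDEV,     /* initialised only after rte_eal_pci_probe() */
	.init = bond_example_init,
};

/* registration macro from rte_dev.h (assumed to be available here) */
PMD_REGISTER_DRIVER(bond_example_drv);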
[dpdk-dev] [PATCH v5 5/6] testpmd link bonding additions
- Includes the ability to create new bonded devices. - Add /remove bonding slave devices. - Interogate bonded device stats/configuration - Change bonding modes and select balance transmit polices Signed-off-by: Declan Doherty --- app/test-pmd/cmdline.c| 579 + app/test-pmd/config.c |4 +- app/test-pmd/parameters.c |3 + app/test-pmd/testpmd.c| 40 +++- app/test-pmd/testpmd.h|2 + 5 files changed, 619 insertions(+), 9 deletions(-) diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c index e3e51fc..967c058 100644 --- a/app/test-pmd/cmdline.c +++ b/app/test-pmd/cmdline.c @@ -84,6 +84,9 @@ #include #include #include +#ifdef RTE_LIBRTE_PMD_BOND +#include +#endif #include "testpmd.h" @@ -404,6 +407,31 @@ static void cmd_help_long_parsed(void *parsed_result, " Show the bypass configuration for a bypass enabled NIC" " using the lowest port on the NIC.\n\n" #endif +#ifdef RTE_LIBRTE_PMD_BOND + "create bonded device (mode) (socket)\n" + " Create a new bonded device with specific bonding mode and socket.\n\n" + + "add bonding slave (slave_id) (port_id)\n" + " Add a slave device to a bonded device.\n\n" + + "remove bonding slave (slave_id) (port_id)\n" + " Remove a slave device from a bonded device.\n\n" + + "set bonding mode (value) (port_id)\n" + " Set the bonding mode on a bonded device.\n\n" + + "set bonding primary (slave_id) (port_id)\n" + " Set the primary slave for a bonded device.\n\n" + + "show bonding config (port_id)\n" + " Show the bonding config for port_id.\n\n" + + "set bonding mac_addr (port_id) (address)\n" + " Set the MAC address of a bonded device.\n\n" + + "set bonding xmit_balance_policy (port_id) (l2|l23|l34)\n" + " Set the transmit balance policy for bonded device running in balance mode.\n\n" +#endif , list_pkt_forwarding_modes() ); @@ -3031,6 +3059,547 @@ cmdline_parse_inst_t cmd_show_bypass_config = { }; #endif +#ifdef RTE_LIBRTE_PMD_BOND +/* *** SET BONDING MODE *** */ +struct cmd_set_bonding_mode_result { + cmdline_fixed_string_t set; + cmdline_fixed_string_t bonding; + cmdline_fixed_string_t mode; + uint8_t value; + uint8_t port_id; +}; + +static void cmd_set_bonding_mode_parsed(void *parsed_result, + __attribute__((unused)) struct cmdline *cl, + __attribute__((unused)) void *data) +{ + struct cmd_set_bonding_mode_result *res = parsed_result; + portid_t port_id = res->port_id; + + /* Set the bonding mode for the relevant port. 
*/ + if (0 != rte_eth_bond_mode_set(port_id, res->value)) + printf("\t Failed to set bonding mode for port = %d.\n", port_id); +} + +cmdline_parse_token_string_t cmd_setbonding_mode_set = +TOKEN_STRING_INITIALIZER(struct cmd_set_bonding_mode_result, + set, "set"); +cmdline_parse_token_string_t cmd_setbonding_mode_bonding = +TOKEN_STRING_INITIALIZER(struct cmd_set_bonding_mode_result, + bonding, "bonding"); +cmdline_parse_token_string_t cmd_setbonding_mode_mode = +TOKEN_STRING_INITIALIZER(struct cmd_set_bonding_mode_result, + mode, "mode"); +cmdline_parse_token_num_t cmd_setbonding_mode_value = +TOKEN_NUM_INITIALIZER(struct cmd_set_bonding_mode_result, + value, UINT8); +cmdline_parse_token_num_t cmd_setbonding_mode_port = +TOKEN_NUM_INITIALIZER(struct cmd_set_bonding_mode_result, + port_id, UINT8); + +cmdline_parse_inst_t cmd_set_bonding_mode = { + .f = cmd_set_bonding_mode_parsed, + .help_str = "set bonding mode (mode_value) (port_id): Set the bonding mode for port_id", + .data = NULL, + .tokens = { + (void *) &cmd_setbonding_mode_set, + (void *) &cmd_setbonding_mode_bonding, + (void *) &cmd_setbonding_mode_mode, + (void *) &cmd_setbonding_mode_value, + (void *) &cmd_setbonding_mode_port, + NULL + } +}; + +/* *** SET BALANCE XMIT POLICY *** */ +struct cmd_set_bonding_balance_xmit_policy_result { + cmdline_fixed_string_t set; + cmdline_fixed_string_t bonding; + cmdline_fixed_string_t balance_xmit_policy; + uint8_t port_id; + cmdline_fixed_string_t policy; +}; + +static void cmd_set_bonding_balance_xmit_policy_parsed(void *parsed_result, +
[dpdk-dev] [PATCH v5 0/6] Link Bonding Library
On Wed, Jun 18, 2014 at 05:14:17PM +0100, Declan Doherty wrote: > This patch contains the initial release of the Link Bonding PMD Library > > Supporting bonding modes: > 0 - Round Robin > 1 - Active Backup > 2 - Balance (Supporting 3 transmission polices) > layer 2, layer 2+3, layer 3+4 > 3 - Broadcast > > Version 5 of patch set: > Contains changes to EAL code to allow initialisation of Bonded devices from > application startup options. rte_eal_init now calls rte_eal_pci_probe > between calling rte_eal_dev_init with PRE and POST PCI probe flags. This gets > around polluting the eal pci code with references to link bonding devices. > Also rte_eal_pci_probe can now be called multiple times and will not try to > re-initialize the driver if one already exists, this means that existing > applications which currently call rte_eal_pci_probe will not be affected > by this change > > > Patch Set Description: > 0001 - librte_pmd_bond + makefile changes > 0002 - librte_ether changes to support unique naming of pmds > 0003 - librte_eal changes to support bonding device intialization > 0005 - link bonding unti test suite > 0005 - testpmd link bonding support changes > 0006 - doxygen additions > > > Declan Doherty (6): > Link Bonding Library (lib/librte_pmd_bond) > Support for unique interface naming of pmds > EAL support for link bonding device initialization > Link bonding Unit Tests > testpmd link bonding additions > Link Bonding Library doxygen additions > > app/test-pmd/cmdline.c | 579 > app/test-pmd/config.c |4 +- > app/test-pmd/parameters.c |3 + > app/test-pmd/testpmd.c | 40 +- > app/test-pmd/testpmd.h |2 + > app/test/Makefile |4 +- > app/test/commands.c |7 + > app/test/packet_burst_generator.c | 287 ++ > app/test/packet_burst_generator.h | 78 + > app/test/test.h |1 + > app/test/test_link_bonding.c| 3958 > +++ > app/test/virtual_pmd.c | 574 > app/test/virtual_pmd.h | 74 + > config/common_bsdapp|5 + > config/common_linuxapp |5 + > doc/doxy-api-index.md |1 + > doc/doxy-api.conf |1 + > lib/Makefile|1 + > lib/librte_eal/bsdapp/eal/eal.c | 10 +- > lib/librte_eal/common/eal_common_dev.c | 58 +- > lib/librte_eal/common/eal_common_pci.c |3 + > lib/librte_eal/common/include/eal_private.h |7 - > lib/librte_eal/common/include/rte_dev.h | 13 +- > lib/librte_eal/linuxapp/eal/eal.c | 11 +- > lib/librte_ether/rte_ethdev.c | 32 +- > lib/librte_ether/rte_ethdev.h |7 +- > lib/librte_pmd_bond/Makefile| 32 + > lib/librte_pmd_bond/rte_eth_bond.c | 2148 +++ > lib/librte_pmd_bond/rte_eth_bond.h | 255 ++ > lib/librte_pmd_pcap/rte_eth_pcap.c | 22 +- > lib/librte_pmd_ring/rte_eth_ring.c | 32 +- > lib/librte_pmd_ring/rte_eth_ring.h |3 +- > lib/librte_pmd_xenvirt/rte_eth_xenvirt.c|2 +- > mk/rte.app.mk |5 + > 34 files changed, 8193 insertions(+), 71 deletions(-) > create mode 100644 app/test/packet_burst_generator.c > create mode 100644 app/test/packet_burst_generator.h > create mode 100644 app/test/test_link_bonding.c > create mode 100644 app/test/virtual_pmd.c > create mode 100644 app/test/virtual_pmd.h > create mode 100644 lib/librte_pmd_bond/Makefile > create mode 100644 lib/librte_pmd_bond/rte_eth_bond.c > create mode 100644 lib/librte_pmd_bond/rte_eth_bond.h > > For the series Acked-by: Neil Horman Thanks for all the hard work! Neil
[dpdk-dev] vfio detection
> -Original Message- > From: Burakov, Anatoly > Sent: Wednesday, June 18, 2014 4:01 AM > To: Richardson, Bruce; dev at dpdk.org > Subject: RE: vfio detection > > Hi Bruce, > > > > > I have a number of NIC ports which were working correctly yesterday > > > > and are bound correctly to the igb_uio driver - and I want to keep > > > > using them through the igb_uio driver for now, not vfio. However, > > > > whenever I run a dpdk application today, I find that the vfio kernel > > > > module is getting loaded each time - even after I manually remove > > > > it, and verify that it has been removed by checking lsmod. Is this > > > > expected? If so, why are we loading the vfio driver when I just want to > > continue using igb_uio which works fine? > > > > > > Can you elaborate a bit on what do you mean by "loading vfio driver"? > > > Do you mean the vfio-pci kernel gets loaded by DPDK? I certainly > > > didn't put in any code that would automatically load that driver, and > > certainly not binding devices to it. > > > > The kernel module called just "vfio" is constantly getting reloaded, and > > there > > is always a "/dev/vfio" directory, which triggers the vfio code handling > > every > > time I run dpdk. > > I can't reproduce this. > > Please note that VFIO actually consists of three drivers (on an x86 system, > that > is) - vfio (the core VFIO infrastructure such as containers), vfio_iommu_type1 > (support for x86-style IOMMU) and vfio-pci (the generic PCI driver). I have > unloaded all three and ran dpdk_nic_bind and testpmd - it worked fine and no > VFIO kernel drivers were loaded as a result. > Ok, maybe a one-off. The proposed patches to dpdk to improve the detection of vfio fix the real issue for me anyway, so I'm no longer concerned.
[dpdk-dev] [PATCH] dpdk_nic_bind: unbind ports that were erroneously bound
When binding devices to a generic driver (i.e. one that doesn't have a PCI ID table), some devices that are not bound to any other driver could be bound even if no one has asked them to. Hence, we check the list of drivers again, and see if some of the previously-unbound devices were erroneously bound. If such devices are found, they are unbound again. Signed-off-by: Anatoly Burakov --- tools/dpdk_nic_bind.py | 22 ++ 1 file changed, 22 insertions(+) diff --git a/tools/dpdk_nic_bind.py b/tools/dpdk_nic_bind.py index 42e845f..334bf47 100755 --- a/tools/dpdk_nic_bind.py +++ b/tools/dpdk_nic_bind.py @@ -383,10 +383,32 @@ def unbind_all(dev_list, force=False): def bind_all(dev_list, driver, force=False): """Unbind method, takes a list of device locations""" +global devices + dev_list = map(dev_id_from_dev_name, dev_list) + for d in dev_list: bind_one(d, driver, force) +# when binding devices to a generic driver (i.e. one that doesn't have a +# PCI ID table), some devices that are not bound to any other driver could +# be bound even if no one has asked them to. hence, we check the list of +# drivers again, and see if some of the previously-unbound devices were +# erroneously bound. +for d in devices.keys(): +# skip devices that were already bound or that we know should be bound +if "Driver_str" in devices[d] or d in dev_list: +continue + +# update information about this device +devices[d] = dict(devices[d].items() + + get_pci_device_details(d).items()) + +# check if updated information indicates that the device was bound +if "Driver_str" in devices[d]: +unbind_one(d, force) + + def display_devices(title, dev_list, extra_params = None): '''Displays to the user the details of a list of devices given in "dev_list" The "extra_params" parameter, if given, should contain a string with -- 1.8.1.4
[dpdk-dev] [PATCH v3 0/2] Fix issues with VFIO
> -Original Message- > From: Burakov, Anatoly > Sent: Wednesday, June 18, 2014 8:07 AM > To: dev at dpdk.org > Cc: Richardson, Bruce; nhorman at tuxdriver.com > Subject: [PATCH v3 0/2] Fix issues with VFIO > > This patchset fixes an issue with VFIO where DPDK initialization could > fail even if the user didn't want to use VFIO in the first place. Also, > more verbose and descriptive error messages were added to VFIO code, for > example distinguishing between a failed ioctl() call and an unsupported > VFIO API version. > > Anatoly Burakov (2): > vfio: open VFIO container at startup rather than during init > vfio: more verbose error messages > > lib/librte_eal/linuxapp/eal/eal_pci_vfio.c | 63 > -- > 1 file changed, 34 insertions(+), 29 deletions(-) > > -- Both patches work fine for me. Thanks Anatoly and Neil! Acked-by: Bruce Richardson
[dpdk-dev] [PATCH v3] cpu_layout.py: adjust output format to align
> -Original Message- > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Shannon Zhao > Sent: Wednesday, June 18, 2014 5:18 AM > To: dev at dpdk.org > Subject: [dpdk-dev] [PATCH v3] cpu_layout.py: adjust output format to align > > Bug: when "core id" is greater than 9, the cpu_layout.py output doesn't align. > > Socket 0Socket 1 > - - > Core 9 [4, 16] [10, 22] > > Core 10 [5, 17] [11, 23] > > Solution: adjust output format to align based on the maximum length of the > "core id" and "processor" > > Socket 0Socket 1 > > Core 9 [4, 16] [10, 22] > > Core 10 [5, 17] [11, 23] > > Signed-off-by: Shannon Zhao Acked-by: Pablo de Lara
[dpdk-dev] [PATCH v3 0/2] Fix issues with VFIO
On Wed, Jun 18, 2014 at 04:07:15PM +0100, Anatoly Burakov wrote: > This patchset fixes an issue with VFIO where DPDK initialization could > fail even if the user didn't want to use VFIO in the first place. Also, > more verbose and descriptive error messages were added to VFIO code, for > example distinguishing between a failed ioctl() call and an unsupported > VFIO API version. > > Anatoly Burakov (2): > vfio: open VFIO container at startup rather than during init > vfio: more verbose error messages > > lib/librte_eal/linuxapp/eal/eal_pci_vfio.c | 63 > -- > 1 file changed, 34 insertions(+), 29 deletions(-) > > -- > 1.8.1.4 > > Series Acked-by: Neil Horman Thanks Anatoly! Neil
[dpdk-dev] [PATCH 0/4] minor fixes from BSD and clang testing
When testing compilation on BSD systems and with the clang compiler, a number of minor issues were encountered. This patchset fixes some of these. NOTE: I'm planning to upstream the compilation targets for using clang on BSD and Linux after the 1.7 release; these are just issues found when doing testing for it. Bruce Richardson (4): testpmd: fix commandline variable types test app: remove unused variable definition scripts: fix filtering of comments on bsd i40e: remove endian.h include app/test-pmd/cmdline.c | 18 +- app/test/test_table_acl.c | 2 -- lib/librte_pmd_i40e/i40e_rxtx.c | 1 - scripts/gen-config-h.sh | 2 +- 4 files changed, 10 insertions(+), 13 deletions(-) -- 1.9.3
[dpdk-dev] [PATCH 1/4] testpmd: fix commandline variable types
A number of commandline entries in the testpmd commandline were actually defined as being string type values when in fact they were being initialized as integer types. Correct this by specifying them as integer type values in the type definition. Signed-off-by: Bruce Richardson --- app/test-pmd/cmdline.c | 18 +- 1 file changed, 9 insertions(+), 9 deletions(-) diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c index e3e51fc..3298360 100644 --- a/app/test-pmd/cmdline.c +++ b/app/test-pmd/cmdline.c @@ -1367,7 +1367,7 @@ cmdline_parse_token_string_t cmd_config_rss_hash_key_port = cmdline_parse_token_string_t cmd_config_rss_hash_key_config = TOKEN_STRING_INITIALIZER(struct cmd_config_rss_hash_key, config, "config"); -cmdline_parse_token_string_t cmd_config_rss_hash_key_port_id = +cmdline_parse_token_num_t cmd_config_rss_hash_key_port_id = TOKEN_NUM_INITIALIZER(struct cmd_config_rss_hash_key, port_id, UINT8); cmdline_parse_token_string_t cmd_config_rss_hash_key_rss_hash_key = TOKEN_STRING_INITIALIZER(struct cmd_config_rss_hash_key, @@ -5367,7 +5367,7 @@ cmdline_parse_token_string_t cmd_mirror_mask_set = cmdline_parse_token_string_t cmd_mirror_mask_port = TOKEN_STRING_INITIALIZER(struct cmd_set_mirror_mask_result, port, "port"); -cmdline_parse_token_string_t cmd_mirror_mask_portid = +cmdline_parse_token_num_t cmd_mirror_mask_portid = TOKEN_NUM_INITIALIZER(struct cmd_set_mirror_mask_result, port_id, UINT8); cmdline_parse_token_string_t cmd_mirror_mask_mirror = @@ -5477,7 +5477,7 @@ cmdline_parse_token_string_t cmd_mirror_link_set = cmdline_parse_token_string_t cmd_mirror_link_port = TOKEN_STRING_INITIALIZER(struct cmd_set_mirror_link_result, port, "port"); -cmdline_parse_token_string_t cmd_mirror_link_portid = +cmdline_parse_token_num_t cmd_mirror_link_portid = TOKEN_NUM_INITIALIZER(struct cmd_set_mirror_link_result, port_id, UINT8); cmdline_parse_token_string_t cmd_mirror_link_mirror = @@ -5563,7 +5563,7 @@ cmdline_parse_token_string_t cmd_rm_mirror_rule_reset = cmdline_parse_token_string_t cmd_rm_mirror_rule_port = TOKEN_STRING_INITIALIZER(struct cmd_rm_mirror_rule_result, port, "port"); -cmdline_parse_token_string_t cmd_rm_mirror_rule_portid = +cmdline_parse_token_num_t cmd_rm_mirror_rule_portid = TOKEN_NUM_INITIALIZER(struct cmd_rm_mirror_rule_result, port_id, UINT8); cmdline_parse_token_string_t cmd_rm_mirror_rule_mirror = @@ -5872,7 +5872,7 @@ cmd_set_syn_filter_parsed(void *parsed_result, printf("syn filter setting error: (%s)\n", strerror(-ret)); } -cmdline_parse_token_string_t cmd_syn_filter_portid = +cmdline_parse_token_num_t cmd_syn_filter_portid = TOKEN_NUM_INITIALIZER(struct cmd_set_syn_filter_result, port_id, UINT8); cmdline_parse_token_string_t cmd_syn_filter_priority = @@ -5990,7 +5990,7 @@ cmdline_parse_token_num_t cmd_2tuple_filter_port_id = cmdline_parse_token_string_t cmd_2tuple_filter_protocol = TOKEN_STRING_INITIALIZER(struct cmd_2tuple_filter_result, protocol, "protocol"); -cmdline_parse_token_string_t cmd_2tuple_filter_protocol_value = +cmdline_parse_token_num_t cmd_2tuple_filter_protocol_value = TOKEN_NUM_INITIALIZER(struct cmd_2tuple_filter_result, protocol_value, UINT8); cmdline_parse_token_num_t cmd_2tuple_filter_protocol_mask = @@ -6008,7 +6008,7 @@ cmdline_parse_token_num_t cmd_2tuple_filter_dst_port_mask = cmdline_parse_token_string_t cmd_2tuple_filter_flags = TOKEN_STRING_INITIALIZER(struct cmd_2tuple_filter_result, flags, "flags"); -cmdline_parse_token_string_t cmd_2tuple_filter_flags_value = +cmdline_parse_token_num_t cmd_2tuple_filter_flags_value = 
TOKEN_NUM_INITIALIZER(struct cmd_2tuple_filter_result, flags_value, UINT8); cmdline_parse_token_string_t cmd_2tuple_filter_priority = @@ -6202,7 +6202,7 @@ cmdline_parse_token_num_t cmd_5tuple_filter_src_port_value = cmdline_parse_token_string_t cmd_5tuple_filter_protocol = TOKEN_STRING_INITIALIZER(struct cmd_5tuple_filter_result, protocol, "protocol"); -cmdline_parse_token_string_t cmd_5tuple_filter_protocol_value = +cmdline_parse_token_num_t cmd_5tuple_filter_protocol_value = TOKEN_NUM_INITIALIZER(struct cmd_5tuple_filter_result, protocol_value, UINT8); cmdline_parse_token_string_t cmd_5tuple_filter_mask = @@ -6448,7 +6448,7 @@ cmdline_parse_token_num_t cmd_flex_filter_port_id = cmdline_parse_token_string_t cmd_flex_filter_len = TOKEN_STRING_INITIALIZER(struct cmd_flex_filter_result,
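The rule the patch enforces is simply that the token variable type has to match its initializer and the field it parses into: numeric fields get cmdline_parse_token_num_t with TOKEN_NUM_INITIALIZER, string fields get cmdline_parse_token_string_t with TOKEN_STRING_INITIALIZER. A small illustrative pairing, with the struct and token names invented for the example:

#include <stdint.h>
#include <cmdline_parse_num.h>
#include <cmdline_parse_string.h>

/* example result struct, not taken from testpmd */
struct cmd_example_result {
	cmdline_fixed_string_t action; /* the literal keyword, e.g. "set" */
	uint8_t port_id;               /* numeric argument */
};

/* a string token for the fixed keyword... */
cmdline_parse_token_string_t cmd_example_action =
	TOKEN_STRING_INITIALIZER(struct cmd_example_result, action, "set");

/* ...and a num token, not a string token, for the numeric field */
cmdline_parse_token_num_t cmd_example_port_id =
	TOKEN_NUM_INITIALIZER(struct cmd_example_result, port_id, UINT8);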
[dpdk-dev] [PATCH 2/4] test app: remove unused variable definition
Remove an unused variable definition in test_table_acl.c Signed-off-by: Bruce Richardson --- app/test/test_table_acl.c | 2 -- 1 file changed, 2 deletions(-) diff --git a/app/test/test_table_acl.c b/app/test/test_table_acl.c index afc234a..ad0e6f1 100644 --- a/app/test/test_table_acl.c +++ b/app/test/test_table_acl.c @@ -42,8 +42,6 @@ (((c) & 0xff) << 8) | \ ((d) & 0xff)) -static const char cb_port_delim[] = ":"; - /* * Rule and trace formats definitions. **/ -- 1.9.3
[dpdk-dev] [PATCH 4/4] i40e: remove endian.h include
endian.h is not needed for the compilation of i40e_rxtx.c and its inclusion prevents building on FreeBSD systems. Signed-off-by: Bruce Richardson --- lib/librte_pmd_i40e/i40e_rxtx.c | 1 - 1 file changed, 1 deletion(-) diff --git a/lib/librte_pmd_i40e/i40e_rxtx.c b/lib/librte_pmd_i40e/i40e_rxtx.c index d802894..9fccbee 100644 --- a/lib/librte_pmd_i40e/i40e_rxtx.c +++ b/lib/librte_pmd_i40e/i40e_rxtx.c @@ -31,7 +31,6 @@ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. */ -#include #include #include #include -- 1.9.3
[dpdk-dev] [PATCH 3/4] scripts: fix filtering of comments on bsd
On BSD 10, the cpp binary behaves a little differently and often leaves lines starting with a space before the initial '#' character. This change ensures those lines are filtered out properly. Signed-off-by: Bruce Richardson --- scripts/gen-config-h.sh | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/scripts/gen-config-h.sh b/scripts/gen-config-h.sh index 86b41ab..efd7667 100755 --- a/scripts/gen-config-h.sh +++ b/scripts/gen-config-h.sh @@ -34,7 +34,7 @@ echo "#ifndef __RTE_CONFIG_H" echo "#define __RTE_CONFIG_H" grep CONFIG_ $1 \ -| grep -v '^#' \ +| grep -v '^[ \t]*#'\ | sed 's,CONFIG_\(.*\)=y.*$,#define \1 1,' \ | sed 's,CONFIG_\(.*\)=n.*$,#undef \1,' \ | sed 's,CONFIG_\(.*\)=\(.*\)$,#define \1 \2,' \ -- 1.9.3
[dpdk-dev] [PATCH 0/3] *** Upgrade NIC share codes ***
Hi Jijiang, 2014-05-27 01:51, Liu, Jijiang: > This is a batch update of the code in DPDK to align it with the latest > versions of the common device driver code for Intel network devices, such > as is used in the Linux and BSD drivers. Intel DPDK team get periodic > updates from the networking division at Intel, and we apply those without > modifications to the code in the ixgbe and e1000 subfolders of the DPDK > codebase. Each code-drop provides us with the latest device support and bug > fixes. > > The code-drop was tested by the ND team ,and Intel DPDK team receive the > code-drop without individual commit history, as mentioned earlier, this is > a batch update. Furthermore, before sending patches to dpdk.org, in order > to guarantee no new issues are introduced by upgrading, Intel DPDK team has > done development and test of integration, and don't change any source codes > in these drivers in the meantime. At present, the common device code should > be treated as read-only, any bugs found here will be collected and reported > to Intel. It is now applied in the master branch for version 1.7.0. I've split this code drop into 31 atomic and understandable commits. Please look at them carefully. This is what we may expect to be sent for the next upgrade of the base drivers. Thanks -- Thomas
[dpdk-dev] [PATCH]Upgrade NIC share codes: fix a compilation error when RTE_NIC_BYPASS=y
2014-05-29 16:47, Jijiang Liu: > There is a compilation error using latest NIC share codes when > RTE_NIC_BYPASS=y, the root cause is that the setup_link API have already > changed in the share codes, so change > ixgbe_setup_mac_link_multispeed_fixed_fiber() for eliminating the > compilation error. > > Signed-off-by: jijiangl > Tested-by: Waterman Cao It is merged in this commit: http://dpdk.org/browse/dpdk/commit/?id=8ef32003772a Thanks -- Thomas
[dpdk-dev] [dpdk-announce] release candidate 1.7.0-rc1
The first release candidate of version 1.7.0 can be downloaded here: http://dpdk.org/browse/dpdk/tag/?id=v1.7.0-rc1 Please test it. The release 1.7.0 should be ready by end of June if things go well. Some patches are pending for integration in next release candidate: - link bonding - IP fragmentation fixes - KNI fixes - BSD fixes - socket id detection fallback - new core set option - malloc optimization Others could be integrated in a "rebased-next" branch waiting 1.7.1 window: - igb_uio enhancements - virtio enhancements - vmxnet3 enhancements - tailq fully local Thank you to everyone, -- Thomas