[dpdk-dev] Inputs needed - testing l2fwd

2015-03-25 Thread Shankari Vaidyalingam
I'm trying to run the sample DPDK applications by injecting packets from an
external traffic generator.
However, the frames never reach the RX queue of the corresponding port, so
the PMD does not detect them and the application never processes them.

My configuration is:

I'm using Oracle VirtualBox to provide 2 virtual network adapters for the
sample application. I have configured both network adapters in
"Bridged Networking" mode.
When I start the VM I can see the interfaces eth0 and eth1.
I bound the 2 interfaces to the igb_uio driver and then configured the
hugepage mapping.
When sending frames from an external source, I set the destination MAC to
the MAC address of the ports bound to DPDK.
Still, neither the "Frames received" nor the "Frames dropped" count
increments.

I searched online and tried various networking modes
(Bridged, Host-only, ...), but none of them work.

I'm sure that I'm missing something.
Can you please help me with this?

Regards
Shankari.V


[dpdk-dev] [PATCH] i40e: remove ALLOW_LB flag on SRIOV vsi

2015-03-25 Thread Xu, HuilongX
Tested-by:huilong xu 

 - Tested Commit: 0095bb6dd77a6b4570af27320187e63bf37500c6
 - OS: FC20 3.11.10-301.fc20.x86_64
 - GCC: gcc version 4.8.3 20140911 (Red Hat 4.8.3-7) (GCC)
 - CPU: Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80GHz
 - NIC: Ethernet controller [0200]: Intel Corporation Ethernet Controller XL710 for 40GbE QSFP+ [8086:1584]

 - Default x86_64-native-linuxapp-gcc configuration
 - Total 6 cases, 6 passed, 0 failed

   1VF/1PF test environment setup:
1. Build and install the DPDK driver, and bind igb_uio to the PF.
2. Create 1 VF on the host:
  echo 1 > ./devices/pci:80/:80:02.0/:83:00.0/max_vfs
3. Detach the VF NIC from the host:
  virsh nodedev-dettach pci__83_02_0
4. Run testpmd on the host:
   ./testpmd -c f -n 4 -- -i --txqflags
5. Execute these commands in testpmd:
   a) vlan set strip off all
   b) rx_vlan add 1 0
6. Start the VM:
   taskset -c 6-10 qemu-system-x86_64 \
   -enable-kvm -m 8192  -smp 2 -cpu host -name dpdk15-vm2 \
   -drive file=/home/image/vdisk02-sriov-fc20.img \
   -net tap,script=/etc/qemu-ifup \
   -device pci-assign,bus=pci.0,addr=0xb,host=83:02.0 \
   -mem-path /dev/hugepages -mem-prealloc \
   -vnc :12 -daemonize
7. In the VM, build and install the DPDK driver, and bind igb_uio to the VF.
8. Run testpmd in the VM:
   ./testpmd -c f -n 4 -- -i --txqflags
9. Execute these commands in testpmd in the VM:
   set promisc all off
   vlan set strip off all
   rx_vlan add 1 0
   set fwd io
   start
 - Case 1: send packets whose destination MAC is the VF MAC, without a VLAN tag.
   The VF receives and forwards the packets, and Ixia receives them. Passed.
 - Case 2: send packets whose destination MAC is not the VF MAC, without a VLAN tag.
   The VF receives and forwards the packets, and Ixia receives them. Passed.
 - Case 3: send packets whose destination MAC is the VF MAC, with VLAN ID 1.
   The VF receives and forwards the packets, and Ixia receives them with
   VLAN ID 1. Passed.

  2VF/1PF test environment setup:
1. Build and install the DPDK driver, and bind igb_uio to the PF.
2. Create 2 VFs on the host:
  echo 2 > ./devices/pci:80/:80:02.0/:83:00.0/max_vfs
3. Detach the VF NICs from the host:
  virsh nodedev-dettach pci__83_02_0
  virsh nodedev-dettach pci__83_02_1
4. Run testpmd on the host:
   ./testpmd -c f -n 4 -- -i --txqflags
5. Execute these commands in testpmd:
   a) vlan set strip off all
   b) rx_vlan add 1 0
6. Start the VM:
   taskset -c 6-10 qemu-system-x86_64 \
   -enable-kvm -m 8192  -smp 2 -cpu host -name dpdk15-vm2 \
   -drive file=/home/image/vdisk02-sriov-fc20.img \
   -net tap,script=/etc/qemu-ifup \
   -device pci-assign,bus=pci.0,addr=0xb,host=83:02.0 \
   -device pci-assign,bus=pci.0,addr=0xb,host=83:02.1 \
   -mem-path /dev/hugepages -mem-prealloc \
   -vnc :12 -daemonize
7. In the VM, build and install the DPDK driver, and bind igb_uio to the VFs.
8. Run testpmd in the VM:
   ./testpmd -c f -n 4 -- -i --txqflags
9. Execute these commands in testpmd in the VM:
   set promisc all off
   vlan set strip off all
   rx_vlan add 1 0
   set fwd io
   start
 - Case 1: send packets whose destination MAC is the VF MAC, without a VLAN tag.
   The VF receives and forwards the packets, and Ixia receives them. Passed.
 - Case 2: send packets whose destination MAC is not the VF MAC, without a VLAN tag.
   The VF receives and forwards the packets, and Ixia receives them. Passed.
 - Case 3: send packets whose destination MAC is the VF MAC, with VLAN ID 1.
   The VF receives and forwards the packets, and Ixia receives them with
   VLAN ID 1. Passed.

-Original Message-
From: Wu, Jingjing 
Sent: Friday, March 20, 2015 3:32 PM
To: dev at dpdk.org
Cc: Wu, Jingjing; Xu, HuilongX; Zhang, Helin
Subject: [PATCH] i40e: remove ALLOW_LB flag on SRIOV vsi

Disable VEB switching by removing the ALLOW_LB flag on the SR-IOV VSI.

If the source MAC address of a packet sent from a VF is not listed in the
VEB's MAC table, the VEB switches the packet back to the VF.
It is a hardware issue, and enabling the ALLOW_LB flag will block VF functions.

Signed-off-by: Jingjing Wu 
---
 lib/librte_pmd_i40e/i40e_ethdev.c | 14 +-
 1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/lib/librte_pmd_i40e/i40e_ethdev.c b/lib/librte_pmd_i40e/i40e_ethdev.c
index cf6685e..28ea5dc 100644
--- a/lib/librte_pmd_i40e/i40e_ethdev.c
+++ b/lib/librte_pmd_i40e/i40e_ethdev.c
@@ -3059,11 +3059,15 @@ i40e_vsi_setup(struct i40e_pf *pf,
ctxt.connection_type = 0x1;
ctxt.flags = I40E_AQ_VSI_TYPE_VF;

-   /* Configure switch ID */
-   ctxt.info.valid_sections |=
-   rte_cpu_to_le_16(I40E_AQ_VSI_PROP_SWITCH_VALID);
-   ctxt.info.switch_id =
-   rte_cpu_to_le

[dpdk-dev] [PATCH v2 2/6] eal: Close file descriptor of uio configuration

2015-03-25 Thread Tetsuya Mukawa
On 2015/03/25 3:33, Stephen Hemminger wrote:
> On Tue, 24 Mar 2015 13:18:33 +0900
> Tetsuya Mukawa  wrote:
>
>> When pci_uio_unmap_resource() is called, a file descriptor that is used
>> for uio configuration should be closed.
>>
>> Signed-off-by: Tetsuya Mukawa 
>> ---
>>  lib/librte_eal/linuxapp/eal/eal_pci_uio.c | 6 +-
>>  1 file changed, 5 insertions(+), 1 deletion(-)
>>
>> diff --git a/lib/librte_eal/linuxapp/eal/eal_pci_uio.c b/lib/librte_eal/linuxapp/eal/eal_pci_uio.c
>> index 9cdf24f..f0277be 100644
>> --- a/lib/librte_eal/linuxapp/eal/eal_pci_uio.c
>> +++ b/lib/librte_eal/linuxapp/eal/eal_pci_uio.c
>> @@ -459,8 +459,12 @@ pci_uio_unmap_resource(struct rte_pci_device *dev)
>>  
>>  /* close fd if in primary process */
>>  close(dev->intr_handle.fd);
>> -
>>  dev->intr_handle.fd = -1;
>> +
>> +/* close cfg_fd if in primary process */
>> +close(dev->intr_handle.uio_cfg_fd);
>> +dev->intr_handle.uio_cfg_fd = -1;
>> +
>>  dev->intr_handle.type = RTE_INTR_HANDLE_UNKNOWN;
>>  }
>>  #endif /* RTE_LIBRTE_EAL_HOTPLUG */
>
> For the Qlogic/Broadcom driver it needed the config fd handle, and I added
> generic config space access functions.

Hi Stephen,

Is this the patch you mentioned?
http://dpdk.org/dev/patchwork/patch/3024/


Hi David, Bernard, Stephen

I think the following work will be needed:
1. Add close(dev->config_fd) to Stephen's patch.
2. Write a patch for uio to merge "dev->intr_handle->uio_cfg_fd" and
"dev->config_fd".
3. Write a patch for vfio to merge "dev->intr_handle->vfio_cfg_fd" and
"dev->config_fd".

If these patches already exist, it may be good to merge them first.
Do you have a suggestion on how to merge the patches related to the PCI
config fd?

Thanks,
Tetsuya
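The uio patch above follows a common cleanup pattern: close a descriptor and immediately reset it to -1, so that a later teardown cannot close a number the kernel has since handed out again. A minimal C sketch of that pattern follows, under stated assumptions: `struct uio_fds`, `close_and_invalidate()` and `uio_fds_release()` are hypothetical names, not DPDK APIs, and the struct only mimics the `fd`/`uio_cfg_fd` pair kept in the interrupt handle.

```c
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical stand-in for the two descriptors tracked in DPDK's
 * interrupt handle; this is NOT the real struct rte_intr_handle. */
struct uio_fds {
	int fd;     /* /dev/uioX, used for interrupts */
	int cfg_fd; /* PCI config-space file of the uio device (assumed) */
};

/* Close *fdp if it is open, then mark it invalid with -1. */
static int close_and_invalidate(int *fdp)
{
	int ret = 0;

	assert(fdp != NULL);
	if (*fdp >= 0) {
		ret = close(*fdp);
		*fdp = -1;
	}
	return ret;
}

/* Release both descriptors in the same order as the patch. */
static void uio_fds_release(struct uio_fds *h)
{
	if (close_and_invalidate(&h->fd) != 0)
		perror("close fd");
	if (close_and_invalidate(&h->cfg_fd) != 0)
		perror("close cfg_fd");
}
```

Because each descriptor is reset to -1, calling the release helper a second time is a no-op rather than a double close.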



[dpdk-dev] [PATCH] scripts: enable extended tag of PCIe

2015-03-25 Thread Liu, Yong
Hi Helin,
This patch looks fine to me. It just needs some description of the
extended tag.
If this script works only on FVL devices, maybe it should be renamed to
something like "set_fvl_extended_tag".

> -Original Message-
> From: Zhang, Helin
> Sent: Tuesday, March 24, 2015 9:08 AM
> To: Thomas Monjalon; Liu, Yong
> Cc: dev at dpdk.org; Butler, Siobhan A
> Subject: RE: [dpdk-dev] [PATCH] scripts: enable extended tag of PCIe
> 
> Hi Thomas
> 
> Zhida is our intern, who has already gone back to university. I think Yong
> might have reviewed it.
> It is a good supplement for setting the extended tag on Linux, though not
> necessary. I am OK with it being merged or not. Thanks!
> 
> Marvin, could you help to ack it, as I know you have reviewed it?
> 
> Regards,
> Helin
> 
> > -Original Message-
> > From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
> > Sent: Monday, March 23, 2015 7:53 PM
> > To: Zang, Zhida
> > Cc: dev at dpdk.org; Butler, Siobhan A; Zhang, Helin
> > Subject: Re: [dpdk-dev] [PATCH] scripts: enable extended tag of PCIe
> >
> > Hi,
> >
> > This patch needs review and documentation.
> > It's going to be dropped if nobody cares.
> >
> > There were some previous discussions about it:
> > http://dpdk.org/ml/archives/dev/2015-February/012708.html
> >
> >
> > 2015-01-30 12:57, zhida zang:
> > > As the PCIe 'extended tag' needs to be enabled for high i40e
> > > performance, the Linux 'setpci' command can be used to check and set
> > > the corresponding 'extended tag' bit in PCIe configuration space.
> > > This script checks and sets that bit in PCIe configuration space to
> > > enable 'extended tag'.
> > >
> > > Signed-off-by: Zhida Zang 
> > > ---
> > >  tools/set_pci.py | 124 +++
> > >  1 file changed, 124 insertions(+)
> > >  create mode 100755 tools/set_pci.py
> > >
> > > diff --git a/tools/set_pci.py b/tools/set_pci.py
> > > new file mode 100755
> > > index 000..e242efb
> > > --- /dev/null
> > > +++ b/tools/set_pci.py
> > > @@ -0,0 +1,124 @@
> > > +#! /usr/bin/python
> > > +import sys
> > > +import os
> > > +import subprocess
> > > +import getopt
> > > +from os.path import basename
> > > +
> > > +# The register to check if extended tag is supported or not.
> > > +PCI_DEV_CAP_REG = 0xA4
> > > +# The control register which contains the bit to enable/disable 'extended tag'.
> > > +PCI_DEV_CTRL_REG = 0xA8
> > > +# The mask of 'extended tag' in capability register.
> > > +PCI_DEV_CAP_EXT_TAG_MASK = 0x20
> > > +# The mask of 'extended tag' in control register.
> > > +PCI_DEV_CTRL_EXT_TAG_MASK = 0x100
> > > +
> > > +dev_ids = {}
> > > +flag = "Set"
> > > +
> > > +
> > > +def usage():
> > > +    '''Print usage information for the program'''
> > > +    argv0 = basename(sys.argv[0])
> > > +    print """
> > > +Usage:
> > > +--
> > > +
> > > +%(argv0)s [options] DEVICE1 DEVICE2 
> > > +
> > > +where DEVICE1, DEVICE2 etc, are specified via PCI
> > > +"domain:bus:slot.func" syntax or "bus:slot.func" syntax. For devices
> > > +bound to Linux kernel drivers, they may also be referred to by Linux interface
> > > +name e.g. eth0, eth1, em0, em1, etc.
> > > +
> > > +Options:
> > > +--help, --usage:
> > > +Display usage information and quit
> > > +
> > > +-s --set:
> > > +Set the following pci device
> > > +
> > > +-u --unset:
> > > +Unset the following pci device
> > > +
> > > +Examples:
> > > +-
> > > +To set pci 0a:00.0
> > > +%(argv0)s -s 0a:00.0
> > > +%(argv0)s --set 0a:00.0
> > > +
> > > +To unset :01:00.0
> > > +%(argv0)s -u :01:00.0
> > > +%(argv0)s --unset :01:00.0
> > > +
> > > +To set :02:00.0 and :02:00.1
> > > +%(argv0)s -s 02:00.0 02:00.1
> > > +
> > > +""" % locals()  # replace items from local variables
> > > +
> > > +
> > > +def parse_args():
> > > +    global flag
> > > +    global dev_ids
> > > +    if len(sys.argv) <= 1:
> > > +        usage()
> > > +        sys.exit(0)
> > > +    try:
> > > +        opts, dev_ids = getopt.getopt(
> > > +            sys.argv[1:],
> > > +            "su",
> > > +            ["help", "usage", "set", "unset"]
> > > +        )
> > > +    except getopt.GetoptError, error:
> > > +        print str(error)
> > > +        print "Run '%s --usage' for further information" % sys.argv[0]
> > > +        sys.exit(1)
> > > +
> > > +    for opt, arg in opts:
> > > +        if opt == "--help" or opt == "--usage":
> > > +            usage()
> > > +            sys.exit(0)
> > > +        if opt == "-s" or opt == "--set":
> > > +            flag = "Set"
> > > +        if opt == "-u" or opt == "--unset":
> > > +            flag = "Unset"
> > > +
> > > +
> > > +def check_output(args, stderr=None):
> > > +    '''Run a command and capture its output'''
> > > +    return subprocess.Popen(
> > > +        args,
> > > +        stdout=subprocess.PIPE,
> > > +        stderr=stderr
> 
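The bit manipulation that the script performs through setpci can be sketched in C as a pure function, assuming (as the script's comments suggest) that it only touches the control bit when the capability bit is set. `ext_tag_ctrl()` is a hypothetical helper; the masks are the ones defined in set_pci.py, and reading or writing the actual registers (for example with setpci) is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Masks copied from set_pci.py. The script reads them at offsets 0xA4
 * (Device Capabilities) and 0xA8 (Device Control), which assumes the
 * device's PCI Express capability starts at 0xA0. */
#define PCI_DEV_CAP_EXT_TAG_MASK  0x20u  /* Extended Tag Field Supported */
#define PCI_DEV_CTRL_EXT_TAG_MASK 0x100u /* Extended Tag Field Enable */

/* Return the Device Control value to write back. If the capability bit
 * is clear, the control value is returned unchanged. */
static uint16_t ext_tag_ctrl(uint32_t devcap, uint16_t devctl, bool enable)
{
	if (!(devcap & PCI_DEV_CAP_EXT_TAG_MASK))
		return devctl;
	if (enable)
		return (uint16_t)(devctl | PCI_DEV_CTRL_EXT_TAG_MASK);
	return (uint16_t)(devctl & ~PCI_DEV_CTRL_EXT_TAG_MASK);
}
```

Only bit 8 of the control value changes; every other control bit is preserved, which matches the "set"/"unset" behaviour the usage text describes.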

[dpdk-dev] Inputs needed - testing l2fwd

2015-03-25 Thread Shankari Vaidyalingam
Hi,

Can anyone please tell me whether I'm missing something in the
exercise below?

Regards
Shankari.V

On Wed, Mar 25, 2015 at 12:25 AM, Shankari Vaidyalingam <
shankari.v2k6 at gmail.com> wrote:

> I'm trying to run sample DPDK  applications by injecting packets from an
> external traffic generator.
> I'm not able to get the frames to reach the RX queue of the corresponding
> port and make the PMD detect those packets and get the application process
> them.
>
> My configuration is:
>
>  I'm using Oracle VirtualBox to have 2 virtual network adapters for use in
> the sample application. I have configured both the network adapters in the
> "Bridged Networking" mode.
> When I open the VM I can see the interfaces - eth0 and eth1
> I bound the 2 interfaces to igb_uio driver and then configured the
> hugepage mapping.
> When I try sending the frames from an external source I had set the dest
> MAC to the MAC addr of the ports bound to DPDK.
> But still I'm not able to see either the "Frames received" or "Frames
> dropped" count getting incremented.
>
> I tried googling and tried various options for networking mode as
> (Bridged/Host Only...) but none of them seem to work fine.
>
> I'm sure that I'm missing something.
> Can you please help me in this regard.
>
> Regards
> Shankari.V
>
>


[dpdk-dev] [PATCH] ixgbe: fix the issue second 5tuple filter overwrites the first one

2015-03-25 Thread Jingjing Wu
This patch corrects the register index, fixing the issue where the second
5-tuple filter overwrites the first one.

Signed-off-by: Jingjing Wu 
---
 lib/librte_pmd_ixgbe/ixgbe_ethdev.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/lib/librte_pmd_ixgbe/ixgbe_ethdev.c b/lib/librte_pmd_ixgbe/ixgbe_ethdev.c
index 92d75db..5caee22 100644
--- a/lib/librte_pmd_ixgbe/ixgbe_ethdev.c
+++ b/lib/librte_pmd_ixgbe/ixgbe_ethdev.c
@@ -3882,10 +3882,10 @@ ixgbe_add_5tuple_filter(struct rte_eth_dev *dev,
ftqf |= IXGBE_FTQF_POOL_MASK_EN;
ftqf |= IXGBE_FTQF_QUEUE_ENABLE;

-   IXGBE_WRITE_REG(hw, IXGBE_DAQF(idx), filter->filter_info.dst_ip);
-   IXGBE_WRITE_REG(hw, IXGBE_SAQF(idx), filter->filter_info.src_ip);
-   IXGBE_WRITE_REG(hw, IXGBE_SDPQF(idx), sdpqf);
-   IXGBE_WRITE_REG(hw, IXGBE_FTQF(idx), ftqf);
+   IXGBE_WRITE_REG(hw, IXGBE_DAQF(i), filter->filter_info.dst_ip);
+   IXGBE_WRITE_REG(hw, IXGBE_SAQF(i), filter->filter_info.src_ip);
+   IXGBE_WRITE_REG(hw, IXGBE_SDPQF(i), sdpqf);
+   IXGBE_WRITE_REG(hw, IXGBE_FTQF(i), ftqf);

l34timir |= IXGBE_L34T_IMIR_RESERVE;
l34timir |= (uint32_t)(filter->queue <<
-- 
1.9.3
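The one-character fix is easier to see with the allocation loop in view. The hunk does not show it, so the sketch below assumes, as in the surrounding ixgbe driver code, that `i` is the filter slot being claimed and `idx` is the index of the 32-bit word in the slot bitmap. `struct ftqf_model` and `add_5tuple()` are hypothetical, and a plain array stands in for the DAQF/SAQF/SDPQF/FTQF registers.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_FTQF_FILTERS 128 /* assumed to match IXGBE_MAX_FTQF_FILTERS */
#define BITS_PER_WORD    32

/* Hypothetical model: a bitmap of used slots plus one "register" per
 * slot (only the destination-IP register is modelled). */
struct ftqf_model {
	uint32_t mask[MAX_FTQF_FILTERS / BITS_PER_WORD];
	uint32_t dst_ip_reg[MAX_FTQF_FILTERS];
};

/* Claim the first free slot and program it. Returns the slot, or -1. */
static int add_5tuple(struct ftqf_model *m, uint32_t dst_ip)
{
	for (int i = 0; i < MAX_FTQF_FILTERS; i++) {
		int idx = i / BITS_PER_WORD;   /* word in the bitmap */
		int shift = i % BITS_PER_WORD; /* bit within that word */

		if (!(m->mask[idx] & (UINT32_C(1) << shift))) {
			m->mask[idx] |= UINT32_C(1) << shift;
			/* Index the register by slot i, not by idx: idx is 0
			 * for the first 32 slots, so using it would make each
			 * new filter overwrite slot 0. */
			m->dst_ip_reg[i] = dst_ip;
			return i;
		}
	}
	return -1;
}
```

With `idx` in place of `i`, both of the first two filters would land in slot 0, which is the symptom described in the patch title.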



[dpdk-dev] [PATCH] ixgbe: fix the issue second 5tuple filter overwrites the first one

2015-03-25 Thread Liu, Yong
> -Original Message-
> From: Wu, Jingjing
> Sent: Wednesday, March 25, 2015 12:47 PM
> To: dev at dpdk.org
> Cc: Wu, Jingjing; Liu, Yong
> Subject: [PATCH] ixgbe: fix the issue second 5tuple filter overwrites the
> first one
> 
> This patch corrects the index to fix the issue that is second 5tuple
> filter
> overwrites the first one.
> 
> Signed-off-by: Jingjing Wu 

Acked-by: Marvin Liu 
> ---
>  lib/librte_pmd_ixgbe/ixgbe_ethdev.c | 8 
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/lib/librte_pmd_ixgbe/ixgbe_ethdev.c b/lib/librte_pmd_ixgbe/ixgbe_ethdev.c
> index 92d75db..5caee22 100644
> --- a/lib/librte_pmd_ixgbe/ixgbe_ethdev.c
> +++ b/lib/librte_pmd_ixgbe/ixgbe_ethdev.c
> @@ -3882,10 +3882,10 @@ ixgbe_add_5tuple_filter(struct rte_eth_dev *dev,
>   ftqf |= IXGBE_FTQF_POOL_MASK_EN;
>   ftqf |= IXGBE_FTQF_QUEUE_ENABLE;
> 
> - IXGBE_WRITE_REG(hw, IXGBE_DAQF(idx), filter->filter_info.dst_ip);
> - IXGBE_WRITE_REG(hw, IXGBE_SAQF(idx), filter->filter_info.src_ip);
> - IXGBE_WRITE_REG(hw, IXGBE_SDPQF(idx), sdpqf);
> - IXGBE_WRITE_REG(hw, IXGBE_FTQF(idx), ftqf);
> + IXGBE_WRITE_REG(hw, IXGBE_DAQF(i), filter->filter_info.dst_ip);
> + IXGBE_WRITE_REG(hw, IXGBE_SAQF(i), filter->filter_info.src_ip);
> + IXGBE_WRITE_REG(hw, IXGBE_SDPQF(i), sdpqf);
> + IXGBE_WRITE_REG(hw, IXGBE_FTQF(i), ftqf);
> 
>   l34timir |= IXGBE_L34T_IMIR_RESERVE;
>   l34timir |= (uint32_t)(filter->queue <<
> --
> 1.9.3

Acked-by: Marvin Liu 


[dpdk-dev] [PULL REQUEST] i40e: removal of the ALLOW_LB switch flag from VF VSI

2015-03-25 Thread Helin Zhang
The following changes since commit 91a8743eb9bcbf83011bb8b6073cfc0ac11a8c85:

  eal: remove argument need of --create_uio_dev option (2015-03-23 17:34:23 +0100)

are available in the git repository at:

  helin at dpdk.org:dpdk-i40e-next.git master

for you to fetch changes up to 91c2106115250ebf855477ee556035e832c4156b:

  i40e: remove ALLOW_LB flag on SRIOV vsi (2015-03-24 23:37:23 -0400)


Jingjing Wu (1):
  i40e: remove ALLOW_LB flag on SRIOV vsi

 lib/librte_pmd_i40e/i40e_ethdev.c | 14 +-
 1 file changed, 9 insertions(+), 5 deletions(-)


[dpdk-dev] Inputs needed - testing l2fwd

2015-03-25 Thread Shankari Vaidyalingam
+dpdk-dev

> Thanks for the response.
>
> I'm using the Intel 82545EM as the network adapter type, not
> virtio-net.
> I also checked whether the 82545 NIC is supported by DPDK, and it is.
>
> Regards
> Shankari.V
>
> On Wed, Mar 25, 2015 at 10:38 AM, Stephen Hemminger <
> stephen at networkplumber.org> wrote:
>
>> On Wed, 25 Mar 2015 09:40:47 +0530
>> Shankari Vaidyalingam  wrote:
>>
>> > Hi,
>> >
>> > Can anyone please help me whether I'm missing something in the below
>> > exercise
>> >
>> > Regards
>> > Shankari.V
>> >
>> > On Wed, Mar 25, 2015 at 12:25 AM, Shankari Vaidyalingam <
>> > shankari.v2k6 at gmail.com> wrote:
>> >
>> > > I'm trying to run sample DPDK  applications by injecting packets from
>> an
>> > > external traffic generator.
>> > > I'm not able to get the frames to reach the RX queue of the
>> corresponding
>> > > port and make the PMD detect those packets and get the application
>> process
>> > > them.
>> > >
>> > > My configuration is:
>> > >
>> > >  I'm using Oracle VirtualBox to have 2 virtual network adapters for
>> use in
>> > > the sample application. I have configured both the network adapters
>> in the
>> > > "Bridged Networking" mode.
>> > > When I open the VM I can see the interfaces - eth0 and eth1
>> > > I bound the 2 interfaces to igb_uio driver and then configured the
>> > > hugepage mapping.
>> > > When I try sending the frames from an external source I had set the
>> dest
>> > > MAC to the MAC addr of the ports bound to DPDK.
>> > > But still I'm not able to see either the "Frames received" or "Frames
>> > > dropped" count getting incremented.
>> > >
>> > > I tried googling and tried various options for networking mode as
>> > > (Bridged/Host Only...) but none of them seem to work fine.
>> > >
>> > > I'm sure that I'm missing something.
>> > > Can you please help me in this regard.
>> > >
>> > > Regards
>> > > Shankari.V
>> > >
>> > >
>>
>> VirtualBox has its own version of virtio, which is not compatible
>> with the current version of the DPDK virtio driver.
>>
>
>


[dpdk-dev] [PATCH] ixgbe: fix ixgbe PCI access endian issue

2015-03-25 Thread Zhang, Helin


> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of
> xuelin.shi at freescale.com
> Sent: Thursday, February 12, 2015 9:20 AM
> To: thomas.monjalon at 6wind.com
> Cc: dev at dpdk.org
> Subject: [dpdk-dev] [PATCH] ixgbe: fix ixgbe PCI access endian issue
> 
> From: Xuelin Shi 
> 
> ixgbe is little endian, but the CPU may not be.
> Add the necessary conversions:
> rte_cpu_to_le_32(...) for PCI writes,
> rte_le_to_cpu_32(...) for PCI reads.
> 
> Signed-off-by: Xuelin Shi 
Acked-by: Helin Zhang 

> ---
>  lib/librte_pmd_ixgbe/ixgbe/ixgbe_osdep.h | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/lib/librte_pmd_ixgbe/ixgbe/ixgbe_osdep.h b/lib/librte_pmd_ixgbe/ixgbe/ixgbe_osdep.h
> index 2d40bfd..f8bfb3f 100644
> --- a/lib/librte_pmd_ixgbe/ixgbe/ixgbe_osdep.h
> +++ b/lib/librte_pmd_ixgbe/ixgbe/ixgbe_osdep.h
> @@ -119,11 +119,11 @@ typedef int bool;
> 
>  static inline uint32_t ixgbe_read_addr(volatile void* addr)  {
> - return IXGBE_PCI_REG(addr);
> + return rte_le_to_cpu_32(IXGBE_PCI_REG(addr));
>  }
> 
>  #define IXGBE_PCI_REG_WRITE(reg, value) do { \
> - IXGBE_PCI_REG((reg)) = (value); \
> + IXGBE_PCI_REG((reg)) = (rte_cpu_to_le_32(value)); \
>  } while(0)
> 
>  #define IXGBE_PCI_REG_ADDR(hw, reg) \
> --
> 1.9.1
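What the two conversions guarantee can be shown without DPDK: the device always sees the least significant byte first, whatever the host byte order. DPDK's `rte_cpu_to_le_32()` compiles to nothing on little-endian CPUs and to a byte swap on big-endian ones; the sketch below instead builds the byte layout explicitly, which gives the same result on any host. `put_le32()` and `get_le32()` are hypothetical helpers, not DPDK functions.

```c
#include <assert.h>
#include <stdint.h>

/* Encode a CPU-order value as the 4 bytes a little-endian device expects
 * (the effect of rte_cpu_to_le_32() before a register write). */
static void put_le32(uint8_t out[4], uint32_t v)
{
	out[0] = (uint8_t)(v);
	out[1] = (uint8_t)(v >> 8);
	out[2] = (uint8_t)(v >> 16);
	out[3] = (uint8_t)(v >> 24);
}

/* Decode 4 little-endian bytes read from the device into CPU order
 * (the effect of rte_le_to_cpu_32() after a register read). */
static uint32_t get_le32(const uint8_t in[4])
{
	return (uint32_t)in[0] | ((uint32_t)in[1] << 8) |
	       ((uint32_t)in[2] << 16) | ((uint32_t)in[3] << 24);
}
```

Without the conversions, a big-endian CPU would store 0x11223344 as the bytes 11 22 33 44, and the NIC would read the register as 0x44332211.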



[dpdk-dev] [PATCH] e1000: fix e1000 PCI access endian issue.

2015-03-25 Thread Zhang, Helin


> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of
> xuelin.shi at freescale.com
> Sent: Thursday, February 12, 2015 9:27 AM
> To: thomas.monjalon at 6wind.com
> Cc: dev at dpdk.org
> Subject: [dpdk-dev] [PATCH] e1000: fix e1000 PCI access endian issue.
> 
> From: Xuelin Shi 
> 
> e1000 is little endian, but the CPU may not be.
> Add the necessary conversions:
> 
> rte_cpu_to_le_32(...) for PCI writes,
> rte_le_to_cpu_32(...) for PCI reads.
> 
> Signed-off-by: Xuelin Shi 
Acked-by: Helin Zhang 

> ---
>  lib/librte_pmd_e1000/e1000/e1000_osdep.h | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/lib/librte_pmd_e1000/e1000/e1000_osdep.h b/lib/librte_pmd_e1000/e1000/e1000_osdep.h
> index 438641e..d04ec73 100644
> --- a/lib/librte_pmd_e1000/e1000/e1000_osdep.h
> +++ b/lib/librte_pmd_e1000/e1000/e1000_osdep.h
> @@ -43,6 +43,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
> 
>  #include "../e1000_logs.h"
> 
> @@ -96,7 +97,7 @@ typedef int bool;
>  #define E1000_PCI_REG(reg) (*((volatile uint32_t *)(reg)))
> 
>  #define E1000_PCI_REG_WRITE(reg, value) do { \
> - E1000_PCI_REG((reg)) = (value); \
> + E1000_PCI_REG((reg)) = (rte_cpu_to_le_32(value)); \
>  } while (0)
> 
>  #define E1000_PCI_REG_ADDR(hw, reg) \
> @@ -107,7 +108,7 @@ typedef int   bool;
> 
>  static inline uint32_t e1000_read_addr(volatile void* addr)  {
> - return E1000_PCI_REG(addr);
> + return rte_le_to_cpu_32(E1000_PCI_REG(addr));
>  }
> 
>  /* Necessary defines */
> --
> 1.9.1



[dpdk-dev] Packet data out of bounds after rte_eth_rx_burst

2015-03-25 Thread Dor Green
The printout:
PMD: eth_ixgbe_dev_init(): MAC: 2, PHY: 11, SFP+: 4
PMD: eth_ixgbe_dev_init(): port 0 vendorID=0x8086 deviceID=0x154d
PMD: ixgbe_dev_rx_queue_setup(): sw_ring=0x7f80c0af0e40
hw_ring=0x7f811630ce00 dma_addr=0xf1630ce00
PMD: check_rx_burst_bulk_alloc_preconditions(): Rx Burst Bulk Alloc
Preconditions: rxq->rx_free_thresh=0, RTE_PMD_IXGBE_RX_MAX_BURST=32
PMD: ixgbe_dev_rx_queue_setup(): Rx Burst Bulk Alloc Preconditions are
not satisfied, Scattered Rx is requested, or
RTE_LIBRTE_IXGBE_RX_ALLOW_BULK_ALLOC is not enabled (port=0, queue=0).
PMD: check_rx_burst_bulk_alloc_preconditions(): Rx Burst Bulk Alloc
Preconditions: rxq->rx_free_thresh=0, RTE_PMD_IXGBE_RX_MAX_BURST=32
PMD: ixgbe_dev_tx_queue_setup(): sw_ring=0x7f80c0af0900
hw_ring=0x7f811631ce80 dma_addr=0xf1631ce80
PMD: set_tx_function(): Using full-featured tx code path
PMD: set_tx_function():  - txq_flags = 0 [IXGBE_SIMPLE_FLAGS=f01]
PMD: set_tx_function():  - tx_rs_thresh = 32 [RTE_PMD_IXGBE_TX_MAX_BURST=32]

I can't get any of the example apps to crash. Is there something I can
run on one port that will look at the actual packet data?
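One way to inspect received data without crashing is to check each mbuf's data offset and length against its buffer before touching the bytes. DPDK's `rte_pktmbuf_dump()` prints an mbuf and its data, but it trusts those fields. The sketch below uses a hypothetical, simplified `struct mbuf_view` that mirrors only the `buf_addr`, `buf_len`, `data_off` and `data_len` fields; in a real application you would copy them out of each `struct rte_mbuf` returned by `rte_eth_rx_burst()`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical, simplified view of the fields that locate packet data in
 * an mbuf; the real struct rte_mbuf has many more fields. */
struct mbuf_view {
	uint8_t *buf_addr; /* start of the data buffer */
	uint16_t buf_len;  /* size of the data buffer */
	uint16_t data_off; /* offset of packet data within the buffer */
	uint16_t data_len; /* bytes of packet data in this segment */
};

/* True if [data_off, data_off + data_len) fits inside the buffer. */
static bool mbuf_data_in_bounds(const struct mbuf_view *m)
{
	assert(m != NULL);
	return (uint32_t)m->data_off + m->data_len <= m->buf_len;
}

/* Hex-dump up to n bytes of packet data; refuse inconsistent mbufs. */
static int dump_packet(FILE *f, const struct mbuf_view *m, size_t n)
{
	if (m->buf_addr == NULL || !mbuf_data_in_bounds(m))
		return -1;
	const uint8_t *p = m->buf_addr + m->data_off;
	if (n > m->data_len)
		n = m->data_len;
	for (size_t i = 0; i < n; i++)
		fprintf(f, "%02x%c", p[i], (i % 16 == 15) ? '\n' : ' ');
	fputc('\n', f);
	return 0;
}
```

Logging the mbuf fields whenever `dump_packet()` returns -1 would show whether the "packet after the packet of death" already has corrupted metadata when it leaves the PMD.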

The mempool is (I think) set up normally:

pktmbuf_pool = rte_mempool_create("mbuf_pool", MBUFNB, MBUFSZ, 0,
  sizeof(struct rte_pktmbuf_pool_private),
  rte_pktmbuf_pool_init, NULL,
  rte_pktmbuf_init, NULL, NUMA_SOCKET, 0);


For good measure, here's the rest of the port setup (shortened, in
addition to what I showed below):

static struct rte_eth_rxconf const rxconf = {
.rx_thresh = {
.pthresh = 8,
.hthresh = 8,
.wthresh = 100,
},
.rx_free_thresh = 0,
.rx_drop_en = 0,
};

rte_eth_dev_configure(port, 1, 1, &ethconf);
rte_eth_rx_queue_setup(port, 0, hwsize, NUMA_SOCKET, &rxconf, pktmbuf_pool);
rte_eth_dev_start(port);


On Tue, Mar 24, 2015 at 6:21 PM, Bruce Richardson
 wrote:
> On Tue, Mar 24, 2015 at 04:10:18PM +0200, Dor Green wrote:
>> 1 . The eth_conf is:
>>
>> static struct rte_eth_conf const ethconf = {
>> .link_speed = 0,
>> .link_duplex = 0,
>>
>> .rxmode = {
>> .mq_mode = ETH_MQ_RX_RSS,
>> .max_rx_pkt_len = ETHER_MAX_LEN,
>> .split_hdr_size = 0,
>> .header_split = 0,
>> .hw_ip_checksum = 0,
>> .hw_vlan_filter = 0,
>> .jumbo_frame = 0,
>> .hw_strip_crc = 0,   /**< CRC stripped by hardware */
>> },
>>
>> .txmode = {
>> },
>>
>> .rx_adv_conf = {
>> .rss_conf = {
>> .rss_key = NULL,
>> .rss_hf = ETH_RSS_IPV4 | ETH_RSS_IPV6,
>> }
>> },
>>
>> .fdir_conf = {
>> .mode = RTE_FDIR_MODE_SIGNATURE,
>>
>> },
>>
>> .intr_conf = {
>> .lsc = 0,
>> },
>> };
>>
>> I've tried setting jumbo frames on with a larger packet length and
>> even turning off RSS/FDIR. No luck.
>>
>> I don't see anything relating to the port in the initial prints, what
>> are you looking for?
>
> I'm looking for the PMD initialization text, like that shown below (from 
> testpmd):
> Configuring Port 0 (socket 0)
> PMD: ixgbe_dev_tx_queue_setup(): sw_ring=0x7f9ba08cd700 
> hw_ring=0x7f9ba0b00080 dma_addr=0x36d00080
> PMD: ixgbe_set_tx_function(): Using simple tx code path
> PMD: ixgbe_set_tx_function(): Vector tx enabled.
> PMD: ixgbe_dev_rx_queue_setup(): sw_ring=0x7f9ba08cce80 
> hw_ring=0x7f9ba0b10080 dma_addr=0x36d10080
> PMD: ixgbe_set_rx_function(): Vector rx enabled, please make sure RX burst 
> size no less than 32.
> Port 0: 68:05:CA:04:51:3A
> Configuring Port 1 (socket 0)
> PMD: ixgbe_dev_tx_queue_setup(): sw_ring=0x7f9ba08cab40 
> hw_ring=0x7f9ba0b20100 dma_addr=0x36d20100
> PMD: ixgbe_set_tx_function(): Using simple tx code path
> PMD: ixgbe_set_tx_function(): Vector tx enabled.
> PMD: ixgbe_dev_rx_queue_setup(): sw_ring=0x7f9ba08ca2c0 
> hw_ring=0x7f9ba0b30100 dma_addr=0x36d30100
> PMD: ixgbe_set_rx_function(): Vector rx enabled, please make sure RX burst 
> size no less than 32.
> Port 1: 68:05:CA:04:51:38
>
> This tells us what RX and TX functions are going to be used for each port.
>
>>
>> 2. The packet is a normal, albeit somewhat large (1239 bytes) TCP data
>> packet (SSL certificate data, specifically).
>> One important thing of note that I've just realised is that it's not
>> this "packet of death" which causes the segmentation fault (i.e. has
>> an out-of-bounds address for its data), but the packet afterwards-- no
>> matter what packet it is.
>>
> Can this problem be reproduced using testpmd or any of the standard dpdk
> example apps, by sending in the same packet sequence?
>
> Is there anything unusual being done in the setup of the mempool used for the
> packet buffers?
>
> /Bruce
>
>>
>> On Tue, Mar 24, 2015 at 3:17 PM, Bruce Richardson
>>  wrote:
>> > On Tue, Mar 24, 2015 at 12:54:14PM +0200, Dor Green wrote:
>> >> I've managed to fix it so 1.8 works, and the segmentation fault still 
>> >> occurs.
>> >>
>> >> O

[dpdk-dev] Packet data out of bounds after rte_eth_rx_burst

2015-03-25 Thread Dor Green
Now that I can see the code path used in 1.8, I modified my
rx_free_thresh and other flags so that Rx burst bulk alloc is used.

This solved the problem (while also increasing performance), though I'm
not sure why.
This is good enough for me, but I'm willing to keep investigating if
it's of any interest to you.
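The log line `rxq->rx_free_thresh=0, RTE_PMD_IXGBE_RX_MAX_BURST=32` shows why the bulk-alloc path was rejected. The sketch below restates the preconditions as they are listed in the comment of the ixgbe driver's `check_rx_burst_bulk_alloc_preconditions()` of this era. The constants are assumptions to check against your DPDK version, `rx_bulk_alloc_ok()` is a hypothetical helper, and scattered Rx and the RTE_LIBRTE_IXGBE_RX_ALLOW_BULK_ALLOC build option are not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed values from the ixgbe PMD of this era. */
#define RTE_PMD_IXGBE_RX_MAX_BURST 32
#define IXGBE_MAX_RING_DESC        4096

/* True if an Rx queue with nb_desc descriptors and the given
 * rx_free_thresh meets the bulk-alloc preconditions. */
static bool rx_bulk_alloc_ok(uint16_t nb_desc, uint16_t rx_free_thresh)
{
	assert(nb_desc > 0);
	/* Checked first, which also avoids a modulo by zero below. */
	if (rx_free_thresh < RTE_PMD_IXGBE_RX_MAX_BURST)
		return false;
	if (rx_free_thresh >= nb_desc)
		return false;
	if (nb_desc % rx_free_thresh != 0)
		return false;
	return nb_desc < (IXGBE_MAX_RING_DESC - RTE_PMD_IXGBE_RX_MAX_BURST);
}
```

For example, a hypothetical 512-descriptor ring with rx_free_thresh = 32 passes every check, while the original rx_free_thresh = 0 fails the first one.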

On Wed, Mar 25, 2015 at 10:22 AM, Dor Green  wrote:
> The printout:
> PMD: eth_ixgbe_dev_init(): MAC: 2, PHY: 11, SFP+: 4
> PMD: eth_ixgbe_dev_init(): port 0 vendorID=0x8086 deviceID=0x154d
> PMD: ixgbe_dev_rx_queue_setup(): sw_ring=0x7f80c0af0e40
> hw_ring=0x7f811630ce00 dma_addr=0xf1630ce00
> PMD: check_rx_burst_bulk_alloc_preconditions(): Rx Burst Bulk Alloc
> Preconditions: rxq->rx_free_thresh=0, RTE_PMD_IXGBE_RX_MAX_BURST=32
> PMD: ixgbe_dev_rx_queue_setup(): Rx Burst Bulk Alloc Preconditions are
> not satisfied, Scattered Rx is requested, or
> RTE_LIBRTE_IXGBE_RX_ALLOW_BULK_ALLOC is not enabled (port=0, queue=0).
> PMD: check_rx_burst_bulk_alloc_preconditions(): Rx Burst Bulk Alloc
> Preconditions: rxq->rx_free_thresh=0, RTE_PMD_IXGBE_RX_MAX_BURST=32
> PMD: ixgbe_dev_tx_queue_setup(): sw_ring=0x7f80c0af0900
> hw_ring=0x7f811631ce80 dma_addr=0xf1631ce80
> PMD: set_tx_function(): Using full-featured tx code path
> PMD: set_tx_function():  - txq_flags = 0 [IXGBE_SIMPLE_FLAGS=f01]
> PMD: set_tx_function():  - tx_rs_thresh = 32 [RTE_PMD_IXGBE_TX_MAX_BURST=32]
>
> Can't seem to get any example app to crash. Is there something I can
> run on one port which will look at the actual data of the packets?
>
> The mempool is (I think) set up normally:
>
> pktmbuf_pool = rte_mempool_create("mbuf_pool", MBUFNB, MBUFSZ, 0,
>   sizeof(struct rte_pktmbuf_pool_private),
>   rte_pktmbuf_pool_init, NULL,
>   rte_pktmbuf_init, NULL, NUMA_SOCKET, 0);
>
>
> For good measure, here's the rest of the port setup (shortened, in
> addition to what I showed below):
>
> static struct rte_eth_rxconf const rxconf = {
> .rx_thresh = {
> .pthresh = 8,
> .hthresh = 8,
> .wthresh = 100,
> },
> .rx_free_thresh = 0,
> .rx_drop_en = 0,
> };
>
> rte_eth_dev_configure(port, 1, 1, &ethconf);
> rte_eth_rx_queue_setup(port, 0, hwsize, NUMA_SOCKET, &rxconf, pktmbuf_pool);
> rte_eth_dev_start(port);
>
>
> On Tue, Mar 24, 2015 at 6:21 PM, Bruce Richardson
>  wrote:
>> On Tue, Mar 24, 2015 at 04:10:18PM +0200, Dor Green wrote:
>>> 1 . The eth_conf is:
>>>
>>> static struct rte_eth_conf const ethconf = {
>>> .link_speed = 0,
>>> .link_duplex = 0,
>>>
>>> .rxmode = {
>>> .mq_mode = ETH_MQ_RX_RSS,
>>> .max_rx_pkt_len = ETHER_MAX_LEN,
>>> .split_hdr_size = 0,
>>> .header_split = 0,
>>> .hw_ip_checksum = 0,
>>> .hw_vlan_filter = 0,
>>> .jumbo_frame = 0,
>>> .hw_strip_crc = 0,   /**< CRC stripped by hardware */
>>> },
>>>
>>> .txmode = {
>>> },
>>>
>>> .rx_adv_conf = {
>>> .rss_conf = {
>>> .rss_key = NULL,
>>> .rss_hf = ETH_RSS_IPV4 | ETH_RSS_IPV6,
>>> }
>>> },
>>>
>>> .fdir_conf = {
>>> .mode = RTE_FDIR_MODE_SIGNATURE,
>>>
>>> },
>>>
>>> .intr_conf = {
>>> .lsc = 0,
>>> },
>>> };
>>>
>>> I've tried setting jumbo frames on with a larger packet length and
>>> even turning off RSS/FDIR. No luck.
>>>
>>> I don't see anything relating to the port in the initial prints, what
>>> are you looking for?
>>
>> I'm looking for the PMD initialization text, like that shown below (from 
>> testpmd):
>> Configuring Port 0 (socket 0)
>> PMD: ixgbe_dev_tx_queue_setup(): sw_ring=0x7f9ba08cd700 
>> hw_ring=0x7f9ba0b00080 dma_addr=0x36d00080
>> PMD: ixgbe_set_tx_function(): Using simple tx code path
>> PMD: ixgbe_set_tx_function(): Vector tx enabled.
>> PMD: ixgbe_dev_rx_queue_setup(): sw_ring=0x7f9ba08cce80 
>> hw_ring=0x7f9ba0b10080 dma_addr=0x36d10080
>> PMD: ixgbe_set_rx_function(): Vector rx enabled, please make sure RX burst 
>> size no less than 32.
>> Port 0: 68:05:CA:04:51:3A
>> Configuring Port 1 (socket 0)
>> PMD: ixgbe_dev_tx_queue_setup(): sw_ring=0x7f9ba08cab40 
>> hw_ring=0x7f9ba0b20100 dma_addr=0x36d20100
>> PMD: ixgbe_set_tx_function(): Using simple tx code path
>> PMD: ixgbe_set_tx_function(): Vector tx enabled.
>> PMD: ixgbe_dev_rx_queue_setup(): sw_ring=0x7f9ba08ca2c0 
>> hw_ring=0x7f9ba0b30100 dma_addr=0x36d30100
>> PMD: ixgbe_set_rx_function(): Vector rx enabled, please make sure RX burst 
>> size no less than 32.
>> Port 1: 68:05:CA:04:51:38
>>
>> This tells us what RX and TX functions are going to be used for each port.
>>
>>>
>>> 2. The packet is a normal, albeit somewhat large (1239 bytes) TCP data
>>> packet (SSL certificate data, specifically).
>>> One important thing of note that I've just realised is that it's not
>>> this "packet of death" which causes the segmentation fault (i.e. has
>>> an out-of-bounds address for its data), but the packe

[dpdk-dev] Packet data out of bounds after rte_eth_rx_burst

2015-03-25 Thread Bruce Richardson
On Wed, Mar 25, 2015 at 10:22:49AM +0200, Dor Green wrote:
> The printout:
> PMD: eth_ixgbe_dev_init(): MAC: 2, PHY: 11, SFP+: 4
> PMD: eth_ixgbe_dev_init(): port 0 vendorID=0x8086 deviceID=0x154d
> PMD: ixgbe_dev_rx_queue_setup(): sw_ring=0x7f80c0af0e40
> hw_ring=0x7f811630ce00 dma_addr=0xf1630ce00
> PMD: check_rx_burst_bulk_alloc_preconditions(): Rx Burst Bulk Alloc
> Preconditions: rxq->rx_free_thresh=0, RTE_PMD_IXGBE_RX_MAX_BURST=32
> PMD: ixgbe_dev_rx_queue_setup(): Rx Burst Bulk Alloc Preconditions are
> not satisfied, Scattered Rx is requested, or
> RTE_LIBRTE_IXGBE_RX_ALLOW_BULK_ALLOC is not enabled (port=0, queue=0).
> PMD: check_rx_burst_bulk_alloc_preconditions(): Rx Burst Bulk Alloc
> Preconditions: rxq->rx_free_thresh=0, RTE_PMD_IXGBE_RX_MAX_BURST=32
> PMD: ixgbe_dev_tx_queue_setup(): sw_ring=0x7f80c0af0900
> hw_ring=0x7f811631ce80 dma_addr=0xf1631ce80
> PMD: set_tx_function(): Using full-featured tx code path
> PMD: set_tx_function():  - txq_flags = 0 [IXGBE_SIMPLE_FLAGS=f01]
> PMD: set_tx_function():  - tx_rs_thresh = 32 [RTE_PMD_IXGBE_TX_MAX_BURST=32]
> 
> Can't seem to get any example app to crash. Is there something I can
> run on one port which will look at the actual data of the packets?
> 
> The mempool is (I think) set up normally:
> 
> pktmbuf_pool = rte_mempool_create("mbuf_pool", MBUFNB, MBUFSZ, 0,
>   sizeof(struct rte_pktmbuf_pool_private),
>   rte_pktmbuf_pool_init, NULL,
>   rte_pktmbuf_init, NULL, NUMA_SOCKET, 0);
> 
> 
> For good measure, here's the rest of the port setup (shortened, in
> addition to what I showed below):
> 
> static struct rte_eth_rxconf const rxconf = {
> .rx_thresh = {
> .pthresh = 8,
> .hthresh = 8,
> .wthresh = 100,

This value for wthresh looks very high. Could you perhaps just try using the
default thresholds? [Passing NULL instead of the rxconf will just use the
defaults for rx_queue_setup in the latest DPDK versions.]
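The log quoted above reports rxq->rx_free_thresh=0 against the bulk-alloc preconditions. As a rough, hardware-free sketch, assuming the ixgbe check requires a free threshold of at least the maximum burst, smaller than and dividing the ring size, and a ring somewhat below the hardware maximum, the helper below shows why 0 can never qualify. The helper name and the 4096 ring maximum are assumptions for illustration, not the driver's code; only the value 32 comes from the log.

```c
#include <stdbool.h>
#include <stdint.h>

/* Value printed in the log above (RTE_PMD_IXGBE_RX_MAX_BURST=32). */
#define RX_MAX_BURST 32u
/* Assumed maximum ring size; check the driver headers for the real value. */
#define MAX_RING_DESC 4096u

/* Hypothetical helper approximating the ixgbe bulk-alloc preconditions.
 * The first test short-circuits before the modulo, so a threshold of 0
 * is rejected without ever dividing by zero. */
static bool bulk_alloc_ok(uint16_t rx_free_thresh, uint16_t nb_desc)
{
    return rx_free_thresh >= RX_MAX_BURST &&
           rx_free_thresh < nb_desc &&
           nb_desc % rx_free_thresh == 0 &&
           nb_desc < MAX_RING_DESC - RX_MAX_BURST;
}
```

Which RX path is finally chosen also depends on scattered RX and on the RTE_LIBRTE_IXGBE_RX_ALLOW_BULK_ALLOC build option, as the log message itself notes.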

/Bruce


> },
> .rx_free_thresh = 0,
> .rx_drop_en = 0,
> };
> 
> rte_eth_dev_configure(port, 1, 1, &ethconf);
> rte_eth_rx_queue_setup(port, 0, hwsize, NUMA_SOCKET, &rxconf, pktmbuf_pool);
> rte_eth_dev_start(port);
> 
> 
> On Tue, Mar 24, 2015 at 6:21 PM, Bruce Richardson
>  wrote:
> > On Tue, Mar 24, 2015 at 04:10:18PM +0200, Dor Green wrote:
> >> 1 . The eth_conf is:
> >>
> >> static struct rte_eth_conf const ethconf = {
> >> .link_speed = 0,
> >> .link_duplex = 0,
> >>
> >> .rxmode = {
> >> .mq_mode = ETH_MQ_RX_RSS,
> >> .max_rx_pkt_len = ETHER_MAX_LEN,
> >> .split_hdr_size = 0,
> >> .header_split = 0,
> >> .hw_ip_checksum = 0,
> >> .hw_vlan_filter = 0,
> >> .jumbo_frame = 0,
> >> .hw_strip_crc = 0,   /**< CRC stripped by hardware */
> >> },
> >>
> >> .txmode = {
> >> },
> >>
> >> .rx_adv_conf = {
> >> .rss_conf = {
> >> .rss_key = NULL,
> >> .rss_hf = ETH_RSS_IPV4 | ETH_RSS_IPV6,
> >> }
> >> },
> >>
> >> .fdir_conf = {
> >> .mode = RTE_FDIR_MODE_SIGNATURE,
> >>
> >> },
> >>
> >> .intr_conf = {
> >> .lsc = 0,
> >> },
> >> };
> >>
> >> I've tried setting jumbo frames on with a larger packet length and
> >> even turning off RSS/FDIR. No luck.
> >>
> >> I don't see anything relating to the port in the initial prints, what
> >> are you looking for?
> >
> > I'm looking for the PMD initialization text, like that shown below (from 
> > testpmd):
> > Configuring Port 0 (socket 0)
> > PMD: ixgbe_dev_tx_queue_setup(): sw_ring=0x7f9ba08cd700 
> > hw_ring=0x7f9ba0b00080 dma_addr=0x36d00080
> > PMD: ixgbe_set_tx_function(): Using simple tx code path
> > PMD: ixgbe_set_tx_function(): Vector tx enabled.
> > PMD: ixgbe_dev_rx_queue_setup(): sw_ring=0x7f9ba08cce80 
> > hw_ring=0x7f9ba0b10080 dma_addr=0x36d10080
> > PMD: ixgbe_set_rx_function(): Vector rx enabled, please make sure RX burst 
> > size no less than 32.
> > Port 0: 68:05:CA:04:51:3A
> > Configuring Port 1 (socket 0)
> > PMD: ixgbe_dev_tx_queue_setup(): sw_ring=0x7f9ba08cab40 
> > hw_ring=0x7f9ba0b20100 dma_addr=0x36d20100
> > PMD: ixgbe_set_tx_function(): Using simple tx code path
> > PMD: ixgbe_set_tx_function(): Vector tx enabled.
> > PMD: ixgbe_dev_rx_queue_setup(): sw_ring=0x7f9ba08ca2c0 
> > hw_ring=0x7f9ba0b30100 dma_addr=0x36d30100
> > PMD: ixgbe_set_rx_function(): Vector rx enabled, please make sure RX burst 
> > size no less than 32.
> > Port 1: 68:05:CA:04:51:38
> >
> > This tells us what RX and TX functions are going to be used for each port.
> >
> >>
> >> 2. The packet is a normal, albeit somewhat large (1239 bytes) TCP data
> >> packet (SSL certificate data, specifically).
> >> One important thing of note that I've just realised is that it's not
> >> this "packet of death" which causes the segmentation fault (i.e. has
> >> an out-of-bounds address for its data), but the

[dpdk-dev] [PATCH] mlx4: remove old VMware compatibility code

2015-03-25 Thread Adrien Mazarguil
CONFIG_RTE_LIBRTE_MLX4_COMPAT_VMWARE has no effect: this option defines
MLX4_PMD_COMPAT_VMWARE, a macro that the PMD never uses, since it expects
MLX4_COMPAT_VMWARE instead.

Because this option does not work, and the related code is no longer useful
for VMware (which actually supports the flow steering API), remove it
entirely.
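To make the mismatch concrete, here is a standalone sketch (not driver code): defining one macro name, as the Makefile did through -DMLX4_PMD_COMPAT_VMWARE=1, leaves code guarded by a different name compiled out, whatever the config option says.

```c
/* Simulates what the Makefile passed to the compiler. */
#define MLX4_PMD_COMPAT_VMWARE 1

/* The PMD guarded its compatibility code with a different name, so this
 * branch is never compiled in. */
static int vmware_compat_compiled(void)
{
#ifdef MLX4_COMPAT_VMWARE
    return 1;
#else
    return 0;
#endif
}
```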

Signed-off-by: Olga Shern 
Signed-off-by: Adrien Mazarguil 
---
 config/common_bsdapp |  1 -
 config/common_linuxapp   |  1 -
 doc/guides/prog_guide/mlx4_poll_mode_drv.rst | 11 -
 lib/librte_pmd_mlx4/Makefile |  4 --
 lib/librte_pmd_mlx4/mlx4.c   | 64 +---
 lib/librte_pmd_mlx4/mlx4.h   |  8 
 6 files changed, 1 insertion(+), 88 deletions(-)

diff --git a/config/common_bsdapp b/config/common_bsdapp
index 8ff4dc2..5c7ca43 100644
--- a/config/common_bsdapp
+++ b/config/common_bsdapp
@@ -208,7 +208,6 @@ CONFIG_RTE_LIBRTE_MLX4_SGE_WR_N=4
 CONFIG_RTE_LIBRTE_MLX4_MAX_INLINE=0
 CONFIG_RTE_LIBRTE_MLX4_TX_MP_CACHE=8
 CONFIG_RTE_LIBRTE_MLX4_SOFT_COUNTERS=1
-CONFIG_RTE_LIBRTE_MLX4_COMPAT_VMWARE=1

 #
 # Compile burst-oriented Cisco ENIC PMD driver
diff --git a/config/common_linuxapp b/config/common_linuxapp
index 09a58ac..5cbb8c3 100644
--- a/config/common_linuxapp
+++ b/config/common_linuxapp
@@ -205,7 +205,6 @@ CONFIG_RTE_LIBRTE_MLX4_SGE_WR_N=4
 CONFIG_RTE_LIBRTE_MLX4_MAX_INLINE=0
 CONFIG_RTE_LIBRTE_MLX4_TX_MP_CACHE=8
 CONFIG_RTE_LIBRTE_MLX4_SOFT_COUNTERS=1
-CONFIG_RTE_LIBRTE_MLX4_COMPAT_VMWARE=1

 #
 # Compile burst-oriented Cisco ENIC PMD driver
diff --git a/doc/guides/prog_guide/mlx4_poll_mode_drv.rst b/doc/guides/prog_guide/mlx4_poll_mode_drv.rst
index 35570c3..b26c219 100644
--- a/doc/guides/prog_guide/mlx4_poll_mode_drv.rst
+++ b/doc/guides/prog_guide/mlx4_poll_mode_drv.rst
@@ -125,11 +125,6 @@ Compilation options
   Toggle software counters. No counters are available if this option is
   disabled since hardware counters are not supported.

-- ``CONFIG_RTE_LIBRTE_MLX4_COMPAT_VMWARE`` (default **1**)
-
-  Toggle VMware compatibility code. It also requires the environment
-  variable ``MLX4_COMPAT_VMWARE`` set to a nonzero value at runtime.
-
 Environment variables
 ~~~~~~~~~~~~~~~~~~~~~

@@ -139,12 +134,6 @@ Environment variables
   significantly improve performance in some cases but lower it in
   others. Requires careful testing.

-- ``MLX4_COMPAT_VMWARE``
-
-  Only supported when compiled with
-  ``CONFIG_RTE_LIBRTE_MLX4_COMPAT_VMWARE=1``. Adds workarounds to run in
-  VMware systems that do not support the flows API properly.
-
 Run-time configuration
 ~~~~~~~~~~~~~~~~~~~~~~

diff --git a/lib/librte_pmd_mlx4/Makefile b/lib/librte_pmd_mlx4/Makefile
index 813..97b364a 100644
--- a/lib/librte_pmd_mlx4/Makefile
+++ b/lib/librte_pmd_mlx4/Makefile
@@ -87,10 +87,6 @@ ifdef CONFIG_RTE_LIBRTE_MLX4_SOFT_COUNTERS
 CFLAGS += -DMLX4_PMD_SOFT_COUNTERS=$(CONFIG_RTE_LIBRTE_MLX4_SOFT_COUNTERS)
 endif

-ifdef CONFIG_RTE_LIBRTE_MLX4_COMPAT_VMWARE
-CFLAGS += -DMLX4_PMD_COMPAT_VMWARE=$(CONFIG_RTE_LIBRTE_MLX4_COMPAT_VMWARE)
-endif
-
 include $(RTE_SDK)/mk/rte.lib.mk

 # Generate and clean-up mlx4_autoconf.h.
diff --git a/lib/librte_pmd_mlx4/mlx4.c b/lib/librte_pmd_mlx4/mlx4.c
index 3a45746..fa749f4 100644
--- a/lib/librte_pmd_mlx4/mlx4.c
+++ b/lib/librte_pmd_mlx4/mlx4.c
@@ -278,9 +278,6 @@ struct priv {
unsigned int hw_tss:1; /* TSS is supported. */
unsigned int hw_rss:1; /* RSS is supported. */
unsigned int rss:1; /* RSS is enabled. */
-#ifdef MLX4_COMPAT_VMWARE
-   unsigned int vmware:1; /* Use VMware compatibility. */
-#endif
unsigned int vf:1; /* This is a VF device. */
 #ifdef INLINE_RECV
unsigned int inl_recv_size; /* Inline recv size */
@@ -1825,7 +1822,7 @@ rxq_free_elts(struct rxq *rxq)
 static void
 rxq_mac_addr_del(struct rxq *rxq, unsigned int mac_index)
 {
-#if defined(NDEBUG) || defined(MLX4_COMPAT_VMWARE)
+#ifndef NDEBUG
struct priv *priv = rxq->priv;
const uint8_t (*mac)[ETHER_ADDR_LEN] =
(const uint8_t (*)[ETHER_ADDR_LEN])
@@ -1842,16 +1839,6 @@ rxq_mac_addr_del(struct rxq *rxq, unsigned int mac_index)
  (void *)rxq,
  (*mac)[0], (*mac)[1], (*mac)[2], (*mac)[3], (*mac)[4], (*mac)[5],
  mac_index);
-#ifdef MLX4_COMPAT_VMWARE
-   if (priv->vmware) {
-   union ibv_gid gid = { .raw = { 0 } };
-
-   memcpy(&gid.raw[10], *mac, sizeof(*mac));
-   claim_zero(ibv_detach_mcast(rxq->qp, &gid, 0));
-   BITFIELD_RESET(rxq->mac_configured, mac_index);
-   return;
-   }
-#endif
assert(rxq->mac_flow[mac_index] != NULL);
claim_zero(ibv_exp_destroy_flow(rxq->mac_flow[mac_index]));
rxq->mac_flow[mac_index] = NULL;
@@ -1960,22 +1947,6 @@ rxq_mac_addr_add(struct rxq *rxq, unsigned int mac_index)
  (*mac)[0], (*mac)[1], (*mac)[2], (*mac)[3], (*mac)[4], (*mac)[5],

[dpdk-dev] [PATCH] mlx4: remove old VMware compatibility code

2015-03-25 Thread Neil Horman
On Wed, Mar 25, 2015 at 11:34:31AM +0100, Adrien Mazarguil wrote:
> CONFIG_RTE_LIBRTE_MLX4_COMPAT_VMWARE has no effect since this option enables
> MLX4_PMD_COMPAT_VMWARE. This macro is not used by the PMD which expects
> MLX4_COMPAT_VMWARE instead.
> 
> Because this option does not work and the related code is no longer useful
> for VMware (as it actually supports the flow steering API), remove it
> entirely.
> 
> Signed-off-by: Olga Shern 
> Signed-off-by: Adrien Mazarguil 
Acked-by: Neil Horman 



[dpdk-dev] [PATCH] doc: add note on needing igb_uio module for VF devs

2015-03-25 Thread Iremonger, Bernard
> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Bruce Richardson
> Sent: Monday, March 23, 2015 4:20 PM
> To: dev at dpdk.org
> Subject: [dpdk-dev] [PATCH] doc: add note on needing igb_uio module for VF 
> devs
> 
> Since the uio_pci_generic module requires that the device to which it is
> being bound supports legacy interrupts, there can be problems using it with
> VF devices. Add a note to the GSG doc to document this fact, and provide
> information on loading igb_uio as a replacement.
> 
> Signed-off-by: Bruce Richardson 

Acked-by: Bernard Iremonger 



[dpdk-dev] [PATCH] fm10k: Fix queue start twice failed

2015-03-25 Thread Michael Qiu
When "port 0 rxq 0 start" is run twice in testpmd, RX queue 0 on port 0
stops working.

The root cause is that the RXQCTL enable bit needs to be cleared first if
the queue is already enabled.
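A minimal, hardware-free sketch of the clear-then-set sequence described above. The register model and the bit values are hypothetical stand-ins, not the real FM10K_RXQCTL_* definitions.

```c
#include <stdint.h>

/* Hypothetical bit layout, for illustration only. */
#define RXQCTL_ENABLE (1u << 0)
#define RXQCTL_PF     (1u << 2)

/* Fake register that records how many times it was written. */
struct fake_reg {
    uint32_t value;
    unsigned writes;
};

static void reg_write(struct fake_reg *r, uint32_t v)
{
    r->value = v;
    r->writes++;
}

/* If the queue is already enabled, write it back with ENABLE cleared,
 * then set ENABLE again. A plain nonzero test of the masked bit works for
 * any bit position; comparing the masked value with 1 only works when
 * the flag happens to be bit 0. */
static void rx_queue_start(struct fake_reg *r, int is_pf)
{
    uint32_t reg = r->value;

    if (is_pf)
        reg |= RXQCTL_PF;
    if (reg & RXQCTL_ENABLE)
        reg_write(r, reg & ~RXQCTL_ENABLE);
    reg |= RXQCTL_ENABLE;
    reg_write(r, reg);
}
```

Calling rx_queue_start() twice on the same register issues one write the first time and two the second time (disable, then enable), which is the toggle the patch relies on.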

Signed-off-by: Michael Qiu 
---
 lib/librte_pmd_fm10k/fm10k_ethdev.c | 56 +
 1 file changed, 32 insertions(+), 24 deletions(-)

diff --git a/lib/librte_pmd_fm10k/fm10k_ethdev.c b/lib/librte_pmd_fm10k/fm10k_ethdev.c
index 0c7a80c..0312fad 100644
--- a/lib/librte_pmd_fm10k/fm10k_ethdev.c
+++ b/lib/librte_pmd_fm10k/fm10k_ethdev.c
@@ -72,6 +72,30 @@ fm10k_mbx_unlock(struct fm10k_hw *hw)
 }

 /*
+ * clean queue, descriptor rings, free software buffers used when stopping
+ * device.
+ */
+static inline void
+rx_queue_clean(struct fm10k_rx_queue *q)
+{
+   union fm10k_rx_desc zero = {.q = {0, 0, 0, 0} };
+   uint32_t i;
+   PMD_INIT_FUNC_TRACE();
+
+   /* zero descriptor rings */
+   for (i = 0; i < q->nb_desc; ++i)
+   q->hw_ring[i] = zero;
+
+   /* free software buffers */
+   for (i = 0; i < q->nb_desc; ++i) {
+   if (q->sw_ring[i]) {
+   rte_pktmbuf_free_seg(q->sw_ring[i]);
+   q->sw_ring[i] = NULL;
+   }
+   }
+}
+
+/*
  * reset queue to initial state, allocate software buffers used when starting
  * device.
  * return 0 on success
@@ -85,6 +109,9 @@ rx_queue_reset(struct fm10k_rx_queue *q)
int i, diag;
PMD_INIT_FUNC_TRACE();

+   /* clean the memory before allocate */
+   rx_queue_clean(q);
+
diag = rte_mempool_get_bulk(q->mp, (void **)q->sw_ring, q->nb_desc);
if (diag != 0)
return -ENOMEM;
@@ -109,30 +136,6 @@ rx_queue_reset(struct fm10k_rx_queue *q)
 }

 /*
- * clean queue, descriptor rings, free software buffers used when stopping
- * device.
- */
-static inline void
-rx_queue_clean(struct fm10k_rx_queue *q)
-{
-   union fm10k_rx_desc zero = {.q = {0, 0, 0, 0} };
-   uint32_t i;
-   PMD_INIT_FUNC_TRACE();
-
-   /* zero descriptor rings */
-   for (i = 0; i < q->nb_desc; ++i)
-   q->hw_ring[i] = zero;
-
-   /* free software buffers */
-   for (i = 0; i < q->nb_desc; ++i) {
-   if (q->sw_ring[i]) {
-   rte_pktmbuf_free_seg(q->sw_ring[i]);
-   q->sw_ring[i] = NULL;
-   }
-   }
-}
-
-/*
  * free all queue memory used when releasing the queue (i.e. configure)
  */
 static inline void
@@ -492,6 +495,11 @@ fm10k_dev_rx_queue_start(struct rte_eth_dev *dev, uint16_t rx_queue_id)
reg = FM10K_READ_REG(hw, FM10K_RXQCTL(rx_queue_id));
if (hw->mac.type == fm10k_mac_pf)
reg |= FM10K_RXQCTL_PF;
+
+   /* already enable? need reset to 0 */
+   if ((reg & FM10K_RXQCTL_ENABLE) == 1)
+   FM10K_WRITE_REG(hw, FM10K_RXQCTL(rx_queue_id), (reg & ~FM10K_RXQCTL_ENABLE));
+
reg |= FM10K_RXQCTL_ENABLE;
/* enable RX queue */
FM10K_WRITE_REG(hw, FM10K_RXQCTL(rx_queue_id), reg);
-- 
1.9.3



[dpdk-dev] [PATCH] testpmd: Fix wrong message when no port started

2015-03-25 Thread De Lara Guarch, Pablo


> -Original Message-
> From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
> Sent: Monday, March 23, 2015 1:53 PM
> To: Qiu, Michael; De Lara Guarch, Pablo
> Cc: dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH] testpmd: Fix wrong message when no port
> started
> 
> Pablo, what is your opinion on this patch?

Sorry for the delay, I missed this email.
> 
> 2015-02-03 16:37, Michael Qiu:
> > The log message is wrong when no port started.
> >
> > Signed-off-by: Michael Qiu 
> > ---
> >  app/test-pmd/testpmd.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
> > index 773b8af..ebf9448 100644
> > --- a/app/test-pmd/testpmd.c
> > +++ b/app/test-pmd/testpmd.c
> > @@ -1423,7 +1423,7 @@ start_port(portid_t pid)
> > if (need_check_link_status && !no_link_check)
> > check_all_ports_link_status(nb_ports, RTE_PORT_ALL);
> > else
> > -   printf("Please stop the ports first\n");
> > +   printf("Please start at least one port first\n");
> 
> Why the word "first"?
> What could lead to this situation? Wrong pid?
> Shouldn't be an error returned?

I see no reason why we should change this.
The code has changed since then, so this path is now only reached when the user
tries to start a port that has already been started.
If the pid is wrong, it will show "Port invalid".

So, in summary, NACK.

Thanks,
Pablo
> 
> >
> > printf("Done\n");
> > return 0;
> 



[dpdk-dev] DPDK testpmd, Virtual Disk IO limitation

2015-03-25 Thread Cheng Kevin
Hi all,

   I am a DPDK beginner. Recently I have become interested in the DPDK vHost
app, testpmd.

   I have been tracing through testpmd.c and iofwd.c for a while, and I also
added some code inside iofwd.c to store the payload of packets.

   Everything works fine, and the performance is as good as expected.

   But when I use fwrite to store the payload into a file, the throughput
drops from 800 Mbps to 3 Mbps (the input stream is 1 Gbps).

   Is this caused by a limitation of virtual disk I/O? How can I solve it?

   I have tried searching for an answer; some people say "pthread" might
solve the problem.

   Can someone give me a hint? I would really appreciate your help.


Best regards,

Kevin Cheng


[dpdk-dev] Interface name after bound to IGB

2015-03-25 Thread Shankari Vaidyalingam
Hi

By what name is the NIC port identified after it is bound to the igb_uio
driver (i.e. after it becomes a DPDK interface)?
I'm asking because the interface no longer appears in the output of
"ifconfig" once it is bound to the igb_uio driver.


Regards
Shankari.V


[dpdk-dev] [PATCH v2 3/6] eal: Fix memory leaks and needless increment of pci_map_addr

2015-03-25 Thread Iremonger, Bernard


> -Original Message-
> From: Tetsuya Mukawa [mailto:mukawa at igel.co.jp]
> Sent: Tuesday, March 24, 2015 4:19 AM
> To: dev at dpdk.org
> Cc: Iremonger, Bernard; Richardson, Bruce; david.marchand at 6wind.com; 
> Tetsuya Mukawa
> Subject: [PATCH v2 3/6] eal: Fix memory leaks and needless increment of 
> pci_map_addr
> 
> This patch fixes the following memory leaks.
> - When pci_map_resource() fails but path was allocated correctly,
>   path won't be freed in pci_uio_map_resource().
> - When open() fails, uio_res won't be freed in
>   pci_uio_map_resource().
> - When pci_uio_unmap() is called, path should be freed.
> 
> It also fixes the following.
> - When pci_map_resource() fails, mapaddr will be MAP_FAILED.
>   In this case, pci_map_addr should not be incremented in
>   pci_uio_map_resource().
> - To shrink the code, move close().
> - Remove the fail variable.
> 
> Signed-off-by: Tetsuya Mukawa 
> ---
>  lib/librte_eal/linuxapp/eal/eal_pci_uio.c | 35 
> ++-
>  1 file changed, 20 insertions(+), 15 deletions(-)
> 
> diff --git a/lib/librte_eal/linuxapp/eal/eal_pci_uio.c 
> b/lib/librte_eal/linuxapp/eal/eal_pci_uio.c
> index f0277be..0128cec 100644
> --- a/lib/librte_eal/linuxapp/eal/eal_pci_uio.c
> +++ b/lib/librte_eal/linuxapp/eal/eal_pci_uio.c
> @@ -333,7 +333,6 @@ pci_uio_map_resource(struct rte_pci_device *dev)
>   maps = uio_res->maps;
>   for (i = 0, map_idx = 0; i != PCI_MAX_RESOURCE; i++) {
>   int fd;
> - int fail = 0;
> 
>   /* skip empty BAR */
>   phaddr = dev->mem_resource[i].phys_addr; @@ -347,6 +346,11 @@
> pci_uio_map_resource(struct rte_pci_device *dev)
>   loc->domain, loc->bus, loc->devid, 
> loc->function,
>   i);
> 
> + /* allocate memory to keep path */
> + maps[map_idx].path = rte_malloc(NULL, strlen(devname) + 1, 0);
> + if (maps[map_idx].path == NULL)
> + goto fail0;
> +
>   /*
>* open resource file, to mmap it
>*/
> @@ -354,7 +358,7 @@ pci_uio_map_resource(struct rte_pci_device *dev)
>   if (fd < 0) {
>   RTE_LOG(ERR, EAL, "Cannot open %s: %s\n",
>   devname, strerror(errno));
> - return -1;
> + goto fail1;
>   }
> 
>   /* try mapping somewhere close to the end of hugepages */ @@ 
> -363,23 +367,13 @@
> pci_uio_map_resource(struct rte_pci_device *dev)
> 
>   mapaddr = pci_map_resource(pci_map_addr, fd, 0,
>   (size_t)dev->mem_resource[i].len, 0);
> + close(fd);
>   if (mapaddr == MAP_FAILED)
> - fail = 1;
> + goto fail1;
> 
>   pci_map_addr = RTE_PTR_ADD(mapaddr,
>   (size_t)dev->mem_resource[i].len);
> 
> - maps[map_idx].path = rte_malloc(NULL, strlen(devname) + 1, 0);
> - if (maps[map_idx].path == NULL)
> - fail = 1;
> -
> - if (fail) {
> - rte_free(uio_res);
> - close(fd);
> - return -1;
> - }
> - close(fd);
> -
>   maps[map_idx].phaddr = dev->mem_resource[i].phys_addr;
>   maps[map_idx].size = dev->mem_resource[i].len;
>   maps[map_idx].addr = mapaddr;
> @@ -394,6 +388,15 @@ pci_uio_map_resource(struct rte_pci_device *dev)
>   TAILQ_INSERT_TAIL(uio_res_list, uio_res, next);
> 
>   return 0;
> +
> +fail1:
> + rte_free(maps[map_idx].path);
> +fail0:
> + for (i = 0; i < map_idx; i++)

Hi Tetsuya,

The fail1 label falls through to fail0.
Would it be cleaner to drop fail1 and change the for loop at fail0 to
for (i = 0; i <= map_idx; i++)
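A standalone sketch of the single-label cleanup suggested here, using malloc()/free() in place of rte_malloc()/rte_free(). It relies on the free function accepting NULL, which free() does (and rte_free() is documented to do nothing for NULL as well). The loop, the counters and the failure injection are hypothetical; the point is that when the current slot's allocation failed it is NULL, and when a later step failed it is allocated, so "i <= map_idx" is correct in both cases.

```c
#include <stdlib.h>

#define MAX_MAPS 6

static int live_allocs; /* allocations not yet freed */

static void *counted_malloc(size_t n)
{
    void *p = malloc(n);
    if (p != NULL)
        live_allocs++;
    return p;
}

static void counted_free(void *p)
{
    if (p != NULL) {
        live_allocs--;
        free(p);
    }
}

/* fail_at simulates a later step (open or mmap) failing after
 * paths[fail_at] has been allocated; pass -1 for no failure. */
static int map_all(char *paths[MAX_MAPS], int fail_at)
{
    int i, map_idx;

    for (map_idx = 0; map_idx < MAX_MAPS; map_idx++) {
        paths[map_idx] = counted_malloc(16);
        if (paths[map_idx] == NULL)
            goto fail;          /* current slot is NULL */
        if (map_idx == fail_at)
            goto fail;          /* current slot is allocated */
    }
    return 0;

fail:
    /* <= also covers the current slot; freeing NULL is a no-op */
    for (i = 0; i <= map_idx; i++) {
        counted_free(paths[i]);
        paths[i] = NULL;
    }
    return -1;
}
```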

Regards,

Bernard.

> + rte_free(maps[i].path);
> + rte_free(uio_res);
> +
> + return -1;
>  }
> 
>  #ifdef RTE_LIBRTE_EAL_HOTPLUG
> @@ -405,9 +408,11 @@ pci_uio_unmap(struct mapped_pci_resource *uio_res)
>   if (uio_res == NULL)
>   return;
> 
> - for (i = 0; i != uio_res->nb_maps; i++)
> + for (i = 0; i != uio_res->nb_maps; i++) {
>   pci_unmap_resource(uio_res->maps[i].addr,
>   (size_t)uio_res->maps[i].size);
> + rte_free(uio_res->maps[i].path);
> + }
>  }
> 
>  static struct mapped_pci_resource *
> --
> 1.9.1



[dpdk-dev] DPDK testpmd, Virtual Disk IO limitation

2015-03-25 Thread Bruce Richardson
On Wed, Mar 25, 2015 at 10:06:48PM +0800, Cheng Kevin wrote:
> Hi all,
> 
>I am a beginner of DPDK. Recently, i am interest in DPDK vHost app -
> testpmd.
> 
>And i have been tracing on testpmd.c and iofwd.c for a while.
> 
>Also add some code inside iofwd.c for storing the payload of packets.
> 
>Everything goes fine, and the performance is great as expected.
> 
>But when i use fwrite to store the payload into a file,
> 
>the performance decrease from 800mbps to 3mbps (input stream is 1 Gbps).
> 
>Is is caused by the limitation of Virtual Disk IO? How can i solve it?
> 
>I have tried to search the answer, some people say "pthread" might solve
> the problem.
> 
>Can someone give me some hint, i really appreciate for your help.
> 
> 
> Best Regard,
> 
> Kevin Cheng

Two general issues you will hit writing to disk:
1) IO, including disk IO, is slow
2) System calls are slow.

You are probably hitting both bottlenecks.
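One way to reduce the system-call part, sketched here without DPDK and assuming the payload bytes are already at hand, is to batch many small payloads into one large buffer and hand it to fwrite() only when it fills, so the stream issues fewer, larger write() calls. The struct, the function names and the 1 MiB size are hypothetical. This does not make the disk itself faster; setvbuf() on the FILE can give a similar effect, and moving the writes to a separate thread (for example fed through an rte_ring) is the other common approach, not shown here.

```c
#include <stdio.h>
#include <string.h>

#define BATCH_BYTES (1u << 20) /* 1 MiB, an arbitrary choice */

struct batch_writer {
    FILE *out;
    size_t used;
    unsigned long flushes;      /* number of buffer flushes via fwrite() */
    unsigned char buf[BATCH_BYTES];
};

static int batch_flush(struct batch_writer *w)
{
    if (w->used == 0)
        return 0;
    if (fwrite(w->buf, 1, w->used, w->out) != w->used)
        return -1;
    w->flushes++;
    w->used = 0;
    return 0;
}

static int batch_append(struct batch_writer *w, const void *data, size_t len)
{
    /* Oversized payload: flush what we have, then write it directly. */
    if (len > BATCH_BYTES)
        return (batch_flush(w) == 0 &&
                fwrite(data, 1, len, w->out) == len) ? 0 : -1;
    /* Not enough room left: flush before copying. */
    if (w->used + len > BATCH_BYTES && batch_flush(w) != 0)
        return -1;
    memcpy(w->buf + w->used, data, len);
    w->used += len;
    return 0;
}
```

For example, appending 1000 payloads of 1239 bytes results in only two flushes (one when the buffer fills, one final explicit flush).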

/Bruce


[dpdk-dev] [PATCH] doc: add note on needing igb_uio module for VF devs

2015-03-25 Thread Butler, Siobhan A


> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Iremonger, Bernard
> Sent: Wednesday, March 25, 2015 10:43 AM
> To: Richardson, Bruce; dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH] doc: add note on needing igb_uio module
> for VF devs
> 
> > -Original Message-
> > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Bruce Richardson
> > Sent: Monday, March 23, 2015 4:20 PM
> > To: dev at dpdk.org
> > Subject: [dpdk-dev] [PATCH] doc: add note on needing igb_uio module
> > for VF devs
> >
> > Since the uio_pci_generic module requires that the device to which it
> > is being bound supports legacy interrupts, there can be problems using
> > it with VF devices. Add a note to the GSG doc to document this fact, and
> provide information on loading igb_uio as a replacement.
> >
> > Signed-off-by: Bruce Richardson 
> 
> Acked-by: Bernard Iremonger 

Acked-by: Siobhan Butler 


[dpdk-dev] Interface name after bound to IGB

2015-03-25 Thread Bruce Richardson
On Wed, Mar 25, 2015 at 08:00:02PM +0530, Shankari Vaidyalingam wrote:
> Hi
> 
> By what name is the NIC port identified after it is bound to the igb_uio
> driver (i.e after it becomes a DPDK interface)
> I'm asking this question because the interface does not get displayed in
> the output of the "ifconfig" after it gets bound to igb_uio driver.
> 
> 
> Regards
> Shankari.V

Identified where, or to what? It still keeps the same PCI bus-device-function,
which can be used to identify it, if that is what you mean.
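The PCI address is written as domain:bus:device.function, e.g. "0000:83:00.0", the form shown by tools/dpdk_nic_bind.py --status. A minimal parsing sketch follows; the struct and function names are hypothetical and this is not the EAL's own parser.

```c
#include <stdio.h>

/* Hypothetical holder for a PCI address. */
struct pci_bdf {
    unsigned domain, bus, dev, func;
};

/* Returns 0 for a full "DDDD:BB:DD.F" address, -1 otherwise.
 * The trailing %c catches extra characters after the function. */
static int parse_bdf(const char *s, struct pci_bdf *out)
{
    char tail;

    if (sscanf(s, "%4x:%2x:%2x.%1x%c", &out->domain, &out->bus,
               &out->dev, &out->func, &tail) != 4)
        return -1;
    return 0;
}
```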

/Bruce


[dpdk-dev] [PATCH v2 6/6] eal: Fix interface of pci_map_resource()

2015-03-25 Thread Iremonger, Bernard
> -Original Message-
> From: Tetsuya Mukawa [mailto:mukawa at igel.co.jp]
> Sent: Tuesday, March 24, 2015 4:19 AM
> To: dev at dpdk.org
> Cc: Iremonger, Bernard; Richardson, Bruce; david.marchand at 6wind.com; 
> Tetsuya Mukawa
> Subject: [PATCH v2 6/6] eal: Fix interface of pci_map_resource()
> 
> The function is implemented in both linuxapp and bsdapp, but the interface
> differs. This patch changes the bsdapp version to behave the same as the
> linuxapp one. After applying it, the file descriptor must be opened and
> closed outside of pci_map_resource().
> Also, remove redundant error messages from linuxapp.
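A POSIX sketch of the calling convention described in this patch: the mapping helper only calls mmap() and returns MAP_FAILED on error, while the caller opens the file and closes the descriptor right after mapping, since a mapping stays valid once its descriptor is closed. The names and the temporary file standing in for the PCI resource file are hypothetical; this is not the EAL code.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Maps fd; does not open or close it. */
static void *map_resource(void *requested_addr, int fd, off_t offset,
                          size_t size, int additional_flags)
{
    void *mapaddr = mmap(requested_addr, size, PROT_READ | PROT_WRITE,
                         MAP_SHARED | additional_flags, fd, offset);
    if (mapaddr == MAP_FAILED)
        perror("mmap");
    return mapaddr;
}

/* Caller: open, map, then close immediately. */
static void *map_scratch_file(size_t size)
{
    FILE *f = tmpfile();        /* stands in for the resource file */
    void *addr;

    if (f == NULL)
        return MAP_FAILED;
    if (ftruncate(fileno(f), (off_t)size) != 0) {
        fclose(f);
        return MAP_FAILED;
    }
    addr = map_resource(NULL, fileno(f), 0, size, 0);
    fclose(f);                  /* mapping remains valid */
    return addr;
}
```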
> 
> Signed-off-by: Tetsuya Mukawa 
> ---
>  lib/librte_eal/bsdapp/eal/eal_pci.c   | 111 
> ++
>  lib/librte_eal/linuxapp/eal/eal_pci_uio.c |  21 +++---
>  2 files changed, 77 insertions(+), 55 deletions(-)
> 
> diff --git a/lib/librte_eal/bsdapp/eal/eal_pci.c 
> b/lib/librte_eal/bsdapp/eal/eal_pci.c
> index 08b91b4..d83916b 100644
> --- a/lib/librte_eal/bsdapp/eal/eal_pci.c
> +++ b/lib/librte_eal/bsdapp/eal/eal_pci.c
> @@ -100,7 +100,7 @@ struct mapped_pci_resource {
> 
>   struct rte_pci_addr pci_addr;
>   char path[PATH_MAX];
> - size_t nb_maps;
> + int nb_maps;
>   struct pci_map maps[PCI_MAX_RESOURCE];  };
> 
> @@ -122,47 +122,30 @@ pci_unbind_kernel_driver(struct rte_pci_device *dev 
> __rte_unused)
> 
>  /* map a particular resource from a file */  static void * 
> -pci_map_resource(void *requested_addr,
> const char *devname, off_t offset,
> -  size_t size)
> +pci_map_resource(void *requested_addr, int fd, off_t offset, size_t size,
> +  int additional_flags)
>  {
> - int fd;
>   void *mapaddr;
> 
> - /*
> -  * open devname, to mmap it
> -  */
> - fd = open(devname, O_RDWR);
> - if (fd < 0) {
> - RTE_LOG(ERR, EAL, "Cannot open %s: %s\n",
> - devname, strerror(errno));
> - goto fail;
> - }
> -
>   /* Map the PCI memory resource of device */
>   mapaddr = mmap(requested_addr, size, PROT_READ | PROT_WRITE,
> - MAP_SHARED, fd, offset);
> - close(fd);
> - if (mapaddr == MAP_FAILED ||
> - (requested_addr != NULL && mapaddr != requested_addr)) {
> - RTE_LOG(ERR, EAL, "%s(): cannot mmap(%s(%d), %p, 0x%lx, 0x%lx):"
> - " %s (%p)\n", __func__, devname, fd, requested_addr,
> + MAP_SHARED | additional_flags, fd, offset);
> + if (mapaddr == MAP_FAILED) {
> + RTE_LOG(ERR, EAL,
> + "%s(): cannot mmap(%d, %p, 0x%lx, 0x%lx): %s (%p)\n",
> + __func__, fd, requested_addr,
>   (unsigned long)size, (unsigned long)offset,
>   strerror(errno), mapaddr);
> - goto fail;
> - }
> -
> - RTE_LOG(DEBUG, EAL, "  PCI memory mapped at %p\n", mapaddr);
> + } else
> + RTE_LOG(DEBUG, EAL, "  PCI memory mapped at %p\n", mapaddr);
> 
>   return mapaddr;
> -
> -fail:
> - return NULL;
>  }
> 
>  static int
>  pci_uio_map_secondary(struct rte_pci_device *dev)  {
> - size_t i;
> + int i, fd;
>   struct mapped_pci_resource *uio_res;
>   struct mapped_pci_res_list *uio_res_list =
>   RTE_TAILQ_CAST(rte_uio_tailq.head, 
> mapped_pci_res_list); @@ -170,19
> +153,34 @@ pci_uio_map_secondary(struct rte_pci_device *dev)
>   TAILQ_FOREACH(uio_res, uio_res_list, next) {
> 
>   /* skip this element if it doesn't match our PCI address */
> - if (memcmp(&uio_res->pci_addr, &dev->addr, sizeof(dev->addr)))
> + if (rte_eal_compare_pci_addr(&uio_res->pci_addr, &dev->addr))
>   continue;
> 
>   for (i = 0; i != uio_res->nb_maps; i++) {
> - if (pci_map_resource(uio_res->maps[i].addr,
> -  uio_res->path,
> -  (off_t)uio_res->maps[i].offset,
> -  (size_t)uio_res->maps[i].size)
> - != uio_res->maps[i].addr) {
> + /*
> +  * open devname, to mmap it
> +  */
> + fd = open(uio_res->maps[i].path, O_RDWR);
> + if (fd < 0) {
> + RTE_LOG(ERR, EAL, "Cannot open %s: %s\n",
> + uio_res->maps[i].path, strerror(errno));
> + return -1;
> + }
> +
> + void *mapaddr = pci_map_resource(uio_res->maps[i].addr,
> + fd, (off_t)uio_res->maps[i].offset,
> + (size_t)uio_res->maps[i].size, 0);
> + if (mapaddr != uio_res->maps[i].addr) {
>   RTE_LOG(ERR, EAL,
> - 

[dpdk-dev] [PATCH] mk: added make target to print out system info

2015-03-25 Thread Olivier MATZ
Hi,

On 03/24/2015 06:00 PM, Neil Horman wrote:
> On Tue, Mar 24, 2015 at 02:52:59PM +, John McNamara wrote:
>> Added a 'make system_info' target to print out system info
>> related to DPDK. This is intended as output that can be
>> attached to bug reports.
>> ---
>>   mk/rte.sdkroot.mk | 33 +
>>   1 file changed, 33 insertions(+)
>>
>> diff --git a/mk/rte.sdkroot.mk b/mk/rte.sdkroot.mk
>> index e8423b0..b477d09 100644
>> --- a/mk/rte.sdkroot.mk
>> +++ b/mk/rte.sdkroot.mk
>> @@ -123,3 +123,36 @@ examples examples_clean:
>>   %:
>>  $(Q)$(MAKE) -f $(RTE_SDK)/mk/rte.sdkconfig.mk checkconfig
>>  $(Q)$(MAKE) -f $(RTE_SDK)/mk/rte.sdkbuild.mk $@
>> +
>> +.PHONY: system_info
>> +system_info:
>> +$(Q)echo
>> +$(Q)echo "CC version"
>> +$(Q)echo "=="
>> +$(Q)$(CC) --version
>> +$(Q)echo
>> +
>> +$(Q)echo "DPDK version"
>> +$(Q)echo ""
>> +$(Q)$(MAKE) showversion
>> +$(Q)echo
>> +
>> +$(Q)echo "Git commit"
>> +$(Q)echo "=="
>> +$(Q)git log --pretty=format:'%H' -1
>> +$(Q)echo
>> +
>> +$(Q)echo "Uname"
>> +$(Q)echo "="
>> +$(Q)uname -srvmpio
>> +$(Q)echo
>> +
>> +$(Q)echo "Hugepages"
>> +$(Q)echo "="
>> +$(Q)grep -i huge /proc/meminfo
>> +$(Q)echo
>> +
>> +$(Q)tools/cpu_layout.py
>> +
>> +$(Q)tools/dpdk_nic_bind.py --status
>> +$(Q)echo
>> --
>> 1.8.1.4
>>
>>
> Nak, for a few reasons:
>
> 1) While this target is in a common makefile, at least some of the information
> it gathers is operating system specific (e.g. /proc/meminfo).  This isn't going
> to work on BSD, or other operating systems that we might support in the future
>
> 2) This is tied to the build system.  There's no guarantee that users will
> diagnose problems only on the system that they built the DPDK on.
>
> A better solution might be to simply document the sort of information that a
> bug reporter is expected to gather, along with some sample tools for doing so.
> There are numerous tools to get the above information, both in isolation and
> in aggregate.

I agree with Neil that the Makefile is probably not the best place to
put that, because the target machine may not be the build machine. What
about doing the same in a script? That way it could be embedded and
executed on the target.

Neil, you mention tools that do the same kind of thing. Which tools
are you thinking of? The problem with using external tools is that it
adds a dependency on them.


Regards,
Olivier



[dpdk-dev] Interface name after bound to IGB

2015-03-25 Thread Shankari Vaidyalingam
Hi Bruce,


If I want to capture the packets received by an interface bound to igb_uio
(a DPDK interface), what interface name should I give?

Regards
Shankari.V


On Wed, Mar 25, 2015 at 8:33 PM, Bruce Richardson <
bruce.richardson at intel.com> wrote:

> On Wed, Mar 25, 2015 at 08:00:02PM +0530, Shankari Vaidyalingam wrote:
> > Hi
> >
> > By what name is the NIC port identified after it is bound to the igb_uio
> > driver (i.e after it becomes a DPDK interface)
> > I'm asking this question because the interface does not get displayed in
> > the output of the "ifconfig" after it gets bound to igb_uio driver.
> >
> >
> > Regards
> > Shankari.V
>
> Identified where, or to what? It still maintains the same PCI
> bus-device-function
> which can be used to identify it, if that is what you mean.
>
> /Bruce
>


[dpdk-dev] Interface name after bound to IGB

2015-03-25 Thread Bruce Richardson
On Wed, Mar 25, 2015 at 08:44:51PM +0530, Shankari Vaidyalingam wrote:
> Hi Bruce,
> 
> 
> If I want to capture the packets received by the interface bound to IGB
> (DPDK interface) then I'd like to know what would be the interface name
> that must be given.
> 
> Regards
> Shankari.V
> 

To capture using DPDK, the port will be picked up automatically by DPDK init
sequence, or you can explicitly whitelist it via BDF, if you really want. To
capture using something that uses the kernel driver, e.g. using libpcap, you
need to unbind from igb_uio and bind back to the kernel driver again.

Regards,
/Bruce

> 
> On Wed, Mar 25, 2015 at 8:33 PM, Bruce Richardson <
> bruce.richardson at intel.com> wrote:
> 
> > On Wed, Mar 25, 2015 at 08:00:02PM +0530, Shankari Vaidyalingam wrote:
> > > Hi
> > >
> > > By what name is the NIC port identified after it is bound to the igb_uio
> > > driver (i.e after it becomes a DPDK interface)
> > > I'm asking this question because the interface does not get displayed in
> > > the output of the "ifconfig" after it gets bound to igb_uio driver.
> > >
> > >
> > > Regards
> > > Shankari.V
> >
> > Identified where, or to what? It still maintains the same PCI
> > bus-device-function
> > which can be used to identify it, if that is what you mean.
> >
> > /Bruce
> >


[dpdk-dev] Interface name after bound to IGB

2015-03-25 Thread Olivier MATZ
Hi Shankari,

On 03/25/2015 04:14 PM, Shankari Vaidyalingam wrote:
> Hi Bruce,
>
>
> If I want to capture the packets received by the interface bound to IGB
> (DPDK interface) then I'd like to know what would be the interface name
> that must be given.

If you want to capture the packets, you have to write a program
(perhaps based on one of the examples) that displays the packets or saves
them in pcap format. For that, you don't need the interface name but
the port ID, which is displayed along with the PCI ID when you start DPDK.
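As a starting point for the "save them in pcap format" option, here is a minimal sketch of the classic pcap file layout (24-byte global header, then a 16-byte record header before each packet), written in host byte order since readers detect byte order from the magic number. The function names are hypothetical. In a DPDK receive loop, data and len would presumably come from rte_pktmbuf_mtod() and rte_pktmbuf_data_len() for single-segment mbufs; that wiring is an assumption and is not shown.

```c
#include <stdint.h>
#include <stdio.h>

struct pcap_file_hdr {
    uint32_t magic;         /* 0xa1b2c3d4: microsecond timestamps */
    uint16_t version_major; /* 2 */
    uint16_t version_minor; /* 4 */
    int32_t  thiszone;      /* GMT offset, usually 0 */
    uint32_t sigfigs;       /* usually 0 */
    uint32_t snaplen;       /* max bytes stored per packet */
    uint32_t linktype;      /* 1 = Ethernet */
};

struct pcap_rec_hdr {
    uint32_t ts_sec, ts_usec;
    uint32_t incl_len;      /* bytes stored in the file */
    uint32_t orig_len;      /* bytes on the wire */
};

static int pcap_write_header(FILE *f, uint32_t snaplen)
{
    struct pcap_file_hdr h = { 0xa1b2c3d4u, 2, 4, 0, 0, snaplen, 1 };
    return fwrite(&h, sizeof(h), 1, f) == 1 ? 0 : -1;
}

/* Append one packet, truncated to snaplen like a capture tool would. */
static int pcap_write_packet(FILE *f, uint32_t snaplen, const void *data,
                             uint32_t len, uint32_t sec, uint32_t usec)
{
    struct pcap_rec_hdr r;

    r.ts_sec = sec;
    r.ts_usec = usec;
    r.incl_len = len < snaplen ? len : snaplen;
    r.orig_len = len;
    if (fwrite(&r, sizeof(r), 1, f) != 1)
        return -1;
    return fwrite(data, 1, r.incl_len, f) == r.incl_len ? 0 : -1;
}
```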

Regards,
Olivier



[dpdk-dev] [PATCH] mk: added make target to print out system info

2015-03-25 Thread Neil Horman
On Wed, Mar 25, 2015 at 04:06:10PM +0100, Olivier MATZ wrote:
> Hi,
> 
> On 03/24/2015 06:00 PM, Neil Horman wrote:
> >On Tue, Mar 24, 2015 at 02:52:59PM +, John McNamara wrote:
> >>Added a 'make system_info' target to print out system info
> >>related to DPDK. This is intended as output that can be
> >>attached to bug reports.
> >>---
> >>  mk/rte.sdkroot.mk | 33 +
> >>  1 file changed, 33 insertions(+)
> >>
> >>diff --git a/mk/rte.sdkroot.mk b/mk/rte.sdkroot.mk
> >>index e8423b0..b477d09 100644
> >>--- a/mk/rte.sdkroot.mk
> >>+++ b/mk/rte.sdkroot.mk
> >>@@ -123,3 +123,36 @@ examples examples_clean:
> >>  %:
> >>$(Q)$(MAKE) -f $(RTE_SDK)/mk/rte.sdkconfig.mk checkconfig
> >>$(Q)$(MAKE) -f $(RTE_SDK)/mk/rte.sdkbuild.mk $@
> >>+
> >>+.PHONY: system_info
> >>+system_info:
> >>+   $(Q)echo
> >>+   $(Q)echo "CC version"
> >>+   $(Q)echo "=="
> >>+   $(Q)$(CC) --version
> >>+   $(Q)echo
> >>+
> >>+   $(Q)echo "DPDK version"
> >>+   $(Q)echo ""
> >>+   $(Q)$(MAKE) showversion
> >>+   $(Q)echo
> >>+
> >>+   $(Q)echo "Git commit"
> >>+   $(Q)echo "=="
> >>+   $(Q)git log --pretty=format:'%H' -1
> >>+   $(Q)echo
> >>+
> >>+   $(Q)echo "Uname"
> >>+   $(Q)echo "="
> >>+   $(Q)uname -srvmpio
> >>+   $(Q)echo
> >>+
> >>+   $(Q)echo "Hugepages"
> >>+   $(Q)echo "="
> >>+   $(Q)grep -i huge /proc/meminfo
> >>+   $(Q)echo
> >>+
> >>+   $(Q)tools/cpu_layout.py
> >>+
> >>+   $(Q)tools/dpdk_nic_bind.py --status
> >>+   $(Q)echo
> >>--
> >>1.8.1.4
> >>
> >>
> >Nak, for a few reasons:
> >
> >1) While this target is in a common makefile, at least some of the information
> >it gathers is operating system specific (e.g. /proc/meminfo).  This isn't going
> >to work on BSD, or other operating systems that we might support in the future
> >
> >2) This is tied to the build system.  There's no guarantee that users will
> >diagnose problems only on the system that they built the DPDK on.
> >
> >A better solution might be to simply document the sort of information that a bug
> >reporter is expected to gather, along with some sample tools for doing so.
> >There are numerous tools to get the above information, both in isolation and in
> >aggregate.
> 
> I agree with Neil that the Makefile is probably not the best place to
> put that because the target machine may not be the build machine. What
> about doing the same in a script? Therefore it could be embedded and
> executed on the target.
> 
A script would be fine, as long as it's cased for tools available on every OS.

> Neil, you talk about tools that do the same kind of things. What tool
> are you thinking about? The problem of using external tools is that it
> adds a dependency with them.
> 
Yes, but how is that different from the above?  Running cat /proc/meminfo has a
dependency on the existence of /proc/meminfo, which is not valid on BSD.  There's
another file there that holds similar memory information, though, or perhaps a
memstat tool (I can't recall which).  The point being, to have an appropriate bug
reporting tool like this, you need to determine what information you need, then
for each operating system you have to do the right things to get it, be that
read a file, run a tool, or some other operation.

Neil

> 
> Regards,
> Olivier
> 
> 


[dpdk-dev] [PATCH v2 5/6] eal: Use map_idx in pci_uio_map_resource() of bsdapp to work same as linuxapp

2015-03-25 Thread Iremonger, Bernard


> -Original Message-
> From: Tetsuya Mukawa [mailto:mukawa at igel.co.jp]
> Sent: Tuesday, March 24, 2015 4:19 AM
> To: dev at dpdk.org
> Cc: Iremonger, Bernard; Richardson, Bruce; david.marchand at 6wind.com; Tetsuya Mukawa
> Subject: [PATCH v2 5/6] eal: Use map_idx in pci_uio_map_resource() of bsdapp to work same as linuxapp
> 
> This patch changes code that maps pci resources in bsdapp.
> Linuxapp has almost same code. To consolidate both, fix implementation
> of bsdapp to work same as linuxapp.
> 
> Signed-off-by: Tetsuya Mukawa 
> ---
>  lib/librte_eal/bsdapp/eal/eal_pci.c | 24 
>  1 file changed, 12 insertions(+), 12 deletions(-)
> 
> diff --git a/lib/librte_eal/bsdapp/eal/eal_pci.c b/lib/librte_eal/bsdapp/eal/eal_pci.c
> index 85f8671..08b91b4 100644
> --- a/lib/librte_eal/bsdapp/eal/eal_pci.c
> +++ b/lib/librte_eal/bsdapp/eal/eal_pci.c
> @@ -195,7 +195,7 @@ pci_uio_map_secondary(struct rte_pci_device *dev)
>  static int
>  pci_uio_map_resource(struct rte_pci_device *dev)
>  {
> - int i, j;
> + int i, map_idx;
>   char devname[PATH_MAX]; /* contains the /dev/uioX */
>   void *mapaddr;
>   uint64_t phaddr;
> @@ -247,31 +247,31 @@ pci_uio_map_resource(struct rte_pci_device *dev)
>   pagesz = sysconf(_SC_PAGESIZE);
> 
>   maps = uio_res->maps;
> - for (i = uio_res->nb_maps = 0; i != PCI_MAX_RESOURCE; i++) {
> + for (i = 0, map_idx = 0; i != PCI_MAX_RESOURCE; i++) {
> 
> - j = uio_res->nb_maps;
>   /* skip empty BAR */
>   if ((phaddr = dev->mem_resource[i].phys_addr) == 0)
>   continue;
> 
>   /* if matching map is found, then use it */
>   offset = i * pagesz;
> - maps[j].offset = offset;
> - maps[j].phaddr = dev->mem_resource[i].phys_addr;
> - maps[j].size = dev->mem_resource[i].len;
> - if (maps[j].addr != NULL ||
> - (mapaddr = pci_map_resource(NULL, devname, (off_t)offset,
> - (size_t)maps[j].size)
> - ) == NULL) {
> + maps[map_idx].offset = offset;
> + maps[map_idx].phaddr = dev->mem_resource[i].phys_addr;
> + maps[map_idx].size = dev->mem_resource[i].len;
> + mapaddr = pci_map_resource(NULL, devname, (off_t)offset,
> + (size_t)maps[map_idx].size);
> + if ((maps[map_idx].addr != NULL) || (mapaddr == NULL)) {

Hi Tetsuya,

This should be checking for (mapaddr == MAP_FAILED) here.
Seems to be fixed in patch 6/6 though.

Regards,

Bernard.

>   rte_free(uio_res);
>   return -1;
>   }
> 
> - maps[j].addr = mapaddr;
> - uio_res->nb_maps++;
> + maps[map_idx].addr = mapaddr;
> + map_idx++;
>   dev->mem_resource[i].addr = mapaddr;
>   }
> 
> + uio_res->nb_maps = map_idx;
> +
>   TAILQ_INSERT_TAIL(uio_res_list, uio_res, next);
> 
>   return 0;
> --
> 1.9.1
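A minimal sketch of the check Bernard asks for. mmap() reports failure with MAP_FAILED, i.e. (void *)-1, not NULL, so a NULL test lets a failed mapping through. The wrapper map_or_null() is hypothetical. It calls plain POSIX mmap() and is not the EAL's pci_map_resource().

```c
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>

/* Hypothetical wrapper: map 'size' bytes of 'fd' at 'offset' and translate
 * mmap()'s MAP_FAILED sentinel into NULL, so that callers can keep a
 * single NULL check. */
static void *map_or_null(int fd, off_t offset, size_t size)
{
    void *addr = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED,
                      fd, offset);

    if (addr == MAP_FAILED)   /* mmap() signals errors with MAP_FAILED */
        return NULL;
    return addr;
}
```

Whichever convention a mapping helper uses, its callers have to test for that exact sentinel; mixing a NULL check with a function that passes MAP_FAILED through is the bug pointed out above.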



[dpdk-dev] [PATCH] mk: added make target to print out system info

2015-03-25 Thread Olivier MATZ
On 03/25/2015 04:22 PM, Neil Horman wrote:
> On Wed, Mar 25, 2015 at 04:06:10PM +0100, Olivier MATZ wrote:
>> Hi,
>>
>> On 03/24/2015 06:00 PM, Neil Horman wrote:
>>> On Tue, Mar 24, 2015 at 02:52:59PM +, John McNamara wrote:
 Added a 'make system_info' target to print out system info
 related to DPDK. This is intended as output that can be
 attached to bug reports.
 ---
   mk/rte.sdkroot.mk | 33 +
   1 file changed, 33 insertions(+)

 diff --git a/mk/rte.sdkroot.mk b/mk/rte.sdkroot.mk
 index e8423b0..b477d09 100644
 --- a/mk/rte.sdkroot.mk
 +++ b/mk/rte.sdkroot.mk
 @@ -123,3 +123,36 @@ examples examples_clean:
   %:
$(Q)$(MAKE) -f $(RTE_SDK)/mk/rte.sdkconfig.mk checkconfig
$(Q)$(MAKE) -f $(RTE_SDK)/mk/rte.sdkbuild.mk $@
 +
 +.PHONY: system_info
 +system_info:
 +  $(Q)echo
 +  $(Q)echo "CC version"
 +  $(Q)echo "=="
 +  $(Q)$(CC) --version
 +  $(Q)echo
 +
 +  $(Q)echo "DPDK version"
 +  $(Q)echo ""
 +  $(Q)$(MAKE) showversion
 +  $(Q)echo
 +
 +  $(Q)echo "Git commit"
 +  $(Q)echo "=="
 +  $(Q)git log --pretty=format:'%H' -1
 +  $(Q)echo
 +
 +  $(Q)echo "Uname"
 +  $(Q)echo "="
 +  $(Q)uname -srvmpio
 +  $(Q)echo
 +
 +  $(Q)echo "Hugepages"
 +  $(Q)echo "="
 +  $(Q)grep -i huge /proc/meminfo
 +  $(Q)echo
 +
 +  $(Q)tools/cpu_layout.py
 +
 +  $(Q)tools/dpdk_nic_bind.py --status
 +  $(Q)echo
 --
 1.8.1.4


>>> Nak, for a few reasons:
>>>
>>> 1) While this target is in a common makefile, at least some of the information
>>> it gathers is operating system specific (e.g. /proc/meminfo).  This isn't going
>>> to work on BSD, or other operating systems that we might support in the future
>>>
>>> 2) This is tied to the build system.  There's no guarantee that users will
>>> diagnose problems only on the system that they built the DPDK on.
>>>
>>> A better solution might be to simply document the sort of information that a bug
>>> reporter is expected to gather, along with some sample tools for doing so.
>>> There are numerous tools to get the above information, both in isolation and in
>>> aggregate.
>>
>> I agree with Neil that the Makefile is probably not the best place to
>> put that because the target machine may not be the build machine. What
>> about doing the same in a script? Therefore it could be embedded and
>> executed on the target.
>>
> A script would be fine, as long as it's cased for tools available on every OS.
>
>> Neil, you talk about tools that do the same kind of things. What tool
>> are you thinking about? The problem of using external tools is that it
>> adds a dependency with them.
>>
> Yes, but how is that different from the above?  Running cat /proc/meminfo has a
> dependency on the existence of /proc/meminfo, which is not valid on BSD.  There's
> another file there that holds similar memory information, though, or perhaps a
> memstat tool (I can't recall which).  The point being, to have an appropriate bug
> reporting tool like this, you need to determine what information you need, then
> for each operating system you have to do the right things to get it, be that
> read a file, run a tool, or some other operation.

Agreed, there's no guarantee that /proc/some/file exists on a Linux
distribution, just as there is no guarantee that an application is available.

For instance, using applications that are packaged in coreutils or
procps should not be an issue. But I would say that applications
from more specialized packages should be avoided, and in that case
the /proc interface can be the better choice.

Regards,
Olivier



[dpdk-dev] [PATCH 0/5] mbuf: enhancements of mbuf clones

2015-03-25 Thread Olivier Matz
This series fixes the support of indirect mbufs when the application
reserves a private area in mbufs. This is done by adding a new field
to each mbuf that stores the size of this private area. Another option
would have been to store it in the mbuf pool private info, but
as there is enough room in the mbuf, it is faster to have it there.

The series also removes the limitation that rte_pktmbuf_clone() is
only allowed on direct (non-cloned) mbufs.
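To illustrate why the private-area size has to be stored: with a private area, a direct mbuf's memory is laid out as [struct][private area][data buffer], so the mbuf can only be recovered from its buffer address if that size is known. The sketch uses a hypothetical, cut-down toy_mbuf struct (not the real struct rte_mbuf) to show the same arithmetic as the rte_mbuf_to_baddr()/rte_mbuf_from_baddr() helpers added in patch 1/5.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, cut-down mbuf: only the fields needed for the layout. */
struct toy_mbuf {
    char *buf_addr;      /* start of the data buffer */
    uint16_t priv_size;  /* bytes of application data after the struct */
};

/* Address of the data buffer embedded in direct mbuf 'md':
 * skip the struct itself, then the private area. */
static char *toy_mbuf_to_baddr(struct toy_mbuf *md)
{
    return (char *)md + sizeof(*md) + md->priv_size;
}

/* Inverse: walk back from a buffer address to the mbuf that owns it.
 * 'priv_size' must match the owner's private size (patch 1/5 reads it
 * from the indirect mbuf, which assumes both have the same layout). */
static struct toy_mbuf *toy_mbuf_from_baddr(char *buffer_addr,
                                            uint16_t priv_size)
{
    return (struct toy_mbuf *)(buffer_addr - sizeof(struct toy_mbuf)
                               - priv_size);
}

/* Allocate one toy mbuf laid out as [struct][priv][data]. */
static struct toy_mbuf *toy_mbuf_alloc(uint16_t priv_size, size_t data_len)
{
    struct toy_mbuf *m = malloc(sizeof(*m) + priv_size + data_len);

    if (m == NULL)
        return NULL;
    m->priv_size = priv_size;
    m->buf_addr = toy_mbuf_to_baddr(m);
    return m;
}
```

With priv_size = 0 this reduces to the old RTE_MBUF_FROM_BADDR() arithmetic, ((struct rte_mbuf *)(ba)) - 1, which points to the wrong place as soon as anything sits between the struct and the buffer.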

Olivier Matz (5):
  mbuf: fix clone support when application uses private mbuf data
  mbuf: allow to clone an indirect mbuf
  test/mbuf: rename mc variable in m
  test/mbuf: enhance mbuf refcnt test
  test/mbuf: verify that cloning a clone works properly

 app/test-pmd/testpmd.c |  1 +
 app/test/test_mbuf.c   | 88 ++---
 examples/vhost/main.c  |  6 ++--
 lib/librte_mbuf/rte_mbuf.c |  1 +
 lib/librte_mbuf/rte_mbuf.h | 90 +++---
 5 files changed, 140 insertions(+), 46 deletions(-)

-- 
2.1.4



[dpdk-dev] [PATCH 1/5] mbuf: fix clone support when application uses private mbuf data

2015-03-25 Thread Olivier Matz
Add a new priv_size field in the mbuf structure that should
be initialized at mbuf pool creation. This field contains the
size of the application private data in mbufs.

Introduce new static inline functions rte_mbuf_from_baddr()
and rte_mbuf_to_baddr() to replace the existing macros; unlike
the macros, they take the private size into account when
attaching and detaching mbufs.

Signed-off-by: Olivier Matz 
---
 app/test-pmd/testpmd.c |  1 +
 examples/vhost/main.c  |  6 ++
 lib/librte_mbuf/rte_mbuf.c |  1 +
 lib/librte_mbuf/rte_mbuf.h | 44 +++-
 4 files changed, 39 insertions(+), 13 deletions(-)

diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index 3057791..c5a195a 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -425,6 +425,7 @@ testpmd_mbuf_ctor(struct rte_mempool *mp,
mb->tx_offload   = 0;
mb->vlan_tci = 0;
mb->hash.rss = 0;
+   mb->priv_size= 0;
 }

 static void
diff --git a/examples/vhost/main.c b/examples/vhost/main.c
index c3fcb80..050f3ac 100644
--- a/examples/vhost/main.c
+++ b/examples/vhost/main.c
@@ -139,8 +139,6 @@
 /* Number of descriptors per cacheline. */
 #define DESC_PER_CACHELINE (RTE_CACHE_LINE_SIZE / sizeof(struct vring_desc))

-#define MBUF_EXT_MEM(mb)   (RTE_MBUF_FROM_BADDR((mb)->buf_addr) != (mb))
-
 /* mask of enabled ports */
 static uint32_t enabled_port_mask = 0;

@@ -1590,7 +1588,7 @@ txmbuf_clean_zcp(struct virtio_net *dev, struct vpool *vpool)

for (index = 0; index < mbuf_count; index++) {
mbuf = __rte_mbuf_raw_alloc(vpool->pool);
-   if (likely(MBUF_EXT_MEM(mbuf)))
+   if (likely(RTE_MBUF_INDIRECT(mbuf)))
pktmbuf_detach_zcp(mbuf);
rte_ring_sp_enqueue(vpool->ring, mbuf);

@@ -1653,7 +1651,7 @@ static void mbuf_destroy_zcp(struct vpool *vpool)
for (index = 0; index < mbuf_count; index++) {
mbuf = __rte_mbuf_raw_alloc(vpool->pool);
if (likely(mbuf != NULL)) {
-   if (likely(MBUF_EXT_MEM(mbuf)))
+   if (likely(RTE_MBUF_INDIRECT(mbuf)))
pktmbuf_detach_zcp(mbuf);
rte_ring_sp_enqueue(vpool->ring, (void *)mbuf);
}
diff --git a/lib/librte_mbuf/rte_mbuf.c b/lib/librte_mbuf/rte_mbuf.c
index 526b18d..e095999 100644
--- a/lib/librte_mbuf/rte_mbuf.c
+++ b/lib/librte_mbuf/rte_mbuf.c
@@ -125,6 +125,7 @@ rte_pktmbuf_init(struct rte_mempool *mp,
m->pool = mp;
m->nb_segs = 1;
m->port = 0xff;
+   m->priv_size = 0;
 }

 /* do some sanity checks on a mbuf: panic if it fails */
diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
index 17ba791..4ced6d3 100644
--- a/lib/librte_mbuf/rte_mbuf.h
+++ b/lib/librte_mbuf/rte_mbuf.h
@@ -268,7 +268,7 @@ struct rte_mbuf {
uint16_t data_len;/**< Amount of data in segment buffer. */
uint32_t pkt_len; /**< Total pkt len: sum of all segments. */
uint16_t vlan_tci;/**< VLAN Tag Control Identifier (CPU order) */
-   uint16_t reserved;
+   uint16_t priv_size;   /**< size of the application private data */
union {
uint32_t rss; /**< RSS hash result if RSS enabled */
struct {
@@ -320,15 +320,38 @@ struct rte_mbuf {
 } __rte_cache_aligned;

 /**
- * Given the buf_addr returns the pointer to corresponding mbuf.
+ * Return the mbuf owning the given data buffer address.
+ *
+ * @param mi
+ *   The pointer to the indirect mbuf.
+ * @param buffer_addr
+ *   The address of the data buffer of the direct mbuf.
+ * @return
+ *   The address of the direct mbuf corresponding to buffer_addr.
  */
-#define RTE_MBUF_FROM_BADDR(ba) (((struct rte_mbuf *)(ba)) - 1)
+static inline struct rte_mbuf *
+rte_mbuf_from_baddr(struct rte_mbuf *mi, char *buffer_addr)
+{
+   struct rte_mbuf *md;
+   md = (struct rte_mbuf *)(buffer_addr - sizeof(*mi) - mi->priv_size);
+   return md;
+}

 /**
- * Given the pointer to mbuf returns an address where it's  buf_addr
- * should point to.
+ * Return the buffer address embedded in the given mbuf.
+ *
+ * @param md
+ *   The pointer to the mbuf.
+ * @return
+ *   The address of the data buffer owned by the mbuf.
  */
-#define RTE_MBUF_TO_BADDR(mb)   (((struct rte_mbuf *)(mb)) + 1)
+static inline char *
+rte_mbuf_to_baddr(struct rte_mbuf *md)
+{
+   char *buffer_addr;
+   buffer_addr = (char *)md + sizeof(*md) + md->priv_size;
+   return buffer_addr;
+}

 /**
  * Returns TRUE if given mbuf is indirect, or FALSE otherwise.
@@ -744,9 +767,11 @@ static inline void rte_pktmbuf_attach(struct rte_mbuf *mi, struct rte_mbuf *md)
 static inline void rte_pktmbuf_detach(struct rte_mbuf *m)
 {
const struct rte_mempool *mp = m->pool;
-   void *buf = RTE_MBUF_TO_BADDR(m);
+   void *buf = rte_mbuf_to_baddr(m);
uint32_t buf_len = m

[dpdk-dev] [PATCH 2/5] mbuf: allow to clone an indirect mbuf

2015-03-25 Thread Olivier Matz
Remove one limitation of rte_pktmbuf_attach(): "mbuf we're attaching to
must be direct".

Now, when we attach to an indirect mbuf:
- copy all the relevant fields (addr, len, offload, ...) as before
- get the pointer to the mbuf that embeds the data buffer (direct mbuf),
  and increase the reference counter of that mbuf.

When detaching the mbuf, we can retrieve this direct mbuf, as its pointer
is determined from the buffer address.
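The attach logic described above can be modelled with a simplified, hypothetical structure (again not the real struct rte_mbuf, and with no private area): an indirect mbuf points at another mbuf's buffer, the owning direct mbuf is found from that buffer address, and only the owner's reference count is increased. In this toy model, detach also drops the reference, which keeps the example short.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, cut-down mbuf: the data buffer follows the struct directly. */
struct toy_mbuf {
    char *buf_addr;
    uint16_t refcnt;
    bool indirect;       /* stands in for the IND_ATTACHED_MBUF flag */
};

/* Owner of a buffer: the struct sits just before its embedded buffer. */
static struct toy_mbuf *toy_from_baddr(char *buf_addr)
{
    return (struct toy_mbuf *)buf_addr - 1;
}

static struct toy_mbuf *toy_alloc(size_t data_len)
{
    struct toy_mbuf *m = malloc(sizeof(*m) + data_len);

    if (m == NULL)
        return NULL;
    m->buf_addr = (char *)(m + 1);
    m->refcnt = 1;
    m->indirect = false;
    return m;
}

/* Attach 'mi' (a fresh direct mbuf) to 'm', which may itself be indirect.
 * The reference is always taken on the mbuf that owns the buffer. */
static void toy_attach(struct toy_mbuf *mi, struct toy_mbuf *m)
{
    struct toy_mbuf *md = m->indirect ? toy_from_baddr(m->buf_addr) : m;

    md->refcnt++;
    mi->buf_addr = m->buf_addr;
    mi->indirect = true;
}

/* Detach: find the owner again from buf_addr, drop the reference taken
 * by toy_attach(), and point 'mi' back at its own embedded buffer. */
static void toy_detach(struct toy_mbuf *mi)
{
    struct toy_mbuf *md = toy_from_baddr(mi->buf_addr);

    md->refcnt--;
    mi->buf_addr = (char *)(mi + 1);
    mi->indirect = false;
}
```

Attaching c1 to m and then c2 to c1 leaves m with a reference count of 3, which matches what patch 5/5 checks for real mbufs after cloning a clone.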

Signed-off-by: Olivier Matz 
---
 lib/librte_mbuf/rte_mbuf.h | 46 ++
 1 file changed, 26 insertions(+), 20 deletions(-)

diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
index 4ced6d3..3dc12cb 100644
--- a/lib/librte_mbuf/rte_mbuf.h
+++ b/lib/librte_mbuf/rte_mbuf.h
@@ -714,44 +714,50 @@ static inline struct rte_mbuf *rte_pktmbuf_alloc(struct rte_mempool *mp)
  * After attachment we refer the mbuf we attached as 'indirect',
  * while mbuf we attached to as 'direct'.
  * Right now, not supported:
- *  - attachment to indirect mbuf (e.g. - md  has to be direct).
  *  - attachment for already indirect mbuf (e.g. - mi has to be direct).
  *  - mbuf we trying to attach (mi) is used by someone else
  *e.g. it's reference counter is greater then 1.
  *
  * @param mi
  *   The indirect packet mbuf.
- * @param md
- *   The direct packet mbuf.
+ * @param m
+ *   The packet mbuf we're attaching to.
  */

-static inline void rte_pktmbuf_attach(struct rte_mbuf *mi, struct rte_mbuf *md)
+static inline void rte_pktmbuf_attach(struct rte_mbuf *mi, struct rte_mbuf *m)
 {
-   RTE_MBUF_ASSERT(RTE_MBUF_DIRECT(md) &&
-   RTE_MBUF_DIRECT(mi) &&
+   struct rte_mbuf *md;
+
+   RTE_MBUF_ASSERT(RTE_MBUF_DIRECT(mi) &&
rte_mbuf_refcnt_read(mi) == 1);

+   /* if m is not direct, get the mbuf that embeds the data */
+   if (RTE_MBUF_DIRECT(m))
+   md = m;
+   else
+   md = rte_mbuf_from_baddr(m, (char *)m->buf_addr);
+
rte_mbuf_refcnt_update(md, 1);
-   mi->buf_physaddr = md->buf_physaddr;
-   mi->buf_addr = md->buf_addr;
-   mi->buf_len = md->buf_len;
-
-   mi->next = md->next;
-   mi->data_off = md->data_off;
-   mi->data_len = md->data_len;
-   mi->port = md->port;
-   mi->vlan_tci = md->vlan_tci;
-   mi->tx_offload = md->tx_offload;
-   mi->hash = md->hash;
+   mi->buf_physaddr = m->buf_physaddr;
+   mi->buf_addr = m->buf_addr;
+   mi->buf_len = m->buf_len;
+
+   mi->next = m->next;
+   mi->data_off = m->data_off;
+   mi->data_len = m->data_len;
+   mi->port = m->port;
+   mi->vlan_tci = m->vlan_tci;
+   mi->tx_offload = m->tx_offload;
+   mi->hash = m->hash;

mi->next = NULL;
mi->pkt_len = mi->data_len;
mi->nb_segs = 1;
-   mi->ol_flags = md->ol_flags | IND_ATTACHED_MBUF;
-   mi->packet_type = md->packet_type;
+   mi->ol_flags = m->ol_flags | IND_ATTACHED_MBUF;
+   mi->packet_type = m->packet_type;

__rte_mbuf_sanity_check(mi, 1);
-   __rte_mbuf_sanity_check(md, 0);
+   __rte_mbuf_sanity_check(m, 0);
 }

 /**
-- 
2.1.4



[dpdk-dev] [PATCH 3/5] test/mbuf: rename mc variable in m

2015-03-25 Thread Olivier Matz
It's better to name the mbuf 'm' instead of 'mc' as it's not a clone.

Signed-off-by: Olivier Matz 
---
 app/test/test_mbuf.c | 25 -
 1 file changed, 12 insertions(+), 13 deletions(-)

diff --git a/app/test/test_mbuf.c b/app/test/test_mbuf.c
index 1ff66cb..9a3cf8f 100644
--- a/app/test/test_mbuf.c
+++ b/app/test/test_mbuf.c
@@ -321,43 +321,42 @@ fail:
 static int
 testclone_testupdate_testdetach(void)
 {
-   struct rte_mbuf *mc = NULL;
+   struct rte_mbuf *m = NULL;
struct rte_mbuf *clone = NULL;

/* alloc a mbuf */
-
-   mc = rte_pktmbuf_alloc(pktmbuf_pool);
-   if (mc == NULL)
+   m = rte_pktmbuf_alloc(pktmbuf_pool);
+   if (m == NULL)
GOTO_FAIL("ooops not allocating mbuf");

-   if (rte_pktmbuf_pkt_len(mc) != 0)
+   if (rte_pktmbuf_pkt_len(m) != 0)
GOTO_FAIL("Bad length");


/* clone the allocated mbuf */
-   clone = rte_pktmbuf_clone(mc, pktmbuf_pool);
+   clone = rte_pktmbuf_clone(m, pktmbuf_pool);
if (clone == NULL)
GOTO_FAIL("cannot clone data\n");
rte_pktmbuf_free(clone);

-   mc->next = rte_pktmbuf_alloc(pktmbuf_pool);
-   if(mc->next == NULL)
+   m->next = rte_pktmbuf_alloc(pktmbuf_pool);
+   if (m->next == NULL)
GOTO_FAIL("Next Pkt Null\n");

-   clone = rte_pktmbuf_clone(mc, pktmbuf_pool);
+   clone = rte_pktmbuf_clone(m, pktmbuf_pool);
if (clone == NULL)
GOTO_FAIL("cannot clone data\n");

/* free mbuf */
-   rte_pktmbuf_free(mc);
+   rte_pktmbuf_free(m);
rte_pktmbuf_free(clone);
-   mc = NULL;
+   m = NULL;
clone = NULL;
return 0;

 fail:
-   if (mc)
-   rte_pktmbuf_free(mc);
+   if (m)
+   rte_pktmbuf_free(m);
return -1;
 }
 #undef GOTO_FAIL
-- 
2.1.4



[dpdk-dev] [PATCH 4/5] test/mbuf: enhance mbuf refcnt test

2015-03-25 Thread Olivier Matz
Check that the data in the cloned mbuf is the same as in the
reference mbuf.
Check that the reference counter is incremented for each segment.

Signed-off-by: Olivier Matz 
---
 app/test/test_mbuf.c | 37 +
 1 file changed, 37 insertions(+)

diff --git a/app/test/test_mbuf.c b/app/test/test_mbuf.c
index 9a3cf8f..9d8ee4e 100644
--- a/app/test/test_mbuf.c
+++ b/app/test/test_mbuf.c
@@ -76,6 +76,8 @@
 #define REFCNT_MBUF_SIZE(sizeof (struct rte_mbuf) + RTE_PKTMBUF_HEADROOM)
 #define REFCNT_RING_SIZE(REFCNT_MBUF_NUM * REFCNT_MAX_REF)

+#define MAGIC_DATA  0x42424242
+
 #define MAKE_STRING(x)  # x

 static struct rte_mempool *pktmbuf_pool = NULL;
@@ -323,6 +325,7 @@ testclone_testupdate_testdetach(void)
 {
struct rte_mbuf *m = NULL;
struct rte_mbuf *clone = NULL;
+   uint32_t *data;

/* alloc a mbuf */
m = rte_pktmbuf_alloc(pktmbuf_pool);
@@ -332,21 +335,53 @@ testclone_testupdate_testdetach(void)
if (rte_pktmbuf_pkt_len(m) != 0)
GOTO_FAIL("Bad length");

+   rte_pktmbuf_append(m, sizeof(uint32_t));
+   data = rte_pktmbuf_mtod(m, uint32_t *);
+   *data = MAGIC_DATA;

/* clone the allocated mbuf */
clone = rte_pktmbuf_clone(m, pktmbuf_pool);
if (clone == NULL)
GOTO_FAIL("cannot clone data\n");
+
+   data = rte_pktmbuf_mtod(clone, uint32_t *);
+   if (*data != MAGIC_DATA)
+   GOTO_FAIL("invalid data in clone\n");
+
+   if (rte_mbuf_refcnt_read(m) != 2)
+   GOTO_FAIL("invalid refcnt in m\n");
+
+   /* free the clone */
rte_pktmbuf_free(clone);
+   clone = NULL;

+   /* same test with a chained mbuf */
m->next = rte_pktmbuf_alloc(pktmbuf_pool);
if (m->next == NULL)
GOTO_FAIL("Next Pkt Null\n");

+   rte_pktmbuf_append(m->next, sizeof(uint32_t));
+   data = rte_pktmbuf_mtod(m->next, uint32_t *);
+   *data = MAGIC_DATA;
+
clone = rte_pktmbuf_clone(m, pktmbuf_pool);
if (clone == NULL)
GOTO_FAIL("cannot clone data\n");

+   data = rte_pktmbuf_mtod(clone, uint32_t *);
+   if (*data != MAGIC_DATA)
+   GOTO_FAIL("invalid data in clone\n");
+
+   data = rte_pktmbuf_mtod(clone->next, uint32_t *);
+   if (*data != MAGIC_DATA)
+   GOTO_FAIL("invalid data in clone->next\n");
+
+   if (rte_mbuf_refcnt_read(m) != 2)
+   GOTO_FAIL("invalid refcnt in m\n");
+
+   if (rte_mbuf_refcnt_read(m->next) != 2)
+   GOTO_FAIL("invalid refcnt in m->next\n");
+
/* free mbuf */
rte_pktmbuf_free(m);
rte_pktmbuf_free(clone);
@@ -357,6 +392,8 @@ testclone_testupdate_testdetach(void)
 fail:
if (m)
rte_pktmbuf_free(m);
+   if (clone)
+   rte_pktmbuf_free(clone);
return -1;
 }
 #undef GOTO_FAIL
-- 
2.1.4



[dpdk-dev] [PATCH 5/5] test/mbuf: verify that cloning a clone works properly

2015-03-25 Thread Olivier Matz
Signed-off-by: Olivier Matz 
---
 app/test/test_mbuf.c | 26 ++
 1 file changed, 26 insertions(+)

diff --git a/app/test/test_mbuf.c b/app/test/test_mbuf.c
index 9d8ee4e..68cd4de 100644
--- a/app/test/test_mbuf.c
+++ b/app/test/test_mbuf.c
@@ -325,6 +325,7 @@ testclone_testupdate_testdetach(void)
 {
struct rte_mbuf *m = NULL;
struct rte_mbuf *clone = NULL;
+   struct rte_mbuf *clone2 = NULL;
uint32_t *data;

/* alloc a mbuf */
@@ -382,11 +383,34 @@ testclone_testupdate_testdetach(void)
if (rte_mbuf_refcnt_read(m->next) != 2)
GOTO_FAIL("invalid refcnt in m->next\n");

+   /* try to clone the clone */
+
+   clone2 = rte_pktmbuf_clone(clone, pktmbuf_pool);
+   if (clone2 == NULL)
+   GOTO_FAIL("cannot clone the clone\n");
+
+   data = rte_pktmbuf_mtod(clone2, uint32_t *);
+   if (*data != MAGIC_DATA)
+   GOTO_FAIL("invalid data in clone2\n");
+
+   data = rte_pktmbuf_mtod(clone2->next, uint32_t *);
+   if (*data != MAGIC_DATA)
+   GOTO_FAIL("invalid data in clone2->next\n");
+
+   if (rte_mbuf_refcnt_read(m) != 3)
+   GOTO_FAIL("invalid refcnt in m\n");
+
+   if (rte_mbuf_refcnt_read(m->next) != 3)
+   GOTO_FAIL("invalid refcnt in m->next\n");
+
/* free mbuf */
rte_pktmbuf_free(m);
rte_pktmbuf_free(clone);
+   rte_pktmbuf_free(clone2);
+
m = NULL;
clone = NULL;
+   clone2 = NULL;
return 0;

 fail:
@@ -394,6 +418,8 @@ fail:
rte_pktmbuf_free(m);
if (clone)
rte_pktmbuf_free(clone);
+   if (clone2)
+   rte_pktmbuf_free(clone2);
return -1;
 }
 #undef GOTO_FAIL
-- 
2.1.4



[dpdk-dev] ovs-dpdk: placing the metadata

2015-03-25 Thread Olivier MATZ
Hi Zoltan,

On 03/24/2015 06:42 PM, Zoltan Kiss wrote:
> Hi,
>
> I've noticed in lib/netdev-dpdk.c that __rte_pktmbuf_init() stores the
> packet metadata right after "struct rte_mbuf", and before the buffer data:
>
>  /* start of buffer is just after mbuf structure */
>  m->buf_addr = (char *)m + sizeof(struct dp_packet);
>
> (struct dp_packet has the rte_mbuf as first member if DPDK enabled)
>
> However, lib/librte_mbuf/rte_mbuf.h seems to codify that the buffer
> should start right after the rte_mbuf:
>
> /**
>   * Given the buf_addr returns the pointer to corresponding mbuf.
>   */
> #define RTE_MBUF_FROM_BADDR(ba) (((struct rte_mbuf *)(ba)) - 1)
>
> /**
>   * Given the pointer to mbuf returns an address where it's  buf_addr
>   * should point to.
>   */
> #define RTE_MBUF_TO_BADDR(mb)   (((struct rte_mbuf *)(mb)) + 1)
>
> These macros are used for attaching/detaching mbufs to each other. This
> is the way the code retrieves the direct buffer from an indirect one,
> and vice versa. I think if we want to keep the metadata feature (which I
> guess is quite important), we need to add a pointer to rte_mbuf, which
> helps the direct and indirect structs to find each other. Something like:
>
>  struct rte_mbuf *attach; /**< Points to the other buffer if this one
>                                is (in)direct. Otherwise NULL. */
>
> What do you think?

I've just sent a patch that should fix this issue.
http://dpdk.org/ml/archives/dev/2015-March/015722.html

Let me know if you have any comment on it.

Regards,
Olivier



[dpdk-dev] [PATCH] mk: added make target to print out system info

2015-03-25 Thread Neil Horman
On Wed, Mar 25, 2015 at 04:42:23PM +0100, Olivier MATZ wrote:
> On 03/25/2015 04:22 PM, Neil Horman wrote:
> >On Wed, Mar 25, 2015 at 04:06:10PM +0100, Olivier MATZ wrote:
> >>Hi,
> >>
> >>On 03/24/2015 06:00 PM, Neil Horman wrote:
> >>>On Tue, Mar 24, 2015 at 02:52:59PM +, John McNamara wrote:
> Added a 'make system_info' target to print out system info
> related to DPDK. This is intended as output that can be
> attached to bug reports.
> ---
>   mk/rte.sdkroot.mk | 33 +
>   1 file changed, 33 insertions(+)
> 
> diff --git a/mk/rte.sdkroot.mk b/mk/rte.sdkroot.mk
> index e8423b0..b477d09 100644
> --- a/mk/rte.sdkroot.mk
> +++ b/mk/rte.sdkroot.mk
> @@ -123,3 +123,36 @@ examples examples_clean:
>   %:
>   $(Q)$(MAKE) -f $(RTE_SDK)/mk/rte.sdkconfig.mk checkconfig
>   $(Q)$(MAKE) -f $(RTE_SDK)/mk/rte.sdkbuild.mk $@
> +
> +.PHONY: system_info
> +system_info:
> + $(Q)echo
> + $(Q)echo "CC version"
> + $(Q)echo "=="
> + $(Q)$(CC) --version
> + $(Q)echo
> +
> + $(Q)echo "DPDK version"
> + $(Q)echo ""
> + $(Q)$(MAKE) showversion
> + $(Q)echo
> +
> + $(Q)echo "Git commit"
> + $(Q)echo "=="
> + $(Q)git log --pretty=format:'%H' -1
> + $(Q)echo
> +
> + $(Q)echo "Uname"
> + $(Q)echo "="
> + $(Q)uname -srvmpio
> + $(Q)echo
> +
> + $(Q)echo "Hugepages"
> + $(Q)echo "="
> + $(Q)grep -i huge /proc/meminfo
> + $(Q)echo
> +
> + $(Q)tools/cpu_layout.py
> +
> + $(Q)tools/dpdk_nic_bind.py --status
> + $(Q)echo
> --
> 1.8.1.4
> 
> 
> >>>Nak, for a few reasons:
> >>>
> >>>1) While this target is in a common makefile, at least some of the information
> >>>it gathers is operating system specific (e.g. /proc/meminfo).  This isn't going
> >>>to work on BSD, or other operating systems that we might support in the future
> >>>
> >>>2) This is tied to the build system.  There's no guarantee that users will
> >>>diagnose problems only on the system that they built the DPDK on.
> >>>
> >>>A better solution might be to simply document the sort of information that a bug
> >>>reporter is expected to gather, along with some sample tools for doing so.
> >>>There are numerous tools to get the above information, both in isolation and in
> >>>aggregate.
> >>
> >>I agree with Neil that the Makefile is probably not the best place to
> >>put that because the target machine may not be the build machine. What
> >>about doing the same in a script? Therefore it could be embedded and
> >>executed on the target.
> >>
> >A script would be fine, as long as it's cased for tools available on every OS.
> >
> >>Neil, you talk about tools that do the same kind of things. What tool
> >>are you thinking about? The problem of using external tools is that it
> >>adds a dependency with them.
> >>
> >Yes, but how is that different from the above?  Running cat /proc/meminfo has a
> >dependency on the existence of /proc/meminfo, which is not valid on BSD.  There's
> >another file there that holds similar memory information, though, or perhaps a
> >memstat tool (I can't recall which).  The point being, to have an appropriate bug
> >reporting tool like this, you need to determine what information you need, then
> >for each operating system you have to do the right things to get it, be that
> >read a file, run a tool, or some other operation.
> 
> Agreed, there's no guarantee that /proc/some/file exists on a Linux
> distribution, just as there is no guarantee that an application is available.
> 
Agreed.

> For instance, using applications that are packaged in coreutils or
> procps should not be an issue. But I would say that applications
> from more specialized packages should be avoided, and in that case
> the /proc interface can be the better choice.
> 
Why?  We just agreed that there is no guarantee that a file exists in /proc, so
it's no better or worse than using an application which may or may not be
installed.  If the file is available, then great, you can use it, but otherwise
you have to provide some alternate method for getting the data.  Simply not
collecting some of it would, in my mind, make such a script not worthwhile.

All I'm saying here is that if we want to provide this functionality we need to
do one of the following:

1) Write a script (to remove ourselves from being bound to a build environment),
which codifies the data items we wish to collect for debugging.  For each item
we need a case statement of the form:
switch $PLATFORM {
CASE BSD:

CASE LINUX:

CASE OSV:

}

Where each case either cats a file or runs an appropriate tool (making the
appropriate check for its availability when needed).
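Option 1 could look roughly like this in C, with the preprocessor doing the per-platform selection (a shell script would use a case on the output of uname -s instead). This is a sketch under assumptions: the function names are hypothetical, only the Linux branch is filled in, and it uses the same /proc/meminfo source as the make system_info patch. Each branch checks that its source exists and reports it as missing rather than silently skipping it.

```c
#include <stdio.h>
#include <string.h>

/* Copy every line of 'f' that contains 'needle' to 'out'.
 * Lines longer than the buffer are handled in pieces.
 * Returns the number of matching pieces copied. */
static int copy_matching_lines(FILE *f, const char *needle, FILE *out)
{
    char line[256];
    int n = 0;

    while (fgets(line, sizeof(line), f) != NULL) {
        if (strstr(line, needle) != NULL) {
            fputs(line, out);
            n++;
        }
    }
    return n;
}

/* Report hugepage information from whatever source this OS offers.
 * Returns the number of lines reported, or -1 if no source is available. */
static int report_hugepages(FILE *out)
{
#if defined(__linux__)
    FILE *f = fopen("/proc/meminfo", "r");
    int n;

    if (f == NULL)
        return -1;     /* /proc not mounted: an alternate source goes here */
    n = copy_matching_lines(f, "Huge", out);
    fclose(f);
    return n;
#elif defined(__FreeBSD__)
    /* Placeholder: the FreeBSD source is not filled in by this sketch. */
    (void)out;
    return -1;
#else
    (void)out;
    return -1;
#endif
}
```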

Or

2) Document the kind of data that we need when debugging, and make sugge

[dpdk-dev] DPDK testpmd, Virtual Disk IO limitation

2015-03-25 Thread Bruce Richardson
On Thu, Mar 26, 2015 at 12:20:42AM +0800, Cheng Kevin wrote:
> Mr. Bruce Richardson
> 
> Yes, you are right. This really bothers me.
> 
> Is there any way to get rid of the system calls? Maybe some DPDK threading API?
> Maybe I should use an extra NIC to send the data out over the
> Internet, instead of writing it to disk (e.g. with fwrite).
> 
> Or do you have any better advice?
> 
> Thanks
> Kevin
> 

Hi Kevin,

What is the end goal you are trying to reach?

/Bruce

> 
> 2015-03-25 23:01 GMT+08:00 Bruce Richardson :
> 
> > On Wed, Mar 25, 2015 at 10:06:48PM +0800, Cheng Kevin wrote:
> > > Hi all,
> > >
> > >    I am a DPDK beginner. Recently I have become interested in the DPDK
> > > vHost app, testpmd.
> > >
> > >    I have been tracing through testpmd.c and iofwd.c for a while.
> > >
> > >    I also added some code to iofwd.c to store the payload of packets.
> > >
> > >    Everything works fine, and the performance is as good as expected.
> > >
> > >    But when I use fwrite to store the payload in a file, the
> > > performance drops from 800 Mbps to 3 Mbps (the input stream is 1 Gbps).
> > >
> > >    Is this caused by a limitation of virtual disk IO? How can I solve it?
> > >
> > >    I have tried searching for an answer; some people say "pthread" might
> > > solve the problem.
> > >
> > >    Can someone give me a hint? I would really appreciate your help.
> > >
> > >
> > > Best Regard,
> > >
> > > Kevin Cheng
> >
> > Two general issues you will hit writing to disk:
> > 1) IO, including disk IO, is slow
> > 2) System calls are slow.
> >
> > You are probably hitting both bottlenecks.
> >
> > /Bruce
> >
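Bruce's two points suggest the usual remedy, which is also what the "pthread" advice is reaching for: keep the forwarding loop free of blocking I/O and let a separate thread do the disk writes in batches. The sketch below is plain C11 with POSIX threads rather than DPDK code; in a real application an rte_ring and a dedicated lcore would likely fill these roles, and every name here (`payload_ring`, `ring_push`, `writer_main`, the slot sizes) is hypothetical. It does not make the disk faster: when the disk cannot keep up, the ring fills and payloads are dropped and counted instead of stalling the fast path.

```c
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define RING_SLOTS 1024   /* must be a power of two */
#define SLOT_BYTES 2048   /* max payload bytes copied per packet (hypothetical) */

struct slot { uint16_t len; uint8_t data[SLOT_BYTES]; };

/* Single-producer / single-consumer ring (hypothetical, not rte_ring). */
struct payload_ring {
    struct slot slots[RING_SLOTS];
    _Atomic uint32_t head;   /* advanced by the producer only */
    _Atomic uint32_t tail;   /* advanced by the consumer only */
    _Atomic bool done;       /* set by the producer after its last push */
    uint64_t dropped;        /* producer-side drop counter */
};

/* Fast path: copy the payload and return at once; never blocks. */
static bool ring_push(struct payload_ring *r, const void *buf, uint16_t len)
{
    uint32_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
    uint32_t tail = atomic_load_explicit(&r->tail, memory_order_acquire);

    if (head - tail == RING_SLOTS) {   /* full: drop rather than stall */
        r->dropped++;
        return false;
    }
    struct slot *s = &r->slots[head & (RING_SLOTS - 1)];
    s->len = len > SLOT_BYTES ? SLOT_BYTES : len;
    memcpy(s->data, buf, s->len);
    atomic_store_explicit(&r->head, head + 1, memory_order_release);
    return true;
}

static void ring_finish(struct payload_ring *r)
{
    atomic_store_explicit(&r->done, true, memory_order_release);
}

struct writer_args { struct payload_ring *ring; FILE *out; uint64_t bytes; };

/* Slow path thread: drain whatever is queued and write it with stdio. */
static void *writer_main(void *arg)
{
    struct writer_args *w = arg;
    struct payload_ring *r = w->ring;

    for (;;) {
        /* Read "done" before "head": once done is seen, head is final. */
        bool done = atomic_load_explicit(&r->done, memory_order_acquire);
        uint32_t head = atomic_load_explicit(&r->head, memory_order_acquire);
        uint32_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);

        if (head == tail) {
            if (done)
                break;
            sched_yield();   /* nothing queued; give up the CPU */
            continue;
        }
        while (tail != head) {   /* drain the whole batch */
            const struct slot *s = &r->slots[tail & (RING_SLOTS - 1)];
            w->bytes += fwrite(s->data, 1, s->len, w->out);
            tail++;
        }
        /* Hand the drained slots back to the producer. */
        atomic_store_explicit(&r->tail, tail, memory_order_release);
    }
    return NULL;
}
```

In Kevin's setup, `ring_push` would roughly take the place of the fwrite call added to iofwd.c, and `writer_main` would run on its own thread or core. Whether this recovers the rate depends on the virtual disk: 800 Mbps is about 100 MB/s of sustained writes, which many virtual disks cannot deliver.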


[dpdk-dev] [PATCH] mk: added make target to print out system info

2015-03-25 Thread Olivier MATZ
On 03/25/2015 06:22 PM, Neil Horman wrote:
> On Wed, Mar 25, 2015 at 04:42:23PM +0100, Olivier MATZ wrote:
>> On 03/25/2015 04:22 PM, Neil Horman wrote:
>>> On Wed, Mar 25, 2015 at 04:06:10PM +0100, Olivier MATZ wrote:
 Hi,

 On 03/24/2015 06:00 PM, Neil Horman wrote:
> On Tue, Mar 24, 2015 at 02:52:59PM +, John McNamara wrote:
>> Added a 'make system_info' target to print out system info
>> related to DPDK. This is intended as output that can be
>> attached to bug reports.
>> ---
>>   mk/rte.sdkroot.mk | 33 +
>>   1 file changed, 33 insertions(+)
>>
>> diff --git a/mk/rte.sdkroot.mk b/mk/rte.sdkroot.mk
>> index e8423b0..b477d09 100644
>> --- a/mk/rte.sdkroot.mk
>> +++ b/mk/rte.sdkroot.mk
>> @@ -123,3 +123,36 @@ examples examples_clean:
>>   %:
>>  $(Q)$(MAKE) -f $(RTE_SDK)/mk/rte.sdkconfig.mk checkconfig
>>  $(Q)$(MAKE) -f $(RTE_SDK)/mk/rte.sdkbuild.mk $@
>> +
>> +.PHONY: system_info
>> +system_info:
>> +$(Q)echo
>> +$(Q)echo "CC version"
>> +$(Q)echo "=="
>> +$(Q)$(CC) --version
>> +$(Q)echo
>> +
>> +$(Q)echo "DPDK version"
>> +$(Q)echo ""
>> +$(Q)$(MAKE) showversion
>> +$(Q)echo
>> +
>> +$(Q)echo "Git commit"
>> +$(Q)echo "=="
>> +$(Q)git log --pretty=format:'%H' -1
>> +$(Q)echo
>> +
>> +$(Q)echo "Uname"
>> +$(Q)echo "="
>> +$(Q)uname -srvmpio
>> +$(Q)echo
>> +
>> +$(Q)echo "Hugepages"
>> +$(Q)echo "="
>> +$(Q)grep -i huge /proc/meminfo
>> +$(Q)echo
>> +
>> +$(Q)tools/cpu_layout.py
>> +
>> +$(Q)tools/dpdk_nic_bind.py --status
>> +$(Q)echo
>> --
>> 1.8.1.4
>>
>>
> Nak, for a few reasons:
>
> 1) While this target is in a common makefile, at least some of the information
> it gathers is operating system specific (e.g. /proc/meminfo).  This isn't going
> to work on BSD, or other operating systems that we might support in the future.
>
> 2) This is tied to the build system.  There's no guarantee that users will
> diagnose problems only on the system that they built the DPDK on.
>
> A better solution might be to simply document the sort of information that a bug
> reporter is expected to gather, along with some sample tools for doing so.
> There are numerous tools to get the above information, both in isolation and in
> aggregate.

 I agree with Neil that the Makefile is probably not the best place to
 put that because the target machine may not be the build machine. What
 about doing the same in a script? Therefore it could be embedded and
 executed on the target.

>>> A script would be fine, as long as it's cased for tools available on every OS.
>>>
 Neil, you talk about tools that do the same kind of things. What tools
 are you thinking about? The problem with using external tools is that it
 adds a dependency on them.

>>> Yes, but how is that different from the above?  Running cat /proc/meminfo has a
>>> dependency on the existence of /proc/meminfo, which is involate on BSD.  There's
>>> another file there that holds similar memory information, though, or perhaps a
>>> memstat tool (I can't recall which).  The point being, to have an appropriate bug
>>> reporting tool like this, you need to determine what information you need, then
>>> for each operating system you have to do the right things to get it, be that
>>> read a file, run a tool, or some other operation.
>>
>> Agree, there's no guarantee that /proc/some/file exists on a linux
>> distribution as there is no guarantee that an application is available.
>>
> Agreed.
>
>> For instance, using applications that are packaged in coreutils or
>> procps should not be an issue. But I would say that using applications
>> included in specific packages should be avoided, and in this case
>> the /proc interface can be better.
>>
> Why?  We just agreed that there is no guarantee that a file exists in /proc, so
> it's no better or worse than using an application which may or may not be
> installed.  If the file is available, then great, you can use it, but otherwise
> you have to provide some alternate method for getting the data. Just not
> collecting some of it in my mind makes such a script not worthwhile.

I'm just saying that on Linux it is much more likely to have
/proc/meminfo (which has been available since at least the 2.4.x kernel)
than a rare package providing a tool able to format /proc/meminfo.

On the other hand, using a common tool is preferable if we can expect
it is installed on most distribut

[dpdk-dev] [PATCH] mk: added make target to print out system info

2015-03-25 Thread Neil Horman
On Wed, Mar 25, 2015 at 06:42:34PM +0100, Olivier MATZ wrote:
> On 03/25/2015 06:22 PM, Neil Horman wrote:
> >On Wed, Mar 25, 2015 at 04:42:23PM +0100, Olivier MATZ wrote:
> >>On 03/25/2015 04:22 PM, Neil Horman wrote:
> >>>On Wed, Mar 25, 2015 at 04:06:10PM +0100, Olivier MATZ wrote:
> Hi,
> 
> On 03/24/2015 06:00 PM, Neil Horman wrote:
> >On Tue, Mar 24, 2015 at 02:52:59PM +, John McNamara wrote:
> >>Added a 'make system_info' target to print out system info
> >>related to DPDK. This is intended as output that can be
> >>attached to bug reports.
> >>---
> >>  mk/rte.sdkroot.mk | 33 +
> >>  1 file changed, 33 insertions(+)
> >>
> >>diff --git a/mk/rte.sdkroot.mk b/mk/rte.sdkroot.mk
> >>index e8423b0..b477d09 100644
> >>--- a/mk/rte.sdkroot.mk
> >>+++ b/mk/rte.sdkroot.mk
> >>@@ -123,3 +123,36 @@ examples examples_clean:
> >>  %:
> >>$(Q)$(MAKE) -f $(RTE_SDK)/mk/rte.sdkconfig.mk checkconfig
> >>$(Q)$(MAKE) -f $(RTE_SDK)/mk/rte.sdkbuild.mk $@
> >>+
> >>+.PHONY: system_info
> >>+system_info:
> >>+   $(Q)echo
> >>+   $(Q)echo "CC version"
> >>+   $(Q)echo "=="
> >>+   $(Q)$(CC) --version
> >>+   $(Q)echo
> >>+
> >>+   $(Q)echo "DPDK version"
> >>+   $(Q)echo ""
> >>+   $(Q)$(MAKE) showversion
> >>+   $(Q)echo
> >>+
> >>+   $(Q)echo "Git commit"
> >>+   $(Q)echo "=="
> >>+   $(Q)git log --pretty=format:'%H' -1
> >>+   $(Q)echo
> >>+
> >>+   $(Q)echo "Uname"
> >>+   $(Q)echo "="
> >>+   $(Q)uname -srvmpio
> >>+   $(Q)echo
> >>+
> >>+   $(Q)echo "Hugepages"
> >>+   $(Q)echo "="
> >>+   $(Q)grep -i huge /proc/meminfo
> >>+   $(Q)echo
> >>+
> >>+   $(Q)tools/cpu_layout.py
> >>+
> >>+   $(Q)tools/dpdk_nic_bind.py --status
> >>+   $(Q)echo
> >>--
> >>1.8.1.4
> >>
> >>
> >Nak, for a few reasons:
> >
> >1) While this target is in a common makefile, at least some of the information
> >it gathers is operating system specific (e.g. /proc/meminfo).  This isn't going
> >to work on BSD, or other operating systems that we might support in the future.
> >
> >2) This is tied to the build system.  There's no guarantee that users will
> >diagnose problems only on the system that they built the DPDK on.
> >
> >A better solution might be to simply document the sort of information that a bug
> >reporter is expected to gather, along with some sample tools for doing so.
> >There are numerous tools to get the above information, both in isolation and in
> >aggregate.
> 
> I agree with Neil that the Makefile is probably not the best place to
> put that because the target machine may not be the build machine. What
> about doing the same in a script? Therefore it could be embedded and
> executed on the target.
> 
> >>>A script would be fine, as long as it's cased for tools available on every OS.
> >>>
> Neil, you talk about tools that do the same kind of things. What tools
> are you thinking about? The problem with using external tools is that it
> adds a dependency on them.
> 
> >>>Yes, but how is that different from the above?  Running cat /proc/meminfo has a
> >>>dependency on the existence of /proc/meminfo, which is involate on BSD.  There's
> >>>another file there that holds similar memory information, though, or perhaps a
> >>>memstat tool (I can't recall which).  The point being, to have an appropriate bug
> >>>reporting tool like this, you need to determine what information you need, then
> >>>for each operating system you have to do the right things to get it, be that
> >>>read a file, run a tool, or some other operation.
> >>
> >>Agree, there's no guarantee that /proc/some/file exists on a linux
> >>distribution as there is no guarantee that an application is available.
> >>
> >Agreed.
> >
> >>For instance, using applications that are packaged in coreutils or
> >>procps should not be an issue. But I would say that using applications
> >>included in specific packages should be avoided, and in this case
> >>the /proc interface can be better.
> >>
> >Why?  We just agreed that there is no guarantee that a file exists in /proc, so
> >it's no better or worse than using an application which may or may not be
> >installed.  If the file is available, then great, you can use it, but otherwise
> >you have to provide some alternate method for getting the data. Just not
> >collecting some of it in my mind makes such a script not worthwhile.
> 
> I'm just saying that on linux it is much more likely to have
> /proc/meminfo (which is available since at least 2.4.x

[dpdk-dev] [PATCH] mk: added make target to print out system info

2015-03-25 Thread Mcnamara, John
> -Original Message-
> From: Neil Horman [mailto:nhorman at tuxdriver.com]
> Sent: Wednesday, March 25, 2015 5:22 PM
> To: Olivier MATZ
> Cc: Mcnamara, John; dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH] mk: added make target to print out system
> info
> 
> > For instance, using applications that are packaged in coreutils or
> > procps should not be an issue. But I would say that using applications
> > included in specific packages should be avoided, and in this case the
> > /proc interface can be better.
> >
> Why?  We just agreed that there is no guarantee that a file exists in
> /proc, so it's no better or worse than using an application which may or
> may not be installed.  If the file is available, then great, you can use
> it, but otherwise you have to provide some alternate method for getting
> the data. Just not collecting some of it in my mind makes such a script
> not worthwhile.
> 
> All I'm saying here is that if we want to provide this functionality we
> need to do one of the following:
> 
> 1) Write a script (to remove ourselves from being bound to a build
> environment), which codifies the data items we wish to collect for
> debugging.  For each item we need a case statement of the form:
> switch $PLATFORM {
>   CASE BSD:
>   
>   CASE LINUX:
>   
>   CASE OSV:
>   
> }
> 
> Where each case either cats a file or runs an appropriate tool (making the
> appropriate check for its availability when needed).
> 
> Or
> 
> 2) Document the kind of data that we need when debugging, and make
> suggestions in said document for what types of tools/files might provide
> that data, and leaving it up to users to do the collection on their own.
> 
> Given that we are likely to be talking about developers here, I'm inclined
> to go with option 2, given that it's less maintenance to keep up with.
> 

Hi Neil,

I think you are probably right that documentation is the way to deal with this. 

I'll drop the patch and submit a checklist document with information that 
should be supplied when reporting bugs. It doesn't have to be added to the DPDK 
docs. It could be added to dpdk.org or just live in an email on the mailing 
list that we can point people to.

The main goal is to avoid having to pull relevant information out of people 
over a series of emails. Perhaps it may prove not to be necessary in practice.

I can add sample shell scripts for Linux/FreeBSD at the end of the doc, to 
cover Olivier's suggestion about consistency of reporting. Users of other OSes 
can add similar text if they think it is useful.

John
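Neil's option 1 (one entry per data item, each with a per-platform branch) can be sketched in C, using compile-time platform checks in place of the `switch $PLATFORM` outline. This is a hypothetical helper, not part of the DPDK tree, and it covers only the Uname and Hugepages items from the proposed make target. `uname(2)` is POSIX, so that item needs no branch. The hugepage item reads /proc/meminfo on Linux; the FreeBSD source is deliberately left open here, since DPDK on FreeBSD uses the contigmem module rather than hugetlbfs. A missing file is reported rather than treated as fatal, in line with Olivier's point that /proc may not be present.

```c
#include <stdio.h>
#include <string.h>
#include <sys/utsname.h>

/* Item: OS identification. uname(2) is POSIX, so no per-platform branch. */
static int report_uname(FILE *out)
{
    struct utsname u;

    fprintf(out, "Uname\n=====\n");
    if (uname(&u) != 0) {
        fprintf(out, "(unavailable)\n\n");
        return -1;
    }
    fprintf(out, "%s %s %s %s\n\n", u.sysname, u.release, u.version, u.machine);
    return 0;
}

/* Item: hugepage state. The data source differs per platform.
 * Returns the number of lines reported, or -1 if the source is missing. */
static int report_hugepages(FILE *out)
{
    fprintf(out, "Hugepages\n=========\n");
#if defined(__linux__)
    FILE *f = fopen("/proc/meminfo", "r");
    if (f == NULL) {
        fprintf(out, "(/proc/meminfo not available)\n\n");
        return -1;
    }
    char line[256];
    int matched = 0;
    while (fgets(line, sizeof line, f) != NULL) {
        /* Roughly what `grep -i huge` selects; this match is case-sensitive. */
        if (strstr(line, "Huge") != NULL) {
            fputs(line, out);
            matched++;
        }
    }
    fclose(f);
    fputc('\n', out);
    return matched;
#else
    /* FreeBSD and others: source left open in this sketch. */
    fprintf(out, "(not collected on this platform)\n\n");
    return 0;
#endif
}

/* Run every item in a fixed order so reports are comparable. */
void report_all(FILE *out)
{
    report_uname(out);
    report_hugepages(out);
}
```

Adding an item means adding one function with its own platform branches, which keeps the "what we need" list visible in one place, as Neil's outline intends.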




[dpdk-dev] [PATCH v2 0/7] Hyperv PMD patches

2015-03-25 Thread Stephen Hemminger
From: Stephen Hemminger 

This is an update/rebase of the Hyper-V poll mode driver onto the current DPDK
upstream. No functional changes in this version, just patch conflict resolution.

Stephen Hemminger (7):
  ether: add function to query for link state interrupt
  pmd: change drivers initialization for pci
  hv: add basic vmbus support
  hv: uio driver
  hv: poll mode driver
  hv: enable driver in common config
  hv: add kernel patch

 config/common_linuxapp |9 +
 lib/Makefile   |1 +
 lib/librte_eal/common/Makefile |2 +-
 lib/librte_eal/common/eal_common_options.c |5 +
 lib/librte_eal/common/eal_internal_cfg.h   |1 +
 lib/librte_eal/common/eal_options.h|2 +
 lib/librte_eal/common/eal_private.h|   10 +
 lib/librte_eal/common/include/rte_vmbus.h  |  153 ++
 lib/librte_eal/linuxapp/Makefile   |3 +
 lib/librte_eal/linuxapp/eal/Makefile   |3 +
 lib/librte_eal/linuxapp/eal/eal.c  |   11 +
 lib/librte_eal/linuxapp/eal/eal_vmbus.c|  639 
 lib/librte_eal/linuxapp/hv_uio/Makefile|   57 +
 lib/librte_eal/linuxapp/hv_uio/hv_uio.c|  551 +++
 lib/librte_eal/linuxapp/hv_uio/hyperv_net.h|  907 +++
 .../linuxapp/hv_uio/vmbus-get-pages.patch  |   53 +
 lib/librte_ether/rte_ethdev.c  |   98 +-
 lib/librte_ether/rte_ethdev.h  |   22 +-
 lib/librte_pmd_e1000/em_ethdev.c   |2 +-
 lib/librte_pmd_e1000/igb_ethdev.c  |4 +-
 lib/librte_pmd_enic/enic_ethdev.c  |2 +-
 lib/librte_pmd_fm10k/fm10k_ethdev.c|2 +-
 lib/librte_pmd_hyperv/Makefile |   28 +
 lib/librte_pmd_hyperv/hyperv.h |  169 ++
 lib/librte_pmd_hyperv/hyperv_drv.c | 1660 
 lib/librte_pmd_hyperv/hyperv_drv.h |  558 +++
 lib/librte_pmd_hyperv/hyperv_ethdev.c  |  334 
 lib/librte_pmd_hyperv/hyperv_logs.h|   68 +
 lib/librte_pmd_hyperv/hyperv_rxtx.c|  402 +
 lib/librte_pmd_hyperv/hyperv_rxtx.h|   35 +
 lib/librte_pmd_i40e/i40e_ethdev.c  |2 +-
 lib/librte_pmd_i40e/i40e_ethdev_vf.c   |2 +-
 lib/librte_pmd_ixgbe/ixgbe_ethdev.c|4 +-
 lib/librte_pmd_virtio/virtio_ethdev.c  |2 +-
 lib/librte_pmd_vmxnet3/vmxnet3_ethdev.c|2 +-
 mk/rte.app.mk  |4 +
 36 files changed, 5790 insertions(+), 17 deletions(-)
 create mode 100644 lib/librte_eal/common/include/rte_vmbus.h
 create mode 100644 lib/librte_eal/linuxapp/eal/eal_vmbus.c
 create mode 100644 lib/librte_eal/linuxapp/hv_uio/Makefile
 create mode 100644 lib/librte_eal/linuxapp/hv_uio/hv_uio.c
 create mode 100644 lib/librte_eal/linuxapp/hv_uio/hyperv_net.h
 create mode 100644 lib/librte_eal/linuxapp/hv_uio/vmbus-get-pages.patch
 create mode 100644 lib/librte_pmd_hyperv/Makefile
 create mode 100644 lib/librte_pmd_hyperv/hyperv.h
 create mode 100644 lib/librte_pmd_hyperv/hyperv_drv.c
 create mode 100644 lib/librte_pmd_hyperv/hyperv_drv.h
 create mode 100644 lib/librte_pmd_hyperv/hyperv_ethdev.c
 create mode 100644 lib/librte_pmd_hyperv/hyperv_logs.h
 create mode 100644 lib/librte_pmd_hyperv/hyperv_rxtx.c
 create mode 100644 lib/librte_pmd_hyperv/hyperv_rxtx.h

-- 
2.1.4



[dpdk-dev] [PATCH v2 1/7] ether: add function to query for link state interrupt

2015-03-25 Thread Stephen Hemminger
From: Stephen Hemminger 

Allow an application to query whether link state interrupts will work.
This is also part of abstracting the dependency on PCI.

Signed-off-by: Stephen Hemminger 
---
 lib/librte_ether/rte_ethdev.c | 14 ++
 lib/librte_ether/rte_ethdev.h | 12 
 2 files changed, 26 insertions(+)

diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
index 03fce08..afe6923 100644
--- a/lib/librte_ether/rte_ethdev.c
+++ b/lib/librte_ether/rte_ethdev.c
@@ -1376,6 +1376,20 @@ rte_eth_dev_start(uint8_t port_id)
return 0;
 }

+int
+rte_eth_has_link_state(uint8_t port_id)
+{
+   struct rte_eth_dev *dev;
+
+   if (port_id >= nb_ports) {
+   PMD_DEBUG_TRACE("Invalid port_id=%d\n", port_id);
+   return 0;
+   }
+   dev = &rte_eth_devices[port_id];
+
+   return (dev->pci_dev->driver->drv_flags & RTE_PCI_DRV_INTR_LSC) != 0;
+}
+
 void
 rte_eth_dev_stop(uint8_t port_id)
 {
diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index 21aa359..124117a 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -2037,6 +2037,18 @@ extern void rte_eth_link_get_nowait(uint8_t port_id,
struct rte_eth_link *link);

 /**
+ * Test whether device supports link state interrupt mode.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @return
+ *   - (1) if link state interrupt is supported
+ *   - (0) if link state interrupt is not supported
+ */
+extern int
+rte_eth_has_link_state(uint8_t port_id);
+
+/**
  * Retrieve the general I/O statistics of an Ethernet device.
  *
  * @param port_id
-- 
2.1.4
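As an illustration of how an application might consume the query added in patch 1, the sketch below decides whether to request link-state-change interrupts or fall back to polling. Everything in it is a hypothetical stand-in so that it compiles without DPDK: `sketch_port`, `SKETCH_DRV_INTR_LSC` (its value is arbitrary) and `want_lsc_interrupt()` are not DPDK names. In a real application the result would presumably feed the `lsc` field of `rte_eth_conf.intr_conf` before the port is configured, with `rte_eth_link_get_nowait()` used for polling otherwise. The NULL checks are this sketch's own defensive addition; the proposed function dereferences `dev->pci_dev->driver` directly.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the driver's LSC capability flag (value arbitrary). */
#define SKETCH_DRV_INTR_LSC 0x0008u

struct sketch_driver { uint32_t drv_flags; };
struct sketch_port   { const struct sketch_driver *driver; };

/* Same test as the proposed rte_eth_has_link_state(): is the LSC flag set? */
static int sketch_has_link_state(const struct sketch_port *port)
{
    if (port == NULL || port->driver == NULL)
        return 0;   /* unknown driver: treat as "no LSC interrupt" */
    return (port->driver->drv_flags & SKETCH_DRV_INTR_LSC) != 0;
}

/* Value an application might put in its LSC interrupt setting:
 * 1 = ask for interrupts, 0 = poll the link status instead. */
static uint16_t want_lsc_interrupt(const struct sketch_port *port)
{
    return sketch_has_link_state(port) ? 1 : 0;
}
```

Requesting LSC interrupts from a driver that lacks them would typically make port setup fail, which is why a capability query of this kind is useful before configuring the port.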



[dpdk-dev] [PATCH v2 2/7] pmd: change drivers initialization for pci

2015-03-25 Thread Stephen Hemminger
From: Stephen Hemminger 

The change to the generic ether device structure to support multiple
bus types requires a change to all existing PMDs, but only in their
initialization (and the change is backwards compatible).

Signed-off-by: Stephen Hemminger 
---
 lib/librte_pmd_e1000/em_ethdev.c| 2 +-
 lib/librte_pmd_e1000/igb_ethdev.c   | 4 ++--
 lib/librte_pmd_enic/enic_ethdev.c   | 2 +-
 lib/librte_pmd_i40e/i40e_ethdev.c   | 2 +-
 lib/librte_pmd_i40e/i40e_ethdev_vf.c| 2 +-
 lib/librte_pmd_ixgbe/ixgbe_ethdev.c | 4 ++--
 lib/librte_pmd_virtio/virtio_ethdev.c   | 2 +-
 lib/librte_pmd_vmxnet3/vmxnet3_ethdev.c | 2 +-
 8 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/lib/librte_pmd_e1000/em_ethdev.c b/lib/librte_pmd_e1000/em_ethdev.c
index 76f45c9..c249528 100644
--- a/lib/librte_pmd_e1000/em_ethdev.c
+++ b/lib/librte_pmd_e1000/em_ethdev.c
@@ -281,7 +281,7 @@ eth_em_dev_init(struct rte_eth_dev *eth_dev)
 }

 static struct eth_driver rte_em_pmd = {
-   {
+   .pci_drv = {
.name = "rte_em_pmd",
.id_table = pci_id_em_map,
.drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC,
diff --git a/lib/librte_pmd_e1000/igb_ethdev.c 
b/lib/librte_pmd_e1000/igb_ethdev.c
index 49843c1..e90d2e1 100644
--- a/lib/librte_pmd_e1000/igb_ethdev.c
+++ b/lib/librte_pmd_e1000/igb_ethdev.c
@@ -680,7 +680,7 @@ eth_igbvf_dev_init(struct rte_eth_dev *eth_dev)
 }

 static struct eth_driver rte_igb_pmd = {
-   {
+   .pci_drv = {
.name = "rte_igb_pmd",
.id_table = pci_id_igb_map,
.drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC,
@@ -693,7 +693,7 @@ static struct eth_driver rte_igb_pmd = {
  * virtual function driver struct
  */
 static struct eth_driver rte_igbvf_pmd = {
-   {
+   .pci_drv = {
.name = "rte_igbvf_pmd",
.id_table = pci_id_igbvf_map,
.drv_flags = RTE_PCI_DRV_NEED_MAPPING,
diff --git a/lib/librte_pmd_enic/enic_ethdev.c 
b/lib/librte_pmd_enic/enic_ethdev.c
index 4950ede..b5de9ce 100644
--- a/lib/librte_pmd_enic/enic_ethdev.c
+++ b/lib/librte_pmd_enic/enic_ethdev.c
@@ -579,7 +579,7 @@ static int eth_enicpmd_dev_init(struct rte_eth_dev *eth_dev)
 }

 static struct eth_driver rte_enic_pmd = {
-   {
+   .pci_drv = {
.name = "rte_enic_pmd",
.id_table = pci_id_enic_map,
.drv_flags = RTE_PCI_DRV_NEED_MAPPING,
diff --git a/lib/librte_pmd_i40e/i40e_ethdev.c 
b/lib/librte_pmd_i40e/i40e_ethdev.c
index cf6685e..a6d6e6b 100644
--- a/lib/librte_pmd_i40e/i40e_ethdev.c
+++ b/lib/librte_pmd_i40e/i40e_ethdev.c
@@ -265,7 +265,7 @@ static struct eth_dev_ops i40e_eth_dev_ops = {
 };

 static struct eth_driver rte_i40e_pmd = {
-   {
+   .pci_drv = {
.name = "rte_i40e_pmd",
.id_table = pci_id_i40e_map,
.drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC,
diff --git a/lib/librte_pmd_i40e/i40e_ethdev_vf.c 
b/lib/librte_pmd_i40e/i40e_ethdev_vf.c
index c985e4a..09a4a37 100644
--- a/lib/librte_pmd_i40e/i40e_ethdev_vf.c
+++ b/lib/librte_pmd_i40e/i40e_ethdev_vf.c
@@ -1201,7 +1201,7 @@ i40evf_dev_init(struct rte_eth_dev *eth_dev)
  * virtual function driver struct
  */
 static struct eth_driver rte_i40evf_pmd = {
-   {
+   .pci_drv = {
.name = "rte_i40evf_pmd",
.id_table = pci_id_i40evf_map,
.drv_flags = RTE_PCI_DRV_NEED_MAPPING,
diff --git a/lib/librte_pmd_ixgbe/ixgbe_ethdev.c 
b/lib/librte_pmd_ixgbe/ixgbe_ethdev.c
index 92d75db..5e655a5 100644
--- a/lib/librte_pmd_ixgbe/ixgbe_ethdev.c
+++ b/lib/librte_pmd_ixgbe/ixgbe_ethdev.c
@@ -1088,7 +1088,7 @@ eth_ixgbevf_dev_init(struct rte_eth_dev *eth_dev)
 }

 static struct eth_driver rte_ixgbe_pmd = {
-   {
+   .pci_drv = {
.name = "rte_ixgbe_pmd",
.id_table = pci_id_ixgbe_map,
.drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC,
@@ -1101,7 +1101,7 @@ static struct eth_driver rte_ixgbe_pmd = {
  * virtual function driver struct
  */
 static struct eth_driver rte_ixgbevf_pmd = {
-   {
+   .pci_drv = {
.name = "rte_ixgbevf_pmd",
.id_table = pci_id_ixgbevf_map,
.drv_flags = RTE_PCI_DRV_NEED_MAPPING,
diff --git a/lib/librte_pmd_virtio/virtio_ethdev.c 
b/lib/librte_pmd_virtio/virtio_ethdev.c
index 603be2d..68c5b3c 100644
--- a/lib/librte_pmd_virtio/virtio_ethdev.c
+++ b/lib/librte_pmd_virtio/virtio_ethdev.c
@@ -1227,7 +1227,7 @@ eth_virtio_dev_init(struct rte_eth_dev *eth_dev)
 }

 static struct eth_driver rte_virtio_pmd = {
-   {
+   .pci_drv = {
.name = "rte_virtio_pmd",
.id_table = pci_id_virtio_map,
},
diff --git a/lib/librte_pmd_vmxnet3/vmxnet3_ethdev.c 
b/lib/librte_pmd_vmxnet3/vmxnet3_ethdev.c
index 458dce5..bd60602 100644
--- a/lib/librte_pmd_vmxnet3/vmxnet
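Patch 2 switches each driver from a positional initializer (`{`) to a designated one (`.pci_drv = {`). A designated initializer names its member, so it keeps initializing the right field even if the enclosing struct's layout changes; the positional form only works as long as the PCI descriptor stays the first member. The sketch below assumes, purely for illustration, that the PCI descriptor ends up in a C11 anonymous union next to a VMBus one; all struct and member names are hypothetical and not taken from the series.

```c
#include <stdint.h>

/* Hypothetical bus-specific driver descriptors. */
struct sketch_pci_driver   { const char *name; uint32_t drv_flags; };
struct sketch_vmbus_driver { const char *name; const char *guid; };

/* Hypothetical generic driver: one bus descriptor in a C11 anonymous union.
 * vmbus_drv is deliberately listed first, so a positional initializer
 * would land in the wrong member. */
struct sketch_eth_driver {
    union {
        struct sketch_vmbus_driver vmbus_drv;
        struct sketch_pci_driver   pci_drv;
    };
    int (*dev_init)(void *dev);
    unsigned dev_private_size;
};

static int sketch_dev_init(void *dev)
{
    (void)dev;
    return 0;
}

/* Designated initializer, in the style of patch 2: the member is named. */
static const struct sketch_eth_driver rte_sketch_pmd = {
    .pci_drv = {
        .name = "rte_sketch_pmd",
        .drv_flags = 0x1u,
    },
    .dev_init = sketch_dev_init,
    .dev_private_size = 64,
};
```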

[dpdk-dev] [PATCH v2 3/7] hv: add basic vmbus support

2015-03-25 Thread Stephen Hemminger
From: Stephen Hemminger 

The Hyper-V device driver forces the base EAL code to change
to support multiple bus types. This is done by changing the pci_device
in the ether driver to a generic union.

As much as possible this is done in a backwards source compatible
way. It will break ABI for device drivers.

Signed-off-by: Stephen Hemminger 
---
 lib/librte_eal/common/Makefile |   2 +-
 lib/librte_eal/common/eal_common_options.c |   5 +
 lib/librte_eal/common/eal_internal_cfg.h   |   1 +
 lib/librte_eal/common/eal_options.h|   2 +
 lib/librte_eal/common/eal_private.h|  10 +
 lib/librte_eal/common/include/rte_vmbus.h  | 153 +++
 lib/librte_eal/linuxapp/eal/Makefile   |   3 +
 lib/librte_eal/linuxapp/eal/eal.c  |  11 +
 lib/librte_eal/linuxapp/eal/eal_vmbus.c| 639 +
 lib/librte_ether/rte_ethdev.c  |  84 +++-
 lib/librte_ether/rte_ethdev.h  |  10 +-
 lib/librte_pmd_fm10k/fm10k_ethdev.c|   2 +-
 12 files changed, 915 insertions(+), 7 deletions(-)
 create mode 100644 lib/librte_eal/common/include/rte_vmbus.h
 create mode 100644 lib/librte_eal/linuxapp/eal/eal_vmbus.c

diff --git a/lib/librte_eal/common/Makefile b/lib/librte_eal/common/Makefile
index 3ea3bbf..202485e 100644
--- a/lib/librte_eal/common/Makefile
+++ b/lib/librte_eal/common/Makefile
@@ -33,7 +33,7 @@ include $(RTE_SDK)/mk/rte.vars.mk

 INC := rte_branch_prediction.h rte_common.h
 INC += rte_debug.h rte_eal.h rte_errno.h rte_launch.h rte_lcore.h
-INC += rte_log.h rte_memory.h rte_memzone.h rte_pci.h
+INC += rte_log.h rte_memory.h rte_memzone.h rte_pci.h rte_vmbus.h
 INC += rte_pci_dev_ids.h rte_per_lcore.h rte_random.h
 INC += rte_rwlock.h rte_tailq.h rte_interrupts.h rte_alarm.h
 INC += rte_string_fns.h rte_version.h
diff --git a/lib/librte_eal/common/eal_common_options.c 
b/lib/librte_eal/common/eal_common_options.c
index 8fcb1ab..76a3394 100644
--- a/lib/librte_eal/common/eal_common_options.c
+++ b/lib/librte_eal/common/eal_common_options.c
@@ -80,6 +80,7 @@ eal_long_options[] = {
{OPT_NO_HPET,   0, NULL, OPT_NO_HPET_NUM  },
{OPT_NO_HUGE,   0, NULL, OPT_NO_HUGE_NUM  },
{OPT_NO_PCI,0, NULL, OPT_NO_PCI_NUM   },
+   {OPT_NO_VMBUS,  0, NULL, OPT_NO_VMBUS_NUM },
{OPT_NO_SHCONF, 0, NULL, OPT_NO_SHCONF_NUM},
{OPT_PCI_BLACKLIST, 1, NULL, OPT_PCI_BLACKLIST_NUM},
{OPT_PCI_WHITELIST, 1, NULL, OPT_PCI_WHITELIST_NUM},
@@ -726,6 +727,10 @@ eal_parse_common_option(int opt, const char *optarg,
conf->no_pci = 1;
break;

+   case OPT_NO_VMBUS_NUM:
+   conf->no_vmbus = 1;
+   break;
+
case OPT_NO_HPET_NUM:
conf->no_hpet = 1;
break;
diff --git a/lib/librte_eal/common/eal_internal_cfg.h 
b/lib/librte_eal/common/eal_internal_cfg.h
index e2ecb0d..0e7de34 100644
--- a/lib/librte_eal/common/eal_internal_cfg.h
+++ b/lib/librte_eal/common/eal_internal_cfg.h
@@ -66,6 +66,7 @@ struct internal_config {
volatile unsigned no_hugetlbfs;   /**< true to disable hugetlbfs */
volatile unsigned xen_dom0_support; /**< support app running on Xen 
Dom0*/
volatile unsigned no_pci; /**< true to disable PCI */
+   volatile unsigned no_vmbus;   /**< true to disable VMBUS */
volatile unsigned no_hpet;/**< true to disable HPET */
volatile unsigned vmware_tsc_map; /**< true to use VMware TSC mapping

* instead of native TSC */
diff --git a/lib/librte_eal/common/eal_options.h 
b/lib/librte_eal/common/eal_options.h
index f6714d9..54f03dc 100644
--- a/lib/librte_eal/common/eal_options.h
+++ b/lib/librte_eal/common/eal_options.h
@@ -67,6 +67,8 @@ enum {
OPT_NO_HUGE_NUM,
 #define OPT_NO_PCI"no-pci"
OPT_NO_PCI_NUM,
+#define OPT_NO_VMBUS  "no-vmbus"
+   OPT_NO_VMBUS_NUM,
 #define OPT_NO_SHCONF "no-shconf"
OPT_NO_SHCONF_NUM,
 #define OPT_SOCKET_MEM"socket-mem"
diff --git a/lib/librte_eal/common/eal_private.h 
b/lib/librte_eal/common/eal_private.h
index 4acf5a0..039e9f3 100644
--- a/lib/librte_eal/common/eal_private.h
+++ b/lib/librte_eal/common/eal_private.h
@@ -180,6 +180,16 @@ int rte_eal_pci_close_one_driver(struct rte_pci_driver *dr,
struct rte_pci_device *dev);

 /**
+ * VMBUS related functions and structures
+ */
+int rte_eal_vmbus_init(void);
+
+struct rte_vmbus_driver;
+struct rte_vmbus_device;
+
+int rte_eal_vmbus_probe_one_driver(struct rte_vmbus_driver *dr,
+   struct rte_vmbus_device *dev);
+/**
  * Init tail queues for non-EAL library structures. This is to allow
  * the rings, mempools, etc. lists to be shared among multiple processes
  *
diff --git a/lib/librte_eal/common/include/rte_vmbus.h 
b/lib/librte_eal
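Patch 3 wires `--no-vmbus` into the EAL the same way as `--no-pci`: a name string and a numeric code in eal_options.h, a row in the long-option table, and a case that sets a flag in the internal config. A stand-alone sketch of that pattern follows. Only `getopt_long` and `struct option` are the real libc interface; `sketch_cfg`, `sketch_parse`, the `SK_OPT_*` names and the starting value 256 are hypothetical.

```c
#include <getopt.h>
#include <stddef.h>

/* Hypothetical option names and codes, mirroring the eal_options.h layout. */
#define SK_OPT_NO_PCI   "no-pci"
#define SK_OPT_NO_VMBUS "no-vmbus"
enum { SK_OPT_NO_PCI_NUM = 256, SK_OPT_NO_VMBUS_NUM };

struct sketch_cfg { unsigned no_pci; unsigned no_vmbus; };

static const struct option sketch_long_options[] = {
    { SK_OPT_NO_PCI,   no_argument, NULL, SK_OPT_NO_PCI_NUM   },
    { SK_OPT_NO_VMBUS, no_argument, NULL, SK_OPT_NO_VMBUS_NUM },
    { NULL, 0, NULL, 0 }
};

/* Returns 0 on success, -1 on an unrecognized option. */
static int sketch_parse(int argc, char **argv, struct sketch_cfg *cfg)
{
    int opt;

    optind = 1;   /* start scanning at argv[1] */
    opterr = 0;   /* report errors through the return value only */
    while ((opt = getopt_long(argc, argv, "", sketch_long_options, NULL)) != -1) {
        switch (opt) {
        case SK_OPT_NO_PCI_NUM:
            cfg->no_pci = 1;
            break;
        case SK_OPT_NO_VMBUS_NUM:
            cfg->no_vmbus = 1;   /* analogous to conf->no_vmbus = 1 */
            break;
        default:
            return -1;
        }
    }
    return 0;
}
```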

[dpdk-dev] [PATCH v2 4/7] hv: uio driver

2015-03-25 Thread Stephen Hemminger
From: Stephen Hemminger 

Add a new kernel UIO driver to support the DPDK poll mode driver.

Signed-off-by: Stas Egorov 
Signed-off-by: Stephen Hemminger 
---
 lib/librte_eal/linuxapp/Makefile|   3 +
 lib/librte_eal/linuxapp/hv_uio/Makefile |  57 ++
 lib/librte_eal/linuxapp/hv_uio/hv_uio.c | 551 +
 lib/librte_eal/linuxapp/hv_uio/hyperv_net.h | 907 
 4 files changed, 1518 insertions(+)
 create mode 100644 lib/librte_eal/linuxapp/hv_uio/Makefile
 create mode 100644 lib/librte_eal/linuxapp/hv_uio/hv_uio.c
 create mode 100644 lib/librte_eal/linuxapp/hv_uio/hyperv_net.h

diff --git a/lib/librte_eal/linuxapp/Makefile b/lib/librte_eal/linuxapp/Makefile
index 8fcfdf6..a28d289 100644
--- a/lib/librte_eal/linuxapp/Makefile
+++ b/lib/librte_eal/linuxapp/Makefile
@@ -41,5 +41,8 @@ endif
 ifeq ($(CONFIG_RTE_LIBRTE_XEN_DOM0),y)
 DIRS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += xen_dom0
 endif
+ifeq ($(CONFIG_RTE_LIBRTE_HV_PMD),y)
+DIRS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += hv_uio
+endif

 include $(RTE_SDK)/mk/rte.subdir.mk
diff --git a/lib/librte_eal/linuxapp/hv_uio/Makefile 
b/lib/librte_eal/linuxapp/hv_uio/Makefile
new file mode 100644
index 000..2ed7771
--- /dev/null
+++ b/lib/librte_eal/linuxapp/hv_uio/Makefile
@@ -0,0 +1,57 @@
+#   BSD LICENSE
+#
+#   Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
+#   Copyright(c) 2013-2015 Brocade Communications Systems, Inc.
+#   All rights reserved.
+#
+#   Redistribution and use in source and binary forms, with or without
+#   modification, are permitted provided that the following conditions
+#   are met:
+#
+# * Redistributions of source code must retain the above copyright
+#   notice, this list of conditions and the following disclaimer.
+# * Redistributions in binary form must reproduce the above copyright
+#   notice, this list of conditions and the following disclaimer in
+#   the documentation and/or other materials provided with the
+#   distribution.
+# * Neither the name of Intel Corporation nor the names of its
+#   contributors may be used to endorse or promote products derived
+#   from this software without specific prior written permission.
+#
+#   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+#   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+#   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+#   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+#   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+#   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+#   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+#   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+#   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+#   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+#   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+include $(RTE_SDK)/mk/rte.vars.mk
+
+#
+# module name and path
+#
+MODULE = hv_uio
+MODULE_PATH = drivers/net/hv_uio
+
+#
+# CFLAGS
+#
+MODULE_CFLAGS += -I$(SRCDIR) --param max-inline-insns-single=100
+MODULE_CFLAGS += -I$(RTE_OUTPUT)/include
+MODULE_CFLAGS += -Winline -Wall -Werror
+MODULE_CFLAGS += -include $(RTE_OUTPUT)/include/rte_config.h
+ifeq ($(CONFIG_RTE_LIBRTE_HV_DEBUG),y)
+MODULE_CFLAGS += -DDBG
+endif
+
+#
+# all source are stored in SRCS-y
+#
+SRCS-y := hv_uio.c
+
+include $(RTE_SDK)/mk/rte.module.mk
diff --git a/lib/librte_eal/linuxapp/hv_uio/hv_uio.c 
b/lib/librte_eal/linuxapp/hv_uio/hv_uio.c
new file mode 100644
index 000..4cac075
--- /dev/null
+++ b/lib/librte_eal/linuxapp/hv_uio/hv_uio.c
@@ -0,0 +1,551 @@
+/*
+ * Copyright (c) 2013-2015 Brocade Communications Systems, Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, see .
+ *
+ */
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "hyperv_net.h"
+
+#define HV_DEVICE_ADD0
+#define HV_DEVICE_REMOVE 1
+#define HV_RING_SIZE512
+
+static uint mtu = ETH_DATA_LEN;
+/*
+ * List of resources to be mapped to uspace
+ * can be extended up to MAX_UIO_MAPS(5) items
+ */
+enum {
+   TXRX_RING_MAP,
+   INT_PAGE_MAP,
+   MON_PAGE_MAP,
+   RECV_BUF_MAP
+};
+
+struct hyperv_private_data {
+   struct netvsc
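The resource-map enum in hv_uio.c must stay identical to its copy in hyperv.h (the PMD header's comment says so), and hv_uio.c notes that the list can be extended up to MAX_UIO_MAPS (5) items. One way to catch drift at compile time is a trailing count member plus `_Static_assert`. This is a hypothetical sketch: `HV_MAP_COUNT`, `SKETCH_MAX_UIO_MAPS` and `SKETCH_PMD_EXPECTED_MAPS` are invented names, and having both sides include one shared header would remove the copy altogether. The offset helper reflects the standard UIO convention that map N is selected by passing N times the page size as the mmap offset.

```c
/* Hypothetical mirror of the kernel's MAX_UIO_MAPS limit cited in hv_uio.c. */
#define SKETCH_MAX_UIO_MAPS 5

/* The mapping list from hv_uio.c, with a count sentinel appended. */
enum hv_uio_map {
    TXRX_RING_MAP,
    INT_PAGE_MAP,
    MON_PAGE_MAP,
    RECV_BUF_MAP,
    HV_MAP_COUNT   /* must stay last */
};

/* Kernel side: the list may not outgrow what UIO can expose. */
_Static_assert(HV_MAP_COUNT <= SKETCH_MAX_UIO_MAPS,
               "too many UIO maps for MAX_UIO_MAPS");

/* PMD side: the user-space copy expects exactly this layout. */
#define SKETCH_PMD_EXPECTED_MAPS 4
_Static_assert(HV_MAP_COUNT == SKETCH_PMD_EXPECTED_MAPS,
               "hv_uio.c and hyperv.h map lists differ");
_Static_assert(RECV_BUF_MAP == 3, "receive buffer map index moved");

/* UIO selects map N through an mmap offset of N * page size. */
static long hv_map_offset(enum hv_uio_map idx, long page_size)
{
    return (long)idx * page_size;
}
```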

[dpdk-dev] [PATCH v2 5/7] hv: poll mode driver

2015-03-25 Thread Stephen Hemminger
From: Stephen Hemminger 

This is a new poll mode driver for the Hyper-V virtual network
interface.

Signed-off-by: Stas Egorov 
Signed-off-by: Stephen Hemminger 
---
 lib/Makefile  |1 +
 lib/librte_pmd_hyperv/Makefile|   28 +
 lib/librte_pmd_hyperv/hyperv.h|  169 
 lib/librte_pmd_hyperv/hyperv_drv.c| 1660 +
 lib/librte_pmd_hyperv/hyperv_drv.h|  558 +++
 lib/librte_pmd_hyperv/hyperv_ethdev.c |  334 +++
 lib/librte_pmd_hyperv/hyperv_logs.h   |   68 ++
 lib/librte_pmd_hyperv/hyperv_rxtx.c   |  402 
 lib/librte_pmd_hyperv/hyperv_rxtx.h   |   35 +
 mk/rte.app.mk |4 +
 10 files changed, 3259 insertions(+)
 create mode 100644 lib/librte_pmd_hyperv/Makefile
 create mode 100644 lib/librte_pmd_hyperv/hyperv.h
 create mode 100644 lib/librte_pmd_hyperv/hyperv_drv.c
 create mode 100644 lib/librte_pmd_hyperv/hyperv_drv.h
 create mode 100644 lib/librte_pmd_hyperv/hyperv_ethdev.c
 create mode 100644 lib/librte_pmd_hyperv/hyperv_logs.h
 create mode 100644 lib/librte_pmd_hyperv/hyperv_rxtx.c
 create mode 100644 lib/librte_pmd_hyperv/hyperv_rxtx.h

diff --git a/lib/Makefile b/lib/Makefile
index d94355d..6c1daf2 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -47,6 +47,7 @@ DIRS-$(CONFIG_RTE_LIBRTE_I40E_PMD) += librte_pmd_i40e
 DIRS-$(CONFIG_RTE_LIBRTE_FM10K_PMD) += librte_pmd_fm10k
 DIRS-$(CONFIG_RTE_LIBRTE_MLX4_PMD) += librte_pmd_mlx4
 DIRS-$(CONFIG_RTE_LIBRTE_ENIC_PMD) += librte_pmd_enic
+DIRS-$(CONFIG_RTE_LIBRTE_HV_PMD) += librte_pmd_hyperv
 DIRS-$(CONFIG_RTE_LIBRTE_PMD_BOND) += librte_pmd_bond
 DIRS-$(CONFIG_RTE_LIBRTE_PMD_RING) += librte_pmd_ring
 DIRS-$(CONFIG_RTE_LIBRTE_PMD_PCAP) += librte_pmd_pcap
diff --git a/lib/librte_pmd_hyperv/Makefile b/lib/librte_pmd_hyperv/Makefile
new file mode 100644
index 000..4ba08c8
--- /dev/null
+++ b/lib/librte_pmd_hyperv/Makefile
@@ -0,0 +1,28 @@
+#   BSD LICENSE
+#
+#   Copyright(c) 2013-2015 Brocade Communications Systems, Inc.
+#   All rights reserved.
+
+include $(RTE_SDK)/mk/rte.vars.mk
+
+#
+# library name
+#
+LIB = librte_pmd_hyperv.a
+
+CFLAGS += -O3
+CFLAGS += $(WERROR_FLAGS)
+
+#
+# all source are stored in SRCS-y
+#
+SRCS-$(CONFIG_RTE_LIBRTE_HV_PMD) += hyperv_ethdev.c
+SRCS-$(CONFIG_RTE_LIBRTE_HV_PMD) += hyperv_rxtx.c
+SRCS-$(CONFIG_RTE_LIBRTE_HV_PMD) += hyperv_drv.c
+
+# this lib depends upon:
+DEPDIRS-$(CONFIG_RTE_LIBRTE_HV_PMD) += lib/librte_eal lib/librte_ether
+DEPDIRS-$(CONFIG_RTE_LIBRTE_HV_PMD) += lib/librte_mempool lib/librte_mbuf
+DEPDIRS-$(CONFIG_RTE_LIBRTE_HV_PMD) += lib/librte_malloc
+
+include $(RTE_SDK)/mk/rte.lib.mk
diff --git a/lib/librte_pmd_hyperv/hyperv.h b/lib/librte_pmd_hyperv/hyperv.h
new file mode 100644
index 000..b011b6d
--- /dev/null
+++ b/lib/librte_pmd_hyperv/hyperv.h
@@ -0,0 +1,169 @@
+/*-
+ * Copyright (c) 2013-2015 Brocade Communications Systems, Inc.
+ * All rights reserved.
+ */
+
+#ifndef _HYPERV_H_
+#define _HYPERV_H_
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "hyperv_logs.h"
+
+#define PAGE_SHIFT 12
+#define PAGE_SIZE  (1 << PAGE_SHIFT)
+
+/*
+ * Tunable ethdev params
+ */
+#define HV_MIN_RX_BUF_SIZE 1024
+#define HV_MAX_RX_PKT_LEN  4096
+#define HV_MAX_MAC_ADDRS   1
+#define HV_MAX_RX_QUEUES   1
+#define HV_MAX_TX_QUEUES   1
+#define HV_MAX_PKT_BURST   32
+#define HV_MAX_LINK_REQ    10
+
+/*
+ * List of resources mapped from kspace
+ * need to be the same as defined in hv_uio.c
+ */
+enum {
+   TXRX_RING_MAP,
+   INT_PAGE_MAP,
+   MON_PAGE_MAP,
+   RECV_BUF_MAP
+};
+
+/*
+ * Statistics
+ */
+struct hv_stats {
+   uint64_t opkts;
+   uint64_t obytes;
+   uint64_t oerrors;
+
+   uint64_t ipkts;
+   uint64_t ibytes;
+   uint64_t ierrors;
+   uint64_t rx_nombuf;
+};
+
+struct hv_data;
+struct netvsc_packet;
+struct rndis_msg;
+typedef void (*receive_callback_t)(struct hv_data *hv, struct rndis_msg *msg,
+   struct netvsc_packet *pkt);
+
+/*
+ * Main driver structure
+ */
+struct hv_data {
+   int vmbus_device;
+   uint8_t monitor_bit;
+   uint8_t monitor_group;
+   uint8_t kernel_initialized;
+   int uio_fd;
+   /* Flag indicates channel state. If closed, RX/TX shouldn't work further */
+   uint8_t closed;
+   /* Flag indicates whether HALT rndis request was received by host */
+   uint8_t hlt_req_sent;
+   /* Flag indicates pending state for HALT request */
+   uint8_t hlt_req_pending;
+   /* Counter for RNDIS requests */
+   uint32_t new_request_id;
+   /* State of RNDIS device */
+   uint8_t rndis_dev_state;
+   /* Number of transmitted packets but not completed yet by Hyper-V */
+   int num_outstanding_sends;
+   /* Max pkt len to fit in rx mbufs */
+   uint32_t max_rx_pkt_len;
+
+   uint8_t jumbo_frame_support;
+
+   struct hv_vmbus_ring_buffer *in;
+   struct

[dpdk-dev] [PATCH v2 6/7] hv: enable driver in common config

2015-03-25 Thread Stephen Hemminger
From: Stephen Hemminger 

Add hyperv driver config to enable it.

Signed-off-by: Stephen Hemminger 
---
 config/common_linuxapp | 9 +
 1 file changed, 9 insertions(+)

diff --git a/config/common_linuxapp b/config/common_linuxapp
index 09a58ac..96d1be7 100644
--- a/config/common_linuxapp
+++ b/config/common_linuxapp
@@ -233,6 +233,15 @@ CONFIG_RTE_LIBRTE_VMXNET3_DEBUG_TX_FREE=n
 CONFIG_RTE_LIBRTE_VMXNET3_DEBUG_DRIVER=n

 #
+# Compile burst-mode Hyperv PMD driver
+#
+CONFIG_RTE_LIBRTE_HV_PMD=y
+CONFIG_RTE_LIBRTE_HV_DEBUG=n
+CONFIG_RTE_LIBRTE_HV_DEBUG_INIT=n
+CONFIG_RTE_LIBRTE_HV_DEBUG_RX=n
+CONFIG_RTE_LIBRTE_HV_DEBUG_TX=n
+
+#
 # Compile example software rings based PMD
 #
 CONFIG_RTE_LIBRTE_PMD_RING=y
-- 
2.1.4



[dpdk-dev] [PATCH v2 7/7] hv: add kernel patch

2015-03-25 Thread Stephen Hemminger
From: Stephen Hemminger 

For users running kernels older than the latest, include the kernel
patch for them to apply.

Signed-off-by: Stephen Hemminger 
---
 .../linuxapp/hv_uio/vmbus-get-pages.patch  | 53 ++
 1 file changed, 53 insertions(+)
 create mode 100644 lib/librte_eal/linuxapp/hv_uio/vmbus-get-pages.patch

diff --git a/lib/librte_eal/linuxapp/hv_uio/vmbus-get-pages.patch b/lib/librte_eal/linuxapp/hv_uio/vmbus-get-pages.patch
new file mode 100644
index 000..e1a4b13
--- /dev/null
+++ b/lib/librte_eal/linuxapp/hv_uio/vmbus-get-pages.patch
@@ -0,0 +1,53 @@
+hyper-v: allow access to vmbus from userspace driver
+
+This is a patch from  to allow access to the Hyper-V vmbus from the UIO driver.
+
+Signed-off-by: Stas Egorov 
+Signed-off-by: Stephen Hemminger 
+
+---
+v2 - simplify and rename to vmbus_get_monitor_pages
+
+ drivers/hv/connection.c |   20 +---
+ include/linux/hyperv.h  |    3 +++
+ 2 files changed, 20 insertions(+), 3 deletions(-)
+
+--- a/drivers/hv/connection.c
++++ b/drivers/hv/connection.c
+@@ -64,6 +64,15 @@
+   }
+ }
+ 
++void vmbus_get_monitor_pages(unsigned long *int_page,
++   unsigned long monitor_pages[2])
++{
++  *int_page = (unsigned long)vmbus_connection.int_page;
++  monitor_pages[0] = (unsigned long)vmbus_connection.monitor_pages[0];
++  monitor_pages[1] = (unsigned long)vmbus_connection.monitor_pages[1];
++}
++EXPORT_SYMBOL_GPL(vmbus_get_monitor_pages);
++
+ static int vmbus_negotiate_version(struct vmbus_channel_msginfo *msginfo,
+   __u32 version)
+ {
+@@ -327,8 +336,6 @@
+   else
+   bytes_to_read = 0;
+   } while (read_state && (bytes_to_read != 0));
+-  } else {
+-  pr_err("no channel callback for relid - %u\n", relid);
+   }
+ 
+   spin_unlock_irqrestore(&channel->inbound_lock, flags);
+--- a/include/linux/hyperv.h
++++ b/include/linux/hyperv.h
+@@ -1162,6 +1162,9 @@
+ 
+ extern void vmbus_ontimer(unsigned long data);
+ 
++extern void vmbus_get_monitor_pages(unsigned long *int_page,
++  unsigned long monitor_pages[2]);
++
+ /* Base driver object */
+ struct hv_driver {
+   const char *name;
-- 
2.1.4



[dpdk-dev] Kernel deadlock due to rte_kni

2015-03-25 Thread Dey, Souvik
Hi All,
There appears to be an issue with rte_kni.ko that gets the kernel into a
deadlock. We are running rte_kni.ko with multiple-thread support, with the
threads pinned to different non-isolated cores. When we test with TCP/TLS, the
kernel hangs on a race condition. Below is the kernel stack trace.

PID: 19942  TASK: 880227a71950  CPU: 3   COMMAND: "CE_2N_Comp_SamP"
#0 [88043fd87ec0] crash_nmi_callback at 8101d4a8
#1 [88043fd87ed0] notifier_call_chain at 81055b68
#2 [88043fd87f00] notify_die at 81055be0
#3 [88043fd87f30] do_nmi at 81009ddd
#4 [88043fd87f50] nmi at 812ea9d0
[exception RIP: _raw_spin_lock_bh+25]
RIP: 812ea2a4  RSP: 880189439c88  RFLAGS: 0293
RAX: 5b59  RBX: 880291708ec8  RCX: 045a
RDX: 880189439d90  RSI:   RDI: 880291708ec8
RBP: 880291708e80   R8: 047fef78   R9: 0001
R10: 0009  R11: 8126c658  R12: 880423799a40
R13: 880189439e08  R14: 045a  R15: 0017
ORIG_RAX:   CS: 0010  SS: 0018
---  ---
#5 [880189439c88] _raw_spin_lock_bh at 812ea2a4
#6 [880189439c90] lock_sock_nested at 8122e948
#7 [880189439ca0] tcp_sendmsg at 8126c676
#8 [880189439d50] sock_aio_write at 8122bb12
#9 [880189439e00] do_sync_write at 810c61c6
#10 [880189439f10] vfs_write at 810c68a9
#11 [880189439f40] sys_write at 810c6dfe
#12 [880189439f80] system_call_fastpath at 812eab92
RIP: 7fc7909bc0ed  RSP: 7fc787ffe108  RFLAGS: 0202
RAX: 0001  RBX: 812eab92  RCX: 7fc7880aa170
RDX: 045a  RSI: 04d56546  RDI: 002b
RBP: 04d56546   R8: 047fef78   R9: 0001
R10: 0009  R11: 0293  R12: 04d56546
R13: 0483de10  R14: 045a  R15: 0001880008b0
ORIG_RAX: 0001  CS: 0033  SS: 002b

PID: 3598   TASK: 88043db21310  CPU: 1   COMMAND: "kni_pkt0"
#0 [88043fc87ec0] crash_nmi_callback at 8101d4a8
#1 [88043fc87ed0] notifier_call_chain at 81055b68
#2 [88043fc87f00] notify_die at 81055be0
#3 [88043fc87f30] do_nmi at 81009ddd
#4 [88043fc87f50] nmi at 812ea9d0
[exception RIP: _raw_spin_lock+16]
RIP: 812ea0b1  RSP: 88043fc83e78  RFLAGS: 0297
RAX: 5a59  RBX: 880291708e80  RCX: 0001
RDX: 88043fc83ec0  RSI: 2f82  RDI: 880291708ec8
RBP: 88043d8f4000   R8: 813a8d20   R9: 0001
R10: 88043d9d8098  R11: 8101e62a  R12: 81279a3c
R13: 88042d9b3fd8  R14: 880291708e80  R15: 88043fc83ec0
ORIG_RAX:   CS: 0010  SS: 0018
---  ---
#5 [88043fc83e78] _raw_spin_lock at 812ea0b1
#6 [88043fc83e78] tcp_delack_timer at 81279a4e
#7 [88043fc83e98] run_timer_softirq at 8104642d
#8 [88043fc83f08] __do_softirq at 81041539
#9 [88043fc83f48] call_softirq at 812ebd9c
#10 [88043fc83f60] do_softirq at 8100b037
#11 [88043fc83f80] irq_exit at 8104185a
#12 [88043fc83f90] smp_apic_timer_interrupt at 8101eaef
#13 [88043fc83fb0] apic_timer_interrupt at 812eb553
---  ---
#14 [88042d9b3ae8] apic_timer_interrupt at 812eb553
[exception RIP: tcp_rcv_established+1732]
RIP: 812756ab  RSP: 88042d9b3b90  RFLAGS: 0202
RAX: 0020  RBX: 88042d86f470  RCX: 020a
RDX: 8801fc163864  RSI: 88032f8b2380  RDI: 880291708e80
RBP: 88032f8b2380   R8: 88032f8b2380   R9: 81327a60
R10: 000e  R11: 8112ae8f  R12: 812eb54e
R13: 8122ea88  R14: 010c  R15: 05a8
ORIG_RAX: ff10  CS: 0010  SS: 0018
#15 [88042d9b3bd8] tcp_v4_do_rcv at 8127b483
#16 [88042d9b3c48] tcp_v4_rcv at 8127d89f
#17 [88042d9b3cb8] ip_local_deliver_finish at 81260cc2
#18 [88042d9b3cd8] __netif_receive_skb at 81239de6
#19 [88042d9b3d28] netif_receive_skb at 81239e7f
#20 [88042d9b3d58] kni_net_rx_normal at a022b06f [rte_kni]
#21 [88042d9b3ec8] kni_thread_multiple at a022a2df [rte_kni]
#22 [88042d9b3ee8] kthread at 81051b27
#23 [88042d9b3f48] kernel_thread_helper at 812ebca4


On further investigation, I found that in kni_net.c, kni_net_rx_normal()
calls netif_receive_skb(), which in the kernel is normally called in softirq
context, but in this case

[dpdk-dev] [ovs-dev] ovs-dpdk: ofpbuf reinitialization

2015-03-25 Thread Pravin Shelar
On Wed, Mar 25, 2015 at 12:25 PM, Zoltan Kiss  wrote:
> Hi,
>
> Looking around in the DPDK code I've found that it only initializes the
> packet metadata (which contains the struct ofpbuf belonging to the packet)
> during setup, as the packet initializer of rte_mempool_create.
> That means that every time a packet buffer is released back by OVS to the
> buffer pool, it retains its ofpbuf state, and that state doesn't change when
> the poll mode driver uses the buffer again to store a new packet. The "source"
> and "allocated" members of ofpbuf shouldn't change, but frame, l2_pad_size and
> the offsets do at various places. Even though I couldn't establish an
> error scenario yet, I think it's quite dangerous to let a packet
> inherit the previous packet's ofpbuf.
> Or am I missing some place where this piece is reinitialized?
>

l2_pad and offsets are initialized during flow extraction. These
fields should not be accessed before this step.


[dpdk-dev] Kernel deadlock due to rte_kni

2015-03-25 Thread Jay Rolette
http://patchwork.dpdk.org/ml/archives/dev/2015-February/013335.html

Jay

On Wed, Mar 25, 2015 at 2:39 PM, Dey, Souvik  wrote:

> Hi All,
> There appears to be an issue with rte_kni.ko that gets the
> kernel into a deadlock. We are running rte_kni.ko with multiple-thread
> support, with the threads pinned to different non-isolated cores. When we
> test with TCP/TLS, the kernel hangs on a race condition. Below is the
> kernel stack trace.
>
> PID: 19942  TASK: 880227a71950  CPU: 3   COMMAND: "CE_2N_Comp_SamP"
> #0 [88043fd87ec0] crash_nmi_callback at 8101d4a8
> #1 [88043fd87ed0] notifier_call_chain at 81055b68
> #2 [88043fd87f00] notify_die at 81055be0
> #3 [88043fd87f30] do_nmi at 81009ddd
> #4 [88043fd87f50] nmi at 812ea9d0
> [exception RIP: _raw_spin_lock_bh+25]
> RIP: 812ea2a4  RSP: 880189439c88  RFLAGS: 0293
> RAX: 5b59  RBX: 880291708ec8  RCX: 045a
> RDX: 880189439d90  RSI:   RDI: 880291708ec8
> RBP: 880291708e80   R8: 047fef78   R9: 0001
> R10: 0009  R11: 8126c658  R12: 880423799a40
> R13: 880189439e08  R14: 045a  R15: 0017
> ORIG_RAX:   CS: 0010  SS: 0018
> ---  ---
> #5 [880189439c88] _raw_spin_lock_bh at 812ea2a4
> #6 [880189439c90] lock_sock_nested at 8122e948
> #7 [880189439ca0] tcp_sendmsg at 8126c676
> #8 [880189439d50] sock_aio_write at 8122bb12
> #9 [880189439e00] do_sync_write at 810c61c6
> #10 [880189439f10] vfs_write at 810c68a9
> #11 [880189439f40] sys_write at 810c6dfe
> #12 [880189439f80] system_call_fastpath at 812eab92
> RIP: 7fc7909bc0ed  RSP: 7fc787ffe108  RFLAGS: 0202
> RAX: 0001  RBX: 812eab92  RCX: 7fc7880aa170
> RDX: 045a  RSI: 04d56546  RDI: 002b
> RBP: 04d56546   R8: 047fef78   R9: 0001
> R10: 0009  R11: 0293  R12: 04d56546
> R13: 0483de10  R14: 045a  R15: 0001880008b0
> ORIG_RAX: 0001  CS: 0033  SS: 002b
>
> PID: 3598   TASK: 88043db21310  CPU: 1   COMMAND: "kni_pkt0"
> #0 [88043fc87ec0] crash_nmi_callback at 8101d4a8
> #1 [88043fc87ed0] notifier_call_chain at 81055b68
> #2 [88043fc87f00] notify_die at 81055be0
> #3 [88043fc87f30] do_nmi at 81009ddd
> #4 [88043fc87f50] nmi at 812ea9d0
> [exception RIP: _raw_spin_lock+16]
> RIP: 812ea0b1  RSP: 88043fc83e78  RFLAGS: 0297
> RAX: 5a59  RBX: 880291708e80  RCX: 0001
> RDX: 88043fc83ec0  RSI: 2f82  RDI: 880291708ec8
> RBP: 88043d8f4000   R8: 813a8d20   R9: 0001
> R10: 88043d9d8098  R11: 8101e62a  R12: 81279a3c
> R13: 88042d9b3fd8  R14: 880291708e80  R15: 88043fc83ec0
> ORIG_RAX:   CS: 0010  SS: 0018
> ---  ---
> #5 [88043fc83e78] _raw_spin_lock at 812ea0b1
> #6 [88043fc83e78] tcp_delack_timer at 81279a4e
> #7 [88043fc83e98] run_timer_softirq at 8104642d
> #8 [88043fc83f08] __do_softirq at 81041539
> #9 [88043fc83f48] call_softirq at 812ebd9c
> #10 [88043fc83f60] do_softirq at 8100b037
> #11 [88043fc83f80] irq_exit at 8104185a
> #12 [88043fc83f90] smp_apic_timer_interrupt at 8101eaef
> #13 [88043fc83fb0] apic_timer_interrupt at 812eb553
> ---  ---
> #14 [88042d9b3ae8] apic_timer_interrupt at 812eb553
> [exception RIP: tcp_rcv_established+1732]
> RIP: 812756ab  RSP: 88042d9b3b90  RFLAGS: 0202
> RAX: 0020  RBX: 88042d86f470  RCX: 020a
> RDX: 8801fc163864  RSI: 88032f8b2380  RDI: 880291708e80
> RBP: 88032f8b2380   R8: 88032f8b2380   R9: 81327a60
> R10: 000e  R11: 8112ae8f  R12: 812eb54e
> R13: 8122ea88  R14: 010c  R15: 05a8
> ORIG_RAX: ff10  CS: 0010  SS: 0018
> #15 [88042d9b3bd8] tcp_v4_do_rcv at 8127b483
> #16 [88042d9b3c48] tcp_v4_rcv at 8127d89f
> #17 [88042d9b3cb8] ip_local_deliver_finish at 81260cc2
> #18 [88042d9b3cd8] __netif_receive_skb at 81239de6
> #19 [88042d9b3d28] netif_receive_skb at 81239e7f
> #20 [88042d9b3d58] kni_net_rx_normal at a022b06f [rte_kni]
> #21 [88042d9b3ec8] kni_thread_multiple at a022a2df [rte_kni]
> #22 [88042d9b3ee8] kthread at ff

[dpdk-dev] Kernel deadlock due to rte_kni

2015-03-25 Thread Dey, Souvik
Thanks Jay. Can you please point me to the DPDK release where this is fixed?
I am currently using DPDK 1.6, so I might need to backport the fix. Or is just
replacing the netif_receive_skb() call with netif_rx() good enough?



--

Regards,

Souvik

From: Jay Rolette [mailto:role...@infiniteio.com]
Sent: Thursday, March 26, 2015 1:41 AM
To: Dey, Souvik
Cc: dev at dpdk.org
Subject: Re: [dpdk-dev] Kernel deadlock due to rte_kni

http://patchwork.dpdk.org/ml/archives/dev/2015-February/013335.html

Jay

On Wed, Mar 25, 2015 at 2:39 PM, Dey, Souvik <sodey at sonusnet.com> wrote:
Hi All,
There appears to be an issue with rte_kni.ko that gets the kernel into a
deadlock. We are running rte_kni.ko with multiple-thread support, with the
threads pinned to different non-isolated cores. When we test with TCP/TLS, the
kernel hangs on a race condition. Below is the kernel stack trace.

PID: 19942  TASK: 880227a71950  CPU: 3   COMMAND: "CE_2N_Comp_SamP"
#0 [88043fd87ec0] crash_nmi_callback at 8101d4a8
#1 [88043fd87ed0] notifier_call_chain at 81055b68
#2 [88043fd87f00] notify_die at 81055be0
#3 [88043fd87f30] do_nmi at 81009ddd
#4 [88043fd87f50] nmi at 812ea9d0
[exception RIP: _raw_spin_lock_bh+25]
RIP: 812ea2a4  RSP: 880189439c88  RFLAGS: 0293
RAX: 5b59  RBX: 880291708ec8  RCX: 045a
RDX: 880189439d90  RSI:   RDI: 880291708ec8
RBP: 880291708e80   R8: 047fef78   R9: 0001
R10: 0009  R11: 8126c658  R12: 880423799a40
R13: 880189439e08  R14: 045a  R15: 0017
ORIG_RAX:   CS: 0010  SS: 0018
---  ---
#5 [880189439c88] _raw_spin_lock_bh at 812ea2a4
#6 [880189439c90] lock_sock_nested at 8122e948
#7 [880189439ca0] tcp_sendmsg at 8126c676
#8 [880189439d50] sock_aio_write at 8122bb12
#9 [880189439e00] do_sync_write at 810c61c6
#10 [880189439f10] vfs_write at 810c68a9
#11 [880189439f40] sys_write at 810c6dfe
#12 [880189439f80] system_call_fastpath at 812eab92
RIP: 7fc7909bc0ed  RSP: 7fc787ffe108  RFLAGS: 0202
RAX: 0001  RBX: 812eab92  RCX: 7fc7880aa170
RDX: 045a  RSI: 04d56546  RDI: 002b
RBP: 04d56546   R8: 047fef78   R9: 0001
R10: 0009  R11: 0293  R12: 04d56546
R13: 0483de10  R14: 045a  R15: 0001880008b0
ORIG_RAX: 0001  CS: 0033  SS: 002b

PID: 3598   TASK: 88043db21310  CPU: 1   COMMAND: "kni_pkt0"
#0 [88043fc87ec0] crash_nmi_callback at 8101d4a8
#1 [88043fc87ed0] notifier_call_chain at 81055b68
#2 [88043fc87f00] notify_die at 81055be0
#3 [88043fc87f30] do_nmi at 81009ddd
#4 [88043fc87f50] nmi at 812ea9d0
[exception RIP: _raw_spin_lock+16]
RIP: 812ea0b1  RSP: 88043fc83e78  RFLAGS: 0297
RAX: 5a59  RBX: 880291708e80  RCX: 0001
RDX: 88043fc83ec0  RSI: 2f82  RDI: 880291708ec8
RBP: 88043d8f4000   R8: 813a8d20   R9: 0001
R10: 88043d9d8098  R11: 8101e62a  R12: 81279a3c
R13: 88042d9b3fd8  R14: 880291708e80  R15: 88043fc83ec0
ORIG_RAX:   CS: 0010  SS: 0018
---  ---
#5 [88043fc83e78] _raw_spin_lock at 812ea0b1
#6 [88043fc83e78] tcp_delack_timer at 81279a4e
#7 [88043fc83e98] run_timer_softirq at 8104642d
#8 [88043fc83f08] __do_softirq at 81041539
#9 [88043fc83f48] call_softirq at 812ebd9c
#10 [88043fc83f60] do_softirq at 8100b037
#11 [88043fc83f80] irq_exit at 8104185a
#12 [88043fc83f90] smp_apic_timer_interrupt at 8101eaef
#13 [88043fc83fb0] apic_timer_interrupt at 812eb553
---  ---
#14 [88042d9b3ae8] apic_timer_interrupt at 812eb553
[exception RIP: tcp_rcv_established+1732]
RIP: 812756ab  RSP: 88042d9b3b90  RFLAGS: 0202
RAX: 0020  RBX: 88042d86f470  RCX: 020a
RDX: 8801fc163864  RSI: 88032f8b2380  RDI: 880291708e80
RBP: 88032f8b2380   R8: 88032f8b2380   R9: 81327a60
R10: 000e  R11: 8112ae8f  R12: 812eb54e
R13: 8122ea88  R14: 010c  R15: 05a8
ORIG_RAX: ff10  CS: 0010  SS: 0018
#15 [88042d9b3bd8] tcp_v4_do_rcv at 8127b483
#16 [88042d9b3c48] tcp_v4_rcv at 8127d89f
#17 [88042d9b3cb8] ip_local_deliver_finish at 81260cc2
#18 [f

[dpdk-dev] [PATCH] examples/vhost: use library routines instead of local copies

2015-03-25 Thread Zoltan Kiss
This macro and function were copies of code in the mbuf library; there is no
reason to keep them.

Signed-off-by: Zoltan Kiss 
---
 examples/vhost/main.c | 38 +-
 1 file changed, 5 insertions(+), 33 deletions(-)

diff --git a/examples/vhost/main.c b/examples/vhost/main.c
index c3fcb80..1c998a5 100644
--- a/examples/vhost/main.c
+++ b/examples/vhost/main.c
@@ -139,8 +139,6 @@
 /* Number of descriptors per cacheline. */
 #define DESC_PER_CACHELINE (RTE_CACHE_LINE_SIZE / sizeof(struct vring_desc))

-#define MBUF_EXT_MEM(mb)   (RTE_MBUF_FROM_BADDR((mb)->buf_addr) != (mb))
-
 /* mask of enabled ports */
 static uint32_t enabled_port_mask = 0;

@@ -1538,32 +1536,6 @@ attach_rxmbuf_zcp(struct virtio_net *dev)
return;
 }

-/*
- * Detach an attched packet mbuf -
- *  - restore original mbuf address and length values.
- *  - reset pktmbuf data and data_len to their default values.
- *  All other fields of the given packet mbuf will be left intact.
- *
- * @param m
- *   The attached packet mbuf.
- */
-static inline void pktmbuf_detach_zcp(struct rte_mbuf *m)
-{
-   const struct rte_mempool *mp = m->pool;
-   void *buf = RTE_MBUF_TO_BADDR(m);
-   uint32_t buf_ofs;
-   uint32_t buf_len = mp->elt_size - sizeof(*m);
-   m->buf_physaddr = rte_mempool_virt2phy(mp, m) + sizeof(*m);
-
-   m->buf_addr = buf;
-   m->buf_len = (uint16_t)buf_len;
-
-   buf_ofs = (RTE_PKTMBUF_HEADROOM <= m->buf_len) ?
-   RTE_PKTMBUF_HEADROOM : m->buf_len;
-   m->data_off = buf_ofs;
-
-   m->data_len = 0;
-}

 /*
  * This function is called after packets have been transimited. It fetchs mbuf
@@ -1590,8 +1562,8 @@ txmbuf_clean_zcp(struct virtio_net *dev, struct vpool *vpool)

for (index = 0; index < mbuf_count; index++) {
mbuf = __rte_mbuf_raw_alloc(vpool->pool);
-   if (likely(MBUF_EXT_MEM(mbuf)))
-   pktmbuf_detach_zcp(mbuf);
+   if (likely(RTE_MBUF_INDIRECT(mbuf)))
+   rte_pktmbuf_detach(mbuf);
rte_ring_sp_enqueue(vpool->ring, mbuf);

/* Update used index buffer information. */
@@ -1653,8 +1625,8 @@ static void mbuf_destroy_zcp(struct vpool *vpool)
for (index = 0; index < mbuf_count; index++) {
mbuf = __rte_mbuf_raw_alloc(vpool->pool);
if (likely(mbuf != NULL)) {
-   if (likely(MBUF_EXT_MEM(mbuf)))
-   pktmbuf_detach_zcp(mbuf);
+   if (likely(RTE_MBUF_INDIRECT(mbuf)))
+   rte_pktmbuf_detach(mbuf);
rte_ring_sp_enqueue(vpool->ring, (void *)mbuf);
}
}
@@ -2149,7 +2121,7 @@ switch_worker_zcp(__attribute__((unused)) void *arg)
}
while (likely(rx_count)) {
rx_count--;
-   pktmbuf_detach_zcp(
+   rte_pktmbuf_detach(
pkts_burst[rx_count]);
rte_ring_sp_enqueue(
vpool_array[index].ring,
-- 
1.9.1



[dpdk-dev] ovs-dpdk: placing the metadata

2015-03-25 Thread Zoltan Kiss
Hi Olivier,

On 25/03/15 17:04, Olivier MATZ wrote:
> Hi Zoltan,
>
> On 03/24/2015 06:42 PM, Zoltan Kiss wrote:
>> Hi,
>>
>> I've noticed in lib/netdev-dpdk.c that __rte_pktmbuf_init() stores the
>> packet metadata right after "struct rte_mbuf", and before the buffer
>> data:
>>
>>  /* start of buffer is just after mbuf structure */
>>  m->buf_addr = (char *)m + sizeof(struct dp_packet);
>>
>> (struct dp_packet has the rte_mbuf as first member if DPDK enabled)
>>
>> However, lib/librte_mbuf/rte_mbuf.h seems to codify that the buffer
>> should start right after the rte_mbuf:
>>
>> /**
>>   * Given the buf_addr returns the pointer to corresponding mbuf.
>>   */
>> #define RTE_MBUF_FROM_BADDR(ba) (((struct rte_mbuf *)(ba)) - 1)
>>
>> /**
>>   * Given the pointer to mbuf returns an address where it's  buf_addr
>>   * should point to.
>>   */
>> #define RTE_MBUF_TO_BADDR(mb)   (((struct rte_mbuf *)(mb)) + 1)
>>
>> These macros are used for attaching/detaching mbufs to each other. This
>> is the way the code retrieves the direct buffer from an indirect one,
>> and vice versa. I think if we want to keep the metadata feature (which I
>> guess is quite important), we need to add a pointer to rte_mbuf, which
>> helps the direct and indirect structs find each other. Something like:
>>
>>  struct rte_mbuf *attach; /**< Points to the other buffer if this one
>>   is (in)direct. Otherwise NULL.  */
>>
>> What do you think?
>
> I've just sent a patch that should fix this issue.
> http://dpdk.org/ml/archives/dev/2015-March/015722.html
>
> Let me know if you have any comment on it.

I have some comments for the first patch:

> diff --git a/examples/vhost/main.c b/examples/vhost/main.c
> index c3fcb80..050f3ac 100644
> --- a/examples/vhost/main.c
> +++ b/examples/vhost/main.c
I've sent a separate patch for this file; I think it's easier to ditch
the old copy-pasted code. See "[PATCH] examples/vhost: use library
routines instead of local copies".

> diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
> index 17ba791..4ced6d3 100644
> --- a/lib/librte_mbuf/rte_mbuf.h
> +++ b/lib/librte_mbuf/rte_mbuf.h
> @@ -268,7 +268,7 @@ struct rte_mbuf {
>   uint16_t data_len;/**< Amount of data in segment buffer. */
>   uint32_t pkt_len; /**< Total pkt len: sum of all segments. */
>   uint16_t vlan_tci;/**< VLAN Tag Control Identifier (CPU order) */
> - uint16_t reserved;
> + uint16_t priv_size;   /**< size of the application private data */
>   union {
>   uint32_t rss; /**< RSS hash result if RSS enabled */
>   struct {
> @@ -320,15 +320,38 @@ struct rte_mbuf {
>  } __rte_cache_aligned;
>
>  /**
> - * Given the buf_addr returns the pointer to corresponding mbuf.
> + * Return the mbuf owning the given data buffer address.
> + *
> + * @param mi
> + *   The pointer to the indirect mbuf.
> + * @param buffer_addr
> + *   The address of the data buffer of the direct mbuf.
You don't need this parameter; it's mi->buf_addr.

> @@ -744,9 +767,11 @@ static inline void rte_pktmbuf_attach(struct rte_mbuf *mi, struct rte_mbuf *md)
>  static inline void rte_pktmbuf_detach(struct rte_mbuf *m)
>  {
>   const struct rte_mempool *mp = m->pool;
> - void *buf = RTE_MBUF_TO_BADDR(m);
> + void *buf = rte_mbuf_to_baddr(m);
>   uint32_t buf_len = mp->elt_size - sizeof(*m);
I don't see any reason to keep buf and buf_len; just assign straight to
m->buf_addr and m->buf_len.
Besides that, you need to subtract m->priv_size from buf_len.

> - m->buf_physaddr = rte_mempool_virt2phy(mp, m) + sizeof (*m);
> +
> + m->buf_physaddr = rte_mempool_virt2phy(mp, m) + sizeof (*m) +
> + m->priv_size;
>
>   m->buf_addr = buf;
>   m->buf_len = (uint16_t)buf_len;

The rest of the series looks good,

Reviewed-by: Zoltan Kiss 


[dpdk-dev] ovs-dpdk: ofpbuf reinitialization

2015-03-25 Thread Zoltan Kiss
Hi,

Looking around in the DPDK code I've found that it only initializes the
packet metadata (which contains the struct ofpbuf belonging to the
packet) during setup, as the packet initializer of rte_mempool_create.
That means that every time a packet buffer is released back by OVS to
the buffer pool, it retains its ofpbuf state, and that state doesn't change
when the poll mode driver uses the buffer again to store a new packet. The
"source" and "allocated" members of ofpbuf shouldn't change, but frame,
l2_pad_size and the offsets do at various places. Even though I
couldn't establish an error scenario yet, I think it's quite dangerous
to let a packet inherit the previous packet's ofpbuf.
Or am I missing some place where this piece is reinitialized?

Regards,

Zoltan Kiss


[dpdk-dev] [PATCH v2 7/7] hv: add kernel patch

2015-03-25 Thread KY Srinivasan


> -----Original Message-----
> From: Stephen Hemminger [mailto:stephen at networkplumber.org]
> Sent: Wednesday, March 25, 2015 11:11 AM
> To: simonxiaolinux at hotmail.com; Alexander Malysh; KY Srinivasan
> Cc: dev at dpdk.org; Stephen Hemminger
> Subject: [PATCH v2 7/7] hv: add kernel patch
> 
> From: Stephen Hemminger 
> 
> For users running kernels older than the latest, include the kernel
> patch for them to apply.
> 
> Signed-off-by: Stephen Hemminger 
> ---
>  .../linuxapp/hv_uio/vmbus-get-pages.patch  | 53 ++
>  1 file changed, 53 insertions(+)
>  create mode 100644 lib/librte_eal/linuxapp/hv_uio/vmbus-get-pages.patch
> 
> diff --git a/lib/librte_eal/linuxapp/hv_uio/vmbus-get-pages.patch b/lib/librte_eal/linuxapp/hv_uio/vmbus-get-pages.patch
> new file mode 100644
> index 000..e1a4b13
> --- /dev/null
> +++ b/lib/librte_eal/linuxapp/hv_uio/vmbus-get-pages.patch
> @@ -0,0 +1,53 @@
> +hyper-v: allow access to vmbus from userspace driver
> +
> +This is a patch from  to allow access to the Hyper-V vmbus from the UIO driver.
> +
> +Signed-off-by: Stas Egorov 
> +Signed-off-by: Stephen Hemminger 
> +
> +---
> +v2 - simplify and rename to vmbus_get_monitor_pages
> +
> + drivers/hv/connection.c |   20 +---
> + include/linux/hyperv.h  |    3 +++
> + 2 files changed, 20 insertions(+), 3 deletions(-)
> +
> +--- a/drivers/hv/connection.c
> ++++ b/drivers/hv/connection.c
> +@@ -64,6 +64,15 @@
> + }
> + }
> +
> ++void vmbus_get_monitor_pages(unsigned long *int_page,
> ++ unsigned long monitor_pages[2])
> ++{
> ++*int_page = (unsigned long)vmbus_connection.int_page;
> ++monitor_pages[0] = (unsigned long)vmbus_connection.monitor_pages[0];
> ++monitor_pages[1] = (unsigned long)vmbus_connection.monitor_pages[1];
> ++}
> ++EXPORT_SYMBOL_GPL(vmbus_get_monitor_pages);
> ++
> + static int vmbus_negotiate_version(struct vmbus_channel_msginfo *msginfo,
> + __u32 version)
> + {
> +@@ -327,8 +336,6 @@
> + else
> + bytes_to_read = 0;
> + } while (read_state && (bytes_to_read != 0));
> +-} else {
> +-pr_err("no channel callback for relid - %u\n", relid);
> + }
> +
> + spin_unlock_irqrestore(&channel->inbound_lock, flags);
> +--- a/include/linux/hyperv.h
> ++++ b/include/linux/hyperv.h
> +@@ -1162,6 +1162,9 @@
> +
> + extern void vmbus_ontimer(unsigned long data);
> +
> ++extern void vmbus_get_monitor_pages(unsigned long *int_page,
> ++unsigned long monitor_pages[2]);
> ++
> + /* Base driver object */
> + struct hv_driver {
> + const char *name;

Are you basing this on Greg's current tree?

K. Y
> --
> 2.1.4