[dpdk-dev] [PATCH] testpmd: modify the mac of csum forwarding

2015-08-26 Thread Liu, Jijiang


> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Michael Qiu
> Sent: Friday, August 07, 2015 11:29 AM
> To: dev at dpdk.org
> Subject: [dpdk-dev] [PATCH] testpmd: modify the mac of csum forwarding
> 
> For some Ethernet switches, like the Intel RRC, all the packets forwarded out by DPDK
> will be dropped on the switch side, so the packet generator will never receive the
> packet.
> 
> Signed-off-by: Michael Qiu 
> ---
 app/test-pmd/csumonly.c | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/app/test-pmd/csumonly.c b/app/test-pmd/csumonly.c index
> 1bf3485..bf8af1d 100644
> --- a/app/test-pmd/csumonly.c
> +++ b/app/test-pmd/csumonly.c
> @@ -550,6 +550,10 @@ pkt_burst_checksum_forward(struct fwd_stream
> *fs)
>* and inner headers */
> 
>   eth_hdr = rte_pktmbuf_mtod(m, struct ether_hdr *);
> + ether_addr_copy(&peer_eth_addrs[fs->peer_addr],
> + ð_hdr->d_addr);
> + ether_addr_copy(&ports[fs->tx_port].eth_addr,
> + ð_hdr->s_addr);
>   parse_ethernet(eth_hdr, &info);
>   l3_hdr = (char *)eth_hdr + info.l2_len;
> 
> --
> 1.9.3
The change will affect the csum fwd performance.
But I also think the change is necessary; otherwise we cannot use csumonly fwd mode in a
guest.

Acked-by: Jijiang Liu 



[dpdk-dev] [PATCH] ixgbe: fix a x550 DCB issue

2015-08-26 Thread Wenzhuo Lu
There's a DCB issue on x550. For 8 TCs, if a packet with user priority 6
or 7 is injected to the NIC, then the NIC will put 3 packets into the
queue. There's also a similar issue for 4 TCs.
The root cause is that RXPBSIZE is not right. The RXPBSIZE of the x550 is 384,
which is different from other 10G NICs. We need to set the RXPBSIZE according to
the NIC type.
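For reference, the arithmetic behind the two buffer-size constants used below
(assuming the RXPBSIZE field is expressed in KB, as on the other ixgbe parts):

  other 10G NICs: 0x200 = 512 KB packet buffer -> 512 / 8 TCs = 64 KB per TC
  x550:           0x180 = 384 KB packet buffer -> 384 / 8 TCs = 48 KB per TC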

Signed-off-by: Wenzhuo Lu 
---
 drivers/net/ixgbe/ixgbe_rxtx.c | 27 +++++++++++++++++++++++----
 1 file changed, 23 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ixgbe/ixgbe_rxtx.c b/drivers/net/ixgbe/ixgbe_rxtx.c
index 91023b9..021229f 100644
--- a/drivers/net/ixgbe/ixgbe_rxtx.c
+++ b/drivers/net/ixgbe/ixgbe_rxtx.c
@@ -2915,6 +2915,7 @@ ixgbe_rss_configure(struct rte_eth_dev *dev)

 #define NUM_VFTA_REGISTERS 128
 #define NIC_RX_BUFFER_SIZE 0x200
+#define X550_RX_BUFFER_SIZE 0x180

 static void
 ixgbe_vmdq_dcb_configure(struct rte_eth_dev *dev)
@@ -2943,7 +2944,15 @@ ixgbe_vmdq_dcb_configure(struct rte_eth_dev *dev)
 * RXPBSIZE
 * split rx buffer up into sections, each for 1 traffic class
 */
-   pbsize = (uint16_t)(NIC_RX_BUFFER_SIZE / nb_tcs);
+   switch (hw->mac.type) {
+   case ixgbe_mac_X550:
+   case ixgbe_mac_X550EM_x:
+   pbsize = (uint16_t)(X550_RX_BUFFER_SIZE / nb_tcs);
+   break;
+   default:
+   pbsize = (uint16_t)(NIC_RX_BUFFER_SIZE / nb_tcs);
+   break;
+   }
for (i = 0 ; i < nb_tcs; i++) {
uint32_t rxpbsize = IXGBE_READ_REG(hw, IXGBE_RXPBSIZE(i));
rxpbsize &= (~(0x3FF << IXGBE_RXPBSIZE_SHIFT));
@@ -3317,7 +3326,7 @@ ixgbe_dcb_hw_configure(struct rte_eth_dev *dev,
 {
int ret = 0;
uint8_t i,pfc_en,nb_tcs;
-   uint16_t pbsize;
+   uint16_t pbsize, rx_buffer_size;
uint8_t config_dcb_rx = 0;
uint8_t config_dcb_tx = 0;
uint8_t tsa[IXGBE_DCB_MAX_TRAFFIC_CLASS] = {0};
@@ -3408,9 +3417,19 @@ ixgbe_dcb_hw_configure(struct rte_eth_dev *dev,
}
}

+   switch (hw->mac.type) {
+   case ixgbe_mac_X550:
+   case ixgbe_mac_X550EM_x:
+   rx_buffer_size = X550_RX_BUFFER_SIZE;
+   break;
+   default:
+   rx_buffer_size = NIC_RX_BUFFER_SIZE;
+   break;
+   }
+
if(config_dcb_rx) {
/* Set RX buffer size */
-   pbsize = (uint16_t)(NIC_RX_BUFFER_SIZE / nb_tcs);
+   pbsize = (uint16_t)(rx_buffer_size / nb_tcs);
uint32_t rxpbsize = pbsize << IXGBE_RXPBSIZE_SHIFT;
for (i = 0 ; i < nb_tcs; i++) {
IXGBE_WRITE_REG(hw, IXGBE_RXPBSIZE(i), rxpbsize);
@@ -3466,7 +3485,7 @@ ixgbe_dcb_hw_configure(struct rte_eth_dev *dev,

/* Check if the PFC is supported */
if(dev->data->dev_conf.dcb_capability_en & ETH_DCB_PFC_SUPPORT) {
-   pbsize = (uint16_t) (NIC_RX_BUFFER_SIZE / nb_tcs);
+   pbsize = (uint16_t) (rx_buffer_size / nb_tcs);
for (i = 0; i < nb_tcs; i++) {
/*
* If the TC count is 8,and the default high_water is 48,
-- 
1.9.3



[dpdk-dev] flow_director_filter error!!

2015-08-26 Thread De Lara Guarch, Pablo
Hi Navneet,

> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Navneet Rao
> Sent: Tuesday, August 25, 2015 9:27 PM
> To: Wu, Jingjing; Mcnamara, John; dev at dpdk.org
> Subject: Re: [dpdk-dev] flow_director_filter error!!
> 
> Hi Jingjing:
> 
> Thanks.
> 
> I did have the ethertype_filter ignore the mac_addr, and look at only
> ethertype filter and it still got a "bad arguments" message :-(
> 
> testpmd>  ethertype_filter 0 add mac_ignr ethertype 0x0806 fwd queue 1
> Bad arguments

Yes, apparently the example is wrong. It is missing the MAC address after mac_ignr.
So it should be:

ethertype_filter 0 add mac_ignr 00:11:22:33:44:55 ethertype 0x0806 fwd queue 1
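For reference, a minimal sketch of the equivalent call from application code
(assuming the generic filter_ctrl API in this DPDK release; the port, queue and
ethertype values are just the ones from the example above):

#include <string.h>
#include <rte_ethdev.h>
#include <rte_eth_ctrl.h>
#include <rte_ether.h>

static int
add_arp_ethertype_filter(uint8_t port_id)
{
        struct rte_eth_ethertype_filter filter;

        memset(&filter, 0, sizeof(filter));
        filter.ether_type = ETHER_TYPE_ARP;  /* 0x0806 */
        filter.flags = 0;                    /* MAC is ignored: RTE_ETHTYPE_FLAGS_MAC not set */
        filter.queue = 1;                    /* forward matching packets to queue 1 */

        return rte_eth_dev_filter_ctrl(port_id, RTE_ETH_FILTER_ETHERTYPE,
                                       RTE_ETH_FILTER_ADD, &filter);
}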

Regards,
Pablo

> 
> 
> 
> -Original Message-
> From: Wu, Jingjing [mailto:jingjing.wu at intel.com]
> Sent: Tuesday, August 25, 2015 6:55 AM
> To: Navneet Rao; Mcnamara, John; dev at dpdk.org
> Subject: RE: [dpdk-dev] flow_director_filter error!!
> 
> Hi, Navneet
> 
> I'm sorry, I have no idea about the NIC i540. Are you talking about the X540?
> If X540, I guess you can't classify on the MAC-ADDRESS to different queue by
> ethertype filter. Because in the X540 datasheet the ethertype filter is
> described as below:
> " 7.1.2.3 L2 Ethertype Filters
> These filters identify packets by their L2 Ethertype, 802.1Q user priority and
> optionally assign them to a receive queue."
> 
> So the mac_address is not the filter's input.
> 
> Thanks
> Jingjing
> 
> > -Original Message-
> > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Navneet Rao
> > Sent: Friday, August 21, 2015 2:57 AM
> > To: Mcnamara, John; dev at dpdk.org
> > Subject: Re: [dpdk-dev] flow_director_filter error!!
> >
> > Thanks John.
> >
> > I am trying to setup/use the flow-director-filter on the i540.
> >
> > -- When I try to setup the flow-director-filter as per the example, I
> > am getting "bad arguments"!!!
> >  So decided to see if the flush command would work.
> >
> >
> > In the interim --- I am using ethertype filter to accomplish the following.
> > What I am trying to do is this --
> > Use 2 different i540 cards
> > Use the igb_uio driver.
> > Use the testpmd app.
> > Setup 5 different MAC-ADDRESSes on each port. (using the set mac_addr
> > command) Setup 5 different RxQs and TxQs on each port.
> > And then use the testpmd app to generate traffic..
> >
> > I am assuming that the testpmd app will now send and receive traffic
> > using the 5 different MAC_ADDRESSes..
> > On each port's receive I will now want to classify on the MAC-ADDRESS
> > and steer the traffic to different queues.
> >
> > Is there an example/reference on how to achieve this?
> >
> > Next, I would want to do "classify" on "flexbytes" and send/steer the
> > traffic to different queues using flow-director-filter.
> >
> > Thanks
> > -Navneet
> >
> >
> >
> >
> > -Original Message-
> > From: Mcnamara, John [mailto:john.mcnamara at intel.com]
> > Sent: Wednesday, August 19, 2015 3:39 PM
> > To: Navneet Rao; dev at dpdk.org
> > Subject: RE: [dpdk-dev] flow_director_filter error!!
> >
> > > -Original Message-
> > > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Navneet Rao
> > > Sent: Tuesday, August 18, 2015 4:01 PM
>> > To: dev at dpdk.org
> > > Subject: [dpdk-dev] flow_director_filter error!!
> > >
>> > After I start the testpmd app, I am flushing the flow_director_filter
> > > settings and get the following error -
> > >
> > >
> > >
> > > testpmd> flush_flow_director 0
> > >
> > > PMD: ixgbe_fdir_flush(): Failed to re-initialize FD table.
> > >
> > > flow director table flushing error: (Too many open files in system)
> >
> > Hi,
> >
> > Are you setting a flow director filter before flushing? If so, could you 
> > give
> an example.
> >
> > John.
> > --
> >


[dpdk-dev] [PATCH] doc: add missing field in ethertype_filter example in testpmd doc

2015-08-26 Thread Pablo de Lara
The two examples of ethertype_filter in the testpmd documentation
were missing the MAC address field, so the examples were incorrect.

Signed-off-by: Pablo de Lara 
---
 doc/guides/testpmd_app_ug/testpmd_funcs.rst | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/doc/guides/testpmd_app_ug/testpmd_funcs.rst b/doc/guides/testpmd_app_ug/testpmd_funcs.rst
index 3f076c8..aa77a91 100644
--- a/doc/guides/testpmd_app_ug/testpmd_funcs.rst
+++ b/doc/guides/testpmd_app_ug/testpmd_funcs.rst
@@ -1469,8 +1469,8 @@ Example, to add/remove an ethertype filter rule:

 .. code-block:: console

-testpmd> ethertype_filter 0 add mac_ignr ethertype 0x0806 fwd queue 3
-testpmd> ethertype_filter 0 del mac_ignr ethertype 0x0806 fwd queue 3
+testpmd> ethertype_filter 0 add mac_ignr 00:11:22:33:44:55 ethertype 0x0806 fwd queue 3
+testpmd> ethertype_filter 0 del mac_ignr 00:11:22:33:44:55 ethertype 0x0806 fwd queue 3

 2tuple_filter
~~~~~~~~~~~~~
-- 
2.4.2



[dpdk-dev] vhost compliant virtio based networking interface in container

2015-08-26 Thread Tetsuya Mukawa
On 2015/08/25 18:56, Xie, Huawei wrote:
> On 8/25/2015 10:59 AM, Tetsuya Mukawa wrote:
>> Hi Xie and Yanping,
>>
>>
>> May I ask you some questions?
>> It seems we are also developing an almost same one.
> Good to know that we are tackling the same problem and have the similar
> idea.
> What is your status now? We had the POC running, and compliant with
> dpdkvhost.
> Interrupt like notification isn't supported.

We implemented vhost PMD first, so we have just started implementing it.

>
>> On 2015/08/20 19:14, Xie, Huawei wrote:
>>> Added dev at dpdk.org
>>>
>>> On 8/20/2015 6:04 PM, Xie, Huawei wrote:
 Yanping:
 I read your mail; it seems what we did is quite similar. Here I wrote a
 quick mail to describe our design. Let me know if it is the same thing.

 Problem Statement:
 We don't have a high performance networking interface in containers for
 NFV. The current veth-pair-based interface couldn't be easily accelerated.

 The key components involved:
 1. DPDK based virtio PMD driver in container.
 2. Device simulation framework in container.
 3. DPDK (or kernel) vhost running in host.

 How virtio is created?
 A:  There is no "real" virtio-pci device in the container environment.
 1). The host maintains pools of memory, and shares memory with the container.
 This could be accomplished by the host sharing a huge page file with the
 container.
 2). The container creates virtio rings based on the shared memory.
 3). The container creates mbuf memory pools on the shared memory.
 4). The container sends the memory and vring information to vhost through
 vhost messages. This could be done either through an ioctl call or a vhost
 user message.

 How vhost message is sent?
 A: There are two alternative ways to do this.
 1) The customized virtio PMD is responsible for all the vring creation,
 and vhost message sending.
>> Above is our approach so far.
>> It seems Yanping also takes this kind of approach.
>> We are using vhost-user functionality instead of using the vhost-net
>> kernel module.
>> Probably this is the difference between Yanping and us.
> In my current implementation, the device simulation layer talks to "user
> space" vhost through cuse interface. It could also be done through vhost
> user socket. This isn't the key point.
> Here vhost-user is kind of confusing, maybe user space vhost is more
> accurate, either cuse or unix domain socket. :).
>
> As for Yanping, they are now connecting to the vhost-net kernel module, but
> they are also trying to connect to "user space" vhost. Correct me if I am wrong.
> Yes, there is some difference between these two. The vhost-net kernel module
> could directly access another process's memory, while with
> vhost-user(cuse/user), we need to do the memory mapping.
>> BTW, we are going to submit a vhost PMD for DPDK-2.2.
>> This PMD is implemented on librte_vhost.
>> It allows DPDK application to handle a vhost-user(cuse) backend as a
>> normal NIC port.
>> This PMD should work with both Xie and Yanping approach.
>> (In the case of Yanping approach, we may need vhost-cuse)
>>
 2) We could do this through a lightweight device simulation framework.
 The device simulation creates a simple PCI bus. On the PCI bus,
 virtio-net PCI devices are created. The device simulation provides
 an IOAPI for MMIO/IO access.
>> Does it mean you implemented a kernel module?
>> If so, do you still need vhost-cuse functionality to handle vhost
>> messages in userspace?
> The device simulation is a library running in user space in the container.
> It is linked with the DPDK app. It creates pseudo buses and virtio-net PCI
> devices.
> The virtio-container-PMD configures the virtio-net pseudo devices
> through the IOAPI provided by the device simulation rather than IO
> instructions as in KVM.
> Why do we use device simulation?
> We could create other virtio devices in the container, and provide a common
> way to talk to the vhost-xx module.
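(Just to check that I understand the shape of that interface, here is a rough
sketch of what such an IOAPI could look like; every name below is my own
assumption, not your actual code:)

#include <stdint.h>

struct sim_pci_dev;     /* opaque handle owned by the simulation library */

struct sim_io_ops {
        /* read/write the emulated virtio-net I/O (or MMIO) register region */
        uint32_t (*io_read)(struct sim_pci_dev *dev, uint64_t offset, int len);
        void     (*io_write)(struct sim_pci_dev *dev, uint64_t offset,
                             uint32_t value, int len);
};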

Thanks for the explanation.
At first reading, I thought the difference between approach 1 and
approach 2 was whether we need to implement a new kernel module or not.
But now I understand how you implemented it.

Please let me explain our design in more detail.
We might use a similar kind of approach to handle a pseudo virtio-net
device in DPDK.
(Anyway, we haven't finished implementing it yet, so this overview might have
some technical problems.)

Step 1. Separate the virtio-net and vhost-user socket related code from QEMU,
then implement it as a separate program.
The program also has the features below.
 - Create a directory that contains almost the same files as
/sys/bus/pci/device//*
   (To scan these files located outside sysfs, we need to fix the EAL)
 - This dummy device is driven by dummy-virtio-net-driver. This name is
specified by the '/driver' file.
 - Create a shared file that represents the PCI configuration space, then
mmap it, and also specify the path in '/resource_path' (a rough sketch follows below)
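A rough sketch of that last item (an outline only, with a hypothetical file path
and an assumed size; the real program would lay the file out to match the PCI
configuration space):

#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define CFG_SPACE_SIZE 4096     /* assumed size of the emulated config space */

int main(void)
{
        const char *path = "/tmp/virtio-net-0-config";  /* hypothetical path */
        int fd = open(path, O_RDWR | O_CREAT, 0600);

        if (fd < 0 || ftruncate(fd, CFG_SPACE_SIZE) < 0) {
                perror("config space file");
                return 1;
        }

        /* Both this program and the DPDK EAL in the container mmap the same
         * file, so writes on one side are visible on the other. */
        uint8_t *cfg = mmap(NULL, CFG_SPACE_SIZE, PROT_READ | PROT_WRITE,
                            MAP_SHARED, fd, 0);
        if (cfg == MAP_FAILED) {
                perror("mmap");
                return 1;
        }

        /* Illustrative only: advertise a virtio network device
         * (vendor 0x1af4, device 0x1000) at the start of the config space. */
        const uint8_t id[4] = { 0xf4, 0x1a, 0x00, 0x10 };
        memcpy(cfg, id, sizeof(id));

        munmap(cfg, CFG_SPACE_SIZE);
        close(fd);
        return 0;
}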

The program will be GPL, but it will be like a bridge on th

[dpdk-dev] [ovs-dev] OVS-DPDK performance problem on ixgbe vector PMD

2015-08-26 Thread Zoltan Kiss
Hi,

On 24/08/15 12:43, Traynor, Kevin wrote:
>
>> -Original Message-
>> From: dev [mailto:dev-bounces at openvswitch.org] On Behalf Of Zoltan Kiss
>> Sent: Friday, August 21, 2015 7:05 PM
>> To: dev at dpdk.org; dev at openvswitch.org
>> Cc: Richardson, Bruce; Ananyev, Konstantin
>> Subject: [ovs-dev] OVS-DPDK performance problem on ixgbe vector PMD
>>
>> Hi,
>>
>> I've set up a simple packet forwarding perf test on a dual-port 10G
>> 82599ES: one port receives 64 byte UDP packets, the other sends it out,
>> one core used. I've used latest OVS with DPDK 2.1, and the first result
>> was only 13.2 Mpps, which was a bit far from the 13.9 I've seen last
>> year with the same test. The first thing I've changed was to revert back
>> to the old behaviour about this issue:
>>
>> http://permalink.gmane.org/gmane.comp.networking.dpdk.devel/22731
>>
>> So instead of the new default I've passed 2048 + RTE_PKTMBUF_HEADROOM.
>> That increased the performance to 13.5, but to figure out what's wrong
>> started to play with the receive functions. First I've disabled vector
>> PMD, but ixgbe_recv_pkts_bulk_alloc() was even worse, only 12.5 Mpps. So
>> then I've enabled scattered RX, and with
>> ixgbe_recv_pkts_lro_bulk_alloc() I could manage to get 13.98 Mpps, which
>> is I guess as close as possible to the 14.2 line rate (on my HW at
>> least, with one core)
>> Does anyone have a good explanation for why the vector PMD performs so
>> significantly worse? I would expect that on a 3.2 GHz i5-4570 one core
>> should be able to reach ~14 Mpps, SG and vector PMD shouldn't make a
>> difference.
>
> I've previously turned on/off vectorisation and found that for tx it makes
> a significant difference. For Rx it didn't make much of a difference, but
> rx bulk allocation, which gets enabled with it, did improve performance.
>
> Is there something else also running on the current pmd core? Did you
> try moving it to another?
I've tied the pmd to the second core; as far as I can see from top and
profiling outputs, hardly anything else runs there.

> Also, did you compile OVS with -O3/-Ofast? They
> tend to give a performance boost.
Yes

>
> Are you hitting 3.2 GHz for the core with the pmd? I think that is only
> with turbo boost, so it may not be achievable all the time.
The turbo boost freq is 3.6 GHz.

>
>> I've tried to look into it with oprofile, but the results were quite
>> strange: 35% of the samples were from miniflow_extract, the part where
>> parse_vlan calls data_pull to jump after the MAC addresses. The oprofile
>> snippet (1M samples):
>>
>> 511454 19       0.0037  flow.c:511
>> 511458 149      0.0292  dp-packet.h:266
>> 51145f 4264     0.8357  dp-packet.h:267
>> 511466 18       0.0035  dp-packet.h:268
>> 51146d 43       0.0084  dp-packet.h:269
>> 511474 172      0.0337  flow.c:511
>> 51147a 4320     0.8467  string3.h:51
>> 51147e 358763  70.3176  flow.c:99
>> 511482 2       3.9e-04  string3.h:51
>> 511485 3060     0.5998  string3.h:51
>> 511488 1693     0.3318  string3.h:51
>> 51148c 2933     0.5749  flow.c:326
>> 511491 47       0.0092  flow.c:326
>>
>> And the corresponding disassembled code:
>>
>> 511454:   49 83 f9 0d             cmp    r9,0xd
>> 511458:   c6 83 81 00 00 00 00    mov    BYTE PTR [rbx+0x81],0x0
>> 51145f:   66 89 83 82 00 00 00    mov    WORD PTR [rbx+0x82],ax
>> 511466:   66 89 93 84 00 00 00    mov    WORD PTR [rbx+0x84],dx
>> 51146d:   66 89 8b 86 00 00 00    mov    WORD PTR [rbx+0x86],cx
>> 511474:   0f 86 af 01 00 00       jbe    511629
>>
>> 51147a:   48 8b 45 00             mov    rax,QWORD PTR [rbp+0x0]
>> 51147e:   4c 8d 5d 0c             lea    r11,[rbp+0xc]
>> 511482:   49 89 00                mov    QWORD PTR [r8],rax
>> 511485:   8b 45 08                mov    eax,DWORD PTR [rbp+0x8]
>> 511488:   41 89 40 08             mov    DWORD PTR [r8+0x8],eax
>> 51148c:   44 0f b7 55 0c          movzx  r10d,WORD PTR [rbp+0xc]
>> 511491:   66 41 81 fa 81 00       cmp    r10w,0x81
>>
>> My only explanation to this so far is that I misunderstand something
>> about the oprofile results.
>>
>> Regards,
>>
>> Zoltan
>> ___
>> dev mailing list
>> dev at openvswitch.org
>> http://openvswitch.org/mailman/listinfo/dev


[dpdk-dev] flow_director_filter error!!

2015-08-26 Thread Navneet Rao
Thanks Pablo.

BTW -- how do I 
1. query the "settings" of the ethertype-filter to check that they are correct? 
Is there an option that I am missing...
2.  it might be good to "publish" this in the output of "show port info 


Thanks
-Navneet


-Original Message-
From: De Lara Guarch, Pablo [mailto:pablo.de.lara.gua...@intel.com] 
Sent: Wednesday, August 26, 2015 12:28 AM
To: Navneet Rao; Wu, Jingjing; Mcnamara, John; dev at dpdk.org
Subject: RE: [dpdk-dev] flow_director_filter error!!

Hi Navneet,

> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Navneet Rao
> Sent: Tuesday, August 25, 2015 9:27 PM
> To: Wu, Jingjing; Mcnamara, John; dev at dpdk.org
> Subject: Re: [dpdk-dev] flow_director_filter error!!
> 
> Hi Jingjing:
> 
> Thanks.
> 
> I did have the ethertype_filter ignore the mac_addr, and look at only 
> ethertype filter and it still got a "bad arguments" message :-(
> 
> testpmd>  ethertype_filter 0 add mac_ignr ethertype 0x0806 fwd queue 1
> Bad arguments

Yes, apparently the example is wrong. It is missing the MAC address after mac_ignr.
So it should be:

ethertype_filter 0 add mac_ignr 00:11:22:33:44:55 ethertype 0x0806 fwd queue 1

Regards,
Pablo

> 
> 
> 
> -Original Message-
> From: Wu, Jingjing [mailto:jingjing.wu at intel.com]
> Sent: Tuesday, August 25, 2015 6:55 AM
> To: Navneet Rao; Mcnamara, John; dev at dpdk.org
> Subject: RE: [dpdk-dev] flow_director_filter error!!
> 
> Hi, Navneet
> 
> I'm sorry, I have no idea about the NIC i540. Are you talking about the X540?
> If X540, I guess you can't classify on the MAC-ADDRESS to different 
> queue by ethertype filter. Because in the X540 datasheet the ethertype 
> filter is described as below:
> " 7.1.2.3 L2 Ethertype Filters
> These filters identify packets by their L2 Ethertype, 802.1Q user 
> priority and optionally assign them to a receive queue."
> 
> So the mac_address is not the filter's input.
> 
> Thanks
> Jingjing
> 
> > -Original Message-
> > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Navneet Rao
> > Sent: Friday, August 21, 2015 2:57 AM
> > To: Mcnamara, John; dev at dpdk.org
> > Subject: Re: [dpdk-dev] flow_director_filter error!!
> >
> > Thanks John.
> >
> > I am trying to setup/use the flow-director-filter on the i540.
> >
> > -- When I try to setup the flow-director-filter as per the example, 
> > I am getting "bad arguments"!!!
> >  So decided to see if the flush command would work.
> >
> >
> > In the interim --- I am using ethertype filter to accomplish the following.
> > What I am trying to do is this --
> > Use 2 different i540 cards
> > Use the igb_uio driver.
> > Use the testpmd app.
> > Setup 5 different MAC-ADDRESSes on each port. (using the set 
> > mac_addr
> > command) Setup 5 different RxQs and TxQs on each port.
> > And then use the testpmd app to generate traffic..
> >
> > I am assuming that the testpmd app will now send and receive traffic 
> > using the 5 different MAC_ADDRESSes..
> > On each port's receive I will now want to classify on the 
> > MAC-ADDRESS and steer the traffic to different queues.
> >
> > Is there an example/reference on how to achieve this?
> >
> > Next, I would want to do "classify" on "flexbytes" and send/steer 
> > the traffic to different queues using flow-director-filter.
> >
> > Thanks
> > -Navneet
> >
> >
> >
> >
> > -Original Message-
> > From: Mcnamara, John [mailto:john.mcnamara at intel.com]
> > Sent: Wednesday, August 19, 2015 3:39 PM
> > To: Navneet Rao; dev at dpdk.org
> > Subject: RE: [dpdk-dev] flow_director_filter error!!
> >
> > > -Original Message-
> > > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Navneet Rao
> > > Sent: Tuesday, August 18, 2015 4:01 PM
> > > To: dev at dpdk.org
> > > Subject: [dpdk-dev] flow_director_filter error!!
> > >
> > > After I start the testpmd app, I am flushing the
> > > flow_director_filter settings and get the following error -
> > >
> > >
> > >
> > > testpmd> flush_flow_director 0
> > >
> > > PMD: ixgbe_fdir_flush(): Failed to re-initialize FD table.
> > >
> > > flow director table flushing error: (Too many open files in 
> > > system)
> >
> > Hi,
> >
> > Are you setting a flow director filter before flushing? If so, could 
> > you give
> an example.
> >
> > John.
> > --
> >


[dpdk-dev] OVS-DPDK performance problem on ixgbe vector PMD

2015-08-26 Thread Zoltan Kiss
Hi,

I've checked it further; based on Stephen's suggestion I've tried perf
top as well. The results were the same: it spends a lot of time in that
part of the code, and there is a high number of branch misses
(BR_MISS_PRED_RETIRED) around there too.
I've also started to strip down miniflow_extract() to remove parts which
are not relevant to this very simple testcase. I've removed the metadata
checking branches and the "size < sizeof(struct eth_header)" check. I've
removed the size check from emc_processing, and placed log messages in
flow_extract and netdev_flow_key_from_flow, to make sure the excessive
time spent in miniflow_extract is not because these two are somehow
calling it.
That way I've closed out all of the branches preceding this instruction.
Oddly, the high sample count has now moved down a few instructions:
...
dp_packet_reset_offsets
   5113eb:   b8 ff ff ff ff          mov    $0xffffffff,%eax
   5113f0:   66 89 8f 86 00 00 00    mov    %cx,0x86(%rdi)
   5113f7:   c6 87 81 00 00 00 00    movb   $0x0,0x81(%rdi)
   5113fe:   66 89 87 82 00 00 00    mov    %ax,0x82(%rdi)
data_pull
   511405:   48 8d 4d 0c             lea    0xc(%rbp),%rcx
dp_packet_reset_offsets
   511409:   66 89 97 84 00 00 00    mov    %dx,0x84(%rdi)
memcpy
   511410:   48 8b 45 00             mov    0x0(%rbp),%rax
   511414:   48 89 46 18             mov    %rax,0x18(%rsi)

This last instruction moves the first 8 bytes of the MAC address (coming
from 0x0(%rbp)) to 0x18(%rsi), which is basically the memory pointed to by
the parameter "struct miniflow *dst". It is allocated on the stack by
emc_processing.
I couldn't find any branch which could cause this miss, but then I checked
the PMD stats:

pmd thread numa_id 0 core_id 1:
emc hits:4395834176
megaflow hits:1
miss:1
lost:0
polling cycles:166083129380 (16.65%)
processing cycles:831536059972 (83.35%)
avg cycles per packet: 226.95 (997619189352/4395834178)
avg processing cycles per packet: 189.16 (831536059972/4395834178)

So everything hits the EMC; when I measured the change of that counter for
10 seconds, the result was around ~13.3 Mpps too. The cycle statistics
show that it should be able to handle more than 15M packets per second
(assuming the core holds its 3.6 GHz turbo frequency: 3.6e9 / 226.95 cycles
per packet is roughly 15.9 Mpps), yet it doesn't receive that much, while
with the non-vector PMD it can max out the link.
Any more suggestions?

Regards,

Zoltan


On 21/08/15 19:05, Zoltan Kiss wrote:
> Hi,
>
> I've set up a simple packet forwarding perf test on a dual-port 10G
> 82599ES: one port receives 64 byte UDP packets, the other sends it out,
> one core used. I've used latest OVS with DPDK 2.1, and the first result
> was only 13.2 Mpps, which was a bit far from the 13.9 I've seen last
> year with the same test. The first thing I've changed was to revert back
> to the old behaviour about this issue:
>
> http://permalink.gmane.org/gmane.comp.networking.dpdk.devel/22731
>
> So instead of the new default I've passed 2048 + RTE_PKTMBUF_HEADROOM.
> That increased the performance to 13.5, but to figure out what's wrong
> started to play with the receive functions. First I've disabled vector
> PMD, but ixgbe_recv_pkts_bulk_alloc() was even worse, only 12.5 Mpps. So
> then I've enabled scattered RX, and with
> ixgbe_recv_pkts_lro_bulk_alloc() I could manage to get 13.98 Mpps, which
> is I guess as close as possible to the 14.2 line rate (on my HW at
> least, with one core)
> Does anyone have a good explanation for why the vector PMD performs so
> significantly worse? I would expect that on a 3.2 GHz i5-4570 one core
> should be able to reach ~14 Mpps, SG and vector PMD shouldn't make a
> difference.
> I've tried to look into it with oprofile, but the results were quite
> strange: 35% of the samples were from miniflow_extract, the part where
> parse_vlan calls data_pull to jump after the MAC addresses. The oprofile
> snippet (1M samples):
>
> 511454 19       0.0037  flow.c:511
> 511458 149      0.0292  dp-packet.h:266
> 51145f 4264     0.8357  dp-packet.h:267
> 511466 18       0.0035  dp-packet.h:268
> 51146d 43       0.0084  dp-packet.h:269
> 511474 172      0.0337  flow.c:511
> 51147a 4320     0.8467  string3.h:51
> 51147e 358763  70.3176  flow.c:99
> 511482 2       3.9e-04  string3.h:51
> 511485 3060     0.5998  string3.h:51
> 511488 1693     0.3318  string3.h:51
> 51148c 2933     0.5749  flow.c:326
> 511491 47       0.0092  flow.c:326
>
> And the corresponding disassembled code:
>
> 511454:   49 83 f9 0d             cmp    r9,0xd
> 511458:   c6 83 81 00 00 00 00    mov    BYTE PTR [rbx+0x81],0x0
> 51145f:   66 89 83 82 00 00 00    mov    WORD PTR [rbx+0x82],ax
> 511466:   66 89 93 84 00 00 00    mov    WORD PTR [rbx+0x84],dx
> 51146d:   66 89 8b 86 00 00 00    mov    WORD PTR [rbx+0x86],cx
> 511474:   0f 86 af 01 00 00       jbe    511629
> 
> 51147a:   48 8b 45 00             mov

[dpdk-dev] [PATCH] acl: Improve acl_bld.c sort_rules()

2015-08-26 Thread Mark Smith
Replace O(n^2) list sort with an O(n log n) merge sort.
The merge sort is based on the solution suggested in:
http://cslibrary.stanford.edu/105/LinkedListProblems.pdf
Tested sort_rules() improvement:
100K rules: O(n^2):  31382 milliseconds; O(n log n): 10 milliseconds
259K rules: O(n^2): 133753 milliseconds; O(n log n): 22 milliseconds

Signed-off-by: Mark Smith 
---
 lib/librte_acl/acl_bld.c |  104 +++--
 1 files changed, 81 insertions(+), 23 deletions(-)

diff --git a/lib/librte_acl/acl_bld.c b/lib/librte_acl/acl_bld.c
index e6f4530..d78bc2d 100644
--- a/lib/librte_acl/acl_bld.c
+++ b/lib/librte_acl/acl_bld.c
@@ -1164,35 +1164,93 @@ rule_cmp_wildness(struct rte_acl_build_rule *r1, struct rte_acl_build_rule *r2)
 }

 /*
+ * Split the rte_acl_build_rule list into two lists.
+ */
+static void
+rule_list_split(struct rte_acl_build_rule *source,
+   struct rte_acl_build_rule **list_a,
+   struct rte_acl_build_rule **list_b)
+{
+   struct rte_acl_build_rule *fast;
+   struct rte_acl_build_rule *slow;
+
+   if (source == NULL || source->next == NULL) {
+   /* length < 2 cases */
+   *list_a = source;
+   *list_b = NULL;
+   } else {
+   slow = source;
+   fast = source->next;
+   /* Advance 'fast' two nodes, and advance 'slow' one node */
+   while (fast != NULL) {
+   fast = fast->next;
+   if (fast != NULL) {
+   slow = slow->next;
+   fast = fast->next;
+   }
+   }
+   /* 'slow' is before the midpoint in the list, so split it in two
+  at that point. */
+   *list_a = source;
+   *list_b = slow->next;
+   slow->next = NULL;
+   }
+}
+
+/*
+ * Merge two sorted lists.
+ */
+static struct rte_acl_build_rule *
+rule_list_sorted_merge(struct rte_acl_build_rule *a,
+   struct rte_acl_build_rule *b)
+{
+   struct rte_acl_build_rule *result = NULL;
+   struct rte_acl_build_rule **last_next = &result;
+
+   while (1) {
+   if (a == NULL) {
+   *last_next = b;
+   break;
+   } else if (b == NULL) {
+   *last_next = a;
+   break;
+   }
+   if (rule_cmp_wildness(a, b) >= 0) {
+   *last_next = a;
+   last_next = &a->next;
+   a = a->next;
+   } else {
+   *last_next = b;
+   last_next = &b->next;
+   b = b->next;
+   }
+   }
+   return result;
+}
+
+/*
  * Sort list of rules based on the rules wildness.
+ * Use recursive mergesort algorithm.
  */
 static struct rte_acl_build_rule *
 sort_rules(struct rte_acl_build_rule *head)
 {
-   struct rte_acl_build_rule *new_head;
-   struct rte_acl_build_rule *l, *r, **p;
-
-   new_head = NULL;
-   while (head != NULL) {
-
-   /* remove element from the head of the old list. */
-   r = head;
-   head = r->next;
-   r->next = NULL;
-
-   /* walk through new sorted list to find a proper place. */
-   for (p = &new_head;
-   (l = *p) != NULL &&
-   rule_cmp_wildness(l, r) >= 0;
-   p = &l->next)
-   ;
+   struct rte_acl_build_rule *a;
+   struct rte_acl_build_rule *b;

-   /* insert element into the new sorted list. */
-   r->next = *p;
-   *p = r;
-   }
+   /* Base case -- length 0 or 1 */
+   if (head == NULL || head->next == NULL)
+   return head;
+
+   /* Split head into 'a' and 'b' sublists */
+   rule_list_split(head, &a, &b);
+
+   /* Recursively sort the sublists */
+   a = sort_rules(a);
+   b = sort_rules(b);

-   return new_head;
+   /* answer = merge the two sorted lists together */
+   return rule_list_sorted_merge(a, b);
 }

 static uint32_t
-- 
1.7.1



[dpdk-dev] BUG - KNI broken in 4.2 kernel

2015-08-26 Thread Stephen Hemminger
The network device ops handlers changed again.

Does KNI really need to keep yet another copy of the Intel driver code?
There are already 4 versions:
  1. Out-of-tree base driver
  2. In-kernel mainline Linux driver
  3. DPDK driver
  4. KNI DPDK driver

No wonder they can't stay in sync.