[dpdk-dev] Multi-process on the same host

2013-10-04 Thread Walter de Donato
Hello,

I've been using DPDK for a while and have now run into the following issue:
when I run two primary processes on the same host (with the --no-shconf
option enabled), one sending packets on one port and the other receiving them
on a different port (the two ports are directly connected with a CAT-6
cable), I get this error on the receiving process:

Program received signal SIGSEGV, Segmentation fault.
0x004158a0 in rte_eth_rx_burst (port_id=0 '\000', queue_id=0,
rx_pkts=0x75baa8f0, nb_pkts=128) at
/home/devel/dpdk/build/include/rte_ethdev.h:1658
1658    return (*dev->rx_pkt_burst)(dev->data->rx_queues[queue_id],
                rx_pkts, nb_pkts);

To give some more details:
- the options given to the two processes:
  ./receiver -c 0x3 -n 2 -m 200 --no-shconf -- -p 0x1
  ./sender -c 0xc -n 2 -m 200 --no-shconf -- -p 0x2
  where the -p option is the bitmask selecting the ports to enable.
- the network card is a dualport Intel X540:
  port 0: Intel Corporation Ethernet Controller 10 Gigabit X540-AT2 (rev 01)
  port 1: Intel Corporation Ethernet Controller 10 Gigabit X540-AT2 (rev 01)
- this is the hugeadm --pool-list output:
  Size        Minimum  Current  Maximum  Default
  1073741824  2        2        2        *

My first question is: should it be possible for separate primary
processes to coexist if they use different resources (cores, ports, memory
pools)?

A second question: is there any workaround that lets this scenario
work without merging the two processes into two lcores of the same process?

Thanks in advance,
-Walter

Walter de Donato, Ph.D.
PostDoc @ Department of Electrical Engineering and Information Technologies
University of Napoli Federico II
Via Claudio 21 -- 80125 Napoli (Italy)
Phone: +39 081 76 83821 - Fax: +39 081 76 83816
Email: walter.dedonato at unina.it
WWW: http://wpage.unina.it/walter.dedonato


[dpdk-dev] Need comment on 82599 TSO

2013-10-04 Thread jigsaw
Hi,

I'm working on TSO for the 82599 and have run into a problem: there is nowhere to store the MSS.

TSO must be aware of the MSS, like the GSO size carried in the kernel's skb.
But the MSS needs 16 bits per mbuf, and we have no spare 16 bits in
rte_mbuf or rte_pktmbuf.
If we add a 16-bit field to rte_pktmbuf, the size of rte_mbuf will be
doubled, because the struct currently sits right at the cache-line boundary (32 bytes).

I have two solutions here:

1. Store the MSS in struct rte_eth_conf.
This is actually a very bad idea, because the MSS is not bound to the device.

2. Turn TSO on and off with rte_ctrlmbuf.
I found that rte_ctrlmbuf is not used at all, so this could be its first
use case.
With rte_ctrlmbuf we have enough space to store the MSS.

Looking forward to your comments.

thx &
rgds,
-Qinglai


[dpdk-dev] Multi-process on the same host

2013-10-04 Thread Stephen Hemminger
On Fri, 4 Oct 2013 13:47:02 +0200
Walter de Donato  wrote:

> Hello,
> 
> I've been using DPDK for a while and now I encountered the following issue:
> when I try to run two primary processes on the same host (with --no-shconf
> option enabled) respectively sending packets on one port and receiving them
> on a different port (the two ports are directly connected with a CAT-6
> cable), I get this error on the receiving process:
> 
> Program received signal SIGSEGV, Segmentation fault.
> 0x004158a0 in rte_eth_rx_burst (port_id=0 '\000', queue_id=0,
> rx_pkts=0x75baa8f0, nb_pkts=128) at
> /home/devel/dpdk/build/include/rte_ethdev.h:1658
> 1658return (*dev->rx_pkt_burst)(dev->data->rx_queues[queue_id],
> rx_pkts, nb_pkts);
> 
> To give some more details:
> - the options given to the two processes:
>   ./receiver -c 0x3 -n 2 -m 200 --no-shconf -- -p 0x1
>   ./sender -c 0xc -n 2 -m 200 --no-shconf -- -p 0x2
>   where the -p option is the binary mask to select the ports to enable.
> - the network card is a dualport Intel X540:
>   port 0: Intel Corporation Ethernet Controller 10 Gigabit X540-AT2 (rev 01)
>   port 1: Intel Corporation Ethernet Controller 10 Gigabit X540-AT2 (rev 01)
> - this is the hugeadm --pool-list output:
>   Size  Minimum  Current  Maximum  Default
>   1073741824222*
> 
> My first question is: should it be possible to let separate primary
> processes coexist if they use different resources (cores, ports, memory
> pools)?
> 
> A second question is: there is any other workaround to let this scenario
> work without merging the two processes into two lcores of the same process?
> 
> Thanks in advance,
> -Walter

The problem is that the huge TLB filesystem is a shared resource.
Because of that, the memory pools of the two processes overlap, and memory pools
are used for packet buffers, malloc, etc.

You might be able to use --no-huge, but then other things would probably break.


[dpdk-dev] Need comment on 82599 TSO

2013-10-04 Thread Stephen Hemminger
On Fri, 4 Oct 2013 15:44:19 +0300
jigsaw  wrote:

> Hi,
> 
> I'm working on TSO for 82599, and encounter a problem: nowhere to store MSS.
> 
> TSO must be aware of MSS, or gso in skb of kernel.
> But MSS nees 16 bits per mbuf. And we have no spare 16 bits in
> rte_mbuf or rte_pktmbuf.
> If we add 16 bit field in rte_pktmbuf, the size of rte_mbuf will be
> doubled, coz currently the size is at the edge of cacheline(32 byte).
> 
> I have two solutions here:
> 
> 1. Store MSS in struct rte_eth_conf.
> This is actually a very bad idea, coz MSS is not bound to device.
> 
> 2. Turn on and off TSO with rte_ctrlmbuf.
> I found that rte_ctrlmbuf is not used at all. So it could be the first
> use case of it.
> With rte_ctrlmbuf we have enough space to store MSS.
> 
> Looking forward to your comments.
> 
> thx &
> rgds,
> -Qinglai

The mbuf needs to grow to 2 cache lines. There are other things that need
to be added to the mbuf eventually as well. For example, the QoS bitfield is
too small when crammed into 32 bits. Ideally the normal small-packet
fields would be in the first cache line, and the other part of the struct
would hold the things less likely to be used.
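
To make the split concrete, here is a rough sketch of what such a layout
could look like (the field names and sizes are illustrative only, not an
actual rte_mbuf definition):

#include <stdint.h>

/* Illustrative two-cache-line layout: hot per-packet fields in the
 * first 64-byte cache line, colder metadata in the second. */
struct mbuf_layout_sketch {
        /* --- first cache line: touched on every RX/TX --- */
        void *buf_addr;          /* start of the data buffer       */
        uint16_t data_len;       /* amount of data in this segment */
        uint16_t ol_flags;       /* offload flags                  */
        uint16_t refcnt;         /* reference count                */
        /* ... remaining hot fields, up to 64 bytes ...            */

        /* --- second cache line: used only on some paths --- */
        uint32_t sched;          /* QoS / hierarchical scheduler   */
        uint16_t mss;            /* TSO maximum segment size       */
        uint32_t fdir_id;        /* flow director result           */
} __attribute__((aligned(64)));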


[dpdk-dev] Multi-process on the same host

2013-10-04 Thread Richardson, Bruce
> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Stephen
> Hemminger
> Sent: Friday, October 04, 2013 5:39 PM
> To: Walter de Donato
> Cc: dev at dpdk.org
> Subject: Re: [dpdk-dev] Multi-process on the same host
> 
> On Fri, 4 Oct 2013 13:47:02 +0200
> Walter de Donato  wrote:
> 
> > Hello,
> >
> > I've been using DPDK for a while and now I encountered the following
> issue:
> > when I try to run two primary processes on the same host (with
> > --no-shconf option enabled) respectively sending packets on one port
> > and receiving them on a different port (the two ports are directly
> > connected with a CAT-6 cable), I get this error on the receiving process:
> >
> > Program received signal SIGSEGV, Segmentation fault.
> > 0x004158a0 in rte_eth_rx_burst (port_id=0 '\000',
> queue_id=0,
> > rx_pkts=0x75baa8f0, nb_pkts=128) at
> > /home/devel/dpdk/build/include/rte_ethdev.h:1658
> > 1658return (*dev->rx_pkt_burst)(dev->data-
> >rx_queues[queue_id],
> > rx_pkts, nb_pkts);
> >
> > To give some more details:
> > - the options given to the two processes:
> >   ./receiver -c 0x3 -n 2 -m 200 --no-shconf -- -p 0x1
> >   ./sender -c 0xc -n 2 -m 200 --no-shconf -- -p 0x2
> >   where the -p option is the binary mask to select the ports to enable.
> > - the network card is a dualport Intel X540:
> >   port 0: Intel Corporation Ethernet Controller 10 Gigabit X540-AT2 (rev
> 01)
> >   port 1: Intel Corporation Ethernet Controller 10 Gigabit X540-AT2
> > (rev 01)
> > - this is the hugeadm --pool-list output:
> >   Size  Minimum  Current  Maximum  Default
> >   1073741824222*
> >
> > My first question is: should it be possible to let separate primary
> > processes coexist if they use different resources (cores, ports,
> > memory pools)?
> >
> > A second question is: there is any other workaround to let this
> > scenario work without merging the two processes into two lcores of the
> same process?
> >
> > Thanks in advance,
> > -Walter
> 
> The problem is that huge TLB filesystem is a shared resource.
> Because of that the memory pools of the two processes overlap, and
> memory pools are used for packet buffers, malloc, etc.
> 
> You might be able to use no-huge, but then other things would probably
> break.

The way to run two primary processes side by side is documented in the
"Intel(r) Data Plane Development Kit (Intel(r) DPDK): Programmer's Guide"
available at:
http://www.intel.com/content/www/us/en/intelligent-systems/intel-technology/intel-dpdk-programmers-guide.html
and is covered in section 17.2.3. You need to pass the "--file-prefix" flag
when running your applications to force the processes to use different hugepage
files, so that they are not shared between the two processes.
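
For example, applied to the command lines from the original mail (the prefix
names "rx" and "tx" below are just illustrative):

  ./receiver -c 0x3 -n 2 -m 200 --no-shconf --file-prefix rx -- -p 0x1
  ./sender   -c 0xc -n 2 -m 200 --no-shconf --file-prefix tx -- -p 0x2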

Regards,
/Bruce
--
Intel Shannon Limited
Registered in Ireland
Registered Office: Collinstown Industrial Park, Leixlip, County Kildare
Registered Number: 308263
Business address: Dromore House, East Park, Shannon, Co. Clare

This e-mail and any attachments may contain confidential material for the sole 
use of the intended recipient(s). Any review or distribution by others is 
strictly prohibited. If you are not the intended recipient, please contact the 
sender and delete all copies.




[dpdk-dev] [PATCH] Request for comments on ixgbe TSO support

2013-10-04 Thread Qinglai Xiao
This patch is a draft of TSO for the 82599; that is, it is not expected to be
accepted as is.
The problem is where to put the mss field. In this patch, mss is put in
the hash union of rte_pktmbuf. It is not the best place, but it is quite
convenient, since the hash is not used in the TX path.
The idea is to avoid increasing sizeof(struct rte_pktmbuf) while keeping mss
easy to access.

However, reusing hash is also misleading, because mss has nothing to do with the Rx hash.
A more formal way could be to rename hash as below:

union {
        uint32_t data;
        struct rx_hash hash;
        uint32_t tx_mss;
} misc;

That would be a major change, though, because it affects the core data structure.

Any comments will be appreciated.

Qinglai Xiao (1):
  ixgbe: TCP/UDP segment offload support on 82599.

 lib/librte_mbuf/rte_mbuf.h|6 +-
 lib/librte_pmd_ixgbe/ixgbe_rxtx.c |   32 +---
 2 files changed, 34 insertions(+), 4 deletions(-)

-- 
1.7.10.4



[dpdk-dev] [PATCH] ixgbe: TCP/UDP segment offload support on 82599.

2013-10-04 Thread Qinglai Xiao
Add support for TCP/UDP segment offload on the 82599.
The user can turn on TSO by setting the MSS in the first frame.
The L2 and L3 lengths, together with the offload flags, must also be set in the
first frame accordingly; otherwise the driver will abort the send.
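
To illustrate what that means for a caller, a rough sketch follows (not part
of the patch; the field and flag names come from the existing mbuf/offload
definitions and from this patch, the values are just examples):

#include <rte_mbuf.h>
#include <rte_ether.h>
#include <rte_ip.h>

/* Illustrative only: prepare a single-segment IPv4/TCP packet for TSO
 * under this draft patch, before handing it to rte_eth_tx_burst(). */
static void
prepare_tso(struct rte_mbuf *m)
{
        m->pkt.hash.mss = 1460;                 /* per-packet MSS (example) */
        m->pkt.vlan_macip.f.l2_len = sizeof(struct ether_hdr);
        m->pkt.vlan_macip.f.l3_len = sizeof(struct ipv4_hdr);
        /* TSO with no offload bits set is dropped by the driver, so
         * request the checksum offloads as well. */
        m->ol_flags |= PKT_TX_IP_CKSUM | PKT_TX_TCP_CKSUM;
}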
---
 lib/librte_mbuf/rte_mbuf.h|6 +-
 lib/librte_pmd_ixgbe/ixgbe_rxtx.c |   32 +---
 2 files changed, 34 insertions(+), 4 deletions(-)

diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
index d914562..ea4bb88 100644
--- a/lib/librte_mbuf/rte_mbuf.h
+++ b/lib/librte_mbuf/rte_mbuf.h
@@ -159,6 +159,10 @@ struct rte_pktmbuf {
uint16_t id;
} fdir; /**< Filter identifier if FDIR enabled */
uint32_t sched; /**< Hierarchical scheduler */
+   uint16_t mss;   /**< Maximum Segment Size. If more than zero,
+                        then TSO is enabled. User is responsible
+                        for setting vlan_macip and TCP/IP cksum
+                        accordingly. */
} hash; /**< hash information */
 };

@@ -195,7 +199,7 @@ struct rte_mbuf {
uint16_t refcnt_reserved; /**< Do not use this field */
 #endif
uint8_t type; /**< Type of mbuf. */
-   uint8_t reserved; /**< Unused field. Required for padding. */
+   uint8_t reserved; /**< Unused field. Required for padding. */
uint16_t ol_flags;/**< Offload features. */

union {
diff --git a/lib/librte_pmd_ixgbe/ixgbe_rxtx.c b/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
index 5c8668e..63d7f8a 100644
--- a/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
+++ b/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
@@ -498,7 +498,7 @@ ixgbe_xmit_pkts_simple(void *tx_queue, struct rte_mbuf **tx_pkts,
 static inline void
 ixgbe_set_xmit_ctx(struct igb_tx_queue* txq,
volatile struct ixgbe_adv_tx_context_desc *ctx_txd,
-   uint16_t ol_flags, uint32_t vlan_macip_lens)
+   uint16_t ol_flags, uint32_t vlan_macip_lens, uint16_t mss)
 {
uint32_t type_tucmd_mlhl;
uint32_t mss_l4len_idx;
@@ -520,6 +520,10 @@ ixgbe_set_xmit_ctx(struct igb_tx_queue* txq,

/* Specify which HW CTX to upload. */
mss_l4len_idx = (ctx_idx << IXGBE_ADVTXD_IDX_SHIFT);
+
+   /* MSS is required for TSO. The user must set mss accordingly */
+   mss_l4len_idx |= mss << IXGBE_ADVTXD_MSS_SHIFT;
+
switch (ol_flags & PKT_TX_L4_MASK) {
case PKT_TX_UDP_CKSUM:
type_tucmd_mlhl |= IXGBE_ADVTXD_TUCMD_L4T_UDP |
@@ -694,6 +698,7 @@ ixgbe_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
uint32_t vlan_macip_lens;
uint32_t ctx = 0;
uint32_t new_ctx;
+   uint16_t mss;

txq = tx_queue;
sw_ring = txq->sw_ring;
@@ -719,10 +724,25 @@ ixgbe_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 * are needed for offload functionality.
 */
ol_flags = tx_pkt->ol_flags;
+
vlan_macip_lens = tx_pkt->pkt.vlan_macip.data;
+   mss = tx_pkt->pkt.hash.mss;

/* If hardware offload required */
tx_ol_req = (uint16_t)(ol_flags & PKT_TX_OFFLOAD_MASK);
+
+   /*
+* If mss is set, we assume TSO is required.
+*
+* If TSO is turned on, the caller must set the offload bits
+* accordingly, otherwise we have to drop the packet, because
+* we have no knowledge of L2 or L3.
+*/
+   if (!tx_ol_req && mss) {
+   PMD_TX_LOG(DEBUG, "TSO set without offload bits. Abort sending.");
+   goto end_of_tx;
+   }
+
if (tx_ol_req) {
/* If new context need be built or reuse the exist ctx. */
ctx = what_advctx_update(txq, tx_ol_req,
@@ -841,6 +861,11 @@ ixgbe_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 */
cmd_type_len = IXGBE_ADVTXD_DTYP_DATA |
IXGBE_ADVTXD_DCMD_IFCS | IXGBE_ADVTXD_DCMD_DEXT;
+
+   /* Enable TSE bit for TSO */
+   if (mss)
+   cmd_type_len |= IXGBE_ADVTXD_DCMD_TSE;
+
olinfo_status = (pkt_len << IXGBE_ADVTXD_PAYLEN_SHIFT);
 #ifdef RTE_LIBRTE_IEEE1588
if (ol_flags & PKT_TX_IEEE1588_TMST)
@@ -868,7 +893,7 @@ ixgbe_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
}

ixgbe_set_xmit_ctx(txq, ctx_txd, tx_ol_req,
-   vlan_macip_lens);
+   vlan_macip_lens, mss);

txe->last_id = tx_last;
tx_id = txe->next_id;
@@ -339

[dpdk-dev] Need comment on 82599 TSO

2013-10-04 Thread jigsaw
Hi Stephen,

Thanks for the comment. Please check the other thread that I just posted.

thx &
rgds,
-Qinglai

On Fri, Oct 4, 2013 at 7:41 PM, Stephen Hemminger
 wrote:
> On Fri, 4 Oct 2013 15:44:19 +0300
> jigsaw  wrote:
>
>> Hi,
>>
>> I'm working on TSO for 82599, and encounter a problem: nowhere to store MSS.
>>
>> TSO must be aware of MSS, or gso in skb of kernel.
>> But MSS nees 16 bits per mbuf. And we have no spare 16 bits in
>> rte_mbuf or rte_pktmbuf.
>> If we add 16 bit field in rte_pktmbuf, the size of rte_mbuf will be
>> doubled, coz currently the size is at the edge of cacheline(32 byte).
>>
>> I have two solutions here:
>>
>> 1. Store MSS in struct rte_eth_conf.
>> This is actually a very bad idea, coz MSS is not bound to device.
>>
>> 2. Turn on and off TSO with rte_ctrlmbuf.
>> I found that rte_ctrlmbuf is not used at all. So it could be the first
>> use case of it.
>> With rte_ctrlmbuf we have enough space to store MSS.
>>
>> Looking forward to your comments.
>>
>> thx &
>> rgds,
>> -Qinglai
>
> The mbuf needs to grow to 2 cache lines. There are other things that need
> to be added to mbuf eventually as well. For example the QoS bitfield is
> too small when crammed into 32 bits. Ideally the normal small packet
> stuff would be in the first cacheline; and the other part of the struct
> would have things less likely to be used.


[dpdk-dev] [PATCH] Request for comments on ixgbe TSO support

2013-10-04 Thread Stephen Hemminger
On Fri,  4 Oct 2013 20:06:52 +0300
Qinglai Xiao  wrote:

> This patch is a draft of TSO on 82599. That is, it is not expected to be
> accepted as is.
> The problem is where to put the mss field. In this patch, the mss is put in
> the union of hash in rte_pktmbuf. It is not the best place, but it is quite
> convenient, since hash is not used in TX procedure.
> The idea is to avoid increasing sizeof(struct rte_pktmbuf), while keeping mss
> easy to access.
> 
> However, the hash is also misleading, coz mss has nothing to do with Rx hash.
> A more formal way could be rename hash as below:
> 
>   union {
>   uint32_t data;
>   struct rx_hash hash;
>   uint32_t tx_mss;
>   } misc; 
> 
> It is gonna be a major change coz it affects the core data structure.
> 
> Any comments will be appreciated.
> 
> Qinglai Xiao (1):
>   ixgbe: TCP/UDP segment offload support on 82599.
> 
>  lib/librte_mbuf/rte_mbuf.h|6 +-
>  lib/librte_pmd_ixgbe/ixgbe_rxtx.c |   32 +---
>  2 files changed, 34 insertions(+), 4 deletions(-)
> 

This will work for locally generated packets, but overlapping an existing
field won't work well for forwarding.

What we want to be able to do is take offloaded (jumbo) packets in
from virtio (which needs better driver support that I am working on), and then
send them through to network devices.



[dpdk-dev] L2fwd Performance issue with Virtual Machine

2013-10-04 Thread Patel, Rashmin N
Hi,

If you are not using SR-IOV or direct device assignment to the VM, your traffic hits 
the vSwitch (via the VMware native ixgbe driver and network stack) in ESX and is 
switched to the E1000/VMXNET3 interface connected to your VM. The vSwitch is not 
optimized for PMD at present so you would get optimal performance benefit 
having PMD, I believe.

On the RSS front, I would say you won't see much difference with RSS enabled 
for 1500-byte frames. In fact, a core is capable of handling such traffic in a VM, 
but the bottleneck is in the ESXi software switching layer; that's what my initial 
research shows across multiple hypervisors.

Thanks,
Rashmin

-Original Message-
From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Selvaganapathy Chidambaram
Sent: Thursday, October 03, 2013 2:39 PM
To: dev at dpdk.org
Subject: [dpdk-dev] L2fwd Performance issue with Virtual Machine

Hello Everyone,

I have tried to run the DPDK sample application l2fwd (modified to support multiple 
queues) in my ESX virtual machine. I see that performance does not scale with 
cores. [My apologies for the long email]

*Setup:*

Connected the VM to two ports of a Spirent tester over 10 Gig links. Sent 10 Gbps of 
L3 traffic with 1500-byte packets (four different flows) from the Spirent through 
one port and received it at the second port. Also sent traffic in the reverse 
direction so that the net traffic is 20 Gbps. SR-IOV and DirectPath I/O are not 
enabled.

*Emulated Driver:*

With the default emulated driver, I got 7.3 Gbps with 1 core. Adding more cores 
did not improve the performance. On debugging, I noticed that the function 
eth_em_infos_get() reports that RSS is not supported.

*vmxnet3_usermap:*

Then I tried the vmxnet3_usermap extension and got 8.7 Gbps with 1 core. Again, 
adding another core did not help. On debugging, I noticed that in the vmxnet3 
kernel driver (in the function vmxnet3_probe_device), RSS is disabled if 
*adapter->is_shm* is non-zero. In our case it is set to 
VMXNET3_SHM_USERMAP_DRIVER, which is non-zero.

Before trying to enable it, I would like to know whether there is any known 
limitation explaining why RSS is not enabled in either driver. Please help me 
understand.

*Hardware Configuration:*
Hardware  : Intel Xeon 2.4 GHz, 4 CPUs
Hyperthreading  : No
RAM : 16 GB
Hypervisor : ESXi 5.1
Ethernet: Intel 82599EB 10 Gig SFP


Guest VM : 2 vCPU, 2 GB RAM
GuestOS  : Centos 6.2 32 bit

Thanks in advance for your time and help!!!

Thanks,
Selva.


[dpdk-dev] [PATCH] Request for comments on ixgbe TSO support

2013-10-04 Thread jigsaw
Hi Stephen,


>>This will work for local generated packets but overlapping existing field 
>>won't work well for forwarding.
So adding a new mss field in the mbuf could be the way out? Or am I
misunderstanding something?

>> What we want to be able to do is to take offload (jumbo) packets in with 
>> from virtio
Sorry, I don't understand why TSO is connected to virtio. Could you
give more details here?
Are you suggesting that this TSO patch overlaps your work, or that it should be
based on your work?


thx &
rgds,
-Qinglai

On Fri, Oct 4, 2013 at 8:40 PM, Stephen Hemminger
 wrote:
> On Fri,  4 Oct 2013 20:06:52 +0300
> Qinglai Xiao  wrote:
>
>> This patch is a draft of TSO on 82599. That is, it is not expected to be
>> accepted as is.
>> The problem is where to put the mss field. In this patch, the mss is put in
>> the union of hash in rte_pktmbuf. It is not the best place, but it is quite
>> convenient, since hash is not used in TX procedure.
>> The idea is to avoid increasing sizeof(struct rte_pktmbuf), while keeping mss
>> easy to access.
>>
>> However, the hash is also misleading, coz mss has nothing to do with Rx hash.
>> A more formal way could be rename hash as below:
>>
>>   union {
>>   uint32_t data;
>>   struct rx_hash hash;
>>   uint32_t tx_mss;
>>   } misc;
>>
>> It is gonna be a major change coz it affects the core data structure.
>>
>> Any comments will be appreciated.
>>
>> Qinglai Xiao (1):
>>   ixgbe: TCP/UDP segment offload support on 82599.
>>
>>  lib/librte_mbuf/rte_mbuf.h|6 +-
>>  lib/librte_pmd_ixgbe/ixgbe_rxtx.c |   32 +---
>>  2 files changed, 34 insertions(+), 4 deletions(-)
>>
>
> This will work for local generated packets but overlapping existing
> field won't work well for forwarding.
>
> What we want to be able to do is to take offload (jumbo) packets in
> with from virtio (need better driver support which I am doing), and then
> send them through to network devices.
>


[dpdk-dev] L2fwd Performance issue with Virtual Machine

2013-10-04 Thread Patel, Rashmin N
Correction: "you would NOT get optimal performance benefit having PMD"

Thanks,
Rashmin
-Original Message-
From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Patel, Rashmin N
Sent: Friday, October 04, 2013 10:47 AM
To: Selvaganapathy Chidambaram
Cc: dev at dpdk.org
Subject: Re: [dpdk-dev] L2fwd Performance issue with Virtual Machine

Hi,

If you are not using SRIOV or direct device assignment to VM, your traffic hits 
vSwitch(via vmware native ixgbe driver and network stack) in the ESX and 
switched to your E1000/VMXNET3 interface connected to a VM. The vSwitch is not 
optimized for PMD at present so you would get optimal performance benefit 
having PMD, I believe.

For the RSS front, I would say you won't see much difference with RSS enabled 
for 1500 bytes frames. In fact, core is capable of handling such traffic in VM, 
but the bottleneck is in ESXi software switching layer, that's what my initial 
research shows across multiple hypervisors.

Thanks,
Rashmin

-Original Message-
From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Selvaganapathy Chidambaram
Sent: Thursday, October 03, 2013 2:39 PM
To: dev at dpdk.org
Subject: [dpdk-dev] L2fwd Performance issue with Virtual Machine

Hello Everyone,

I have tried to run DPDK sample application l2fwd(modified to support multiple 
queues) in my ESX Virtual Machine. I see that performance is not scaling with 
cores. [My apologies for the long email]

*Setup:*

Connected VM to two ports of Spirent with 10Gig link. Sent 10 Gig traffic of L3 
packet of length 1500 bytes (with four different flows) from Spirent through 
one port and received at the second port. Also sent traffic from reverse 
direction so that net traffic is 20 Gbps. Haven't enabled SR-IOV or  Direct 
path I/O.

*Emulated Driver:*

With default emulated driver, I got 7.3 Gbps for 1 core. Adding multiple cores 
did not improve the performance. On debugging I noticed that function 
eth_em_infos_get() says RSS is not supported.

*vmxnet3_usermap:*

Then I tried extension vmxnet3_usermap and got 8.7 Gbps for 1 core. Again 
adding another core did not help. On debugging, I noticed that in vmxnet3 
kernel driver (in function vmxnet3_probe_device) , RSS is disabled if *
adapter->is_shm* is non zero. In our case, its 
adapter->VMXNET3_SHM_USERMAP_DRIVER
which is non zero.

Before trying to enable it, I would like to know if there is any known 
limitation why RSS is not enabled in both the drivers. Please help me 
understand.

*Hardware Configuration:*
Hardware  : Intel Xeon 2.4 Ghz 4 CPUs
Hyperthreading  : No
RAM : 16 GB
Hypervisor : ESXi 5.1
Ethernet: Intel 82599EB 10 Gig SFP


Guest VM : 2 vCPU, 2 GB RAM
GuestOS  : Centos 6.2 32 bit

Thanks in advance for your time and help!!!

Thanks,
Selva.


[dpdk-dev] Need comment on 82599 TSO

2013-10-04 Thread Venkatesan, Venky
Stephen, 

Agree. Growing to two cache lines is an inevitability. Re-organizing the mbuf a 
bit to alleviate some of the immediate space pressure with as small a performance 
impact as possible (including separating the QoS fields out completely into their 
own area) is a good idea - the first cache line would be packet- and mbuf-related 
information, the second more of the metadata that we need. Any 
suggestions on how many bytes would be needed for QoS?

Qinglai, 

For your TSO implementation patch, let's work the patch as is (assuming the 
mbuf grows) - add the 16 bits to the pktmbuf structure (and it will grow 
beyond a cache line). We can get some performance numbers for the standard 
benchmarks. I will look at a few ideas to free up some space in the mbuf to 
keep the packet-related fields within the first cache line while keeping 
performance close to where it is today.

Regards, 
-Venky


-Original Message-
From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Stephen Hemminger
Sent: Friday, October 04, 2013 9:41 AM
To: jigsaw
Cc: dev at dpdk.org
Subject: Re: [dpdk-dev] Need comment on 82599 TSO

On Fri, 4 Oct 2013 15:44:19 +0300
jigsaw  wrote:

> Hi,
> 
> I'm working on TSO for 82599, and encounter a problem: nowhere to store MSS.
> 
> TSO must be aware of MSS, or gso in skb of kernel.
> But MSS nees 16 bits per mbuf. And we have no spare 16 bits in 
> rte_mbuf or rte_pktmbuf.
> If we add 16 bit field in rte_pktmbuf, the size of rte_mbuf will be 
> doubled, coz currently the size is at the edge of cacheline(32 byte).
> 
> I have two solutions here:
> 
> 1. Store MSS in struct rte_eth_conf.
> This is actually a very bad idea, coz MSS is not bound to device.
> 
> 2. Turn on and off TSO with rte_ctrlmbuf.
> I found that rte_ctrlmbuf is not used at all. So it could be the first 
> use case of it.
> With rte_ctrlmbuf we have enough space to store MSS.
> 
> Looking forward to your comments.
> 
> thx &
> rgds,
> -Qinglai

The mbuf needs to grow to 2 cache lines. There are other things that need to be 
added to mbuf eventually as well. For example the QoS bitfield is too small 
when crammed into 32 bits. Ideally the normal small packet stuff would be in 
the first cacheline; and the other part of the struct would have things less 
likely to be used.


[dpdk-dev] [PATCH] Request for comments on ixgbe TSO support

2013-10-04 Thread Stephen Hemminger
On Fri, 4 Oct 2013 20:54:31 +0300
jigsaw  wrote:

> Hi Stephen,
> 
> 
> >>This will work for local generated packets but overlapping existing field 
> >>won't work well for forwarding.
> So adding a new mss field in mbuf could be the way out? or I
> misunderstand something.
> 
> >> What we want to be able to do is to take offload (jumbo) packets in with 
> >> from virtio
> Sorry I don't understand why TSO is connected to virtio. Could you
> give more details here?
> Are you suggesting this TSO patch overlaps your work, or it should be
> based on your work?

I am working on a better virtio driver. I already have a lot more features working,
and better offload support is planned.

TSO is a subset of the more generic segmentation offload (GSO) on Linux.
With virtio it is possible to receive GSO packets as well as send them.
This feature is negotiated between guest and host.

The idea is that guests can exchange jumbo (64K) packets with each other even with
a smaller MTU. This helps in many ways. One example is that only a single
route lookup is needed.

Another issue is that the current DPDK model of offload flags for checksum is 
problematic: it matches what is available in Intel hardware and is not easily 
generalizable to other devices.

The current DPDK flag is "checksum bad". I would like to change it to "checksum
known good". Then drivers which don't do checksum would leave it 0, but would set
it to 1 if the receive checksum is known good. Basically, 1 means known good, and
0 means unknown (or bad). Higher-level software can then do a software checksum
if necessary.
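
As an illustration of the proposed semantics (the flag name and value below are
made up, and the software fallback is assumed to exist elsewhere):

#include <rte_mbuf.h>

#define PKT_RX_CKSUM_KNOWN_GOOD  0x0100   /* hypothetical flag value */

/* Assumed software helper, e.g. a plain RFC 1071 checksum over the
 * L4 segment; returns 0 when the checksum verifies. */
extern int sw_verify_l4_cksum(struct rte_mbuf *m);

/* 1 = NIC verified the checksum; 0 = unknown or bad, so fall back
 * to verifying it in software. */
static int
rx_l4_cksum_ok(struct rte_mbuf *m)
{
        if (m->ol_flags & PKT_RX_CKSUM_KNOWN_GOOD)
                return 1;
        return sw_verify_l4_cksum(m) == 0;
}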


[dpdk-dev] [PATCH] Request for comments on ixgbe TSO support

2013-10-04 Thread Venkatesan, Venky
Stephen, 

Agree on the checksum flag definition. I'm presuming that we should do this on 
the L3 and L4 checksums separately (that ol_flags field is another one that 
needs extension in the mbuf). 

Regards, 
-Venky


-Original Message-
From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Stephen Hemminger
Sent: Friday, October 04, 2013 11:23 AM
To: jigsaw
Cc: dev at dpdk.org
Subject: Re: [dpdk-dev] [PATCH] Request for comments on ixgbe TSO support

On Fri, 4 Oct 2013 20:54:31 +0300
jigsaw  wrote:

> Hi Stephen,
> 
> 
> >>This will work for local generated packets but overlapping existing field 
> >>won't work well for forwarding.
> So adding a new mss field in mbuf could be the way out? or I 
> misunderstand something.
> 
> >> What we want to be able to do is to take offload (jumbo) packets in 
> >> with from virtio
> Sorry I don't understand why TSO is connected to virtio. Could you 
> give more details here?
> Are you suggesting this TSO patch overlaps your work, or it should be 
> based on your work?

I am working on a better virtio driver. Already have lots more features 
working, and doing better offload support is planned.

TSO is a subset of the more generic segment offload (GSO) on Linux.
With virtio is possible to receive GSO packets as well as send them.
This feature is negotiated between guest and host.

The idea is that between guests they can exchange jumbo (64K) packets even with 
a smaller MTU. This helps in many ways. One example is only a single route 
lookup is needed.

Another issue is that the current DPDK model of offload flags for checksum is 
problematic.
It matches what is available in Intel hardware and is not easily generalizable 
to other devices.

Current DPDK flag is checksum bad. I would like to change it to checksum known 
good. Then drivers which dont' do checksum would leave it 0, but if receive 
checksum is known good set it to 1.  Basically 1 means known good, and
0 means unknown (or bad).  Higher level software can then do sw checksum if 
necessary.


[dpdk-dev] [PATCH] Request for comments on ixgbe TSO support

2013-10-04 Thread jigsaw
Hi Stephen,

Thanks for showing a bigger picture.

GSO is quite a big implementation, so I think it won't be easily
ported to DPDK. The mbuf would need to be equipped with many fields from the
skb to be able to deal with GSO.
Do you plan to port GSO to DPDK, or would you like to keep
GSO within the scope of virtio?

Regarding checksum flags, I was actually also thinking of extending
ol_flags, but then I gave it up because I was worried about the size of the
mbuf.
My current patch has to push some work to the user, because the
mbuf carries too little information (such as L2 and L3 protocol details).

Besides, as you mentioned, the ixgbe driver doesn't leverage the
hardware receive checksum offloading at all. If this is to be
supported, the checksum flags need further extension.
(On the other hand, TSO doesn't care about receive checksum offloading.)
Again, do you have plans to extend the cksum flags so that virtio fits
more comfortably with DPDK?

Hi Venky,

I can either make the commit now as is, or wait until the cksum flags
extension is in place. If Stephen (or somebody else) has a plan for
better cksum offloading or GSO support, it is perhaps better to
implement TSO on top of that.

BTW, I have another small question. The current TSO patch offloads the
TCP/IP pseudo-header checksum work to the user. Do you think DPDK could provide
some utility functions for calculating and updating the TCP/IPv4/IPv6
pseudo-header checksum?
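
For example, something along the lines of this rough, untested sketch (the
function name is made up; for TSO the segment length is deliberately left out
of the pseudo header, on the assumption that the hardware adds it per segment):

#include <stdint.h>
#include <arpa/inet.h>    /* htons */
#include <netinet/in.h>   /* IPPROTO_TCP */

/* Hypothetical helper: IPv4/TCP pseudo-header checksum without the
 * length field.  Addresses are passed in network byte order; the
 * 16-bit result can be stored directly into the TCP checksum field. */
static uint16_t
ipv4_tcp_phdr_cksum(uint32_t src_be, uint32_t dst_be)
{
        uint32_t sum;

        /* One's-complement sum over the raw network-order 16-bit halves. */
        sum  = (src_be & 0xffff) + (src_be >> 16);
        sum += (dst_be & 0xffff) + (dst_be >> 16);
        sum += htons(IPPROTO_TCP);

        while (sum >> 16)
                sum = (sum & 0xffff) + (sum >> 16);
        return (uint16_t)sum;
}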

thx &
rgds,
-Qinglai


On Fri, Oct 4, 2013 at 9:38 PM, Venkatesan, Venky
 wrote:
> Stephen,
>
> Agree on the checksum flag definition. I'm presuming that we should do this 
> on the L3 and L4 checksums separately (that ol_flags field is another one 
> that needs extension in the mbuf).
>
> Regards,
> -Venky
>
>
> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Stephen Hemminger
> Sent: Friday, October 04, 2013 11:23 AM
> To: jigsaw
> Cc: dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH] Request for comments on ixgbe TSO support
>
> On Fri, 4 Oct 2013 20:54:31 +0300
> jigsaw  wrote:
>
>> Hi Stephen,
>>
>>
>> >>This will work for local generated packets but overlapping existing field 
>> >>won't work well for forwarding.
>> So adding a new mss field in mbuf could be the way out? or I
>> misunderstand something.
>>
>> >> What we want to be able to do is to take offload (jumbo) packets in
>> >> with from virtio
>> Sorry I don't understand why TSO is connected to virtio. Could you
>> give more details here?
>> Are you suggesting this TSO patch overlaps your work, or it should be
>> based on your work?
>
> I am working on a better virtio driver. Already have lots more features 
> working, and doing better offload support is planned.
>
> TSO is a subset of the more generic segment offload (GSO) on Linux.
> With virtio is possible to receive GSO packets as well as send them.
> This feature is negotiated between guest and host.
>
> The idea is that between guests they can exchange jumbo (64K) packets even 
> with a smaller MTU. This helps in many ways. One example is only a single 
> route lookup is needed.
>
> Another issue is that the current DPDK model of offload flags for checksum is 
> problematic.
> It matches what is available in Intel hardware and is not easily 
> generalizable to other devices.
>
> Current DPDK flag is checksum bad. I would like to change it to checksum 
> known good. Then drivers which dont' do checksum would leave it 0, but if 
> receive checksum is known good set it to 1.  Basically 1 means known good, and
> 0 means unknown (or bad).  Higher level software can then do sw checksum if 
> necessary.


[dpdk-dev] Multi-process on the same host

2013-10-04 Thread Walter de Donato
Thanks a lot Bruce,

I started looking at the multi-process examples - where this case is not
considered - and I missed that section in the programmer's guide.

Regards,
-Walter

Walter de Donato, Ph.D.
PostDoc @ Department of Electrical Engineering and Information Technologies
University of Napoli Federico II
Via Claudio 21 -- 80125 Napoli (Italy)
Phone: +39 081 76 83821 - Fax: +39 081 76 83816
Email: walter.dedonato at unina.it
WWW: http://wpage.unina.it/walter.dedonato


2013/10/4 Richardson, Bruce 

> > -Original Message-
> > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Stephen
> > Hemminger
> > Sent: Friday, October 04, 2013 5:39 PM
> > To: Walter de Donato
> > Cc: dev at dpdk.org
> > Subject: Re: [dpdk-dev] Multi-process on the same host
> >
> > On Fri, 4 Oct 2013 13:47:02 +0200
> > Walter de Donato  wrote:
> >
> > > Hello,
> > >
> > > I've been using DPDK for a while and now I encountered the following
> > issue:
> > > when I try to run two primary processes on the same host (with
> > > --no-shconf option enabled) respectively sending packets on one port
> > > and receiving them on a different port (the two ports are directly
> > > connected with a CAT-6 cable), I get this error on the receiving
> process:
> > >
> > > Program received signal SIGSEGV, Segmentation fault.
> > > 0x004158a0 in rte_eth_rx_burst (port_id=0 '\000',
> > queue_id=0,
> > > rx_pkts=0x75baa8f0, nb_pkts=128) at
> > > /home/devel/dpdk/build/include/rte_ethdev.h:1658
> > > 1658return (*dev->rx_pkt_burst)(dev->data-
> > >rx_queues[queue_id],
> > > rx_pkts, nb_pkts);
> > >
> > > To give some more details:
> > > - the options given to the two processes:
> > >   ./receiver -c 0x3 -n 2 -m 200 --no-shconf -- -p 0x1
> > >   ./sender -c 0xc -n 2 -m 200 --no-shconf -- -p 0x2
> > >   where the -p option is the binary mask to select the ports to enable.
> > > - the network card is a dualport Intel X540:
> > >   port 0: Intel Corporation Ethernet Controller 10 Gigabit X540-AT2
> (rev
> > 01)
> > >   port 1: Intel Corporation Ethernet Controller 10 Gigabit X540-AT2
> > > (rev 01)
> > > - this is the hugeadm --pool-list output:
> > >   Size  Minimum  Current  Maximum  Default
> > >   1073741824222*
> > >
> > > My first question is: should it be possible to let separate primary
> > > processes coexist if they use different resources (cores, ports,
> > > memory pools)?
> > >
> > > A second question is: there is any other workaround to let this
> > > scenario work without merging the two processes into two lcores of the
> > same process?
> > >
> > > Thanks in advance,
> > > -Walter
> >
> > The problem is that huge TLB filesystem is a shared resource.
> > Because of that the memory pools of the two processes overlap, and
> > memory pools are used for packet buffers, malloc, etc.
> >
> > You might be able to use no-huge, but then other things would probably
> > break.
>
> The way to run two primary processes side by side is documented in the
> document "Intel(r) Data Plane Development Kit (Intel(r) DPDK): Programmer's
> Guide" available at:
> http://www.intel.com/content/www/us/en/intelligent-systems/intel-technology/intel-dpdk-programmers-guide.html
> and is covered in section 17.2.3. You need to pass the "--file-prefix" flag
> when running your application to force the processes to use different
> hugepage files so they are not shared among the two processes.
>
> Regards,
> /Bruce
> --
> Intel Shannon Limited
> Registered in Ireland
> Registered Office: Collinstown Industrial Park, Leixlip, County Kildare
> Registered Number: 308263
> Business address: Dromore House, East Park, Shannon, Co. Clare
>
> This e-mail and any attachments may contain confidential material for the
> sole use of the intended recipient(s). Any review or distribution by others
> is strictly prohibited. If you are not the intended recipient, please
> contact the sender and delete all copies.
>
>
>
>


[dpdk-dev] [PATCH] Request for comments on ixgbe TSO support

2013-10-04 Thread Stephen Hemminger
On Fri, 4 Oct 2013 22:10:33 +0300
jigsaw  wrote:

> Hi Stephen,
> 
> Thanks for showing a bigger picture.
> 
> GSO is quite big implementation, that I think it won't be easily
> ported to DPDK. The mbuf needs to be equipped with many fields from
> skb to be able to deal with GSO.
> Do you have the plan to port GSO to DPDK, or you would like to keep
> GSO in scope of virtio?
> 
> Regarding checksum flags, actually I was also thinking of extending
> ol_flags but then I gave it up coz I was worried about the size of
> mbuf.
> My current patch has to push some work to user, due to the fact that
> mbuf delivers too few info (such as L2 and L3 protocol details).
> 
> Besides, as you mentioned, the ixgbe driver doesn't leverage the
> hardware receive checksum offloading at all. And if this is to be
> supported, the checksum flag need further extension.
> (On the other hand, TSO doesn't care about receive checksum offloading).
> Again, do you have plans to extend cksum flags so that virio feels
> more comfortable with DPDK?
> 
> Hi Venky,
> 
> I can either make the commit now as is, or wait till the cksum flags
> extension is in place. If Stephen (or somebody else) has the plan for
> better support for cksum offloading or GSO, it is perhaps better to
> implement TSO on top of that.
> 
> BTW, I have another small question. Current TSO patch offloads the
> TCP/IP pseudo cksum work to user. Do you think DPDK could provide some
> utility functions for TCP/IPv4/IPv6 pseudo cksum calculation and
> updating?
> 
> thx &
> rgds,
> -Qinglai

I want to get Tx checksum offload in virtio working first.
Just looking ahead to Rx.



[dpdk-dev] L2fwd Performance issue with Virtual Machine

2013-10-04 Thread Selvaganapathy Chidambaram
Thanks Rashmin for your time and help!

So it looks like, with the given hardware config, we can probably only
achieve around 8 Gbps in a VM without using SR-IOV. Once DPDK is used in the
vSwitch design, we could gain more performance.


Thanks,
Selvaganapathy.C.


On Fri, Oct 4, 2013 at 11:02 AM, Patel, Rashmin N  wrote:

> Correction: "you would NOT get optimal performance benefit having PMD"
>
> Thanks,
> Rashmin
> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Patel, Rashmin N
> Sent: Friday, October 04, 2013 10:47 AM
> To: Selvaganapathy Chidambaram
> Cc: dev at dpdk.org
> Subject: Re: [dpdk-dev] L2fwd Performance issue with Virtual Machine
>
> Hi,
>
> If you are not using SRIOV or direct device assignment to VM, your traffic
> hits vSwitch(via vmware native ixgbe driver and network stack) in the ESX
> and switched to your E1000/VMXNET3 interface connected to a VM. The vSwitch
> is not optimized for PMD at present so you would get optimal performance
> benefit having PMD, I believe.
>
> For the RSS front, I would say you won't see much difference with RSS
> enabled for 1500 bytes frames. In fact, core is capable of handling such
> traffic in VM, but the bottleneck is in ESXi software switching layer,
> that's what my initial research shows across multiple hypervisors.
>
> Thanks,
> Rashmin
>
> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Selvaganapathy
> Chidambaram
> Sent: Thursday, October 03, 2013 2:39 PM
> To: dev at dpdk.org
> Subject: [dpdk-dev] L2fwd Performance issue with Virtual Machine
>
> Hello Everyone,
>
> I have tried to run DPDK sample application l2fwd(modified to support
> multiple queues) in my ESX Virtual Machine. I see that performance is not
> scaling with cores. [My apologies for the long email]
>
> *Setup:*
>
> Connected VM to two ports of Spirent with 10Gig link. Sent 10 Gig traffic
> of L3 packet of length 1500 bytes (with four different flows) from Spirent
> through one port and received at the second port. Also sent traffic from
> reverse direction so that net traffic is 20 Gbps. Haven't enabled SR-IOV or
>  Direct path I/O.
>
> *Emulated Driver:*
>
> With default emulated driver, I got 7.3 Gbps for 1 core. Adding multiple
> cores did not improve the performance. On debugging I noticed that function
> eth_em_infos_get() says RSS is not supported.
>
> *vmxnet3_usermap:*
>
> Then I tried extension vmxnet3_usermap and got 8.7 Gbps for 1 core. Again
> adding another core did not help. On debugging, I noticed that in vmxnet3
> kernel driver (in function vmxnet3_probe_device) , RSS is disabled if *
> adapter->is_shm* is non zero. In our case, its
> adapter->VMXNET3_SHM_USERMAP_DRIVER
> which is non zero.
>
> Before trying to enable it, I would like to know if there is any known
> limitation why RSS is not enabled in both the drivers. Please help me
> understand.
>
> *Hardware Configuration:*
> Hardware  : Intel Xeon 2.4 Ghz 4 CPUs
> Hyperthreading  : No
> RAM : 16 GB
> Hypervisor : ESXi 5.1
> Ethernet: Intel 82599EB 10 Gig SFP
>
>
> Guest VM : 2 vCPU, 2 GB RAM
> GuestOS  : Centos 6.2 32 bit
>
> Thanks in advance for your time and help!!!
>
> Thanks,
> Selva.
>