[dpdk-dev] Multi-process on the same host
Hello,

I've been using DPDK for a while and now I encountered the following issue: when I try to run two primary processes on the same host (with the --no-shconf option enabled), respectively sending packets on one port and receiving them on a different port (the two ports are directly connected with a CAT-6 cable), I get this error on the receiving process:

Program received signal SIGSEGV, Segmentation fault.
0x004158a0 in rte_eth_rx_burst (port_id=0 '\000', queue_id=0, rx_pkts=0x75baa8f0, nb_pkts=128) at /home/devel/dpdk/build/include/rte_ethdev.h:1658
1658    return (*dev->rx_pkt_burst)(dev->data->rx_queues[queue_id], rx_pkts, nb_pkts);

To give some more details:
- the options given to the two processes:
  ./receiver -c 0x3 -n 2 -m 200 --no-shconf -- -p 0x1
  ./sender -c 0xc -n 2 -m 200 --no-shconf -- -p 0x2
  where the -p option is the binary mask to select the ports to enable.
- the network card is a dual-port Intel X540:
  port 0: Intel Corporation Ethernet Controller 10 Gigabit X540-AT2 (rev 01)
  port 1: Intel Corporation Ethernet Controller 10 Gigabit X540-AT2 (rev 01)
- this is the hugeadm --pool-list output:
  Size        Minimum  Current  Maximum  Default
  1073741824  2        2        2        *

My first question is: should it be possible to let separate primary processes coexist if they use different resources (cores, ports, memory pools)?

A second question: is there any other workaround to let this scenario work without merging the two processes into two lcores of the same process?

Thanks in advance,
-Walter

Walter de Donato, Ph.D.
PostDoc @ Department of Electrical Engineering and Information Technologies
University of Napoli Federico II
Via Claudio 21 -- 80125 Napoli (Italy)
Phone: +39 081 76 83821 - Fax: +39 081 76 83816
Email: walter.dedonato at unina.it
WWW: http://wpage.unina.it/walter.dedonato
[dpdk-dev] Need comment on 82599 TSO
Hi,

I'm working on TSO for 82599, and encountered a problem: there is nowhere to store the MSS.

TSO must be aware of the MSS (the gso information carried in the kernel's skb), but the MSS needs 16 bits per mbuf, and we have no spare 16 bits in rte_mbuf or rte_pktmbuf. If we add a 16-bit field to rte_pktmbuf, the size of rte_mbuf will be doubled, because the size is currently right at the edge of a cache line (32 bytes).

I have two solutions here:

1. Store the MSS in struct rte_eth_conf.
   This is actually a very bad idea, because the MSS is not bound to a device.

2. Turn TSO on and off with rte_ctrlmbuf.
   I found that rte_ctrlmbuf is not used at all, so this could be its first use case. With rte_ctrlmbuf we have enough space to store the MSS.

Looking forward to your comments.

thx &
rgds,
-Qinglai
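[Editor's note: purely as an illustration of the size constraint mentioned above, one way to catch accidental growth of the mbuf at build time is a compile-time guard of the following shape. This is not part of DPDK or of any patch in this thread, and the 64-byte bound is only an example value; use whatever limit matters on the target.]

#include <rte_mbuf.h>

/* Illustrative only: fails the build if struct rte_mbuf grows past the
 * chosen bound (64 is an example cache-line size). */
typedef char mbuf_size_guard[(sizeof(struct rte_mbuf) <= 64) ? 1 : -1];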
[dpdk-dev] Multi-process on the same host
On Fri, 4 Oct 2013 13:47:02 +0200 Walter de Donato wrote: > Hello, > > I've been using DPDK for a while and now I encountered the following issue: > when I try to run two primary processes on the same host (with --no-shconf > option enabled) respectively sending packets on one port and receiving them > on a different port (the two ports are directly connected with a CAT-6 > cable), I get this error on the receiving process: > > Program received signal SIGSEGV, Segmentation fault. > 0x004158a0 in rte_eth_rx_burst (port_id=0 '\000', queue_id=0, > rx_pkts=0x75baa8f0, nb_pkts=128) at > /home/devel/dpdk/build/include/rte_ethdev.h:1658 > 1658return (*dev->rx_pkt_burst)(dev->data->rx_queues[queue_id], > rx_pkts, nb_pkts); > > To give some more details: > - the options given to the two processes: > ./receiver -c 0x3 -n 2 -m 200 --no-shconf -- -p 0x1 > ./sender -c 0xc -n 2 -m 200 --no-shconf -- -p 0x2 > where the -p option is the binary mask to select the ports to enable. > - the network card is a dualport Intel X540: > port 0: Intel Corporation Ethernet Controller 10 Gigabit X540-AT2 (rev 01) > port 1: Intel Corporation Ethernet Controller 10 Gigabit X540-AT2 (rev 01) > - this is the hugeadm --pool-list output: > Size Minimum Current Maximum Default > 1073741824222* > > My first question is: should it be possible to let separate primary > processes coexist if they use different resources (cores, ports, memory > pools)? > > A second question is: there is any other workaround to let this scenario > work without merging the two processes into two lcores of the same process? > > Thanks in advance, > -Walter The problem is that huge TLB filesystem is a shared resource. Because of that the memory pools of the two processes overlap, and memory pools are used for packet buffers, malloc, etc. You might be able to use no-huge, but then other things would probably break.
[dpdk-dev] Need comment on 82599 TSO
On Fri, 4 Oct 2013 15:44:19 +0300 jigsaw wrote: > Hi, > > I'm working on TSO for 82599, and encounter a problem: nowhere to store MSS. > > TSO must be aware of MSS, or gso in skb of kernel. > But MSS nees 16 bits per mbuf. And we have no spare 16 bits in > rte_mbuf or rte_pktmbuf. > If we add 16 bit field in rte_pktmbuf, the size of rte_mbuf will be > doubled, coz currently the size is at the edge of cacheline(32 byte). > > I have two solutions here: > > 1. Store MSS in struct rte_eth_conf. > This is actually a very bad idea, coz MSS is not bound to device. > > 2. Turn on and off TSO with rte_ctrlmbuf. > I found that rte_ctrlmbuf is not used at all. So it could be the first > use case of it. > With rte_ctrlmbuf we have enough space to store MSS. > > Looking forward to your comments. > > thx & > rgds, > -Qinglai The mbuf needs to grow to 2 cache lines. There are other things that need to be added to mbuf eventually as well. For example the QoS bitfield is too small when crammed into 32 bits. Ideally the normal small packet stuff would be in the first cacheline; and the other part of the struct would have things less likely to be used.
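[Editor's note: to make the hot/cold split Stephen describes concrete, a purely hypothetical two-cache-line layout could look like the sketch below. The field names, sizes and padding are illustrative only; they are not the real rte_mbuf definition nor a proposal from this thread beyond the split itself.]

#include <stdint.h>

struct example_two_line_mbuf {
	/* first cache line: fields touched on every RX/TX of small packets */
	void    *buf_addr;     /* start of the segment buffer              */
	void    *next;         /* next segment in a chained packet         */
	uint16_t data_len;     /* amount of data in this segment           */
	uint16_t ol_flags;     /* offload flags                            */
	uint8_t  pad0[44];     /* pad out to 64 bytes                      */

	/* second cache line: metadata used less often on the fast path    */
	uint64_t qos;          /* room for a wider QoS/scheduler field     */
	uint16_t mss;          /* e.g. TSO maximum segment size            */
	uint8_t  pad1[54];     /* pad out to 128 bytes                     */
} __attribute__((aligned(64)));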
[dpdk-dev] Multi-process on the same host
> -Original Message- > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Stephen > Hemminger > Sent: Friday, October 04, 2013 5:39 PM > To: Walter de Donato > Cc: dev at dpdk.org > Subject: Re: [dpdk-dev] Multi-process on the same host > > On Fri, 4 Oct 2013 13:47:02 +0200 > Walter de Donato wrote: > > > Hello, > > > > I've been using DPDK for a while and now I encountered the following > issue: > > when I try to run two primary processes on the same host (with > > --no-shconf option enabled) respectively sending packets on one port > > and receiving them on a different port (the two ports are directly > > connected with a CAT-6 cable), I get this error on the receiving process: > > > > Program received signal SIGSEGV, Segmentation fault. > > 0x004158a0 in rte_eth_rx_burst (port_id=0 '\000', > queue_id=0, > > rx_pkts=0x75baa8f0, nb_pkts=128) at > > /home/devel/dpdk/build/include/rte_ethdev.h:1658 > > 1658return (*dev->rx_pkt_burst)(dev->data- > >rx_queues[queue_id], > > rx_pkts, nb_pkts); > > > > To give some more details: > > - the options given to the two processes: > > ./receiver -c 0x3 -n 2 -m 200 --no-shconf -- -p 0x1 > > ./sender -c 0xc -n 2 -m 200 --no-shconf -- -p 0x2 > > where the -p option is the binary mask to select the ports to enable. > > - the network card is a dualport Intel X540: > > port 0: Intel Corporation Ethernet Controller 10 Gigabit X540-AT2 (rev > 01) > > port 1: Intel Corporation Ethernet Controller 10 Gigabit X540-AT2 > > (rev 01) > > - this is the hugeadm --pool-list output: > > Size Minimum Current Maximum Default > > 1073741824222* > > > > My first question is: should it be possible to let separate primary > > processes coexist if they use different resources (cores, ports, > > memory pools)? > > > > A second question is: there is any other workaround to let this > > scenario work without merging the two processes into two lcores of the > same process? > > > > Thanks in advance, > > -Walter > > The problem is that huge TLB filesystem is a shared resource. > Because of that the memory pools of the two processes overlap, and > memory pools are used for packet buffers, malloc, etc. > > You might be able to use no-huge, but then other things would probably > break. The way to run two primary processes side by side is documented in the document "Intel(r) Data Plane Development Kit (Intel(r) DPDK): Programmer's Guide" available at: http://www.intel.com/content/www/us/en/intelligent-systems/intel-technology/intel-dpdk-programmers-guide.html and is covered in section 17.2.3. You need to pass the "--file-prefix" flag when running your application to force the processes to use different hugepage files so they are not shared among the two processes. Regards, /Bruce -- Intel Shannon Limited Registered in Ireland Registered Office: Collinstown Industrial Park, Leixlip, County Kildare Registered Number: 308263 Business address: Dromore House, East Park, Shannon, Co. Clare This e-mail and any attachments may contain confidential material for the sole use of the intended recipient(s). Any review or distribution by others is strictly prohibited. If you are not the intended recipient, please contact the sender and delete all copies.
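[Editor's note: applied to the command lines from the original question, Bruce's suggestion would look roughly like this. The prefix strings "rx" and "tx" are arbitrary; any two distinct values keep the hugepage files separate. Whether --no-shconf is still needed alongside --file-prefix is left as in the original invocation.]

./receiver -c 0x3 -n 2 -m 200 --no-shconf --file-prefix rx -- -p 0x1
./sender   -c 0xc -n 2 -m 200 --no-shconf --file-prefix tx -- -p 0x2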
[dpdk-dev] [PATCH] Request for comments on ixgbe TSO support
This patch is a draft of TSO support on the 82599. That is, it is not expected to be accepted as is.

The problem is where to put the mss field. In this patch, the mss is put in the union of hash in rte_pktmbuf. It is not the best place, but it is quite convenient, since hash is not used in the TX procedure. The idea is to avoid increasing sizeof(struct rte_pktmbuf) while keeping mss easy to access.

However, reusing hash is also misleading, because mss has nothing to do with the Rx hash. A more formal way could be to rename hash as below:

union {
    uint32_t data;
    struct rx_hash hash;
    uint32_t tx_mss;
} misc;

It is going to be a major change because it affects the core data structure.

Any comments will be appreciated.

Qinglai Xiao (1):
  ixgbe: TCP/UDP segment offload support on 82599.

 lib/librte_mbuf/rte_mbuf.h        |  6 +-
 lib/librte_pmd_ixgbe/ixgbe_rxtx.c | 32 +---
 2 files changed, 34 insertions(+), 4 deletions(-)

-- 
1.7.10.4
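[Editor's note: a sketch of how the rename suggested above would map onto the existing members of the hash union (rss, fdir, sched); only the misc and tx_mss names are new. On receive the 32 bits still carry the hash; only transmit reinterprets them as tx_mss, which is also why overlapping an existing field is noted below as problematic for forwarded packets.]

union {
	uint32_t rss;        /* RSS hash result if RSS is enabled      */
	struct {
		uint16_t hash;
		uint16_t id;
	} fdir;              /* filter identifier if FDIR is enabled   */
	uint32_t sched;      /* hierarchical scheduler                 */
	uint32_t tx_mss;     /* TSO MSS, meaningful only on transmit   */
} misc;                      /* currently named "hash"                 */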
[dpdk-dev] [PATCH] ixgbe: TCP/UDP segment offload support on 82599.
Add support for TCP/UDP segment offload on 82599. User can turn on TSO by setting MSS in the first frame. Meantime, the L2 and L3 len, together with offload flags must be set in the first frame accordingly. Otherwise the driver will cease the sending. --- lib/librte_mbuf/rte_mbuf.h|6 +- lib/librte_pmd_ixgbe/ixgbe_rxtx.c | 32 +--- 2 files changed, 34 insertions(+), 4 deletions(-) diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h index d914562..ea4bb88 100644 --- a/lib/librte_mbuf/rte_mbuf.h +++ b/lib/librte_mbuf/rte_mbuf.h @@ -159,6 +159,10 @@ struct rte_pktmbuf { uint16_t id; } fdir; /**< Filter identifier if FDIR enabled */ uint32_t sched; /**< Hierarchical scheduler */ + uint16_t mss; /**< Maximum Segment Size. If more than zero, +then TSO is enabled. User is responsible +for setting vlan_macip and TCP/IP cksum +accordingly. */ } hash; /**< hash information */ }; @@ -195,7 +199,7 @@ struct rte_mbuf { uint16_t refcnt_reserved; /**< Do not use this field */ #endif uint8_t type; /**< Type of mbuf. */ - uint8_t reserved; /**< Unused field. Required for padding. */ + uint8_t reserved; /**< Unused field. Required for padding. */ uint16_t ol_flags;/**< Offload features. */ union { diff --git a/lib/librte_pmd_ixgbe/ixgbe_rxtx.c b/lib/librte_pmd_ixgbe/ixgbe_rxtx.c index 5c8668e..63d7f8a 100644 --- a/lib/librte_pmd_ixgbe/ixgbe_rxtx.c +++ b/lib/librte_pmd_ixgbe/ixgbe_rxtx.c @@ -498,7 +498,7 @@ ixgbe_xmit_pkts_simple(void *tx_queue, struct rte_mbuf **tx_pkts, static inline void ixgbe_set_xmit_ctx(struct igb_tx_queue* txq, volatile struct ixgbe_adv_tx_context_desc *ctx_txd, - uint16_t ol_flags, uint32_t vlan_macip_lens) + uint16_t ol_flags, uint32_t vlan_macip_lens, uint16_t mss) { uint32_t type_tucmd_mlhl; uint32_t mss_l4len_idx; @@ -520,6 +520,10 @@ ixgbe_set_xmit_ctx(struct igb_tx_queue* txq, /* Specify which HW CTX to upload. */ mss_l4len_idx = (ctx_idx << IXGBE_ADVTXD_IDX_SHIFT); + + /* MSS is reqired for TSO. The user must set mss accordingly */ + mss_l4len_idx |= mss << IXGBE_ADVTXD_MSS_SHIFT; + switch (ol_flags & PKT_TX_L4_MASK) { case PKT_TX_UDP_CKSUM: type_tucmd_mlhl |= IXGBE_ADVTXD_TUCMD_L4T_UDP | @@ -694,6 +698,7 @@ ixgbe_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint32_t vlan_macip_lens; uint32_t ctx = 0; uint32_t new_ctx; + uint16_t mss; txq = tx_queue; sw_ring = txq->sw_ring; @@ -719,10 +724,25 @@ ixgbe_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, * are needed for offload functionality. */ ol_flags = tx_pkt->ol_flags; + vlan_macip_lens = tx_pkt->pkt.vlan_macip.data; + mss = tx_pkt->pkt.hash.mss; /* If hardware offload required */ tx_ol_req = (uint16_t)(ol_flags & PKT_TX_OFFLOAD_MASK); + + /* +* If mss is set, we assume TSO is required. +* +* If TSO is turned on, the caller must set the offload bits +* accordingly, otherwise we have to drop the packet, because +* we have no knowledge of L2 or L3. +*/ + if (!tx_ol_req && mss) { + PMD_TX_LOG(DEBUG, "TSO set without offload bits. Abort sending."); + goto end_of_tx; + } + if (tx_ol_req) { /* If new context need be built or reuse the exist ctx. 
*/ ctx = what_advctx_update(txq, tx_ol_req, @@ -841,6 +861,11 @@ ixgbe_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, */ cmd_type_len = IXGBE_ADVTXD_DTYP_DATA | IXGBE_ADVTXD_DCMD_IFCS | IXGBE_ADVTXD_DCMD_DEXT; + + /* Enable TSE bit for TSO */ + if (mss) + cmd_type_len |= IXGBE_ADVTXD_DCMD_TSE; + olinfo_status = (pkt_len << IXGBE_ADVTXD_PAYLEN_SHIFT); #ifdef RTE_LIBRTE_IEEE1588 if (ol_flags & PKT_TX_IEEE1588_TMST) @@ -868,7 +893,7 @@ ixgbe_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, } ixgbe_set_xmit_ctx(txq, ctx_txd, tx_ol_req, - vlan_macip_lens); + vlan_macip_lens, mss); txe->last_id = tx_last; tx_id = txe->next_id; @@ -339
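[Editor's note: a hypothetical caller-side sketch of how a packet would be prepared for TSO under this patch. The mss field and the "driver drops the packet if the offload bits are missing" rule come from the patch itself; the exact vlan_macip sub-field names and PKT_TX_* flag names are assumptions based on the rte_mbuf.h of this period. Per the commit message, only the first segment of a packet needs these fields set.]

#include <rte_mbuf.h>

static void
prepare_tso_first_seg(struct rte_mbuf *m, uint16_t mss)
{
	m->pkt.vlan_macip.f.l2_len = 14;   /* Ethernet header                */
	m->pkt.vlan_macip.f.l3_len = 20;   /* IPv4 header without options    */
	m->pkt.hash.mss = mss;             /* non-zero MSS turns TSO on      */
	/* offload bits must be set too, or the driver drops the packet     */
	m->ol_flags |= PKT_TX_IP_CKSUM | PKT_TX_TCP_CKSUM;
	/* the caller must also seed the TCP checksum field with the
	 * pseudo-header checksum (see the checksum discussion below)       */
}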
[dpdk-dev] Need comment on 82599 TSO
Hi Stephen, Thanks for comment. Pls check the other thread that I just posted. thx & rgds, -Qinglai On Fri, Oct 4, 2013 at 7:41 PM, Stephen Hemminger wrote: > On Fri, 4 Oct 2013 15:44:19 +0300 > jigsaw wrote: > >> Hi, >> >> I'm working on TSO for 82599, and encounter a problem: nowhere to store MSS. >> >> TSO must be aware of MSS, or gso in skb of kernel. >> But MSS nees 16 bits per mbuf. And we have no spare 16 bits in >> rte_mbuf or rte_pktmbuf. >> If we add 16 bit field in rte_pktmbuf, the size of rte_mbuf will be >> doubled, coz currently the size is at the edge of cacheline(32 byte). >> >> I have two solutions here: >> >> 1. Store MSS in struct rte_eth_conf. >> This is actually a very bad idea, coz MSS is not bound to device. >> >> 2. Turn on and off TSO with rte_ctrlmbuf. >> I found that rte_ctrlmbuf is not used at all. So it could be the first >> use case of it. >> With rte_ctrlmbuf we have enough space to store MSS. >> >> Looking forward to your comments. >> >> thx & >> rgds, >> -Qinglai > > The mbuf needs to grow to 2 cache lines. There are other things that need > to be added to mbuf eventually as well. For example the QoS bitfield is > too small when crammed into 32 bits. Ideally the normal small packet > stuff would be in the first cacheline; and the other part of the struct > would have things less likely to be used.
[dpdk-dev] [PATCH] Request for comments on ixgbe TSO support
On Fri, 4 Oct 2013 20:06:52 +0300 Qinglai Xiao wrote: > This patch is a draft of TSO on 82599. That is, it is not expected to be > accepted as is. > The problem is where to put the mss field. In this patch, the mss is put in > the union of hash in rte_pktmbuf. It is not the best place, but it is quite > convenient, since hash is not used in TX procedure. > The idea is to avoid increasing sizeof(struct rte_pktmbuf), while keeping mss > easy to access. > > However, the hash is also misleading, coz mss has nothing to do with Rx hash. > A more formal way could be rename hash as below: > > union { > uint32_t data; > struct rx_hash hash; > uint32_t tx_mss; > } misc; > > It is gonna be a major change coz it affects the core data structure. > > Any comments will be appreciated. > > Qinglai Xiao (1): > ixgbe: TCP/UDP segment offload support on 82599. > > lib/librte_mbuf/rte_mbuf.h|6 +- > lib/librte_pmd_ixgbe/ixgbe_rxtx.c | 32 +--- > 2 files changed, 34 insertions(+), 4 deletions(-) > This will work for local generated packets but overlapping existing field won't work well for forwarding. What we want to be able to do is to take offload (jumbo) packets in with from virtio (need better driver support which I am doing), and then send them through to network devices.
[dpdk-dev] L2fwd Performance issue with Virtual Machine
Hi, If you are not using SRIOV or direct device assignment to VM, your traffic hits vSwitch(via vmware native ixgbe driver and network stack) in the ESX and switched to your E1000/VMXNET3 interface connected to a VM. The vSwitch is not optimized for PMD at present so you would get optimal performance benefit having PMD, I believe. For the RSS front, I would say you won't see much difference with RSS enabled for 1500 bytes frames. In fact, core is capable of handling such traffic in VM, but the bottleneck is in ESXi software switching layer, that's what my initial research shows across multiple hypervisors. Thanks, Rashmin -Original Message- From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Selvaganapathy Chidambaram Sent: Thursday, October 03, 2013 2:39 PM To: dev at dpdk.org Subject: [dpdk-dev] L2fwd Performance issue with Virtual Machine Hello Everyone, I have tried to run DPDK sample application l2fwd(modified to support multiple queues) in my ESX Virtual Machine. I see that performance is not scaling with cores. [My apologies for the long email] *Setup:* Connected VM to two ports of Spirent with 10Gig link. Sent 10 Gig traffic of L3 packet of length 1500 bytes (with four different flows) from Spirent through one port and received at the second port. Also sent traffic from reverse direction so that net traffic is 20 Gbps. Haven't enabled SR-IOV or Direct path I/O. *Emulated Driver:* With default emulated driver, I got 7.3 Gbps for 1 core. Adding multiple cores did not improve the performance. On debugging I noticed that function eth_em_infos_get() says RSS is not supported. *vmxnet3_usermap:* Then I tried extension vmxnet3_usermap and got 8.7 Gbps for 1 core. Again adding another core did not help. On debugging, I noticed that in vmxnet3 kernel driver (in function vmxnet3_probe_device) , RSS is disabled if * adapter->is_shm* is non zero. In our case, its adapter->VMXNET3_SHM_USERMAP_DRIVER which is non zero. Before trying to enable it, I would like to know if there is any known limitation why RSS is not enabled in both the drivers. Please help me understand. *Hardware Configuration:* Hardware : Intel Xeon 2.4 Ghz 4 CPUs Hyperthreading : No RAM : 16 GB Hypervisor : ESXi 5.1 Ethernet: Intel 82599EB 10 Gig SFP Guest VM : 2 vCPU, 2 GB RAM GuestOS : Centos 6.2 32 bit Thanks in advance for your time and help!!! Thanks, Selva.
[dpdk-dev] [PATCH] Request for comments on ixgbe TSO support
Hi Stephen, >>This will work for local generated packets but overlapping existing field >>won't work well for forwarding. So adding a new mss field in mbuf could be the way out? or I misunderstand something. >> What we want to be able to do is to take offload (jumbo) packets in with >> from virtio Sorry I don't understand why TSO is connected to virtio. Could you give more details here? Are you suggesting this TSO patch overlaps your work, or it should be based on your work? thx & rgds, -Qinglai On Fri, Oct 4, 2013 at 8:40 PM, Stephen Hemminger wrote: > On Fri, 4 Oct 2013 20:06:52 +0300 > Qinglai Xiao wrote: > >> This patch is a draft of TSO on 82599. That is, it is not expected to be >> accepted as is. >> The problem is where to put the mss field. In this patch, the mss is put in >> the union of hash in rte_pktmbuf. It is not the best place, but it is quite >> convenient, since hash is not used in TX procedure. >> The idea is to avoid increasing sizeof(struct rte_pktmbuf), while keeping mss >> easy to access. >> >> However, the hash is also misleading, coz mss has nothing to do with Rx hash. >> A more formal way could be rename hash as below: >> >> union { >> uint32_t data; >> struct rx_hash hash; >> uint32_t tx_mss; >> } misc; >> >> It is gonna be a major change coz it affects the core data structure. >> >> Any comments will be appreciated. >> >> Qinglai Xiao (1): >> ixgbe: TCP/UDP segment offload support on 82599. >> >> lib/librte_mbuf/rte_mbuf.h|6 +- >> lib/librte_pmd_ixgbe/ixgbe_rxtx.c | 32 +--- >> 2 files changed, 34 insertions(+), 4 deletions(-) >> > > This will work for local generated packets but overlapping existing > field won't work well for forwarding. > > What we want to be able to do is to take offload (jumbo) packets in > with from virtio (need better driver support which I am doing), and then > send them through to network devices. >
[dpdk-dev] L2fwd Performance issue with Virtual Machine
Correction: "you would NOT get optimal performance benefit having PMD" Thanks, Rashmin -Original Message- From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Patel, Rashmin N Sent: Friday, October 04, 2013 10:47 AM To: Selvaganapathy Chidambaram Cc: dev at dpdk.org Subject: Re: [dpdk-dev] L2fwd Performance issue with Virtual Machine Hi, If you are not using SRIOV or direct device assignment to VM, your traffic hits vSwitch(via vmware native ixgbe driver and network stack) in the ESX and switched to your E1000/VMXNET3 interface connected to a VM. The vSwitch is not optimized for PMD at present so you would get optimal performance benefit having PMD, I believe. For the RSS front, I would say you won't see much difference with RSS enabled for 1500 bytes frames. In fact, core is capable of handling such traffic in VM, but the bottleneck is in ESXi software switching layer, that's what my initial research shows across multiple hypervisors. Thanks, Rashmin -Original Message- From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Selvaganapathy Chidambaram Sent: Thursday, October 03, 2013 2:39 PM To: dev at dpdk.org Subject: [dpdk-dev] L2fwd Performance issue with Virtual Machine Hello Everyone, I have tried to run DPDK sample application l2fwd(modified to support multiple queues) in my ESX Virtual Machine. I see that performance is not scaling with cores. [My apologies for the long email] *Setup:* Connected VM to two ports of Spirent with 10Gig link. Sent 10 Gig traffic of L3 packet of length 1500 bytes (with four different flows) from Spirent through one port and received at the second port. Also sent traffic from reverse direction so that net traffic is 20 Gbps. Haven't enabled SR-IOV or Direct path I/O. *Emulated Driver:* With default emulated driver, I got 7.3 Gbps for 1 core. Adding multiple cores did not improve the performance. On debugging I noticed that function eth_em_infos_get() says RSS is not supported. *vmxnet3_usermap:* Then I tried extension vmxnet3_usermap and got 8.7 Gbps for 1 core. Again adding another core did not help. On debugging, I noticed that in vmxnet3 kernel driver (in function vmxnet3_probe_device) , RSS is disabled if * adapter->is_shm* is non zero. In our case, its adapter->VMXNET3_SHM_USERMAP_DRIVER which is non zero. Before trying to enable it, I would like to know if there is any known limitation why RSS is not enabled in both the drivers. Please help me understand. *Hardware Configuration:* Hardware : Intel Xeon 2.4 Ghz 4 CPUs Hyperthreading : No RAM : 16 GB Hypervisor : ESXi 5.1 Ethernet: Intel 82599EB 10 Gig SFP Guest VM : 2 vCPU, 2 GB RAM GuestOS : Centos 6.2 32 bit Thanks in advance for your time and help!!! Thanks, Selva.
[dpdk-dev] Need comment on 82599 TSO
Stephen, Agree. Growing to two cache lines is an inevitability. Re-organizing the mbuf a bit to alleviate some of the immediate space with as minimal a performance as possible (including separating the QoS fields out completely into its own separate area) is a good idea - the first cache line would be packet + mbuf related information, the second more of the metadata that we need. Any suggestions on how many bytes would be needed for QoS? Qinglai, For your TSO implementation patch, let's work the patch as is (assuming the mbuf grows) - add the 16 bits into the pktmbuf structure (and it will grow beyond a cache line). We can get some performance numbers for the standard benchmarks. I will look at a few ideas to free up some space in the mbuf to keep the packet related stuff within the first cache line while keeping performance close to where it is today. Regards, -Venky -Original Message- From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Stephen Hemminger Sent: Friday, October 04, 2013 9:41 AM To: jigsaw Cc: dev at dpdk.org Subject: Re: [dpdk-dev] Need comment on 82599 TSO On Fri, 4 Oct 2013 15:44:19 +0300 jigsaw wrote: > Hi, > > I'm working on TSO for 82599, and encounter a problem: nowhere to store MSS. > > TSO must be aware of MSS, or gso in skb of kernel. > But MSS nees 16 bits per mbuf. And we have no spare 16 bits in > rte_mbuf or rte_pktmbuf. > If we add 16 bit field in rte_pktmbuf, the size of rte_mbuf will be > doubled, coz currently the size is at the edge of cacheline(32 byte). > > I have two solutions here: > > 1. Store MSS in struct rte_eth_conf. > This is actually a very bad idea, coz MSS is not bound to device. > > 2. Turn on and off TSO with rte_ctrlmbuf. > I found that rte_ctrlmbuf is not used at all. So it could be the first > use case of it. > With rte_ctrlmbuf we have enough space to store MSS. > > Looking forward to your comments. > > thx & > rgds, > -Qinglai The mbuf needs to grow to 2 cache lines. There are other things that need to be added to mbuf eventually as well. For example the QoS bitfield is too small when crammed into 32 bits. Ideally the normal small packet stuff would be in the first cacheline; and the other part of the struct would have things less likely to be used.
[dpdk-dev] [PATCH] Request for comments on ixgbe TSO support
On Fri, 4 Oct 2013 20:54:31 +0300 jigsaw wrote: > Hi Stephen, > > > >>This will work for local generated packets but overlapping existing field > >>won't work well for forwarding. > So adding a new mss field in mbuf could be the way out? or I > misunderstand something. > > >> What we want to be able to do is to take offload (jumbo) packets in with > >> from virtio > Sorry I don't understand why TSO is connected to virtio. Could you > give more details here? > Are you suggesting this TSO patch overlaps your work, or it should be > based on your work? I am working on a better virtio driver. Already have lots more features working, and doing better offload support is planned. TSO is a subset of the more generic segment offload (GSO) on Linux. With virtio is possible to receive GSO packets as well as send them. This feature is negotiated between guest and host. The idea is that between guests they can exchange jumbo (64K) packets even with a smaller MTU. This helps in many ways. One example is only a single route lookup is needed. Another issue is that the current DPDK model of offload flags for checksum is problematic. It matches what is available in Intel hardware and is not easily generalizable to other devices. Current DPDK flag is checksum bad. I would like to change it to checksum known good. Then drivers which dont' do checksum would leave it 0, but if receive checksum is known good set it to 1. Basically 1 means known good, and 0 means unknown (or bad). Higher level software can then do sw checksum if necessary.
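[Editor's note: a small illustration of the "known good" receive-checksum semantics proposed above. The flag name and the software-verify helper are invented for this example and are not existing DPDK symbols.]

#include <stdint.h>

#define EXAMPLE_RX_L4_CKSUM_GOOD 0x0100   /* made-up flag value */

struct example_pkt {
	uint16_t ol_flags;
	/* ... */
};

static int
sw_verify_l4_cksum(const struct example_pkt *p)
{
	(void)p;
	return 1;   /* placeholder: a real implementation recomputes the checksum */
}

static int
l4_cksum_ok(const struct example_pkt *p)
{
	if (p->ol_flags & EXAMPLE_RX_L4_CKSUM_GOOD)
		return 1;                 /* 1: the driver verified it            */
	return sw_verify_l4_cksum(p);     /* 0: unknown (or bad), verify in sw    */
}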
[dpdk-dev] [PATCH] Request for comments on ixgbe TSO support
Stephen, Agree on the checksum flag definition. I'm presuming that we should do this on the L3 and L4 checksums separately (that ol_flags field is another one that needs extension in the mbuf). Regards, -Venky -Original Message- From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Stephen Hemminger Sent: Friday, October 04, 2013 11:23 AM To: jigsaw Cc: dev at dpdk.org Subject: Re: [dpdk-dev] [PATCH] Request for comments on ixgbe TSO support On Fri, 4 Oct 2013 20:54:31 +0300 jigsaw wrote: > Hi Stephen, > > > >>This will work for local generated packets but overlapping existing field > >>won't work well for forwarding. > So adding a new mss field in mbuf could be the way out? or I > misunderstand something. > > >> What we want to be able to do is to take offload (jumbo) packets in > >> with from virtio > Sorry I don't understand why TSO is connected to virtio. Could you > give more details here? > Are you suggesting this TSO patch overlaps your work, or it should be > based on your work? I am working on a better virtio driver. Already have lots more features working, and doing better offload support is planned. TSO is a subset of the more generic segment offload (GSO) on Linux. With virtio is possible to receive GSO packets as well as send them. This feature is negotiated between guest and host. The idea is that between guests they can exchange jumbo (64K) packets even with a smaller MTU. This helps in many ways. One example is only a single route lookup is needed. Another issue is that the current DPDK model of offload flags for checksum is problematic. It matches what is available in Intel hardware and is not easily generalizable to other devices. Current DPDK flag is checksum bad. I would like to change it to checksum known good. Then drivers which dont' do checksum would leave it 0, but if receive checksum is known good set it to 1. Basically 1 means known good, and 0 means unknown (or bad). Higher level software can then do sw checksum if necessary.
[dpdk-dev] [PATCH] Request for comments on ixgbe TSO support
Hi Stephen, Thanks for showing a bigger picture. GSO is quite big implementation, that I think it won't be easily ported to DPDK. The mbuf needs to be equipped with many fields from skb to be able to deal with GSO. Do you have the plan to port GSO to DPDK, or you would like to keep GSO in scope of virtio? Regarding checksum flags, actually I was also thinking of extending ol_flags but then I gave it up coz I was worried about the size of mbuf. My current patch has to push some work to user, due to the fact that mbuf delivers too few info (such as L2 and L3 protocol details). Besides, as you mentioned, the ixgbe driver doesn't leverage the hardware receive checksum offloading at all. And if this is to be supported, the checksum flag need further extension. (On the other hand, TSO doesn't care about receive checksum offloading). Again, do you have plans to extend cksum flags so that virio feels more comfortable with DPDK? Hi Venky, I can either make the commit now as is, or wait till the cksum flags extension is in place. If Stephen (or somebody else) has the plan for better support for cksum offloading or GSO, it is perhaps better to implement TSO on top of that. BTW, I have another small question. Current TSO patch offloads the TCP/IP pseudo cksum work to user. Do you think DPDK could provide some utility functions for TCP/IPv4/IPv6 pseudo cksum calculation and updating? thx & rgds, -Qinglai On Fri, Oct 4, 2013 at 9:38 PM, Venkatesan, Venky wrote: > Stephen, > > Agree on the checksum flag definition. I'm presuming that we should do this > on the L3 and L4 checksums separately (that ol_flags field is another one > that needs extension in the mbuf). > > Regards, > -Venky > > > -Original Message- > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Stephen Hemminger > Sent: Friday, October 04, 2013 11:23 AM > To: jigsaw > Cc: dev at dpdk.org > Subject: Re: [dpdk-dev] [PATCH] Request for comments on ixgbe TSO support > > On Fri, 4 Oct 2013 20:54:31 +0300 > jigsaw wrote: > >> Hi Stephen, >> >> >> >>This will work for local generated packets but overlapping existing field >> >>won't work well for forwarding. >> So adding a new mss field in mbuf could be the way out? or I >> misunderstand something. >> >> >> What we want to be able to do is to take offload (jumbo) packets in >> >> with from virtio >> Sorry I don't understand why TSO is connected to virtio. Could you >> give more details here? >> Are you suggesting this TSO patch overlaps your work, or it should be >> based on your work? > > I am working on a better virtio driver. Already have lots more features > working, and doing better offload support is planned. > > TSO is a subset of the more generic segment offload (GSO) on Linux. > With virtio is possible to receive GSO packets as well as send them. > This feature is negotiated between guest and host. > > The idea is that between guests they can exchange jumbo (64K) packets even > with a smaller MTU. This helps in many ways. One example is only a single > route lookup is needed. > > Another issue is that the current DPDK model of offload flags for checksum is > problematic. > It matches what is available in Intel hardware and is not easily > generalizable to other devices. > > Current DPDK flag is checksum bad. I would like to change it to checksum > known good. Then drivers which dont' do checksum would leave it 0, but if > receive checksum is known good set it to 1. Basically 1 means known good, and > 0 means unknown (or bad). 
> Higher level software can then do sw checksum if necessary.
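[Editor's note: on the last question above, assuming no such helper is available, a minimal sketch of what an IPv4 pseudo-header checksum utility could look like is below. It is illustrative only, not an existing DPDK function. For 82599 TSO the length term is normally left out (pass 0 for l4_len), but that should be confirmed against the datasheet.]

#include <stdint.h>
#include <arpa/inet.h>

/* Computes the IPv4 pseudo-header checksum for TCP/UDP. Addresses are
 * passed as they sit in the packet (network byte order); the return value
 * can be stored directly into the L4 checksum field. */
static uint16_t
ipv4_pseudo_cksum(uint32_t src_be, uint32_t dst_be,
		  uint8_t proto, uint16_t l4_len)
{
	uint32_t sum;

	sum  = (src_be & 0xffff) + (src_be >> 16);   /* source address     */
	sum += (dst_be & 0xffff) + (dst_be >> 16);   /* destination address*/
	sum += htons((uint16_t)proto);               /* zero byte + proto  */
	sum += htons(l4_len);                        /* TCP/UDP length     */

	/* fold the carries into 16 bits; not inverted, since the NIC or a
	 * later full checksum pass completes it */
	sum = (sum & 0xffff) + (sum >> 16);
	sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}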
[dpdk-dev] Multi-process on the same host
Thanks a lot Bruce, I started looking at the multi-process examples - where this case is not considered - and I missed that section in the programmer's guide. Regards, -Walter Walter de Donato, Ph.D. PostDoc @ Department of Electrical Engineering and Information Technologies University of Napoli Federico II Via Claudio 21 -- 80125 Napoli (Italy) Phone: +39 081 76 83821 - Fax: +39 081 76 83816 Email: walter.dedonato at unina.it WWW: http://wpage.unina.it/walter.dedonato 2013/10/4 Richardson, Bruce > > -Original Message- > > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Stephen > > Hemminger > > Sent: Friday, October 04, 2013 5:39 PM > > To: Walter de Donato > > Cc: dev at dpdk.org > > Subject: Re: [dpdk-dev] Multi-process on the same host > > > > On Fri, 4 Oct 2013 13:47:02 +0200 > > Walter de Donato wrote: > > > > > Hello, > > > > > > I've been using DPDK for a while and now I encountered the following > > issue: > > > when I try to run two primary processes on the same host (with > > > --no-shconf option enabled) respectively sending packets on one port > > > and receiving them on a different port (the two ports are directly > > > connected with a CAT-6 cable), I get this error on the receiving > process: > > > > > > Program received signal SIGSEGV, Segmentation fault. > > > 0x004158a0 in rte_eth_rx_burst (port_id=0 '\000', > > queue_id=0, > > > rx_pkts=0x75baa8f0, nb_pkts=128) at > > > /home/devel/dpdk/build/include/rte_ethdev.h:1658 > > > 1658return (*dev->rx_pkt_burst)(dev->data- > > >rx_queues[queue_id], > > > rx_pkts, nb_pkts); > > > > > > To give some more details: > > > - the options given to the two processes: > > > ./receiver -c 0x3 -n 2 -m 200 --no-shconf -- -p 0x1 > > > ./sender -c 0xc -n 2 -m 200 --no-shconf -- -p 0x2 > > > where the -p option is the binary mask to select the ports to enable. > > > - the network card is a dualport Intel X540: > > > port 0: Intel Corporation Ethernet Controller 10 Gigabit X540-AT2 > (rev > > 01) > > > port 1: Intel Corporation Ethernet Controller 10 Gigabit X540-AT2 > > > (rev 01) > > > - this is the hugeadm --pool-list output: > > > Size Minimum Current Maximum Default > > > 1073741824222* > > > > > > My first question is: should it be possible to let separate primary > > > processes coexist if they use different resources (cores, ports, > > > memory pools)? > > > > > > A second question is: there is any other workaround to let this > > > scenario work without merging the two processes into two lcores of the > > same process? > > > > > > Thanks in advance, > > > -Walter > > > > The problem is that huge TLB filesystem is a shared resource. > > Because of that the memory pools of the two processes overlap, and > > memory pools are used for packet buffers, malloc, etc. > > > > You might be able to use no-huge, but then other things would probably > > break. > > The way to run two primary processes side by side is documented in the > document "Intel(r) Data Plane Development Kit (Intel(r) DPDK): Programmer's > Guide" available at: > http://www.intel.com/content/www/us/en/intelligent-systems/intel-technology/intel-dpdk-programmers-guide.htmland > is covered in section 17.2.3. You need to pass the "--file-prefix" flag > when running your application to force the processes to use different > hugepage files so they are not shared among the two processes. 
> > Regards, > /Bruce > -- > Intel Shannon Limited > Registered in Ireland > Registered Office: Collinstown Industrial Park, Leixlip, County Kildare > Registered Number: 308263 > Business address: Dromore House, East Park, Shannon, Co. Clare > > This e-mail and any attachments may contain confidential material for the > sole use of the intended recipient(s). Any review or distribution by others > is strictly prohibited. If you are not the intended recipient, please > contact the sender and delete all copies. > > > >
[dpdk-dev] [PATCH] Request for comments on ixgbe TSO support
On Fri, 4 Oct 2013 22:10:33 +0300 jigsaw wrote: > Hi Stephen, > > Thanks for showing a bigger picture. > > GSO is quite big implementation, that I think it won't be easily > ported to DPDK. The mbuf needs to be equipped with many fields from > skb to be able to deal with GSO. > Do you have the plan to port GSO to DPDK, or you would like to keep > GSO in scope of virtio? > > Regarding checksum flags, actually I was also thinking of extending > ol_flags but then I gave it up coz I was worried about the size of > mbuf. > My current patch has to push some work to user, due to the fact that > mbuf delivers too few info (such as L2 and L3 protocol details). > > Besides, as you mentioned, the ixgbe driver doesn't leverage the > hardware receive checksum offloading at all. And if this is to be > supported, the checksum flag need further extension. > (On the other hand, TSO doesn't care about receive checksum offloading). > Again, do you have plans to extend cksum flags so that virio feels > more comfortable with DPDK? > > Hi Venky, > > I can either make the commit now as is, or wait till the cksum flags > extension is in place. If Stephen (or somebody else) has the plan for > better support for cksum offloading or GSO, it is perhaps better to > implement TSO on top of that. > > BTW, I have another small question. Current TSO patch offloads the > TCP/IP pseudo cksum work to user. Do you think DPDK could provide some > utility functions for TCP/IPv4/IPv6 pseudo cksum calculation and > updating? > > thx & > rgds, > -Qinglai I want to get Tx checksum offload in virtio working first. Just looking ahead to Rx.
[dpdk-dev] L2fwd Performance issue with Virtual Machine
Thanks Rashmin for your time and help! So it looks like with the given hardware config, we could probably only achieve around 8 Gbps in VM without using SRIOV. Once DPDK is used in vSwitch design, we could gain more performance. Thanks, Selvaganapathy.C. On Fri, Oct 4, 2013 at 11:02 AM, Patel, Rashmin N wrote: > Correction: "you would NOT get optimal performance benefit having PMD" > > Thanks, > Rashmin > -Original Message- > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Patel, Rashmin N > Sent: Friday, October 04, 2013 10:47 AM > To: Selvaganapathy Chidambaram > Cc: dev at dpdk.org > Subject: Re: [dpdk-dev] L2fwd Performance issue with Virtual Machine > > Hi, > > If you are not using SRIOV or direct device assignment to VM, your traffic > hits vSwitch(via vmware native ixgbe driver and network stack) in the ESX > and switched to your E1000/VMXNET3 interface connected to a VM. The vSwitch > is not optimized for PMD at present so you would get optimal performance > benefit having PMD, I believe. > > For the RSS front, I would say you won't see much difference with RSS > enabled for 1500 bytes frames. In fact, core is capable of handling such > traffic in VM, but the bottleneck is in ESXi software switching layer, > that's what my initial research shows across multiple hypervisors. > > Thanks, > Rashmin > > -Original Message- > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Selvaganapathy > Chidambaram > Sent: Thursday, October 03, 2013 2:39 PM > To: dev at dpdk.org > Subject: [dpdk-dev] L2fwd Performance issue with Virtual Machine > > Hello Everyone, > > I have tried to run DPDK sample application l2fwd(modified to support > multiple queues) in my ESX Virtual Machine. I see that performance is not > scaling with cores. [My apologies for the long email] > > *Setup:* > > Connected VM to two ports of Spirent with 10Gig link. Sent 10 Gig traffic > of L3 packet of length 1500 bytes (with four different flows) from Spirent > through one port and received at the second port. Also sent traffic from > reverse direction so that net traffic is 20 Gbps. Haven't enabled SR-IOV or > Direct path I/O. > > *Emulated Driver:* > > With default emulated driver, I got 7.3 Gbps for 1 core. Adding multiple > cores did not improve the performance. On debugging I noticed that function > eth_em_infos_get() says RSS is not supported. > > *vmxnet3_usermap:* > > Then I tried extension vmxnet3_usermap and got 8.7 Gbps for 1 core. Again > adding another core did not help. On debugging, I noticed that in vmxnet3 > kernel driver (in function vmxnet3_probe_device) , RSS is disabled if * > adapter->is_shm* is non zero. In our case, its > adapter->VMXNET3_SHM_USERMAP_DRIVER > which is non zero. > > Before trying to enable it, I would like to know if there is any known > limitation why RSS is not enabled in both the drivers. Please help me > understand. > > *Hardware Configuration:* > Hardware : Intel Xeon 2.4 Ghz 4 CPUs > Hyperthreading : No > RAM : 16 GB > Hypervisor : ESXi 5.1 > Ethernet: Intel 82599EB 10 Gig SFP > > > Guest VM : 2 vCPU, 2 GB RAM > GuestOS : Centos 6.2 32 bit > > Thanks in advance for your time and help!!! > > Thanks, > Selva. >