The simple answer is that you are running a stress test, and the finite 
processing required to separate traffic into queues is showing up as a 
performance impact on your setup. At that packet rate you could also be 
running into limitations of the PCIe bus, plus the added latency of crossing 
the QPI bus if you're using cores on the remote CPU.
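One quick way to rule out the remote-CPU case is to check which NUMA node the 
NIC is attached to and keep the receive work on that node's cores. A hedged 
sketch, assuming a sysfs-capable kernel; "eth2" and the IRQ number are 
placeholders for your system:

```shell
# Placeholder interface name -- substitute your 82599 port.
NIC=eth2

# NUMA node the PCIe device is attached to (-1 means no NUMA info exposed).
cat /sys/class/net/$NIC/device/numa_node

# List the cores belonging to each node so you can pick local ones.
numactl --hardware

# Find the per-queue IRQ numbers, then pin each to a core on the local node.
grep "$NIC" /proc/interrupts
echo 2 > /proc/irq/<irq_number>/smp_affinity_list   # <irq_number> is hypothetical
```

If the drop rate improves with all queues pinned locally, the QPI hop was at 
least part of the problem.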

Depending on the driver and kernel version, we've seen the best small-packet 
performance at around 3 queues. Adding queues increases the processing 
required to separate the traffic into queues, which at some point gains you 
nothing. Since you're able to do 10G, it sounds like most of the system is 
working properly, and the trick is to tune it for whatever you're planning to 
use it for.
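For experimenting with the queue count itself, something like the following 
may help. This is a sketch, not a recipe: "eth2" is a placeholder, newer 
drivers take the channel count via ethtool, and older out-of-tree ixgbe 
releases used a module parameter instead:

```shell
# Show the current and maximum channel (queue) counts the driver supports.
ethtool -l eth2

# Try ~3 combined queues for a small-packet workload (newer drivers).
ethtool -L eth2 combined 3

# Older out-of-tree ixgbe builds set queues at load time instead, e.g.:
#   modprobe ixgbe RSS=3
# (parameter availability depends on the driver version)

# Re-run the test, then compare per-queue receive stats.
ethtool -S eth2 | grep -E 'rx_queue|drop'
```

Sweeping the queue count while watching the drop counters should show where 
the knee is for your traffic mix.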

Thanks.

Todd Fujinaka
Software Application Engineer
Networking Division (ND)
Intel Corporation
[email protected]
(503) 712-4565


-----Original Message-----
From: Shinae Woo [mailto:[email protected]] 
Sent: Wednesday, March 20, 2013 8:59 PM
To: [email protected]
Subject: [E1000-devel] Low receive performance with multiple RSS queue

Hello, all,

 We're observing a weird problem with receive-side scaling (RSS) on a 
82599-based Intel NIC. In a nutshell, we do not see 10Gbps for 64B packet RX 
if we configure the NIC to use multiple (>1) RSS hardware queues, while we 
_do_ see 10Gbps with a single RSS queue (and one CPU core). For packet sizes 
larger than 80 bytes, we achieve line rate (10Gbps) for packet RX regardless 
of the number of RSS queues. We're wondering if this is a hardware problem or 
if we missed something in the driver. We use a modified ixgbe driver (the 
PacketShader IO engine) to bypass severe kernel-level memory-management 
overhead at small packet sizes, and we batch-process received packets (as 
NAPI does). What we observe is summarized as follows:

 * With 1 RSS queue and 1 CPU core, we see no packet loss at an input rate of 
10Gbps with all-64B packets.
 * With 6 RSS queues and 6 CPU cores, we see up to 10% packet drops at the NIC 
(64B packets).
     - The loss rate increases as we increase the number of RSS queues from 2 
to 6.
     - However, even when we see packet drops, the RX descriptor ring is 
almost always empty (not full).
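One way to narrow down where the NIC is counting those drops is to watch the 
hardware counters during a run. A hedged sketch; "eth2" is a placeholder and 
the exact counter names vary across ixgbe versions:

```shell
# Inspect NIC drop counters while the 64B stress test is running.
# Counter names differ between driver releases, so grep loosely.
ethtool -S eth2 | grep -iE 'rx_missed|rx_no_dma|rx_dropped|drop'
```

If a MAC-level counter such as rx_missed_errors rises while the per-queue 
rings stay empty, the bottleneck is upstream of the descriptor rings, which 
would be consistent with a drop in the RSS/filtering stage rather than 
host-side buffer exhaustion.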


We use a machine with two Intel Xeon X5690 CPUs and an Intel NIC with the 
82599 chipset (Linux 2.6.32-42-server, Ubuntu 12.04). The PacketShader IO 
engine is based on ixgbe 2.0.38.2 (PSIO: 
http://shader.kaist.edu/packetshader/io_engine/index.html), but we see a 
similar problem even after porting PSIO to ixgbe 3.12. Also, similar IO 
libraries such as netmap (http://info.iet.unipi.it/~luigi/netmap/) and PF_RING 
(http://www.ntop.org/products/pf_ring/) show the same trend: the packet drop 
rate increases as we increase the number of RSS queues and use more CPU cores, 
which is counterintuitive since more CPU cores should improve performance. 
This is why we suspect the problem is related to the hardware (RSS) when the 
packet size is small (< 80 bytes).


 We have attached a file that shows the packet RX performance of PSIO, netmap, 
and PF_RING with various packet sizes and numbers of RSS queues:

    PSIO-0.2 (based on ixgbe 2.0.38.2)
    PF_RING-5.5.2 (based on ixgbe-3.11.33)
    NetMap- 20120813 (based on ixgbe in kernel 3.2.9-k2)

 Please let us know if you have experienced a similar problem or have any clue 
what's going on. We'll greatly appreciate your help.

 Regards,
 Shinae Woo

_______________________________________________
E1000-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/e1000-devel
To learn more about Intel® Ethernet, visit 
http://communities.intel.com/community/wired