Hi Anuj,

Thanks for the fixes! I have two comments:

- From i40e_ethdev.h: #define I40E_DEFAULT_RX_WTHRESH 0, so DPDK does not
  batch descriptor writeback on i40e by default; WTHRESH has to be raised
  explicitly (see the sketch at the very end of this mail).
- (26 + 32) / 4 (batched descriptor writeback) should be
  (26 + 4 * 32) / 4 (batched descriptor writeback), since one writeback TLP
  carries all four 32-byte descriptors. Thus we have
  26 + 64 + (26 + 4 * 32) / 4 + 26 / 4 = 135 bytes/packet.

This corresponds to 58.8 Mpps.
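A quick sketch of the arithmetic in C, in case anyone wants to play with the
parameters; the 128b/130b encoding factor and the WTHRESH value are my
assumptions, and ACK/NAK/FC Update DLLPs are still ignored:

#include <stdio.h>

int main(void)
{
    const double tlp_hdr = 26.0;  /* TLP overhead per transaction, ECRC disabled */
    const double frame   = 64.0;  /* minimal Ethernet frame */
    const double desc    = 32.0;  /* 32-byte rx descriptor */
    const double wthresh = 4.0;   /* descriptors written back per TLP */

    /* NIC-to-host bytes per received packet: the packet TLP, plus the
     * amortized descriptor-writeback TLP, plus the amortized read
     * request for new descriptors. */
    double bytes_per_pkt = (tlp_hdr + frame)
                         + (tlp_hdr + wthresh * desc) / wthresh
                         + tlp_hdr / wthresh;

    /* PCIe 3.0 x8: 8 GT/s per lane, 8 lanes, 128b/130b encoding. */
    double bytes_per_sec = (8e9 / 8.0) * 8.0 * (128.0 / 130.0);

    printf("%.1f bytes/packet -> %.1f Mpps\n",
           bytes_per_pkt, bytes_per_sec / bytes_per_pkt / 1e6);
    return 0;
}

With these numbers it prints 135.0 bytes/packet and about 58.3 Mpps; the
exact Mpps figure depends on what you assume for the effective link
bandwidth.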
Regards,
Vladimir

2015-07-01 17:22 GMT+03:00 Anuj Kalia <anujkaliaiitd at gmail.com>:
> Vladimir,
>
> A few possible fixes to your PCIe analysis (let me know if I'm wrong):
> - ECRC is probably disabled (check using sudo lspci -vvv | grep
> CGenEn-), so the TLP header is 26 bytes
> - Descriptor writeback can be batched using a high value of WTHRESH,
> which is what DPDK uses by default
> - A read request contains a full TLP header (26 bytes)
>
> Assuming WTHRESH = 4, bytes transferred from NIC to host per packet =
> 26 + 64 (packet itself) +
> (26 + 32) / 4 (batched descriptor writeback) +
> (26 / 4) (read request for new descriptors) =
> 111 bytes/packet
>
> This corresponds to 70.9 Mpps over PCIe 3.0 x8. Assuming 5% DLLP
> overhead, the rate is 67.4 Mpps.
>
> --Anuj
>
> On Wed, Jul 1, 2015 at 9:40 AM, Vladimir Medvedkin <medvedkinv at gmail.com> wrote:
> > In the SYN flood case you should take into account the returning
> > SYN-ACK traffic, which generates PCIe DLLPs from NIC to host, so the
> > PCIe bandwidth is exhausted faster. And don't forget about the DLLPs
> > generated by rx traffic, which saturate the host-to-NIC bus.
> >
> > 2015-07-01 16:05 GMT+03:00 Pavel Odintsov <pavel.odintsov at gmail.com>:
> >> Yes, Bruce, we understand this. But we are working on processing
> >> huge SYN attacks, and those packets are 64 bytes only :(
> >>
> >> On Wed, Jul 1, 2015 at 3:59 PM, Bruce Richardson
> >> <bruce.richardson at intel.com> wrote:
> >> > On Wed, Jul 01, 2015 at 03:44:57PM +0300, Pavel Odintsov wrote:
> >> >> Thanks for the answer, Vladimir! So we need to look for an x16
> >> >> NIC if we want to achieve 40GE line rate...
> >> >>
> >> > Note that this would only apply to your minimal, i.e. 64-byte,
> >> > packet sizes. Once you go up to larger packets, e.g. 128B, your
> >> > PCI bandwidth requirements are lower and you can achieve line
> >> > rate more easily.
> >> >
> >> > /Bruce
> >> >
> >> >> On Wed, Jul 1, 2015 at 3:06 PM, Vladimir Medvedkin
> >> >> <medvedkinv at gmail.com> wrote:
> >> >> > Hi Pavel,
> >> >> >
> >> >> > Looks like you ran into a PCIe bottleneck. So let's calculate
> >> >> > the XL710 rx-only case.
> >> >> > Assume we have 32-byte descriptors (if we want more offload).
> >> >> > DMA makes one PCIe transaction with the packet payload, one
> >> >> > descriptor writeback, and one memory request for free
> >> >> > descriptors for every 4 packets. A Transaction Layer Packet
> >> >> > (TLP) carries 30 bytes of overhead (4 PHY + 6 DLL + 16 header
> >> >> > + 4 ECRC). So for 1 rx packet DMA sends 30 + 64 (packet
> >> >> > itself) + 30 + 32 (writeback descriptor) + (16 / 4) (read
> >> >> > request for new descriptors). Note that we do not take into
> >> >> > account PCIe ACK/NAK/FC Update DLLPs. So we have 160 bytes per
> >> >> > packet. One PCIe 3.0 lane transmits 1 byte in 1 ns, so x8
> >> >> > transmits 8 bytes in 1 ns, and 1 packet is transferred in
> >> >> > 20 ns. Thus, in theory, PCIe 3.0 x8 can transfer no more than
> >> >> > 50 Mpps.
> >> >> > Correct me if I'm wrong.
> >> >> >
> >> >> > Regards,
> >> >> > Vladimir
> >> >> >
> >>
> >> --
> >> Sincerely yours, Pavel Odintsov
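P.S. Since the first comment above is about the WTHRESH default, here is a
minimal sketch of raising it per rx queue through the ethdev API; the
function name, descriptor count, and port/queue ids are placeholders, and
whether a given PMD actually honors the rxconf thresholds is
driver-specific:

#include <rte_ethdev.h>
#include <rte_mempool.h>

/* Placeholder helper: set up one rx queue with an explicit WTHRESH
 * instead of relying on the driver default (0 for i40e). */
static int
setup_rx_queue_with_wthresh(uint16_t port_id, uint16_t queue_id,
                            struct rte_mempool *mb_pool)
{
    struct rte_eth_dev_info dev_info;
    struct rte_eth_rxconf rxconf;

    rte_eth_dev_info_get(port_id, &dev_info);

    /* Start from the driver's defaults, then raise WTHRESH so the NIC
     * can batch descriptor writebacks. */
    rxconf = dev_info.default_rxconf;
    rxconf.rx_thresh.wthresh = 4;

    return rte_eth_rx_queue_setup(port_id, queue_id, 512 /* nb_desc */,
                                  rte_eth_dev_socket_id(port_id),
                                  &rxconf, mb_pool);
}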