BTW, the configuration looks fine, but you need to make sure the lcores are not split between two different CPU sockets. You can use dpdk/tools/cpu_layout.py to dump out the system configuration.
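As a rough illustration of the socket check, here is a minimal Python sketch that groups lcores by CPU socket from /proc/cpuinfo-style text. It is not the cpu_layout.py script itself. The function names and the sample layout (even CPUs on socket 0, odd CPUs on socket 1) are hypothetical, and it assumes each lcore id equals the Linux logical CPU id.

```python
from collections import defaultdict


def parse_cpuinfo(text):
    """Return {logical_cpu: socket_id} from /proc/cpuinfo-style text."""
    cpu_to_socket = {}
    current = None
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank separator line between processor entries
        key, value = key.strip(), value.strip()
        if key == "processor":
            current = int(value)
        elif key == "physical id" and current is not None:
            cpu_to_socket[current] = int(value)
    return cpu_to_socket


def lcores_by_socket(lcores, cpu_to_socket):
    """Group the given lcore ids by the socket they sit on."""
    groups = defaultdict(list)
    for lcore in lcores:
        groups[cpu_to_socket[lcore]].append(lcore)
    return dict(groups)


# Hypothetical 12-CPU dual-socket layout, NOT taken from the machine in this thread:
# even logical CPUs on socket 0, odd logical CPUs on socket 1.
SAMPLE = "\n".join(
    f"processor\t: {cpu}\nphysical id\t: {cpu % 2}\n" for cpu in range(12)
)

# lcores 1-8 are the RX/TX lcores from the -m option in this thread.
groups = lcores_by_socket(range(1, 9), parse_cpuinfo(SAMPLE))
print(groups)
```

On a real machine the text would come from open("/proc/cpuinfo").read(). With the hypothetical layout above, every RX/TX pair such as lcores 1 and 2 lands on two different sockets, which is the split to avoid.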
Keith Wiles, Principal Technologist for Networking, member of the CTO office, Wind River
mobile 940.213.5533

On Nov 19, 2013, at 10:42 AM, jinho hwang <hwang.jinho at gmail.com> wrote:

On Tue, Nov 19, 2013 at 11:31 AM, Wiles, Roger Keith <keith.wiles at windriver.com> wrote:

How do you have Pktgen configured in this case?

On my Westmere dual socket 3.4 GHz machine I can send 20G on a single NIC (82599 x two ports). My machine has a PCIe bug that does not allow me to send on more than 3 ports at wire rate. I get close to 40G with 64 byte packets, but the fourth port is at about 70% of wire rate because of the PCIe hardware bottleneck problem.

On Nov 19, 2013, at 10:09 AM, jinho hwang <hwang.jinho at gmail.com> wrote:

Hi All,

I have two NICs (82599) x two ports that are used as packet generators. I want to generate full line-rate packets (40Gbps), but Pktgen-DPDK does not seem to be able to do it when two ports on a NIC are used simultaneously. Does anyone know how to generate 40Gbps without replicating packets in the switch?

Thank you,
Jinho

Hi Keith,

Thank you for the e-mail. I am not sure how to figure out whether my PCIe also has a problem that prevents me from sending at full line rate. I use an Intel(R) Xeon(R) CPU E5649 @ 2.53GHz. It is hard for me to figure out where the bottleneck is.
My configuration is:

sudo ./app/build/pktgen -c 1ff -n 3 $BLACK-LIST -- -p 0xf0 -P -m "[1:2].0, [3:4].1, [5:6].2, [7:8].3" -f test/forward.lua

=== port to lcore mapping table (# lcores 9) ===
   lcore:     0     1     2     3     4     5     6     7     8
port   0:  D: T  1: 0  0: 1  0: 0  0: 0  0: 0  0: 0  0: 0  0: 0 =  1: 1
port   1:  D: T  0: 0  0: 0  1: 0  0: 1  0: 0  0: 0  0: 0  0: 0 =  1: 1
port   2:  D: T  0: 0  0: 0  0: 0  0: 0  1: 0  0: 1  0: 0  0: 0 =  1: 1
port   3:  D: T  0: 0  0: 0  0: 0  0: 0  0: 0  0: 0  1: 0  0: 1 =  1: 1
Total   :  0: 0  1: 0  0: 1  1: 0  0: 1  1: 0  0: 1  1: 0  0: 1
Display and Timer on lcore 0, rx:tx counts per port/lcore

Configuring 4 ports, MBUF Size 1984, MBUF Cache Size 128
Lcore:
    1, type RX , rx_cnt 1, tx_cnt 0 private (nil), RX (pid:qid): ( 0: 0) , TX (pid:qid):
    2, type TX , rx_cnt 0, tx_cnt 1 private (nil), RX (pid:qid): , TX (pid:qid): ( 0: 0)
    3, type RX , rx_cnt 1, tx_cnt 0 private (nil), RX (pid:qid): ( 1: 0) , TX (pid:qid):
    4, type TX , rx_cnt 0, tx_cnt 1 private (nil), RX (pid:qid): , TX (pid:qid): ( 1: 0)
    5, type RX , rx_cnt 1, tx_cnt 0 private (nil), RX (pid:qid): ( 2: 0) , TX (pid:qid):
    6, type TX , rx_cnt 0, tx_cnt 1 private (nil), RX (pid:qid): , TX (pid:qid): ( 2: 0)
    7, type RX , rx_cnt 1, tx_cnt 0 private (nil), RX (pid:qid): ( 3: 0) , TX (pid:qid):
    8, type TX , rx_cnt 0, tx_cnt 1 private (nil), RX (pid:qid): , TX (pid:qid): ( 3: 0)

Port :
    0, nb_lcores 2, private 0x6fd5a0, lcores: 1 2
    1, nb_lcores 2, private 0x700208, lcores: 3 4
    2, nb_lcores 2, private 0x702e70, lcores: 5 6
    3, nb_lcores 2, private 0x705ad8, lcores: 7 8

Initialize Port 0 -- TxQ 1, RxQ 1, Src MAC 90:e2:ba:2f:f2:a4
    Create: Default RX 0:0 - Memory used (MBUFs 1024 x (size 1984 + Hdr 64)) + 395392 = 2435 KB
    Create: Default TX 0:0 - Memory used (MBUFs 1024 x (size 1984 + Hdr 64)) + 395392 = 2435 KB
    Create: Range TX 0:0 - Memory used (MBUFs 1024 x (size 1984 + Hdr 64)) + 395392 = 2435 KB
    Create: Sequence TX 0:0 - Memory used (MBUFs 1024 x (size 1984 + Hdr 64)) + 395392 = 2435 KB
    Create: Special TX 0:0 - Memory used (MBUFs 64 x (size 1984 + Hdr 64)) + 395392 = 515 KB
    Port memory used = 10251 KB
Initialize Port 1 -- TxQ 1, RxQ 1, Src MAC 90:e2:ba:2f:f2:a5
    Create: Default RX 1:0 - Memory used (MBUFs 1024 x (size 1984 + Hdr 64)) + 395392 = 2435 KB
    Create: Default TX 1:0 - Memory used (MBUFs 1024 x (size 1984 + Hdr 64)) + 395392 = 2435 KB
    Create: Range TX 1:0 - Memory used (MBUFs 1024 x (size 1984 + Hdr 64)) + 395392 = 2435 KB
    Create: Sequence TX 1:0 - Memory used (MBUFs 1024 x (size 1984 + Hdr 64)) + 395392 = 2435 KB
    Create: Special TX 1:0 - Memory used (MBUFs 64 x (size 1984 + Hdr 64)) + 395392 = 515 KB
    Port memory used = 10251 KB
Initialize Port 2 -- TxQ 1, RxQ 1, Src MAC 90:e2:ba:4a:e6:1c
    Create: Default RX 2:0 - Memory used (MBUFs 1024 x (size 1984 + Hdr 64)) + 395392 = 2435 KB
    Create: Default TX 2:0 - Memory used (MBUFs 1024 x (size 1984 + Hdr 64)) + 395392 = 2435 KB
    Create: Range TX 2:0 - Memory used (MBUFs 1024 x (size 1984 + Hdr 64)) + 395392 = 2435 KB
    Create: Sequence TX 2:0 - Memory used (MBUFs 1024 x (size 1984 + Hdr 64)) + 395392 = 2435 KB
    Create: Special TX 2:0 - Memory used (MBUFs 64 x (size 1984 + Hdr 64)) + 395392 = 515 KB
    Port memory used = 10251 KB
Initialize Port 3 -- TxQ 1, RxQ 1, Src MAC 90:e2:ba:4a:e6:1d
    Create: Default RX 3:0 - Memory used (MBUFs 1024 x (size 1984 + Hdr 64)) + 395392 = 2435 KB
    Create: Default TX 3:0 - Memory used (MBUFs 1024 x (size 1984 + Hdr 64)) + 395392 = 2435 KB
    Create: Range TX 3:0 - Memory used (MBUFs 1024 x (size 1984 + Hdr 64)) + 395392 = 2435 KB
    Create: Sequence TX 3:0 - Memory used (MBUFs 1024 x (size 1984 + Hdr 64)) + 395392 = 2435 KB
    Create: Special TX 3:0 - Memory used (MBUFs 64 x (size 1984 + Hdr 64)) + 395392 = 515 KB
    Port memory used = 10251 KB
Total memory used = 41003 KB

Port 0: Link Up - speed 10000 Mbps - full-duplex <Enable promiscuous mode>
Port 1: Link Up - speed 10000 Mbps - full-duplex <Enable promiscuous mode>
Port 2: Link Up - speed 10000 Mbps - full-duplex <Enable promiscuous mode>
Port 3: Link Up - speed 10000 Mbps - full-duplex <Enable promiscuous mode>

=== Display processing on lcore 0
=== RX processing on lcore 1, rxcnt 1, port/qid, 0/0
=== TX processing on lcore 2, txcnt 1, port/qid, 0/0
=== RX processing on lcore 3, rxcnt 1, port/qid, 1/0
=== TX processing on lcore 4, txcnt 1, port/qid, 1/0
=== RX processing on lcore 5, rxcnt 1, port/qid, 2/0
=== TX processing on lcore 6, txcnt 1, port/qid, 2/0
=== RX processing on lcore 7, rxcnt 1, port/qid, 3/0
=== TX processing on lcore 8, txcnt 1, port/qid, 3/0

Please, advise me if you have time.

Thank you always for your help!

Jinho
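
For reference, the -c and -m options in the command above can be decoded with a short Python sketch. This is only an illustration, not Pktgen's own parser: the function names are hypothetical, and the regular expression handles only the simple "[rx:tx].port" form used in this thread, not any richer mapping syntax Pktgen may accept.

```python
import re


def lcores_from_coremask(mask_hex):
    """Expand a hex coremask such as '1ff' into the list of enabled lcore ids."""
    mask = int(mask_hex, 16)
    return [bit for bit in range(mask.bit_length()) if (mask >> bit) & 1]


def parse_mapping(spec):
    """Parse entries of the form '[rx:tx].port' into {port: {'rx': n, 'tx': n}}."""
    mapping = {}
    for rx, tx, port in re.findall(r"\[(\d+):(\d+)\]\.(\d+)", spec):
        mapping[int(port)] = {"rx": int(rx), "tx": int(tx)}
    return mapping


enabled = lcores_from_coremask("1ff")
mapping = parse_mapping("[1:2].0, [3:4].1, [5:6].2, [7:8].3")

# Every lcore named in -m should also be enabled by the coremask.
used = sorted(lcore for entry in mapping.values() for lcore in entry.values())
missing = [lcore for lcore in used if lcore not in enabled]

print("enabled lcores:", enabled)
for port, entry in sorted(mapping.items()):
    print(f"port {port}: RX on lcore {entry['rx']}, TX on lcore {entry['tx']}")
print("lcores missing from coremask:", missing)
```

The decoded result agrees with the log: nine lcores are enabled, lcore 0 is left for display and timers, and each port gets one RX and one TX lcore. The list of lcores 1 to 8 is what would need to be checked against the socket layout.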