On 6/16/16, 9:36 AM, "Take Ceara" <dumitru.ceara at gmail.com> wrote:
>Hi Keith,
>
>On Tue, Jun 14, 2016 at 3:47 PM, Wiles, Keith <keith.wiles at intel.com> wrote:
>>>> Normally the limitation is in the hardware, basically how the PCI bus is
>>>> connected to the CPUs (or sockets). How the PCI buses are connected to the
>>>> system depends on the Mother board design. I normally see the buses
>>>> attached to socket 0, but you could have some of the buses attached to the
>>>> other sockets or all on one socket via a PCI bridge device.
>>>>
>>>> No easy way around the problem if some of your PCI buses are split or all
>>>> on a single socket. Need to look at your system docs or look at lspci it
>>>> has an option to dump the PCI bus as an ASCII tree, at least on Ubuntu.
>>>
>>>This is the motherboard we use on our system:
>>>
>>>http://www.supermicro.com/products/motherboard/Xeon/C600/X10DRX.cfm
>>>
>>>I need to swap some NICs around (as now we moved everything on socket
>>>1) before I can share the lspci output.
>>
>> FYI: the option for lspci is "lspci -tv", but maybe more options too.
>>
>
>I retested with two 10G X710 ports connected back to back:
>port 0: 0000:01:00.3 - socket 0
>port 1: 0000:81:00.3 - socket 1

Please provide the output from tools/cpu_layout.py.

>
>I ran the following scenarios:
>- assign 16 threads from CPU 0 on socket 0 to port 0 and 16 threads
>from CPU 1 to port 1 => setup rate of 1.6M sess/s
>- assign only the 16 threads from CPU0 for both ports (so 8 threads on
>socket 0 for port 0 and 8 threads on socket 0 for port 1) => setup
>rate of 3M sess/s
>- assign only the 16 threads from CPU1 for both ports (so 8 threads on
>socket 1 for port 0 and 8 threads on socket 1 for port 1) => setup
>rate of 3M sess/s
>
>I also tried a scenario with two machines connected back to back each
>of which had a NIC on socket 1. I assigned 16 threads from socket 1 on
>each machine to the port and performance scaled to 6M sess/s as
>expected.
>
>I double checked all our memory allocations and, at least in the
>tested scenario, we never use memory that's not on the same socket as
>the core.
>
>I pasted below the output of lspci -tv. I see that 0000:01:00.3 and
>0000:81:00.3 are connected to different PCI bridges but on each of
>those bridges there are also "Intel Corporation Xeon E7 v3/Xeon E5
>v3/Core i7 DMA Channel <X>" devices.
>
>It would be great if you could also take a look in case I
>missed/misunderstood something.
>
>Thanks,
>Dumitru
>
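
As a cross-check on the port-to-socket mapping above, independent of reading the lspci tree, the kernel's own view of each device's NUMA node can be read from sysfs. The following is only a minimal sketch, assuming a Linux system that exposes /sys/bus/pci/devices/<address>/numa_node. The helper name pci_numa_node and its sysfs_root parameter are made up for illustration and are not part of DPDK or any existing tool.

```python
from pathlib import Path


def pci_numa_node(pci_addr, sysfs_root="/sys"):
    """Return the NUMA node the kernel reports for a PCI device.

    Hypothetical helper, for illustration only. Returns None when the
    device or its numa_node attribute cannot be read. A value of -1
    means the kernel has no NUMA information for that device.
    """
    node_file = (
        Path(sysfs_root) / "bus" / "pci" / "devices" / pci_addr / "numa_node"
    )
    try:
        return int(node_file.read_text().strip())
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    # The two X710 ports mentioned in the thread.
    for addr in ("0000:01:00.3", "0000:81:00.3"):
        print(addr, "-> NUMA node", pci_numa_node(addr))
```

If the values reported here do not match the sockets assumed when assigning threads to ports, that alone could account for part of the cross-socket penalty.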