Re: [OMPI users] Question on handling of memory for communications

2013-07-08 Thread Michael Thomadakis
| Remember that the point of IB and other operating-system bypass devices is that the driver is not involved in the fast path of sending / receiving. One of the side-effects of that design point is that userspace does all the allocation of send / receive buffers. That's a good point. It was not
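A minimal illustration of that point, as a sketch only (plain libibverbs, not Open MPI's internal code): the application or MPI library mallocs the buffer in user space and registers it with ibv_reg_mr(), which pins the pages and hands them to the HCA; after that the kernel driver is out of the send/receive fast path, so wherever those pages were allocated is where the DMA happens.

    /* Sketch, assuming libibverbs is installed (compile with -libverbs).
     * The buffer comes from plain malloc(), i.e. user space decides its
     * NUMA placement; registration only pins it for the HCA. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <infiniband/verbs.h>

    int main(void)
    {
        int num = 0;
        struct ibv_device **devs = ibv_get_device_list(&num);
        if (!devs || num == 0) { fprintf(stderr, "no IB devices\n"); return 1; }

        struct ibv_context *ctx = ibv_open_device(devs[0]);
        if (!ctx) { fprintf(stderr, "cannot open device\n"); return 1; }
        struct ibv_pd *pd = ibv_alloc_pd(ctx);
        if (!pd) { fprintf(stderr, "cannot allocate PD\n"); return 1; }

        size_t len = 1 << 20;
        void *buf = malloc(len);          /* user-space allocation */

        /* Pin the pages and expose them to the HCA for DMA. */
        struct ibv_mr *mr = ibv_reg_mr(pd, buf, len,
                                       IBV_ACCESS_LOCAL_WRITE |
                                       IBV_ACCESS_REMOTE_READ);
        printf("registered %zu bytes, lkey=0x%x\n", len, mr ? mr->lkey : 0);

        if (mr) ibv_dereg_mr(mr);
        free(buf);
        ibv_dealloc_pd(pd);
        ibv_close_device(ctx);
        ibv_free_device_list(devs);
        return 0;
    }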

Re: [OMPI users] Question on handling of memory for communications

2013-07-08 Thread Michael Thomadakis
| The driver doesn't allocate much memory here. Maybe some small control buffers, but nothing significantly involved in large message transfer performance. Everything critical here is allocated by user-space (either MPI lib or application), so we just have to make sure we bind the process memory

Re: [OMPI users] Question on handling of memory for communications

2013-07-08 Thread Jeff Squyres (jsquyres)
On Jul 8, 2013, at 2:01 PM, Brice Goglin wrote: > The driver doesn't allocate much memory here. Maybe some small control buffers, but nothing significantly involved in large message transfer performance. Everything critical here is allocated by user-space (either MPI lib or application),

Re: [OMPI users] Question on handling of memory for communications

2013-07-08 Thread Jeff Squyres (jsquyres)
Cisco hasn't been involved in IB for several years, so I can't comment on that directly. That being said, our Cisco VIC devices are PCIe gen *2*, but they are x16 (not x8). We can get full bandwidth out of our 2*10Gb device from remote NUMA nodes on E5-2690-based machines (Sandy Bridge) for large

Re: [OMPI users] Question on handling of memory for communications

2013-07-08 Thread Brice Goglin
The driver doesn't allocate much memory here. Maybe some small control buffers, but nothing significantly involved in large message transfer performance. Everything critical here is allocated by user-space (either MPI lib or application), so we just have to make sure we bind the process memory properly

Re: [OMPI users] Question on handling of memory for communications

2013-07-08 Thread Michael Thomadakis
Hi Brice, thanks for testing this out. How did you make sure that the pinned pages used by the I/O adapter mapped to the "other" socket's memory controller? Is pinning the MPI binary to a socket sufficient to pin the space used for MPI I/O as well to that socket? I think this is something done by
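(A side note, as a sketch only and not something Open MPI does for you: if one wants to be certain which socket the pinned pages come from, rather than relying on CPU binding plus first-touch, the buffer can be allocated on an explicit NUMA node with libnuma before it is registered/handed to MPI. The node number 1 below is just an example.)

    /* Sketch assuming libnuma (compile with -lnuma): place a buffer's
     * pages on a chosen NUMA node regardless of where the thread runs. */
    #include <stdio.h>
    #include <string.h>
    #include <numa.h>

    int main(void)
    {
        if (numa_available() < 0) {
            fprintf(stderr, "libnuma not available\n");
            return 1;
        }

        size_t len = 1 << 20;
        int node = 1;                    /* e.g. the "other" socket */

        void *buf = numa_alloc_onnode(len, node);
        if (!buf) { fprintf(stderr, "allocation failed\n"); return 1; }

        memset(buf, 0, len);             /* touch the pages */
        printf("%zu-byte buffer placed on NUMA node %d\n", len, node);

        numa_free(buf, len);
        return 0;
    }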

Re: [OMPI users] Question on handling of memory for communications

2013-07-08 Thread Brice Goglin
On a dual E5 2650 machine with FDR cards, I see the IMB Pingpong throughput drop from 6000 to 5700 MB/s when the memory isn't allocated on the right socket (and latency increases from 0.8 to 1.4 us). Of course that's pingpong only, things will be worse on a memory-overloaded machine. But I don't expect

Re: [OMPI users] Question on handling of memory for communications

2013-07-08 Thread Michael Thomadakis
People have mentioned that they experience unexpected slowdowns in PCIe gen3 I/O when the pages map to a socket different from the one the HCA connects to. It is speculated that the inter-socket QPI is not provisioned to transfer more than 1 GiB/sec of PCIe gen3 traffic. This situation may not be

Re: [OMPI users] Question on handling of memory for communications

2013-07-08 Thread Jeff Squyres (jsquyres)
On Jul 8, 2013, at 11:35 AM, Michael Thomadakis wrote: > The issue is that when you read or write PCIe gen3 data to non-local NUMA memory, Sandy Bridge will use the inter-socket QPI links to get this data across to the other socket. I think there is considerable limitation in PCIe I/O traffic

Re: [OMPI users] Question on handling of memory for communications

2013-07-08 Thread Michael Thomadakis
Hi Jeff, thanks for the reply. The issue is that when you read or write PCIe gen3 data to non-local NUMA memory, Sandy Bridge will use the inter-socket QPI links to get this data across to the other socket. I think there is considerable limitation in PCIe I/O traffic data going over the inter-socket

Re: [OMPI users] Question on handling of memory for communications

2013-07-08 Thread Jeff Squyres (jsquyres)
On Jul 6, 2013, at 4:59 PM, Michael Thomadakis wrote: > When your stack runs on Sandy Bridge nodes attached to HCAs over PCIe gen 3, do you pay any special attention to the memory buffers according to which socket/memory controller their physical memory belongs to? For instance, if the HCA

[OMPI users] Question on handling of memory for communications

2013-07-06 Thread Michael Thomadakis
Hello OpenMPI, When your stack runs on Sandy Bridge nodes attached to HCAs over PCIe *gen 3*, do you pay any special attention to the memory buffers according to which socket/memory controller their physical memory belongs to? For instance, if the HCA is attached to the PCIe gen3 lanes of Socket 1, do you
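(For what it's worth, a quick way to check which socket/NUMA node an HCA's PCIe lanes hang off of is to read the device's numa_node attribute in sysfs. A sketch follows; the device name mlx4_0 is only an example and should be replaced by the actual HCA name on the system.)

    #include <stdio.h>

    int main(void)
    {
        /* "mlx4_0" is a placeholder; substitute the real HCA name. */
        const char *path = "/sys/class/infiniband/mlx4_0/device/numa_node";
        FILE *f = fopen(path, "r");
        if (!f) { perror(path); return 1; }

        int node = -1;
        if (fscanf(f, "%d", &node) == 1)
            printf("HCA is attached to NUMA node %d\n", node);
        fclose(f);
        return 0;
    }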