With respect to the CUDA-aware support, Ralph is correct. The ability to send
and receive GPU buffers is in the Open MPI 1.7 series. And incremental
improvements will be added to the Open MPI 1.7 series. CUDA 5.0 is supported.
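In practice, CUDA-aware support means a device pointer obtained from cudaMalloc() can be handed directly to MPI_Send/MPI_Recv, with no explicit staging through host memory. A minimal sketch, assuming an Open MPI 1.7 build configured with CUDA support and at least two ranks (buffer size and peer ranks below are arbitrary):

/* Minimal sketch (assumptions: Open MPI 1.7 built with CUDA support,
 * two or more ranks): a cudaMalloc'ed device pointer is passed straight
 * to MPI_Send/MPI_Recv, with no cudaMemcpy staging through host memory. */
#include <mpi.h>
#include <cuda_runtime.h>

int main(int argc, char **argv)
{
    int rank, n = 1 << 20;          /* 1M doubles, arbitrary size */
    double *d_buf;                  /* device buffer */

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    cudaMalloc((void **)&d_buf, n * sizeof(double));

    if (rank == 0)
        MPI_Send(d_buf, n, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
    else if (rank == 1)
        MPI_Recv(d_buf, n, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);

    cudaFree(d_buf);
    MPI_Finalize();
    return 0;
}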
Open MPI may get confused if you end up having different receive queue
specifications in your IB setup (in the "openib" Byte Transfer Layer (BTL)
plugin that is used for point-to-point MPI communication transport in OMPI).
If Open MPI doesn't work out of the box for you in a job that util
+1.
This one seems like it could be as simple as a missing header file. Try adding
#include "opal/constants.h"
in the timer_aix_component.c file.
On Jul 6, 2013, at 1:08 PM, Ralph Castain wrote:
> We haven't had access to an AIX machine in quite some time, so it isn't a big
> surprise th
On Jul 6, 2013, at 2:33 PM, Patrick Brückner wrote:
> data p;
> p.collection = malloc(sizeof(int)*N);
>
> printf("[%d] before receiving, data id %d at %d with direction
> %d\n",me,p.id,p.position,p.direction);
>
> MPI_Status data_status;
> MPI_Recv(&p,1,MPI_data,MPI_ANY_SOURCE,99,MPI_COMM_WORLD,&data_status);
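The definitions of the data struct and of the MPI_data datatype are not shown in this excerpt. A sketch of how such a derived type is typically built, assuming hypothetical fields; note that a pointer member like collection is not covered by the datatype, so the malloc'ed buffer it points to has to be transferred separately:

#include <stddef.h>   /* offsetof */
#include <mpi.h>

typedef struct {
    int  id;
    int  position;
    int  direction;
    int *collection;   /* heap buffer; NOT described by the datatype below */
} data;

/* Describe only the three leading ints, then resize to the extent of the
 * whole struct so that &p can be passed with count 1, as in the code above. */
static MPI_Datatype make_mpi_data_type(void)
{
    MPI_Datatype tmp, MPI_data;
    int          blocklen = 3;
    MPI_Aint     displ    = offsetof(data, id);   /* 0 */
    MPI_Datatype type     = MPI_INT;

    MPI_Type_create_struct(1, &blocklen, &displ, &type, &tmp);
    MPI_Type_create_resized(tmp, 0, (MPI_Aint)sizeof(data), &MPI_data);
    MPI_Type_commit(&MPI_data);
    MPI_Type_free(&tmp);
    return MPI_data;
}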
On Jul 6, 2013, at 4:59 PM, Michael Thomadakis wrote:
> When your stack runs on Sandy Bridge nodes attached to HCAs over PCIe gen 3, do
> you pay any special attention to the memory buffers according to which
> socket/memory controller their physical memory belongs to?
>
> For instance, if the HCA
Hi Jeff,
thanks for the reply.
The issue is that when you read or write PCIe gen 3 data to non-local NUMA
memory, Sandy Bridge will use the inter-socket QPI link to get this data across
to the other socket. I think there is considerable limitation in PCIe I/O
traffic data going over the inter-socket
Thanks ...
Michael
On Mon, Jul 8, 2013 at 8:50 AM, Rolf vandeVaart wrote:
> With respect to the CUDA-aware support, Ralph is correct. The ability to
> send and receive GPU buffers is in the Open MPI 1.7 series. And
> incremental improvements will be added to the Open MPI 1.7 series. CUDA
> 5.
On Jul 8, 2013, at 11:35 AM, Michael Thomadakis wrote:
> The issue is that when you read or write PCIe gen 3 data to non-local NUMA
> memory, Sandy Bridge will use the inter-socket QPI link to get this data across to
> the other socket. I think there is considerable limitation in PCIe I/O
> traff
People have mentioned that they experience unexpected slowdowns in
PCIe gen 3 I/O when the pages map to a socket different from the one the HCA
connects to. It is speculated that the inter-socket QPI is not provisioned
to transfer more than 1 GiB/sec of PCIe gen 3 traffic. This situation may
not be
On a dual E5-2650 machine with FDR cards, I see the IMB Pingpong
throughput drop from 6000 to 5700 MB/s when the memory isn't allocated on
the right socket (and latency increases from 0.8 to 1.4 us). Of course
that's pingpong only, things will be worse on a memory-overloaded
machine. But I don't expe
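One way to reproduce that local-vs-remote comparison by hand is to place the benchmark buffer explicitly. A sketch, assuming libnuma is available (link with -lnuma); the node number is an assumption and should be the node the HCA hangs off of:

/* Sketch, assuming libnuma (link with -lnuma): place the message buffer
 * explicitly on a chosen NUMA node so that "HCA-local" and "remote socket"
 * placement can be compared in a pingpong-style run.  Node 0 is an
 * assumption; pick the node the HCA is attached to. */
#include <numa.h>
#include <stdio.h>

int main(void)
{
    size_t len  = 4UL << 20;    /* 4 MiB message buffer */
    int    node = 0;            /* assumed HCA-local node */
    char  *buf;

    if (numa_available() < 0) {
        fprintf(stderr, "libnuma not available\n");
        return 1;
    }
    buf = numa_alloc_onnode(len, node);   /* pages bound to 'node' */
    if (buf == NULL)
        return 1;

    /* ... use 'buf' as the send/recv buffer of the benchmark ... */

    numa_free(buf, len);
    return 0;
}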
Do you guys have any plans to support the Intel Phi in the future? That is, running
MPI code on the Phi cards or across the multicore host and the Phi, as Intel MPI does?
[Tom]
Hi Michael,
Because a Xeon Phi card acts a lot like a Linux host with an x86 architecture,
you can build your own Open MPI libraries t
Thanks Tom, that sounds good. I will give it a try as soon as our Phi host
here gets installed.
I assume that all the prerequisite libs and bins on the Phi side are
available when we download the Phi s/w stack from Intel's site, right ?
Cheers
Michael
On Mon, Jul 8, 2013 at 12:10 PM, Elken, Tom wrote:
Hi,
today I installed openmpi-1.9a1r28730 and tried to test MPI_Iexscan()
on my machine (Solaris 10 sparc, Sun C 5.12). Unfortunately my program
breaks.
tyr xxx 105 mpicc iexscan.c
tyr xxx 106 mpiexec -np 2 iexscan
[tyr:21094] *** An error occurred in MPI_Iexscan
[tyr:21094] *** reported by proc
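The poster's iexscan.c is not included in this excerpt. A minimal sketch of such a test, for reference (each rank contributes its rank number; the receive buffer on rank 0 is undefined for an exclusive scan):

/* Minimal MPI_Iexscan test (the poster's iexscan.c is not shown above).
 * Each rank contributes its rank number; after the wait, rank r > 0 holds
 * 0 + 1 + ... + (r-1).  The receive buffer on rank 0 is undefined for an
 * exclusive scan. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, sendval, recvval = 0;
    MPI_Request req;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    sendval = rank;

    MPI_Iexscan(&sendval, &recvval, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD, &req);
    MPI_Wait(&req, MPI_STATUS_IGNORE);

    if (rank > 0)
        printf("rank %d: exclusive prefix sum = %d\n", rank, recvval);

    MPI_Finalize();
    return 0;
}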
Hi Brice,
thanks for testing this out.
How did you make sure that the pinned pages used by the I/O adapter mapped
to the "other" socket's memory controller? Is pinning the MPI binary to a
socket sufficient to pin the space used for MPI I/O as well to that socket?
I think this is something done by
The driver doesn't allocate much memory here. Maybe some small control
buffers, but nothing significantly involved in large message transfer
performance. Everything critical here is allocated by user-space (either
MPI lib or application), so we just have to make sure we bind the
process memory prop
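A sketch of that binding step, assuming the hwloc 1.x API of the time; the socket index is an assumption and would in practice be the socket the HCA is attached to. Open MPI's own mpirun binding options achieve the same effect for the process, but doing it by hand shows what is actually being bound:

/* Sketch, assuming the hwloc 1.x API of the time: bind the calling process
 * and its future memory allocations to one socket.  Socket index 0 is an
 * assumption; in practice you would pick the socket the HCA is attached to. */
#include <hwloc.h>

static int bind_to_socket(unsigned socket_index)
{
    hwloc_topology_t topo;
    hwloc_obj_t      socket;

    hwloc_topology_init(&topo);
    hwloc_topology_load(topo);

    socket = hwloc_get_obj_by_type(topo, HWLOC_OBJ_SOCKET, socket_index);
    if (socket == NULL) {
        hwloc_topology_destroy(topo);
        return -1;
    }

    /* Bind CPU execution and the default memory-allocation policy. */
    hwloc_set_cpubind(topo, socket->cpuset, HWLOC_CPUBIND_PROCESS);
    hwloc_set_membind(topo, socket->cpuset, HWLOC_MEMBIND_BIND,
                      HWLOC_MEMBIND_PROCESS);

    hwloc_topology_destroy(topo);
    return 0;
}

Buffers allocated after this call land on that socket's memory; pages that were already touched stay where they were placed unless explicitly migrated.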
Cisco hasn't been involved in IB for several years, so I can't comment on that
directly.
That being said, our Cisco VIC devices are PCI gen *2*, but they are x16 (not
x8). We can get full bandwidth out of our 2*10Gb device from remote NUMA nodes
on E5-2690-based machines (Sandy Bridge) for lar
On Jul 8, 2013, at 2:01 PM, Brice Goglin wrote:
> The driver doesn't allocate much memory here. Maybe some small control
> buffers, but nothing significantly involved in large message transfer
> performance. Everything critical here is allocated by user-space (either MPI
> lib or application),
Thanks Tom, that sounds good. I will give it a try as soon as our Phi host here
gets installed.
I assume that all the prerequisite libs and bins on the Phi side are available
when we download the Phi s/w stack from Intel's site, right ?
[Tom]
Right. When you install Intel's MPSS (Manycore
Thanks Tom, I will test it out...
regards
Michael
On Mon, Jul 8, 2013 at 1:16 PM, Elken, Tom wrote:
> Thanks Tom, that sounds good. I will give it a try as soon as our Phi host
> here gets installed.
>
> I assume that all the prerequisite libs and bins on the Phi
| The driver doesn't allocate much memory here. Maybe some small control
| buffers, but nothing significantly involved in large message transfer
| performance. Everything critical here is allocated by user-space (either
| MPI lib or application), so we just have to make sure we bind the
| process memor
| Remember that the point of IB and other operating-system bypass devices
| is that the driver is not involved in the fast path of sending /
| receiving. One of the side-effects of that design point is that
| userspace does all the allocation of send / receive buffers.
That's a good point. It was not
On Mon, 8 Jul 2013, Elken, Tom wrote:
It isn't quite so easy.
Out of the box, there is no gcc on the Phi card. You can use the cross
compiler on the host, but you don't get gcc on the Phi by default.
See this post http://software.intel.com/en-us/forums/topic/382057
I really think you would n
Hi Tim,
Well, in general, and not only on MIC, I usually build the MPI stacks using the
Intel compiler set. Have you run into s/w that requires GCC instead of the
Intel compilers (besides Nvidia CUDA)? Did you try to use the Intel compiler to
produce MIC-native code (the Open MPI stack, for that matter)?
regards
Hi Tim,
Well, in general, and not only on MIC, I usually build the MPI stacks using the Intel
compiler set. Have you run into s/w that requires GCC instead of the Intel
compilers (besides Nvidia CUDA)? Did you try to use the Intel compiler to produce
MIC-native code (the Open MPI stack, for that matter)?
[Tom]
On Mon, 8 Jul 2013, Elken, Tom wrote:
My mistake on the OFED bits. The host I was installing on did not have all
of the MPSS software installed (my cluster admin node and not one of the
compute nodes). Adding the intel-mic-ofed-card RPM fixed the problem with
compiling the btl:openib bits with