manpage and on the OpenMPI website, see
http://www.open-mpi.org/faq/?category=tuning#paffinity-defs
I hope this answers your original question.
Jens
> Thank you
>
>
> 2013/1/29 Jens Glaser
> Hi Pradeep,
>
> On Jan 28, 2013, at 11:16 PM, Pradeep Jha wrote:
Hi Pradeep,
On Jan 28, 2013, at 11:16 PM, Pradeep Jha wrote:
> I have a very basic question about MPI.
>
> I have a computer with 8 processors (each with 8 cores). What is the
> difference between running a program simply as "./program" and as "mpirun -np 8
> /path/to/program"? In the first c
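A minimal sketch of the difference being asked about (a hypothetical program, not Pradeep's own): run as "./rank_hello" it is a single process with a single rank, while "mpirun -np 8 ./rank_hello" launches eight copies of it, each reporting a different MPI rank.

    /* rank_hello.c -- minimal sketch, not Pradeep's program.
     * "./rank_hello"             : one process, one rank
     * "mpirun -np 8 ./rank_hello": eight processes, ranks 0..7 */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank, size;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        printf("rank %d of %d\n", rank, size);
        MPI_Finalize();
        return 0;
    }

Built with mpicc, the first invocation prints one line, the second prints eight.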
Hi Justin,
from looking at your code it seems you are receiving more bytes from the
processors than you send (I assume MAX_RECV_SIZE_PER_PE > send_sizes[p]).
I don't think this is valid. Your transfers should have matched sizes on the
sending and receiving sides. To achieve this, either communicat
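A sketch of one way to get matched sizes, assuming each rank sends send_sizes[p] bytes to every peer p (the buffer names and the exchange() helper are hypothetical, not Justin's code): first exchange the per-peer byte counts, then post receives with exactly those counts.

    /* Sketch only, not Justin's code: exchange the byte counts first,
     * then post receives whose sizes match the sends exactly. */
    #include <mpi.h>
    #include <stdlib.h>

    void exchange(char **sendbuf, char **recvbuf, int *send_sizes,
                  int nproc, MPI_Comm comm)
    {
        int *recv_sizes = malloc(nproc * sizeof(int));
        MPI_Request *reqs = malloc(2 * nproc * sizeof(MPI_Request));

        /* every rank learns how many bytes each peer will send to it */
        MPI_Alltoall(send_sizes, 1, MPI_INT, recv_sizes, 1, MPI_INT, comm);

        /* receives are posted with exactly the announced sizes ... */
        for (int p = 0; p < nproc; p++) {
            recvbuf[p] = malloc(recv_sizes[p]);
            MPI_Irecv(recvbuf[p], recv_sizes[p], MPI_BYTE, p, 0, comm, &reqs[p]);
        }
        /* ... so the matching sends have the same counts on both ends */
        for (int p = 0; p < nproc; p++)
            MPI_Isend(sendbuf[p], send_sizes[p], MPI_BYTE, p, 0, comm,
                      &reqs[nproc + p]);

        MPI_Waitall(2 * nproc, reqs, MPI_STATUSES_IGNORE);
        free(reqs);
        free(recv_sizes);
    }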
cudaHostAlloc/cudaFreeHost() (I assume OpenMPI 1.7 will have some level of CUDA
support), because
otherwise applications using GPUDirect are not guaranteed to work correctly
with them, that is, they will exhibit undefined behavior.
Jens
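As a minimal sketch of what this fragment is about (hypothetical helper names, not code from the thread): the host buffers in question are page-locked with cudaHostAlloc() and must be released with cudaFreeHost(), never with free().

    /* Minimal sketch, hypothetical helpers: pinned host staging buffers
     * come from cudaHostAlloc() and go back through cudaFreeHost(). */
    #include <cuda_runtime.h>
    #include <stddef.h>

    char *alloc_staging(size_t nbytes)
    {
        void *buf = NULL;
        /* page-locked (pinned) host memory, default flags */
        if (cudaHostAlloc(&buf, nbytes, cudaHostAllocDefault) != cudaSuccess)
            return NULL;
        return (char *)buf;
    }

    void free_staging(char *buf)
    {
        cudaFreeHost(buf);   /* counterpart of cudaHostAlloc(), not free() */
    }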
On Nov 3, 2012, at 10:41 PM, Jens Glaser wrote:
Hi,
I am working on a CUDA/MPI application. It uses page-locked host buffers
allocated with cudaHostAlloc(..., cudaHostAllocDefault), to which data from the
device is copied before calling MPI.
The application, a particle simulation, reproducibly crashed or produced
undefined behavior at large p
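A hedged sketch of the staging pattern described above (not the particle simulation itself; the function and buffer names are made up): data is copied from the device into a page-locked host buffer, and that buffer is what gets handed to MPI.

    /* Sketch of the pattern described above, not the actual application:
     * stage device data in a page-locked host buffer, then call MPI on it. */
    #include <cuda_runtime.h>
    #include <mpi.h>

    void send_block(const float *d_data, size_t n, int dest, MPI_Comm comm)
    {
        float *h_stage = NULL;
        cudaHostAlloc((void **)&h_stage, n * sizeof(float), cudaHostAllocDefault);

        /* the device-to-host copy must complete before MPI touches the buffer */
        cudaMemcpy(h_stage, d_data, n * sizeof(float), cudaMemcpyDeviceToHost);

        MPI_Send(h_stage, (int)n, MPI_FLOAT, dest, 0, comm);

        cudaFreeHost(h_stage);
    }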
Hi,
we have tried to run a multithreaded application with a more recent trunk version
(March 21) of OpenMPI. We need this version because of its CUDA RDMA
support. OpenMPI was binding all the threads to a single core, which is
undesirable.
In OpenMPI 1.5 there was an option --cpus-per-rank, w
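For reference, a hedged sketch of the kind of invocation being asked about; the exact option names changed around the 1.7 series, so the mpirun man page of the build in use is authoritative:

    # sketch only; option names vary between OpenMPI versions
    mpirun -np 4 --report-bindings --bind-to none ./app      # turn binding off entirely
    mpirun -np 4 --report-bindings --cpus-per-proc 4 ./app   # give each rank several cores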
Hello,
I am using the latest trunk version of OMPI in order to take advantage of the
new CUDA RDMA features (smcuda BTL). RDMA support is superb; however, I have to
pass a parameter manually,
mpirun --mca pml ob1 ...
to have the OB1 upper layer selected and, consequently, to get smcuda
activated
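A sketch of the standard OpenMPI ways to preset such an MCA parameter so it does not have to appear on every command line (paths and program name shown here are illustrative):

    # per-user MCA parameter file
    echo "pml = ob1" >> $HOME/.openmpi/mca-params.conf
    # or per shell session, via the environment
    export OMPI_MCA_pml=ob1
    mpirun -np 2 ./program    # OB1 is now requested without --mca on the command line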