Let me try this out and see what happens for me.  But yes, please go ahead and 
send me the complete backtrace.
Rolf

From: users [mailto:users-boun...@open-mpi.org] On Behalf Of KESTENER Pierre
Sent: Wednesday, October 30, 2013 11:34 AM
To: us...@open-mpi.org
Cc: KESTENER Pierre
Subject: [OMPI users] OpenMPI-1.7.3 - cuda support

Hello,

I'm having problems running a simple CUDA-aware MPI application, the one found at
https://github.com/parallel-forall/code-samples/tree/master/posts/cuda-aware-mpi-example

I have changed the symbol ENV_LOCAL_RANK to OMPI_COMM_WORLD_LOCAL_RANK.
My cluster has 2 K20m GPUs per node, with a QLogic IB stack.
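For reference, the local-rank-to-GPU mapping in question looks roughly like the sketch below, assuming the usual getenv/cudaSetDevice pattern; the helper name is illustrative and may not match the sample exactly:

    #include <stdlib.h>
    #include <cuda_runtime.h>

    /* Bind each MPI rank on a node to its own GPU using Open MPI's
     * per-node local-rank environment variable (typically called
     * before MPI_Init). */
    static void select_gpu_by_local_rank(void)
    {
        const char *lr = getenv("OMPI_COMM_WORLD_LOCAL_RANK");
        int local_rank = lr ? atoi(lr) : 0;

        int device_count = 0;
        cudaGetDeviceCount(&device_count);

        /* With 2 K20m GPUs per node, local ranks 0 and 1 map to devices 0 and 1. */
        cudaSetDevice(local_rank % device_count);
    }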

The normal CUDA/MPI application works fine, but the CUDA-aware MPI app crashes when using 2 MPI processes over the 2 GPUs of the same node.
The error message is:
    Assertion failure at ptl.c:200: nbytes == msglen
I can send the complete backtrace from cuda-gdb if needed.
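
For context, the failing pattern boils down to handing device pointers straight to MPI, roughly as in this sketch (buffer size, tag, and variable names are illustrative, not taken from the sample):

    #include <mpi.h>
    #include <cuda_runtime.h>

    /* Minimal CUDA-aware exchange: device pointers go straight to MPI.
     * Run with 2 ranks on the same node (mpirun -np 2 ...). */
    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);

        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        cudaSetDevice(rank % 2);              /* 2 K20m GPUs per node */

        const int n = 1 << 20;                /* illustrative message size */
        double *d_send, *d_recv;
        cudaMalloc((void **)&d_send, n * sizeof(double));
        cudaMalloc((void **)&d_recv, n * sizeof(double));
        cudaMemset(d_send, 0, n * sizeof(double));

        int peer = rank ^ 1;                  /* ranks 0 and 1 exchange */
        MPI_Sendrecv(d_send, n, MPI_DOUBLE, peer, 0,
                     d_recv, n, MPI_DOUBLE, peer, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);

        cudaFree(d_send);
        cudaFree(d_recv);
        MPI_Finalize();
        return 0;
    }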

The same app, when running on 2 GPUs on 2 different nodes, gives another error:
    jacobi_cuda_aware_mpi:28280 terminated with signal 11 at PC=2aae9d7c9f78 SP=7fffc06c21f8.
    Backtrace:
    /gpfslocal/pub/local/lib64/libinfinipath.so.4(+0x8f78)[0x2aae9d7c9f78]
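
Since the plain CUDA/MPI version works, one way to narrow this down might be to stage the same exchange through host buffers so that MPI only ever sees host pointers; a sketch (helper name and buffers are illustrative):

    #include <stdlib.h>
    #include <mpi.h>
    #include <cuda_runtime.h>

    /* Same exchange, but staged through host memory so MPI only ever
     * sees host pointers. If this path works while the direct
     * device-pointer path crashes, the transport's CUDA handling is
     * the likely suspect. */
    void exchange_staged(double *d_send, double *d_recv, int n, int peer)
    {
        size_t bytes = (size_t)n * sizeof(double);
        double *h_send = (double *)malloc(bytes);
        double *h_recv = (double *)malloc(bytes);

        cudaMemcpy(h_send, d_send, bytes, cudaMemcpyDeviceToHost);
        MPI_Sendrecv(h_send, n, MPI_DOUBLE, peer, 0,
                     h_recv, n, MPI_DOUBLE, peer, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        cudaMemcpy(d_recv, h_recv, bytes, cudaMemcpyHostToDevice);

        free(h_send);
        free(h_recv);
    }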


Can someone give me hints on where to look to track down this problem?
Thank you.

Pierre Kestener.



