Correct.
Could it be that Torque (pbs_mom) is not running on the head node and mpiexec attempts to contact it?
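One way to test that theory would be to bypass the Torque launcher on the login node and see whether ring_c then behaves, or to turn up the launcher's verbosity to see what mpiexec is trying to contact. A sketch, assuming the MCA component names of a --with-tm build:

    # run without the Torque (tm) launch/allocation components
    mpiexec -mca plm ^tm -mca ras ^tm -np 2 ./ring_c

    # or watch which launcher gets selected and what it does
    mpiexec -mca plm_base_verbose 10 -np 2 ./ring_c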
Maxime
On 2014-08-15 17:31, Joshua Ladd wrote:
But OMPI 1.8.x does run the ring_c program successfully on your
compute node, right? The error only happens on the front-end login
node if I understood you correctly.
Josh
On Fri, Aug 15, 2014 at 5:20 PM, Maxime Boissonneault
<maxime.boissonnea...@calculquebec.ca> wrote:
Here are the requested files.
In the archive, you will find the output of configure, make, and make install, as well as the config.log, the environment when running ring_c, and the output of ompi_info --all.
Just as a reminder, the ring_c example compiled and ran, but produced no output and exited with code 65.
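For completeness, this is roughly how the example was built and run (a sketch; ring_c.c comes from the Open MPI "examples" directory):

    cd examples
    mpicc ring_c.c -o ring_c
    mpiexec -np 4 ./ring_c
    echo $?    # prints 65 here, with no output from ring_c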
Thanks,
Maxime
On 2014-08-14 15:26, Joshua Ladd wrote:
One more thing, Maxime, can you please make sure you've covered everything here:
http://www.open-mpi.org/community/help/
Josh
On Thu, Aug 14, 2014 at 3:18 PM, Joshua Ladd
<jladd.m...@gmail.com> wrote:
And maybe include your LD_LIBRARY_PATH
Josh
On Thu, Aug 14, 2014 at 3:16 PM, Joshua Ladd
<jladd.m...@gmail.com> wrote:
Can you try to run the example code "ring_c" across nodes?
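For example, something like this (a sketch; the second hostname is made up, substitute two of your compute nodes), either with an explicit host list or from inside an interactive two-node Torque job so mpiexec picks the hosts up from the allocation:

    # explicit host list
    mpiexec -np 2 -host gpu-k20-13,gpu-k20-14 ./ring_c

    # or inside a 2-node interactive job
    qsub -I -l nodes=2:ppn=1
    mpiexec -np 2 ./ring_c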
Josh
On Thu, Aug 14, 2014 at 3:14 PM, Maxime Boissonneault
<maxime.boissonnea...@calculquebec.ca> wrote:
Yes,
Everything has been built with GCC 4.8.x, although x might have changed between the OpenMPI 1.8.1 build and the Gromacs build. For OpenMPI 1.8.2rc4, however, it was the exact same compiler for everything.
Maxime
On 2014-08-14 14:57, Joshua Ladd wrote:
Hmmm...weird. Seems like maybe a mismatch between
libraries. Did you build OMPI with the same compiler
as you did GROMACS/Charm++?
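A quick way to compare (a sketch) is to check the compiler recorded in the Open MPI build against the one used for GROMACS/Charm++:

    ompi_info | grep -i compiler    # compiler Open MPI was built with
    mpicc --version                 # compiler the wrapper invokes
    gcc --version                   # compiler used for GROMACS/Charm++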
I'm stealing this suggestion from an old Gromacs forum thread with essentially the same symptom:
"Did you compile Open MPI and Gromacs with the same
compiler (i.e. both gcc and the same version)? You
write you tried different OpenMPI versions and
different GCC versions but it is unclear whether
those match. Can you provide more detail how you
compiled (including all options you specified)? Have
you tested any other MPI program linked against
those Open MPI versions? Please make sure (e.g. with
ldd) that the MPI and pthread library you compiled
against is also used for execution. If you compiled
and run on different hosts, check whether the error
still occurs when executing on the build host."
http://redmine.gromacs.org/issues/1025
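In concrete terms, something like this on both the build host and a compute node (a sketch; the binary name is taken from the backtrace further down):

    ldd $(which mdrunmpi) | grep -E 'libmpi|libpthread'

and compare the resolved paths against the Open MPI install you intended to use.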
Josh
On Thu, Aug 14, 2014 at 2:40 PM, Maxime
Boissonneault <maxime.boissonnea...@calculquebec.ca> wrote:
I just tried Gromacs with two nodes. It crashes, but with a different error. I get:
[gpu-k20-13:142156] *** Process received signal ***
[gpu-k20-13:142156] Signal: Segmentation fault (11)
[gpu-k20-13:142156] Signal code: Address not mapped (1)
[gpu-k20-13:142156] Failing at address: 0x8
[gpu-k20-13:142156] [ 0] /lib64/libpthread.so.0(+0xf710)[0x2ac5d070c710]
[gpu-k20-13:142156] [ 1] /usr/lib64/nvidia/libcuda.so.1(+0x263acf)[0x2ac5ddfbcacf]
[gpu-k20-13:142156] [ 2] /usr/lib64/nvidia/libcuda.so.1(+0x229a83)[0x2ac5ddf82a83]
[gpu-k20-13:142156] [ 3] /usr/lib64/nvidia/libcuda.so.1(+0x15b2da)[0x2ac5ddeb42da]
[gpu-k20-13:142156] [ 4] /usr/lib64/nvidia/libcuda.so.1(cuInit+0x43)[0x2ac5ddea0933]
[gpu-k20-13:142156] [ 5] /software-gpu/cuda/6.0.37/lib64/libcudart.so.6.0(+0x15965)[0x2ac5d0930965]
[gpu-k20-13:142156] [ 6] /software-gpu/cuda/6.0.37/lib64/libcudart.so.6.0(+0x15a0a)[0x2ac5d0930a0a]
[gpu-k20-13:142156] [ 7] /software-gpu/cuda/6.0.37/lib64/libcudart.so.6.0(+0x15a3b)[0x2ac5d0930a3b]
[gpu-k20-13:142156] [ 8] /software-gpu/cuda/6.0.37/lib64/libcudart.so.6.0(cudaDriverGetVersion+0x4a)[0x2ac5d094602a]
[gpu-k20-13:142156] [ 9] /software-gpu/apps/gromacs/4.6.5_gcc/lib/libgmxmpi.so.8(gmx_print_version_info_gpu+0x55)[0x2ac5cf9a90b5]
[gpu-k20-13:142156] [10] /software-gpu/apps/gromacs/4.6.5_gcc/lib/libgmxmpi.so.8(gmx_log_open+0x17e)[0x2ac5cf54b9be]
[gpu-k20-13:142156] [11] mdrunmpi(cmain+0x1cdb)[0x43b4bb]
[gpu-k20-13:142156] [12] /lib64/libc.so.6(__libc_start_main+0xfd)[0x2ac5d1534d1d]
[gpu-k20-13:142156] [13] mdrunmpi[0x407be1]
[gpu-k20-13:142156] *** End of error message ***
--------------------------------------------------------------------------
mpiexec noticed that process rank 0 with PID 142156 on node gpu-k20-13 exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------
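Since the crash happens inside cuInit via gmx_print_version_info_gpu, it might also be worth checking which libcuda the GROMACS library resolves and whether it matches the driver installed on the compute node. A sketch, not a fix:

    ldd /software-gpu/apps/gromacs/4.6.5_gcc/lib/libgmxmpi.so.8 | grep -i cuda
    nvidia-smi    # reports the installed driver version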
We do not have MPI_THREAD_MULTIPLE enabled in our build, so Charm++ cannot be using this level of threading. The configure line for OpenMPI was:
./configure --prefix=$PREFIX \
    --with-threads --with-verbs=yes --enable-shared --enable-static \
    --with-io-romio-flags="--with-file-system=nfs+lustre" \
    --without-loadleveler --without-slurm --with-tm \
    --with-cuda=$(dirname $(dirname $(which nvcc)))
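(For reference, the thread support a given build advertises can be checked with something like:

    ompi_info | grep -i thread

which for this build should show MPI_THREAD_MULTIPLE disabled.)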
Maxime
On 2014-08-14 14:20, Joshua Ladd wrote:
What about between nodes? Since this is coming from the OpenIB BTL, it would be good to check this. Do you know what the MPI thread level is set to when used with the Charm++ runtime? Is it MPI_THREAD_MULTIPLE? The OpenIB BTL is not thread safe.
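One way to test whether the openib BTL is involved (a sketch) would be to exclude it and fall back to the TCP BTL for a run:

    mpiexec -mca btl ^openib -np 2 ./ring_c

If the problem disappears with openib excluded, that points at the BTL/threading interaction.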
Josh
On Thu, Aug 14, 2014 at 2:17 PM, Maxime
Boissonneault <maxime.boissonnea...@calculquebec.ca> wrote:
Hi,
I ran Gromacs successfully with OpenMPI 1.8.1 and CUDA 6.0.37 on a single node, with 8 ranks and multiple OpenMP threads.
Maxime
On 2014-08-14 14:15, Joshua Ladd wrote:
Hi, Maxime
Just curious, are you able to run a vanilla MPI program? Can you try one of the example programs in the "examples" subdirectory? Looks like a threading issue to me.
Thanks,
Josh
_______________________________________________
users mailing list
us...@open-mpi.org
Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
Link to this post:
http://www.open-mpi.org/community/lists/users/2014/08/25040.php
--
---------------------------------
Maxime Boissonneault
Computing Analyst - Calcul Québec, Université Laval
Ph.D. in Physics