if you are running master, i recommend you run
mpirun --mca mpi_add_procs_cutoff 1024 ...
in order to avoid the crash i just reported at
https://github.com/open-mpi/ompi/issues/1501
Cheers,
Gilles
On 3/28/2016 4:44 PM, Gilles Gouaillardet wrote:
first, does it hang when running on only one node ?
when the hang occurs, you can collect stack traces
(run pstack on mpitest)
to see where it hangs.
since you configured with --disable-dlopen, your btl has been built
into the Open MPI libraries instead of being loaded as a plugin.
that means some parts of it are executed, and it could be responsible
for the hang.
note that if you
mpirun --mca btl self,sm -np 2 ...
on two hosts, it will never work: self and sm only reach tasks on the
same host, so no btl is left for tasks on different hosts to communicate.
it seems something is wrong on master, i will look into it now
do the names smallMPI and bigMPI imply that one host is little endian
and the other is big endian ?
if yes, then you need to configure with --enable-heterogeneous
Cheers,
Gilles
On 3/28/2016 4:26 PM, dpchoudh . wrote:
Hello Gilles
Per your suggestion, installing libnl3-devel does fix the mpicc
issue, but there still seems to be another issue down the road: the
generated executable seems to hang. I have tried sm, tcp and openib
BTLs, all with the same result:
[durga@smallMPI ~]$ mpirun -np 2 -H smallMPI,bigMPI -mca btl self,sm ./mpitest <--- Hangs
The source code for the simple test is as follows:
#include <mpi.h>
#include <stdio.h>
#include <unistd.h>

int main(int argc, char** argv)
{
    int world_size, world_rank, name_len;
    char hostname[MPI_MAX_PROCESSOR_NAME];

    MPI_Init(&argc, &argv);
    /* query the job size, this task's rank and the host it runs on */
    MPI_Comm_size(MPI_COMM_WORLD, &world_size);
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
    MPI_Get_processor_name(hostname, &name_len);
    printf("Hello world from processor %s, rank %d out of %d processors\n",
           hostname, world_rank, world_size);
    MPI_Finalize();
    return 0;
}
What do I do now?
Thanks
Durga
We learn from history that we never learn from history.
On Mon, Mar 28, 2016 at 2:37 AM, Gilles Gouaillardet
<gil...@rist.or.jp> wrote:
Does this happen only with master ?
what does
ldd mpicc
say ?
does it require both libnl and libnl3 ?
libnl3 is used by OpenMPI if libnl3-devel package is installed,
and this is not the case on your system
a possible root cause is that third party libs use libnl3 while the
reachable/netlink component tries to use libnl; in this case,
installing libnl3-devel should fix your issue
/* you will need to re-configure after that */
another possible root cause is that some third party libs use libnl
and others use libnl3,
and in this case, i am afraid there is no simple workaround.
if installing libnl3-devel does not solve your issue, you can try
https://github.com/open-mpi/ompi/pull/1014
at least, it will abort with an error message that states which
lib is using libnl and which is using libnl3
i am afraid the only option is then to manually disable some
components, so that only one flavor of libnl is used.
that can be achieved by adding an empty .opal_ignore file in the
directory of each component you want to disable.
/* you will need to rerun autogen.pl after that */
Cheers,
Gilles
On 3/28/2016 3:16 PM, dpchoudh . wrote:
Hello all
The system in question is a CentOS 7 box that has been running
OpenMPI, both the master branch and the 1.10.2 release, happily
until now.
Just now, in order to debug something, I recompiled with the
following options:
$ ./configure --enable-debug --enable-debug-symbols --disable-dlopen
The compilation and install were successful; however, mpicc now
crashes like this:
[durga@smallMPI ~]$ mpicc -Wall -Wextra -o mpitest mpitest.c
mpicc: route/tc.c:973: rtnl_tc_register: Assertion `0' failed.
Aborted (core dumped)
Searching the mailing archive, I found two posts that describe
similar situations:
https://www.open-mpi.org/community/lists/devel/2015/08/17812.php
http://www.open-mpi.org/community/lists/users/2015/11/28016.php
However, the solution proposed in these, to disable verbs, is
not acceptable to me for the following reasons: I am trying to
implement a new BTL by reverse engineering the openib BTL. I am
using a Qlogic HCA for this purpose. (Please note that I cannot
use PSM as I am writing code for a BTL)
Are there any more acceptable solutions for this? Here is the
list of nl libraries on my box:
[durga@smallMPI ~]$ sudo yum list installed | grep libnl
libnl.x86_64          1.1.4-3.el7      @anaconda
libnl-devel.x86_64    1.1.4-3.el7      @anaconda
libnl3.x86_64         3.2.21-10.el7    @base
libnl3-cli.x86_64     3.2.21-10.el7    @base
and uninstalling libnl3 is not an option either: it seems yum
wants to uninstall around 100 other packages because of
dependencies, which would essentially render the machine unusable.
Please help!
Thanks in advance
Durga
We learn from history that we never learn from history.
_______________________________________________
users mailing list
us...@open-mpi.org
Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
Link to this post: http://www.open-mpi.org/community/lists/users/2016/03/28855.php