Hello Gilles,

Thanks for your prompt response, and apologies for my delayed reply.

The hang issue is fixed now. It seems that OpenMPI prefers PSM when it detects Qlogic HCAs, even when I pass -mca btl openib,self. Adding another parameter, -mca pml ob1, fixed the issue.

There is nothing wrong with the code base (except for the conflicting libnl dependency), and sorry for confusing you.

Durga

We learn from history that we never learn from history.

On Mon, Mar 28, 2016 at 3:44 AM, Gilles Gouaillardet <gil...@rist.or.jp> wrote:

> at first, does it hang when running on only one node?
>
> when the hang occurs, you can collect stack traces
> (run pstack on mpitest)
> to see where it hangs.
>
> since you configure'd with --disable-dlopen, it means your btl has been
> slurped into openmpi.
> that means some parts of it are executed, and it could be responsible for
> the hang.
>
> note if you
> mpirun --mca btl self,sm -np 2 ...
> on two hosts, then it will never work, since no btl can be used for the mpi
> tasks to communicate.
>
> it seems something is wrong on master, i will check from now
>
> do smallMPI and bigMPI imply one host is little endian and the other is
> big endian?
> if yes, then you need to configure with --enable-heterogeneous
>
> Cheers,
>
> Gilles
>
>
> On 3/28/2016 4:26 PM, dpchoudh . wrote:
>
> Hello Gilles
>
> Per your suggestion, installing libnl3-devel does fix the mpicc issue,
> but there still seems to be another issue down the road: the generated
> executable seems to hang.
> I have tried sm, tcp and openib BTLs, all with the same result:
>
> [durga@smallMPI ~]$ mpirun -np 2 -H smallMPI,bigMPI -mca btl self,sm ./mpitest   <--- Hangs
>
> The source code for the simple test is as follows:
>
> #include <mpi.h>
> #include <stdio.h>
> #include <unistd.h>
>
> int main(int argc, char** argv)
> {
>     int world_size, world_rank, name_len;
>     char hostname[MPI_MAX_PROCESSOR_NAME];
>     MPI_Init(&argc, &argv);
>     MPI_Comm_size(MPI_COMM_WORLD, &world_size);
>     MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
>     MPI_Get_processor_name(hostname, &name_len);
>     printf("Hello world from processor %s, rank %d out of %d processors\n",
>            hostname, world_rank, world_size);
>     MPI_Finalize();
>     return 0;
> }
>
>
> What do I do now?
>
> Thanks
> Durga
>
> We learn from history that we never learn from history.
>
> On Mon, Mar 28, 2016 at 2:37 AM, Gilles Gouaillardet <gil...@rist.or.jp>
> wrote:
>
>> Does this happen only with master?
>>
>> what does
>> ldd mpicc
>> say?
>> does it require both libnl and libnl3?
>>
>> libnl3 is used by OpenMPI if the libnl3-devel package is installed,
>> and this is not the case on your system
>>
>> a possible root cause is that third party libs use libnl3, and the
>> reachable/netlink component tries to use libnl.
>> in this case, installing libnl3-devel should fix your issue
>> /* you will need to re-configure after that */
>>
>> another possible root cause is that some third party libs use libnl and
>> others use libnl3,
>> and in this case, i am afraid there is no simple workaround.
>> if installing libnl3-devel did not solve your issue, you can give a try to
>> https://github.com/open-mpi/ompi/pull/1014
>> at least, it will abort with an error message that states which lib is
>> using libnl and which is using libnl3
>>
>> i am afraid the only option is to manually disable some components, so
>> only one flavor of libnl is used.
>> that can be achieved by adding an empty .opal_ignore file in the dir of
>> the components you want to disable.
>> /* you will need to rerun autogen.pl after that */
>>
>> Cheers,
>>
>> Gilles
>>
>> On 3/28/2016 3:16 PM, dpchoudh . wrote:
>>
>> Hello all
>>
>> The system in question is a CentOS 7 box that has been running OpenMPI,
>> both the master branch and the 1.10.2 release, happily until now.
>>
>> Just now, in order to debug something, I recompiled with the following
>> options:
>>
>> $ ./configure --enable-debug --enable-debug-symbols --disable-dlopen
>>
>> The compilation and install were successful; however, mpicc now crashes
>> like this:
>>
>> [durga@smallMPI ~]$ mpicc -Wall -Wextra -o mpitest mpitest.c
>> mpicc: route/tc.c:973: rtnl_tc_register: Assertion `0' failed.
>> Aborted (core dumped)
>>
>>
>> Searching the mailing archive, I found two posts that describe similar
>> situations:
>>
>> https://www.open-mpi.org/community/lists/devel/2015/08/17812.php
>> http://www.open-mpi.org/community/lists/users/2015/11/28016.php
>>
>> However, the solution proposed in these, to disable verbs, is not
>> acceptable to me for the following reason: I am trying to implement a new
>> BTL by reverse engineering the openib BTL. I am using a Qlogic HCA for this
>> purpose. (Please note that I cannot use PSM as I am writing code for a BTL.)
>>
>> Are there any more acceptable solutions for this? Here is the list of nl
>> libraries on my box:
>>
>> [durga@smallMPI ~]$ sudo yum list installed | grep libnl
>> libnl.x86_64          1.1.4-3.el7      @anaconda
>> libnl-devel.x86_64    1.1.4-3.el7      @anaconda
>> libnl3.x86_64         3.2.21-10.el7    @base
>> libnl3-cli.x86_64     3.2.21-10.el7    @base
>>
>> and uninstalling libnl3 is not an option either: it seems yum wants to
>> uninstall around 100-odd other packages because of dependencies, which will
>> essentially render the machine unusable.
>>
>> Please help!
>>
>> Thanks in advance
>> Durga
>>
>> We learn from history that we never learn from history.
>>
>> _______________________________________________
>> users mailing list
>> us...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
>> Link to this post:
>> http://www.open-mpi.org/community/lists/users/2016/03/28855.php