If your application still hangs, you can use padb (http://padb.pittman.org.uk) to inspect the state of the application. If it hangs in an MPI collective subroutine, you can try mpirun --mca coll basic and see whether the hang disappears (the default tuned coll module is known to be broken in some cases).
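
For illustration, here is a minimal sketch of what that test run could look like. The process count (4) and the binary name (./my_app) are placeholders I made up, not taken from your report; only the --mca coll basic part is the actual suggestion. The script just prints the command line so you can check it before running it by hand.

```shell
#!/bin/sh
# NP and APP are hypothetical placeholders: substitute your own
# process count and application binary.
NP=4
APP=./my_app

# Print an mpirun command line that restricts the coll framework
# to the component list given as the first argument.
coll_cmd() {
    echo "mpirun --mca coll $1 -np $NP $APP"
}

# Command line that forces the (unoptimized) basic coll module.
coll_cmd basic
```

If the hang goes away with this command but comes back with the default settings, that points at the tuned module rather than at your code.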
Based on your report, we might recommend some tuning for the tuned module (as you can guess, the basic coll module is not optimized).

Cheers,

Gilles

On Saturday, August 8, 2015, Ralph Castain <r...@open-mpi.org> wrote:

> My first suggestion would be to try using 1.8.8 instead, to get all the bug
> fixes made since 1.8.1 was released.
>
> On Fri, Aug 7, 2015 at 10:34 PM, kishor sharma <kishor.i...@gmail.com> wrote:
>
>> Hi,
>>
>> I recently upgraded from openmpi 1.5.4 to openmpi 1.8.1 and built an
>> application which uses the parallel version of MUMPS
>> (http://mumps.enseeiht.fr/).
>>
>> I am noticing that the process hangs with np > 2, although it sometimes works.
>> I am not sure if this is because of the openmpi upgrade or some problem with
>> our code. It used to work fine with 1.5.4.
>>
>> strace shows that the process is polling some resource while it is hung.
>> Any pointers on how to debug this?
>>
>> Thanks,
>> Kishor
>>
>> _______________________________________________
>> users mailing list
>> us...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
>> Link to this post:
>> http://www.open-mpi.org/community/lists/users/2015/08/27410.php