I think I have this resolved, though I still suspect there is something wrong on that system. You shouldn’t have some nodes running munge and others not running it. I wonder if someone was experimenting, started munge on some of the nodes, and forgot to turn it off afterwards?
Anyway, see if this fixes the problem.

https://github.com/open-mpi/ompi/pull/497

> On Mar 25, 2015, at 9:43 AM, Ralph Castain <r...@open-mpi.org> wrote:
>
> Much appreciated! Interesting problem/configuration :-)
>
>> On Mar 25, 2015, at 9:42 AM, Mark Santcroos <mark.santcr...@rutgers.edu> wrote:
>>
>>> On 25 Mar 2015, at 17:39 , Ralph Castain <r...@open-mpi.org> wrote:
>>> Not surprising - I’m surprised to find munge on the mom’s node anyway given
>>> that you are using Torque.
>>>
>>> I have to finish something else first, and it sounds like you aren’t
>>> blocked at the moment. I’ll provide a patch for you to try later, if you’re
>>> willing.
>>
>> Right. Don't worry, I'm not blocked indeed, and just wanted to offer some
>> debugging assistance as a courtesy.