> On Dec 7, 2016, at 10:07 AM, Christof Koehler <christof.koeh...@bccms.uni-bremen.de> wrote:
>
> I really think the hang is a consequence of unclean termination (in the
> sense that the non-root ranks are not terminated) and probably not the
> cause, in my interpretation of what I see. Would you have any suggestion
> to catch signals sent between orterun (mpirun) and the child tasks?
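On catching the signals: one low-tech option is a small wrapper that mpirun starts in place of the real binary, which logs every catchable signal it receives and passes it on to the actual program. The following is only a sketch under assumptions: the script name sigwrap.py, the function run_logged and the list of watched signals are made up for illustration, and it assumes the launcher delivers its signals to the process it started directly. SIGKILL and SIGSTOP cannot be caught, so they will never show up in the log.

```python
import os
import signal
import subprocess
import sys
import time

# Hypothetical selection of signals to log; SIGKILL and SIGSTOP are
# left out because no process can install a handler for them.
WATCHED = [signal.SIGTERM, signal.SIGINT, signal.SIGHUP,
           signal.SIGUSR1, signal.SIGUSR2, signal.SIGCONT]


def run_logged(cmd, log=sys.stderr):
    """Run cmd as a child; log each watched signal and forward it to the child."""
    state = {"child": None}

    def handler(signum, frame):
        name = signal.Signals(signum).name
        log.write(f"{time.time():.3f} pid {os.getpid()} received {name}\n")
        log.flush()
        child = state["child"]
        # Forward only if the child has been started and has not exited yet.
        if child is not None and child.poll() is None:
            child.send_signal(signum)

    # Install the handlers before starting the child so no signal is missed.
    previous = {sig: signal.signal(sig, handler) for sig in WATCHED}
    try:
        state["child"] = subprocess.Popen(cmd)
        rc = state["child"].wait()
    finally:
        # Put the original handlers back.
        for sig, old in previous.items():
            signal.signal(sig, old if old is not None else signal.SIG_DFL)
    # Popen reports death by signal N as -N; map it to the shell-style 128+N.
    return rc if rc >= 0 else 128 - rc


if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(run_logged(sys.argv[1:]))
```

It would be launched as something like `mpirun -np 4 python3 sigwrap.py ./your_vasp_binary` (the binary name is a placeholder), and each rank then writes the signals it sees, with a timestamp and its pid, to stderr.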
Do you know where in the code the termination call is? Is it actually calling mpi_abort(), or just doing something ugly like calling Fortran “stop”? If the latter, would that explain a possible hang? Presumably someone here can comment on what the standard says about the validity of terminating without mpi_abort.

Actually, if you’re willing to share enough input files to reproduce, I could take a look; just e-mail me if that’s the case. I just recompiled our VASP with Open MPI 2.0.1 to fix a crash that was apparently addressed by some change in the memory allocator in a recent version of Open MPI.

Noam

Noam Bernstein, Ph.D.
Center for Materials Physics and Technology
U.S. Naval Research Laboratory
T +1 202 404 8628  F +1 202 404 7546
https://www.nrl.navy.mil
_______________________________________________
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users