> On Dec 7, 2016, at 10:07 AM, Christof Koehler 
> <christof.koeh...@bccms.uni-bremen.de> wrote:
>> 
> I really think the hang is a consequence of
> unclean termination (in the sense that the non-root ranks are not
> terminated) and probably not the cause, in my interpretation of what I
> see. Would you have any suggestion to catch signals sent between orterun
> (mpirun) and the child tasks ?
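
On catching the signals: one rough option, assuming a POSIX system with 
Python 3 on the compute nodes, is to start each rank through a small wrapper 
that logs every catchable signal it receives and forwards it to the real 
program.  The sketch below is only an illustration.  The script name 
log_signals.py and the list of watched signals are my own choices, not 
anything Open MPI provides, and I have not checked how MPI startup behaves 
when the ranks are launched through such a wrapper.  SIGKILL and SIGSTOP can 
never be caught, so they will not show up in the log.

```python
import os
import signal
import subprocess
import sys
import time

# Signals worth watching (an arbitrary selection for this sketch).
# SIGKILL and SIGSTOP cannot be caught, so they are not listed.
WATCHED = [signal.SIGTERM, signal.SIGINT, signal.SIGHUP,
           signal.SIGCONT, signal.SIGUSR1, signal.SIGUSR2]

received = []   # (timestamp, signal name) pairs, in arrival order
child = None    # the wrapped program, once it has been started


def on_signal(signum, frame):
    """Record the signal, then pass it on to the wrapped program."""
    name = signal.Signals(signum).name
    received.append((time.time(), name))
    sys.stderr.write("pid %d got %s\n" % (os.getpid(), name))
    if child is not None and child.poll() is None:
        child.send_signal(signum)


def install_handlers():
    """Route every watched signal through on_signal."""
    for sig in WATCHED:
        signal.signal(sig, on_signal)


def run(argv):
    """Start argv as a child, wait for it, and return a shell-style status."""
    global child
    install_handlers()
    child = subprocess.Popen(argv)
    rc = child.wait()
    # Popen reports death-by-signal as a negative number; map it to 128+N.
    return rc if rc >= 0 else 128 - rc


if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(run(sys.argv[1:]))
```

Something like "mpirun -np 4 python3 log_signals.py ./your_program" (the 
program name is a placeholder) would then print one line on stderr per signal 
that reaches each wrapper, which should at least show whether the non-root 
ranks are being told to terminate at all.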

Do you know where in the code the termination call is?  Is it actually calling 
MPI_Abort(), or just doing something ugly like calling Fortran “stop”?  If the 
latter, would that explain a possible hang?

Presumably someone here can comment on what the standard says about the 
validity of terminating without MPI_Abort.

Actually, if you’re willing to share enough input files to reproduce, I could 
take a look; just e-mail me if that’s the case.  I just recompiled our VASP 
with Open MPI 2.0.1 to fix a crash that was apparently addressed by some 
change in the memory allocator in a recent version of Open MPI.

                                                                        Noam


Noam Bernstein, Ph.D.
Center for Materials Physics and Technology
U.S. Naval Research Laboratory
T +1 202 404 8628  F +1 202 404 7546
https://www.nrl.navy.mil
_______________________________________________
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users
