Noam,

I do not recall exactly which version of Open MPI was affected, but we had
some issues with the non-reentrancy of our memory allocator. More recent
versions (1.10 and 2.0) will not have this issue. Can you update to one of
them and see whether you can still reproduce the problem?

Thanks,
  George.



On Wed, Nov 23, 2016 at 11:44 AM, Noam Bernstein <noam.bernst...@nrl.navy.mil> wrote:

> On Nov 17, 2016, at 3:22 PM, Noam Bernstein <noam.bernst...@nrl.navy.mil>
> wrote:
>
> Hi - over the last few days we’ve started seeing crashes and hangs in
> openmpi, in a code that hasn’t been touched in months and with an openmpi
> installation (v1.8.5) that also hasn’t been touched in months.  The
> symptoms are either a hang, with a stack trace like this one (obtained by
> attaching to the one running process that shows 0% CPU usage):
>
> [...]
>
> I’m in the process of recompiling openmpi 1.8.8 and the MPI-using code
> (VASP 5.4.1), just to make sure everything’s clean, but in the meantime I
> was wondering whether anyone has ideas about what might even cause this
> kind of behavior, or what other information I should gather to figure
> out what’s going on.  As I said at the top, this setup has been working
> well for years and, I believe, has been entirely untouched (the openmpi
> library and the executable, I mean; we did just have a kernel update)
> for far longer than these crashes have been happening.
>
>
>
> Does no one have any suggestions about this problem?  I tried openmpi
> 1.8.8 and a newer version of Mellanox’s OFED, and the behavior is the same.
>
> Does anyone who knows the internals of MPI have an idea, from the stack
> traces I posted earlier, whether this even looks like an openmpi problem,
> as opposed to a lower-level one (e.g. the InfiniBand drivers) or a
> higher-level one (e.g. the calling code)?
>
> Noam
>
>
> Noam Bernstein, Ph.D.
> Center for Materials Physics and Technology
> U.S. Naval Research Laboratory
> T +1 202 404 8628  F +1 202 404 7546
> https://www.nrl.navy.mil
>
>
> _______________________________________________
> users mailing list
> users@lists.open-mpi.org
> https://rfd.newmexicoconsortium.org/mailman/listinfo/users
>