Dear open-mpi user,
I am running a CPMD calculation in parallel. I got the following error and
the job was killed. The error message is given below. What does this error
mean, and how can I fix it?
[[12065,1],23][btl_openib_component.c:2948:handle_wc] from
compute-0-0.local to: compute-0-7 error polling LP
This typically indicates an error in the physical layer of your IB network.
You should run layer 0 diagnostics and look for bad cables, bad HCAs, etc.
On Oct 18, 2013, at 1:49 AM, "sudhirs@" wrote:
> Dear open-mpi user,
> I am running a CPMD calculation in parallel. I got the following error
I've been testing an application that turns out to be ~30% slower with
OMPI 1.6.5 than (the Red Hat packaged version of) 1.5.4, with the same
mca-params and the same binary, just flipping the runtime. It's running
over openib, and the profile it prints says that alltoall is a factor of
four slower
"Jeff Squyres (jsquyres)" writes:
> Short version:
> --
>
> What you really want is:
>
> mpirun --mca pml ob1 ...
>
> The "--mca mtl ^psm" way will get the same result, but forcing pml=ob1 is
> really a slightly better solution (from a semantic perspective)
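For reference, the same PML selection can also be made without editing the mpirun command line, since Open MPI reads MCA parameters from the environment as well. This is only a minimal sketch: the application name and process count in the comments are placeholders, not taken from this thread.

```shell
# Environment-variable form of "--mca pml ob1": Open MPI treats any
# OMPI_MCA_<name> variable as the MCA parameter <name>.
export OMPI_MCA_pml=ob1

# Per-user file form (assumes the default location
# $HOME/.openmpi/mca-params.conf), one "name = value" pair per line:
#   pml = ob1

# Then launch as usual; ./my_app and -np 4 are hypothetical:
#   mpirun -np 4 ./my_app
```

A value given on the mpirun command line takes precedence over the environment variable, which in turn takes precedence over the file, so the "--mca pml ob1" form above still wins if both are present.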
I'm afraid ^psm is r