That backtrace shows we are registering MPI_Alloc_mem memory with verbs. This
is expected behavior, but it doesn't show the openib btl being used for any
communication. I am looking into an issue on an OmniPath system where just
initializing the openib btl causes performance problems even if it is never
actually used for communication.
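A quick way to test that on an affected system is to keep the openib
component from being opened at all and compare timings; the process count and
binary below are only placeholders:

    # run with the openib btl excluded entirely (quotes protect the caret)
    mpirun --mca btl '^openib' -np 4 ./a.out

If the slowdown disappears with openib excluded, that points at initialization
overhead rather than openib actually carrying traffic.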
Hi,
there are several uncommon things happening here:
- btl/vader has a higher exclusivity than btl/sm, so bottom line, vader
should be used instead of sm
- is your interconnect InfiniBand or QLogic? InfiniBand uses pml/ob1 and
btl/openib for inter-node communication, whereas QLogic uses pml/cm and
mtl/psm.
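For example (the process count and binary are placeholders):

    # list the pml/mtl/btl components available in this build
    ompi_info | grep -E "MCA (pml|mtl|btl)"
    # force ob1 with the self and vader btls for a quick single-node test
    mpirun --mca pml ob1 --mca btl self,vader -np 4 ./a.out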
Hi Jeff,
I just installed Open MPI 2.0.2 (repo revision: v2.0.1-348-ge291d0e; release
date: Jan 31, 2017) but still have the same problem.
Attached please find two gdb backtraces, taken on writes to a file descriptor
that the cp2k.popt process obtained by opening /dev/infiniband/uverbs.
Thanks,
Jin
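(For anyone who wants to reproduce this kind of trace, a syscall catchpoint
is one way to stop on those writes; <pid> is a placeholder for the cp2k.popt
process id, and the uverbs device node may be numbered, e.g. uverbs0.)

    # attach to the running rank and stop on every write() syscall
    gdb -p <pid> -ex 'catch syscall write' -ex 'continue'
    # at each stop, 'bt' prints the backtrace; compare the fd argument with
    # the descriptor listed for /dev/infiniband/uverbs* in: ls -l /proc/<pid>/fd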
Can you try upgrading to Open MPI v2.0.2? We just released that last week with
a bunch of bug fixes.
> On Feb 7, 2017, at 3:07 PM, Jingchao Zhang wrote:
>
> Hi Tobias,
>
> Thanks for the reply. I tried both "export OMPI_MCA_mpi_leave_pinned=0" and
> "mpirun -mca mpi_leave_pinned 0" but stil
Hi Tobias,
Thanks for the reply. I tried both "export OMPI_MCA_mpi_leave_pinned=0" and
"mpirun -mca mpi_leave_pinned 0" but still got the same behavior. Our OpenMPI
version is 2.0.1. The repo version is v2.0.0-257-gee86e07. We have Intel
QLogic and OPA networks on the same cluster.
Below is our [...]
Hello Howard,
I am able to run my Open MPI job to completion over TCP as you suggested for a
sanity/configuration double check. I am also able to complete the job using
the RoCE fabric if I swap the breakout cable for two regular RoCE cables. I am
willing to test some custom builds to help iron this out.
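For anyone following along, a TCP-only sanity run is typically done by
restricting the btl selection to something like the following; the process
count and binary are placeholders:

    # bypass the RoCE/verbs path and run over TCP (plus self for loopback);
    # add sm or vader to the list if on-node ranks should use shared memory
    mpirun --mca btl tcp,self -np 8 ./a.out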
Hello Jingchao,
try using -mca mpi_leave_pinned 0, also for multi-node jobs.
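One way to double-check that the parameter actually reaches the MPI processes
is to have them print their effective MCA settings at MPI_Init; the process
count and binary are placeholders:

    # run with leave_pinned disabled, dump the MCA parameters at startup,
    # and confirm mpi_leave_pinned is reported as 0
    mpirun -mca mpi_leave_pinned 0 -mca mpi_show_mca_params all \
           -np 4 ./a.out 2>&1 | grep leave_pinned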
kind regards,
Tobias Klöffel
On 02/06/2017 09:38 PM, Jingchao Zhang wrote:
Hi,
We recently noticed Open MPI is using the openib btl over self,sm for
single-node jobs, which has caused performance degradation for some
applications.
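A note for anyone debugging the same symptom: the btl framework's verbose
output shows which components are opened and selected, so you can confirm
whether openib really wins over the shared-memory btls on a single node; the
process count and binary are placeholders:

    # print which btl components are opened/selected for each peer
    mpirun --mca btl_base_verbose 100 -np 4 ./a.out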