Hi,
there are several uncommon things happening here:
- btl/vader has a higher exclusivity than btl/sm, so, bottom line, vader
should be used instead of sm
- is your interconnect InfiniBand or QLogic? InfiniBand uses pml/ob1
and btl/openib for inter-node communication,
whereas QLogic uses pml/cm and mtl/psm.
- does your program involve MPI_Comm_spawn? Note that neither btl/vader
nor btl/sm can be used for inter-job communications
(e.g. between the main task and a spawned task), so btl/openib would be
used even for intra-node communications.
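As a quick experiment (a sketch only, assuming an Open MPI 2.x build where vader is the shared-memory BTL, and using cp2k.popt and -np 4 as placeholder values), you can also force the shared-memory path explicitly and see whether the single-node slowdown disappears:

```shell
# Restrict the run to the self and vader (shared-memory) BTLs.
# "self" handles a rank talking to itself; "vader" handles on-node peers.
# On older builds, substitute self,sm for self,vader.
mpirun --mca btl self,vader -np 4 ./cp2k.popt
```

If this is fast while the default run is slow, that points at openib being selected for intra-node traffic.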
Can you please run your app again with:
mpirun --mca pml_base_verbose 10 --mca btl_base_verbose 10 --mca
mtl_base_verbose 10 ...
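It can also help to confirm which components were actually compiled into your build. ompi_info ships with Open MPI; the grep pattern below is just an illustration of how to narrow its output to the transport frameworks:

```shell
# List the BTL, PML and MTL components available in this Open MPI build.
# Each matching line looks like: "MCA btl: vader (MCA v2.1.0, ...)".
ompi_info | grep -E "MCA (btl|pml|mtl):"
```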
Cheers,
Gilles
On 2/8/2017 6:50 AM, Jingchao Zhang wrote:
Hi Jeff,
I just installed Open MPI: 2.0.2 (repo revision: v2.0.1-348-ge291d0e;
release date: Jan 31, 2017) but have the same problem.
Attached please find two gdb backtraces on any write of a file
descriptor returned from opening /dev/infiniband/uverbs in the
cp2k.popt process.
Thanks,
Jingchao
------------------------------------------------------------------------
*From:* users <users-boun...@lists.open-mpi.org> on behalf of Jeff
Squyres (jsquyres) <jsquy...@cisco.com>
*Sent:* Tuesday, February 7, 2017 2:14:40 PM
*To:* Open MPI User's List
*Subject:* Re: [OMPI users] openmpi single node jobs using btl openib
Can you try upgrading to Open MPI v2.0.2? We just released that last
week with a bunch of bug fixes.
> On Feb 7, 2017, at 3:07 PM, Jingchao Zhang <zh...@unl.edu> wrote:
>
> Hi Tobias,
>
> Thanks for the reply. I tried both "export
OMPI_MCA_mpi_leave_pinned=0" and "mpirun -mca mpi_leave_pinned 0" but
still got the same behavior. Our OpenMPI version is 2.0.1. Repo
version is v2.0.0-257-gee86e07. We have Intel Qlogic and OPA networks
on the same cluster.
>
> Below are our configure flags:
> ./configure --prefix=$PREFIX \
> --with-hwloc=internal \
> --enable-mpirun-prefix-by-default \
> --with-slurm \
> --with-verbs \
> --with-psm \
> --with-psm2 \
> --disable-openib-connectx-xrc \
> --with-knem=/opt/knem-1.1.2.90mlnx1 \
> --with-cma
>
> So the question remains: why does Open MPI choose openib over self,sm
for single-node jobs? Isn't there a mechanism to differentiate btl
networks for single-node and multi-node jobs?
>
> Thanks,
> Jingchao
> From: users <users-boun...@lists.open-mpi.org> on behalf of Tobias
Kloeffel <tobias.kloef...@fau.de>
> Sent: Tuesday, February 7, 2017 2:54:46 AM
> To: Open MPI Users
> Subject: Re: [OMPI users] openmpi single node jobs using btl openib
>
> Hello Jingchao,
> try to use -mca mpi_leave_pinned 0, also for multinode jobs.
>
> kind regards,
> Tobias Klöffel
>
> On 02/06/2017 09:38 PM, Jingchao Zhang wrote:
>> Hi,
>>
>> We recently noticed Open MPI is using btl openib over self,sm for
single-node jobs, which has caused performance degradation for some
applications, e.g. 'cp2k'. For Open MPI version 2.0.1, our tests show a
single-node 'cp2k' job using openib is ~25% slower than using self,sm.
We advise users to add '--mca btl_base_exclude openib' as a temporary
fix. I should point out that not all applications are affected by this
behavior; many of them have the same single-node performance
with or without openib. Why doesn't Open MPI use self,sm by default for
single-node jobs? Is this the intended behavior?
>>
>> Thanks,
>> Jingchao
>>
>>
>> _______________________________________________
>> users mailing list
>> users@lists.open-mpi.org
>> https://rfd.newmexicoconsortium.org/mailman/listinfo/users
>
>
> --
> M.Sc. Tobias Klöffel
> =======================================================
> Interdisciplinary Center for Molecular Materials (ICMM)
> and Computer-Chemistry-Center (CCC)
> Department Chemie und Pharmazie
> Friedrich-Alexander-Universität Erlangen-Nürnberg
> Nägelsbachstr. 25
> D-91052 Erlangen, Germany
>
> Room: 2.305
> Phone: +49 (0) 9131 / 85 - 20423
> Fax: +49 (0) 9131 / 85 - 26565
>
> =======================================================
>
> E-mail:
> tobias.kloef...@fau.de
--
Jeff Squyres
jsquy...@cisco.com