Hi Jeff,

I just installed Open MPI 2.0.2 (repo revision: v2.0.1-348-ge291d0e; release
date: Jan 31, 2017) but still have the same problem.


Attached please find two gdb backtraces, captured whenever the cp2k.popt
process writes to the file descriptor returned from opening
/dev/infiniband/uverbs.
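
For reference, one way to capture traces like these is with gdb syscall
catchpoints (a sketch only; the PID and fd number are placeholders, "1" is the
catchpoint number gdb reports, and on x86_64 Linux the fd argument of write()
is in $rdi at syscall entry):

    gdb -p <pid of cp2k.popt>
    (gdb) catch syscall write
    (gdb) condition 1 $rdi == <fd of /dev/infiniband/uverbs0>
    (gdb) continue
    (gdb) bt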


Thanks,

Jingchao

________________________________
From: users <users-boun...@lists.open-mpi.org> on behalf of Jeff Squyres 
(jsquyres) <jsquy...@cisco.com>
Sent: Tuesday, February 7, 2017 2:14:40 PM
To: Open MPI User's List
Subject: Re: [OMPI users] openmpi single node jobs using btl openib

Can you try upgrading to Open MPI v2.0.2?  We just released that last week with 
a bunch of bug fixes.


> On Feb 7, 2017, at 3:07 PM, Jingchao Zhang <zh...@unl.edu> wrote:
>
> Hi Tobias,
>
> Thanks for the reply. I tried both "export OMPI_MCA_mpi_leave_pinned=0" and 
> "mpirun -mca mpi_leave_pinned 0" but still got the same behavior. Our Open MPI 
> version is 2.0.1; the repo version is v2.0.0-257-gee86e07. We have Intel QLogic 
> and OPA networks on the same cluster.
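>
> For reference, the two forms were applied roughly like this (the launch line 
> itself is illustrative; the rank count and input file are placeholders):
>
>     export OMPI_MCA_mpi_leave_pinned=0
>     mpirun -np 16 cp2k.popt H2O-64.inp
>
> or, equivalently, on the command line:
>
>     mpirun -mca mpi_leave_pinned 0 -np 16 cp2k.popt H2O-64.inp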
>
> Below are our configure flags:
> ./configure     --prefix=$PREFIX \
>                 --with-hwloc=internal \
>                 --enable-mpirun-prefix-by-default \
>                 --with-slurm \
>                 --with-verbs \
>                 --with-psm \
>                 --with-psm2 \
>                 --disable-openib-connectx-xrc \
>                 --with-knem=/opt/knem-1.1.2.90mlnx1 \
>                 --with-cma
>
> So the question remains: why does Open MPI choose openib over self,sm for 
> single-node jobs? Isn't there a mechanism to differentiate btl networks for 
> single-node vs. multi-node jobs?
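>
> (For what it's worth, we can watch the btl selection at run time with 
> something like the following; the rank count and input file are placeholders:
>
>     mpirun -mca btl_base_verbose 100 -np 4 cp2k.popt H2O-64.inp
>
> and "ompi_info --param btl all --level 9" lists the available btl components 
> and their current parameter values.)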
>
> Thanks,
> Jingchao
> From: users <users-boun...@lists.open-mpi.org> on behalf of Tobias Kloeffel 
> <tobias.kloef...@fau.de>
> Sent: Tuesday, February 7, 2017 2:54:46 AM
> To: Open MPI Users
> Subject: Re: [OMPI users] openmpi single node jobs using btl openib
>
> Hello Jingchao,
> try using -mca mpi_leave_pinned 0, also for multi-node jobs.
>
> kind regards,
> Tobias Klöffel
>
> On 02/06/2017 09:38 PM, Jingchao Zhang wrote:
>> Hi,
>>
>> We recently noticed Open MPI is using the openib btl instead of self,sm for 
>> single-node jobs, which has caused performance degradation for some 
>> applications, e.g. cp2k. For Open MPI version 2.0.1, our tests show a 
>> single-node cp2k job using openib is ~25% slower than one using self,sm. We 
>> advise users to add '--mca btl_base_exclude openib' as a temporary fix. I 
>> should point out that not all applications are affected; many have the same 
>> single-node performance with or without openib. Why doesn't Open MPI use 
>> self,sm by default for single-node jobs? Is this the intended behavior?
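>>
>> As a concrete example, the temporary fix looks like this (the rank count and 
>> input file are placeholders):
>>
>>     mpirun --mca btl_base_exclude openib -np 16 cp2k.popt H2O-64.inp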
>>
>> Thanks,
>> Jingchao
>>
>>
>
>
> --
> M.Sc. Tobias Klöffel
> =======================================================
> Interdisciplinary Center for Molecular Materials (ICMM)
> and Computer-Chemistry-Center (CCC)
> Department Chemie und Pharmazie
> Friedrich-Alexander-Universität Erlangen-Nürnberg
> Nägelsbachstr. 25
> D-91052 Erlangen, Germany
>
> Room: 2.305
> Phone: +49 (0) 9131 / 85 - 20423
> Fax: +49 (0) 9131 / 85 - 26565
>
> =======================================================
>
> E-mail:
> tobias.kloef...@fau.de


--
Jeff Squyres
jsquy...@cisco.com

Single-node cp2k run.
Shouldn't both PSM and openib be left unused here, with just self, vader, etc.?
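
For comparison, forcing the shared-memory transports explicitly on a single
node would look roughly like this (the rank count and input file are
placeholders):

    mpirun --mca btl self,vader -np 16 cp2k.popt H2O-64.inp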

<cp2k.popt program backtrace, within a PMPI_Alloc_mem MPI call, upon write() to
the /dev/infiniband/uverbs0 file descriptor>
#0  0x00000039efe0e7a0 in write () from /lib64/libpthread.so.0
#1  0x00000039f0203ae7 in ibv_cmd_reg_mr () from /usr/lib64/libibverbs.so.1
#2  0x00002af33926165a in ?? () from /usr/lib64/libipathverbs-rdmav2.so
#3  0x00000039f020a0c3 in ibv_reg_mr () from /usr/lib64/libibverbs.so.1
#4  0x00002af33ac78c37 in openib_reg_mr ()
   from /util/opt/openmpi/2.0.2.psm2/intel/15.0.2/lib/openmpi/mca_btl_openib.so
#5  0x00002af33a455b29 in mca_mpool_grdma_register ()
   from /util/opt/openmpi/2.0.2.psm2/intel/15.0.2/lib/openmpi/mca_mpool_grdma.so
#6  0x00002af33a4557b3 in mca_mpool_grdma_alloc ()
   from /util/opt/openmpi/2.0.2.psm2/intel/15.0.2/lib/openmpi/mca_mpool_grdma.so
#7  0x00002af32b963cee in mca_mpool_base_alloc ()
   from /util/opt/openmpi/2.0.2.psm2/intel/15.0.2/lib/libopen-pal.so.20
#8  0x00002af32b34f3bc in PMPI_Alloc_mem ()
   from /util/opt/openmpi/2.0.2.psm2/intel/15.0.2/lib/libmpi.so.20
#9  0x00002af32b0dea12 in pmpi_alloc_mem_cptr__ ()
   from /util/opt/openmpi/2.0.2.psm2/intel/15.0.2/lib/libmpi_mpifh.so.20
#10 0x000000000277b1c2 in message_passing_mp_mp_allocate_d_ ()
#11 0x000000000257d80e in dbcsr_ptr_util_mp_mem_alloc_d_ ()
#12 0x000000000256f612 in dbcsr_data_methods_low_mp_internal_data_allocate_ ()
#13 0x0000000002585f89 in dbcsr_data_methods_mp_dbcsr_data_new_ ()
#14 0x00000000024cb8d2 in dbcsr_work_operations_mp_dbcsr_create_new_ ()
#15 0x0000000002399f5b in dbcsr_mm_cannon_mp_make_images_ ()
#16 0x0000000002396b02 in dbcsr_mm_cannon_mp_make_m2s_..0 ()
#17 0x0000000002395c24 in dbcsr_mm_cannon_mp_dbcsr_mm_cannon_multiply_ ()
#18 0x00000000023da877 in dbcsr_multiply_api_mp_dbcsr_multiply_d_ ()
#19 0x00000000022d1585 in cp_dbcsr_interface_mp_cp_dbcsr_multiply_d_ ()
#20 0x0000000000950ba8 in cp_dbcsr_operations_mp_cp_dbcsr_plus_fm_fm_t_native_ ()
#21 0x0000000000e93a23 in qs_mo_methods_mp_calculate_dm_sparse_ ()
#22 0x0000000001944f2b in qs_scf_diagonalization_mp_do_general_diag_ ()
#23 0x000000000184e34f in qs_scf_loop_utils_mp_qs_scf_new_mos_ ()
#24 0x000000000172c4d7 in qs_scf_mp_scf_env_do_scf_ ()
#25 0x00000000017299db in qs_scf_mp_scf_ ()
#26 0x0000000000e3ed70 in qs_energy_mp_qs_energies_ ()
#27 0x00000000013e7efd in qs_force_mp_qs_forces_ ()
#28 0x00000000013e66d1 in qs_force_mp_qs_calc_energy_force_ ()
#29 0x0000000000a3a611 in force_env_methods_mp_force_env_calc_energy_force_ ()
#30 0x000000000058fc6f in cp_eval_at_ ()
#31 0x00000000005bc2a0 in bfgs_optimizer_mp_geoopt_bfgs_ ()
#32 0x00000000005a231a in geo_opt_mp_cp_geo_opt_ ()
#33 0x00000000004268ef in cp2k_runs_mp_run_input_ ()
#34 0x000000000040e035 in MAIN__ ()
#35 0x000000000040d67e in main ()


<cp2k.popt program backtrace, within a PMPI_Free_mem MPI call, upon write() to
the /dev/infiniband/uverbs0 file descriptor>
#0  0x00000039efe0e7a0 in write () from /lib64/libpthread.so.0
#1  0x00000039f0203a42 in ibv_cmd_dereg_mr () from /usr/lib64/libibverbs.so.1
#2  0x00002af3392615bd in ?? () from /usr/lib64/libipathverbs-rdmav2.so
#3  0x00000039f020a021 in ibv_dereg_mr () from /usr/lib64/libibverbs.so.1
#4  0x00002af33ac78c7d in openib_dereg_mr ()
   from /util/opt/openmpi/2.0.2.psm2/intel/15.0.2/lib/openmpi/mca_btl_openib.so
#5  0x00002af33a455ff2 in mca_mpool_grdma_deregister ()
   from /util/opt/openmpi/2.0.2.psm2/intel/15.0.2/lib/openmpi/mca_mpool_grdma.so
#6  0x00002af33a455efd in mca_mpool_grdma_free ()
   from /util/opt/openmpi/2.0.2.psm2/intel/15.0.2/lib/openmpi/mca_mpool_grdma.so
#7  0x00002af32b963dcf in mca_mpool_base_free ()
   from /util/opt/openmpi/2.0.2.psm2/intel/15.0.2/lib/libopen-pal.so.20
#8  0x00002af32b3597bb in PMPI_Free_mem ()
   from /util/opt/openmpi/2.0.2.psm2/intel/15.0.2/lib/libmpi.so.20
#9  0x00002af32b0e0b3a in pmpi_free_mem__ ()
   from /util/opt/openmpi/2.0.2.psm2/intel/15.0.2/lib/libmpi_mpifh.so.20
#10 0x0000000002716a40 in message_passing_mp_mp_deallocate_i_ ()
#11 0x0000000002577f6c in dbcsr_ptr_util_mp_ensure_array_size_i_ ()
#12 0x00000000024ebde8 in dbcsr_index_operations_mp_dbcsr_addto_index_array_ ()
#13 0x00000000024ce5fd in dbcsr_work_operations_mp_quick_finalize_ ()
#14 0x00000000024d9af3 in dbcsr_work_operations_mp_dbcsr_special_finalize_ ()
#15 0x000000000239cc2a in dbcsr_mm_cannon_mp_make_images_ ()
#16 0x0000000002396b02 in dbcsr_mm_cannon_mp_make_m2s_..0 ()
#17 0x0000000002395c24 in dbcsr_mm_cannon_mp_dbcsr_mm_cannon_multiply_ ()
#18 0x00000000023da877 in dbcsr_multiply_api_mp_dbcsr_multiply_d_ ()
#19 0x00000000022d1585 in cp_dbcsr_interface_mp_cp_dbcsr_multiply_d_ ()
#20 0x0000000000950ba8 in cp_dbcsr_operations_mp_cp_dbcsr_plus_fm_fm_t_native_ ()
#21 0x0000000000e93a23 in qs_mo_methods_mp_calculate_dm_sparse_ ()
#22 0x0000000001944f2b in qs_scf_diagonalization_mp_do_general_diag_ ()
#23 0x000000000184e34f in qs_scf_loop_utils_mp_qs_scf_new_mos_ ()
#24 0x000000000172c4d7 in qs_scf_mp_scf_env_do_scf_ ()
#25 0x00000000017299db in qs_scf_mp_scf_ ()
#26 0x0000000000e3ed70 in qs_energy_mp_qs_energies_ ()
#27 0x00000000013e7efd in qs_force_mp_qs_forces_ ()
#28 0x00000000013e66d1 in qs_force_mp_qs_calc_energy_force_ ()
#29 0x0000000000a3a611 in force_env_methods_mp_force_env_calc_energy_force_ ()
#30 0x000000000058fc6f in cp_eval_at_ ()
#31 0x00000000005bc2a0 in bfgs_optimizer_mp_geoopt_bfgs_ ()
#32 0x00000000005a231a in geo_opt_mp_cp_geo_opt_ ()
#33 0x00000000004268ef in cp2k_runs_mp_run_input_ ()
#34 0x000000000040e035 in MAIN__ ()
#35 0x000000000040d67e in main ()

