Hi Jeff,
I just installed Open MPI 2.0.2 (repo revision: v2.0.1-348-ge291d0e; release date: Jan 31, 2017) but have the same problem. Attached please find two gdb backtraces, taken on each write() to the file descriptor returned from opening /dev/infiniband/uverbs in the cp2k.popt process.

Thanks,
Jingchao

________________________________
From: users <users-boun...@lists.open-mpi.org> on behalf of Jeff Squyres (jsquyres) <jsquy...@cisco.com>
Sent: Tuesday, February 7, 2017 2:14:40 PM
To: Open MPI User's List
Subject: Re: [OMPI users] openmpi single node jobs using btl openib

Can you try upgrading to Open MPI v2.0.2? We just released that last week with a bunch of bug fixes.

> On Feb 7, 2017, at 3:07 PM, Jingchao Zhang <zh...@unl.edu> wrote:
>
> Hi Tobias,
>
> Thanks for the reply. I tried both "export OMPI_MCA_mpi_leave_pinned=0" and
> "mpirun -mca mpi_leave_pinned 0" but still got the same behavior. Our Open MPI
> version is 2.0.1; the repo revision is v2.0.0-257-gee86e07. We have Intel QLogic
> and OPA networks on the same cluster.
>
> Below are our configure flags:
>
> ./configure --prefix=$PREFIX \
>     --with-hwloc=internal \
>     --enable-mpirun-prefix-by-default \
>     --with-slurm \
>     --with-verbs \
>     --with-psm \
>     --with-psm2 \
>     --disable-openib-connectx-xrc \
>     --with-knem=/opt/knem-1.1.2.90mlnx1 \
>     --with-cma
>
> So the question remains: why does Open MPI choose openib over self,sm for
> single-node jobs? Isn't there a mechanism to differentiate btl networks for
> single-node vs. multi-node jobs?
>
> Thanks,
> Jingchao
>
> From: users <users-boun...@lists.open-mpi.org> on behalf of Tobias Kloeffel
> <tobias.kloef...@fau.de>
> Sent: Tuesday, February 7, 2017 2:54:46 AM
> To: Open MPI Users
> Subject: Re: [OMPI users] openmpi single node jobs using btl openib
>
> Hello Jingchao,
> try to use -mca mpi_leave_pinned 0, also for multinode jobs.
>
> kind regards,
> Tobias Klöffel
>
> On 02/06/2017 09:38 PM, Jingchao Zhang wrote:
>> Hi,
>>
>> We recently noticed Open MPI is using btl openib over self,sm for single-node
>> jobs, which has caused performance degradation for some applications, e.g.
>> cp2k. For Open MPI version 2.0.1, our tests show a single-node cp2k job using
>> openib is ~25% slower than one using self,sm. We advise users to add '--mca
>> btl_base_exclude openib' as a temporary fix. I should point out that not all
>> applications are affected by this behavior; many have the same single-node
>> performance with or without openib. Why doesn't Open MPI use self,sm by
>> default for single-node jobs? Is this the intended behavior?
>>
>> Thanks,
>> Jingchao
>
> --
> M.Sc. Tobias Klöffel
> =======================================================
> Interdisciplinary Center for Molecular Materials (ICMM)
> and Computer-Chemistry-Center (CCC)
> Department Chemie und Pharmazie
> Friedrich-Alexander-Universität Erlangen-Nürnberg
> Nägelsbachstr. 25
> D-91052 Erlangen, Germany
>
> Room: 2.305
> Phone: +49 (0) 9131 / 85 - 20423
> Fax: +49 (0) 9131 / 85 - 26565
> =======================================================
>
> E-mail: tobias.kloef...@fau.de

--
Jeff Squyres
jsquy...@cisco.com
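For reference, this is roughly what we are advising users to run in the meantime. Treat it as a sketch: the right shared-memory BTL name (sm vs. vader) depends on the build, the process counts and the input file name below are only placeholders, and btl_base_verbose is just a generic way to see which BTLs get selected.

# Temporary fix we currently advise: exclude the openib BTL
mpirun --mca btl_base_exclude openib -np 16 ./cp2k.popt input.inp

# Or explicitly restrict a single-node run to loopback + shared memory
# (substitute vader for sm if that is the shared-memory BTL in your build)
mpirun --mca btl self,sm -np 16 ./cp2k.popt input.inp

# Or disable leave-pinned behavior, as suggested earlier in the thread
export OMPI_MCA_mpi_leave_pinned=0
mpirun -np 16 ./cp2k.popt input.inp

# To confirm which BTLs actually get selected, raise the BTL verbosity
mpirun --mca btl_base_verbose 100 -np 2 ./cp2k.popt input.inp 2>&1 | grep -i btl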
Single-node cp2k run. Neither PSM nor openib should be used here, just self, vader, etc., right?

<cp2k.popt program backtrace, within the PMPI_Alloc_mem MPI call, upon write() to the /dev/infiniband/uverbs0 file descriptor>

#0  0x00000039efe0e7a0 in write () from /lib64/libpthread.so.0
#1  0x00000039f0203ae7 in ibv_cmd_reg_mr () from /usr/lib64/libibverbs.so.1
#2  0x00002af33926165a in ?? () from /usr/lib64/libipathverbs-rdmav2.so
#3  0x00000039f020a0c3 in ibv_reg_mr () from /usr/lib64/libibverbs.so.1
#4  0x00002af33ac78c37 in openib_reg_mr () from /util/opt/openmpi/2.0.2.psm2/intel/15.0.2/lib/openmpi/mca_btl_openib.so
#5  0x00002af33a455b29 in mca_mpool_grdma_register () from /util/opt/openmpi/2.0.2.psm2/intel/15.0.2/lib/openmpi/mca_mpool_grdma.so
#6  0x00002af33a4557b3 in mca_mpool_grdma_alloc () from /util/opt/openmpi/2.0.2.psm2/intel/15.0.2/lib/openmpi/mca_mpool_grdma.so
#7  0x00002af32b963cee in mca_mpool_base_alloc () from /util/opt/openmpi/2.0.2.psm2/intel/15.0.2/lib/libopen-pal.so.20
#8  0x00002af32b34f3bc in PMPI_Alloc_mem () from /util/opt/openmpi/2.0.2.psm2/intel/15.0.2/lib/libmpi.so.20
#9  0x00002af32b0dea12 in pmpi_alloc_mem_cptr__ () from /util/opt/openmpi/2.0.2.psm2/intel/15.0.2/lib/libmpi_mpifh.so.20
#10 0x000000000277b1c2 in message_passing_mp_mp_allocate_d_ ()
#11 0x000000000257d80e in dbcsr_ptr_util_mp_mem_alloc_d_ ()
#12 0x000000000256f612 in dbcsr_data_methods_low_mp_internal_data_allocate_ ()
#13 0x0000000002585f89 in dbcsr_data_methods_mp_dbcsr_data_new_ ()
#14 0x00000000024cb8d2 in dbcsr_work_operations_mp_dbcsr_create_new_ ()
#15 0x0000000002399f5b in dbcsr_mm_cannon_mp_make_images_ ()
#16 0x0000000002396b02 in dbcsr_mm_cannon_mp_make_m2s_..0 ()
#17 0x0000000002395c24 in dbcsr_mm_cannon_mp_dbcsr_mm_cannon_multiply_ ()
#18 0x00000000023da877 in dbcsr_multiply_api_mp_dbcsr_multiply_d_ ()
#19 0x00000000022d1585 in cp_dbcsr_interface_mp_cp_dbcsr_multiply_d_ ()
#20 0x0000000000950ba8 in cp_dbcsr_operations_mp_cp_dbcsr_plus_fm_fm_t_native_ ()
#21 0x0000000000e93a23 in qs_mo_methods_mp_calculate_dm_sparse_ ()
#22 0x0000000001944f2b in qs_scf_diagonalization_mp_do_general_diag_ ()
#23 0x000000000184e34f in qs_scf_loop_utils_mp_qs_scf_new_mos_ ()
#24 0x000000000172c4d7 in qs_scf_mp_scf_env_do_scf_ ()
#25 0x00000000017299db in qs_scf_mp_scf_ ()
#26 0x0000000000e3ed70 in qs_energy_mp_qs_energies_ ()
#27 0x00000000013e7efd in qs_force_mp_qs_forces_ ()
#28 0x00000000013e66d1 in qs_force_mp_qs_calc_energy_force_ ()
#29 0x0000000000a3a611 in force_env_methods_mp_force_env_calc_energy_force_ ()
#30 0x000000000058fc6f in cp_eval_at_ ()
#31 0x00000000005bc2a0 in bfgs_optimizer_mp_geoopt_bfgs_ ()
#32 0x00000000005a231a in geo_opt_mp_cp_geo_opt_ ()
#33 0x00000000004268ef in cp2k_runs_mp_run_input_ ()
#34 0x000000000040e035 in MAIN__ ()
#35 0x000000000040d67e in main ()

<cp2k.popt program backtrace, within the PMPI_Free_mem MPI call, upon write() to the /dev/infiniband/uverbs0 file descriptor>

#0  0x00000039efe0e7a0 in write () from /lib64/libpthread.so.0
#1  0x00000039f0203a42 in ibv_cmd_dereg_mr () from /usr/lib64/libibverbs.so.1
#2  0x00002af3392615bd in ?? () from /usr/lib64/libipathverbs-rdmav2.so
#3  0x00000039f020a021 in ibv_dereg_mr () from /usr/lib64/libibverbs.so.1
#4  0x00002af33ac78c7d in openib_dereg_mr () from /util/opt/openmpi/2.0.2.psm2/intel/15.0.2/lib/openmpi/mca_btl_openib.so
#5  0x00002af33a455ff2 in mca_mpool_grdma_deregister () from /util/opt/openmpi/2.0.2.psm2/intel/15.0.2/lib/openmpi/mca_mpool_grdma.so
#6  0x00002af33a455efd in mca_mpool_grdma_free () from /util/opt/openmpi/2.0.2.psm2/intel/15.0.2/lib/openmpi/mca_mpool_grdma.so
#7  0x00002af32b963dcf in mca_mpool_base_free () from /util/opt/openmpi/2.0.2.psm2/intel/15.0.2/lib/libopen-pal.so.20
#8  0x00002af32b3597bb in PMPI_Free_mem () from /util/opt/openmpi/2.0.2.psm2/intel/15.0.2/lib/libmpi.so.20
#9  0x00002af32b0e0b3a in pmpi_free_mem__ () from /util/opt/openmpi/2.0.2.psm2/intel/15.0.2/lib/libmpi_mpifh.so.20
#10 0x0000000002716a40 in message_passing_mp_mp_deallocate_i_ ()
#11 0x0000000002577f6c in dbcsr_ptr_util_mp_ensure_array_size_i_ ()
#12 0x00000000024ebde8 in dbcsr_index_operations_mp_dbcsr_addto_index_array_ ()
#13 0x00000000024ce5fd in dbcsr_work_operations_mp_quick_finalize_ ()
#14 0x00000000024d9af3 in dbcsr_work_operations_mp_dbcsr_special_finalize_ ()
#15 0x000000000239cc2a in dbcsr_mm_cannon_mp_make_images_ ()
#16 0x0000000002396b02 in dbcsr_mm_cannon_mp_make_m2s_..0 ()
#17 0x0000000002395c24 in dbcsr_mm_cannon_mp_dbcsr_mm_cannon_multiply_ ()
#18 0x00000000023da877 in dbcsr_multiply_api_mp_dbcsr_multiply_d_ ()
#19 0x00000000022d1585 in cp_dbcsr_interface_mp_cp_dbcsr_multiply_d_ ()
#20 0x0000000000950ba8 in cp_dbcsr_operations_mp_cp_dbcsr_plus_fm_fm_t_native_ ()
#21 0x0000000000e93a23 in qs_mo_methods_mp_calculate_dm_sparse_ ()
#22 0x0000000001944f2b in qs_scf_diagonalization_mp_do_general_diag_ ()
#23 0x000000000184e34f in qs_scf_loop_utils_mp_qs_scf_new_mos_ ()
#24 0x000000000172c4d7 in qs_scf_mp_scf_env_do_scf_ ()
#25 0x00000000017299db in qs_scf_mp_scf_ ()
#26 0x0000000000e3ed70 in qs_energy_mp_qs_energies_ ()
#27 0x00000000013e7efd in qs_force_mp_qs_forces_ ()
#28 0x00000000013e66d1 in qs_force_mp_qs_calc_energy_force_ ()
#29 0x0000000000a3a611 in force_env_methods_mp_force_env_calc_energy_force_ ()
#30 0x000000000058fc6f in cp_eval_at_ ()
#31 0x00000000005bc2a0 in bfgs_optimizer_mp_geoopt_bfgs_ ()
#32 0x00000000005a231a in geo_opt_mp_cp_geo_opt_ ()
#33 0x00000000004268ef in cp2k_runs_mp_run_input_ ()
#34 0x000000000040e035 in MAIN__ ()
#35 0x000000000040d67e in main ()
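In case it helps with reproducing this, here is a sketch of one way backtraces like the ones above can be captured. It attaches gdb to a running cp2k.popt rank and breaks on the libibverbs registration calls rather than on the raw write() itself; the pgrep invocation and the script file name are only illustrative.

# gdb script: print a backtrace each time memory is registered or
# deregistered with the verbs stack, which is what ends up writing to
# the /dev/infiniband/uverbs0 file descriptor
cat > uverbs-bt.gdb <<'EOF'
break ibv_reg_mr
break ibv_dereg_mr
commands 1 2
  bt
  continue
end
continue
EOF

# Attach to a running cp2k.popt process (picking the newest matching PID)
gdb -p "$(pgrep -n cp2k.popt)" -x uverbs-bt.gdb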