Hi Gilles,
The LAMMPS jobs for both versions are pure MPI. In the SLURM script, 64 cores are requested from 4 nodes, so it is 64 MPI tasks, not necessarily evenly distributed across the nodes (each node is equipped with 64 cores). I can reproduce the performance issue with the LAMMPS example "VISCOSITY/in.wall.2d". The run time difference is a jaw-dropping 20 seconds (v-1.8.4) vs. 45 minutes (v-1.10.1). Among the multiple tests, I do have one v-1.10.1 job that finished in 20 seconds. Again, unstable performance. We also tested other software packages such as CP2K, VASP and Quantum ESPRESSO, and they all show similar issues.

Here is the MPI timing breakdown from the LAMMPS job outputs.

v-1.8.4 (Job execution time: 00:00:20)
Loop time of 8.94962 on 64 procs for 50000 steps with 1020 atoms
Pair  time (%) = 0.270092 (3.01791)
Neigh time (%) = 0.0842548 (0.941435)
Comm  time (%) = 3.3474 (37.4027)
Outpt time (%) = 0.00901061 (0.100682)
Other time (%) = 5.23886 (58.5373)

v-1.10.1 (Job execution time: 00:45:50)
Loop time of 2003.07 on 64 procs for 50000 steps with 1020 atoms
Pair  time (%) = 0.346776 (0.0173122)
Neigh time (%) = 0.18047 (0.00900966)
Comm  time (%) = 535.836 (26.7508)
Outpt time (%) = 1.68608 (0.0841748)
Other time (%) = 1465.02 (73.1387)

I wonder if you could share the config.log and ompi_info output from your v-1.10.1 build. Hopefully we can find a solution by comparing the configuration differences. We have been experimenting with the cma and vader parameters, but with no luck so far.

Thanks,
Jingchao

Dr. Jingchao Zhang
Holland Computing Center
University of Nebraska-Lincoln
402-472-6400

________________________________
From: users <users-boun...@open-mpi.org> on behalf of Gilles Gouaillardet <gil...@rist.or.jp>
Sent: Tuesday, December 15, 2015 12:11 AM
To: Open MPI Users
Subject: Re: [OMPI users] performance issue with OpenMPI 1.10.1

Hi,

First, can you check how many MPI tasks and OpenMP threads are used with both ompi versions?
/* it should be 16 MPI tasks x no OpenMP threads */
Can you also post both MPI task timing breakdowns (from the output)?

I tried a simple test with VISCOSITY/in.wall.2d and did not observe any performance difference.
Can you reproduce the performance drop with an input file from the examples directory?
If not, can you post your in.snr input file?

Cheers,

Gilles

On 12/15/2015 7:18 AM, Jingchao Zhang wrote:

Hi all,

We installed the latest release of Open MPI, 1.10.1, on our Linux cluster and found that it has some performance issues. We tested Open MPI's performance with the MD simulation package LAMMPS (http://lammps.sandia.gov/). Compared to our previous installation of version 1.8.4, 1.10.1 is nearly three times slower when running on multiple nodes. Run times across four computing nodes are as follows:

        1.10.1     1.8.4
  1     0:09:39    0:09:21
  2     0:50:29    0:09:23
  3     0:50:29    0:09:28
  4     0:13:38    0:09:27
  5     0:10:43    0:09:34
  Ave   0:27:00    0:09:27

Times are in hours:minutes:seconds. Five tests were done for each case and the averaged run time is listed in the last row. Tests on a single node give the same run times for both 1.10.1 and 1.8.4.
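Since the slowdown only shows up across multiple nodes, one thing worth checking is how the MPI ranks actually land on the nodes. Below is a rough sketch of the commands, with option names as documented for Open MPI 1.8/1.10 (illustrative only; the mapping option is just one way to force an even layout, and the BTL exclusion is only an A/B test, not a verified fix):

# Count how many ranks are started on each node; run inside the same
# SLURM allocation that the LAMMPS job uses.
mpirun hostname | sort | uniq -c

# Force an even 16-ranks-per-node layout via Open MPI's mapping option.
mpirun --map-by ppr:16:node lmp_ompi_g++ < in.snr

# A/B test with the vader shared-memory BTL excluded, to see whether the
# on-node transport is involved in the slow runs.
mpirun --mca btl ^vader lmp_ompi_g++ < in.snr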
We use SLURM as our job scheduler, and the submit script for the LAMMPS job is as below:

"#!/bin/sh
#SBATCH -N 4
#SBATCH -n 64
#SBATCH --mem=2g
#SBATCH --time=00:50:00
#SBATCH --error=job.%J.err
#SBATCH --output=job.%J.out

module load compiler/gcc/4.7
export PATH=$PATH:/util/opt/openmpi/1.10.1/gcc/4.7/bin
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/util/opt/openmpi/1.10.1/gcc/4.7/lib
export INCLUDE=$INCLUDE:/util/opt/openmpi/1.10.1/gcc/4.7/include

mpirun lmp_ompi_g++ < in.snr"

The "lmp_ompi_g++" binary is compiled against gcc/4.7 and openmpi/1.10.1. The compiler flags and MPI information can be found in the attachments. The problem here, as you can see, is the unstable performance of v-1.10.1. I wonder if this is a configuration issue at the compilation stage.

Below is some information I gathered according to the "Getting Help" page.

Version of Open MPI that we are using:
  Open MPI version: 1.10.1
  Open MPI repo revision: v1.10.0-178-gb80f802
  Open MPI release date: Nov 03, 2015

"config.log" and "ompi_info --all" outputs are enclosed in the attachments.

Network information:

1. OpenFabrics version
Mellanox/vendor 2.4-1.0.4
Download: http://www.mellanox.com/page/mlnx_ofed_eula?mtag=linux_sw_drivers&mrequest=downloads&mtype=ofed&mver=MLNX_OFED-2.4-1.0.4&mname=MLNX_OFED_LINUX-2.4-1.0.4-rhel6.6-x86_64.tgz

2. Linux version
Scientific Linux release 6.6, kernel 2.6.32-504.23.4.el6.x86_64

3. Subnet manager
OpenSM

4. ibv_devinfo
hca_id: mlx4_0
        transport:              InfiniBand (0)
        fw_ver:                 2.9.1000
        node_guid:              0002:c903:0050:6190
        sys_image_guid:         0002:c903:0050:6193
        vendor_id:              0x02c9
        vendor_part_id:         26428
        hw_ver:                 0xB0
        board_id:               MT_0D90110009
        phys_port_cnt:          1
                port:   1
                        state:          PORT_ACTIVE (4)
                        max_mtu:        4096 (5)
                        active_mtu:     4096 (5)
                        sm_lid:         1
                        port_lid:       34
                        port_lmc:       0x00
                        link_layer:     InfiniBand

5. ifconfig
em1       Link encap:Ethernet  HWaddr D0:67:E5:F9:20:76
          inet addr:10.138.25.3  Bcast:10.138.255.255  Mask:255.255.0.0
          inet6 addr: fe80::d267:e5ff:fef9:2076/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:28977969 errors:0 dropped:0 overruns:0 frame:0
          TX packets:67069501 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:3588666680 (3.3 GiB)  TX bytes:8145183622 (7.5 GiB)

Ifconfig uses the ioctl access method to get the full address information, which limits hardware addresses to 8 bytes. Because Infiniband address has 20 bytes, only the first 8 bytes are displayed correctly. Ifconfig is obsolete! For replacement check ip.

ib0       Link encap:InfiniBand  HWaddr A0:00:02:20:FE:80:00:00:00:00:00:00:00:00:00:00:00:00:00:00
          inet addr:10.137.25.3  Bcast:10.137.255.255  Mask:255.255.0.0
          inet6 addr: fe80::202:c903:50:6191/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:2044  Metric:1
          RX packets:1776 errors:0 dropped:0 overruns:0 frame:0
          TX packets:418 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1024
          RX bytes:131571 (128.4 KiB)  TX bytes:81418 (79.5 KiB)

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
          RX packets:40310687 errors:0 dropped:0 overruns:0 frame:0
          TX packets:40310687 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:45601859442 (42.4 GiB)  TX bytes:45601859442 (42.4 GiB)

6. ulimit -l
unlimited

Please kindly let me know if more information is needed.

Thanks,
Jingchao
Dr. Jingchao Zhang
Holland Computing Center
University of Nebraska-Lincoln
402-472-6400

_______________________________________________
users mailing list
us...@open-mpi.org
Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
Link to this post: http://www.open-mpi.org/community/lists/users/2015/12/28160.php