Hi Craig, list

I suppose WRF uses MPI collective calls (MPI_Reduce, MPI_Bcast, MPI_Alltoall, etc.), just like the climate models we run here do. A recursive grep on the source code will tell.
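For example, from the top of the WRF source tree (the routine list below is only a starting point, and the search is case-insensitive since WRF is mostly Fortran):

  grep -rinE 'mpi_(reduce|allreduce|bcast|alltoall|gather|scatter)' . | less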
If that is the case, you may need to tune the collectives dynamically (a rough sketch follows after my signature). We are experimenting with tuned collectives here also. Specifically, we had a scaling problem with the MITgcm (also running on an IB cluster) that is probably due to collectives.

Similar problems were reported on this list before, with computational chemistry software. See these threads:

http://www.open-mpi.org/community/lists/users/2009/07/10045.php
http://www.open-mpi.org/community/lists/users/2009/05/9419.php

If WRF outputs timing information, particularly the time spent in MPI routines, you may also want to compare how the OpenMPI and MVAPICH versions fare w.r.t. MPI collectives.

I hope this helps.
Gus Correa

---------------------------------------------------------------------
Gustavo Correa
Lamont-Doherty Earth Observatory - Columbia University
Palisades, NY, 10964-8000 - USA
---------------------------------------------------------------------
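As a rough sketch of what "tuning the collectives" can mean in practice with Open MPI 1.3.x: the relevant MCA parameters belong to the coll_tuned component. Check ompi_info for the exact set your build exposes, and treat the algorithm number below as a placeholder rather than a recommendation:

  # list the tunables of the "tuned" collective component
  ompi_info --param coll tuned

  # enable dynamic rules and force a particular broadcast algorithm
  mpirun --mca coll_tuned_use_dynamic_rules 1 \
         --mca coll_tuned_bcast_algorithm 6 \
         -np 64 ./wrf.exe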
Craig Tierney wrote:

I am running openmpi-1.3.3 on my cluster, which is using OFED-1.4.1 for Infiniband support. I am comparing performance between this version of OpenMPI and Mvapich2, and seeing a very large difference in performance. The code I am testing is WRF v3.0.1, running the 12km benchmark. The two builds use the exact same code and configuration files. All I did differently was use modules to switch versions of MPI and recompile the code.

Performance:

  Cores   Mvapich2   Openmpi
  ---------------------------
      8       17.3      13.9
     16       31.7      25.9
     32       62.9      51.6
     64      110.8      92.8
    128      219.2     189.4
    256      384.5     317.8
    512      687.2     516.7

The performance number is GFlops (so larger is better).

I am calling openmpi as:

  /opt/openmpi/1.3.3-intel/bin/mpirun --mca plm_rsh_disable_qrsh 1 \
      --mca btl openib,sm,self \
      -machinefile /tmp/6026489.1.qntest.q/machines -x LD_LIBRARY_PATH \
      -np $NSLOTS /home/ctierney/bin/noaa_affinity ./wrf.exe

So, is this expected? Are there some common sense optimizations to use? Is there a way to verify that I am really using the IB?

When I try:

  -mca bta ^tcp,openib,sm,self

I get the errors:

--------------------------------------------------------------------------
No available btl components were found!

This means that there are no components of this type installed on your
system or all the components reported that they could not be used.

This is a fatal error; your MPI process is likely to abort. Check the
output of the "ompi_info" command and ensure that components of this
type are available on your system. You may also wish to check the value
of the "component_path" MCA parameter and ensure that it has at least
one directory that contains valid MCA components.
--------------------------------------------------------------------------

But ompi_info is telling me that I have openib support:

  MCA btl: openib (MCA v2.0, API v2.0, Component v1.3.3)

Note, I did rebuild OFED and put it in a different directory and did not rebuild OpenMPI. However, since ompi_info isn't complaining and the libraries are available, I am thinking that it isn't a problem. I could be wrong.

Thanks,
Craig
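A side note on the last two questions above: in Open MPI the "^" prefix negates the entire comma-separated list, so "btl ^tcp,openib,sm,self" leaves no BTL enabled at all, which would explain the "No available btl components were found" message (the parameter is also spelled btl, not bta). Two quick, rough checks that the openib BTL is really in use, assuming the 1.3.x command-line syntax:

  # 1. Ask the BTL framework to report which components it opens and selects
  mpirun --mca btl openib,sm,self --mca btl_base_verbose 30 -np 2 ./wrf.exe

  # 2. Exclude only openib and rerun a small case; if the timings barely
  #    change, InfiniBand was probably not being used in the first place
  mpirun --mca btl ^openib -np 2 ./wrf.exe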