A follow-up....

Part of the problem was affinity.  I had written a script to handle
processor and memory affinity (it works fine with MVAPICH2); the
idea came from TACC.  However, the script didn't seem to work
correctly with OpenMPI (or I still have bugs in it).
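
For reference, the wrapper does roughly the following (a simplified
sketch, not the real script; the local-rank variable names and the
cores-per-socket count are assumptions about my setup):

#!/bin/bash
# Bind each MPI rank to a core and keep its memory on the local socket.
# Figure out the node-local rank from whichever MPI launched us.
if [ -n "$OMPI_COMM_WORLD_LOCAL_RANK" ]; then
    lrank=$OMPI_COMM_WORLD_LOCAL_RANK        # OpenMPI 1.3.x
elif [ -n "$MV2_COMM_WORLD_LOCAL_RANK" ]; then
    lrank=$MV2_COMM_WORLD_LOCAL_RANK         # MVAPICH2 (name may differ by version)
else
    exec "$@"                                # no rank info; run unbound
fi

cores_per_socket=4                           # assumed 2-socket quad-core nodes
socket=$(( lrank / cores_per_socket ))

# numactl does both the processor and the memory binding.
exec numactl --physcpubind=$lrank --membind=$socket "$@"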

Setting --mca mpi_paffinity_alone 1 made things better.  However,
the performance is still not as good:

Cores   MVAPICH2    OpenMPI
---------------------------
   8      17.3        17.3
  16      31.7        31.5
  32      62.9        62.8
  64     110.8       108.0
 128     219.2       201.4
 256     384.5       342.7
 512     687.2       537.6

The performance numbers are in GFLOPS (so larger is better).
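
For completeness, the OpenMPI runs above were launched with
essentially the command from my first mail (quoted below), with the
paffinity flag added and my own wrapper left out (a sketch from
memory, not the verbatim batch line; the machinefile path changes
per job):

/opt/openmpi/1.3.3-intel/bin/mpirun --mca plm_rsh_disable_qrsh 1 \
    --mca btl openib,sm,self --mca mpi_paffinity_alone 1 \
    -machinefile /tmp/6026489.1.qntest.q/machines -x LD_LIBRARY_PATH \
    -np $NSLOTS ./wrf.exe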

The first few numbers show that the executable itself runs at the
right speed.  I verified that IB is being used by running the OSU
Micro-Benchmarks (OMB) and checking latency and bandwidth.  Those
numbers are what I expect for QDR (about 3 GB/s and 1.5 µs).
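
In case it helps, the IB check was along these lines, using
osu_latency and osu_bw from OMB between two nodes (the host names
here are placeholders):

# one rank on each of two nodes, forcing the IB and self BTLs
mpirun --mca btl openib,self -np 2 -H node01,node02 ./osu_latency
mpirun --mca btl openib,self -np 2 -H node01,node02 ./osu_bw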

However, the OpenMPI version is not scaling as well.  Any
ideas on why that might be the case?

Thanks,
Craig


Craig Tierney wrote:
> I am running openmpi-1.3.3 on my cluster, which is using
> OFED-1.4.1 for InfiniBand support.  I am comparing the performance
> of this version of OpenMPI against MVAPICH2 and seeing a very
> large difference.
> 
> The code I am testing is WRF v3.0.1.  I am running the
> 12km benchmark.
> 
> The two builds use the exact same code and configuration
> files.  All I did differently was use modules to switch MPI
> versions and recompile the code.
> 
> Performance:
> 
> Cores   MVAPICH2    OpenMPI
> ---------------------------
>    8      17.3        13.9
>   16      31.7        25.9
>   32      62.9        51.6
>   64     110.8        92.8
>  128     219.2       189.4
>  256     384.5       317.8
>  512     687.2       516.7
> 
> The performance numbers are in GFLOPS (so larger is better).
> 
> I am calling openmpi as:
> 
> /opt/openmpi/1.3.3-intel/bin/mpirun --mca plm_rsh_disable_qrsh 1 \
>     --mca btl openib,sm,self \
>     -machinefile /tmp/6026489.1.qntest.q/machines -x LD_LIBRARY_PATH \
>     -np $NSLOTS /home/ctierney/bin/noaa_affinity ./wrf.exe
> 
> So,
> 
> Is this expected?  Are there any common-sense optimizations I should use?
> Is there a way to verify that I am really using the IB?  When
> I try:
> 
> -mca bta ^tcp,openib,sm,self
> 
> I get the errors:
> --------------------------------------------------------------------------
> No available btl components were found!
> 
> This means that there are no components of this type installed on your
> system or all the components reported that they could not be used.
> 
> This is a fatal error; your MPI process is likely to abort.  Check the
> output of the "ompi_info" command and ensure that components of this
> type are available on your system.  You may also wish to check the
> value of the "component_path" MCA parameter and ensure that it has at
> least one directory that contains valid MCA components.
> --------------------------------------------------------------------------
> 
> But ompi_info is telling me that I have openib support:
> 
>    MCA btl: openib (MCA v2.0, API v2.0, Component v1.3.3)
> 
> Note, I did rebuild OFED and put it in a different directory
> and did not rebuild OpenMPI.  However, since ompi_info isn't
> complaining and the libraries are available, I am thinking that
> it isn't a problem.  I could be wrong.
> 
> Thanks,
> Craig


-- 
Craig Tierney (craig.tier...@noaa.gov)
