Swamy Kandadai wrote:
Jeff:
I'm not Jeff, but...
Linpack has different characteristics at different problem sizes. At
small problem sizes, any number of different overheads could be the
problem. At large problem sizes, one should approach the peak
floating-point performance of the machine, and the efficiency of one's
DGEMM (and the blocking one uses, etc.) should become the dominant issues. So, one
question is whether there is a difference in the overheads or whether
the large-N performance is actually different.
I recommend measuring performance for a range of matrix sizes. The data
should be able to tell you if there are performance differences at small
N that disappear with sufficiently large N or if there is a performance
difference that would persist regardless of how large one were to make N.
Again, I think it's better to look at trends as a function of N rather
than just looking at one data point. You can get better understanding
that way. Plus, it's cheaper! (Run time grows as N^3, so it's faster
to run many small Ns than to run one or two blockbuster Ns.)
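Incidentally, HPL makes this kind of sweep easy: you can list several
problem sizes in one HPL.dat and it will run them all in a single job.
The relevant lines look something like this (the particular Ns and NB
below are just placeholders -- pick values that fit your memory):

  4             # of problems sizes (N)
  5000 10000 20000 40000  Ns
  1             # of NBs
  168           NBs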
Anyhow, one would think the data will indicate that large-N performance
is independent of the MPI implementation -- so long as you use the same
DGEMMs in both cases (and you say you're using MKL in both cases). But
this is an important assumption to check.
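A quick way to check it is to make sure both binaries really resolve to
the same MKL, e.g. (xhpl_impi below is just a stand-in for whatever your
Intel MPI build of HPL is called):

  ldd ./xhpl_ompi | grep -i mkl
  ldd ./xhpl_impi | grep -i mkl

If the two lists differ (or one binary is linked statically), that is
worth ruling out first.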
If it's a matter of small-N overheads taking the edge off your big-N
performance, then you could maybe start profiling small-N runs.
I am running on a 2.66 GHz Nehalem node. On this node, turbo mode and
hyperthreading are enabled.
When I run LINPACK with Intel MPI, I get 82.68 GFlops without much
trouble.
I also ran with OpenMPI (I have OpenMPI 1.2.8, but my colleague was
using 1.3.2), using the same MKL libraries with both OpenMPI and
Intel MPI. But with OpenMPI, the best I have gotten so far is 80.22 GFlops,
and I could never get close to what I am getting with Intel MPI.
Here are my options with OpenMPI:
mpirun -n 8 --machinefile hf --mca rmaps_rank_file_path rankfile --mca
coll_sm_info_num_procs 8 --mca btl self,sm -mca mpi_leave_pinned 1
./xhpl_ompi
Here is my rankfile:
cat rankfile
rank 0=i02n05 slot=0
rank 1=i02n05 slot=1
rank 2=i02n05 slot=2
rank 3=i02n05 slot=3
rank 4=i02n05 slot=4
rank 5=i02n05 slot=5
rank 6=i02n05 slot=6
rank 7=i02n05 slot=7
In this case the physical cores are 0-7 while the additional logical
processors with hyperthreading are 8-15.
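For reference, that numbering can be confirmed from sysfs (assuming a
Linux kernel that exposes the usual topology files), e.g.:

  for c in /sys/devices/system/cpu/cpu[0-9]*; do
    echo "$c: $(cat $c/topology/thread_siblings_list)"
  done

With the layout described above, each core should report a sibling pair
such as 0,8, 1,9, and so on.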
With "top" command, I could see all the 8 tasks are running on 8
different physical cores. I did not see
2 MPI tasks running on the same physical core. Also, the program is
not paging as the problem size
fits in the meory.
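I relied on top here; a more precise check would presumably be to look
at each rank's allowed-CPU mask directly (assuming this Open MPI build
actually sets the affinity mask), something like:

  for p in $(pgrep xhpl); do
    echo "$p: $(grep Cpus_allowed_list /proc/$p/status)"
  done

where each rank should show a single, distinct CPU in 0-7.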
Do you have any ideas on how I can improve the performance so that it
matches the Intel MPI performance?
Any suggestions will be greatly appreciated.