Jeff:

I am running on a 2.66 GHz Nehalem node with turbo mode and
hyperthreading enabled.
When I run LINPACK with Intel MPI, I get 82.68 GFlops without much
trouble.

When I run with OpenMPI (I have OpenMPI 1.2.8, though my colleague was
using 1.3.2), I use the same MKL libraries as with Intel MPI. But with
OpenMPI, the best I have gotten so far is 80.22 GFlops, and I have never
come close to what I am getting with Intel MPI.
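
For reference, nominal peak on this node is 8 cores x 2.66 GHz x 4
flops/cycle = 85.12 GFlops (assuming Nehalem's 4 double-precision flops
per core per cycle; turbo pushes the effective clock somewhat higher).
So 82.68 GFlops is about 97% of nominal peak and 80.22 about 94%.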
Here are my options with OpenMPI:

mpirun -n 8 --machinefile hf --mca rmaps_rank_file_path rankfile \
    --mca coll_sm_info_num_procs 8 --mca btl self,sm \
    --mca mpi_leave_pinned 1 ./xhpl_ompi
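
One variant I have not yet tried (just a sketch, not something I have
measured: it drops the rankfile and relies on mpi_paffinity_alone, the
usual single-node binding knob in the 1.2/1.3 series, which pins ranks
to processors 0..n-1 in order) would be:

mpirun -n 8 --machinefile hf --mca btl self,sm \
    --mca mpi_paffinity_alone 1 --mca mpi_leave_pinned 1 ./xhpl_ompi

Would that be expected to behave any differently from the rankfile?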

Here is my rankfile:

cat rankfile
rank 0=i02n05 slot=0
rank 1=i02n05 slot=1
rank 2=i02n05 slot=2
rank 3=i02n05 slot=3
rank 4=i02n05 slot=4
rank 5=i02n05 slot=5
rank 6=i02n05 slot=6
rank 7=i02n05 slot=7
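
If it helps, the binding can also be double-checked from the kernel side
rather than through top (plain Linux tools, nothing Open MPI specific;
xhpl_ompi is my binary name):

for pid in $(pgrep xhpl_ompi); do taskset -cp $pid; done

Each rank should report a single, distinct CPU in 0-7.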

In this case the physical cores are 0-7, while the additional logical
processors from hyperthreading are 8-15.
With the "top" command, I could see all 8 tasks running on 8 different
physical cores; I did not see 2 MPI tasks running on the same physical
core. Also, the program is not paging, as the problem size fits in
memory.
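
For what it is worth, this numbering can be verified from /proc/cpuinfo,
since the BIOS does not always enumerate the hyperthread siblings last:

grep -E 'processor|physical id|core id' /proc/cpuinfo

If 0-7 are truly distinct physical cores, they should all show different
(physical id, core id) pairs, with processors 8-15 repeating them.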

Do you have any ideas on how I can improve the performance so that it
matches the Intel MPI performance?
Any suggestions would be greatly appreciated.

Thanks
Swamy Kandadai


Dr. Swamy N. Kandadai
IBM Senior Certified Executive IT Specialist
STG WW  Modular Systems Benchmark Center
STG WW HPC and BI CoC Benchmark Center
Phone: (845) 433-8429 (8-293)  Fax: (845) 432-9789
sw...@us.ibm.com
http://w3.ibm.com/sales/systems/benchmarks


