Jeff: I am running on a 2.66 GHz Nehalem node with turbo mode and hyperthreading enabled. When I run LINPACK with Intel MPI, I get 82.68 GFlops without much trouble.
When I ran with OpenMPI (I have OpenMPI 1.2.8, but my colleague was using 1.3.2), I used the same MKL libraries as with Intel MPI. But with OpenMPI, the best I have gotten so far is 80.22 GFlops, and I could never come close to what I am getting with Intel MPI.

Here are my options with OpenMPI:

mpirun -n 8 --machinefile hf --mca rmaps_rank_file_path rankfile --mca coll_sm_info_num_procs 8 --mca btl self,sm -mca mpi_leave_pinned 1 ./xhpl_ompi

Here is my rankfile (cat rankfile):

rank 0=i02n05 slot=0
rank 1=i02n05 slot=1
rank 2=i02n05 slot=2
rank 3=i02n05 slot=3
rank 4=i02n05 slot=4
rank 5=i02n05 slot=5
rank 6=i02n05 slot=6
rank 7=i02n05 slot=7

In this case the physical cores are 0-7, while the additional logical processors from hyperthreading are 8-15. With the "top" command, I could see all 8 tasks running on 8 different physical cores; I did not see two MPI tasks running on the same physical core. Also, the program is not paging, as the problem size fits in memory.

Do you have any ideas how I can improve the performance so that it matches the Intel MPI performance? Any suggestions will be greatly appreciated.

Thanks,
Swamy Kandadai

Dr. Swamy N. Kandadai
IBM Senior Certified Executive IT Specialist
STG WW Modular Systems Benchmark Center
STG WW HPC and BI CoC Benchmark Center
Phone: (845) 433-8429 (8-293)   Fax: (845) 432-9789
sw...@us.ibm.com
http://w3.ibm.com/sales/systems/benchmarks
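Since the question hinges on whether the ranks really stay pinned to distinct physical cores, one way to double-check the binding from inside the MPI program itself (rather than inferring it from top) is a small sketch like the one below. This is not from the original mail; it assumes a Linux node where glibc's sched_getcpu() is available, and the file name checkaff.c is only illustrative.

/* checkaff.c - report which CPU each MPI rank is currently running on.
   Build with the same MPI used to launch xhpl, e.g.: mpicc -o checkaff checkaff.c */
#define _GNU_SOURCE
#include <stdio.h>
#include <unistd.h>   /* gethostname() */
#include <sched.h>    /* sched_getcpu(), glibc-specific */
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, size;
    char host[64];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    gethostname(host, sizeof(host));
    /* sched_getcpu() returns the CPU the calling thread is on right now;
       with a correct rankfile binding each rank should stay on one core. */
    printf("rank %d of %d on %s is on cpu %d\n",
           rank, size, host, sched_getcpu());

    MPI_Finalize();
    return 0;
}

Launched with the same rankfile and MCA options as xhpl_ompi (mpirun -n 8 ... ./checkaff), each rank should report a distinct CPU in the 0-7 range; a rank reporting a CPU in 8-15, or changing CPUs across repeated runs, would indicate the binding is not what top suggests.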