On Jun 2, 2009, at 7:30 PM, Iftikhar Rathore -X (irathore - SIFY
LIMITED at Cisco) wrote:
> We are using openmpi version 1.2.8 (packaged with OFED-1.4). I am
> trying to run hpl-2.0 (Linpack). We have two Intel quad-core CPUs in
> each of our servers (8 total cores), and all hosts in the hostfile
> have lines that look like "10.100.0.227 slots=8max_slots=8".
Per Gus's mail, let's assume that this is a typo in the email.
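For reference, the corrected hostfile entry just needs a space between
the two keywords (IP address and slot counts copied from your mail):

   10.100.0.227 slots=8 max_slots=8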
> Now when I use mpirun (even with --mca mpi_paffinity_alone 1), it
> does not keep the affinity; the processes seem to gravitate towards
> the first four cores (watching top and pressing "1"). I know I do
> have MCA paffinity available.
> [root@devi DLR_WB_88]# ompi_info | grep paffinity
> [devi.cisco.com:26178] mca: base: component_find: unable to open btl openib: file not found (ignored)
The above line is worrisome -- it means that Open MPI is unable to open
the openib BTL, which means that you're not using the OpenFabrics
networking. You should be able to run ompi_info and see that it lists
the openib BTL properly.
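If the openib BTL plugin is loading correctly, ompi_info should show a
line for it in the same format as the paffinity line you quoted below
(the exact version numbers will depend on your install):

   ompi_info | grep btl
   ...
   MCA btl: openib (MCA v1.0, API v1.0, Component v1.2.8)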
This *usually* means that library dependencies are missing (or perhaps
you are accidentally mixing-n-matching multiple versions of Open MPI
on one or more boxes?). Run ldd on
$prefix/lib/openmpi/mca_btl_openib.so and see what libraries are
missing.
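For example (assuming $prefix is the installation prefix of the Open
MPI you are actually running):

   ldd $prefix/lib/openmpi/mca_btl_openib.so | grep "not found"

Any library that ldd reports as "not found" needs to be installed on
that node or made findable via LD_LIBRARY_PATH.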
This is probably a more serious performance issue to fix than affinity.
> MCA paffinity: linux (MCA v1.0, API v1.0, Component v1.2.8)
> The command line I am using is:
>
> # mpirun -nolocal -np 896 -v --mca mpi_paffinity_alone 1 -hostfile
> /mnt/apps/hosts/896_8slots /mnt/apps/bin/xhpl
There's no reason that HPL should override paffinity. Drew (another
TME from Cisco) tells me in an off-list email that you're using top to
check whether the affinity is being set. Try using htop
(http://htop.sf.net) to verify instead. The version of top on my old
RHEL4-based machines doesn't show affinity -- perhaps newer versions
do. But I know htop does.
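Another way to check without installing anything extra (a sketch that
assumes util-linux's taskset and pgrep are available on the compute
node, and that the HPL binary shows up as "xhpl" in the process table):

   for pid in $(pgrep xhpl); do taskset -p $pid; done

Each line prints the affinity mask that process is currently bound to;
with mpi_paffinity_alone working you would expect eight distinct
single-core masks per node (1, 2, 4, ..., 80) rather than every
process showing the full ff mask.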
--
Jeff Squyres
Cisco Systems