On Jun 2, 2009, at 7:30 PM, Iftikhar Rathore -X (irathore - SIFY LIMITED at Cisco) wrote:

We are using Open MPI version 1.2.8 (packaged with OFED-1.4). I am trying
to run hpl-2.0 (Linpack). We have two Intel quad-core CPUs in all our
servers (8 total cores), and all hosts in the hostfile have lines that
look like "10.100.0.227 slots=8max_slots=8".


Per Gus's mail, let's assume that this is a typo in the email.
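Assuming the only problem there is the missing space, each hostfile line should read something like:

10.100.0.227 slots=8 max_slots=8

with the IP/hostname adjusted per host, of course.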

Now when I use mpirun (even with --mca mpi_paffinity_alone 1) it does
not keep the affinity; the processes seem to gravitate towards the first
four cores (seen with top after pressing 1). I know I do have MCA paffinity
available.

[root@devi DLR_WB_88]# ompi_info | grep paffinity
[devi.cisco.com:26178] mca: base: component_find: unable to open btl openib: file not found (ignored)


The above line is worrisome -- it means that Open MPI is unable to open the openib BTL, which means that you're not using the OpenFabrics network. You should be able to run ompi_info and see that it lists the openib BTL properly.
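For example, something like this should work on each node (the version strings in this sketch are illustrative; yours will differ):

# ompi_info | grep btl
            MCA btl: openib (MCA v1.0, API v1.0, Component v1.2.8)
            MCA btl: self (MCA v1.0, API v1.0, Component v1.2.8)
            ...

If openib doesn't show up in that list on some node, the component isn't loadable there.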

This *usually* means that library dependencies are missing (or perhaps you are accidentally mixing and matching multiple versions of Open MPI on one or more boxes?). Run ldd on $prefix/lib/openmpi/mca_btl_openib.so and see which libraries are missing.
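Something along these lines should show any unresolved dependencies (I'm guessing at the install prefix -- substitute wherever your OFED-packaged Open MPI actually lives):

# ldd /usr/mpi/gcc/openmpi-1.2.8/lib/openmpi/mca_btl_openib.so | grep "not found"

Any output at all means there's a library that the dynamic linker can't resolve on that node.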

This is probably a more serious performance issue to fix than affinity.
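One quick way to check (just a suggestion, reusing your command line from below) is to explicitly request the OpenFabrics BTL so that Open MPI can't silently fall back to TCP:

# mpirun -nolocal -np 896 -hostfile /mnt/apps/hosts/896_8slots --mca btl openib,sm,self --mca mpi_paffinity_alone 1 /mnt/apps/bin/xhpl

If openib really can't be opened, that run should abort during startup with an error about processes being unable to reach each other, rather than quietly running over TCP.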

           MCA paffinity: linux (MCA v1.0, API v1.0, Component v1.2.8)

The command line I am using is:

# mpirun -nolocal -np 896 -v --mca mpi_paffinity_alone 1 -hostfile /mnt/apps/hosts/896_8slots /mnt/apps/bin/xhpl


There's no reason that HPL should override paffinity. Drew (another TME from Cisco) tells me in an off-list email that you're using top to check the affinity. Try using htop (http://htop.sf.net) to verify. The version of top on my old RHEL4-based machines doesn't show affinity -- perhaps newer versions do. But I know htop does.
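Another low-tech check, assuming your nodes have the taskset utility installed, is to ask the kernel directly about one of the xhpl processes (the PID below is obviously made up):

# taskset -cp $(pgrep xhpl | head -1)
pid 12345's current affinity list: 3

With mpi_paffinity_alone set, each rank should report a single, distinct core instead of 0-7.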

--
Jeff Squyres
Cisco Systems
