> One of the differences among MPI implementations is the default placement of
> processes within the node. E.g., should processes by default be collocated
> on cores of the same socket or on cores of different sockets? I don't know
> if that issue is applicable here (that is, HP MPI vs Open MPI or on
> Superdome architecture), but it's potentially an issue to look at. With HP
> MPI, mpirun has a -cpu_bind switch for controlling placement. With Open
> MPI, mpirun controls placement with -rankfile.
>
> E.g., what happens if you try
>
> % cat rf1
> rank 0=XX slot=0
> rank 1=XX slot=1
> % cat rf2
> rank 0=XX slot=0
> rank 1=XX slot=2
> % cat rf3
> rank 0=XX slot=0
> rank 1=XX slot=3
> [...etc...]
> % mpirun -np 2 --mca btl self,sm --host XX,XX -rf rf1 $PWD/IMB-MPI1 pingpong
> % mpirun -np 2 --mca btl self,sm --host XX,XX -rf rf2 $PWD/IMB-MPI1 pingpong
> % mpirun -np 2 --mca btl self,sm --host XX,XX -rf rf3 $PWD/IMB-MPI1 pingpong
> [...etc...]
>
> where XX is the name of your node and you march through all the cores on
> your Superdome node?
I tried this, but it didn't seem to make a difference either.
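For concreteness, the sweep over core pairs can be scripted roughly as below. This is only a sketch of the approach suggested above: the hostname node01 and the slot count of 16 are placeholders, and would need to be adjusted for the actual Superdome node.

```sh
#!/bin/sh
# Sweep rank 1 across every slot while rank 0 stays pinned to slot 0,
# running the IMB pingpong test for each pairing.
# NODE and NSLOTS are placeholders -- set them for the node under test.
NODE=node01
NSLOTS=16

for peer in $(seq 1 $((NSLOTS - 1))); do
    # Write a two-rank rankfile for this core pair.
    cat > rf.$peer <<EOF
rank 0=$NODE slot=0
rank 1=$NODE slot=$peer
EOF
    echo "=== slot 0 <-> slot $peer ==="
    mpirun -np 2 --mca btl self,sm --host $NODE,$NODE -rf rf.$peer \
        $PWD/IMB-MPI1 pingpong
done
```

Even run this way, the latency and bandwidth numbers looked essentially the same for every core pair.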