Hello, Open MPI developers! We now have a really nice toy: 2 TB of RAM, 16 sockets, 128 cores (four smaller Bull S6010 nodes coupled by BCS chips into a single-image machine).
On such a big box, process pinning is vital, so we tried to use Open MPI's capabilities to pin the processes. But it seems that the rankfile infrastructure does not work properly: on the 128-core node we always get the message "Error: Invalid argument", even when the rankfile is correct. On a smaller node (up to 32 cores / 64 threads), the very same rankfile (with the node name changed, of course) works well.
I believe this machine is simply too big for the pinning infrastructure as it currently stands. A bug?
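To check whether binding actually took effect (for example on the smaller node, where the rankfile works), each process can read back its own CPU affinity mask. The following is a minimal sketch, assuming Linux with glibc (`sched_getaffinity` and the `CPU_*` macros are Linux-specific); the helper name `bound_cpus` is hypothetical and not part of Open MPI.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <stdio.h>

/* Hypothetical helper (Linux/glibc only): fill cpus[] with the CPU ids in
 * the calling process's affinity mask, in ascending order.
 * Returns the number of CPUs found (at most max), or -1 on error. */
int bound_cpus(int *cpus, int max)
{
    assert(cpus != NULL && max > 0);
    cpu_set_t set;
    CPU_ZERO(&set);
    /* pid 0 means "the calling process" */
    if (sched_getaffinity(0, sizeof set, &set) != 0)
        return -1;
    int n = 0;
    for (int cpu = 0; cpu < CPU_SETSIZE && n < max; cpu++)
        if (CPU_ISSET(cpu, &set))
            cpus[n++] = cpu;
    return n;
}
```

Calling this early in an MPI program and printing the result next to the rank number would show whether each rank ended up on a single core or on the whole machine. glibc's fixed-size `cpu_set_t` holds `CPU_SETSIZE` (1024) CPUs, which is enough for a 128-core box.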
Best wishes,
Paul Kapinos

P.S. See the attached .tgz for some logs.

------------------------------------------------------------------------------
Rankfiles

Rankfiles provide a means for specifying detailed information about how process ranks should be mapped to nodes and how they should be bound. Consider the following:
....
------------------------------------------------------------------------------
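For a box with 16 sockets of 8 cores each, writing a 128-line rankfile by hand is error-prone. The sketch below generates one, assuming the `rank N=host slot=socket:core` line format used in the Open MPI 1.5 rankfile documentation and a simple hypothetical layout (ranks fill socket 0 first, then socket 1, and so on). The host name `node1` and the function names are illustrative only.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Format one rankfile line that binds `rank` to
 * socket rank / cores_per_socket, core rank % cores_per_socket on `host`.
 * Returns the snprintf result (length that would have been written). */
int rankfile_line(char *buf, size_t n, int rank, const char *host,
                  int cores_per_socket)
{
    assert(host != NULL && strlen(host) > 0);
    assert(rank >= 0 && cores_per_socket > 0);
    int socket = rank / cores_per_socket;
    int core = rank % cores_per_socket;
    return snprintf(buf, n, "rank %d=%s slot=%d:%d\n",
                    rank, host, socket, core);
}

/* Write a complete rankfile with one rank per core, filling sockets in
 * order. Returns 0 on success, -1 on formatting or I/O failure. */
int write_rankfile(FILE *out, const char *host, int sockets,
                   int cores_per_socket)
{
    char line[256];
    for (int rank = 0; rank < sockets * cores_per_socket; rank++) {
        int len = rankfile_line(line, sizeof line, rank, host,
                                cores_per_socket);
        if (len < 0 || (size_t)len >= sizeof line)
            return -1; /* host name too long for the line buffer */
        if (fputs(line, out) == EOF)
            return -1;
    }
    return 0;
}
```

With `write_rankfile(stdout, "node1", 16, 8)` the output runs from `rank 0=node1 slot=0:0` to `rank 127=node1 slot=15:7`; the resulting file would then be passed to `mpirun` via `-rf`. Generating the file this way also makes it easy to shrink the same layout to fewer sockets and find the size at which binding starts to fail.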
Open RTE: 1.5.3
Open RTE SVN revision: r24532
Open RTE release date: Mar 16, 2011
OPAL: 1.5.3
OPAL SVN revision: r24532
OPAL release date: Mar 16, 2011
Ident string: 1.5.3
--
Dipl.-Inform. Paul Kapinos - High Performance Computing,
RWTH Aachen University, Center for Computing and Communication
Seffenter Weg 23, D 52074 Aachen (Germany)
Tel: +49 241/80-24915
rankfiles128.tgz
Description: application/compressed-tar
