Hello Open MPI developers! We now have a really nice toy: 2 TB RAM, 16 sockets, 128 cores (four smaller Bull S6010 machines coupled by BCS chips into a single-image machine).
On such a big box, process pinning is vital, so we tried to use Open MPI's capabilities to pin the processes. But it seems that the rankfile infrastructure does not work properly: we always get an "Error: Invalid argument" message on the 128-core node, even when the rankfile is OK. On a smaller node (up to 32 cores / 64 threads) the very same rankfile (with the node name changed, of course) works well.
I believe this machine is a bit too big for the pinning infrastructure at the moment. A bug?
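For illustration, a rankfile of the kind described above (one rank pinned to each core) could be generated as follows. This is only a minimal sketch under assumptions: the host name `bignode` is hypothetical, the 128 cores are assumed to be spread evenly as 8 cores on each of the 16 sockets, and the `slot=<socket>:<core>` form follows the rankfile syntax documented in the mpirun man page. It is not the rankfile from the attached archive.

```python
def make_rankfile(hostname, sockets=16, cores_per_socket=8):
    """Return rankfile text that pins one rank per core, filling sockets in order.

    Assumption: cores are numbered 0..cores_per_socket-1 within each socket.
    """
    lines = []
    for rank in range(sockets * cores_per_socket):
        # Rank r lands on socket r // cores_per_socket, core r % cores_per_socket.
        socket, core = divmod(rank, cores_per_socket)
        lines.append(f"rank {rank}={hostname} slot={socket}:{core}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # "bignode" is a placeholder host name, not the real node.
    print(make_rankfile("bignode"), end="")
```

The output could be saved to a file and passed to mpirun with `-rf <file>`. Whether the socket and core numbers are interpreted as logical or physical IDs may depend on the Open MPI version, so the numbering should be checked against the actual topology.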
Best wishes,
Paul Kapinos

P.S. See the attached .tgz for some logs.

------------------------------------------------------------------------------
Rankfiles

Rankfiles provide a means for specifying detailed information about how process ranks should be mapped to nodes and how they should be bound. Consider the following:
....
------------------------------------------------------------------------------

Open RTE: 1.5.3
Open RTE SVN revision: r24532
Open RTE release date: Mar 16, 2011
OPAL: 1.5.3
OPAL SVN revision: r24532
OPAL release date: Mar 16, 2011
Ident string: 1.5.3

--
Dipl.-Inform. Paul Kapinos - High Performance Computing,
RWTH Aachen University, Center for Computing and Communication
Seffenter Weg 23, D 52074 Aachen (Germany)
Tel: +49 241/80-24915
rankfiles128.tgz
Description: application/compressed-tar