Hello, Open MPI developers!

We now have a really nice toy: 2 TB of RAM, 16 sockets, 128 cores.
(four smaller Bull S6010 machines coupled by BCS chips into a single-image machine)

On such a big box, process pinning is vital.

So we tried to use the Open MPI capabilities to pin the processes. But it seems that the rankfile infrastructure does not work properly: we always get an "Error: Invalid argument" message on the 128-core node, even when the rankfile is OK. On a smaller node (up to 32 cores / 64 threads) the very same rankfile (with the node name changed, of course) works well.
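
For illustration only, here is a minimal sketch of how a rankfile for a box like this could be generated. It assumes one rank per core, 16 sockets with 8 cores each (128 cores / 16 sockets), the "slot=socket:core" notation from the mpirun man page, and a hypothetical node name "bignode". The rankfiles we actually used are in the attachment and may differ from this layout.

```python
def make_rankfile(host, sockets, cores_per_socket):
    """Return rankfile text that pins one rank per core, socket by socket.

    host, sockets and cores_per_socket are illustrative parameters;
    "bignode" below is a hypothetical node name, not the real one.
    """
    lines = []
    rank = 0
    for socket in range(sockets):
        for core in range(cores_per_socket):
            # Rankfile line format: rank <N>=<host> slot=<socket>:<core>
            lines.append(f"rank {rank}={host} slot={socket}:{core}")
            rank += 1
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Assumed layout: 16 sockets x 8 cores = 128 ranks
    print(make_rankfile("bignode", 16, 8), end="")
```

The output can be saved to a file and passed to mpirun with the -rf (--rankfile) option, e.g. "mpirun -np 128 -rf rankfile ./a.out", where ./a.out is a placeholder for the application.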

I believe a machine of this size is a bit too big for the pinning infrastructure at the moment. A bug?

Best wishes,

Paul Kapinos

P.S. See the attached .tgz for some logs.

------------------------------------------------------------------------------
   Rankfiles
Rankfiles provide a means for specifying detailed information about how process ranks should be mapped to nodes and how they should be bound. Consider the following:
....
------------------------------------------------------------------------------
                Open RTE: 1.5.3
   Open RTE SVN revision: r24532
   Open RTE release date: Mar 16, 2011
                    OPAL: 1.5.3
       OPAL SVN revision: r24532
       OPAL release date: Mar 16, 2011
            Ident string: 1.5.3



--
Dipl.-Inform. Paul Kapinos   -   High Performance Computing,
RWTH Aachen University, Center for Computing and Communication
Seffenter Weg 23,  D 52074  Aachen (Germany)
Tel: +49 241/80-24915

Attachment: rankfiles128.tgz
Description: application/compressed-tar

Attachment: smime.p7s
Description: S/MIME Cryptographic Signature
