Hello Ralph,
Yes, the rankfiles in rankfiles128.tgz are the ones we use, and the linuxbsc*.txt files contain the output produced.

It would surprise me if rankfile3 were incorrect: the very same files (except for the node name, of course), rankfile1 and rankfile2, worked on smaller machines; cf. runme.sh, the rankfile* files and the output files.

The behaviour "it works on a small box but does not work on a thick box" was the source of my assumption that there is an error somewhere.

For the complete error message on the thick node, see the linuxbsc269.txt file.

Updating to a newer 1.5.x is a good idea, but it is always a bit tedious... Will 1.5.5 arrive soon?

Best wishes,
Paul Kapinos


Ralph Castain wrote:
I don't see anything in the code that limits the number of procs in a rankfile.
Are the attached rankfiles the ones you are trying to use?
I'm wondering if there is a syntax error that is causing the problem. It would help if you could provide the complete error message output.

At one time, there was a limit on the number of procs on a node -
nothing to do with rankfile. That was fixed, though, and there
is no real limit any more. I don't recall the precise release number
where it changed in the 1.5 series - you might try updating to 1.5.4 as I'm sure it doesn't exist there.



On Jan 20, 2012, at 12:43 PM, Paul Kapinos wrote:

Hello, Open MPI developers!

Now we have a really nice toy: 2 TB RAM, 16 sockets, 128 cores.
(4 smaller Bull S6010 boxes coupled by BCS chips into a single-image machine)

On such a big box, process pinning is vital.

So we tried to use the Open MPI capabilities to pin the processes. But it seems that the
rankfile infrastructure does not work properly: we always get an "Error: Invalid
argument" message on the 128-core node, even if the rankfile is OK.
On a smaller node (up to 32 cores / 64 threads) the very same rankfile (with the
node name changed, of course) works well.

I believe this machine is currently a bit too big for the pinning
infrastructure. A bug?

Best wishes,

Paul Kapinos

P.S. See the attached .tgz for some logs.

------------------------------------------------------------------------------
  Rankfiles
      Rankfiles provide a means for specifying detailed information about how 
process ranks should  be  mapped  to nodes and how they should be bound.  
Consider the following:
....
------------------------------------------------------------------------------
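
For readers without the attachment, here is a minimal sketch of how a
rankfile for a node of this shape could be generated. It is not the
content of rankfiles128.tgz. It assumes the "rank N=host slot=socket:core"
line form described in the mpirun man page, 8 cores on each of the 16
sockets (128 / 16, assuming a uniform layout), and a hypothetical host
name "bignode"; the function name make_rankfile is likewise made up.

```python
def make_rankfile(host, sockets, cores_per_socket):
    """Return rankfile text that pins one rank to each core, socket by socket.

    Each line has the form "rank <N>=<host> slot=<socket>:<core>".
    The host name and the uniform socket layout are assumptions.
    """
    lines = []
    rank = 0
    for socket in range(sockets):
        for core in range(cores_per_socket):
            lines.append(f"rank {rank}={host} slot={socket}:{core}")
            rank += 1
    return "\n".join(lines) + "\n"
```

Calling make_rankfile("bignode", 16, 8) gives 128 lines, which could be
written to a file and passed to mpirun with the -rf option. Whether the
socket:core numbering matches the numbering the BCS machine actually
reports is an open assumption here.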
               Open RTE: 1.5.3
  Open RTE SVN revision: r24532
  Open RTE release date: Mar 16, 2011
                   OPAL: 1.5.3
      OPAL SVN revision: r24532
      OPAL release date: Mar 16, 2011
           Ident string: 1.5.3



--
Dipl.-Inform. Paul Kapinos   -   High Performance Computing,
RWTH Aachen University, Center for Computing and Communication
Seffenter Weg 23,  D 52074  Aachen (Germany)
Tel: +49 241/80-24915
<rankfiles128.tgz>
_______________________________________________
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users





