Hello Ralph,

Yes, the rankfiles in rankfiles128.tgz are the ones being used, and the linuxbsc*.txt files contain the output produced.

It would surprise me if rankfile3 were incorrect: the very same files (except for the node name, of course), rankfile1 and rankfile2, worked on smaller machines; cf. runme.sh, the rankfile* files and the output files.

The behaviour "it works on the small box but does not work on the thick box" was the source of my assumption that there is an error somewhere.

For the complete error message on the thick node, see the linuxbsc269.txt file.

Updating to a newer 1.5.x is a good idea, but it is always a bit tedious... Will 1.5.5 arrive soon?
Best wishes,
Paul Kapinos

Ralph Castain wrote:
> I don't see anything in the code that limits the number of procs in a rankfile.
>
> Are the attached rankfiles the ones you are trying to use? I'm wondering if there is a syntax error that is causing the problem. It would help if you could provide the complete error message output.
>
> At one time, there was a limit on the number of procs on a node - nothing to do with rankfile. That was fixed, though, and there is no real limit any more. I don't recall the precise release number where it changed in the 1.5 series - you might try updating to 1.5.4, as I'm sure it doesn't exist there.
> On Jan 20, 2012, at 12:43 PM, Paul Kapinos wrote:
>
>> Hello, Open MPI developer!
>>
>> Now, we have a really nice toy: 2 TB RAM, 16 sockets, 128 cores (4 smaller Bull S6010 boxes coupled by BCS chips into a single-image machine).
>>
>> On such a big box, process pinning is vital. So we tried to use the Open MPI capabilities to pin the processes. But it seems that the rankfile infrastructure does not work properly: we always get an "Error: Invalid argument" message on the 128-core node, even if the rankfile is OK. On a smaller node (up to 32 cores / 64 threads) the very same rankfile (with the node name changed, of course) works well.
>>
>> I believe a machine of this size is a bit too big for the pinning infrastructure now. A bug?
>>
>> Best wishes,
>> Paul Kapinos
>>
>> P.S. See the attached .tgz for some logzz
>>
>> ------------------------------------------------------------------------------
>> Rankfiles
>> Rankfiles provide a means for specifying detailed information about how process ranks should be mapped to nodes and how they should be bound. Consider the following:
>> ....
>> ------------------------------------------------------------------------------
>> Open RTE: 1.5.3
>> Open RTE SVN revision: r24532
>> Open RTE release date: Mar 16, 2011
>> OPAL: 1.5.3
>> OPAL SVN revision: r24532
>> OPAL release date: Mar 16, 2011
>> Ident string: 1.5.3
>>
>> --
>> Dipl.-Inform. Paul Kapinos - High Performance Computing,
>> RWTH Aachen University, Center for Computing and Communication
>> Seffenter Weg 23, D 52074 Aachen (Germany)
>> Tel: +49 241/80-24915
>>
>> <rankfiles128.tgz>
>> _______________________________________________
>> users mailing list
>> us...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
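
A rankfile for a box of this shape is easier to generate than to write by hand. The following is only a minimal sketch, not the contents of rankfiles128.tgz: it assumes the `rank N=host slot=socket:core` form of rankfile line, assumes the 128 cores are spread evenly over the 16 sockets (8 per socket), and uses the hypothetical host name `node0` and the hypothetical helper name `make_rankfile`.

```python
def make_rankfile(host, sockets, cores_per_socket):
    """Return rankfile text pinning one rank per core, socket by socket.

    Assumption: each line has the form 'rank N=host slot=socket:core'.
    The host name and the even core distribution are illustrative only.
    """
    lines = []
    rank = 0
    for socket in range(sockets):
        for core in range(cores_per_socket):
            lines.append(f"rank {rank}={host} slot={socket}:{core}")
            rank += 1
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # 16 sockets x 8 cores = 128 ranks; 'node0' is a placeholder host name.
    print(make_rankfile("node0", 16, 8), end="")
```

Generating the same file for a small and a large node this way changes only the host name and the socket/core counts, which makes it easier to rule out a syntax error in the rankfile itself.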
--
Dipl.-Inform. Paul Kapinos - High Performance Computing,
RWTH Aachen University, Center for Computing and Communication
Seffenter Weg 23, D 52074 Aachen (Germany)
Tel: +49 241/80-24915