Hi Lenny

Thanks - using the full names makes it work!
Is there a reason why the rankfile option treats
host names differently than the hostfile option?
Thanks
  Jody

On Mon, Aug 17, 2009 at 11:20 AM, Lenny Verkhovsky <lenny.verkhov...@gmail.com> wrote:
> Hi
> This message means
> that you are trying to use host "plankton", that was not allocated via
> hostfile or hostlist.
> But according to the files and command line, everything seems fine.
> Can you try using "plankton.uzh.ch" hostname instead of "plankton".
> thanks
> Lenny.
> On Mon, Aug 17, 2009 at 10:36 AM, jody <jody....@gmail.com> wrote:
>>
>> Hi
>>
>> When i use a rankfile, i get an error message which i don't understand:
>>
>> [jody@plankton tests]$ mpirun -np 3 -rf rankfile -hostfile testhosts ./HelloMPI
>> --------------------------------------------------------------------------
>> Rankfile claimed host plankton that was not allocated or
>> oversubscribed it's slots:
>>
>> --------------------------------------------------------------------------
>> [plankton.uzh.ch:24327] [[44857,0],0] ORTE_ERROR_LOG: Bad parameter in file rmaps_rank_file.c at line 108
>> [plankton.uzh.ch:24327] [[44857,0],0] ORTE_ERROR_LOG: Bad parameter in file base/rmaps_base_map_job.c at line 87
>> [plankton.uzh.ch:24327] [[44857,0],0] ORTE_ERROR_LOG: Bad parameter in file base/plm_base_launch_support.c at line 77
>> [plankton.uzh.ch:24327] [[44857,0],0] ORTE_ERROR_LOG: Bad parameter in file plm_rsh_module.c at line 990
>> --------------------------------------------------------------------------
>> A daemon (pid unknown) died unexpectedly on signal 1 while attempting to
>> launch so we are aborting.
>>
>> There may be more information reported by the environment (see above).
>>
>> This may be because the daemon was unable to find all the needed shared
>> libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
>> location of the shared libraries on the remote nodes and this will
>> automatically be forwarded to the remote nodes.
>> --------------------------------------------------------------------------
>> --------------------------------------------------------------------------
>> mpirun noticed that the job aborted, but has no info as to the process
>> that caused that situation.
>> --------------------------------------------------------------------------
>> mpirun: clean termination accomplished
>>
>>
>>
>> With out the '-rf rankfile' option everything works as expected.
>>
>> My hostfile :
>> [jody@plankton tests]$ cat testhosts
>> # The following node is a quad-processor machine, and we absolutely
>> # want to disallow over-subscribing it:
>> plankton slots=3 max-slots=3
>> # The following nodes are dual-processor machines:
>> nano_00 slots=2 max-slots=2
>> nano_01 slots=2 max-slots=2
>> nano_02 slots=2 max-slots=2
>> nano_03 slots=2 max-slots=2
>> nano_04 slots=2 max-slots=2
>> nano_05 slots=2 max-slots=2
>> nano_06 slots=2 max-slots=2
>>
>> my rank file:
>> [jody@plankton neander]$ cat rankfile
>> rank 0=nano_00 slot=1
>> rank 1=plankton slot=0
>> rank 2=nano_01 slot=1
>>
>> my Open MPI version: 1.3.2
>>
>> i get the same error if i use a rankfile which has a single line
>> rank 0=plankton slot=0
>> (plankton is my local machine) and call mpirun with np 1
>>
>> What does the "Rankfile claimed..." message mean?
>> Did i make an error in my rankfile?
>> If yes, what would be the correct way to write it?
>>
>> Thank You
>> Jody
>> _______________________________________________
>> users mailing list
>> us...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
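
For anyone hitting the same "Rankfile claimed host ..." error: the fix reported in this thread was to write the full host name (plankton.uzh.ch) instead of the short one (plankton). Below is a minimal Python sketch of that rewrite applied to a rankfile. The function name use_full_hostnames and its mapping argument are hypothetical helpers made up for illustration and are not part of Open MPI. Only plankton.uzh.ch comes from the thread; the full names of the nano_* hosts are not given, so those entries are left unchanged.

```python
def use_full_hostnames(rankfile_text, full_names):
    """Return rankfile text with short host names replaced by full names.

    full_names maps a short host name to the name that should be written
    instead (hypothetical helper; hosts without an entry stay as they are).
    """
    out = []
    for line in rankfile_text.splitlines():
        stripped = line.strip()
        # Only touch lines of the form "rank N=host slot=...".
        if stripped.startswith("rank ") and "=" in stripped:
            left, right = stripped.split("=", 1)   # "rank 1", "plankton slot=0"
            host, _, rest = right.partition(" ")   # "plankton", "slot=0"
            host = full_names.get(host, host)      # swap in the full name if known
            out.append(f"{left}={host} {rest}".rstrip())
        else:
            # Comments, blank lines, and anything else pass through untouched.
            out.append(line)
    return "\n".join(out)


if __name__ == "__main__":
    rankfile = (
        "rank 0=nano_00 slot=1\n"
        "rank 1=plankton slot=0\n"
        "rank 2=nano_01 slot=1"
    )
    # Only the mapping for plankton is known from the thread.
    print(use_full_hostnames(rankfile, {"plankton": "plankton.uzh.ch"}))
```

The rewritten text can then be saved and passed to mpirun with the same -rf option used above. Whether the hostfile entries also need the full names is not stated in the thread.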