Hi Lenny

Thanks - using the full names makes it work!
Is there a reason why the rankfile option treats
host names differently than the hostfile option?
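For reference, here is the failing single-line test case from the original message, rewritten so the local machine uses its fully qualified name, as Lenny suggested. Only the plankton entry is shown because the full names of the nano_* nodes do not appear in this thread. The snippet only writes the rankfile. The mpirun line stays commented out because it needs an Open MPI installation.

```shell
# Recreate the single-line rankfile, naming the local machine by its
# fully qualified name instead of the short name "plankton".
cat > rankfile <<'EOF'
rank  0=plankton.uzh.ch  slot=0
EOF

# Then launch as before (requires Open MPI, so commented out here):
# mpirun -np 1 -rf rankfile -hostfile testhosts ./HelloMPI
```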

Thanks
  Jody
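Until the reason for the difference is clear, a small pre-flight check can catch this mismatch before launch. The sketch below is hypothetical: check_rankfile is not part of Open MPI. Its only basis is the log prefix [plankton.uzh.ch:24327], which suggests that mpirun knows the local node by its fully qualified name. The check warns when a rankfile line names that node by its short name instead.

```shell
#!/bin/sh
# Hypothetical helper, not part of Open MPI: warn when a rankfile names the
# local node by its short name while mpirun may know it by its FQDN.
# Usage: check_rankfile RANKFILE FQDN
check_rankfile() {
    rf="$1"
    fqdn="$2"
    short="${fqdn%%.*}"   # "plankton.uzh.ch" -> "plankton"

    # Print the host part of each "rank N=host slot=..." line.
    sed -n 's/^[[:space:]]*rank[[:space:]]*[0-9][0-9]*=\([^[:space:]]*\).*/\1/p' "$rf" |
    while read -r host; do
        # Only warn when the short and full names actually differ.
        if [ "$host" = "$short" ] && [ "$short" != "$fqdn" ]; then
            echo "warning: rankfile uses '$host'; the local node may be registered as '$fqdn'"
        fi
    done
}
```

A typical call would be `check_rankfile rankfile "$(hostname -f)"`. On many Linux systems `hostname -f` prints the fully qualified name, but the result depends on the resolver configuration.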



On Mon, Aug 17, 2009 at 11:20 AM, Lenny Verkhovsky
<lenny.verkhov...@gmail.com> wrote:
> Hi
> This message means that you are trying to use host "plankton", which was
> not allocated via the hostfile or a host list.
> But according to your files and command line, everything seems fine.
> Can you try using the hostname "plankton.uzh.ch" instead of "plankton"?
> thanks
> Lenny.
> On Mon, Aug 17, 2009 at 10:36 AM, jody <jody....@gmail.com> wrote:
>>
>> Hi
>>
>> When I use a rankfile, I get an error message that I don't understand:
>>
>> [jody@plankton tests]$ mpirun -np 3 -rf rankfile -hostfile testhosts ./HelloMPI
>> --------------------------------------------------------------------------
>> Rankfile claimed host plankton that was not allocated or
>> oversubscribed it's slots:
>>
>> --------------------------------------------------------------------------
>> [plankton.uzh.ch:24327] [[44857,0],0] ORTE_ERROR_LOG: Bad parameter in file rmaps_rank_file.c at line 108
>> [plankton.uzh.ch:24327] [[44857,0],0] ORTE_ERROR_LOG: Bad parameter in file base/rmaps_base_map_job.c at line 87
>> [plankton.uzh.ch:24327] [[44857,0],0] ORTE_ERROR_LOG: Bad parameter in file base/plm_base_launch_support.c at line 77
>> [plankton.uzh.ch:24327] [[44857,0],0] ORTE_ERROR_LOG: Bad parameter in file plm_rsh_module.c at line 990
>> --------------------------------------------------------------------------
>> A daemon (pid unknown) died unexpectedly on signal 1  while attempting to
>> launch so we are aborting.
>>
>> There may be more information reported by the environment (see above).
>>
>> This may be because the daemon was unable to find all the needed shared
>> libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
>> location of the shared libraries on the remote nodes and this will
>> automatically be forwarded to the remote nodes.
>> --------------------------------------------------------------------------
>> --------------------------------------------------------------------------
>> mpirun noticed that the job aborted, but has no info as to the process
>> that caused that situation.
>> --------------------------------------------------------------------------
>> mpirun: clean termination accomplished
>>
>>
>>
>> Without the '-rf rankfile' option, everything works as expected.
>>
>> My hostfile:
>> [jody@plankton tests]$ cat testhosts
>> # The following node is a quad-processor machine, and we absolutely
>> # want to disallow over-subscribing it:
>> plankton slots=3  max-slots=3
>> # The following nodes are dual-processor machines:
>> nano_00  slots=2 max-slots=2
>> nano_01  slots=2 max-slots=2
>> nano_02  slots=2 max-slots=2
>> nano_03  slots=2 max-slots=2
>> nano_04  slots=2 max-slots=2
>> nano_05  slots=2 max-slots=2
>> nano_06  slots=2 max-slots=2
>>
>> My rankfile:
>> [jody@plankton neander]$ cat rankfile
>> rank  0=nano_00  slot=1
>> rank  1=plankton slot=0
>> rank  2=nano_01  slot=1
>>
>> My Open MPI version: 1.3.2
>>
>> I get the same error if I use a rankfile with the single line
>>  rank  0=plankton  slot=0
>> (plankton is my local machine) and call mpirun with -np 1.
>>
>> What does the "Rankfile claimed..." message mean?
>> Did I make an error in my rankfile?
>> If so, what would be the correct way to write it?
>>
>> Thank You
>>  Jody
>> _______________________________________________
>> users mailing list
>> us...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>
>
>
