Hi,
 
I am trying to use a rankfile (rankmap.1) to bind a 4-process MPI job to one
socket of a two-socket, 8-core machine. However, I'm getting a strange error.
 
CMDS USED
orterun --hostfile hostlist.1 -n 4 --mca rmaps_rank_file_path ./rankmap.1 desres-netscan -o $OUTDIR
 
$ cat rankmap.1
rank 0=drdb0235.en slot=0:0
rank 1=drdb0235.en slot=0:1
rank 2=drdb0235.en slot=0:2
rank 3=drdb0235.en slot=0:3
 
$ cat hostlist.1
drdb0235.en slots=8

ERROR SEEN
--------------------------------------------------------------------------
Rankfile claimed host drdb0235.en that was not allocated or oversubscribed it's 
slots:
--------------------------------------------------------------------------
[drdb0235.en.desres.deshaw.com:14242] [[37407,0],0] ORTE_ERROR_LOG: Bad 
parameter in file rmaps_rank_file.c at line 108
[drdb0235.en.desres.deshaw.com:14242] [[37407,0],0] ORTE_ERROR_LOG: Bad 
parameter in file base/rmaps_base_map_job.c at line 87
[drdb0235.en.desres.deshaw.com:14242] [[37407,0],0] ORTE_ERROR_LOG: Bad 
parameter in file base/plm_base_launch_support.c at line 77
[drdb0235.en.desres.deshaw.com:14242] [[37407,0],0] ORTE_ERROR_LOG: Bad 
parameter in file plm_rsh_module.c at line 985 
 
From looking at the code in rmaps_rank_file.c, it seems the error occurs when
the node-gathering code wraps twice around the hostlist. However, I don't see
why that is happening.
 
If I specify 8 slots in the rankmap, I see a different error:
Error, invalid rank (4) in the rankfile (./rankmap.1)
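
For reference, here is a minimal Python sketch of the consistency checks I
would expect these two files to pass. The parsing is only an assumption based
on the file contents shown above, and the function names (parse_hostfile,
parse_rankfile, check) are hypothetical. It is not Open MPI's parser, and the
checks are a guess at what should hold, not necessarily what
rmaps_rank_file.c enforces.

```python
import re

# Assumed line format, taken from the rankmap.1 listing: "rank N=host slot=S:C"
RANK_LINE = re.compile(r"^rank\s+(\d+)\s*=\s*(\S+)\s+slot=(\S+)$")


def parse_hostfile(text):
    """Return {hostname: slots} from lines like 'drdb0235.en slots=8'."""
    hosts = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        slots = 1  # assumption: one slot when no 'slots=' field is given
        for field in fields[1:]:
            if field.startswith("slots="):
                slots = int(field.split("=", 1)[1])
        hosts[fields[0]] = slots
    return hosts


def parse_rankfile(text):
    """Return {rank: (hostname, slot_spec)} from the rankfile text."""
    ranks = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        match = RANK_LINE.match(line)
        if match is None:
            raise ValueError(f"unrecognized rankfile line: {line!r}")
        ranks[int(match.group(1))] = (match.group(2), match.group(3))
    return ranks


def check(hosts, ranks, nprocs):
    """Return a list of human-readable inconsistencies (empty if none found)."""
    problems = []
    for rank in range(nprocs):
        if rank not in ranks:
            problems.append(f"rank {rank} has no rankfile entry")
    used = {}
    for rank, (host, _slot) in sorted(ranks.items()):
        if rank >= nprocs:
            problems.append(f"rank {rank} is not below -n {nprocs}")
        if host not in hosts:
            problems.append(f"rank {rank}: host {host} is not in the hostfile")
        else:
            used[host] = used.get(host, 0) + 1
    for host, count in used.items():
        if count > hosts[host]:
            problems.append(f"{host}: {count} ranks but {hosts[host]} slots")
    return problems


if __name__ == "__main__":
    hostfile = "drdb0235.en slots=8\n"
    rankfile = "".join(f"rank {i}=drdb0235.en slot=0:{i}\n" for i in range(4))
    # The files from this message, checked against -n 4
    print(check(parse_hostfile(hostfile), parse_rankfile(rankfile), 4))  # prints []
```

By these checks the 4-rank files above are consistent with -n 4. A rankfile
with 8 entries would be flagged for ranks 4 through 7 under -n 4, which may or
may not be what triggers the "invalid rank (4)" message.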
 
Thanks,
Federico

         

