I'm using gridengine 6.2u5 and Open MPI 1.3.3.  I'm submitting a parallel
job and would like to use a rankfile to set processor binding, but I'm
getting errors.

The $PE_HOSTFILE generated by gridengine is:

amos.cora.nwra.com 4 cloud...@amos.cora.nwra.com UNDEFINED
andrew.cora.nwra.com 4 cloud...@andrew.cora.nwra.com UNDEFINED

The rankfile I'm using is:

rank 0=amos.cora.nwra.com slot=0
rank 1=andrew.cora.nwra.com slot=0
rank 2=amos.cora.nwra.com slot=4
rank 3=andrew.cora.nwra.com slot=4
rank 4=amos.cora.nwra.com slot=1
rank 5=andrew.cora.nwra.com slot=1
rank 6=amos.cora.nwra.com slot=5
rank 7=andrew.cora.nwra.com slot=5
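A quick way to rule out the simplest cause named in the error message is to check the rankfile against the allocation directly. Below is a minimal Python sketch, assuming both files look exactly like the ones above. The function names are hypothetical and are not part of Open MPI or gridengine. It confirms that every host named in the rankfile appears in $PE_HOSTFILE and is not given more ranks than the slots gridengine granted. The rankfile's slot= value is kept as an opaque string and is not validated. As I understand Open MPI's rankfile syntax, that value selects a processor for binding, not a gridengine slot.

```python
import re
from collections import Counter

# Matches the simple rankfile form used above: "rank N=host slot=S".
# The slot value is captured as an opaque string and never interpreted.
RANK_RE = re.compile(r"^rank\s+(\d+)\s*=\s*(\S+)\s+slot=(\S+)\s*$")


def parse_pe_hostfile(text):
    """Return {host: allocated_slots} from $PE_HOSTFILE contents."""
    alloc = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 2:
            # Column 1 is the host name, column 2 the slot count granted.
            alloc[fields[0]] = alloc.get(fields[0], 0) + int(fields[1])
    return alloc


def parse_rankfile(text):
    """Return a list of (rank, host, slot) tuples from rankfile contents."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        m = RANK_RE.match(line)
        if m is None:
            raise ValueError(f"unrecognized rankfile line: {line!r}")
        entries.append((int(m.group(1)), m.group(2), m.group(3)))
    return entries


def check(pe_text, rank_text):
    """List hosts that are not allocated or get more ranks than slots."""
    alloc = parse_pe_hostfile(pe_text)
    per_host = Counter(host for _, host, _ in parse_rankfile(rank_text))
    problems = []
    for host, nranks in sorted(per_host.items()):
        if host not in alloc:
            problems.append(f"{host}: not in $PE_HOSTFILE")
        elif nranks > alloc[host]:
            problems.append(f"{host}: {nranks} ranks > {alloc[host]} slots")
    return problems


# The two files exactly as quoted in this message.
PE_TEXT = """\
amos.cora.nwra.com 4 cloud...@amos.cora.nwra.com UNDEFINED
andrew.cora.nwra.com 4 cloud...@andrew.cora.nwra.com UNDEFINED
"""

RANK_TEXT = """\
rank 0=amos.cora.nwra.com slot=0
rank 1=andrew.cora.nwra.com slot=0
rank 2=amos.cora.nwra.com slot=4
rank 3=andrew.cora.nwra.com slot=4
rank 4=amos.cora.nwra.com slot=1
rank 5=andrew.cora.nwra.com slot=1
rank 6=amos.cora.nwra.com slot=5
rank 7=andrew.cora.nwra.com slot=5
"""

if __name__ == "__main__":
    print(check(PE_TEXT, RANK_TEXT))  # → []
```

For the two files above, the check reports nothing: each host gets 4 ranks and has 4 slots. By this simple test, the host names and per-host rank counts agree with the allocation.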

The error I'm getting is:

Rankfile claimed host amos.cora.nwra.com that was not allocated or oversubscribed it's slots:

--------------------------------------------------------------------------
[amos:05727] [[44126,0],0] ORTE_ERROR_LOG: Bad parameter in file rmaps_rank_file.c at line 108
[amos:05727] [[44126,0],0] ORTE_ERROR_LOG: Bad parameter in file base/rmaps_base_map_job.c at line 87
[amos:05727] [[44126,0],0] ORTE_ERROR_LOG: Bad parameter in file base/plm_base_launch_support.c at line 77
[amos:05727] [[44126,0],0] ORTE_ERROR_LOG: Bad parameter in file plm_rsh_module.c at line 990
--------------------------------------------------------------------------
A daemon (pid unknown) died unexpectedly on signal 1  while attempting to launch so we are aborting.

Any ideas?

Thanks!

- Orion

-- 
Orion Poplawski
Technical Manager                     303-415-9701 x222
NWRA/CoRA Division                    FAX: 303-415-9702
3380 Mitchell Lane                  or...@cora.nwra.com
Boulder, CO 80301              http://www.cora.nwra.com
