Your PE_HOSTFILE shows a four-slot allocation on each host, so the valid slot
numbers on each host are 0-3. Your rankfile assigns ranks 2, 3, 6, and 7 to
slot=4 and slot=5, which are outside your allocation.
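
If it helps, here is a minimal sketch of checking a rankfile against the allocation before submitting the job. It is a standalone Python script, not part of Open MPI or gridengine, and the function names (parse_pe_hostfile, out_of_range_ranks) are made up for illustration. It assumes the first two fields of each PE_HOSTFILE line are the hostname and the slot count, that slots on a host are numbered from 0 as described above, and that every rankfile line uses the plain "rank N=host slot=M" form from your message; lines in any other form are skipped rather than checked.

```python
import re

# Sample data copied from the message quoted below.
PE_HOSTFILE = """\
amos.cora.nwra.com 4 cloud...@amos.cora.nwra.com UNDEFINED
andrew.cora.nwra.com 4 cloud...@andrew.cora.nwra.com UNDEFINED
"""

RANKFILE = """\
rank 0=amos.cora.nwra.com slot=0
rank 1=andrew.cora.nwra.com slot=0
rank 2=amos.cora.nwra.com slot=4
rank 3=andrew.cora.nwra.com slot=4
rank 4=amos.cora.nwra.com slot=1
rank 5=andrew.cora.nwra.com slot=1
rank 6=amos.cora.nwra.com slot=5
rank 7=andrew.cora.nwra.com slot=5
"""

# Matches only the simple "rank N=host slot=M" form (assumption: M is a
# single non-negative integer).
RANK_LINE = re.compile(r"rank\s+(\d+)\s*=\s*(\S+)\s+slot\s*=\s*(\d+)\s*$")


def parse_pe_hostfile(text):
    """Return {hostname: slot_count} from PE_HOSTFILE-style lines."""
    slots = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 2:
            slots[fields[0]] = int(fields[1])
    return slots


def out_of_range_ranks(rankfile_text, slots):
    """Return (rank, host, slot) for entries outside the allocation.

    An entry is flagged when its host is not in the allocation or its
    slot number is not in 0 .. slot_count - 1 for that host.
    """
    bad = []
    for line in rankfile_text.splitlines():
        match = RANK_LINE.match(line.strip())
        if match is None:
            continue  # not the simple form; skipped, not validated
        rank = int(match.group(1))
        host = match.group(2)
        slot = int(match.group(3))
        if host not in slots or slot >= slots[host]:
            bad.append((rank, host, slot))
    return bad


if __name__ == "__main__":
    allocation = parse_pe_hostfile(PE_HOSTFILE)
    for rank, host, slot in out_of_range_ranks(RANKFILE, allocation):
        print(f"rank {rank}: slot={slot} is not allocated on {host}")
```

Run against the files from your message, this flags ranks 2, 3, 6, and 7, the entries that use slot=4 and slot=5.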


On Apr 23, 2010, at 10:36 AM, Orion Poplawski wrote:

> I'm using gridengine 6.2u5 and openmpi 1.3.3.  I'm submitting a parallel
> job and would like to specify a rankfile to set processor binding but am
> getting errors.
> 
> The $PE_HOSTFILE generated by gridengine is:
> 
> amos.cora.nwra.com 4 cloud...@amos.cora.nwra.com UNDEFINED
> andrew.cora.nwra.com 4 cloud...@andrew.cora.nwra.com UNDEFINED
> 
> The rankfile I'm using is:
> 
> rank 0=amos.cora.nwra.com slot=0
> rank 1=andrew.cora.nwra.com slot=0
> rank 2=amos.cora.nwra.com slot=4
> rank 3=andrew.cora.nwra.com slot=4
> rank 4=amos.cora.nwra.com slot=1
> rank 5=andrew.cora.nwra.com slot=1
> rank 6=amos.cora.nwra.com slot=5
> rank 7=andrew.cora.nwra.com slot=5
> 
> The error I'm getting is:
> 
> Rankfile claimed host amos.cora.nwra.com that was not allocated or
> oversubscribed it's slots:
> 
> --------------------------------------------------------------------------
> [amos:05727] [[44126,0],0] ORTE_ERROR_LOG: Bad parameter in file
> rmaps_rank_file.c at line 108
> [amos:05727] [[44126,0],0] ORTE_ERROR_LOG: Bad parameter in file
> base/rmaps_base_map_job.c at line 87
> [amos:05727] [[44126,0],0] ORTE_ERROR_LOG: Bad parameter in file
> base/plm_base_launch_support.c at line 77
> [amos:05727] [[44126,0],0] ORTE_ERROR_LOG: Bad parameter in file
> plm_rsh_module.c at line 990
> --------------------------------------------------------------------------
> A daemon (pid unknown) died unexpectedly on signal 1  while attempting to
> launch so we are aborting.
> 
> Any ideas?
> 
> Thanks!
> 
> - Orion
> 
> -- 
> Orion Poplawski
> Technical Manager                     303-415-9701 x222
> NWRA/CoRA Division                    FAX: 303-415-9702
> 3380 Mitchell Lane                  or...@cora.nwra.com
> Boulder, CO 80301              http://www.cora.nwra.com