We have OMPI 1.4.0, 1.4.5 and 1.6.5 installed on our system.

>>What version of OMPI are you using? We have a "seq" mapper that does what you 
>>want, but the precise command-line option for selecting it depends a bit 
>>on the version.

>>On Apr 9, 2014, at 9:22 AM, Gan, Qi PW <Qi.Gan2_at_[hidden]> wrote:

> Hi,
>
> I have a problem placing the processes of a parallel job in a specified 
> order. Suppose a job with 6 processes (rank0 to rank5) needs to run on 3 
> hosts (A, B, C) in the following order:
> Rank0 -- A
> Rank1 -- B
> Rank2 -- B
> Rank3 -- C
> Rank4 -- A
> Rank5 -- C
> Specifying this order (ABBCAC) in a hostfile doesn't work because Open MPI only 
> supports "byslot" (AABBCC) or "bynode" (ABCABC) ranking orders.
>
> However, if I use a rankfile to implement this order in the format
> rank 0=A slot=<slot setting>
> rank 1=B slot=<slot setting>
> rank 2=B slot=<slot setting>
> rank 3=C slot=<slot setting>
> rank 4=A slot=<slot setting>
> rank 5=C slot=<slot setting>
> I run into another problem: how to determine the <slot setting> for each 
> rank. If I bind each rank to all cores/CPUs on a node (e.g. rank 0=A 
> slot=0-n, where n is the highest CPU number), I get the following errors:
>
> *** An error occurred in MPI_comm_size
> *** on a NULL communicator
> *** Unknown error
> *** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
> forrtl: severe (174): SIGSEGV, segmentation fault occurred
>
> If I don't select all cores, I need to identify which cores are available to 
> my job in order to avoid oversubscribing CPUs, since the nodes are shared by 
> multiple jobs.
>
> Our system is an Intel-based cluster (12 or 16 cores per node) and the job 
> is submitted through the LSF batch system.
>
> Here is my question: how can I implement a specified order of processes at the 
> node level without binding at the core/CPU level?
>
> Any help and suggestions would be appreciated.
>
> Thanks,
> Chee
>
>
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users
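
For illustration only, here is a minimal Python sketch of the orderings discussed in the quoted message. It shows why neither "byslot" nor "bynode" produces ABBCAC, and how rankfile lines in the format quoted above could be generated from an arbitrary order. The function names are hypothetical and are not part of Open MPI; the slot value is passed through untouched, since choosing it is exactly the open question in this thread.

```python
# Hypothetical helpers (not part of Open MPI) that model the orderings
# described in the quoted message.

def byslot_order(hosts, slots_per_host):
    """Fill every slot on one host before moving to the next host."""
    return [h for h in hosts for _ in range(slots_per_host)]

def bynode_order(hosts, slots_per_host):
    """Place one rank per host, cycling round-robin over the hosts."""
    return [h for _ in range(slots_per_host) for h in hosts]

def rankfile_lines(order, slot_setting):
    """Emit one 'rank N=host slot=...' line per rank, in the given order.

    slot_setting is copied verbatim; this sketch does not try to work out
    which cores are free on a shared node.
    """
    return [f"rank {rank}={host} slot={slot_setting}"
            for rank, host in enumerate(order)]

if __name__ == "__main__":
    hosts = ["A", "B", "C"]
    wanted = list("ABBCAC")

    print("".join(byslot_order(hosts, 2)))   # AABBCC
    print("".join(bynode_order(hosts, 2)))   # ABCABC
    print("\n".join(rankfile_lines(wanted, "<slot setting>")))
```

The last call prints six lines numbered rank 0 through rank 5, one per entry of the requested order.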
