It works!! Thanks a lot for the help!!

----------------------------------------------------------------------------------------------

Just add "-mca rmaps seq" to your command line, then. The mapper will take your 
hostfile (no rankfile) and map each proc sequentially to the listed nodes. You 
need to list each node once for each proc - something like this:

nodeA
nodeB
nodeB
nodeC
nodeA
nodeC
...

would produce the pattern you described.
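
In case it helps, here is a minimal sketch of generating such a hostfile from a rank-to-node table and assembling the matching mpirun command line. The helper names, the file name seq_hosts.txt, and the executable ./my_app are placeholders invented for illustration; only "-mca rmaps seq" comes from the advice above, and the exact option may differ with your OMPI version.

```python
# Sketch only: helper names, "seq_hosts.txt", and "./my_app" are hypothetical.

def seq_hostfile_lines(rank_to_host):
    """Return one hostfile line per rank, ordered by rank number."""
    return [rank_to_host[rank] for rank in sorted(rank_to_host)]


def mpirun_command(hostfile_path, num_procs, program):
    """Build an mpirun argument list that selects the sequential mapper."""
    return [
        "mpirun",
        "-np", str(num_procs),
        "-hostfile", hostfile_path,
        "-mca", "rmaps", "seq",  # option as given in this thread
        program,
    ]


# Desired placement from the original question: ranks 0-5 on A, B, B, C, A, C.
placement = {
    0: "nodeA",
    1: "nodeB",
    2: "nodeB",
    3: "nodeC",
    4: "nodeA",
    5: "nodeC",
}

lines = seq_hostfile_lines(placement)
hostfile_text = "\n".join(lines) + "\n"
command = mpirun_command("seq_hosts.txt", len(lines), "./my_app")

if __name__ == "__main__":
    print(hostfile_text, end="")
    print(" ".join(command))
```

Writing hostfile_text to the file named in the command and then running that command should give one proc per listed line, in order, assuming the seq mapper behaves as described above.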

On Apr 10, 2014, at 7:25 AM, Gan, Qi PW <Qi.Gan2_at_[hidden]> wrote:

> We have OMPI 1.4.0, 1.4.5 and 1.6.5 installed on our system.
> >>What version of OMPI are you using? We have a "seq" mapper that does what 
> >>you want, but the precise cmd line option for directing to use it depends a 
> >>bit on the version.
>
> >>On Apr 9, 2014, at 9:22 AM, Gan, Qi PW <Qi.Gan2_at_[hidden]> wrote:
>
> > Hi,
> >
> > I have a problem placing the processes of a parallel job in a 
> > specified order. Suppose a job with 6 processes (rank0 to rank5) needs to 
> > run on 3 hosts (A, B, C) in the following order:
> > Rank0 -- A
> > Rank1 -- B
> > Rank2 -- B
> > Rank3 -- C
> > Rank4 -- A
> > Rank5 -- C
> > Specifying this order (ABBCAC) in hostfile doesn't work because Open MPI 
> > only supports "byslot" (AABBCC) or "bynode" (ABCABC) ranking orders.
> >
> > However, if I use a rankfile to implement this order in the format of
> > rank 0=A slot=<slot setting>
> > rank 1=B slot=<slot setting>
> > rank 2=B slot=<slot setting>
> > rank 3=C slot=<slot setting>
> > rank 4=A slot=<slot setting>
> > rank 5=C slot=<slot setting>
> > I run into another problem on how to determine the <slot setting> for each 
> > rank. If I bind each rank to all cores/CPUs on a node (e.g. rank 0=A 
> > slot=0-n, where n is the maximal CPU number), I run into the following 
> > errors:
> >
> > *** An error occurred in MPI_comm_size
> > *** on a NULL communicator
> > *** Unknown error
> > *** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
> > forrtl: severe (174): SIGSEGV, segmentation fault occurred
> >
> > If I don't select all cores, I need to identify which cores are available 
> > to my job in order to avoid CPU oversubscribing since the nodes are shared 
> > by multiple jobs.
> >
> > Our system is an Intel-based cluster (12 or 16 cores per node), and jobs 
> > are submitted through the LSF batch system.
> >
> > Here is my question: how can I implement a specified order of processes at 
> > the node level without binding at the core/CPU level?
> >
> > Any help and suggestions would be appreciated.
> >
> > Thanks,
> > Chee
> >
> >
> > _______________________________________________
> > users mailing list
> > users_at_[hidden]
> > http://www.open-mpi.org/mailman/listinfo.cgi/users
