Hi,

I have a problem setting up the processes of a parallel job in a specified order. Suppose a job with 6 processes (rank 0 to rank 5) needs to run on 3 hosts (A, B, C) in the following order:

    Rank 0 -- A
    Rank 1 -- B
    Rank 2 -- B
    Rank 3 -- C
    Rank 4 -- A
    Rank 5 -- C

Specifying this order (ABBCAC) in a hostfile doesn't work, because Open MPI only supports the "byslot" (AABBCC) or "bynode" (ABCABC) ranking orders.
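For reference, this is roughly the hostfile setup I have been trying (the host names A, B, C, the slot counts, the file name "myhosts", and the application name "./my_app" are just placeholders for our actual setup); as far as I can tell, neither the default byslot mapping nor the bynode mapping can produce ABBCAC:

    # hostfile "myhosts" (sketch)
    A slots=2
    B slots=2
    C slots=2

    # byslot (default): ranks 0-5 land as A A B B C C
    mpirun -np 6 -hostfile myhosts ./my_app

    # bynode: ranks 0-5 land as A B C A B C
    mpirun -np 6 -hostfile myhosts -bynode ./my_app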
However, if I use a rankfile to implement this order, in the format

    rank 0=A slot=<slot setting>
    rank 1=B slot=<slot setting>
    rank 2=B slot=<slot setting>
    rank 3=C slot=<slot setting>
    rank 4=A slot=<slot setting>
    rank 5=C slot=<slot setting>

I run into another problem: how to determine the <slot setting> for each rank. If I bind each rank to all cores/CPUs on a node (e.g. rank 0=A slot=0-n, where n is the maximum CPU number), I get the following errors:

    *** An error occurred in MPI_comm_size
    *** on a NULL communicator
    *** Unknown error
    *** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
    forrtl: severe (174): SIGSEGV, segmentation fault occurred

If I don't select all cores, I need to identify which cores are available to my job in order to avoid CPU oversubscription, since the nodes are shared by multiple jobs.

Our system is an Intel-based cluster (12 or 16 cores per node) and the jobs are submitted through the LSF batch scheduler.

Here is my question: how can I implement a specified ordering of processes at the node level without binding at the core/CPU level? Any help and suggestions would be appreciated.

Thanks,
Chee
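P.S. For concreteness, here is a sketch of what I am currently trying. The rankfile below assumes 12-core nodes and binds each rank to all cores on its node; the file name "myrankfile" and the application name "./my_app" are placeholders, and I pass the rankfile to mpirun with the -rf option:

    # rankfile "myrankfile" (sketch, assuming 12 cores per node)
    rank 0=A slot=0-11
    rank 1=B slot=0-11
    rank 2=B slot=0-11
    rank 3=C slot=0-11
    rank 4=A slot=0-11
    rank 5=C slot=0-11

    # launch (sketch)
    mpirun -np 6 -rf myrankfile ./my_app

This is the setup that produces the MPI_comm_size / SIGSEGV errors quoted above.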