Hello,

Hopefully the information below will be helpful.
SLURM Version: 1.3.15

node64-test ~>salloc -n3
salloc: Granted job allocation 826

node64-test ~>srun hostname
node64-24.xxxx.xxxx.xxxx.xxxx
node64-25.xxxx.xxxx.xxxx.xxxx
node64-24.xxxx.xxxx.xxxx.xxxx

node64-test ~>printenv | grep SLURM
SLURM_NODELIST=node64-[24-25]
SLURM_NNODES=2
SLURM_JOBID=826
SLURM_TASKS_PER_NODE=2,1
SLURM_JOB_ID=826
SLURM_NPROCS=3
SLURM_JOB_NODELIST=node64-[24-25]
SLURM_JOB_CPUS_PER_NODE=2(x2)
SLURM_JOB_NUM_NODES=2

node64-test ~>mpirun --display-allocation hostname

======================   ALLOCATED NODES   ======================
 Data for node: Name: node64-test.xxxx.xxxx.xxxx.xxxx   Num slots: 0   Max slots: 0
 Data for node: Name: node64-24   Num slots: 2   Max slots: 0
 Data for node: Name: node64-25   Num slots: 2   Max slots: 0
=================================================================
node64-24.xxxx.xxxx.xxxx.xxxx
node64-24.xxxx.xxxx.xxxx.xxxx
node64-25.xxxx.xxxx.xxxx.xxxx
node64-25.xxxx.xxxx.xxxx.xxxx

Thanks,
Matt

> Haven't seen that before on any of our machines.
>
> Could you do "printenv | grep SLURM" after the salloc and send the
> results?
>
> What version of SLURM is this?
>
> Please run "mpirun --display-allocation hostname" and send the results.
>
> Thanks
> Ralph
>
> On Mon, Aug 24, 2009 at 11:30 AM, <matthew.pi...@ndsu.edu> wrote:
>
>> Hello,
>>
>> I seem to have run into an interesting problem with openMPI. After
>> allocating 3 processors and confirming that the 3 processors are
>> allocated, mpirun on a simple mpitest program seems to run on 4
>> processors. We have 2 processors per node. I can repeat this case with
>> any odd number of processors; openMPI seems to take any remaining
>> processors on the box. We are running openMPI v1.3.3. Here is an example
>> of what happens:
>>
>> node64-test ~>salloc -n3
>> salloc: Granted job allocation 825
>>
>> node64-test ~>srun hostname
>> node64-28.xxxx.xxxx.xxxx.xxxx
>> node64-28.xxxx.xxxx.xxxx.xxxx
>> node64-29.xxxx.xxxx.xxxx.xxxx
>>
>> node64-test ~>MX_RCACHE=0 LD_LIBRARY_PATH="/hurd/mpi/openmpi/lib:/usr/local/mx/lib" mpirun mpi_pgms/mpitest
>> MPI domain size: 4
>> I am rank 000 - node64-28.xxxx.xxxx.xxxx.xxxx
>> I am rank 003 - node64-29.xxxx.xxxx.xxxx.xxxx
>> I am rank 001 - node64-28.xxxx.xxxx.xxxx.xxxx
>> I am rank 002 - node64-29.xxxx.xxxx.xxxx.xxxx
>>
>> For those who may be curious, here is the program:
>>
>> #include <stdio.h>
>> #include <stdlib.h>
>> #include <mpi.h>
>>
>> int main(int argc, char *argv[])
>> {
>>     int rank, size, namelen;
>>     static char processor_name[MPI_MAX_PROCESSOR_NAME];
>>
>>     MPI_Init(&argc, &argv);
>>     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>>     MPI_Comm_size(MPI_COMM_WORLD, &size);
>>
>>     /* Every rank prints its host name; rank 0 also prints the
>>        communicator size, which is where the unexpected count shows up. */
>>     MPI_Get_processor_name(processor_name, &namelen);
>>     fprintf(stdout, "My name is: %s\n", processor_name);
>>     if (rank == 0)
>>         fprintf(stdout, "Cluster size is: %d\n", size);
>>
>>     MPI_Finalize();
>>     return 0;
>> }
>>
>> I'm curious whether this is a bug in the way openMPI interprets the SLURM
>> environment variables. If you have any ideas or need any more
>> information, let me know.
>>
>> Thanks.
>> Matt
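
One workaround that may be worth trying while this gets sorted out (a sketch only, assuming the problem is mpirun sizing the job from the per-node slot counts rather than from the requested task count) is to pass the task count to mpirun explicitly. SLURM_NPROCS=3 is already set by salloc, as the printenv output above shows:

    # Hypothetical invocation: pin the number of launched processes to the
    # SLURM task count instead of letting mpirun fill every reported slot.
    node64-test ~>mpirun -np $SLURM_NPROCS mpi_pgms/mpitest

With -np given, mpirun should start exactly that many processes; whether the 2(x2) in SLURM_JOB_CPUS_PER_NODE versus the 2,1 in SLURM_TASKS_PER_NODE is the root of the miscount is the open question above.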